CSV on the Web: A Primer

W3C Working Group Note

This version:
http://www.w3.org/TR/2016/NOTE-tabular-data-primer-20160225/
Latest published version:
http://www.w3.org/TR/tabular-data-primer/
Latest editor's draft:
http://w3c.github.io/csvw/primer/
Editor:
Jeni Tennison, Open Data Institute
Repository:
We are on GitHub
File a bug

This document is also available in this non-normative format: ePub


Abstract

CSV is one of the most popular formats for publishing data on the web. It is concise, easy to understand by both humans and computers, and aligns nicely to the tabular nature of most data.

But CSV is also a poor format for data. There is no mechanism within CSV to indicate the type of data in a particular column, or whether values in a particular column must be unique. It is therefore hard to validate and prone to errors such as missing values or differing data types within a column.

The CSV on the Web Working Group has developed standard ways to express useful metadata about CSV files and other kinds of tabular data. This primer takes you through the ways in which these standards work together, covering:

Where possible, this primer links back to the normative definitions of terms and properties in the standards. Nothing in this primer overrides those normative definitions.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

The CSV on the Web Working Group was chartered to produce a recommendation "Access methods for CSV Metadata" as well as recommendations for "Metadata vocabulary for CSV data" and "Mapping mechanism to transforming CSV into various formats (e.g. RDF, JSON, or XML)". This non-normative document is a primer that describes how these standards work together for new readers. The normative standards are:

This document was published by the CSV on the Web Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-csv-wg@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction

We'll begin with some basic concepts.

1.1 What is tabular data and CSV?

Tabular data is any data that can be arranged in a table, like the one here:

      | column 1                   | column 2                   | column 3
row 1 | cell in column 1 and row 1 | cell in column 2 and row 1 | cell in column 3 and row 1
row 2 | cell in column 1 and row 2 | cell in column 2 and row 2 | cell in column 3 and row 2
row 3 | cell in column 1 and row 3 | cell in column 2 and row 3 | cell in column 3 and row 3

There are lots of syntaxes for expressing tabular data on the web. You can put it in HTML tables, pass it around as Excel spreadsheets, or store it in an SQL database.

One easy way to pass around tabular data is as CSV: as comma-separated values. A CSV file writes each row on a separate line and each cell is separated from the next with a comma. The values of cells can be written with double quotes around them; this is necessary when a cell value contains a line break or a comma. So the tabular data above can be expressed in CSV as:

Example 1
cell in column 1 and row 1,cell in column 2 and row 1,cell in column 3 and row 1
cell in column 1 and row 2,cell in column 2 and row 2,cell in column 3 and row 2
cell in column 1 and row 3,cell in column 2 and row 3,cell in column 3 and row 3

or, with double quotes around cell values:

Example 2
"cell in column 1 and row 1","cell in column 2 and row 1","cell in column 3 and row 1"
"cell in column 1 and row 2","cell in column 2 and row 2","cell in column 3 and row 2"
"cell in column 1 and row 3","cell in column 2 and row 3","cell in column 3 and row 3"

CSV files usually have an additional row at the top called a header row, which gives human-readable names or titles for each of the columns. Here is a sample CSV file that contains a header row:

Example 3
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

Column titles are a type of annotation on a column, not part of the data itself. For example, they aren't included when you're counting the rows of data in a table:

       | column 1 | column 2      | column 3  | column 4  | column 5   | column 6    | column 7
titles | country  | country group | name (en) | name (fr) | name (de)  | latitude    | longitude
row 1  | AT       | eu            | Austria   | Autriche  | Österreich | 47.6965545  | 13.34598005
row 2  | BE       | eu            | Belgium   | Belgique  | Belgien    | 50.501045   | 4.47667405
row 3  | BG       | eu            | Bulgaria  | Bulgarie  | Bulgarien  | 42.72567375 | 25.4823218
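
As a rough illustration of the distinction between titles and data rows, the following Python sketch uses the standard csv module to separate the header row from the data. The sample text is a shortened, hypothetical version of the countries file, and the split_header helper is an assumption made for this example rather than anything defined by the CSV on the Web specifications.

```python
import csv
import io

# Shortened sample of the countries file (first three columns only).
sample = (
    '"country","country group","name (en)"\n'
    '"at","eu","Austria"\n'
    '"be","eu","Belgium"\n'
    '"bg","eu","Bulgaria"\n'
)

def split_header(csv_text):
    """Return (titles, data_rows), treating the first line as the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[0], rows[1:]

titles, data_rows = split_header(sample)
# The header row supplies the column titles; only the remaining rows are data.
```

Here the file has four lines but only three rows of data, because the first line is consumed as column titles.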

See also:

1.2 How can you provide extra information about CSV data?

You can provide extra information, known as metadata, about CSV files using a JSON metadata file. If you're just providing metadata about one file, the easiest thing to do is to name the metadata file by adding -metadata.json to the end of the name of the CSV file. For example, if your CSV file is called countries.csv then call the metadata file countries.csv-metadata.json.

The simplest metadata file you can create contains a single table description and looks like:

Example 4
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv"
}

Metadata files must always include the @context property with a value "http://www.w3.org/ns/csvw": this enables implementations to tell that these are CSV metadata files. The url property points to the CSV file that the metadata file describes.

Note

These metadata documents should be served from a web server with a media type of application/csvm+json if possible.

The description of a table within a metadata file can include:

By default, if implementations can't find a metadata file by appending -metadata.json to the filename of the CSV file, they'll just look for a file called csv-metadata.json in the same directory as the CSV file.
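
The two lookup locations described above can be sketched in Python as follows. This is a simplified illustration: the function name and the example.org URL are hypothetical, and a real implementation may consult other sources before or after these two locations.

```python
from urllib.parse import urljoin

def candidate_metadata_urls(csv_url):
    """Return the two metadata locations described above, in lookup order."""
    return [
        csv_url + "-metadata.json",             # file-specific metadata
        urljoin(csv_url, "csv-metadata.json"),  # fallback in the same directory
    ]

# Hypothetical URL, used only for illustration.
urls = candidate_metadata_urls("http://example.org/data/countries.csv")
```

A processor would try each candidate in turn and use the first metadata file it finds.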

Metadata files can also describe several CSV files at once, using a slightly different syntax:

Example 5
{
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [{
    "url": "countries.csv"
  }, {
    "url": "country-groups.csv"
  }, {
    "url": "unemployment.csv"
  }]
}

Here, the tables property holds an array of table descriptions, each with the URL of the CSV file that it's describing. The metadata file as a whole describes a group of tables. This is usually used when the tables relate to each other in some way: perhaps they're data in the same format from different periods of time, or perhaps they reference each other.

See also:

1.3 How can you provide extra information about the columns in a CSV file?

You can give information about the columns in a CSV file through a table schema. The simplest thing you can do is say what those columns are called. For example, if you have some CSV like this:

Example 6
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

You can say that the table contains seven columns named as they are in this CSV file like so:

Example 7
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [{
      "titles": "country"
    },{
      "titles": "country group"
    },{
      "titles": "name (en)"
    },{
      "titles": "name (fr)"
    },{
      "titles": "name (de)"
    },{
      "titles": "latitude"
    },{
      "titles": "longitude"
    }]
  }
}

A validator can check that the CSV file holds the expected columns (both the right number of columns and columns with the expected titles).
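
To make that check concrete, here is a minimal Python sketch of how a validator might compare a header row with a table schema. It is not taken from any real validator: check_titles is a hypothetical helper, it assumes each titles value is a single string, and the metadata is cut down to two columns for brevity.

```python
import csv
import io

def check_titles(csv_text, metadata):
    """Compare the CSV header row with the titles listed in tableSchema."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [col["titles"] for col in metadata["tableSchema"]["columns"]]
    errors = []
    if len(header) != len(expected):
        errors.append(f"expected {len(expected)} columns, found {len(header)}")
    # Compare titles position by position, numbering columns from 1.
    for number, (found, wanted) in enumerate(zip(header, expected), start=1):
        if found != wanted:
            errors.append(f"column {number}: expected {wanted!r}, found {found!r}")
    return errors

# Cut-down metadata with only the first two columns.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "countries.csv",
    "tableSchema": {
        "columns": [{"titles": "country"}, {"titles": "country group"}]
    },
}
```

An empty list means the header matches the schema; otherwise each entry describes one mismatch.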

See also:

1.4 What tools implement or use CSV on the Web?

This Note can't keep an up-to-date list of the tools that implement or otherwise use CSV on the Web. Instead, look at:

2. Documenting CSVs

Providing metadata about CSVs can be useful simply in providing extra information to anyone who wants to work with them.

2.1 How can you provide documentation about a CSV file?

Here's an example that includes some extra descriptive documentation about a number of CSV files:

Example 8
{
  "@context": "http://www.w3.org/ns/csvw",
  "dc:title": "Unemployment in Europe (monthly)",
  "dc:description": "Harmonized unemployment data for European countries.",
  "dc:creator": "Eurostat",
  "tables": [{
    "url": "countries.csv",
    "dc:title": "Countries"
  }, {
    "url": "country-groups.csv",
    "dc:title": "Country groups"
  }, {
    "url": "unemployment.csv",
    "dc:title": "Unemployment (monthly)",
    "dc:description": "The total number of people unemployed"
  }]
}

This example uses Dublin Core as a vocabulary for providing metadata. You can tell that's the vocabulary that's being used because the terms like dc:title and dc:description begin with the prefix dc, which stands for Dublin Core.

There are several different metadata vocabularies in common use around the web. Some people use Dublin Core. Some people use schema.org. Some people use DCAT. All of these vocabularies can be used independently or together. A publisher could alternatively use:

Example 9
{
  "@context": "http://www.w3.org/ns/csvw",
  "schema:name": "Unemployment in Europe (monthly)",
  "schema:description": "Harmonized unemployment data for European countries.",
  "schema:creator": { "schema:name": "Eurostat" },
  "tables": [{
    "url": "countries.csv",
    "schema:name": "Countries"
  }, {
    "url": "country-groups.csv",
    "schema:name": "Country groups"
  }, {
    "url": "unemployment.csv",
    "schema:name": "Unemployment (monthly)",
    "schema:description": "The total number of people unemployed"
  }]
}
Note

It's not clear at the moment which metadata vocabulary will give publishers the most benefits. Search engines are likely to recognise schema.org. RDF-based systems are more likely to recognise Dublin Core.

More generally, you can use prefixed properties like these on any of the objects in a metadata document. The prefixes that are recognised are those used in the RDFa 1.1 Initial Context. Other properties must be named with full URLs.

See also:

2.2 How can you provide documentation about the columns in a CSV file?

You can use metadata properties like the ones used for tables for individual columns as well. For example:

Example 10
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [{
      "titles": "country",
      "dc:description": "The ISO two-letter code for a country, in lowercase."
    },{
      "titles": "country group",
      "dc:description": "A lowercase two-letter code for a group of countries."
    },{
      "titles": "name (en)",
      "dc:description": "The official name of the country in English."
    },{
      "titles": "name (fr)",
      "dc:description": "The official name of the country in French."
    },{
      "titles": "name (de)",
      "dc:description": "The official name of the country in German."
    },{
      "titles": "latitude",
      "dc:description": "The latitude of an indicative point in the country."
    },{
      "titles": "longitude",
      "dc:description": "The longitude of an indicative point in the country."
    }]
  }
}

See also:

2.3 What about when metadata like this is structured?

We've already seen an example where metadata supplied about a table is structured. Look at the schema:creator here:

Example 11
{
  "@context": "http://www.w3.org/ns/csvw",
  "schema:name": "Unemployment in Europe (monthly)",
  "schema:description": "Harmonized unemployment data for European countries.",
  "schema:creator": { "schema:name": "Eurostat" },
  "tables": [{
    "url": "countries.csv"
  }, {
    "url": "country-groups.csv"
  }, {
    "url": "unemployment.csv"
  }]
}

The metadata supplied for a table or group of tables can have nested objects within it. You can provide arrays of values. It will be interpreted as if it is (a minimal version of) [json-ld]. Particular patterns that are useful are:

See also:

2.4 How should you annotate individual cells?

There's no standardised facility in the CSV on the Web specifications for annotating individual cells, but there is a hook that will enable best practice about how to do that to emerge: the notes property on a table description can contain objects that represent annotations.

The W3C Web Annotation Working Group is working on a vocabulary for annotations themselves. This vocabulary includes the concept of a target for an annotation and its body (the content of the annotation).

If you are annotating a single cell, the target needs to point to that cell. The easiest way to do that is to use fragment identifiers for CSV files as defined in [RFC7111]. These fragment identifiers reference cells based on their position within the original CSV file, with the first row in the CSV file (usually the header row) counted as 1. For example, with the CSV file:

Example 15
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

the cell containing Belgique is at #cell=3,4. It's also possible to refer to ranges of cells with this syntax and to use * to refer to the last row in the file. For example, to target a comment on all the locations in the CSV file you could use the fragment identifier #cell=2,6-*,7.
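
The row-then-column addressing can be illustrated with a short Python sketch. It handles only the single-cell form (such as cell=3,4) and deliberately ignores ranges and *; the cell_at helper and the shortened sample are assumptions made for this example.

```python
import csv
import io
import re

def cell_at(csv_text, fragment):
    """Return the cell addressed by a single-cell fragment like '#cell=3,4'.

    Rows and columns are counted from 1, and the header row counts as row 1.
    Ranges and '*' are not handled in this sketch.
    """
    match = re.fullmatch(r"#?cell=(\d+),(\d+)", fragment)
    if match is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    row, column = int(match.group(1)), int(match.group(2))
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[row - 1][column - 1]

# Shortened sample: the first four columns of the countries file.
sample = (
    '"country","country group","name (en)","name (fr)"\n'
    '"at","eu","Austria","Autriche"\n'
    '"be","eu","Belgium","Belgique"\n'
)
```

With this sample, the fragment #cell=3,4 selects the third line of the file (the second data row) and its fourth column.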

To create comments, then, the notes property can hold an array of objects that use the Web Annotation structure. For example:

Example 16
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "notes": [{
    "type": "Annotation",
    "target": "countries.csv#cell=2,6-*,7",
    "body": "These locations are of representative points.",
    "motivation": "commenting"
  }, {
    "type": "Annotation",
    "target": "countries.csv#cell=3,4",
    "body": "Corrected.",
    "motivation": "editing"
  }]
}

See also:

3. Validating CSVs

Validation is all about checking whether a file contains what you expect it to contain. For CSV files, this can be about:

3.1 How can you say what kinds of values are expected in a column?

There's lots more that you can say about the expected content of columns in a CSV file. The most obvious thing is to indicate the data type. For example, with the CSV file:

Example 17
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

The first five columns are strings and the last two are numbers. You can indicate this with the datatype property for each column:

Example 18
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [{
      "titles": "country",
      "datatype": "string"
    },{
      "titles": "country group",
      "datatype": "string"
    },{
      "titles": "name (en)",
      "datatype": "string"
    },{
      "titles": "name (fr)",
      "datatype": "string"
    },{
      "titles": "name (de)",
      "datatype": "string"
    },{
      "titles": "latitude",
      "datatype": "number"
    },{
      "titles": "longitude",
      "datatype": "number"
    }]
  }
}
Note

You don't have to include "datatype": "string" for columns that are strings: columns are assumed to hold strings if no datatype is explicitly specified.

There are a number of different datatypes supported by CSV on the Web implementations, based on the set defined in [xmlschema11-2]. The complete set is shown in the following diagram:

Built-in Datatype Hierarchy diagram Fig. 1 Diagram showing the built-in datatypes, based on [xmlschema11-2]; names in parentheses denote aliases to the [xmlschema11-2] terms (see the diagram in SVG or PNG formats)

See also:

3.2 How do you define new datatypes?

You can define new datatypes based on the built-in datatypes using an object as the value of the datatype property rather than a string. For example:

Example 19
"datatype": {
  "base": "integer",
  "minimum": "1",
  "maximum": "5"
}

The base property must be an existing datatype. The other properties on the new datatype define extra restrictions on values of the new datatype. You can give the new datatype a name and description to provide extra documentation for people using the data:

Example 20
"datatype": {
  "dc:title": "Star Rating",
  "dc:description": "A star rating between 1 and 5.",
  "base": "integer",
  "minimum": "1",
  "maximum": "5"
}

See also:

3.3 How do you restrict what kind of strings a column contains?

In the example we've been using, the first column always contains a country code consisting of two lowercase letters. This is a new datatype based on string. You can specify the format for string values in a column using a regular expression, like this:

Example 21
{
  "titles": "country",
  "datatype": {
    "dc:title": "Country Code",
    "dc:description": "Country codes as specified in ISO 3166.",
    "base": "string",
    "format": "[a-z]{2}"
  }
}

It's also possible to restrict the length of a string-based datatype using the length or minLength and/or maxLength properties. For example the following says that the column holding the English names of countries must have values between 3 and 128 characters long:

Example 22
{
  "titles": "name (en)",
  "datatype": {
    "base": "string",
    "minLength": "3",
    "maxLength": "128"
  }
}
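
The effect of these string restrictions can be approximated in Python as shown below. This is only a sketch: check_string is a hypothetical helper, it assumes the regular expression must match the whole value, and Python's regular expression dialect may differ in detail from the one a CSV on the Web processor uses.

```python
import re

def check_string(value, fmt=None, min_length=None, max_length=None):
    """Return True if value satisfies the given format and length limits."""
    # Assumption: the pattern has to match the entire value, not a substring.
    if fmt is not None and re.fullmatch(fmt, value) is None:
        return False
    if min_length is not None and len(value) < min_length:
        return False
    if max_length is not None and len(value) > max_length:
        return False
    return True

# Country codes: exactly two lowercase letters.
ok_code = check_string("at", fmt="[a-z]{2}")
# English names: between 3 and 128 characters.
ok_name = check_string("Austria", min_length=3, max_length=128)
```

Under these assumptions "at" passes the country code check, while "AT" and "aut" would not.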

See also:

3.4 How do you restrict the size of numbers a column contains?

The size of numbers in a column can be restricted using the minimum and maximum properties and/or the minExclusive and maxExclusive properties. In our example, one column contains latitudes, which can range between -90 and +90:

Example 23
{
  "titles": "latitude",
  "datatype": {
    "base": "number",
    "minimum": "-90",
    "maximum": "90"
  }
}
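
The difference between the inclusive and exclusive bounds can be sketched in Python like this. The in_range function and its parameter names are hypothetical stand-ins for the minimum, maximum, minExclusive and maxExclusive properties; the bounds are passed as strings, as they are in the metadata examples.

```python
from decimal import Decimal, InvalidOperation

def in_range(value, minimum=None, maximum=None,
             min_exclusive=None, max_exclusive=None):
    """Return True if the numeric string value lies within the given bounds."""
    try:
        number = Decimal(value)
    except InvalidOperation:
        return False  # not a number at all
    if not number.is_finite():
        return False  # NaN and infinities are rejected in this sketch
    if minimum is not None and number < Decimal(minimum):
        return False
    if maximum is not None and number > Decimal(maximum):
        return False
    if min_exclusive is not None and number <= Decimal(min_exclusive):
        return False
    if max_exclusive is not None and number >= Decimal(max_exclusive):
        return False
    return True

latitude_ok = in_range("47.6965545", minimum="-90", maximum="90")
```

A value of exactly 90 is accepted with "maximum": "90" but would be rejected with "maxExclusive": "90".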

See also:

3.5 How do you ensure that decimal numbers have a particular precision or leading zeros?

In the example we're using, the latitudes are provided to between six and eight decimal places and there are no leading or trailing zeros. You can use the format property to provide a pattern that matches these numbers. In the pattern, 0 represents a required digit and # represents an optional digit. For the latitude, the definition looks like:

Example 24
{
  "titles": "latitude",
  "datatype": {
    "base": "number",
    "minimum": "-90",
    "maximum": "90",
    "format": "#0.000000##"
  }
}

The format property can also be used to indicate that values in a column should have leading zeros. For example, if a column were supposed to hold a three digit number you could use the pattern 000.

See also:

3.6 How do you validate numbers that aren't in standard numeric formats?

Sometimes numbers within a CSV file won't be in a standard numeric format. For example, they might include commas as grouping characters (e.g. 12,345,678) or as decimal points (e.g. 12,3). In these cases, you can use the format property with an object value.

To match numbers with grouping separators as in 12,345,678 you should specify "," as the groupChar for the format. The pattern property then holds the pattern that indicates how many digits should be in each group. This example validates numbers with groups of three digits separated by commas:

Example 25
"datatype": {
  "base": "integer",
  "format": {
    "groupChar": ",",
    "pattern": "#,##0"
  }
}

To match numbers with decimal separators other than ., as in 12,3, you should specify "," as the decimalChar for the format. This example validates numbers with commas as decimal points:

Example 26
"datatype": {
  "base": "number",
  "format": {
    "decimalChar": ","
  }
}

You can mix and match decimal and grouping characters and patterns. For example, in France it's standard to use commas for decimal points and spaces for grouping characters, so CSV files produced in France might contain numbers like 1 234 567,89. These could be validated with a datatype like:

Example 27
"datatype": {
  "base": "number",
  "format": {
    "decimalChar": ",",
    "groupChar": " ",
    "pattern": "# ##0,0#"
  }
}
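
As a rough sketch of what a processor does with groupChar and decimalChar, the Python function below normalises such strings into numbers. The function name and parameters are hypothetical, and the pattern property is ignored entirely, so this does not check how many digits appear in each group.

```python
from decimal import Decimal

def parse_number(text, decimal_char=".", group_char=None):
    """Convert a formatted number string to a Decimal.

    Grouping characters are dropped and the decimal character is normalised
    to '.'. The pattern property is ignored in this sketch.
    """
    if group_char is not None:
        text = text.replace(group_char, "")
    if decimal_char != ".":
        text = text.replace(decimal_char, ".")
    return Decimal(text)

grouped = parse_number("12,345,678", group_char=",")
french = parse_number("1 234 567,89", decimal_char=",", group_char=" ")
```

Grouping characters are removed before the decimal character is normalised, so the two settings do not interfere even when one of them is a full stop.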

See also:

3.7 How do you restrict what kind of dates and times a column contains?

Dates and times are treated similarly to numbers. You can use the minimum, maximum, minExclusive and/or maxExclusive properties to restrict their values.

For example, to indicate that the column should contain dates later than 1st January 2000, you can use the datatype:

Example 28
"datatype": {
  "base": "date",
  "minimum": "2000-01-01"
}

To indicate that the column should contain times before midday (exclusive), you can use the datatype:

Example 29
"datatype": {
  "base": "time",
  "maxExclusive": "12:00:00"
}
Note

The format of dates or times used for minimum or maximum values is always the ISO 8601 format: yyyy-MM-dd for dates, HH:mm:ss.S for times and yyyy-MM-ddTHH:mm:ss.S for date/times.

See also:

3.8 How do you validate dates that aren't in standard date or time formats?

Dates and times in CSV files often come in formats other than the standard ISO 8601 format. You can use the format property to indicate the expected format of the date or time.

For example, to recognise dates in the usual UK format such as 31/10/2015 for 31st October 2015, you could use:

Example 30
"datatype": {
  "base": "date",
  "format": "dd/MM/yyyy"
}
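
One way to picture how such a format is applied is to translate it into the directives used by Python's datetime.strptime. The mapping below is a hypothetical, partial one that covers only the yyyy, MM and dd pattern letters, and strptime is more lenient than a strict validator would be (for example, it accepts a day written without a leading zero).

```python
from datetime import date, datetime

# Hypothetical, partial mapping from pattern letters to strptime directives.
_DIRECTIVES = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d")]

def parse_date(text, fmt="yyyy-MM-dd"):
    """Parse a date string using a pattern such as 'dd/MM/yyyy'."""
    strptime_format = fmt
    for letters, directive in _DIRECTIVES:
        strptime_format = strptime_format.replace(letters, directive)
    return datetime.strptime(text, strptime_format).date()

uk_date = parse_date("31/10/2015", "dd/MM/yyyy")
# Once parsed, the value can be compared against a minimum such as 2000-01-01.
after_2000 = uk_date >= date(2000, 1, 1)
```

A string that does not fit the pattern, such as 2015-10-31 against dd/MM/yyyy, raises ValueError in this sketch.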

Implementations are only required to understand a particular set of common formats for dates and times. These formats are, for dates:

For times:

And for date/times:

Note

None of these formats include names or abbreviations for months or days. The implementation you use might support other date and time formats as well, including specialised formats for the other date and time datatypes such as gMonthYear. Check your implementation's documentation to see what it supports.

See also:

3.9 How do you validate boolean values that aren't true or false?

By default, validators will recognise boolean values that are 1 or 0 or true or false. If a CSV file contains boolean values like T and F or Yes and No then you can create a derived boolean datatype that uses that syntax using the format property, with the two possible values separated by a |, for example:

Example 31
"datatype": {
  "base": "boolean",
  "format": "Yes|No"
}
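
A minimal Python sketch of this behaviour follows. It assumes, in line with the Yes|No example, that the value before the | stands for true and the value after it for false; parse_boolean is a hypothetical helper rather than part of any real library.

```python
def parse_boolean(text, fmt=None):
    """Parse a boolean cell; fmt is 'TRUE_VALUE|FALSE_VALUE' if given."""
    if fmt is None:
        # Default syntax described above.
        true_values, false_values = {"true", "1"}, {"false", "0"}
    else:
        # Assumption: the first alternative means true, the second false.
        true_value, false_value = fmt.split("|", 1)
        true_values, false_values = {true_value}, {false_value}
    if text in true_values:
        return True
    if text in false_values:
        return False
    raise ValueError(f"not a valid boolean: {text!r}")

answer = parse_boolean("Yes", "Yes|No")
```

Without a format, a cell containing Yes is rejected because only true, false, 1 and 0 are recognised by default.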

See also:

3.10 How do you specify a list of valid values for a column?

The example CSV file we're using is this:

Example 32
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

In it, the second column, country group, contains either the value eu or the value non-eu. Despite there being two values, this isn't a boolean column. Instead, it's a column that has only two valid values.

There are two ways to specify that a column contains one of a list of values: using a regular expression to list the values and using a reference to a separate CSV file that contains the values.

3.10.1 Using a regular expression to give a list of valid values

Using a regular expression to list values works best if those values are strings, if there are only a few of them, and if they are self-explanatory such that you don't want to provide any additional information about them.

In this example, the country group column could be specified as:

Example 33
{
  "titles": "country group",
  "datatype": {
    "base": "string",
    "format": "eu|non-eu"
  }
}

As described in section 3.3 How do you restrict what kind of strings a column contains?, the format property contains a regular expression. List the options separated by | and ensure that you escape any of the characters in the options that have special meaning in regular expressions.

See also:

3.10.2 Using a separate CSV file to give a list of valid values

A more powerful method of listing the valid values in a particular column is to list those values in a separate CSV file. The CSV file can be very simple, containing just a single column that lists the valid values. In this example, we can create country-groups.csv containing:

Example 34
group
eu
non-eu

We can then provide definitions for both the countries.csv and country-groups.csv files, and state that the country group column in countries.csv references the group column in country-groups.csv. This reference from one file to another is called a foreign key.

To use a foreign key, both files must be referenced in the same metadata document, and both columns must be given names. Column names are only used inside the metadata document and you can only use (ASCII) letters, numbers, . and _ within them. So the basic metadata document, before adding the foreign key, should look like:

Example 35
{
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [{
    "url": "countries.csv",
    "tableSchema": {
      "columns": [{
        "titles": "country"
      },{
        "name": "country_group",
        "titles": "country group"
      },{
        "titles": "name (en)"
      },{
        "titles": "name (fr)"
      },{
        "titles": "name (de)"
      },{
        "titles": "latitude"
      },{
        "titles": "longitude"
      }]
    }
  }, {
    "url": "country-groups.csv",
    "tableSchema": {
      "columns": [{
        "name": "group",
        "titles": "group"
      }]
    }
  }]
}

The foreign key is defined in the schema for the countries.csv table, as follows:

Example 36
{
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [{
    "url": "countries.csv",
    "tableSchema": {
      "columns": [{
        "titles": "country"
      },{
        "name": "country_group",
        "titles": "country group"
      },{
        "titles": "name (en)"
      },{
        "titles": "name (fr)"
      },{
        "titles": "name (de)"
      },{
        "titles": "latitude"
      },{
        "titles": "longitude"
      }],
      "foreignKeys": [{
        "columnReference": "country_group",
        "reference": {
          "resource": "country-groups.csv",
          "columnReference": "group"
        }
      }]
    }
  }, {
    "url": "country-groups.csv",
    "tableSchema": {
      "columns": [{
        "name": "group",
        "titles": "group"
      }]
    }
  }]
}

The foreignKeys property can hold several foreign keys. Each contains a columnReference to a column or list of columns in one CSV file, and a reference which defines a column or list of columns in another CSV file.
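
Conceptually, checking a single-column foreign key amounts to confirming that every value in the referencing column appears in the referenced column. The Python sketch below shows that idea using rows already loaded as dictionaries keyed by column name; the helper name and the deliberately invalid second row are hypothetical.

```python
def foreign_key_errors(referencing_rows, column, referenced_rows, referenced_column):
    """List values of `column` that do not appear in `referenced_column`."""
    valid = {row[referenced_column] for row in referenced_rows}
    return [row[column] for row in referencing_rows if row[column] not in valid]

countries = [
    {"country": "at", "country_group": "eu"},
    {"country": "xx", "country_group": "efta"},  # hypothetical invalid row
]
groups = [{"group": "eu"}, {"group": "non-eu"}]

errors = foreign_key_errors(countries, "country_group", groups, "group")
```

Here the value efta is reported because it is not listed in the group column of the referenced table.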

The advantage of this method of listing the values allowed in a column is that the CSV file that contains the list of possible values can also provide additional information about those values. For example, we can provide expansions of what eu and non-eu mean in different languages:

Example 37
group,name (en),name (fr),name (de)
eu,"European Union","Union européenne","Europäische Union"
non-eu,"Non EU countries","Pays hors Union européenne",Nicht-EU-Länder

See also:

3.11 How do you enable a column to have a mix of value types?

Sometimes a column that contains numbers will contain special values, such as X or NK, when a value is unknown or redacted. If these columns are simply classified as numeric then the non-numeric values will be classed as errors.

To avoid values being classified as errors when they are being used to indicate missing values, list those values as null values using the null property. This can take either a single string or an array of strings. For example, the latitude column might usually be numeric but hold an X if there is no indicative point for the country:

Example 38
{
  "titles": "latitude",
  "null": "X",
  "datatype": {
    "base": "number",
    "minimum": "-90",
    "maximum": "90"
  }
}
Note

The null property can also be useful when a column contains values that are of the right type but used to indicate a missing value. It's not uncommon, for example, for publishers to use the value 99 in a column that contains integers to indicate that a value is missing.
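
The handling of null values can be sketched in Python as a check that runs before the datatype is applied. The parse_cell helper is hypothetical; like the null property, it accepts either a single string or a list of strings, and it assumes an empty string is the null marker when none is given.

```python
from decimal import Decimal

def parse_cell(text, null=""):
    """Return None for a null marker, otherwise the value as a Decimal."""
    null_values = [null] if isinstance(null, str) else list(null)
    if text in null_values:
        return None  # a missing value, not an error
    return Decimal(text)

missing = parse_cell("X", null="X")
present = parse_cell("47.6965545", null="X")
```

Because the null check comes first, a marker such as X never reaches the numeric parser and so is not reported as an invalid number.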

See also:

3.12 What if the cells in a column contain lists of values?

Cells may contain lists of values with spaces, semi-colons or other characters acting as separators. For example, instead of using separate latitude and longitude columns, the CSV that we're looking at could contain a single latlong column consisting of the latitude and longitude separated by a space:

Example 39
"country","country group","name (en)","name (fr)","name (de)","latlong"
"at","eu","Austria","Autriche","Österreich","47.6965545 13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045 4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375 25.4823218"

In this scenario, the separator property can be used to indicate that the values in a column are lists themselves, and what separator is used between the items in the list. For example:

Example 40
{
  "titles": "latlong",
  "separator": " ",
  "datatype": {
    "base": "number",
    "minimum": "-180",
    "maximum": "180"
  }
}

When separator is specified, the datatype property applies to each of the values in the list. There's no way to indicate that the values in the list have different datatypes, or set limits on the length of the list.
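
In Python terms, a processor splits the cell on the separator and then applies the same datatype checks to every item. The sketch below assumes a numeric datatype with inclusive bounds; parse_list is a hypothetical helper.

```python
from decimal import Decimal

def parse_list(text, separator, minimum, maximum):
    """Split a cell on separator and check each item against the same bounds."""
    values = [Decimal(item) for item in text.split(separator)]
    for value in values:
        if not Decimal(minimum) <= value <= Decimal(maximum):
            raise ValueError(f"{value} is outside {minimum}..{maximum}")
    return values

latlong = parse_list("47.6965545 13.34598005", " ", "-180", "180")
```

Both items share the -180 to 180 range, which is why the example uses bounds wide enough for a longitude even though a latitude alone would be limited to -90 to 90.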

See also:

3.13 How do you ensure every row has a value for a column?

By default, a validator won't give any errors if a value is missing in a column. If you want to ensure that a value is provided for every row in the column, use the required property for that column, with the value true.

In our example, we might say that all the columns are required except the French and German names (applications being expected to default to the English name if the translation is missing):

Example 41
"tableSchema": {
  "columns": [{
    "titles": "country",
    "required": true
  },{
    "titles": "country group",
    "required": true
  },{
    "titles": "name (en)",
    "required": true
  },{
    "titles": "name (fr)"
  },{
    "titles": "name (de)"
  },{
    "titles": "latitude",
    "required": true
  },{
    "titles": "longitude",
    "required": true
  }]
}
Note

Setting required to true means that you can't have any null values in a column. If, in this example, latitude and longitude had null set to X then those columns couldn't contain an X. It doesn't usually make sense to specify both null and required.
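
The interaction between null and required can be sketched in a few lines of Python. This is a simplified illustration only: the function name missing_value_error is hypothetical, and null is assumed to be a single string rather than a list of strings.

```python
def missing_value_error(cell, null="", required=False):
    """Return an error message when a required column holds its null value."""
    if required and cell == null:
        return "required value is missing"
    return None
```

With null set to X and required set to true, a cell holding X is reported as an error, which is why combining the two properties rarely makes sense.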

3.14 How do you indicate all the values in a column are unique?

In our example, the country column must contain unique values: each row should define a different country. To specify this, set the primaryKey property to the name of the column:

Example 42
"tableSchema": {
  "columns": [{
    "name": "country",
    "titles": "country"
  },{
    "titles": "country group"
  },{
    "titles": "name (en)"
  },{
    "titles": "name (fr)"
  },{
    "titles": "name (de)"
  },{
    "titles": "latitude"
  },{
    "titles": "longitude"
  }],
  "primaryKey": "country"
}

Each CSV file can only have one primary key. A primary key can be made up of a number of columns that have to be unique in combination: the classic example would be ["firstName", "lastName"].
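
A uniqueness check of this kind can be sketched with Python's standard csv module. The helper duplicate_keys is hypothetical, and for simplicity it looks columns up by their CSV header rather than by the name given in the metadata.

```python
import csv
import io


def duplicate_keys(csv_text, key_columns):
    """Return the primary key values that occur in more than one row."""
    seen, duplicates = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A composite key is the tuple of values from all key columns.
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

Passing a single column checks that column on its own; passing several checks that they are unique in combination.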

4. Transforming CSVs

CSV is great for transferring data around the place, but it's often not as useful for processing that data. Other formats can be better at providing structure and meaning for data. So a means for transforming data out of CSV and into other formats is a common requirement.

4.1 What can you transform CSV into?

You can of course transform CSV into anything you like using your favourite programming language. However, the CSV on the Web specs provide standardised ways of mapping CSV into two other formats: JSON and RDF.

These specifications make use of the metadata described in this primer during the transformation to decide what to include in the transformed output and how to include it. Processors that support these standardised transformations can be used by people who can't program.

These specifications describe what output you get if you don't supply any metadata. Given a CSV file like this:

Example 43
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

the usual, minimal JSON output would be:

Example 44
[{
  "country": "at",
  "country group": "eu",
  "name (en)": "Austria",
  "name (fr)": "Autriche",
  "name (de)": "Österreich",
  "latitude": "47.6965545",
  "longitude": "13.34598005"
}, {
  "country": "be",
  "country group": "eu",
  "name (en)": "Belgium",
  "name (fr)": "Belgique",
  "name (de)": "Belgien",
  "latitude": "50.501045",
  "longitude": "4.47667405"
}, {
  "country": "bg",
  "country group": "eu",
  "name (en)": "Bulgaria",
  "name (fr)": "Bulgarie",
  "name (de)": "Bulgarien",
  "latitude": "42.72567375",
  "longitude": "25.4823218"
}]

and the RDF output would be:

Example 45
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[
  <#country> "at";
  <#country%20group> "eu";
  <#latitude> "47.6965545";
  <#longitude> "13.34598005";
  <#name%20%28de%29> "Österreich";
  <#name%20%28en%29> "Austria";
  <#name%20%28fr%29> "Autriche"
] .

[
  <#country> "be";
  <#country%20group> "eu";
  <#latitude> "50.501045";
  <#longitude> "4.47667405";
  <#name%20%28de%29> "Belgien";
  <#name%20%28en%29> "Belgium";
  <#name%20%28fr%29> "Belgique"
] .

[
  <#country> "bg";
  <#country%20group> "eu";
  <#latitude> "42.72567375";
  <#longitude> "25.4823218";
  <#name%20%28de%29> "Bulgarien";
  <#name%20%28en%29> "Bulgaria";
  <#name%20%28fr%29> "Bulgarie"
] .
Note

The specifications define how to transform CSV into RDF. In this Primer all the examples use Turtle as the serialisation for that RDF. Implementations may generate other serialisations for RDF such as RDF/XML or JSON-LD.

4.2 What values get used in the output of a transformation?

The result of a transformation will include typed values based on the datatype and language of each column. So if we state that the longitude and latitude are numbers and the names are strings in the given language, as in this metadata:

Example 46
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [{
      "titles": "country"
    },{
      "titles": "country group"
    },{
      "titles": "name (en)",
      "lang": "en"
    },{
      "titles": "name (fr)",
      "lang": "fr"
    },{
      "titles": "name (de)",
      "lang": "de"
    },{
      "titles": "latitude",
      "datatype": "number"
    },{
      "titles": "longitude",
      "datatype": "number"
    }]
  }
}

then the JSON will look like:

Example 47
[{
  "country": "at",
  "country group": "eu",
  "name (en)": "Austria",
  "name (fr)": "Autriche",
  "name (de)": "Österreich",
  "latitude": 47.6965545,
  "longitude": 13.34598005
},{
  "country": "be",
  "country group": "eu",
  "name (en)": "Belgium",
  "name (fr)": "Belgique",
  "name (de)": "Belgien",
  "latitude": 50.501045,
  "longitude": 4.47667405
},{
  "country": "bg",
  "country group": "eu",
  "name (en)": "Bulgaria",
  "name (fr)": "Bulgarie",
  "name (de)": "Bulgarien",
  "latitude": 42.72567375,
  "longitude": 25.4823218
}]

and the RDF will look like:

Example 48
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[
  <#country> "at";
  <#country%20group> "eu";
  <#latitude> 4.76965545e1;
  <#longitude> 1.334598005e1;
  <#name%20%28de%29> "Österreich"@de;
  <#name%20%28en%29> "Austria"@en;
  <#name%20%28fr%29> "Autriche"@fr
] .

[
  <#country> "be";
  <#country%20group> "eu";
  <#latitude> 5.0501045e1;
  <#longitude> 4.47667405e0;
  <#name%20%28de%29> "Belgien"@de;
  <#name%20%28en%29> "Belgium"@en;
  <#name%20%28fr%29> "Belgique"@fr
] .

[
  <#country> "bg";
  <#country%20group> "eu";
  <#latitude> 4.272567375e1;
  <#longitude> 2.54823218e1;
  <#name%20%28de%29> "Bulgarien"@de;
  <#name%20%28en%29> "Bulgaria"@en;
  <#name%20%28fr%29> "Bulgarie"@fr
] .

4.3 What value gets used in the output if it's missing in the CSV?

If there's a missing value in the CSV, then usually the property will be omitted in the result as well. So if the latitude and longitude are missing for the first row of the CSV file we've been using, the equivalent JSON won't include those properties either:

Example 49
{
  "country": "at",
  "country group": "eu",
  "name (en)": "Austria",
  "name (fr)": "Autriche",
  "name (de)": "Österreich"
}

and nor will the RDF:

Example 50
[
  <#country> "at";
  <#country%20group> "eu";
  <#name%20%28de%29> "Österreich"@de;
  <#name%20%28en%29> "Austria"@en;
  <#name%20%28fr%29> "Autriche"@fr
] .

If you want a value to appear even when the value is missing in the CSV, you can provide that value as the default for the column. This value must be supplied as a string but will be treated exactly as if it had appeared within the CSV file. For example, if you supply a non-numeric string for a numeric column, as in:

Example 51
{
  "titles": "latitude",
  "datatype": "number",
  "default": "NOT KNOWN"
}

the default value will be seen as an invalid value and therefore represented as a string in the output.
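
The behaviour described in this section can be sketched in Python as follows. This is a simplified illustration that only handles the number datatype; the function name cell_value is hypothetical, and returning None stands in for "omit the property from the output".

```python
def cell_value(cell, datatype="string", default=""):
    """Apply the column default to an empty cell, then try to parse it."""
    # The default is treated exactly as if it had appeared in the CSV file.
    text = cell if cell != "" else default
    if text == "":
        return None  # still missing: the property is omitted
    if datatype == "number":
        try:
            return float(text)
        except ValueError:
            return text  # invalid value: kept as a string
    return text
```

For instance, an empty latitude cell with a default of "NOT KNOWN" comes out as the string "NOT KNOWN" rather than as a number.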

4.4 How are the properties in the output of a transformation named?

By default, the properties in the JSON and RDF come from the titles of the columns (the headers in the CSV file). In RDF, since the properties are URIs, the names are URL-encoded and turned into fragment identifiers in a URL based on the location of the CSV file being transformed. Hence in the previous examples there have been property names like "name (en)" in JSON and #name%20%28en%29 in RDF.

You can override this default by supplying a name for the column. That name will be used instead of the title of the column when creating the property. So if you have:

Example 52
{
  "titles": "name (en)",
  "name": "english_name"
}

then the property will be called english_name in the JSON output and #english_name in the RDF output.
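
The default naming rules can be approximated in Python using urllib.parse.quote for the URL-encoding. The helpers json_property and rdf_property are hypothetical, and the base argument stands in for the location of the CSV file; the normative rules in the specifications cover more cases than this sketch.

```python
from urllib.parse import quote


def json_property(column):
    """Property name used in JSON output: the name if given, else the title."""
    return column.get("name", column["titles"])


def rdf_property(column, base=""):
    """Property URL used in RDF output: an encoded fragment on the CSV's URL."""
    return f"{base}#{quote(json_property(column), safe='')}"
```

Here rdf_property({"titles": "name (en)"}) gives #name%20%28en%29, while adding "name": "english_name" to the column gives #english_name.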

You can also use the propertyUrl property to supply a prefixed name or a URL for the property. For example, to use schema:latitude as the name for the latitude property in both the JSON and the RDF output, you could use:

Example 53
{
  "titles": "latitude",
  "propertyUrl": "schema:latitude",
  "datatype": "number"
}

The propertyUrl property can be used to map several columns in the CSV file onto properties with the same name. In our example, each country has several names which are all really the same property; the schema could look like:

Example 54
{
  "titles": "name (en)",
  "propertyUrl": "schema:name",
  "lang": "en"
},{
  "titles": "name (fr)",
  "propertyUrl": "schema:name",
  "lang": "fr"
},{
  "titles": "name (de)",
  "propertyUrl": "schema:name",
  "lang": "de"
}

With that schema, the JSON output will contain:

Example 55
"schema:name": ["Belgium", "Belgique", "Belgien"]

and the RDF output will contain:

Example 56
schema:name "Belgium"@en, "Belgique"@fr, "Belgien"@de

If there isn't a relevant property in one of the vocabularies that are built in to CSV on the Web (those listed as part of the RDFa 1.1 initial context), the propertyUrl can hold a URL template. Usually this template won't include any substitutable parts because it's generally the case that the property should be the same for the whole column. For example, you might have:

Example 57
{
  "titles": "country group",
  "propertyUrl": "http://example.org/def/countryGroup"
}

In this case, the result of a transformation to JSON will contain:

Example 58
"http://example.org/def/countryGroup": "eu"

and the RDF similarly:

Example 59
<http://example.org/def/countryGroup> "eu"

4.5 How do you map values into URLs?

Sometimes a column contains a value that can be programmatically mapped into a URL. In this case, the valueUrl property contains a template for the URL that it should be mapped into.

For example, say that there were pages for each country on the web at e.g. http://example.org/country/at. In that case, the URL for the country could be generated with the URL template http://example.org/country/{country}. Within this template, {country} inserts the value from the column named country into the URL. So the metadata could contain:

Example 60
{
  "titles": "country",
  "name": "country",
  "valueUrl": "http://example.org/country/{country}",
  "propertyUrl": "schema:url"
}

The JSON output would then contain:

Example 61
"schema:url": "http://example.org/country/at"

and the RDF output:

Example 62
schema:url <http://example.org/country/at>
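
To make the substitution step concrete, here is a minimal Python sketch of expanding such a template against the values in a row. It only handles simple {name} placeholders, which is a small subset of what URI templates allow, and the function name expand is an assumption for illustration.

```python
import re
from urllib.parse import quote


def expand(template, row):
    """Replace each {name} in the template with the percent-encoded cell value."""
    def substitute(match):
        # Look the placeholder up by column name in the current row.
        return quote(str(row[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)
```

Calling expand("http://example.org/country/{country}", {"country": "at"}) returns http://example.org/country/at.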

If you want to preserve the original value from the column and use it to create a URL, you may want to introduce a virtual column. For example, with the latitude and longitude of each country available, you might want to provide a link to a map centered on the country within Google Maps. The URLs for these look like https://www.google.com/maps/@50.501045,4.476674,7z, which can be generated from a template like https://www.google.com/maps/@{lat},{long},7z.

To add a property that points to this URL, add a virtual column at the end of the column definitions within the schema. A virtual column definition looks just like a normal column definition but with the virtual property set to true:

Example 63
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [{
      "titles": "country"
    },{
      "titles": "country group"
    },{
      "titles": "name (en)",
      "lang": "en"
    },{
      "titles": "name (fr)",
      "lang": "fr"
    },{
      "titles": "name (de)",
      "lang": "de"
    },{
      "titles": "latitude",
      "name": "lat",
      "datatype": "number"
    },{
      "titles": "longitude",
      "name": "long",
      "datatype": "number"
    },{
      "virtual": true,
      "propertyUrl": "schema:hasMap",
      "valueUrl": "https://www.google.com/maps/@{lat},{long},7z"
    }]
  }
}

In JSON, this will result in the output:

Example 64
"schema:hasMap": "https://www.google.com/maps/@42.72567375,25.4823218,7z"

and in RDF, the output:

Example 65
schema:hasMap <https://www.google.com/maps/@42.72567375,25.4823218,7z>

4.6 How do you include an identifier for the thing described by each row?

By default, the things described by each row don't have identifiers associated with them in either JSON or RDF outputs. You can add an identifier for the row by setting the aboutUrl property. Usually that's done at the top level of the schema.

For example, say each row in countries.csv was about a country whose identifier looked like http://example.org/country/{code} where code was the value within the first column of the CSV file (the country column). The aboutUrl could be set to generate this URL for each row using a URL template:

Example 66
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "aboutUrl": "http://example.org/country/{code}",
    "columns": [{
      "titles": "country",
      "name": "code"
    },{
      "titles": "country group"
    },{
      "titles": "name (en)",
      "lang": "en"
    },{
      "titles": "name (fr)",
      "lang": "fr"
    },{
      "titles": "name (de)",
      "lang": "de"
    },{
      "titles": "latitude",
      "datatype": "number"
    },{
      "titles": "longitude",
      "datatype": "number"
    }]
  }
}

In the JSON, these identifiers are turned into @id properties on the objects generated for each row:

Example 67
[{
  "@id": "http://example.org/country/at",
  "code": "at",
  "country group": "eu",
  "name (en)": "Austria",
  "name (fr)": "Autriche",
  "name (de)": "Österreich",
  "latitude": 47.6965545,
  "longitude": 13.34598005
},{
  "@id": "http://example.org/country/be",
  "code": "be",
  "country group": "eu",
  "name (en)": "Belgium",
  "name (fr)": "Belgique",
  "name (de)": "Belgien",
  "latitude": 50.501045,
  "longitude": 4.47667405
},{
  "@id": "http://example.org/country/bg",
  "code": "bg",
  "country group": "eu",
  "name (en)": "Bulgaria",
  "name (fr)": "Bulgarie",
  "name (de)": "Bulgarien",
  "latitude": 42.72567375,
  "longitude": 25.4823218
}]

In the RDF, these identifiers become the identifiers for the entities that the properties relate to, rather than those being blank nodes:

Example 68
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/country/at>
   <#code> "at";
   <#country%20group> "eu";
   <#latitude> 4.76965545e1;
   <#longitude> 1.334598005e1;
   <#name%20%28de%29> "Österreich"@de;
   <#name%20%28en%29> "Austria"@en;
   <#name%20%28fr%29> "Autriche"@fr .

<http://example.org/country/be>
   <#code> "be";
   <#country%20group> "eu";
   <#latitude> 5.0501045e1;
   <#longitude> 4.47667405e0;
   <#name%20%28de%29> "Belgien"@de;
   <#name%20%28en%29> "Belgium"@en;
   <#name%20%28fr%29> "Belgique"@fr .

<http://example.org/country/bg>
   <#code> "bg";
   <#country%20group> "eu";
   <#latitude> 4.272567375e1;
   <#longitude> 2.54823218e1;
   <#name%20%28de%29> "Bulgarien"@de;
   <#name%20%28en%29> "Bulgaria"@en;
   <#name%20%28fr%29> "Bulgarie"@fr .

4.7 How do you indicate the type of the thing described by each row?

Whether generating JSON or RDF, it can be useful to indicate that each row contains data about a particular type of thing, such as a Person or a Country. There isn't usually a column within tabular data that indicates the type of the row (because it's generally the same for every row, so including it would be superfluous), so you have to add it as a virtual column.

The virtual column needs to come after the descriptions of columns actually within the data. It should have its virtual property set to true and its propertyUrl set to rdf:type to indicate that the virtual column will indicate the type of entity the row is about. The valueUrl property can then be set to the prefixed name or URL of the type of the entity. For example, when each row represents a Country, you might use:

Example 69
{
  "virtual": true,
  "propertyUrl": "rdf:type",
  "valueUrl": "schema:Country"
}

In the JSON this value will be transformed into the value of the @type property on the relevant object:

Example 70
"@type": "schema:Country"

In RDF output, it becomes the class for the entity:

Example 71
a schema:Country

4.8 How do you include extra metadata in the result of the transformation?

Transformations into JSON or RDF can be carried out in one of two modes. In minimal mode, which is what we've looked at so far, the output only contains data from the rows within the CSV file. In full mode, the output also contains metadata about the CSV file, including metadata from the metadata file.

For example, say that our metadata file looked like:

Example 72
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "schema:name": "Countries",
  "schema:description": "European countries for which comparative statistics are collected by Eurostat.",
  "schema:creator": { "schema:name": "Eurostat" },
  "tableSchema": {
    ...
  }
}

The output of a full JSON transformation would look like:

Example 73
{
  "tables": [{
    "url": "countries.csv",
    "schema:name": "Countries",
    "schema:description": "European countries for which comparative statistics are collected by Eurostat.",
    "schema:creator": {
      "schema:name": "Eurostat"
    },
    "row": [{
      "url": "countries.csv#row=2",
      "rownum": 1,
      "describes": [{
        "country": "at",
        "country group": "eu",
        "name (en)": "Austria",
        "name (fr)": "Autriche",
        "name (de)": "Österreich",
        "latitude": 47.6965545,
        "longitude": 13.34598005
      }]
    }, {
      "url": "countries.csv#row=3",
      "rownum": 2,
      "describes": [{
        "country": "be",
        "country group": "eu",
        "name (en)": "Belgium",
        "name (fr)": "Belgique",
        "name (de)": "Belgien",
        "latitude": 50.501045,
        "longitude": 4.47667405
      }]
    }, {
      "url": "countries.csv#row=4",
      "rownum": 3,
      "describes": [{
        "country": "bg",
        "country group": "eu",
        "name (en)": "Bulgaria",
        "name (fr)": "Bulgarie",
        "name (de)": "Bulgarien",
        "latitude": 42.72567375,
        "longitude": 25.4823218
      }]
    }]
  }]
}

Similarly, the output of the full RDF would look like:

Example 74
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

 [
    a csvw:TableGroup;
    csvw:table [
      a csvw:Table;
      schema:creator [ schema:name "Eurostat" ];
      schema:description "European countries for which comparative statistics are collected by Eurostat.";
      schema:name "Countries";
      csvw:row [
        a csvw:Row;
        csvw:describes [
          <#country> "at";
          <#country%20group> "eu";
          <#latitude> 4.76965545e1;
          <#longitude> 1.334598005e1;
          <#name%20%28de%29> "Österreich"@de;
          <#name%20%28en%29> "Austria"@en;
          <#name%20%28fr%29> "Autriche"@fr
        ];
        csvw:rownum 1;
        csvw:url <#row=2>
      ],  [
        a csvw:Row;
        csvw:describes [
          <#country> "be";
          <#country%20group> "eu";
          <#latitude> 5.0501045e1;
          <#longitude> 4.47667405e0;
          <#name%20%28de%29> "Belgien"@de;
          <#name%20%28en%29> "Belgium"@en;
          <#name%20%28fr%29> "Belgique"@fr
        ];
        csvw:rownum 2;
        csvw:url <#row=3>
      ],  [
        a csvw:Row;
        csvw:describes [
          <#country> "bg";
          <#country%20group> "eu";
          <#latitude> 4.272567375e1;
          <#longitude> 2.54823218e1;
          <#name%20%28de%29> "Bulgarien"@de;
          <#name%20%28en%29> "Bulgaria"@en;
          <#name%20%28fr%29> "Bulgarie"@fr
        ];
        csvw:rownum 3;
        csvw:url <#row=4>
      ];
      csvw:url <>
    ]
 ] .

Metadata provided about tables is interpreted based on the rules for [json-ld], which means that you can provide as much structure within that metadata as you like, including providing structured values, languages and datatypes, so that the data in the output includes what you need it to.
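
The relationship between the two JSON modes shown in this section can be sketched in Python: the minimal output is, roughly, the list of objects found under each row's describes array in the full output. The function name minimal_from_full is hypothetical, and the sketch assumes the full output has the tables, row and describes structure shown above.

```python
def minimal_from_full(full_output):
    """Collect the objects under each row's "describes" array, in row order."""
    return [
        described
        for table in full_output["tables"]
        for row in table["row"]
        for described in row["describes"]
    ]
```

Table-level metadata such as schema:name and row-level details such as rownum are dropped, which is the difference between the two modes.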

4.9 How can you remove output from a transformation result?

By default, the output from a JSON or RDF transformation will include all the data from all the columns of all the tables in the metadata document. It may be that you're not interested in some of that within the output of your transformation. In that case, you can use the suppressOutput property in the metadata to exclude the data that you're not interested in.

For example, perhaps I'm not interested in the non-English names of countries in my output. In that case, I could suppress them as follows:

Example 75
{
  "titles": "name (fr)",
  "lang": "fr",
  "suppressOutput": true
}

Similarly, when generating the output for a set of tables, you can suppress the output from a whole table by adding the suppressOutput property to the description of that table:

Example 76
{
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [{
    "url": "countries.csv"
  }, {
    "url": "country-groups.csv",
    "suppressOutput": true
  }, {
    "url": "unemployment.csv"
  }]
}
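
A processor's handling of suppressOutput at both levels might be sketched like this in Python. The helper names output_tables and output_columns are assumptions for illustration, and the metadata is assumed to have already been parsed into dictionaries.

```python
def output_tables(metadata):
    """Return the URLs of tables whose output is not suppressed."""
    return [
        table["url"]
        for table in metadata["tables"]
        if not table.get("suppressOutput", False)
    ]


def output_columns(table_schema):
    """Return the titles of columns whose output is not suppressed."""
    return [
        column.get("titles")
        for column in table_schema["columns"]
        if not column.get("suppressOutput", False)
    ]
```

Applied to the table group above, output_tables would return countries.csv and unemployment.csv but not country-groups.csv.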

4.10 How do you transform into nested structures in JSON?

While tabular data is by necessity flat, it often holds data that is actually structured. For example, the data that we have been looking at:

Example 77
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

if modelled according to the schema.org vocabulary, would look like:

Example 78
[{
  "@type": "schema:Country",
  "schema:name": ["Austria", "Autriche", "Österreich"],
  "schema:geo": {
    "@type": "schema:GeoCoordinates",
    "schema:latitude": 47.6965545,
    "schema:longitude": 13.34598005
  }
}, {
  "@type": "schema:Country",
  "schema:name": ["Belgium", "Belgique", "Belgien"],
  "schema:geo": {
    "@type": "schema:GeoCoordinates",
    "schema:latitude": 50.501045,
    "schema:longitude": 4.47667405
  }
}, {
  "@type": "schema:Country",
  "schema:name": ["Bulgaria", "Bulgarie", "Bulgarien"],
  "schema:geo": {
    "@type": "schema:GeoCoordinates",
    "schema:latitude": 42.72567375,
    "schema:longitude": 25.4823218
  }
}]

Generating JSON in this shape requires the judicious use of virtual columns, aboutUrl and valueUrl: if you create a column whose valueUrl corresponds to the aboutUrl of another column, you will create nested properties.

In this example, we can use two aboutUrls: one in the form http://example.org/country/{code} for the Country and one in the form http://example.org/country/{code}#geo for the geo-coordinates of the country. The names are properties of the former while the longitude and latitude are properties of the latter. A virtual column can add the association between the two objects, with a propertyUrl of schema:geo, like so:

Example 79
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "aboutUrl": "http://example.org/country/{code}",
    "columns": [{
      "titles": "country",
      "name": "code",
      "suppressOutput": true
    },{
      "titles": "country group",
      "suppressOutput": true
    },{
      "titles": "name (en)",
      "lang": "en",
      "propertyUrl": "schema:name"
    },{
      "titles": "name (fr)",
      "lang": "fr",
      "propertyUrl": "schema:name"
    },{
      "titles": "name (de)",
      "lang": "de",
      "propertyUrl": "schema:name"
    },{
      "titles": "latitude",
      "datatype": "number",
      "aboutUrl": "http://example.org/country/{code}#geo",
      "propertyUrl": "schema:latitude"
    },{
      "titles": "longitude",
      "datatype": "number",
      "aboutUrl": "http://example.org/country/{code}#geo",
      "propertyUrl": "schema:longitude"
    },{
      "virtual": true,
      "propertyUrl": "rdf:type",
      "valueUrl": "schema:Country"
    },{
      "virtual": true,
      "propertyUrl": "schema:geo",
      "valueUrl": "http://example.org/country/{code}#geo"
    },{
      "virtual": true,
      "aboutUrl": "http://example.org/country/{code}#geo",
      "propertyUrl": "rdf:type",
      "valueUrl": "schema:GeoCoordinates"
    }]
  }
}
Note

Note also in this example the use of suppressOutput to remove properties that we're not interested in, and the use of virtual columns to add type information to both types of generated object.

The result of this transformation is close to what we were aiming for, though with the addition of @id properties:

Example 80
[{
  "@id": "http://example.org/country/at",
  "@type": "schema:Country",
  "schema:name": ["Austria", "Autriche", "Österreich"],
  "schema:geo": {
    "@id": "http://example.org/country/at#geo",
    "@type": "schema:GeoCoordinates",
    "schema:latitude": 47.6965545,
    "schema:longitude": 13.34598005
  }
}, {
  "@id": "http://example.org/country/be",
  "@type": "schema:Country",
  "schema:name": ["Belgium", "Belgique", "Belgien"],
  "schema:geo": {
    "@id": "http://example.org/country/be#geo",
    "@type": "schema:GeoCoordinates",
    "schema:latitude": 50.501045,
    "schema:longitude": 4.47667405
  }
}, {
  "@id": "http://example.org/country/bg",
  "@type": "schema:Country",
  "schema:name": ["Bulgaria", "Bulgarie", "Bulgarien"],
  "schema:geo": {
    "@id": "http://example.org/country/bg#geo",
    "@type": "schema:GeoCoordinates",
    "schema:latitude": 42.72567375,
    "schema:longitude": 25.4823218
  }
}]

The same metadata will generate similar RDF, though the nesting structure is not so obvious because of the way RDF works:

Example 81
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/country/at> a schema:Country;
   schema:geo <http://example.org/country/at#geo>;
   schema:name "Austria"@en, "Autriche"@fr, "Österreich"@de .

<http://example.org/country/be> a schema:Country;
   schema:geo <http://example.org/country/be#geo>;
   schema:name "Belgium"@en, "Belgique"@fr, "Belgien"@de .

<http://example.org/country/bg> a schema:Country;
   schema:geo <http://example.org/country/bg#geo>;
   schema:name "Bulgaria"@en, "Bulgarie"@fr, "Bulgarien"@de .

<http://example.org/country/at#geo> a schema:GeoCoordinates;
   schema:latitude 4.76965545e1;
   schema:longitude 1.334598005e1 .

<http://example.org/country/be#geo> a schema:GeoCoordinates;
   schema:latitude 5.0501045e1;
   schema:longitude 4.47667405e0 .

<http://example.org/country/bg#geo> a schema:GeoCoordinates;
   schema:latitude 4.272567375e1;
   schema:longitude 2.54823218e1 .
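
The nesting rule used in this section (a value that matches another subject's identifier becomes a nested object) can be sketched in Python. The function name nest and its input shape, a dictionary from subject URL to a flat dictionary of properties, are assumptions for illustration, and the sketch assumes there are no cycles between subjects.

```python
def nest(subjects):
    """Nest subjects under any property whose value is their identifier."""
    # Identifiers that appear as a property value of some subject.
    referenced = {
        value
        for properties in subjects.values()
        for value in properties.values()
        if isinstance(value, str) and value in subjects
    }

    def build(subject):
        node = {"@id": subject}
        for name, value in subjects[subject].items():
            if isinstance(value, str) and value in referenced:
                node[name] = build(value)  # embed the referenced object
            else:
                node[name] = value
        return node

    # Only subjects that nothing else points to stay at the top level.
    return [build(subject) for subject in subjects if subject not in referenced]
```

Given one subject for a country whose schema:geo value is the identifier of a second subject, the result is a single top-level object with the geo-coordinates embedded under schema:geo.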

4.11 How do you indicate that values should be mapped to a list rather than repeating properties in RDF?

If you are used to using RDF you'll know that there's a big difference between having a property that has multiple values (i.e. multiple triples with the same subject and property) and a property that has an rdf:List as a value. Sometimes one is appropriate, sometimes the other. The ordered property enables you to indicate which to use for values that are sequences in the original data.

Let's use as an example the version of our data in which the latitude and longitude are in the same column:

Example 82
"country","country group","name (en)","name (fr)","name (de)","latlong"
"at","eu","Austria","Autriche","Österreich","47.6965545 13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045 4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375 25.4823218"

In this example, if we state that the latlong column is space-separated like so:

Example 83
{
  "titles": "latlong",
  "separator": " ",
  "datatype": "number"
}

we'll end up with output like this:

Example 84
[
  <#country> "at";
  <#country%20group> "eu";
  <#name%20%28en%29> "Austria"@en;
  <#name%20%28fr%29> "Autriche"@fr;
  <#name%20%28de%29> "Österreich"@de;
  <#latlong> 4.76965545e1, 1.334598005e1;
] .

This shows the #latlong property having two values: 4.76965545e1 and 1.334598005e1. These values could easily get mixed up such that the first was taken to be the longitude and the second the latitude rather than the other way around.

To avoid this mix-up occurring, set the ordered property to true:

Example 85
{
  "titles": "latlong",
  "separator": " ",
  "ordered": true,
  "datatype": "number"
}

This will make the processor use an rdf:List for the value of the #latlong property instead, which in Turtle looks like:

Example 86
[
  <#country> "at";
  <#country%20group> "eu";
  <#name%20%28en%29> "Austria"@en;
  <#name%20%28fr%29> "Autriche"@fr;
  <#name%20%28de%29> "Österreich"@de;
  <#latlong> ( 4.76965545e1 1.334598005e1 );
] .
Note

The ordered property makes no difference to JSON output because sequences in CSV are always transformed into arrays in JSON.
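
The difference between the two Turtle forms can be shown with a small Python sketch that writes the object part of a triple. The function turtle_objects is hypothetical and expects literals that are already formatted for Turtle.

```python
def turtle_objects(literals, ordered=False):
    """Write list items as repeated objects or, if ordered, as an rdf:List."""
    if ordered:
        # Turtle collections are whitespace-separated inside parentheses.
        return "( " + " ".join(literals) + " )"
    # Repeated objects for one property are comma-separated and unordered.
    return ", ".join(literals)
```

Only the collection form preserves the order of the items, because a comma-separated object list is just shorthand for separate triples.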

4.12 How should you transform CSV into JSON-LD?

As illustrated above, it is possible to transform CSV into something that looks like JSON-LD by transforming it into JSON. You can add @id properties for identifiers using aboutUrl and add @type properties for types using virtual columns.

However, if you're really after JSON-LD as an output, the best route is to transform into RDF and emit that RDF as JSON-LD. This will give you more control over the context that's used to determine the properties and structure of the output.

4.13 How should you display CSV tables in HTML?

There's no single specified way of displaying CSV tables in HTML. If you are writing code to do so, it's good practice to:

4.14 How can you transform CSV into the DataCube vocabulary?

The Data Cube vocabulary [vocab-data-cube] is a vocabulary for statistical data based on SDMX (Statistical Data and Metadata eXchange). Statistical data is often expressed in tables, sometimes with one row per Observation and sometimes with each cell containing a different Observation.

Generating data in the Data Cube vocabulary requires the use of techniques that have been described above, such as adding identifiers to entities, adding structure and types through virtual columns, and using metadata to supply additional static context.

There is a fully worked out example of transforming to DataCube, using meteorological data as its basis, available on the GitHub repository for the Working Group.

4.15 How can you transform CSV into other formats?

While there are only specifications for transforming CSV into JSON and RDF, there is an extension mechanism within CSV metadata to indicate other transformations that could be applied to CSV files. The transformations property on a table description holds an array of descriptions of transformations that processors could carry out. There's no guarantee that a given processor will recognise them, but over time recognised practices for how such transformations work may emerge.

The transformatibet365s must have the following properties:

targetFormat
gives a URL for the format that the transformatibet365 transforms into, for example http://www.iana.org/assignments/media-types/applicatibet365/xml for XML
url
points to a script or template that can be used to transform the CSV into that format
scriptFormat
gives a URL for the format that the script is in, for example https://mustache.github.io/ for Mustache or http://www.iana.org/assignments/media-types/applicatibet365/javascript for Javascript

You can also supply a titles property to provide a human-readable descriptibet365 of the output of the transformatibet365, and a source property to indicate that the input to the transformatibet365 isn't the original CSV or tabular data, but jsbet365 or rdf.

For example, if I wanted to cbet365vert the data that we've been using into XML, I could create a Mustache template like this at xml-template.mustache:

Example 87
{{#tables}}
  <countries>
    {{#row}}
      {{#describes}}
        <country id="{{country}}" group="{{country_group}}">
          <name xml:lang="en">{{name_en}}</name>
          <name xml:lang="fr">{{name_fr}}</name>
          <name xml:lang="de">{{name_de}}</name>
          <geo lat="{{latitude}}" long="{{longitude}}" />
        </country>
      {{/describes}}
    {{/row}}
  </countries>
{{/tables}}

In the metadata for the CSV file I could then include:

Example 88
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "transformations": [{
    "targetFormat": "http://www.iana.org/assignments/media-types/application/xml",
    "titles": "Simple XML version",
    "url": "xml-template.mustache",
    "scriptFormat": "https://mustache.github.io/",
    "source": "json"
  }]
}

Processors that recognised the URL for Mustache could offer users the option of passing the JSON output to a Mustache processor, which would generate:

Example 89
<countries>
  <country id="at" group="eu">
    <name xml:lang="en">Austria</name>
    <name xml:lang="fr">Autriche</name>
    <name xml:lang="de">Österreich</name>
    <geo lat="47.6965545" long="13.34598005" />
  </country>
  <country id="be" group="eu">
    <name xml:lang="en">Belgium</name>
    <name xml:lang="fr">Belgique</name>
    <name xml:lang="de">Belgien</name>
    <geo lat="50.501045" long="4.47667405" />
  </country>
  <country id="bg" group="eu">
    <name xml:lang="en">Bulgaria</name>
    <name xml:lang="fr">Bulgarie</name>
    <name xml:lang="de">Bulgarien</name>
    <geo lat="42.72567375" long="25.4823218" />
  </country>
</countries>

Processors should document the kind of transformation scripts that they can recognise and how they process them.
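
As an illustration only, a processor might decide which transformations to offer by comparing each scriptFormat against the script formats it knows how to run. The sketch below is not taken from any specification: the function name usable_transformations and the idea of passing in a set of known formats are assumptions made for this example.

```python
# Hypothetical helper (not part of CSV on the Web): list the transformations
# a processor could offer, given the script formats it is able to run.
MUSTACHE = "https://mustache.github.io/"

def usable_transformations(table_description, known_script_formats):
    """Return the transformation descriptions whose scriptFormat is recognised."""
    usable = []
    for transformation in table_description.get("transformations", []):
        # targetFormat, url and scriptFormat are the required properties.
        required = ("targetFormat", "url", "scriptFormat")
        if not all(key in transformation for key in required):
            continue
        if transformation["scriptFormat"] in known_script_formats:
            usable.append(transformation)
    return usable

metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "countries.csv",
    "transformations": [{
        "targetFormat": "http://www.iana.org/assignments/media-types/application/xml",
        "titles": "Simple XML version",
        "url": "xml-template.mustache",
        "scriptFormat": "https://mustache.github.io/",
        "source": "json",
    }],
}

offered = usable_transformations(metadata, {MUSTACHE})
print([t["titles"] for t in offered])
```

A processor that passes an empty set of known formats gets an empty list back, which matches the point that nothing guarantees a given transformation will be recognised.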

5. Handling language in CSVs

There are a number of features in the CSV on the Web metadata documents that support scenarios encountered in CSV files that use different languages. We already discussed using varying number formats in section 3.6 How do you validate numbers that aren't in standard numeric formats? and date formats in section 3.8 How do you validate dates that aren't in standard date or time formats?. Here we'll look at how to create metadata files, schemas and CSV files that take account of and work across multiple languages.

5.1 How do you indicate the language used by the metadata file?

The metadata file will often contain natural language text, such as titles and descriptions of columns and tables. Unless you specify otherwise, implementations will assume all this text is in an undefined language (und). If you want to specify what natural language is in use within the metadata file, you have to change the way the @context is specified. Instead of the normal value:

Example 90
"@context": "http://www.w3.org/ns/csvw"

The @context should take an array, where the first value is the usual URL as a string and the second is an object with a @language property set to the language being used within the metadata file. This example, which has English-language titles and descriptions, illustrates:

Example 91
{
  "@context": [ "http://www.w3.org/ns/csvw", { "@language": "en" } ],
  "dc:title": "Countries",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [{
      "titles": "country",
      "dc:description": "The ISO two-letter code for a country, in lowercase."
    },{
      "titles": "country group",
      "dc:description": "A lowercase two-letter code for a group of countries."
    },{
      "titles": "name (en)",
      "dc:description": "The official name of the country in English."
    },{
      "titles": "name (fr)",
      "dc:description": "The official name of the country in French."
    },{
      "titles": "name (de)",
      "dc:description": "The official name of the country in German."
    },{
      "titles": "latitude",
      "dc:description": "The latitude of an indicative point in the country."
    },{
      "titles": "longitude",
      "dc:description": "The longitude of an indicative point in the country."
    }]
  }
}
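
To make the rule concrete, here is a minimal sketch of how code reading a metadata file might work out its default language. The function name default_language is an assumption for this example; the fallback to und follows the description above.

```python
def default_language(metadata):
    """Return the default language declared in @context, or "und" if none is given."""
    context = metadata.get("@context")
    if isinstance(context, list):
        # The array form holds the usual URL plus an object that may carry @language.
        for item in context:
            if isinstance(item, dict) and "@language" in item:
                return item["@language"]
    return "und"

plain = {"@context": "http://www.w3.org/ns/csvw", "url": "countries.csv"}
english = {
    "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
    "url": "countries.csv",
}

print(default_language(plain))
print(default_language(english))
```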

5.2 How do you provide metadata such as descriptions in different languages?

Metadata such as descriptions can be objects rather than strings. Using objects is useful when you want to provide the language for a value. In this case, the object should have two properties: a @value property holding the natural-language string and a @language property indicating what language that string is in. For example:

Example 92
{
  "titles": "name (en)",
  "dc:description": {
    "@value": "The official name of the country in English.",
    "@language": "en"
  }
}

You can use an array to provide the same metadata in many different languages, for example:

Example 93
"dc:title": [{
  "@language": "en",
  "@value": "Unemployment in Europe (monthly)"
},{
  "@language": "de",
  "@value": "Arbeitslosigkeit in Europa (monatlich)"
},{
  "@language": "fr",
  "@value": "Le Chômage en Europe (mensuel)"
}]
Note

If you don't indicate the language used for metadata, processors will assume it's the default language used in the metadata as a whole.

5.3 How do you provide titles for columns in different languages?

You can use an object as the value for the titles property for a column to provide titles in different languages. Within the object, each property is a language and the value is the title in that language:

Example 94
"titles": {
  "en": "Country",
  "de": "Land",
  "fr": "Pays"
}
Note

If you don't indicate the language used for the title of a column, processors will assume it's the default language used in the metadata as a whole.
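
A display tool might choose which title to show along the following lines. This is only a sketch: the function names are invented for the example, and it uses exact matching of language codes, which is simpler than the language matching a real processor is likely to perform.

```python
def titles_by_language(titles, default_lang="und"):
    """Normalise a titles value (string, array or object) to {language: [titles]}."""
    if isinstance(titles, str):
        return {default_lang: [titles]}
    if isinstance(titles, list):
        return {default_lang: list(titles)}
    return {
        lang: [value] if isinstance(value, str) else list(value)
        for lang, value in titles.items()
    }

def pick_title(titles, preferred_languages, default_lang="und"):
    """Return the first title in a preferred language, else the first title listed."""
    by_language = titles_by_language(titles, default_lang)
    for lang in preferred_languages:
        if lang in by_language:
            return by_language[lang][0]
    return next(iter(by_language.values()))[0]

column_titles = {"en": "Country", "de": "Land", "fr": "Pays"}
print(pick_title(column_titles, ["de", "en"]))
```

A plain string such as "country" is treated as being in the default language, mirroring the note above.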

5.4 How do you specify the language of the values in a column?

Within an individual CSV file, it may be that different columns contain values that are in different languages. In the example we're using, there are three columns that each contain the name of a country, in English, French and German:

Example 95
"country","country group","name (en)","name (fr)","name (de)","latlong"
"at","eu","Austria","Autriche","Österreich","47.6965545 13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045 4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375 25.4823218"

Use the lang property on the column description to indicate the language of text in that column:

Example 96
{
  "@context": [ "http://www.w3.org/ns/csvw", { "@language": "en" } ],
  "dc:title": "Countries",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [{
      "titles": "country",
      "dc:description": "The ISO two-letter code for a country, in lowercase."
    },{
      "titles": "country group",
      "dc:description": "A lowercase two-letter code for a group of countries."
    },{
      "titles": "name (en)",
      "dc:description": "The official name of the country in English.",
      "lang": "en"
    },{
      "titles": "name (fr)",
      "dc:description": "The official name of the country in French.",
      "lang": "fr"
    },{
      "titles": "name (de)",
      "dc:description": "The official name of the country in German.",
      "lang": "de"
    },{
      "titles": "latitude",
      "dc:description": "The latitude of an indicative point in the country."
    },{
      "titles": "longitude",
      "dc:description": "The longitude of an indicative point in the country."
    }]
  }
}
Note

There's no relationship between the language used in a metadata file and that used in the CSV file that it describes.

5.5 How do you indicate that tables should be displayed right-to-left?

Implementations that display tables according to the specs should mostly be able to guess whether a table should be displayed left-to-right or right-to-left based on the content of the table. Implementations will look at the content of the cells to work out which way to display their content and will look at the content of the table as a whole to work out whether to display the first column on the right or left of the page.

If you want to override the display of a particular column then you can use the textDirection property on a column description to explicitly be rtl or ltr:

Example 97
{
  "titles": "name (ar)",
  "lang": "ar",
  "textDirection": "rtl"
}

If you want to override the display of the table overall then you can use the tableDirection property on the description of the table, or for all tables in the group.

Example 98
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "results.csv",
  "tableDirection": "rtl"
}

The value of the tableDirection property is inherited by all columns in the table, so any text within this table will similarly be displayed right-to-left. This can be overridden by setting textDirection to ltr or auto (in which case the direction of the text within each cell will be determined by its contents).

6. Advanced Use

6.1 How do you support units of measure?

There is no native support for expressing the units of measure for a particular column. You can, however, use documentation to tell people who are using the data what unit of measure is used for that particular column. This can be informal within the description of the column:

Example 99
{
  "titles": "distance",
  "dc:description": "Distance (kilometres)"
}

Alternatively, it can be more explicit using an existing units-of-measure property and vocabulary, such as:

Example 100
{
  "titles": "distance",
  "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure": {
    "@id": "http://qudt.org/vocab/unit#Kilometer"
  }
}

to which you could even add more detail if you wanted (this is replicating the canonical definition of unit:Kilometer from the Quantities, Units, Dimensions and Data Types (QUDT) Ontologies):

Example 101
{
 "titles": "distance",
 "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure": {
   "@id": "http://qudt.org/vocab/unit#Kilometer",
   "@type": [
     "http://qudt.org/schema/qudt#SIUnit",
     "http://qudt.org/schema/qudt#DerivedUnit",
     "http://qudt.org/schema/qudt#LengthUnit"
   ],
   "rdfs:label": "Kilometer",
   "http://qudt.org/schema/qudt#abbreviation": "km",
   "http://qudt.org/schema/qudt#code": "1091",
   "http://qudt.org/schema/qudt#conversionMultiplier": 1000,
   "http://qudt.org/schema/qudt#conversionOffset": 0.0,
   "http://qudt.org/schema/qudt#symbol": "km",
   "skos:exactMatch": { "@id": "http://dbpedia.org/resource/Kilometre" }
 }
}

6.1.1 Supporting units of measure by transforming to structured values

If you are generating JSON or RDF from CSV, you may want to generate structured values that include the units of each value from the CSV file. This is a little complicated, but useful if different rows contain values that use different units. In this case, the output that you're aiming for in RDF would look something like:

Example 102
[] :distance <#row-1-distance> .

<#row-1-distance>
  schema:value 3.5 ;
  schema:unitCode <http://qudt.org/vocab/unit#Kilometer> ;
  .

and in JSON something like this:

Example 103
"distance": {
  "@id": "#row-1-distance",
  "schema:value": 3.5,
  "schema:unitCode": "http://qudt.org/vocab/unit#Kilometer"
}
Note

You may want to use different properties than schema:value and schema:unitCode; if so, just use different propertyUrls.

You need to decide on a pattern for the URLs for the values themselves, and set the aboutUrl for the relevant column to create that URL. In this example, the URLs that look like #row-1-distance can be generated with the pattern #row-{_row}-distance. The propertyUrl for the column needs to be schema:value as the value in the column provides the value for that property. So the column description looks like:

Example 104
{
  "name": "distance_value",
  "titles": "distance",
  "datatype": "number",
  "aboutUrl": "#row-{_row}-distance",
  "propertyUrl": "schema:value"
}

You then need to use virtual columns (descriptions of additional columns that don't exist in the source CSV) to generate the relationship between the thing whose distance is being measured and the structured value, and the additional property providing the unit for the structured value.

To generate the relationship between the thing that has the distance and the structured value, the virtual column's valueUrl needs to hold the same URL template as you used before:

Example 105
{
  "name": "distance",
  "virtual": true,
  "valueUrl": "#row-{_row}-distance"
}

To create the units property, you need another virtual column where the aboutUrl of that virtual column generates the URL for the structured value, the propertyUrl is schema:unitCode and the valueUrl is the URL representing the unit (in this case http://qudt.org/vocab/unit#Kilometer):

Example 106
{
  "name": "distance_unit",
  "virtual": true,
  "aboutUrl": "#row-{_row}-distance",
  "propertyUrl": "schema:unitCode",
  "valueUrl": "http://qudt.org/vocab/unit#Kilometer"
}

If it's necessary to add more detail about the unit (e.g. the fact that it's a unit of length) this can be done with additional virtual columns:

Example 107
{
  "name": "kilometer_abbreviation",
  "virtual": true,
  "aboutUrl": "http://qudt.org/vocab/unit#Kilometer",
  "propertyUrl": "rdf:type",
  "valueUrl": "http://qudt.org/schema/qudt#LengthUnit"
}

Usually, however, processors should recognise or be able to resolve the URL for the unit to understand that it's a unit of length, if this is important for onward processing.
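
The way the real column and the two virtual columns combine for each row can be sketched as below. This is an illustration under simplifying assumptions, not a conforming converter: it only substitutes the {_row} variable rather than implementing URI Templates, represents triples as plain tuples, and uses a made-up "_:row" placeholder for the subject of the row.

```python
UNIT = "http://qudt.org/vocab/unit#Kilometer"

def expand(template, row_number):
    # Only the {_row} variable is handled in this sketch.
    return template.replace("{_row}", str(row_number))

def distance_triples(row_number, value, subject="_:row"):
    """Build the three statements that describe one structured distance value."""
    node = expand("#row-{_row}-distance", row_number)
    return [
        (subject, ":distance", node),        # virtual column "distance"
        (node, "schema:value", value),       # the real "distance_value" column
        (node, "schema:unitCode", UNIT),     # virtual column "distance_unit"
    ]

for triple in distance_triples(1, 3.5):
    print(triple)
```

Because all three columns share the same URL template, the statements they generate attach to the same structured value for a given row.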

6.1.2 Supporting units of measure with named datatypes in RDF

If you are generating RDF from CSV, you may want to define a datatype for a column and then provide additional information about that datatype as properties. For example, the column description could look like:

Example 108
{
  "titles": "distance",
  "datatype": {
    "@id": "http://example.org/unit/kilometre",
    "rdfs:label": "Kilometre",
    "base": "number"
  }
}

When values are generated in RDF for this column, they will be assigned the relevant datatype, for example:

Example 109
[] :distance "3.5"^^<http://example.org/unit/kilometre> .

<http://example.org/unit/kilometre> rdfs:label "Kilometre" .

Again, it is possible to include additional information about the unit being used as the datatype within the definition of the datatype:

Example 110
{
  "titles": "distance",
  "datatype": {
    "@id": "http://example.org/unit/kilometre",
    "@type": "http://example.org/quantity/length",
    "rdfs:label": "Kilometre",
    "base": "number",
    "skos:notation": "km"
  }
}

6.2 How do you support geospatial data?

There are many different ways of representing geospatial data within a CSV file, and no single best practice for doing so.

At the simplest level, it's possible to reference geospatial coordinates as points with separate columns for latitude, longitude and if necessary altitude (or using a different spatial reference system). This enables separate validation for the separate coordinates. The example used throughout this primer uses this setup:

Example 111
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

Metadata can be used to provide specialist types for the values of these coordinates, to indicate that they are latitude and longitude (by mapping to the well-known schema:latitude and schema:longitude properties which specify the use of WGS84), to group the coordinates together, and to provide a link that uses the coordinates to provide a map:

Example 112
{
  "titles": "latitude",
  "name": "lat",
  "datatype": {
    "base": "number",
    "minimum": "-90",
    "maximum": "90"
  },
  "aboutUrl": "http://example.org/country/{code}#geo",
  "propertyUrl": "schema:latitude"
}, {
  "titles": "longitude",
  "name": "long",
  "datatype": {
    "base": "number",
    "minimum": "-180",
    "maximum": "180"
  },
  "aboutUrl": "http://example.org/country/{code}#geo",
  "propertyUrl": "schema:longitude"
}, {
  "virtual": true,
  "propertyUrl": "schema:geo",
  "valueUrl": "http://example.org/country/{code}#geo"
}, {
  "virtual": true,
  "aboutUrl": "http://example.org/country/{code}#geo",
  "propertyUrl": "rdf:type",
  "valueUrl": "schema:GeoCoordinates"
}, {
  "virtual": true,
  "propertyUrl": "schema:hasMap",
  "valueUrl": "https://www.google.com/maps/@{lat},{long},7z"
}
Note

You can put latitude and longitude into a single column, with a character separator between the numbers as shown in section 3.12 What if the cells in a column contain lists of values?. However, this makes it harder to accurately validate the individual coordinates. They also cannot be separated out into separate property values when converting to JSON or RDF. So this is a more restrictive method and best avoided.
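
The per-coordinate checks that separate latitude and longitude columns make possible might look like the following sketch. The function name check_coordinates and the wording of the error messages are assumptions for this example; the bounds are the minimum and maximum values from the metadata above.

```python
def check_coordinates(row):
    """Return a list of problems found in the latitude and longitude of one row."""
    errors = []
    for column, low, high in (("latitude", -90, 90), ("longitude", -180, 180)):
        try:
            value = float(row[column])
        except (KeyError, ValueError):
            errors.append(f"{column}: not a number")
            continue
        if not low <= value <= high:
            errors.append(f"{column}: {value} outside [{low}, {high}]")
    return errors

austria = {"country": "at", "latitude": "47.6965545", "longitude": "13.34598005"}
print(check_coordinates(austria))
```

With both numbers in one cell, the same check would first need to split the cell and decide which number is which.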

CSV files may also need to contain geometries beyond individual points. There are no built-in formats for geometries recognised by implementations of CSV on the Web. Geometries may be expressed using GeoJSON, GML, KML or OGC Well-Known Text (WKT) representations. In each case, schemas may indicate that columns containing geometries adhere to a particular type:

You can use the format property to further constrain the content of columns containing these values. For example, you could use:

Example 113
"datatype": {
  "@id": "http://geojson.org/",
  "base": "json",
  "format": "\\{ ?\"type\": ?\"Polygon\",.+\\}"
}

to indicate that a column contains a Polygon in GeoJSON, or:

Example 114
"datatype": {
  "@id": "http://www.iana.org/assignments/media-types/application/gml+xml",
  "base": "xml",
  "format": ".*\\<gml:Point xmlns:gml=\"http://www\\.opengis\\.net/ont/gml\" srsName=\"([^\"])+\".*\\>.+\\</gml:Point\\>"
}

to indicate that a column contains a GML Point, or:

Example 115
"datatype": {
  "@id": "http://www.iana.org/assignments/media-types/application/vnd.google-earth.kml+xml",
  "base": "xml",
  "format": ".*\\<kml xmlns=\"http://www\\.opengis\\.net/kml/2.2\"\\>.+\\</kml\\>"
}

to indicate that a column contains KML, or:

Example 116
"datatype": {
  "base": "string",
  "format": "POLYGON \\(\\(\\d+(\\.\\d+)? \\d+(\\.\\d+)?(, \\d+(\\.\\d+)? \\d+(\\.\\d+)?)+\\)\\)",
  "rdfs:seeAlso": "http://www.opengeospatial.org/standards/sfa"
}

to indicate that a column contains a very basic OGC WKT Polygon. (More sophisticated regular expressions could be used if properties held different types of objects such as points or lines or more complex polygons expressed in OGC WKT.)
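
To see what the basic WKT pattern accepts and rejects, it can be tried out directly. The sketch below uses Python's re module and requires the whole value to match; both are assumptions made for illustration, since a CSV on the Web processor applies its own regular expression handling.

```python
import re

# The pattern from the format property above, with the JSON escaping removed.
WKT_POLYGON = re.compile(
    r"POLYGON \(\(\d+(\.\d+)? \d+(\.\d+)?(, \d+(\.\d+)? \d+(\.\d+)?)+\)\)"
)

def is_basic_wkt_polygon(value):
    """True if the whole value matches the very basic polygon pattern."""
    return WKT_POLYGON.fullmatch(value) is not None

print(is_basic_wkt_polygon("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"))
print(is_basic_wkt_polygon("POLYGON ((-30 10, 40 40, 20 40, -30 10))"))
```

The second value is rejected because the pattern has no provision for a minus sign, which is one reason the pattern is described as very basic.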

Note

There is no standard way of expressing which coordinate reference system (CRS) is being used within CSV metadata, but GeoJSON, GML and OGC WKT values can include information about which CRS they use. The W3C Spatial Data on the Web Working Group intends to make recommendations in this area within [sdw-bp].

6.3 How can you specify a single schema for multiple CSV files?

CSV on the Web is designed to enable you to reuse the same schema when publishing multiple CSV files, even if those files are created by different organisations and therefore reside in different places. Rather than embedding a schema within the description of a table, the tableSchema property can point out to a schema held somewhere else.

For example, if you were a statistical agency and wanted municipalities to publish their unemployment figures using the same schema, you could specify the columns that you wanted included within the schema you specified at http://example.org/schema/unemployment.json:

Example 117
{
  "columns": [{
    "name": "municipality"
  }, {
    "name": "month",
    "datatype": "gYearMonth"
  }, {
    "name": "unemployment",
    "datatype": "number"
  }]
}

Experience shows that publishers of data in CSV files often use their own headings for the columns. So long as these don't change the meaning of the column, as a statistical agency you probably don't care. The titles property allows you to provide, in an array, multiple alternative titles that people may use:

Example 118
{
  "columns": [{
    "name": "municipality",
    "titles": [ "Municipality", "City", "Area" ]
  }, {
    "name": "month",
    "titles": [ "Month", "Period" ],
    "datatype": "gYearMonth"
  }, {
    "name": "unemployment",
    "titles": [ "Unemployment Rate", "Unemployment", "Number unemployed" ],
    "datatype": "number"
  }]
}
Note

In cases where the data is being gathered from multiple countries, it may also be useful to specify multiple possible titles in different languages, as described in section 5.3 How do you provide titles for columns in different languages?.
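
The effect of listing alternative titles can be illustrated by mapping the headings found in a publisher's file back to the column names in the shared schema. This is a rough sketch: the function name match_headers is invented here, and it compares titles exactly, whereas real processors may compare them more leniently.

```python
schema = {
    "columns": [
        {"name": "municipality", "titles": ["Municipality", "City", "Area"]},
        {"name": "month", "titles": ["Month", "Period"], "datatype": "gYearMonth"},
        {
            "name": "unemployment",
            "titles": ["Unemployment Rate", "Unemployment", "Number unemployed"],
            "datatype": "number",
        },
    ]
}

def match_headers(headers, schema):
    """Map each heading in a CSV file to a schema column name, or None if unknown."""
    lookup = {}
    for column in schema["columns"]:
        for title in column.get("titles", []):
            lookup[title] = column["name"]
    return {header: lookup.get(header) for header in headers}

print(match_headers(["City", "Period", "Unemployment"], schema))
```

A heading that isn't listed, such as "Town", maps to None, which a validator could report as a mismatch with the schema.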

It will help consistency and processing of the data if the municipalities use a consistent set of codes to indicate which municipality the data relates to. As the statistical agency, you can supply the relevant codes in a CSV file, e.g. http://example.org/ref/municipalities.csv:

Example 119
code,name
0101,Absecon City
0102,Atlantic City
0103,Brigantine City
...

This file will of course have its own simple schema, http://example.org/schema/municipalities.json, that provides the datatype for the municipality codes and indicates that they are unique through a primary key:

Example 120
{
  "columns": [{
    "name": "code",
    "datatype": { "format": "\\d{4}" }
  }, {
    "name": "name"
  }],
  "primaryKey": "code"
}

A foreign key in the schema supplied by the statistical authority ensures that the codes used in the unemployment data match up with the standard set supplied by the statistical authority:

Example 121
{
  "columns": [{
    "name": "municipality",
    "titles": [ "Municipality", "City", "Area" ]
  }, {
    "name": "month",
    "titles": [ "Month", "Period" ],
    "datatype": "gYearMonth"
  }, {
    "name": "unemployment",
    "titles": [ "Unemployment Rate", "Unemployment", "Number unemployed" ],
    "datatype": "number"
  }],
  "foreignKeys": [{
    "columnReference": "municipality",
    "reference": {
      "resource": "http://example.org/ref/municipalities.csv",
      "columnReference": "code"
    }
  }]
}
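
What the foreign key asks a validator to check can be sketched with Python's csv module. The function name missing_references and the sample unemployment rows (including the unknown code 0199) are made up for this illustration.

```python
import csv
import io

def missing_references(data_rows, reference_rows, column, referenced_column):
    """Return values of `column` that do not appear in `referenced_column`."""
    known = {row[referenced_column] for row in reference_rows}
    return [row[column] for row in data_rows if row[column] not in known]

# Reference data, as in municipalities.csv.
municipalities = list(csv.DictReader(io.StringIO(
    "code,name\n0101,Absecon City\n0102,Atlantic City\n0103,Brigantine City\n"
)))

# Hypothetical unemployment data containing one code that is not in the reference file.
unemployment = list(csv.DictReader(io.StringIO(
    "municipality,month,unemployment\n0101,2015-01,1200\n0199,2015-01,800\n"
)))

print(missing_references(unemployment, municipalities, "municipality", "code"))
```

Any value returned here is a row that breaks the foreign key, so a validator would be expected to flag it.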

When a municipality describes their unemployment CSV, they will need to point to your schema for their data and to the centrally provided municipalities reference data and schema:

Example 122
{
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [{
    "url": "http://local.example.org/data/unemployment.csv",
    "tableSchema": "http://example.org/schema/unemployment.json"
  }, {
    "url": "http://example.org/ref/municipalities.csv",
    "tableSchema": "http://example.org/schema/municipalities.json"
  }]
}
Note

A more complex example in which there is linking between pairs of files where the schemas are provided by a central authority is provided in the section Foreign Key Reference Between Schemas within [tabular-metadata].

6.4 How can you provide a title for a row?

It is useful to have titles for rows both for screen readers and in other displays where it may be difficult to view the complete context of a table and therefore to understand the content of a row.

The rowTitles property within a schema provides an array of columns that provide sufficient context to label the row. These may sometimes be the same as the columns used for the primaryKey in the table, but are more likely to be columns that contain human readable text.

For example, with the CSV:

Example 123
"country","country group","name (en)","name (fr)","name (de)","latitude","longitude"
"at","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

The rowTitles property could be set to reference the columns containing the name of each country, in the different languages available:

Example 124
"tableSchema": {
  "columns": [{
    "titles": "country"
  },{
    "titles": "country group"
  },{
    "name": "name_en",
    "titles": "name (en)",
    "lang": "en"
  },{
    "name": "name_fr",
    "titles": "name (fr)",
    "lang": "fr"
  },{
    "name": "name_de",
    "titles": "name (de)",
    "lang": "de"
  },{
    "titles": "latitude"
  },{
    "titles": "longitude"
  }],
  "rowTitles": ["name_en", "name_fr", "name_de"]
}

In this case, a screen reader or other display of the table could choose to read or display only the row title that matched the user's preferred language.

In other cases, the rowTitles property may be set to an array of columns that together provide sufficient context to understand the row (e.g. ["firstName", "lastName"]).

6.5 What about CSV that isn't standard CSV?

A lot of what's called "CSV" that's published on the web isn't actually CSV. It might use something other than commas (such as tabs or semi-colons) as separators between values, or might have multiple header lines.

The specification for CSV as a format is [RFC4180]. However, this is an informational specification and not a formal standard. Therefore, applications may deviate from it.

The metadata that's described here can be used with files that contain tabular data but that aren't CSV. You can provide guidance to processors that are trying to parse those files through the dialect property on a table description. For example, say we were dealing with a tab-separated file that contains multiple header lines at http://example.org/data/unemployment.tsv:

Example 125
"country"	"country group"	"name (en)"	"name (fr)"	"name (de)"	"latitude"	"longitude"
"Land"	"Ländergruppe"	"Name (en)"	"Name (fr)"	"Name (de)"	"Breite"	"Länge"
"pays"	"groupe de pays"	"nom (en)"	"nom (fr)"	"nom (de)"	"latitude"	"longitude"
"at"	"eu"	"Austria"	"Autriche"	"Österreich"	"47.6965545"	"13.34598005"
"be"	"eu"	"Belgium"	"Belgique"	"Belgien"	"50.501045"	"4.47667405"
"bg"	"eu"	"Bulgaria"	"Bulgarie"	"Bulgarien"	"42.72567375"	"25.4823218"

The metadata for this file could be:

Example 126
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "http://example.org/data/unemployment.tsv",
  "dialect": {
    "delimiter": "\t",
    "headerRowCount": 3
  }
}
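
As a rough illustration of how a consumer could act on this dialect, the sketch below feeds the delimiter and headerRowCount values to Python's csv module. The function name read_with_dialect and the shortened two-column sample are assumptions for this example, and only three dialect properties are handled.

```python
import csv
import io

def read_with_dialect(text, dialect):
    """Split parsed rows into (header_rows, data_rows) using a few dialect properties."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
    )
    rows = list(reader)
    header_count = dialect.get("headerRowCount", 1)
    return rows[:header_count], rows[header_count:]

sample = (
    '"country"\t"name (en)"\n'
    '"Land"\t"Name (en)"\n'
    '"pays"\t"nom (en)"\n'
    '"at"\t"Austria"\n'
)
headers, data = read_with_dialect(sample, {"delimiter": "\t", "headerRowCount": 3})
print(len(headers), data)
```

Each of the three header rows supplies an alternative title for the same column, so a fuller implementation would keep all of them rather than just the first.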

There are a number of other properties that you can set within the dialect to cater for the large range of weird things that people do in CSV files. They are:

commentPrefix

If the file contains comment lines, set this to the character used at the start of the lines that are comments (usually that's #).

delimiter

If the file doesn't use commas as separators between values, set this to the separator that it uses.

doubleQuote

If the file uses \ to escape double quotes within values, set this to false.

encoding

If the encoding of the file is not UTF-8, set this to the encoding.

header

If the file doesn't have a header line, set this to false.

headerRowCount

If the file has more than one header line, set this to the number of header lines it has.

lineTerminators

If the file uses an unusual character at the end of its lines, set this to that character.

quoteChar

If the file doesn't use double quotes (") around values that contain commas, set this to the character that it does use.

skipBlankRows

If the file contains blank rows that should just be ignored, set this to true.

skipColumns

If the file has some columns at the start that don't contain useful information, set this to that number of columns.

skipInitialSpace

If values in the file sometimes start with whitespace that should be ignored, set this to true.

skipRows

If the file has some rows at the start that don't contain useful information, set this to that number of rows. (Sometimes people put metadata at the start of a CSV file, before the header lines.)

trim

If you don't want to ignore whitespace around values, set this to false. If you want to only ignore whitespace at the beginning of values, set it to start and if you want to only ignore whitespace at the end of values, to end. By default whitespace at the start and the end of a value will be stripped away.

6.6 What about tables in HTML?

Tables in HTML are a bit more complicated than tables in CSV files. However, it is possible to use the metadata described here and in [tabular-metadata] to describe HTML tables by embedding the metadata within a script element in the header of the HTML page. A description and example of how to do this is provided in the [csvw-html] Note.

6.7 What if you want to put metadata files somewhere else?

By default, if you point a processor at a CSV file on the web, it will look for a Link header within the response with rel="describedby" and specifying a content type of application/csvm+json, application/ld+json or application/json. If you're publishing CSV files and it's possible to set the Link header, this is the best way of telling processors where to look for metadata files.

If processors don't see an appropriate Link header, they will append -metadata.json to the end of the URL of the CSV file to try to find metadata for it. If they can't find a metadata file there, they will look in the directory containing the CSV file for a file called csv-metadata.json and use that file.
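
The two fallback locations just described can be computed from the URL of the CSV file. This is a small sketch rather than a full implementation: the function name default_metadata_urls is an assumption, and it does not cover the Link header or any special handling a processor may apply to unusual URLs.

```python
from urllib.parse import urljoin

def default_metadata_urls(csv_url):
    """Return the default places to look for metadata, in the order they are tried."""
    return [
        csv_url + "-metadata.json",             # file-specific metadata
        urljoin(csv_url, "csv-metadata.json"),  # metadata for the whole directory
    ]

for candidate in default_metadata_urls("http://example.org/data/countries.csv"):
    print(candidate)
```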

Looking for files in this way isn't appropriate on all servers. As a publisher, it might be that you have URLs for CSV files that are automatically generated using queries, such as http://example.org/data?format=csv&x=15&y=53 or that you want metadata to live in a separate subdirectory or central location.

As a publisher, if you can't or don't want to use the Link header, you can control where processors look for metadata for your CSV files by listing the locations to look at within the /.well-known/csvm file on your server. (This is a well-known location as defined in [RFC5785].) This file should contain a list of URL patterns which will be expanded by substituting url for the URL of the CSV file, and then resolving against the location of the CSV file. Lines might look like:

Example 127
/metadata?for={url}
?format=metadata
{+url}m

With a CSV file at http://example.org/data.csv this would lead the processor to search for metadata at:

Example 128
http://example.org/metadata?for=http://example.org/data.csv
http://example.org/data.csv?format=metadata
http://example.org/data.csvm
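
The expansion shown in the two examples above can be approximated with a short Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, and both {url} and {+url} are replaced with the CSV file's URL verbatim so that the output matches the example, whereas a real processor may apply URI Template percent-encoding rules that this sketch ignores.

```python
from urllib.parse import urljoin


def expand_csvm_patterns(patterns, csv_url):
    """Expand the patterns from a /.well-known/csvm file for one CSV file.

    Simplified: the url variable is substituted without percent-encoding.
    """
    locations = []
    for pattern in patterns:
        pattern = pattern.strip()
        if not pattern:
            continue  # skip blank lines
        # Substitute the CSV file's URL for the url variable.
        expanded = pattern.replace("{+url}", csv_url).replace("{url}", csv_url)
        # Resolve the result against the location of the CSV file.
        locations.append(urljoin(csv_url, expanded))
    return locations


patterns = ["/metadata?for={url}", "?format=metadata", "{+url}m"]
for location in expand_csvm_patterns(patterns, "http://example.org/data.csv"):
    print(location)
```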

A. References

A.1 Informative references

[RFC4180]
Y. Shafranovich. Common Format and MIME Type for Comma-Separated Values (CSV) Files. October 2005. Informational. URL: https://tools.ietf.org/html/rfc4180
[RFC5785]
M. Nottingham; E. Hammer-Lahav. Defining Well-Known Uniform Resource Identifiers (URIs). April 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5785
[RFC7111]
M. Hausenblas; E. Wilde; J. Tennison. URI Fragment Identifiers for the text/csv Media Type. January 2014. Informational. URL: https://tools.ietf.org/html/rfc7111
[csv2json]
Jeremy Tandy; Ivan Herman. Generating JSON from Tabular Data on the Web. 17 December 2015. W3C Recommendation. URL: http://www.w3.org/TR/csv2json/
[csv2rdf]
Jeremy Tandy; Ivan Herman; Gregg Kellogg. Generating RDF from Tabular Data on the Web. 17 December 2015. W3C Recommendation. URL: http://www.w3.org/TR/csv2rdf/
[csvw-html]
Gregg Kellogg. Embedding Tabular Metadata in HTML. W3C Note. URL: http://www.w3.org/TR/csvw-html/
[html-rdfa]
Manu Sporny. HTML+RDFa 1.1 - Second Edition. 17 March 2015. W3C Recommendation. URL: http://www.w3.org/TR/html-rdfa/
[json-ld]
Manu Sporny; Gregg Kellogg; Markus Lanthaler. JSON-LD 1.0. 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/json-ld/
[sdw-bp]
Jeremy Tandy; Payam Barnaghi; Linda van den Brink. Spatial Data on the Web Best Practices. 19 January 2016. W3C Working Draft. URL: http://www.w3.org/TR/sdw-bp/
[tabular-data-model]
Jeni Tennison; Gregg Kellogg. Model for Tabular Data and Metadata on the Web. 17 December 2015. W3C Recommendation. URL: http://www.w3.org/TR/tabular-data-model/
[tabular-metadata]
Jeni Tennison; Gregg Kellogg. Metadata Vocabulary for Tabular Data. 17 December 2015. W3C Recommendation. URL: http://www.w3.org/TR/tabular-metadata/
[vocab-data-cube]
Richard Cyganiak; Dave Reynolds. The RDF Data Cube Vocabulary. 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/vocab-data-cube/
[xmlschema11-2]
David Peterson; Sandy Gao; Ashok Malhotra; Michael Sperberg-McQueen; Henry Thompson; Paul V. Biron et al. W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. 5 April 2012. W3C Recommendation. URL: http://www.w3.org/TR/xmlschema11-2/