Adding a Dataset to Data Commons

This document summarizes the steps involved in adding a dataset to Data Commons (DC).

Prerequisites

Ensure that the Data Commons team has approved the addition of the dataset. If it has not been approved yet, please suggest the dataset to the team first.
Review the following documents to get a background on the data model, format and workflow:
- Summary of data model (DC inherits schema from schema.org)
- How statistics are represented in DC
- MCF Format
- Life of a dataset

Design location and schema mapping

Data Commons is a single graph that reconciles references to the same entities and concepts across datasets. This linking happens at the time of importing datasets. To get started:

Identify how the locations/places/entities and variables/properties in the dataset will get mapped.
For locations/places, use the following (in preferred order): global identifiers (like FIPS), geo info (lat/lng, geo boundary), qualified names.
For variables, reuse existing schema in Data Commons where possible (check the existing statistical variables), or add new StatisticalVariable nodes along with core schema (new Class, Property, Enumeration nodes) as necessary.
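
The preferred order for place references can be sketched as a small helper. This is only an illustration, not part of any Data Commons tooling: the function name `resolve_place` and its return shape are hypothetical, and the `geoId/` prefix for US FIPS codes is an assumption that should be checked against the places already in the graph.

```python
from typing import Optional, Tuple


def resolve_place(fips: Optional[str] = None,
                  lat: Optional[float] = None,
                  lng: Optional[float] = None,
                  name: Optional[str] = None) -> Tuple[str, str]:
    """Return (strategy, reference) using the strongest identifier available.

    Hypothetical helper: order follows the README (global identifier,
    then geo info, then qualified name).
    """
    if fips:
        # Assumption: US FIPS codes map to DCIDs of the form "geoId/<fips>".
        return ("dcid", f"geoId/{fips}")
    if lat is not None and lng is not None:
        # Geo info still needs to be resolved to a place during import.
        return ("latlng", f"{lat},{lng}")
    if name:
        # Qualified names are the weakest option and the most ambiguous.
        return ("name", name.strip())
    raise ValueError("no usable place reference in this record")


print(resolve_place(fips="06"))
print(resolve_place(lat=37.4, lng=-122.1))
print(resolve_place(name=" Santa Clara County, California "))
```

Recording which strategy was used for each row makes it easier to list the ambiguous cases in the import document.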

This process typically happens in collaboration with the DC core team, and we recommend that you put together a short import document (Template, Example1, Example2).

Schema-less imports

For datasets with complex schema or ones that we want to import quickly, we can start with a schema-less import, and iteratively add schema. The “schema-less” part of this framework means that the statistical variables are not yet fully defined. This lets us get the dataset into Data Commons without blocking on schema definition. To learn more, please review the following links:

Preparing artifacts

Once the entity and schema mappings have been finalized, you can prepare the artifacts. These include:

- StatisticalVariable MCF nodes (if any) checked into schema repo
- Template MCF and corresponding cleaned tabular files (typically CSV)
- Data cleaning code (along with README) checked into data repo
- Validation results for the artifacts (from running dc-import tool)
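
As a rough sketch of how a cleaned CSV and its Template MCF fit together, the snippet below cleans a few raw rows and pairs them with a template that references the CSV columns. Everything here is illustrative: the table name `Population`, the column names, the input values, and the choice of `Count_Person` as the variable are assumptions, and the real template should follow the MCF format document.

```python
import csv
import io

# Hypothetical raw input; the values are made up for illustration only.
RAW_ROWS = [
    {"fips": "06", "year": "2020", "population": " 1,200 "},
    {"fips": "48", "year": "2020", "population": "3,400"},
]

# Template MCF: one StatVarObservation per CSV row. "C:Population->year"
# refers to the "year" column of a table assumed to be named Population.
TMCF = """Node: E:Population->E0
typeOf: dcs:StatVarObservation
variableMeasured: dcs:Count_Person
observationAbout: C:Population->geoId
observationDate: C:Population->year
value: C:Population->count
"""


def clean_rows(raw_rows):
    """Normalize identifiers and strip formatting from numeric values."""
    for row in raw_rows:
        yield {
            "geoId": f"dcid:geoId/{row['fips']}",
            "year": row["year"].strip(),
            "count": int(row["population"].replace(",", "").strip()),
        }


def to_csv(rows):
    """Serialize cleaned rows with the header the template expects."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["geoId", "year", "count"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


CLEANED_CSV = to_csv(clean_rows(RAW_ROWS))
print(CLEANED_CSV)
print(TMCF)
```

Keeping the cleaning logic in a small script like this, next to a README, matches the "data cleaning code" artifact in the list above.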

Note: you may also use the DC Import Wizard to help generate artifacts for common dataset structures.
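
Before running the dc-import tool, a quick local check can catch the most common mismatch: a Template MCF that references a column missing from the CSV header. The sketch below is a hypothetical pre-check, not a substitute for dc-import validation; the helper names and the sample template are assumptions.

```python
import re

# Hypothetical template; column references have the form C:<table>-><column>.
SAMPLE_TMCF = """Node: E:Population->E0
typeOf: dcs:StatVarObservation
variableMeasured: dcs:Count_Person
observationAbout: C:Population->geoId
observationDate: C:Population->year
value: C:Population->count
"""


def tmcf_columns(tmcf_text):
    """Collect every column name referenced as C:<table>-><column>."""
    return set(re.findall(r"C:\w+->(\w+)", tmcf_text))


def missing_columns(tmcf_text, csv_header):
    """Return template columns that are absent from the CSV header."""
    return sorted(tmcf_columns(tmcf_text) - set(csv_header))


print(missing_columns(SAMPLE_TMCF, ["geoId", "year", "count"]))
print(missing_columns(SAMPLE_TMCF, ["geoId", "year"]))
```

An empty list means every referenced column is present; the full validation results should still come from the dc-import tool.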

Review by DC team

When all the artifacts are ready, please have them reviewed by the DC core team via GitHub pull requests made to the data repo.
