Quickstart

To start developing a custom Data Commons instance, we recommend that you develop your site and host your data locally. This uses a SQLite database to store custom data.


This page shows you how to run a local custom Data Commons instance inside a Docker container, load sample custom data, and enable natural querying. A custom Data Commons instance uses code from the public open-source repo, available at https://github.com/datacommonsorg/.

Prerequisites

  • Obtain a GCP billing account and project.
  • Install Docker Engine.
  • Install Git.
  • Get an API key for Data Commons by submitting the Data Commons API key request form. The key is needed to authorize requests from your site to the base Data Commons site. Typical turnaround times are 24-48 hours.
  • Optional: Get a GitHub account, if you would like to browse the Data Commons source repos in your browser.

One-time setup steps

Enable Google Cloud APIs and get a Maps API key

  1. Go to https://console.cloud.google.com/apis/dashboard for your project.
  2. Click Enable APIs & Services.
  3. Under Maps, enable the Places API and Maps JavaScript API.
  4. Go to https://console.cloud.google.com/google/maps-apis/credentials for your project.
  5. Click Create Credentials > API Key.
  6. Record the key and click Close.
  7. Click on the newly created key to open the Edit API Key window.
  8. Under API restrictions, select Restrict key.
  9. From the drop-down menu, enable the Places API and Maps JavaScript API. (Optionally, enable other APIs for which you want to use this key.)
  10. Click OK and Save.

Clone the Data Commons repository

  1. Open a terminal window, and go to the directory in which you would like to download the Data Commons repository.
  2. Clone the website Data Commons repository:

    git clone https://github.com/datacommonsorg/website.git
    

    This creates a local website subdirectory.

  3. When the download is complete, navigate to the root directory of the repo, website. References to various files and commands in these procedures are relative to this root.

    cd website
    

Set API keys as environment variables

  1. Using your favorite editor, open custom_dc/sqlite_env.list.
  2. Enter the relevant values for DC_API_KEY and MAPS_API_KEY.
  3. Leave ADMIN_SECRET blank for now.

Warning: Do not use any quotes (single or double) or spaces when specifying the values.

Note: If you store your source code in a public/open-source version control system, we recommend that you do not check in the environment variable files that contain secrets. Instead, store them locally only.
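For example, after editing, the key lines of custom_dc/sqlite_env.list would look something like this (the values shown are placeholders, not real keys):

```
DC_API_KEY=your-data-commons-api-key
MAPS_API_KEY=your-maps-api-key
ADMIN_SECRET=
```

Note that, per the warning above, the values appear with no quotes and no spaces around the equals sign.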

About the downloaded files

  • custom_dc/sample/: Sample supplemental data that is added to the base data in Data Commons. This page shows you how to easily load and view this data. The data is in CSV format and is mapped to Data Commons entity definitions using the config.json file.
  • custom_dc/examples/: More examples of custom data in CSV format, with config.json. To configure your own custom data, see Work with custom data.
  • server/templates/custom_dc/custom/: Contains customizable HTML files. To modify these, see Customize HTML templates.
  • static/custom_dc/custom/: Contains a customizable CSS file and the default logo. To modify the styles or replace the logo, see Customize JavaScript and styles.
  • custom_dc/sqlite_env.list: Contains environment variables for a development environment using SQLite as the database. For details of the variables, see the comments in the file.
  • custom_dc/cloudsql_env.list: Contains environment variables for a development or production environment using Cloud SQL as the database. For details of the variables, see the comments in the file.

Start the services

From the root directory, website, run Docker as follows.

Note: If you are running on Linux and have not created a “sudoless” Docker group, you will need to preface every docker invocation with sudo.

docker run -it \
-p 8080:8080 \
-e DEBUG=true \
--env-file $PWD/custom_dc/sqlite_env.list \
-v $PWD/custom_dc/sample:/userdata \
gcr.io/datcom-ci/datacommons-website-compose:stable

This command does the following:

  • The first time you run it, downloads the latest stable Data Commons image, gcr.io/datcom-ci/datacommons-website-compose:stable, from the Google Cloud Artifact Registry, which may take a few minutes. Subsequent runs use the locally stored image.
  • Starts a Docker container in interactive mode.
  • Starts development/debug versions of the Web Server, NL Server, and Mixer, as well as the Nginx proxy, inside the container.
  • Maps the sample data to the Docker path /userdata, so that the servers do not need to be restarted when you load the sample data.

Stop and restart the services

If you need to restart the services for any reason, do the following:

  1. In the terminal window where the services are running, press Ctrl-c to kill the Docker container.
  2. Rerun the docker run command as usual.

Tip: If you close the terminal window in which you started the Docker container, you can kill it as follows:

  1. Open another terminal window, and from the website directory, get the Docker container ID.

      docker ps
    

    The CONTAINER ID is the first column in the output.

  2. Run:

      docker kill CONTAINER_ID

View the local website

Once Docker is up and running, visit your local instance by pointing your browser to http://localhost:8080. You should see something like this:

[Screenshot: local custom Data Commons home page]

You can browse the various Data Commons tools (Variables, Map, Timelines, etc.) and work with the entire base dataset.

Load sample data

In this step, you will load sample data, included as part of the repo download, into your custom instance. The data is from the Organisation for Economic Co-operation and Development (OECD): per-country data for annual average wages and gender wage gaps.

To load and view the sample data:

  1. Point your browser to the admin page at http://localhost:8080/admin.
  2. Since you have not yet specified an ADMIN_SECRET, leave it blank.
  3. Click Load Data. It may take a few seconds to load.

This does the following:

  • Imports the data from the CSV files, resolves entities, and writes the data to a SQLite database file, custom_dc/sample/datacommons/datacommons.db.
  • Generates embeddings in the Docker image and loads them. (To learn more about embeddings generation, see the FAQ.)

Tip: When you restart the Docker container, all data in the SQLite database is lost. To preserve the sample data and have it load automatically every time you restart the container, without having to run the load data function each time, include this additional flag in your docker run command:

-v $PWD/custom_dc/sample/datacommons:/sqlite
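With that flag added, the full command from Start the services becomes:

```
docker run -it \
-p 8080:8080 \
-e DEBUG=true \
--env-file $PWD/custom_dc/sqlite_env.list \
-v $PWD/custom_dc/sample:/userdata \
-v $PWD/custom_dc/sample/datacommons:/sqlite \
gcr.io/datcom-ci/datacommons-website-compose:stable
```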

Now click the Timelines link to visit the Timelines explorer. Enter a country and click Continue. In the Select variables tool, you’ll see the new variables:

[Screenshot: Timelines explorer variable selection]

Select one (or both) and click Display to show the timeline graph:

[Screenshot: timeline graph display]

To issue natural language queries, click the Search link. Try NL queries against the sample data you just loaded, e.g. “Average annual wages in Canada”.

[Screenshot: natural language search results]

Note that NL support increases the startup time of your server and consumes more resources. If you don’t want NL functionality, you can disable it by updating the ENABLE_MODEL flag in sqlite_env.list from true to false.
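For example, to turn NL off, the relevant line in custom_dc/sqlite_env.list would read:

```
ENABLE_MODEL=false
```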

Send an API request

A custom instance accepts REST API requests at the endpoint /core/api/v2/. To try it out, here’s an example request that returns the same data as the interactive queries above, using the observation API. You can enter this URL directly in your browser:

http://localhost:8080/core/api/v2/observation?entity.dcids=country%2FCAN&select=entity&select=variable&select=value&select=date&variable.dcids=average_annual_wage
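If you prefer to issue the request from code, here is a minimal Python sketch (standard library only) that builds the same URL; the endpoint, parameters, and variable name are taken from the example above:

```python
from urllib.parse import urlencode

# Observation endpoint on the local custom instance, as shown above.
BASE = "http://localhost:8080/core/api/v2/observation"

# A list of pairs lets the repeated "select" parameter appear multiple times.
params = [
    ("entity.dcids", "country/CAN"),
    ("select", "entity"),
    ("select", "variable"),
    ("select", "value"),
    ("select", "date"),
    ("variable.dcids", "average_annual_wage"),
]

# urlencode percent-escapes the slash in "country/CAN" as country%2FCAN.
url = BASE + "?" + urlencode(params)
print(url)

# To actually send the request while the local services are running:
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```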

Note: You do not need to specify an API key as a parameter.

If you select Prettyprint, you should see output like this:

[Screenshot: pretty-printed JSON response from the API call]