Visualizing Japan's Population Data using D3.js, Flask and MongoDB

 

 

 


Clive Edwards (mail at cedwards.info)


Summary. In this project an interactive data visualisation is created from a dataset of Japan's population between 1872 and 2015. The data is broken down geographically in several ways: by island, prefecture, region and capital.

Skills used:

Introduction

This project is based on the one created by Adil Moujahid [1]. However, it is modified in the following ways:

Dataset

The dataset [2] used for this project details Japan's population between 1872 and 2015. It was chosen for two main reasons: it contains geographic information, and it contains time series data. It was thought that these two elements would be interesting to visualise interactively. To visualise the geographic information, a GeoJSON file [3] of Japan was used. To visualise the time series data, a slider was used to select specific years. The initial format of the data is shown below.

       prefecture       year  population     capital  region  estimated_area  island
    0   Aichi-ken  1872.1667   1210368.0  Nagoya-shi   Chubu         5165.12  Honshu
    1   Aichi-ken  1873.0000   1217444.0  Nagoya-shi   Chubu         5165.12  Honshu
    2   Aichi-ken  1874.0000   1217521.0  Nagoya-shi   Chubu         5165.12  Honshu
    3   Aichi-ken  1875.0000   1234003.0  Nagoya-shi   Chubu         5165.12  Honshu
    4   Aichi-ken  1876.0000   1244711.0  Nagoya-shi   Chubu         5165.12  Honshu

Some basic cleaning was done:

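# Strip the "-ken" suffix from prefecture names
df['prefecture'] = df['prefecture'].str.replace("-ken", '', regex=False)
# Treat missing populations as 0 and store the column as integers
df['population'] = df['population'].fillna(0).astype(int)
# Population density (population / estimated area), rounded to 1 decimal place
df['pop_density'] = (df['population'] / df['estimated_area']).round(1)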
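
To make the effect of each cleaning step explicit, here is a minimal plain-Python sketch of the same logic applied to a single record. It deliberately does not use pandas. The helper name `clean_record` and the dictionary layout are assumptions made for illustration; the values are taken from the first row of the data shown earlier.

```python
import math


def clean_record(record):
    """Apply the cleaning steps to one row (hypothetical helper)."""
    cleaned = dict(record)
    # Strip the "-ken" suffix from the prefecture name.
    cleaned["prefecture"] = cleaned["prefecture"].replace("-ken", "")
    # Treat a missing population (None or NaN) as 0, then cast to int.
    population = cleaned["population"]
    if population is None or math.isnan(population):
        population = 0
    cleaned["population"] = int(population)
    # Population density, rounded to one decimal place.
    cleaned["pop_density"] = round(
        cleaned["population"] / cleaned["estimated_area"], 1
    )
    return cleaned


row = {
    "prefecture": "Aichi-ken",
    "year": 1872.1667,
    "population": 1210368.0,
    "capital": "Nagoya-shi",
    "region": "Chubu",
    "estimated_area": 5165.12,
    "island": "Honshu",
}
cleaned_row = clean_record(row)
print(cleaned_row["prefecture"], cleaned_row["population"], cleaned_row["pop_density"])
```

Filling missing populations with 0 keeps the column an integer type, but it also means that a density of 0.0 may indicate missing data rather than an empty prefecture.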

Using MongoDB to store the data

MongoDB [4] was used to store the data in a non-relational database. Non-relational databases have the following advantages:

Setting up MongoDB

After installation, the MongoDB service (mongod) is started on Linux with

sudo service mongod start

To check that the mongod process has started successfully, inspect the log file at /var/log/mongodb/mongod.log and look for the line '[initandlisten] waiting for connections on port 27017'. The service can subsequently be stopped, restarted or queried with the following:

sudo service mongod stop

sudo service mongod restart

service mongod status
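
The log check described above can also be scripted. The following is a minimal sketch, assuming the log is plain text and contains the line quoted earlier; the function name `mongod_is_listening` and the sample log text are made up for illustration, and the exact wording of the log line may differ between MongoDB versions.

```python
def mongod_is_listening(log_text, port=27017):
    """Return True if the log contains the 'waiting for connections' line."""
    marker = "waiting for connections on port {}".format(port)
    return any(marker in line for line in log_text.splitlines())


# Made-up sample log; only the second line is quoted from the text above.
sample_log = (
    "placeholder for earlier start-up messages\n"
    "[initandlisten] waiting for connections on port 27017\n"
)
print(mongod_is_listening(sample_log))
```

In practice the text would be read from /var/log/mongodb/mongod.log, which may require elevated permissions.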

Loading the database

MongoDB supports importing data in both CSV and JSON formats. To load the database from the CSV file, use:

mongoimport --db DATABASE_NAME --collection COLLECTION_NAME --type csv --file 'PATH_TO_CSV/FILENAME.csv' --headerline
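
If the import needs to be repeated, the same command can be assembled from Python. This is only a sketch: `build_mongoimport_args` is a hypothetical helper, the argument values are the placeholders from the command above, and the flags are exactly those already shown.

```python
def build_mongoimport_args(db_name, collection_name, csv_path):
    """Build the mongoimport argument list for a csv file with a header row."""
    return [
        "mongoimport",
        "--db", db_name,
        "--collection", collection_name,
        "--type", "csv",
        "--file", csv_path,
        "--headerline",
    ]


args = build_mongoimport_args(
    "DATABASE_NAME", "COLLECTION_NAME", "PATH_TO_CSV/FILENAME.csv"
)
print(" ".join(args))
# To actually run the import (requires mongoimport to be installed):
#   import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the path contains spaces.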

To make sure the database has loaded correctly, start the mongo shell and run the commands that follow it:

mongo

show dbs

use DATABASE_NAME

show collections

db.getCollection("COLLECTION_NAME").findOne()

db.getCollection("COLLECTION_NAME").findOne({}, {VARIABLE_1:1, VARIABLE_2:1, VARIABLE_3:1, VARIABLE_4:1, _id:0})

where VARIABLE_1 etc. are the field names defined in the data. The same commands can also be used to query the database from the command line.

Using PyMongo to interact with the database from within Python

from pymongo import MongoClient

# Replace the placeholder strings with your own database and collection names
MONGODB_HOST = 'localhost'
DBS_NAME = 'DATABASE_NAME'
COLLECTION_NAME = 'COLLECTION_NAME'
# Port number (MongoDB default)
MONGODB_PORT = 27017

# Fields from the csv to return in queries; '_id' is excluded
FIELDS = {'prefecture': True, 'year': True, 'population': True, 'capital': True,
          'region': True, 'estimated_area': True, 'island': True,
          'pop_density': True, '_id': False}
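
To illustrate what a projection such as FIELDS does, the sketch below mimics the behaviour on a plain dictionary. It is not PyMongo code and does not contact a database; `apply_projection`, the shortened field dictionary and the sample document (including its `_id` value) are assumptions made for illustration.

```python
def apply_projection(document, fields):
    """Mimic an inclusion projection on a plain dict (illustrative only)."""
    # Fields explicitly switched on are kept.
    included = {name for name, keep in fields.items() if keep}
    # "_id" is returned by default unless the projection switches it off.
    if fields.get("_id", True):
        included.add("_id")
    return {key: value for key, value in document.items() if key in included}


sample_document = {
    "_id": "hypothetical-id",
    "prefecture": "Aichi",
    "year": 1873.0,
    "population": 1217444,
    "island": "Honshu",
}
sample_fields = {"prefecture": True, "population": True, "_id": False}
print(apply_projection(sample_document, sample_fields))
```

Excluding `_id` matters for the visualisation because the identifier MongoDB generates is not one of the plain types that `json.dumps` can serialise on its own.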

Using Flask to run the webserver

Flask [5] is used to create a web server that retrieves data from the database when it is requested by the rendered HTML page containing the visualisation. In particular, the Flask application looks for an index.html file inside the templates directory and serves it at http://localhost:5000/.

from flask import Flask
from flask import render_template

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000, debug=True)

In order to check that the database is accessible, the following route can be used:

import json
from bson import json_util

@app.route("/visualisation/testing")
def visualisation_testing():
    # Connect to MongoDB and select the collection
    connection = MongoClient(MONGODB_HOST, MONGODB_PORT)
    collection = connection[DBS_NAME][COLLECTION_NAME]
    # Retrieve every document, restricted to the fields listed in FIELDS
    input_variables = collection.find(projection=FIELDS)

    # Read all documents from the cursor, then serialise them to JSON
    json_variables = json.dumps(list(input_variables), default=json_util.default)
    connection.close()
    return json_variables

This route uses the connection settings defined previously and returns the fields listed in FIELDS as JSON at http://localhost:5000/visualisation/testing.
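
The `default` argument passed to `json.dumps` is a fallback that is called for any value the standard encoder cannot handle. The sketch below shows the mechanism using only the standard library. `fallback_default` is a hypothetical stand-in for `json_util.default`, and the `loaded_at` field is invented for the example; the real `json_util.default` emits MongoDB Extended JSON rather than the plain string used here.

```python
import json
from datetime import datetime


def fallback_default(value):
    """Stand-in for json_util.default: convert types json cannot encode."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError("Cannot serialise {!r}".format(value))


documents = [
    {
        "prefecture": "Aichi",
        "population": 1217444,
        "loaded_at": datetime(2018, 1, 1),  # invented field, not in the dataset
    },
]
serialised = json.dumps(documents, default=fallback_default)
print(serialised)
```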

Creating a new visualization layout based on keen.io

Keen.io [6] produces templates for visualisations. The template used by [1] was modified to create a new layout. This was done by modifying the index.html file which Flask serves. Each element of the visualisation has an entry like the following:

<!-- population vs population density -->
<div class="col-sm-12">
  <div class="chart-wrapper">
    <div class="chart-title">
      Population vs population density as function of capital
    </div>
    <div class="chart-stage">
      <div id="bubble-chart"></div>
    </div>
  </div>
</div>
<!-- population vs population density -->

The individual elements are arranged into rows using a div tag of class 'row' and into columns depending on how many graphs/info elements are put on the same row. This way a grid of elements split into rows and columns can be created.
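
The row and column arrangement can be sketched as a small Python helper that generates the markup. This is an illustration, not how the page was built: `render_row` and the `chart-a`/`chart-b` ids are hypothetical, and the equal split of a 12-column grid is an assumption based on the `col-sm-12` class used for a full-width element.

```python
CHART_TEMPLATE = """<div class="col-sm-{width}">
  <div class="chart-wrapper">
    <div class="chart-title">
      {title}
    </div>
    <div class="chart-stage">
      <div id="{chart_id}"></div>
    </div>
  </div>
</div>"""


def render_row(charts):
    """Render one 'row' div whose charts share 12 grid columns equally."""
    width = 12 // len(charts)
    columns = "\n".join(
        CHART_TEMPLATE.format(width=width, title=title, chart_id=chart_id)
        for chart_id, title in charts
    )
    return '<div class="row">\n{}\n</div>'.format(columns)


full_width = render_row(
    [("bubble-chart", "Population vs population density as function of capital")]
)
two_columns = render_row([("chart-a", "First chart"), ("chart-b", "Second chart")])
print(full_width)
```

The equal split only fills the row exactly when the number of elements divides 12; other layouts would need explicit widths per element.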

Creating the data visualizations with D3.js

The index.html file contains links to the CSS style file, the Bootstrap Keen.io dependency and the JavaScript libraries used for charting, namely D3.js [7], dc.js [8], Crossfilter [9] and d3-queue [10].

The graphs and information elements themselves are defined in the graphs.js file (also linked from index.html) and attached to the page through div id attributes. For example, the bubble chart is referenced with

<div id="bubble-chart"></div>

In the graphs.js file it is constructed using

var populationBubbleChart = dc.bubbleChart('#bubble-chart');

The graphs.js file first uses the queue.js library's .defer method to read both the dataset and the GeoJSON file simultaneously. The .await method then delays construction of the charts (defined within the function makeGraphs) until all the data has been read. The makeGraphs function takes the loaded data sources as input. It performs the following steps:
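
Separately from those steps, the defer/await loading pattern itself can be sketched in Python. This is an analogue, not the JavaScript used in graphs.js: the two loader functions return small stand-in values instead of reading the real dataset and GeoJSON file, and `make_graphs` only summarises its inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def load_population():
    """Stand-in for reading the population dataset."""
    return [{"prefecture": "Aichi", "year": 1873.0, "population": 1217444}]


def load_geojson():
    """Stand-in for reading the GeoJSON file."""
    return {"type": "FeatureCollection", "features": []}


def make_graphs(records, geojson):
    """Stand-in for chart construction; runs only once both inputs exist."""
    return "{} record(s), {} feature(s)".format(
        len(records), len(geojson["features"])
    )


with ThreadPoolExecutor() as pool:
    # Analogue of .defer: start both reads without waiting for either.
    records_future = pool.submit(load_population)
    geojson_future = pool.submit(load_geojson)
    # Analogue of .await: block until both results are available.
    summary = make_graphs(records_future.result(), geojson_future.result())

print(summary)
```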

The visualisation is shown in Figure 1.

Figure 1: gif animation of visualisation.

Bibliography

  1. A. Moujahid. Interactive Data Visualization with D3.js, DC.js, Python, and MongoDB. http://adilmoujahid.com, 2015.
  2. J. Derenski. Japan Population Data: Japan's Population Over Time, and Space. Japan, 2018.
  3. M. Wichary. Japan geojson. Japan, 2013.
  4. M. Inc. MongoDB for GIANT Ideas. MongoDB, 2009.
  5. A. Ronacher. Flask: web development, one drop at a time. A Python Microframework, 2010.
  6. A. Kasprowicz. Keen: Analytics for Developers. keen.io, 2015.
  7. M. Bostock. D3 for Data-Driven Documents. IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 2011.
  8. dc. dc.js - Dimensional Charting Javascript Library. dc, 2009.
  9. M. Bostock. Crossfilter: Fast Multidimensional Filtering for Coordinated Views. crossfilter, 2011.
  10. M. Bostock. d3-queue - Evaluate asynchronous tasks with configurable concurrency. dc, 2011.
© 2018, Clive Edwards