Summary. In this project an interactive data visualisation is created using a dataset containing Japan's population between 1871 and 2015. The data is broken down by geographic region in a number of ways, including by island, prefecture, region and capital.
Skills used:
This project is based on the one created by Adil Moujahid [1], modified in the following ways:
Some basic cleaning was done:
df['prefecture'] = df['prefecture'].str.replace("-ken", '')
df.population = df.population.fillna(0).astype(int)
df['pop_density'] = df['population']/df['estimated_area']
df['pop_density'] = df['pop_density'].round(1)
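As a worked example of the cleaning steps above, the following sketch applies them to a two-row DataFrame. The rows and their values are invented for illustration and are not taken from the dataset. The argument regex=False is an addition here so that "-ken" is treated as a literal string.

```python
import pandas as pd

# Invented sample rows; the real data comes from the project CSV
df = pd.DataFrame({
    "prefecture": ["Aichi-ken", "Akita-ken"],
    "year": [1872, 1872],
    "population": [1210368.0, None],      # one missing value
    "estimated_area": [5000.0, 11000.0],
})

# Strip the "-ken" suffix from prefecture names
df["prefecture"] = df["prefecture"].str.replace("-ken", "", regex=False)
# Missing populations become 0 so the column can be stored as integers
df["population"] = df["population"].fillna(0).astype(int)
# Density is population per unit area, rounded to one decimal place
df["pop_density"] = (df["population"] / df["estimated_area"]).round(1)

print(df)
```

Note that a missing population ends up with a density of 0.0 rather than being flagged as missing, which may matter when the charts are read.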
MongoDB [4] was used to store the data in a non-relational database. Non-relational databases have the following advantages:
After installation, the MongoDB service (mongod) is started on Linux with
sudo service mongod start
To check that the mongod process has started successfully, inspect the log file at /var/log/mongodb/mongod.log and look for the line '[initandlisten] waiting for connections on port 27017'. The status of the service can subsequently be checked or modified with the following:
sudo service mongod stop
sudo service mongod restart
service mongod status
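The log check described above can also be scripted. The sketch below is a minimal example that assumes the plain-text log format quoted earlier; the function name and the sample log lines are invented for illustration, and other MongoDB versions may word the message differently.

```python
def mongod_is_listening(log_text, port=27017):
    """Return True if any log line reports mongod waiting on the given port."""
    marker = f"waiting for connections on port {port}"
    return any(marker in line for line in log_text.splitlines())

# Invented log excerpt; in practice the text would be read from
# /var/log/mongodb/mongod.log (which may need elevated permissions)
sample_log = (
    "2018-01-01T10:00:00.000+0000 I CONTROL  [initandlisten] db version\n"
    "2018-01-01T10:00:01.000+0000 I NETWORK  [initandlisten] waiting for connections on port 27017\n"
)

listening = mongod_is_listening(sample_log)
print(listening)
```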
MongoDB supports importing data in both CSV and JSON format. To load the data from a CSV file:
mongoimport --db DATABASE_NAME --collection COLLECTION_NAME --type csv --file 'PATH_TO_CSV/FILENAME.csv' --headerline
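To illustrate what the --headerline option means, the following Python sketch reads CSV text whose first line holds the field names and turns every remaining row into one document (a dict). It is only an illustration and not how mongoimport is implemented: the sample rows are invented, and no attempt is made here to convert values to numbers.

```python
import csv
import io

def rows_to_documents(csv_text):
    """Turn CSV text into a list of dicts, using the first line as field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

# Invented sample with the same kind of columns as the project data
sample = "prefecture,year,population\nAichi,1872,1210368\nAkita,1872,630000\n"

documents = rows_to_documents(sample)
print(documents[0])
```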
To make sure the data has loaded correctly, start the mongo shell and inspect the database:
mongo
show dbs
use DATABASE_NAME
show collections
db.getCollection("COLLECTION_NAME").findOne()
db.getCollection("COLLECTION_NAME").findOne({}, {VARIABLE_1:1, VARIABLE_2:1, VARIABLE_3:1, VARIABLE_4:1, _id:0})
where VARIABLE_1 etc. are field names from the data. The same commands can be used to query the database from the command line.
To access the database from Python, pymongo is used with the following connection settings:
from pymongo import MongoClient
#Specify string names inside '' for following variables
MONGODB_HOST = 'localhost'
DBS_NAME = 'DATABASE_NAME'
COLLECTION_NAME = 'COLLECTION_NAME'
#Specify numerical variable (default used)
MONGODB_PORT = 27017
#Specify variables in csv of interest
FIELDS = {'prefecture': True, 'year': True, 'population': True, 'capital': True, 'region': True,'estimated_area': True,'island': True,'pop_density': True, '_id': False}
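The effect of a projection such as FIELDS can be sketched in plain Python. The helper below is hypothetical and is not pymongo's implementation; it only mimics the rule that fields marked True are kept and that _id is returned unless it is explicitly set to False. The sample document is invented.

```python
def apply_projection(document, fields):
    """Mimic an inclusion projection on a single document (a dict)."""
    included = {name for name, keep in fields.items() if keep}
    # _id is returned by default unless the projection excludes it
    if fields.get("_id", True):
        included.add("_id")
    return {key: value for key, value in document.items() if key in included}

# Invented document with one field the projection does not ask for
document = {"_id": 1, "prefecture": "Aichi", "year": 1872, "notes": "unused"}
fields = {"prefecture": True, "year": True, "_id": False}

projected = apply_projection(document, fields)
print(projected)
```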
Flask [5] is used to create a web server that retrieves the data from the database as it is requested by the rendered HTML page containing the visualisation. In particular, the Flask application looks for an index.html file inside the templates directory and serves it at http://localhost:5000/ (the port is set in the call to app.run).
from flask import Flask
from flask import render_template

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

if __name__ == "__main__":
    app.run(host='0.0.0.0',port=5000,debug=True)
In order to check that the database is accessible, the following can be used:
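import json
from bson import json_util  # bson is installed together with pymongo

@app.route("/visualisation/testing")  # Flask routes must start with a slash
def visualisation_testing():
    connection = MongoClient(MONGODB_HOST, MONGODB_PORT)
    collection = connection[DBS_NAME][COLLECTION_NAME]
    # Fetch every document, keeping only the fields listed in FIELDS
    input_variables = collection.find(projection=FIELDS)
    json_variables = []
    for v in input_variables:
        json_variables.append(v)
    # json_util.default serialises BSON-specific types that json cannot handle
    json_variables = json.dumps(json_variables, default=json_util.default)
    connection.close()
    return json_variables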
This uses the connection settings defined earlier to output the fields listed in FIELDS at http://localhost:5000/visualisation/testing.
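Once the test route returns JSON, the response can be checked for completeness. The sketch below is one possible check, assuming the response body is a JSON list of records; the helper name and the sample payload are invented, and in practice the payload would be the text returned by the test URL.

```python
import json

EXPECTED_FIELDS = {
    "prefecture", "year", "population", "capital",
    "region", "estimated_area", "island", "pop_density",
}

def find_incomplete_records(payload, expected_fields=EXPECTED_FIELDS):
    """Return the indices of records that lack one or more expected fields."""
    records = json.loads(payload)
    return [
        index for index, record in enumerate(records)
        if not expected_fields.issubset(record)
    ]

# Invented payload: the second record is missing pop_density
payload = json.dumps([
    {"prefecture": "Aichi", "year": 1872, "population": 1210368,
     "capital": "Nagoya", "region": "Chubu", "estimated_area": 5000.0,
     "island": "Honshu", "pop_density": 242.1},
    {"prefecture": "Akita", "year": 1872, "population": 0,
     "capital": "Akita", "region": "Tohoku", "estimated_area": 11000.0,
     "island": "Honshu"},
])

incomplete = find_incomplete_records(payload)
print(incomplete)
```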
Keen.io [6] produces templates for visualisations. The template used by [1] was modified to create a new layout. This was done by modifying the index.html file which Flask serves. Each element of the visualisation has an entry like the following:
<!-- population vs population density-->
<div class="col-sm-12">
<div class="chart-wrapper">
<div class="chart-title">
Population vs population density as function of capital
</div>
<div class="chart-stage">
<div id="bubble-chart"></div>
</div>
</div>
</div>
<!-- population vs population density -->
The individual elements are arranged into rows using a div tag of class 'row' and into columns depending on how many graphs/info elements are put on the same row. This way a grid of elements split into rows and columns can be created.
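The row and column arrangement can be illustrated with a short Python sketch that generates the markup for one row. It assumes Bootstrap's 12-unit grid and equal-width columns, which is a simplification: the actual layout may mix widths, and any chart id other than bubble-chart is hypothetical.

```python
def column_class(charts_in_row):
    """Return the Bootstrap column class for equal-width charts in one row."""
    if charts_in_row < 1 or 12 % charts_in_row != 0:
        raise ValueError("the number of charts must divide the 12-unit grid")
    return f"col-sm-{12 // charts_in_row}"

def render_row(chart_ids):
    """Build the HTML for one row holding the given chart containers."""
    css_class = column_class(len(chart_ids))
    columns = "\n".join(
        f'  <div class="{css_class}"><div id="{chart_id}"></div></div>'
        for chart_id in chart_ids
    )
    return f'<div class="row">\n{columns}\n</div>'

print(render_row(["bubble-chart"]))
print(render_row(["chart-a", "chart-b", "chart-c"]))  # hypothetical ids
```

A single chart takes the full width (col-sm-12, as in the entry shown earlier), while three charts on one row each take a third (col-sm-4).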
The index.html file contains links to the CSS style file, the Bootstrap Keen.io dependency and the JavaScript libraries used for charting, namely
The graphs and information elements themselves are designed within the graphs.js file (also linked in the index.html file) and called individually from the index.html file by div id tags. For example, the bubble chart is referenced with
<div id="bubble-chart"></div>
In the graphs.js file it is constructed using
var populationBubbleChart = dc.bubbleChart('#bubble-chart');
The graphs.js file first uses the .defer method of the queue.js library to read the dataset and the geojson file in parallel. The .await method delays construction of the charts (defined within the function makeGraphs) until all the data has been read. The makeGraphs function takes the loaded data sources as arguments and performs the following steps:
Figure 1: gif animation of visualisation.
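The load-then-build pattern that graphs.js follows with queue.js (defer both reads, await both results, then call makeGraphs) can be sketched in Python as an analogy. This is not the project's JavaScript: the loader functions, their return values and make_graphs are stand-ins invented for illustration, and error handling is left to exceptions rather than an error argument.

```python
from concurrent.futures import ThreadPoolExecutor

def load_all_then(loaders, callback):
    """Run every loader concurrently, then call callback once with all results."""
    with ThreadPoolExecutor() as pool:
        # Comparable to .defer: schedule each read without waiting for it
        futures = [pool.submit(loader) for loader in loaders]
        # Comparable to .await: block until every read has finished
        results = [future.result() for future in futures]
    return callback(*results)

def load_records():
    # Stand-in for reading the population data
    return [{"prefecture": "Aichi", "population": 1210368}]

def load_geojson():
    # Stand-in for reading the geojson file
    return {"type": "FeatureCollection", "features": []}

def make_graphs(records, geojson):
    # Stand-in for chart construction; only reports what was loaded
    return f"{len(records)} record(s), {len(geojson['features'])} map feature(s)"

summary = load_all_then([load_records, load_geojson], make_graphs)
print(summary)
```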