Cleaning Airline Data with Python
In this project a messy dataset about airline flights is cleaned so that it can be used for further analysis. The Python pandas library is used for this purpose.
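As a rough illustration of the kind of cleaning steps involved (the actual files and columns are not listed in this summary), a minimal pandas sketch might normalise column names, parse dates and drop duplicates; the file and column names below are hypothetical.

import pandas as pd

# Hypothetical input file and column names, for illustration only.
df = pd.read_csv("flights_raw.csv")

# Normalise column names: strip whitespace, lower-case, underscores for spaces.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Parse the departure date, coercing unparseable values to NaT.
df["departure_date"] = pd.to_datetime(df["departure_date"], errors="coerce")

# Drop exact duplicate rows and rows missing a flight number.
df = df.drop_duplicates().dropna(subset=["flight_number"])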
This report summarises the process of creating a static website using Hugo and then hosting it using GitHub Pages. The purpose of the website is to allow the user to i) give a brief summary of their background and ii) provide a list of links to projects they have been involved with. To this end, a default Bootstrap template is modified to meet these requirements. DocOnce is used to create the individual project reports, and the website is then hosted through GitHub Pages.
In this project Facebook ad data is analysed by means of an exploratory data analysis. Metrics commonly used in ad analysis are implemented and investigated. It is assumed that business performance is driven by the absolute return on advertising spend, and the ROAS metric is therefore targeted. This preliminary analysis suggests that further campaigns should focus on the 30-34 age group, particularly males, while advertising spend is least effectively targeted at the 45-49 age group. However, the number of clicks behind these conclusions is in some cases low, and it is therefore suggested that further work aim to establish the statistical significance of targeting these groups.
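For reference, ROAS is simply the revenue attributed to the ads divided by the amount spent on them. A minimal pandas sketch of computing it per age group and gender is shown below; the file and column names are assumptions rather than the project's actual schema.

import pandas as pd

# Hypothetical columns: age, gender, spent, revenue.
ads = pd.read_csv("facebook_ads.csv")

# ROAS = attributed revenue / advertising spend, per (age, gender) segment.
segments = ads.groupby(["age", "gender"])[["revenue", "spent"]].sum()
segments["roas"] = segments["revenue"] / segments["spent"]

print(segments.sort_values("roas", ascending=False))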
This work investigates the feasibility of reconstructing ttH (with the Higgs boson decaying to WW) for the purpose of measuring the Standard Model Higgs CP. The method used to measure the Higgs CP relies on the top quark momenta and therefore requires that the ttH topology be fully reconstructed. To this end, a fit-based method using chi-squared minimization is implemented and explored as a means of reconstructing the events. This method is compared with a multivariate boosted decision tree implementation. The boosted decision tree yields improved results compared to the fit-based approach, particularly in events with a semi-leptonically decaying top. In such events, it yields improvements of the order of 10% for certain aspects of the reconstructed topology, and of as much as a few percent for the ttH topology as a whole.
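As a rough sketch of the fit-based idea (not the analysis code itself), each assignment of jets to the hadronic W and top candidates can be scored with a chi-squared built from reconstructed masses, keeping the assignment with the smallest value; the nominal masses, resolutions and the invariant-mass helper below are illustrative assumptions.

from itertools import permutations

# Illustrative nominal masses and resolutions in GeV; a real analysis would
# take these from simulation.
M_W, SIGMA_W = 80.4, 10.0
M_TOP, SIGMA_TOP = 172.5, 15.0

def chi2(m_w_candidate, m_top_candidate):
    # Compare candidate masses to their nominal values.
    return ((m_w_candidate - M_W) / SIGMA_W) ** 2 + \
           ((m_top_candidate - M_TOP) / SIGMA_TOP) ** 2

def best_assignment(jets, invariant_mass):
    # Pick the (j1, j2, b) jet assignment minimising the chi-squared.
    # invariant_mass is an assumed helper returning the invariant mass of a
    # list of jet four-vectors.
    best = None
    for j1, j2, b in permutations(jets, 3):
        score = chi2(invariant_mass([j1, j2]), invariant_mass([j1, j2, b]))
        if best is None or score < best[0]:
            best = (score, (j1, j2, b))
    return best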
Having up-to-date and specific weather and travel information in a single point of reference for the daily commute can be a very useful time saver and convenience. This project uses real-time and forecast data from the Transport for London Unified API and the Met Office DataPoint weather API to develop a Plotly Dash dashboard. The dashboard provides up-to-date travel information for several designated routes as well as real-time and forecast weather information for the local area. The dashboard is deployed online using Heroku.
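A minimal Dash skeleton showing the overall shape of such an app is sketched below; the data-fetching helper and the TfL line-status endpoint are assumptions based on the public Unified API documentation, and the weather side is omitted.

import requests
from dash import Dash, Input, Output, dcc, html

def tfl_line_status(line="victoria"):
    # Fetch the current status of a TfL line from the Unified API
    # (endpoint assumed from the public documentation).
    resp = requests.get(f"https://api.tfl.gov.uk/Line/{line}/Status", timeout=10)
    resp.raise_for_status()
    return resp.json()[0]["lineStatuses"][0]["statusSeverityDescription"]

app = Dash(__name__)
app.layout = html.Div([
    html.H2("Commute dashboard (sketch)"),
    html.Div(id="line-status"),
    # Refresh the travel information every minute.
    dcc.Interval(id="refresh", interval=60_000),
])

@app.callback(Output("line-status", "children"), Input("refresh", "n_intervals"))
def update_status(_):
    return f"Victoria line: {tfl_line_status('victoria')}"

if __name__ == "__main__":
    app.run(debug=True)  # app.run_server(...) on older Dash versions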
This report summarises the development of a marking system aimed at reducing the time taken to mark exercise books of student work while ensuring the process remains effective and in line with a typical school marking policy. The system uses the RAG 123 marking criteria to assess student understanding of a particular learning objective and then uses this to automatically assign each student individually tailored feedback, including a comment, an example and a next steps/challenge question. The author used the system in his own teaching and saw an immediate improvement in the quality and consistency of his feedback to students; it also reduced his marking time by over 50%. The feedback it delivered was praised by a headteacher and by his department during book scrutinies.
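As a hypothetical illustration of the core idea (the system's actual data model and feedback bank are not reproduced here), a RAG rating per learning objective can be looked up in a feedback bank to build each student's comment and next step.

# Hypothetical feedback bank keyed by RAG rating; the comments and next
# steps are placeholders, not the system's real content.
FEEDBACK_BANK = {
    "R": {"comment": "Key idea not yet secure.",
          "next_step": "Re-attempt the worked example, then try question 1."},
    "A": {"comment": "Method mostly correct with minor slips.",
          "next_step": "Check and correct your working on question 3."},
    "G": {"comment": "Secure understanding shown.",
          "next_step": "Attempt the challenge question."},
}

def feedback_for(student, rating, objective):
    # Build an individually tailored feedback string from a RAG rating.
    entry = FEEDBACK_BANK[rating]
    return (f"{student} - {objective}: {entry['comment']} "
            f"Next step: {entry['next_step']}")

print(feedback_for("Student A", "A", "Solving linear equations"))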
This report outlines preliminary studies into the optimization of the ATLAS Level 2 electron-photon (egamma) trigger with respect to the 2010 data run. Electrons from Z decay are a major source of high-pT electrons, an understanding of which is vital for many physics analyses, and the optimization therefore focuses on electrons from Z decay. The optimization procedure identifies signal (actual electrons) and background (mis-identified electrons) Level 2 trigger electrons using geometrical matching. Reducing the number of background trigger electrons has obvious benefits with respect to data storage. A number of variables useful in discriminating this signal from background are identified, and a log-likelihood ratio discriminant based on these variables is developed and tested using Monte Carlo. Its performance is cross-checked using 2010 data. The log-likelihood discriminant is compared with a cuts-based method using the individual variables and is shown to give improved results.
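In outline, such a discriminant sums, over the chosen variables, the logarithm of the ratio of the signal and background probability densities. A minimal sketch using binned Monte Carlo templates is given below; the interface and binning are illustrative, not the analysis code.

import numpy as np

def llr_discriminant(values, sig_hists, bkg_hists, bin_edges, eps=1e-9):
    # Log-likelihood ratio built from binned, normalised signal and
    # background PDFs, one histogram per discriminating variable.
    # values: dict of variable name -> measured value for one candidate.
    llr = 0.0
    for var, x in values.items():
        i = int(np.clip(np.digitize(x, bin_edges[var]) - 1, 0,
                        len(sig_hists[var]) - 1))
        llr += np.log((sig_hists[var][i] + eps) / (bkg_hists[var][i] + eps))
    return llr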
The search potential with the ATLAS detector for a Standard Model Higgs boson produced via vector boson fusion and decaying to two leptons and two neutrinos through a pair of Z bosons is investigated. The ATLAS detector is a general-purpose detector in operation at CERN, measuring proton-proton collisions produced by the Large Hadron Collider. This channel has been shown to have high sensitivity at large Higgs mass, where large amounts of missing energy in the signal provide good discrimination against the expected backgrounds. This work takes a first look at whether the sensitivity of the channel may be improved by using the remnants of the vector boson fusion process to provide extra discrimination, particularly at lower mass, where the sensitivity of the main analysis is reduced because of lower missing energy.
In this report a simple logistic regression model is used to classify credit card transactions as fraudulent or not. A recall of 0.8 and a precision of 0.7 are obtained at a false positive rate of 0.0005. However, for a model to be useful from a business perspective, an understanding of how to deploy it in the real world is important; Docker and Kubernetes are investigated for this purpose.
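A minimal scikit-learn sketch of this kind of classifier is shown below; the file name, label column and use of class weighting are assumptions rather than details taken from the report.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical file with a binary "Class" label (1 = fraud), as in common
# public credit card fraud datasets.
data = pd.read_csv("creditcard.csv")
X, y = data.drop(columns="Class"), data["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Class weighting helps with the extreme class imbalance typical of fraud data.
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))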
In this project an interactive data visualisation is created using a dataset containing Japan's population between 1871 and 2015. The data is broken down by geographic region in a number of ways, including by island, prefecture, region and capital.
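As an illustration of one way such an interactive view might be built (the project's actual tooling is not specified in this summary), Plotly Express can produce a zoomable, hoverable line chart from a long-format table; the file and column names are assumed.

import pandas as pd
import plotly.express as px

# Hypothetical long-format table: one row per (year, prefecture) pair.
pop = pd.read_csv("japan_population.csv")  # columns: year, prefecture, population

# Interactive line chart with hover, zoom and per-prefecture legend toggling.
fig = px.line(pop, x="year", y="population", color="prefecture",
              title="Japan population by prefecture, 1871-2015")
fig.show()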
There is a wealth of information available on the internet, and web scraping is the process that enables people to collate and begin to organise this data into a more structured format for further analysis. This project investigates the process using data provided by the UK Parliament, in particular the financial interests of members of the House of Commons. Attributes of the HTML data motivate the use of two Python libraries, Beautiful Soup and Scrapy, for this work.
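A minimal Beautiful Soup sketch of the first step (fetching a page and pulling out its links) is given below; the URL is a placeholder rather than the actual Parliament page used in the project.

import requests
from bs4 import BeautifulSoup

# Placeholder URL; the project scrapes pages published by the UK Parliament.
URL = "https://example.org/register-of-interests"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Collect every hyperlink on the page as (link text, href) pairs.
links = [(a.get_text(strip=True), a["href"])
         for a in soup.find_all("a", href=True)]

for text, href in links[:10]:
    print(text, "->", href)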