Faculty data science fellowship
Apr 2, 2019
5 minutes read

I haven’t posted in a long time; I’ve been rather busy. Towards the end of summer 2018, I started writing my PhD thesis, which was submitted in January. During the writing process I was considering what would come next. Since the first year of my PhD, I knew that I wanted to get into data science in some way, and had my eye on a number of training programs and fellowships. In November, I applied for the Faculty Data Science Fellowship. After a Skype interview, where I was asked about my motivations, experience, and interests, I was told I had been successful and had won a place on the January fellowship. Happy days.

The Faculty fellowship is an eight-week program, taking STEM PhD holders and training them in all aspects of data science, from database design to deep learning. After two weeks of technical and business training, the fellows are placed with a partner company for a six-week data science consultancy project. It all culminates with Demo Day, where the fellows present their consultancy projects to an audience of over 400 people. And so, after submitting my PhD thesis on the Friday, I rocked up to the Faculty offices on Monday the 14th of January, ready to do some real data science.

The first week was really intense. As well as meeting the other 18 fellows and the Faculty staff, and working through hardcore statistics and probability workshops, we heard pitches from 19 companies about their projects and had a networking session to help match fellows with projects. We were asked to choose our five preferred projects, and happily I was matched with LiveStyled. LiveStyled is a technology platform that helps venues and events become smart - they develop apps and websites for clients like the O2 Arena, through which visitors can buy tickets, food and beverages, merchandise, and more. This is a rich data source for the venues, offering the chance for personalised content and rewards.

The project started in Week 3, after two weeks of technical training on a whole range of data science topics:

  • Probability and Statistics
  • Bayesian Inference
  • Linear Algorithms
  • Natural Language Processing
  • Neural Nets
  • SQL and NoSQL
  • Unsupervised Learning
  • Trees and Ensemble Methods

Each Friday over the eight weeks we also covered subjects such as presentations, project management, deployment, and dashboards. The sessions were expertly run by the Faculty data scientists, who have deep knowledge of, and real passion for, their subjects.

From Weeks 3-8 I was based at the LiveStyled offices at Angel, London. I was tasked with two parallel projects - here I will only discuss one of them. This involved LiveStyled’s own data, and asked whether we could implement a machine learning model to drive revenue at events.

In the first part of the project, I explored the LiveStyled database using SQL and Pandas. Here I picked out some key buying behaviours at events, such as when people buy, which items are popular, and how a first purchase influences a second. After this exploratory analysis, I started to think that it would be cool if we could use machine learning to predict the probability of someone making another purchase after their first purchase at an event. If we could do that, we could understand what the driving forces are behind different buying behaviours. In the short term, we could also target marketing and discounts towards people who would give the greatest return, converting people who are only making one purchase into multi-purchase customers.
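To give a flavour of that exploration: everything below is illustrative (the connection string, table and column names are placeholders, not LiveStyled’s actual schema), but it is roughly the kind of Pandas work involved in pulling out first-versus-second purchase behaviour.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection and schema - illustrative only.
engine = create_engine("postgresql://user:password@host:5432/events")

purchases = pd.read_sql(
    "SELECT customer_id, event_id, item, created_at FROM purchases",
    con=engine,
    parse_dates=["created_at"],
)

# When do people buy? Purchase counts by hour of day.
purchases_by_hour = purchases["created_at"].dt.hour.value_counts().sort_index()

# Which items are most popular?
top_items = purchases["item"].value_counts().head(10)

# How often does a first purchase at an event lead to another one?
purchases["purchase_number"] = (
    purchases.sort_values("created_at")
    .groupby(["customer_id", "event_id"])
    .cumcount()
    + 1
)
repeat_rate = (
    purchases.groupby(["customer_id", "event_id"])["purchase_number"]
    .max()
    .gt(1)
    .mean()
)
print(f"Customer-event pairs with more than one purchase: {repeat_rate:.1%}")
```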

I set about making features from the database. The most difficult of these involved tracking individual purchases and counting how many previous purchases each customer had made at a given event. This required some complex SQL joins and Pandas lambda functions. However, after a few days we had a good number of features to start training models.
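The real feature engineering happened mostly in SQL, but the core “how many purchases has this person already made at this event?” logic translates to something like the following in Pandas, continuing with the illustrative purchases frame from the sketch above (the target definition here is one plausible framing, not necessarily the exact one used).

```python
# Continuing with the illustrative `purchases` frame from the exploration sketch.
df = purchases.sort_values("created_at").copy()
grouped = df.groupby(["customer_id", "event_id"])

# Purchases this customer had already made at this event before the current
# one (0 for a first purchase).
df["prior_purchases_at_event"] = grouped.cumcount()

# Minutes since the customer's previous purchase at the same event.
df["minutes_since_last_purchase"] = (
    grouped["created_at"].diff().dt.total_seconds().div(60)
)

# Target: is this purchase followed by at least one more at the same event?
df["is_followed_by_another"] = (grouped.cumcount(ascending=False) > 0).astype(int)
```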

For model training, I surveyed around six different classification algorithms, of which XGBoost was the most performant. I then tuned its hyperparameters with a randomised cross-validation search. After cross-validation, I deployed the XGBoost model as a hosted endpoint on AWS SageMaker and wrote the data processing routines as AWS Lambda functions in Python. This was really fun, since I had no prior experience of hosting and deploying models, or of having them interact with live databases.
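The tuning step looked roughly like this - the feature names are carried over from the sketch above, and the parameter ranges are illustrative rather than the ones I actually settled on.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBClassifier

# Illustrative feature matrix and target from the engineered frame above.
feature_cols = ["prior_purchases_at_event", "minutes_since_last_purchase"]
X = df[feature_cols].fillna(-1)
y = df["is_followed_by_another"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Randomised search over a modest hyperparameter space, scored on ROC AUC.
param_distributions = {
    "n_estimators": randint(100, 600),
    "max_depth": randint(2, 8),
    "learning_rate": uniform(0.01, 0.3),
    "subsample": uniform(0.6, 0.4),
    "colsample_bytree": uniform(0.6, 0.4),
}

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions=param_distributions,
    n_iter=50,
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
    random_state=42,
)
search.fit(X_train, y_train)
print("Best CV AUC:", round(search.best_score_, 3))
print("Held-out AUC:", round(search.score(X_test, y_test), 3))
```

On the serving side, the shape of a Lambda handler that forwards a feature row to a hosted SageMaker endpoint is roughly the following - the endpoint name and payload format are placeholders, not LiveStyled’s actual setup.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Hypothetical handler: forward a CSV row of features to the hosted
    # endpoint and return the predicted probability of another purchase.
    payload = ",".join(str(value) for value in event["features"])
    response = runtime.invoke_endpoint(
        EndpointName="second-purchase-xgboost",  # placeholder endpoint name
        ContentType="text/csv",
        Body=payload,
    )
    probability = float(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps({"probability": probability})}
```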

This took me to the end of my time at LiveStyled. In my final few days, I wrote a revenue forecasting model which, based on data from past events, helps identify the optimal output probability threshold from the classifier at which to start offering discounts. This was a nice little addition to the project to help maximise revenue and minimise, or at least understand, the cost of a false positive prediction from the model.
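In stylised form, the idea is to sweep candidate thresholds and pick the one that maximises expected revenue, given an assumed uplift per converted customer and a cost per discount offered. The numbers and the synthetic predictions below are made up purely for illustration; the real forecast was driven by data from past events.

```python
import numpy as np

def expected_revenue(y_true, y_prob, threshold, uplift=5.0, discount_cost=1.0):
    """Toy expected revenue when discounts are offered above a probability threshold.

    A correctly targeted customer earns the assumed uplift minus the discount;
    a false positive just costs the discount. Uplift and cost are illustrative.
    """
    targeted = y_prob >= threshold
    true_pos = np.sum(targeted & (y_true == 1))
    false_pos = np.sum(targeted & (y_true == 0))
    return true_pos * (uplift - discount_cost) - false_pos * discount_cost

# Stand-in labels and predicted probabilities; in practice these would be the
# held-out predictions from the deployed classifier.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_prob = np.clip(0.3 * y_true + 0.7 * rng.random(1000), 0, 1)

thresholds = np.linspace(0.05, 0.95, 19)
revenues = [expected_revenue(y_true, y_prob, t) for t in thresholds]
best_threshold = thresholds[int(np.argmax(revenues))]
print(f"Most profitable threshold under these assumptions: {best_threshold:.2f}")
```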

I presented my work at the Faculty Demo Day on March 7th at Blackrock. The days before were spent honing and practising the talk - I’ve never spent so long preparing a talk that was so short! It was a nerve-wracking day, with so many people and so many great presentations. You can watch mine at the bottom of the page. I would also recommend checking out some of the other presentations if you’re interested - the quality and variety of data science applications is staggering!

After the fellowship, I had a couple of days off before my PhD viva, which I passed with minor corrections. I am now searching for Data Science positions in Paris. Hit me up on twitter/email/linkedin if you know of anything interesting!
