Data Engineer
From 2024 to Present at InCrowd Sports Ltd, UK
- Developed and deployed ETL Prefect flows powering 9 siloed fan data platforms (FDP) servicing 15 clients including Arsenal, RFL and Euroleague - Python, Prefect, Docker, Kubernetes, EC2
- Scoped, developed and unit tested third-party data sync integrations with established platforms such as Salesforce, Flowcode, DotDigital, OneSignal, SeatGeek, VenueMaster, Experian and Adobe Magento, syncing >100M records monthly.
- Collaborated on data warehouse modelling to consolidate user data, enabling in-house FDP SaaS such as Audience Builder, Fan Manager, Campaigns, Promo Assets and Single Customer View - dbt, PostgreSQL, CRM
- Led the monitoring and maintenance of 260 Prefect flows responsible for all ETL pipelines, ensuring a 97% success rate across >5,000 daily jobs - Site Reliability
- Deployed hundreds of feature requests across 30 clients with varying requirements such as regular SFTP drops, seasonal reporting and BI logic changes.
- Complied with GDPR, developing Right To Removal and Subject Access Request flows.
Data Scientist
From 2022 to 2024 at InCrowd Sports Ltd, UK
- Co-developed a campaign optimization pipeline servicing InCrowd’s audience builder for retail, ticketing, memberships and content promotion, leading to a 10% improvement in click-through rates - Advanced Targeting, A/B Testing, Marketing ROI
- Produced AWS cloud cost forecasting based on CloudFront usage for 30 clients across hundreds of microservices, resulting in a 30% cloud cost reduction, more accurate billing and better profit margins - Statistical Analysis, Pricing Models
- Led the data science team to develop best practices and tooling including experiment structure, CI/CD and testing - MLflow, PyCaret, SageMaker Feature Store
- Implemented a Text2SQL self-service tool to allow non-technical staff to query databases and gather analytics independently, slashing manual analyst tasks by 90% - LLMs
- Communicated findings to stakeholders in long-standing weekly sessions, with actionable recommendations and insights on business approaches
Junior Data Scientist
From 2021 to 2022 at InCrowd Sports Ltd, UK
- Developed classification modelling to predict season ticket and subscription renewals based on previous purchasing behaviour, digital engagement and match check-ins for Crystal Palace - LTV, Churn
- Implemented engagement scoring models quantifying levels of fandom for gamification and marketing - Audience Segmentation
- Developed a client-facing tool that quantifies co-occurrence strength between user demographics and purchasing behaviour based on association rules (Lift) - Market Basket Analysis, Streamlit, Marketing Analytics
- Developed internal data quality and exploration tools to quantify completeness, nullity and cardinality, allowing AMs to communicate CRM best practices, highlight enrichment opportunities and flag relationships that can be used in prediction - Data Wrangling, QA
Junior Data Scientist - Part Time
From 2020 to 2021 at InCrowd Sports Ltd, UK
- Developed article tagging and categorization to allow for promotion of player/club specific products - Topic Modelling, NER
Industrial Placement - Data Engineering and Analytics
From 2019 to 2020 at InCrowd Sports Ltd, UK
- Took on multiple data roles, liaising with stakeholders, backend and app development teams.
- Delivered monthly and seasonal data reporting in a client-facing environment.
- Modelled raw Snowplow and GA4 events into insight-ready tables - dbt
- Developed cross-database data sources on a data lake to serve dashboards - Dremio, PostgreSQL, Redshift, MySQL
- Developed dashboards highlighting KPIs for audience demographics, digital engagement, purchasing behaviour - Tableau
- Answered ad hoc queries with short turnaround times, liaising with AMs and PMs
- Developed tracking specifications for app devs and BI dashboard testing
Junior Research Associate - NLP
2019 at TAG Lab/CASM Consulting, UK
- Utilised state-of-the-art transformer language models to build text processing components powering client applications - BERT
- Applied keyword extraction, sentiment analysis and semantic similarity modelling to Reddit comments from r/changemyview
- Mapped word embeddings to a “spectrum” based on network clustering - Gephi
- Scraped Twitter and Reddit, extracting political arguments for parlia.com
- Contributed to "new paper lunches", where researchers present new interesting papers during a group lunch
MSc Data Science - Part Time, Distinction
2021 to 2024 at University of Sussex
A highly rigorous degree that establishes advanced mathematical principles integral to ML/AI, including probability theory, statistical analysis, linear algebra, calculus and optimization techniques. For the dissertation, I developed methods for circumventing token limits in transformer-based language models to support long document classification, which was not possible with pre-GPT-3.5 models.
Relevant Units:
- Statistical Inference
- Mathematics and Computational Methods for Complex Systems
- Data Analysis Techniques
- Algorithmic Data Science
- Wearable Technologies
BSc Hons Computer Science and Artificial Intelligence, 2.1
2017 to 2021 at University of Sussex
BCS-accredited degree. Practical experience in Software Engineering and Machine Learning, alongside dedicated academic research in AI, including Advanced Natural Language Engineering, Computer Vision and Neural Networks. For the dissertation, titled “ALT: Article Library Toolkit”, I developed an NLP toolkit to help writers quickly identify currently trending topics amongst Twitter communities, and to predict and maximise online engagement before publishing. The underlying methods link named entities in articles to trending tweets. This produces a dataset for training a regression model that predicts the number of impressions for a potential new topic by examining a fresh stream of tweets.
Relevant Units:
- Machine Learning
- Databases
- Software Engineering
- Advanced Natural Language Engineering
- Computer Vision
- Neural Networks
- Program Analysis
- Computer Security
Skills:
- ML Libraries: PyTorch, TensorFlow
- SageMaker Feature Store and Model Registry
- MLflow Experiment Tracking
- PyCaret, Streamlit
- AWS: S3, EC2, ECR, Redshift
- Google Cloud: GCS, BigQuery, Google Analytics 4
- CI/CD, Bitbucket Pipelines
- NLP: LLMs
- DBs: PostgreSQL; NoSQL: MongoDB
- Python
- OOP: Java
- Methodologies: Agile, Scrum, Kanban
- Git
- APIs
- English (native), German (conversational) and Arabic (native)
- Visualisation & BI tools: Tableau, Grafana, Data Studio, D3.js
- Web Scraping: Scrapy, Beautiful Soup
- Solidity Blockchain Smart Contracts
- C# and Unity
- Graph Visualisation: Gephi