Resume
A dynamic and results-driven Data Scientist with expertise in designing and implementing machine learning models, advanced analytics, and data-driven strategies. Proven ability to translate complex datasets into actionable insights through innovative solutions in predictive modeling, audience segmentation, and natural language processing. Skilled in leveraging statistical analysis, AI techniques, and cloud platforms to drive business growth and optimize decision-making. Adept at collaborating with cross-functional teams, communicating technical findings to stakeholders, and fostering a data-centric culture. Passionate about delivering impactful solutions that bridge the gap between data and business value.
Experience

Data Engineer

From 2024 to Present at InCrowd Sports Ltd, UK

  • Developed and deployed ETL Prefect flows powering 9 siloed fan data platforms (FDP) servicing 15 clients including Arsenal, RFL and Euroleague - Python, Prefect, Docker, Kubernetes, EC2
  • Scoped, developed and unit tested third-party data sync integrations with established platforms such as Salesforce, Flowcode, DotDigital, OneSignal, SeatGeek, VenueMaster, Experian and Adobe Magento, syncing more than 100M records monthly.
  • Collaborated on data warehouse modelling to consolidate user data, enabling in-house FDP SaaS such as Audience Builder, Fan Manager, Campaigns, Promo Assets and Single Customer View - dbt, PostgreSQL, CRM
  • Led the monitoring and maintenance of 260 Prefect flows responsible for all ETL pipelines, ensuring a 97% success rate across more than 5,000 daily jobs - Site Reliability
  • Deployed hundreds of feature requests across 30 clients with varying requirements such as regular SFTP drops, seasonal reporting and BI logic changes.
  • Complied with GDPR, developing Right To Removal and Subject Access Request flows.

Data Scientist

From 2022 to 2024 at InCrowd Sports Ltd, UK

  • Co-developed a campaign optimization pipeline servicing InCrowd’s Audience Builder for retail, ticketing, memberships and content promotion, leading to a 10% improvement in click-through rates - Advanced Targeting, A/B Testing, Marketing ROI
  • Produced AWS cloud cost forecasting based on CloudFront usage for 30 clients across hundreds of microservices, resulting in a 30% cloud cost reduction, more accurate billing and better profit margins - Statistical Analysis, Pricing Models
  • Led the data science team to develop best practices and tooling including experiment structure, CI/CD and testing - MLflow, PyCaret, SageMaker Feature Store
  • Implemented a Text2SQL self-service tool to allow non-technical staff to query databases and gather analytics independently, slashing manual analyst tasks by 90% - LLMs
  • Communicated findings to stakeholders in long-standing weekly sessions, with actionable recommendations and insights on business approaches

Junior Data Scientist

From 2021 to 2022 at InCrowd Sports Ltd, UK

  • Developed classification modelling to predict season ticket and subscription renewals based on previous purchasing behaviour, digital engagement and match check-ins for Crystal Palace - LTV, Churn
  • Implemented engagement scoring models quantifying levels of fandom for gamification and marketing - Audience Segmentation
  • Developed a client-facing tool that quantifies co-occurrence strength between user demographics and purchasing behaviour based on association rules (Lift) - Market Basket Analysis, Streamlit, Marketing Analytics
  • Developed internal data quality and exploration tools to quantify completeness, nullity and cardinality, allowing AMs to communicate CRM best practices, highlight enrichment opportunities and flag relationships that can be used in prediction - Data Wrangling, QA

Junior Data Scientist - Part Time

From 2020 to 2021 at InCrowd Sports Ltd, UK

  • Developed article tagging and categorization to allow for promotion of player- and club-specific products - Topic Modelling, NER

Industrial Placement - Data Engineering and Analytics

From 2019 to 2020 at InCrowd Sports Ltd, UK

  • Took on multiple data roles, liaising with stakeholders, backend and app development teams.
  • Delivered monthly and seasonal data reporting in a client-facing environment.
  • Modelled raw Snowplow and GA4 events into insight-ready tables - dbt
  • Developed cross-database data sources on a data lake to service dashboards - Dremio, PostgreSQL, Redshift, MySQL
  • Developed dashboards highlighting KPIs for audience demographics, digital engagement, purchasing behaviour - Tableau
  • Answered ad hoc queries with short turnaround times, liaising with AMs and PMs
  • Developed tracking specifications for app devs and BI dashboard testing

Junior Research Associate - NLP

2019 at TAG Lab/CASM Consulting, UK

  • Utilised state-of-the-art transformer language models to establish text processing components powering client applications - BERT
  • Performed keyword extraction, sentiment analysis and semantic similarity modelling of Reddit comments from r/changemyview
  • Mapped word embeddings to a “spectrum” based on network clustering - Gephi
  • Scraped Twitter and Reddit, extracting political arguments for parlia.com
  • Contributed to "new paper lunches", where researchers present new interesting papers during a group lunch

Education

MSc Data Science - Part Time, Distinction

2021 to 2024 at University of Sussex

A highly rigorous degree that establishes advanced mathematical principles integral to ML/AI, including probability theory, statistical analysis, linear algebra, calculus and optimization techniques. For the dissertation, I developed methods capable of circumventing token limits in transformer-based language models to service long-document classification, which was not possible with pre-GPT-3.5 models.

Relevant Units:

  • Statistical Inference
  • Mathematics and Computational Methods for Complex Systems
  • Data Analysis Techniques
  • Algorithmic Data Science
  • Wearable Technologies

BSc Hons Computer Science and Artificial Intelligence, 2.1

2017 to 2021 at University of Sussex

BCS-accredited degree. Practical experience in Software Engineering and Machine Learning, alongside dedicated academic research in AI, including Advanced Natural Language Engineering, Computer Vision and Neural Networks. For the dissertation, titled “ALT: Article Library Toolkit”, I developed an NLP toolkit to help writers quickly identify currently trending topics amongst Twitter communities, and to predict and maximise online engagement before publishing. The underlying methods link named entities in articles to trending tweets. This results in a dataset for training a regression model that predicts the number of impressions for a potential new topic by examining a fresh stream of tweets.

Relevant Units:

  • Machine Learning
  • Databases
  • Software Engineering
  • Advanced Natural Language Engineering
  • Computer Vision
  • Neural Networks
  • Program Analysis
  • Computer Security

Certifications
Relevant Skills
  • ML Libraries: PyTorch, TensorFlow
  • SageMaker Feature Store and Model Registry
  • MLflow Experiment Tracking
  • PyCaret, Streamlit
  • AWS: S3, EC2, ECR, Redshift
  • Google Cloud: GCS, BigQuery, Google Analytics 4
  • CI/CD, Bitbucket Pipelines
  • NLP: LLMs
  • DBs: PostgreSQL; NoSQL: MongoDB
  • Python
  • OOP: Java
  • Methodologies: Agile, Scrum, Kanban
  • Git
  • APIs

Other Skills
  • English (native), German (conversational) and Arabic (native)
  • Visualisation & BI tools: Tableau, Grafana, Data Studio, D3.js
  • Web Scraping: Scrapy, Beautiful Soup
  • Solidity Blockchain Smart Contracts
  • C# and Unity
  • Graph Visualisation: Gephi
youssef.one