Hi, I'm Ameerah
Always learning, always becoming.
About Me
Welcome to my data portfolio!
I'm a Statistics graduate with a specialization in Big Data Analytics, passionate about turning raw data into meaningful narratives. Whether it's modeling inflation trends, exploring public sentiment, or visualizing patterns in music and crime, I enjoy solving real-world problems through data.
I'm especially drawn to:
- Data storytelling that brings insights to life
- Statistical modeling for forecasts and decisions
- Process improvement powered by evidence
- Exploring the human side of data through sentiment & behavioral analysis
I have a growing interest in data science, particularly in applying machine learning to understand complex systems and drive better decisions. I'm constantly exploring new tools and techniques to deepen my skillset.
My current toolkit includes R, Python, SQL, Power BI, and Markdown/HTML. I believe in clean visuals, reproducible work, and making data accessible to all.
Featured Projects
YouTube Global Statistics Report
What drives subscriber growth on the world's biggest platform?
An interactive R Markdown analysis exploring patterns in subscriber count, content category, and geography, powered by bootstrap regression and statistical testing.
Highlights:
- Entertainment and Music lead in both views and subscribers.
- More uploads do not mean more subscribers: the relationship is weak and nonlinear.
- U.S. and India dominate the top channel landscape.
View GitHub Repo | Read the Report on RPubs
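The report itself is written in R Markdown, and its exact bootstrap scheme is not stated here. As a rough illustration only, a percentile bootstrap for a regression slope could be sketched in Python as follows. The pairs-resampling approach, the function names, and the synthetic `uploads` and `subscribers` values are all assumptions made for this sketch, not the project's code or data.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            # Skip degenerate resamples with no spread in x.
            continue
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic placeholder data, not taken from the YouTube dataset.
data_rng = random.Random(42)
uploads = [data_rng.uniform(0, 10) for _ in range(60)]
subscribers = [2.0 + 0.5 * u + data_rng.gauss(0, 1) for u in uploads]

point = ols_slope(uploads, subscribers)
low, high = bootstrap_slope_ci(uploads, subscribers)
print(f"slope={point:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A wide interval, or one that includes zero, would be one way to express a weak relationship such as the one noted between uploads and subscribers.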
Boston Public Safety Dashboard
How can public crime data reveal safer paths forward?
An interactive dashboard project analyzing over 50,000 Boston crime records using Power BI and Excel. Focused on uncovering crime hotspots, temporal patterns, and offense trends to support public safety decisions.
Highlights:
- Crimes are most frequent between 10 AM and 10 PM, peaking around 4 PM.
- Common offenses include larceny, assault, and drug violations.
- Maps and filters help identify high-risk neighborhoods and time periods.
Spotify Streaming Analysis 2024
Can streaming features help predict whether a song is explicit?
This project analyzes Spotify's 2024 streaming data to classify tracks as explicit or not. The analysis involved data cleaning, EDA, and applying supervised learning models, all performed in Python.
Highlights:
- Addressed class imbalance with a Balanced Random Forest, compared against a baseline Logistic Regression.
- Included ROC curve evaluation and comparison between models.
- Comprehensive preprocessing and EDA to understand feature distributions.
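To illustrate the ROC-based comparison mentioned above, here is a minimal sketch of the area under the ROC curve. It is computed as the probability that a randomly chosen positive (explicit) track scores higher than a randomly chosen negative one. The labels and scores are made up for illustration, and `model_a` and `model_b` are hypothetical stand-ins rather than the project's fitted models.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up, imbalanced labels (1 = explicit) and predicted scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores_model_a = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.80, 0.35, 0.25, 0.90]
scores_model_b = [0.50, 0.40, 0.60, 0.30, 0.20, 0.70, 0.65, 0.45, 0.55, 0.35]

auc_a = roc_auc(y_true, scores_model_a)
auc_b = roc_auc(y_true, scores_model_b)
print(f"AUC model_a={auc_a:.3f}, model_b={auc_b:.3f}")
```

Because AUC depends only on how the scores rank positives against negatives, it stays informative when one class is much rarer than the other, which is why it is a common choice for imbalanced problems like this one.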
CPI Forecasting with ARIMA (Python)
How can past inflation data help us predict future trends?
An end-to-end time series modeling project using manual ARIMA tuning and backward stepwise regression to forecast Malaysia's Consumer Price Index (CPI). The final model was evaluated against actual CPI values from Sep 2023 to Aug 2024.
Highlights:
- Built ARIMA(1,1,2) model manually with statistical testing (ADF, KPSS).
- Included backward stepwise regression with lag features to optimize prediction.
- Compared forecast results against actual CPI for accuracy assessment.
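The accuracy assessment in the last highlight can be pictured with a few standard error metrics. The sketch below is only an assumption about what such a comparison might look like: the project does not say which metrics it used, `forecast_errors` is a hypothetical helper, and the numbers are placeholder index values rather than real CPI figures.

```python
import math


def forecast_errors(actual, forecast):
    """Return MAE, RMSE, and MAPE (in percent) for paired actual/forecast values."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE assumes no actual value is zero, which holds for an index like CPI.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"mae": mae, "rmse": rmse, "mape": mape}


# Placeholder values for illustration only.
actual = [100.0, 102.0, 104.0, 105.0]
forecast = [101.0, 101.0, 106.0, 105.0]

metrics = forecast_errors(actual, forecast)
print({name: round(value, 3) for name, value in metrics.items()})
```

MAE and RMSE are in index points, while MAPE is scale-free, so reporting more than one gives a fuller picture of how far the forecast drifts from the observed series.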
Reddit Sentiment Analysis: Hump Day Yay or Nay?
Should Wednesdays be the new weekend?
A lighthearted NLP project exploring Reddit sentiment toward a midweek break. Using over 1,100 comments from a viral post on r/unpopularopinion, the analysis combines scraping, sentiment scoring, and quirky word clouds to uncover what the internet really thinks of #MidweekReset.
Highlights:
- Full NLP pipeline: text cleaning, tokenizing, stopword removal.
- VADER sentiment scoring with negation-aware processing.
- Violin plots and word clouds reveal emotional tones and common themes.
- Top-voted comments skewed positive or neutral, supporting midweek reset.
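The cleaning steps listed above (text cleaning, tokenizing, stopword removal) might look roughly like the sketch below. It uses only the standard library and does not reproduce the project's pipeline or VADER itself. The stopword list is a tiny illustrative assumption, and keeping negation words is one simple way to stay negation-aware before scoring.

```python
import re

# Tiny illustrative stopword list (an assumption; real pipelines use a fuller one).
STOPWORDS = {"a", "an", "the", "is", "are", "be", "to", "of", "and", "it", "this", "that", "i", "we"}
# Negation words are kept on purpose so later sentiment scoring can see them.
NEGATIONS = {"not", "no", "never"}


def clean_tokens(text):
    """Lowercase, strip URLs, expand simple negated contractions, tokenize, drop stopwords."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = text.replace("can't", "can not").replace("won't", "will not")
    text = re.sub(r"n't\b", " not", text)  # crude expansion: "isn't" -> "is not"
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t in NEGATIONS or t not in STOPWORDS]


comment = "Wednesdays off isn't a bad idea! https://example.com"
print(clean_tokens(comment))
```

Keeping "not" next to "bad" matters here: dropping it as a stopword would flip the apparent sentiment of the comment.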
MLB SQL Analysis
Exploring player performance with window functions and CTEs
A structured SQL and Tableau project analyzing Major League Baseball (MLB) player data from Maven Analytics. The SQL portion focuses on player origins, salaries, and career spans using CTEs, window functions, and aggregation techniques. Tableau brings the findings to life through interactive visuals.
Highlights:
- Applied ROW_NUMBER, NTILE, and LAG to analyze debut/final games and career lengths.
- Identified top schools producing high-earning, long-career MLB players.
- Explored team salary distributions and milestone thresholds ($1B+ in salary spend).
- Built a Tableau dashboard featuring:
  - Treemap of top 10 salary-producing schools
  - Funnel from schools to $5M+ players
  - Scatter plot of career span vs. school salary
  - Choropleth map of player origins by state
  - Bar chart of top schools by career longevity
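For readers less familiar with the window functions named above, here is a small self-contained sketch that runs a CTE with ROW_NUMBER, LAG, and NTILE against an in-memory SQLite database from Python. The `seasons` table, its columns, and the rows are hypothetical and are not the Maven Analytics schema. It also assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seasons (player TEXT, year INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO seasons VALUES (?, ?, ?)",
    [
        # Made-up rows for illustration only.
        ("player_a", 2001, 300),
        ("player_a", 2002, 450),
        ("player_a", 2003, 400),
        ("player_b", 2002, 900),
        ("player_b", 2003, 1200),
    ],
)

query = """
WITH ranked AS (
    SELECT
        player,
        year,
        salary,
        -- Season counter within each player's career
        ROW_NUMBER() OVER (PARTITION BY player ORDER BY year) AS season_no,
        -- Previous season's salary for the same player (NULL in the first season)
        LAG(salary) OVER (PARTITION BY player ORDER BY year) AS prev_salary,
        -- Split all rows into two salary buckets, highest first
        NTILE(2) OVER (ORDER BY salary DESC) AS salary_half
    FROM seasons
)
SELECT player, year, season_no, salary - prev_salary AS salary_change, salary_half
FROM ranked
ORDER BY player, year
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The same pattern extends to career spans: a row's position within a player's partition identifies the debut season, and LAG gives year-over-year changes without a self-join.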
Reach Me
Thank You
Appreciate you stopping by!
Whether it's building models or making sense of meme sentiment, I love bringing data to life, usually with a strong coffee in hand.
Let's chat if you're into dashboards, storytelling, or data that makes you go "huh, interesting."
Chasing patterns, questions, and great coffee,
Ameerah