Projects

This section is where theory meets practice. Each project represents a different slice of what I bring to data work: collecting messy data from the web, building and comparing machine learning models, applying cutting-edge NLP techniques, and making complex methods accessible to others. These are educational reports and dashboards that showcase my approach and results, not production applications. The underlying data are not shared; the focus is on methodology, visualizations, and insights.

You’ll see a mix of techniques: classic machine learning, modern transformer models, web scraping, and interactive visualizations. The common thread is turning raw data into understandable insights that people can actually use.

What we can learn about runners from Strava

I scraped 1,000+ Strava activities from the 2025 Boston Marathon to analyze what gear runners wear and whether it correlates with performance. Using Selenium to navigate dynamic webpages and wrestling with messy user-generated data, I dove into the world of running watches and shoes. Spoiler alert: Garmin dominates among the watches in this sample, Nike rules the shoe game among faster runners, and, despite what the marketing says, there’s no statistically significant evidence that shoe brand affects how hard you work during a marathon.
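The write-up doesn't say which test was used for the shoe-brand comparison, so here is a hedged sketch of one reasonable option: a permutation test on the one-way ANOVA F-statistic, which asks how often randomly reshuffled brand labels produce group differences at least as large as the observed ones. The brands, the effort numbers, and the `effort_by_brand` dictionary are made up for illustration; they are not the project's data or its actual metric.

```python
import math
import random
from statistics import mean


def f_statistic(groups: list[list[float]]) -> float:
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_test(groups: list[list[float]], n_permutations: int = 5000, seed: int = 42):
    """Return the observed F and the share of label shuffles with an F at least as large."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into groups with the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        stat = f_statistic(shuffled)
        # isclose guards against float noise when a shuffle reproduces the observed split.
        if stat >= observed or math.isclose(stat, observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical per-runner effort scores grouped by shoe brand (illustrative only).
effort_by_brand = {
    "Nike": [212, 245, 198, 260, 231, 224],
    "Asics": [219, 238, 205, 251, 227],
    "Hoka": [208, 249, 226, 233, 215, 240],
}
f_obs, p_value = permutation_test(list(effort_by_brand.values()))
print(f"F = {f_obs:.2f}, permutation p = {p_value:.3f}")
```

A permutation test avoids assuming normally distributed effort scores, which matters for skewed, user-generated data; with real data, brands worn by only a handful of runners would usually be pooled or dropped before testing.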

A local RAG assistant for teaching materials

To help students navigate a semester’s worth of Computational Social Science materials (lecture notes and topic-specific notebooks), I built a Retrieval-Augmented Generation (RAG) system that answers questions strictly based on the course resources. Using the ragnar package in R, I convert documents to markdown, chunk them along headings, and store them in a DuckDB-backed vector index with embeddings from a local nomic-embed-text model. Students can then ask questions and get answers grounded in the course resources, generated either by a local Ollama model (qwen2.5:3b) or through the Claude API (provided they supply an Anthropic API key locally or set it as an environment variable). To make interactions more natural, I wrapped the chatbot in a simple Shiny app with a conversational UI. The R package can run fully locally, ships with my teaching materials pre-installed, and is available on GitHub. It also lets you use your own materials, although this functionality is still experimental and may not work for all use cases.
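The assistant itself is built in R with ragnar, and ragnar's API is not reproduced here. Instead, this is a language-neutral sketch in Python of the three ideas described above: split markdown at headings, rank chunks by cosine similarity to the question, and wrap the best chunks in a prompt that restricts the model to the course material. `toy_embed` is a hypothetical bag-of-words stand-in for the nomic-embed-text model, a plain list replaces the DuckDB index, and the call to the language model is left out.

```python
import math
import re
from collections import Counter


def chunk_by_headings(markdown: str) -> list[str]:
    """Split a markdown document into chunks, starting a new chunk at each heading."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = toy_embed(question)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Instruct the model to answer only from the retrieved course material."""
    joined = "\n\n---\n\n".join(context)
    return (
        "Answer the question using only the course material below. "
        "If the answer is not in the material, say so.\n\n"
        f"Course material:\n{joined}\n\nQuestion: {question}"
    )


# Illustrative notes, not the actual course materials.
notes = """# Web scraping
Use Selenium when pages render content with JavaScript.

# Topic models
LDA represents each document as a mixture of topics.

# Word embeddings
Embeddings map words to dense vectors."""

question = "How do topic models represent a document?"
chunks = chunk_by_headings(notes)
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

Chunking at headings keeps each retrieved passage aligned with one topic of the course, so the context handed to a small local model stays short and on point; a real embedding model would also match paraphrases that this word-count stand-in misses.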