Selected Work
The following case studies represent a sample of applied analytical work across consulting engagements, independent research, and industry projects. Each project applied rigorous quantitative methods to a concrete problem – from organizational strategy to market intelligence to NLP systems. Where applicable, detailed write-ups are linked below. Client-confidential engagements are described at the methodological level only.
Organizational Performance & Growth Strategy
Context: A U.S.-based professional education company sought to identify underperforming course offerings and develop a data-driven strategy for curriculum and marketing investment.
Approach: Developed interactive dashboards aggregating enrollment, revenue, and engagement data to surface portfolio-level performance patterns. Designed targeted advertising campaigns using social media analytics and web traffic data. Conducted market analysis to identify gaps in AI-focused curriculum.
Result: Informed strategic restructuring of the course portfolio and delivered an AI curriculum strategy that yielded five new course offerings. Marketing campaigns resulted in measurable enrollment increases.
Methods: RShiny dashboards, Google Analytics, social media analytics, performance benchmarking
Multilingual Survey Response Classification
Context: A Scandinavian research institution required automated classification of open-ended survey responses collected across four languages to enable large-scale comparative analysis.
Approach: Designed and implemented NLP classification pipelines using local LLMs and sBERT-based clustering. Validated automated outputs against expert-coded benchmarks using standard inter-rater reliability metrics.
Result: Achieved human-level classification accuracy (F1: 0.91–0.98) across all four languages. Delivered reproducible analysis code and publication-ready visualizations. Co-authored two manuscripts currently under peer review.
Methods: sBERT, LLM inference, multilingual NLP, F1 validation, Python, R
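The validation step above — scoring automated labels against expert-coded benchmarks — can be sketched in a few lines. This is an illustrative reimplementation of per-class F1, not the project's actual validation code; the labels and class names are hypothetical.

```python
def f1_per_class(gold, pred, label):
    """Per-class F1: compare automated labels (pred) to expert codes (gold)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical expert-coded benchmark vs. model output
gold = ["env", "econ", "env", "health", "econ"]
pred = ["env", "econ", "econ", "health", "econ"]
print(round(f1_per_class(gold, pred, "env"), 3))  # precision 1.0, recall 0.5 -> 0.667
```

In practice a library implementation (e.g. `sklearn.metrics.f1_score`) would be used, averaged across classes and languages.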
RAG-Based Course Assistant for Higher Education
Context: Students navigating large volumes of course materials – lecture notes, readings, notebooks – face a common challenge: finding relevant content quickly without re-reading everything. The same problem applies broadly across higher education, wherever instructors maintain substantial document repositories that students are expected to query and synthesize.
Approach: Designed and developed teachrag, an R package implementing a Retrieval-Augmented Generation (RAG) system that enables natural-language querying of course materials. The system converts documents to markdown, chunks them along headings, and indexes them in a DuckDB-backed vector store using local embeddings. A conversational RShiny interface allows students to ask questions and receive answers grounded strictly in course content, via either a local Ollama model or the Anthropic Claude API. The package ships with my own teaching materials pre-installed and supports custom document ingestion, making it readily adaptable by other instructors.
Result: Deployed for students in a graduate Computational Social Science course. Packaged and released as an open-source R package to make the approach reproducible and accessible to educators beyond my own courses.
Methods: RAG architecture, vector search, local LLM inference, RShiny, R package development
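The chunking step described above — converting documents to markdown and splitting along headings before indexing — can be sketched as follows. This is a minimal illustration of the strategy, not teachrag's actual implementation (which is in R); the size limit and sample document are assumptions.

```python
import re

def chunk_markdown(text, max_chars=1200):
    """Split a markdown document into chunks at headings; oversized sections
    are further split so each chunk fits an embedding model's input window."""
    sections, current = [], []
    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    chunks = []
    for s in sections:
        for i in range(0, len(s), max_chars):
            chunks.append(s[i:i + max_chars])
    return chunks

doc = "# Week 1\nIntro notes.\n## Reading\nKing et al.\n# Week 2\nRegression."
print(len(chunk_markdown(doc)))  # 3 chunks: Week 1, Reading, Week 2
```

Heading-aligned chunks keep each embedded unit topically coherent, which improves retrieval precision over fixed-width splitting.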
Industrial Sector Classification & Certification Verification
Context: A European client required a clean, structured database of industrial companies categorized by sector, with ISO 50001 certification status verified at scale – a task too large for manual review alone.
Approach: Developed an LLM-based classification and verification pipeline combining regex-based pre-filtering, LLM inference for sector assignment, and structured quality-control workflows. Designed a hybrid automated/manual review process to maximize accuracy on high-stakes classification decisions.
Result: Delivered a quality-controlled, structured database ready for client decision-making. The pipeline reduced manual review burden while maintaining verification accuracy on certification status.
Methods: LLM pipelines, regex filtering, quality-control workflows, Python
GitHub: perplexicaR · Client details otherwise confidential.
Job Market Intelligence Analysis
Context: What does the current demand landscape for quantitative and data science skills actually look like – at the level of specific methods, tools, and experience requirements?
Approach: Analyzed 10,000+ data science and analytics job postings using NLP-based skill extraction to identify demand patterns by role type, industry, and geography. Applied Poisson regression to model skill co-occurrence and relative demand intensity.
Result: Identified distinct skill clusters across role types, quantified relative demand for specific methodologies (causal inference, NLP, ML), and mapped salary distribution patterns across markets.
Methods: NLP skill extraction, Poisson regression, text mining, data visualization
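The co-occurrence modelling step can be illustrated with a toy version of the count construction: pairwise skill counts per posting are the outcome a Poisson regression would then model against role type, industry, and geography. The postings below are invented examples, not project data.

```python
from collections import Counter
from itertools import combinations

# Toy extracted-skill sets; the real analysis drew these from 10,000+ postings via NLP.
postings = [
    {"python", "sql", "ml"},
    {"python", "nlp", "ml"},
    {"r", "causal inference", "sql"},
    {"python", "sql", "nlp"},
]

# Pairwise co-occurrence counts (sorted tuples so each pair is counted once per posting).
pair_counts = Counter(
    pair for skills in postings for pair in combinations(sorted(skills), 2)
)
print(pair_counts[("python", "sql")])  # 2
```

These counts, one row per skill pair with covariates attached, form the count outcome for a Poisson model of relative demand intensity.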
Full write-up forthcoming.