Project Playground
Things I’ve built because they seemed fun and possibly useful
Throughout my entire educational and professional career, I have built skills and expertise through play, and I’ve built tools because they are enjoyable to build. I would say this is a core value, but it’s also an indelible imperative for me.
Over the past few years I’ve been playing a lot. Each project below represents a joyful exploration of topics and techniques that caught my attention.
Across these projects, several themes emerge:
- Curiosity as methodology: Every project starts from a real question—where can my sister go sledding near Detroit? What does inequality look like at scale? How do writers develop narrative?—and the approach is to build something that answers it.
- Making complexity legible: Whether separating individual from organizational variation, decomposing time series into seasonal components, or building interactive visualizations of wealth dynamics—taking messy, complex phenomena and finding structure.
- Rigorous methods in unexpected places: Hierarchical models for software teams, measurement theory for wearable data, beam search for image prompts, agent architectures for storytelling. Research methodology travels further than people expect.
- From analysis to tool: Projects tend to evolve from one-off explorations into reusable systems: a Python library for terrain visualization, a Docker image for reproducible Bayesian computing, a pipeline for iterative image generation, an organization system for 100k+ photos.
- Generative thinking that is serious about uncertainty: Bayesian models, agent-based simulations, stochastic processes, beam search, LLM internals—a comfort with uncertainty and a preference for generative models that can be interrogated rather than black-box predictions.
Statistical Modeling & Analysis
PhD Unemployment Model
Bayesian time series analysis of PhD unemployment rates relative to general unemployment and other graduate degree holders, using Current Population Survey microdata from IPUMS USA.
Why and what: At the start of the second Trump administration, there was a lot of economic upheaval. My research team at Pluralsight was a casualty of the rising cost of money in tech and of general economic uncertainty. Meanwhile, my colleagues at Harvard and other institutions of higher education were having their funding frozen or rescinded entirely. I wanted to see this play out in the unemployment numbers and understand whether the anecdotes were plentiful enough to notice at a national scale. The project involves Bayesian time series decomposition (GAMs, Gaussian processes, autoregressive structures), handling large survey datasets (40M+ rows with data.table and fst for fast I/O), and applying test-driven development to statistical modeling workflows. It uses Stan via brms and cmdstanr for principled uncertainty quantification across multiple seasonal components, trends, and economic cycles. I learned that for all their flexibility, GAMs don't handle big shifts in smoothness well, and I was reminded that they don't lend themselves to easy mechanistic explanations.
Garmin Health Data Analysis
Bayesian analysis of personal wearable health data from Garmin, modeling trends and cross-metric relationships in weight, sleep, activity, and heart rate.
Why and what: Thanks to inspiration from several friends who lift (many of them women in academia) and from my 80+-year-old father, I've been working out. After developing the state-space model for the unemployment data, I wanted to see if I could model my personal data using similar methods, so I built a progression of Bayesian models for noisy health time series (weight, sleep, activity, heart rate): Gaussian processes, cross-lagged models, and state-space models. As a developmental psychologist, what stood out to me about these data is how much I had to think about the time lags at which processes unfold. I was reminded that this is an underappreciated point for many observational designs in psychology and beyond.
Research Publications & Infrastructure
No Silver Bullets: Software Cycle Time
Repository for the peer-reviewed paper “No Silver Bullets: Why Understanding Software Cycle Time is Messy, Not Magic” (Flournoy, Lee, Wu, & Hicks, 2025; doi:10.1007/s10664-025-10735-w). Published in Empirical Software Engineering. Analyzes 55,000+ observations across 216 organizations using Bayesian hierarchical modeling to separate individual and organizational variation in software development velocity.
Why and what: At Pluralsight, I had access to a large dataset of software development activity across hundreds of organizations. The conventional wisdom in developer productivity is full of confident claims—particular tools, practices, or team structures that supposedly unlock dramatic improvements. I wanted to see what the data actually said when you modeled it carefully, especially by using a distribution appropriate for outcome measures with strongly non-Gaussian forms. Using Bayesian hierarchical models, I could separate individual-level variation from organizational-level variation and estimate the effects of common workplace factors while properly accounting for the nested, noisy structure of the data.
Key finding: Common workplace factors have precise but modest effects on cycle time, set against considerable unexplained variation—suggesting systems-level thinking rather than individual-focused interventions.
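The individual-versus-organizational separation can be illustrated with a toy simulation. This is not the paper's brms model: the parameters are invented, and the crude group-mean decomposition below is only a stand-in for the posterior variance components a Bayesian hierarchical model would actually estimate.

```python
import random
import statistics

def simulate_cycle_times(n_orgs=50, devs_per_org=20, obs_per_dev=10,
                         org_sd=0.3, dev_sd=0.5, noise_sd=0.8):
    """Toy nested data: log cycle time = org effect + dev effect + noise."""
    data = []
    for o in range(n_orgs):
        org = random.gauss(0, org_sd)
        for d in range(devs_per_org):
            dev = random.gauss(0, dev_sd)
            for _ in range(obs_per_dev):
                data.append((o, d, org + dev + random.gauss(0, noise_sd)))
    return data

random.seed(0)
data = simulate_cycle_times()

# Crude decomposition from organization means (a hierarchical model would
# do this properly, with partial pooling and uncertainty):
org_obs = {}
for o, _, y in data:
    org_obs.setdefault(o, []).append(y)
between_org = statistics.variance([statistics.mean(v) for v in org_obs.values()])
total = statistics.variance([y for _, _, y in data])
org_share = between_org / total
```

With these made-up parameters most of the variance sits below the organizational level, which is the shape of the paper's finding: modest systematic effects against considerable residual variation.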
verse-cmdstan
Docker container extending rocker/verse with pre-compiled CmdStan (and other related tools) for reproducible Bayesian computing environments ready for gorgeous Quarto reports.
Why and what: Getting the full R + Stan ecosystem working—the right compiler toolchain, C++ libraries, R packages, TeX, and CmdStan all playing nicely together—requires a lot of careful configuration, or coordination with IT on shared academic infrastructure like SLURM clusters, where users often lack root access. A pre-built Docker image (which can also be used in Singularity/Apptainer contexts) sidesteps all of this: you get a known-good environment that runs identically on a laptop, a CI server, or a university HPC cluster, making Bayesian analyses genuinely reproducible across collaborators and compute environments.
Simulation & Computational Modeling
Wealth Stratification Agent-Based Model
An agent-based model demonstrating how wealth inequality persists across generations through environmental factors, genetic potential, assortative mating, and stochastic events. How low can you get the Gini coefficient to go?
Why and what: Economic inequality is a major problem in the USA and in the world. At the same time, meritocracy is a fairly uncontroversial system of reward that many strive to perfect. How does our understanding of rewards for merit, and the causal sources of “merit”, lead toward or away from economic inequality? I built an interactive browser-based simulation to explore how these dynamics play out: agents inherit alleles, receive environmental endowments, accumulate wealth, and mate assortatively across generations. The model uses log-normal + Pareto wealth distributions and tracks Gini coefficients as you adjust the levers. The next major update will give it a more realistic underlying economic model.
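As a rough sketch of the mechanics (the actual simulation is browser-based and much richer), here is a minimal Python version of two core pieces: a Gini coefficient and a single inheritance-plus-shock generational step. The `heritability` and `shock_sd` parameters are invented for illustration, not taken from the model.

```python
import random

def gini(wealths):
    """Gini coefficient via the sorted-rank formula (0 = equal, 1 = maximal)."""
    xs = sorted(wealths)
    n = len(xs)
    cum = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return cum / (n * sum(xs))

def next_generation(wealths, heritability=0.7, shock_sd=0.3):
    """Children keep a fraction of parental wealth plus a lognormal shock."""
    return [max(0.0, heritability * w + random.lognormvariate(0, shock_sd))
            for w in wealths]

random.seed(1)
pop = [random.lognormvariate(0, 1) for _ in range(1000)]  # lognormal start
for _ in range(10):
    pop = next_generation(pop)
g = gini(pop)
```

Turning the `heritability` lever down is the toy analogue of the question in the blurb: how low can you get the Gini to go?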
Understanding LLMs
A hands-on exploration of how language models actually work—not just how to use them, but how they represent and process information internally. The goal is to build up a simple BERT-ish model from the ground up. As a side effect, I have a demo of a Claude Code tutoring environment.
Why and what: Using LLMs to do work for you is old hat. Learning something new by making them make you do work for them is the new frontier. I wanted a Claude Code setup that would force me to learn. And I wanted to learn exactly how these fancy new next-token-prediction models work. So far I've re-learned everything I should have remembered from my UC Berkeley AI class back when neural networks were basically toy examples of a sort-of-abandoned method (and now I suspect we're coming back around to trying to instantiate expertise more directly). The experience of being tutored by an LLM has been really useful, and it was trivially easy to set up.
Data Visualization & Geospatial Tools
Terrain Maker v2
A Python library for creating custom terrain visualizations from Digital Elevation Models with Blender 3D rendering, supporting arbitrary spatial data overlays and multi-view rendering from any cardinal direction.
Why and what: What? Maps. What? MAPS! Mmmmmm, maps. I love maps. I love to see how landscapes I know well flow into one another and are situated in a context that helps make them what they are. This has been a long-time hobby. I started by just playing with data in QGIS and then bringing it into Blender. Then I realized I could make Python do my manual labor. Now it's a Python library. It covers geospatial data processing (SRTM tiles, GeoTIFF, reprojection, resampling), programmatic 3D rendering with Blender's Python API, and building reusable scientific visualization libraries with smart caching. The Detroit example generates 1.3M-vertex terrain meshes with color-mapped data overlays.
AI/ML & Generative Systems
Image Organizer
AI-powered image organization system for large local photo collections (100k+ images), with automatic tagging via multiple ML models, RAW file support, face and theme clustering with visualizations, and semantic search.
Why and what: I’ve been taking photographs, in a serious way, since I picked up my dad’s old SLR in high school. I was even paid to take pictures at some point. I have a lot of them and I have enough knowledge of various ML models to make the computer organize them for me. Existing tools were fine but didn’t lean into the unsupervised learning opportunities enough. So I started working on this.
Image Generation Pipeline v2
A Node.js automated prompt refinement system for image generation, implementing beam search—a standard ML algorithm for exploring multiple promising paths simultaneously—applied to iterative prompt refinement for AI image generation.
Why and what: If you’ve ever played with image generation in the Midjourney or OpenAI ecosystem, you’ve probably had the experience of iteratively refining your prompts. I wanted to see if I could get LLMs to do some of this work for me. So I built a beam search that takes a simple input prompt, expands it into a “what” description and a “how it should look” description, and then recombines them. Each generation of images (say, 4 at a time) is fed into a tournament ranking system to choose the top N images. Those top N prompts are then refined based on a vision-language model’s feedback and used to generate the next generation. Fun and easy! I also got to think about token security for a public-facing web app, session persistence, and UI design.
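The core loop is ordinary beam search over prompts. A minimal Python sketch (the real pipeline is Node.js, and its `expand` step calls an LLM while `score` is the VLM-backed tournament; both are toy stand-ins here):

```python
def beam_search_prompts(seed, expand, score, beam_width=3, depth=2):
    """Keep only the top-`beam_width` prompts alive at each refinement round.

    expand(prompt) -> candidate refinements; score(prompt) -> a number.
    In the real pipeline, scoring comes from tournament ranking of the
    generated images, not from the prompt text itself.
    """
    beam = [seed]
    for _ in range(depth):
        candidates = [c for p in beam for c in expand(p)]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return beam

# Toy stand-ins: refine by appending style modifiers, score by prompt length.
modifiers = ["cinematic lighting", "wide angle", "muted palette"]
expand = lambda p: [f"{p}, {m}" for m in modifiers]
score = len

best = beam_search_prompts("a fox in snow", expand, score,
                           beam_width=2, depth=2)
```

The point of the beam (versus greedy refinement) is that a prompt that ranks second in one round can still win after the next round's expansion.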
Story Time
A web-based tool for writers to expand, refine, revise, and restructure narratives using locally-hosted language models.
Why and what: This is my most nascent project. I used to write terrible poetry and marginally better short stories. As a reader, I’m intrigued by writers’ processes of developing a narrative. Playing around with online and local LLMs, I noticed that a single prompt and response, even when chained (e.g., pick up this story from where we left off), usually loses the narrative. So I started playing around with “outside-in” outline expansion. That is, start with the top-level narrative points, then have an LLM expand each of those into an inner set of bullets; for each of those, expand into a more detailed set (keeping track of the next major point), and so on. That seemed to work a bit better. Not satisfied, and inspired by a conversation with a friend about agent swarms, I realized it might be fun to have agents play characters, with an orchestrator who knows the overall narrative mission, where each agent has access only to a limited context of the world, stored in something like ChromaDB. Very much a WIP.
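The outside-in expansion is naturally recursive. A minimal sketch with the LLM call stubbed out—`expand_point` is a hypothetical callback, not the project's actual API, and the "upcoming points" argument stands in for keeping track of the next major beat:

```python
def expand_outline(points, expand_point, depth):
    """Recursively expand narrative points from the outside in.

    expand_point(point, upcoming) -> sub-points for `point`; `upcoming`
    carries the sibling points still ahead, so a real LLM call could steer
    each expansion toward the next major beat instead of losing the thread.
    """
    if depth == 0:
        return [{"point": p, "children": []} for p in points]
    tree = []
    for i, p in enumerate(points):
        children = expand_point(p, points[i + 1:])
        tree.append({"point": p,
                     "children": expand_outline(children, expand_point,
                                                depth - 1)})
    return tree

# Toy stand-in for the LLM: split each point into two sub-beats.
stub = lambda point, upcoming: [f"{point} / beat 1", f"{point} / beat 2"]
outline = expand_outline(["setup", "confrontation", "resolution"],
                         stub, depth=2)
```

Each pass widens the tree one level while the top-level arc stays fixed, which is what a single chained prompt tends to lose.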