A running list of things I build for myself at home. They are not products and nothing here is for sale. They started as separate experiments and have turned into one small system I use every day, so I will start with the two pages where I actually meet it, then work back through the parts behind them.
Dashboards
Two pages I open every day
Almost everything below funnels into two screens. The first is a morning brief I read with coffee. It composes the day's training call, my schedule, a news digest, and the weather into one page, so I am not opening five apps before I am awake. A line of it might read: "Lift day, push focus, 40 minutes. Three things on the calendar, first at nine. Two items worth reading. 58 and clear."
The second is a fitness page, the workout selector's home, where the day's recommendation sits next to the reasoning that produced it, so I can argue with it instead of just following it.
Building the parts separately and composing them late turned out to be the right order. Each system stays small, testable, and replaceable on its own. The two pages hold no logic of their own; they just read from the parts and arrange them, so when I want to change what my morning looks like, I change the arrangement, not the machinery. Here are the parts.
Recommendations that have to earn it
Learns from passive behavior
Built with Python and SQLite, reading from Plex, Sonarr, TMDb, and Tautulli, with Claude doing the scoring.
Most recommenders treat every suggestion as a one-shot guess. I wanted one that earns trust over time, and only acts on its own once it has proven its picks were worth my time.
How it works
The taste profile is my existing library, which already encodes years of decisions about what is worth keeping. Candidates come from TMDb, a far larger catalog than I would ever browse. Claude scores each candidate against the profile from 1 to 10 and writes a one-line reason.
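A 1-to-10 score and a one-line reason are only useful to the rest of the pipeline if the reply is checked before it is trusted. Here is a minimal sketch of that check. It assumes the prompt asks Claude to answer in JSON with `score` and `reason` keys. `ScoredCandidate` and `parse_score` are hypothetical names, and the API call itself is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class ScoredCandidate:
    title: str
    score: int   # 1 to 10, as in the text
    reason: str  # the one-line justification


def parse_score(title: str, reply: str) -> ScoredCandidate:
    """Validate a model reply shaped like {"score": 8, "reason": "..."}.

    The JSON shape is a hypothetical contract, not the real prompt. Anything
    malformed or out of range raises instead of entering the candidate pool.
    """
    data = json.loads(reply)
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")
    lines = str(data["reason"]).strip().splitlines()
    return ScoredCandidate(title, score, lines[0] if lines else "")
```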
What it does with the score is the real design. Each source of recommendations climbs a trust ladder, from trial to active to monitoring, and only earns the right to auto-add to my library once its earlier picks have actually been watched to the end. The completion signal comes from Tautulli, which tracks what actually plays on Plex, not from what I added to a list and never opened, because adding is aspirational and finishing is honest.
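The ladder is essentially a small state machine kept per source. This minimal sketch rests on several assumptions. A source is promoted after a hypothetical minimum of five picks with at least 60% watched to the end. Auto-add is allowed once a source has left trial. The watched-to-the-end flag has already been read from Tautulli's history, and that lookup is not shown. `Source`, `MIN_PICKS`, and `MIN_COMPLETION` are illustrative names.

```python
from dataclasses import dataclass

LADDER = ["trial", "active", "monitoring"]  # rungs, in the order the text gives

MIN_PICKS = 5         # hypothetical: evidence needed before a promotion
MIN_COMPLETION = 0.6  # hypothetical: share of picks watched to the end


@dataclass
class Source:
    name: str
    stage: str = "trial"
    picks: int = 0     # titles from this source that made it into the library
    finished: int = 0  # of those, watched to the end (read from Tautulli history)

    def record(self, watched_to_end: bool) -> None:
        self.picks += 1
        if watched_to_end:
            self.finished += 1
        self._maybe_promote()

    def _maybe_promote(self) -> None:
        rung = LADDER.index(self.stage)
        if rung == len(LADDER) - 1 or self.picks < MIN_PICKS:
            return
        if self.finished / self.picks >= MIN_COMPLETION:
            self.stage = LADDER[rung + 1]
            self.picks = self.finished = 0  # the next rung is earned on fresh evidence

    @property
    def may_auto_add(self) -> bool:
        # Assumption: a source acts on its own once it has left trial.
        return self.stage != "trial"
```

Resetting the counters on each promotion is a choice made in this sketch. It means a source has to keep earning its rung rather than coasting on an early streak.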
Example
A scored candidate comes back like this: "Slow Horses, 8 of 10. Matches your pull toward dry, low-glamour spy drama; nearest neighbor in your library is The Day of the Jackal. Holding on trial until something of its kind gets watched."
A score is a guess about one title. A ladder is a judgment about a source over time, and that is the more honest thing to let a machine act on by itself.
Teaching it by correcting it
Learns from human correction
Built with Python and SQLite, one Haiku call per new event, learning from a table of my own corrections.
Every calendar has events that mean something a machine cannot read off the title. I built a layer that turns each new event into structured meaning, and the interesting part is not the first guess but what happens when the guess is wrong.
How it works
Each new event gets one Haiku call that returns structured fields plus a confidence score between 0 and 1, and that score does real work. Above 0.85 it resolves on its own. Between 0.6 and 0.85 it waits for me to confirm or fix. Below 0.6 it is still saved and flagged, never dropped, because a wrong guess I can correct beats an event I never see.
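The three confidence bands map directly onto a routing function. In this sketch the enum names are hypothetical. The text leaves open how a score of exactly 0.85 or 0.6 is treated, so the boundary handling here is an assumption.

```python
from enum import Enum


class Route(Enum):
    AUTO = "resolved on its own"
    CONFIRM = "waiting for me to confirm or fix"
    FLAGGED = "saved and flagged"


def route(confidence: float) -> Route:
    """Map a 0-1 confidence score to one of the three outcomes in the text."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.85:
        return Route.AUTO
    if confidence >= 0.6:  # assumption: both boundaries fall in the middle band
        return Route.CONFIRM
    return Route.FLAGGED   # never dropped: the event is still stored
```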
When I do fix one, the correction is stored, and recent corrections get fed back into the next Haiku call as examples, so it adapts to my patterns rather than getting generically better. Recurring events inherit the resolution through a stable identifier on the series, so the model runs once per series instead of once per occurrence.
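Both halves of the feedback loop fit in two small tables: stored corrections become examples, and a series resolves once. This minimal sketch uses Python's built-in `sqlite3` with a hypothetical schema. `classify` stands in for the Haiku call, which is not shown, and the example format handed to it is invented.

```python
import json
import sqlite3
from typing import Callable

# Stand-in for the model call: (event title, few-shot examples) -> structured fields.
Classifier = Callable[[str, str], dict]


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE corrections (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            fields TEXT NOT NULL          -- the corrected fields, as JSON
        );
        CREATE TABLE series_resolution (
            series_id TEXT PRIMARY KEY,   -- stable id shared by every occurrence
            fields TEXT NOT NULL
        );
        """
    )
    return conn


def record_correction(conn: sqlite3.Connection, series_id: str, title: str, fields: dict) -> None:
    payload = json.dumps(fields, sort_keys=True)
    conn.execute("INSERT INTO corrections (title, fields) VALUES (?, ?)", (title, payload))
    # One fix covers every occurrence in the series.
    conn.execute(
        "INSERT OR REPLACE INTO series_resolution (series_id, fields) VALUES (?, ?)",
        (series_id, payload),
    )
    conn.commit()


def few_shot_block(conn: sqlite3.Connection, limit: int = 5) -> str:
    """The most recent corrections, formatted as examples for the next call."""
    rows = conn.execute(
        "SELECT title, fields FROM corrections ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return "\n".join(f"Event: {title}\nFields: {fields}" for title, fields in rows)


def resolve(conn: sqlite3.Connection, series_id: str, title: str, classify: Classifier) -> dict:
    row = conn.execute(
        "SELECT fields FROM series_resolution WHERE series_id = ?", (series_id,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # inherited from the series: no model call
    fields = classify(title, few_shot_block(conn))
    conn.execute(
        "INSERT INTO series_resolution (series_id, fields) VALUES (?, ?)",
        (series_id, json.dumps(fields, sort_keys=True)),
    )
    conn.commit()
    return fields
```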
Example
In practice a recurring event that keeps getting mistagged needs only one correction. The fix propagates to the whole series, and it joins the handful of examples the model sees on the next call, so that shape of event stops getting it wrong.
The cheapest way to make a model fit your world is to let it be wrong, correct it, and have it remember. It is the same human-in-the-loop pattern that makes AI trustworthy in work that actually matters.
Knowing your own normal
Learns from a personal baseline
Built with Python and SQLite, pulling from Apple Health, Strava, and Hevy.
Most health alerts fire on absolute thresholds, a number some population study decided was high or low. That is noisy, because my normal is not the population's normal. This system learns my baseline from my own recent data and only speaks up when a change is real.
How it works
For each metric it keeps a rolling 28-day window of my own history and asks whether today sits inside or outside my normal. The output is one of four tiers (normal, elevated, possibly fighting something, likely sick), so the rest of my setup can act on a single signal instead of a wall of numbers.
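One way to ask "inside or outside my normal" is a z-score against the trailing 28 days. This sketch makes three assumptions. Normal is defined by the mean and sample standard deviation. The cutoff is a hypothetical two standard deviations. The tier rises with the number of metrics outside normal, which is a made-up rule. Deviations are oriented so positive means worse, so for blood oxygen the caller would flip the sign.

```python
from statistics import mean, stdev

WINDOW_DAYS = 28
TIERS = ["normal", "elevated", "possibly fighting something", "likely sick"]


def deviation(history: list[float], today: float) -> float:
    """Standard deviations between today and my own trailing 28-day normal."""
    window = history[-WINDOW_DAYS:]
    if len(window) < 2:
        return 0.0  # too little history to judge
    spread = stdev(window)
    if spread == 0:
        return 0.0  # perfectly flat history: no spread to measure against
    return (today - mean(window)) / spread


def tier(deviations: dict[str, float], cutoff: float = 2.0) -> str:
    """Collapse per-metric deviations (positive = worse) into one signal."""
    outside = sum(1 for z in deviations.values() if z >= cutoff)
    return TIERS[min(outside, len(TIERS) - 1)]
```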
Most of the work was in the gates, not the model. A reading has to stay outside my normal range for about four hours before it counts, which kills single-bad-reading noise. An elevated heart rate right after a hard workout is explained away by that workout's own intensity score rather than read as illness. A drop in blood oxygen is the one signal allowed to escalate the tier quickly, because a real change there should not wait.
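The three gates read naturally as ordered checks. This minimal sketch assumes every reading passed in belongs to one metric and carries a deviation already oriented so positive is worse. The four-hour window comes from the text. The cutoff, the metric names, and the rule that recovery time is half an hour per point of workout intensity are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

PERSISTENCE = timedelta(hours=4)  # "about four hours" in the text


@dataclass
class Reading:
    metric: str    # e.g. "heart_rate" or "spo2" (illustrative names)
    at: datetime
    z: float       # deviation from my baseline, positive = worse


def off_long_enough(readings: list[Reading], cutoff: float) -> bool:
    """True if the latest run of off-normal readings has lasted four hours."""
    ordered = sorted(readings, key=lambda r: r.at)
    streak_start: Optional[datetime] = None
    for r in ordered:
        if r.z >= cutoff:
            if streak_start is None:
                streak_start = r.at
        else:
            streak_start = None  # back inside normal: the streak resets
    return streak_start is not None and ordered[-1].at - streak_start >= PERSISTENCE


def counts(
    readings: list[Reading],
    workout_end: Optional[datetime] = None,
    intensity: float = 0.0,
    cutoff: float = 2.0,
) -> bool:
    """Decide whether a run of readings for one metric should move the tier."""
    if not readings:
        return False
    latest = max(readings, key=lambda r: r.at)
    if latest.metric == "spo2":
        return latest.z >= cutoff  # fast path: a real oxygen drop does not wait
    if latest.metric == "heart_rate" and workout_end is not None:
        recovery = timedelta(hours=0.5 * intensity)  # harder session, longer excuse
        if timedelta(0) <= latest.at - workout_end <= recovery:
            return False  # explained by the workout, not read as illness
    return off_long_enough(readings, cutoff)
```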
Example
On a normal day it just says normal and stays quiet. After two short nights with a resting heart rate drifting up, it moves to "possibly fighting something" and quietly tells the workout selector to ease off before I have noticed anything myself.
Personal baselines beat population thresholds for anything that varies person to person, and the gate that turns a reading into an alert matters more than the reading.
The workout that fits the day
Reasons across the whole picture
Built with Python and SQLite, one Claude call a day over a packet the system assembles first.
Deciding what to train on a given day is a negotiation, not a question. I have running goals and lifting goals competing for the same recovery, and some days the honest answer is to rest. The inputs that should drive that call are scattered across everything else here.
How it works
Before any model runs, plain queries assemble a tight packet: the last seven days of training summarized, an acute-versus-chronic load read, the current health tier from the system above, the day's weather, the equipment on hand, and how far I am from each goal. That packet, not raw data, is all Claude sees. It is small and factual, which keeps the reasoning grounded and the cost near nothing.
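Assembling the packet is ordinary querying. This minimal sketch assumes a hypothetical `workouts` table with `day`, `kind`, and `minutes` columns. It also assumes the other inputs (load ratio, health tier, weather, equipment, goal gaps) arrive already computed. The key names in the packet are invented.

```python
import json
import sqlite3
from datetime import date


def last_seven_days(conn: sqlite3.Connection, today: date) -> dict[str, dict[str, int]]:
    """Sessions and minutes per kind of training, over the seven days ending today."""
    rows = conn.execute(
        """
        SELECT kind, COUNT(*), SUM(minutes)
        FROM workouts
        WHERE day > date(?, '-7 days') AND day <= ?
        GROUP BY kind
        ORDER BY kind
        """,
        (today.isoformat(), today.isoformat()),
    ).fetchall()
    return {kind: {"sessions": n, "minutes": total} for kind, n, total in rows}


def build_packet(
    conn: sqlite3.Connection,
    today: date,
    *,
    load_ratio: float,
    health_tier: str,
    weather: str,
    equipment: list[str],
    goal_gaps: dict[str, float],
) -> str:
    packet = {
        "date": today.isoformat(),
        "last_7_days": last_seven_days(conn, today),
        "acute_chronic_ratio": round(load_ratio, 2),
        "health_tier": health_tier,
        "weather": weather,
        "equipment": equipment,
        "goal_gaps": goal_gaps,
    }
    # Compact JSON: this string, not the raw tables, is everything the model sees.
    return json.dumps(packet, separators=(",", ":"))
```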
The two-level fatigue read is the input I lean on most. Acute load catches the cooked-today case. Chronic load catches the opposite failure, ramping up too fast or letting fitness slide, which a single day never shows. Competing goals are handed over as inputs, not rules, so a goal can bend around a bad week instead of a rule breaking the first time I am sick.
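The acute-versus-chronic read can be as simple as two averages and their ratio. This sketch assumes per-day load numbers already exist; minutes times perceived effort is one common convention, not necessarily the one used here. It uses a 7-day acute window and a 28-day chronic window. The bands loosely follow the 0.8 to 1.3 range often cited in sports science, and the thresholds are assumptions.

```python
def acute_chronic(daily_load: list[float]) -> tuple[float, float, float]:
    """Return (acute, chronic, ratio) from per-day load, oldest day first.

    Acute is the mean daily load over the last 7 days, chronic the mean over
    the last 28, so both are on the same per-day scale.
    """
    if len(daily_load) < 28:
        raise ValueError("need at least 28 days of load history")
    acute = sum(daily_load[-7:]) / 7
    chronic = sum(daily_load[-28:]) / 28
    ratio = acute / chronic if chronic > 0 else 0.0
    return acute, chronic, ratio


def read_load(ratio: float, low: float = 0.8, high: float = 1.3) -> str:
    """Turn the ratio into words the packet can carry (bands are assumptions)."""
    if ratio > high:
        return "ramping up too fast"
    if ratio < low:
        return "fitness sliding"
    return "steady"
```

A week of hard sessions on top of a quiet month pushes the ratio up. A quiet week after a busy month pulls it down. Neither shows up in any single day's number.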
Example
A morning reads: "Lift, push focus, 40 minutes. You ran hard twice this week and your legs need a day. Rain clears tonight, so save the easy run for tomorrow, when the half-marathon block actually needs it."
The hard part is never a single signal; it is holding them against each other. A formula adds numbers up. Deciding that tired legs and a wet forecast outweigh a plan that says run today is judgment, and that is the part worth handing to a model.
A morning brief that filters itself
Reads the firehose so I don't have to
Built with Python, FastAPI, and SQLite, with two Claude models doing different jobs.
I follow six legal and AI sources. On a busy week that is well over a hundred headlines, and most of them do not matter to me in particular. So a pipeline reads them every couple of hours, scores each one against my actual work, and leaves a few lines on my morning page.
How it works
Plain string filters drop the obvious junk before any model runs: sponsored posts, duplicates, and anything published before the last fetch. Noticing an ad is not worth a model call.
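The pre-filter needs no model at all. This sketch assumes items carry a title, URL, and publish time. The sponsored-marker list and the title normalization used for duplicate detection are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical markers; a real list would come from what the sources actually use.
SPONSORED_MARKERS = ("sponsored", "partner content", "advertisement")


@dataclass
class Item:
    title: str
    url: str
    published: datetime


def _title_key(title: str) -> str:
    # Lowercase and collapse whitespace so trivial variants count as duplicates.
    return " ".join(title.lower().split())


def prefilter(items: list[Item], last_fetch: datetime, seen: set[str]) -> list[Item]:
    """Drop sponsored posts, duplicates, and stale items without any model call.

    `seen` holds URLs and title keys from earlier runs and is updated in place.
    """
    kept = []
    for item in items:
        key = _title_key(item.title)
        if any(marker in key for marker in SPONSORED_MARKERS):
            continue
        if item.published < last_fetch:
            continue  # published before the last fetch
        if item.url in seen or key in seen:
            continue  # duplicate of something already seen
        seen.update((item.url, key))
        kept.append(item)
    return kept
```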
What survives gets scored by Haiku, the small fast model, on two axes from 1 to 10. Novelty is measured against what I have already seen this week, so a rewrite of yesterday's story drops out. Relevance is measured against me in particular: my work running knowledge and innovation at a law firm, the legal industry I sit in, the peer firms I keep an eye on, and the specific products, vendors, and models my teams might actually touch. A generic AI headline scores middling; the same development attached to a tool we use, or a move by a firm in our space, scores high. Once a day Sonnet, the slower model, writes a two- or three-line brief over the dozen or so top-scoring items; that is the only expensive call in the whole thing.
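Once Haiku has returned both scores, choosing what Sonnet sees is a sort. This sketch covers only that selection step; the model calls are not shown. It assumes a hypothetical floor of 4 on both axes and ranks by the sum, with relevance breaking ties. The text says only "top-scoring", so the combination rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    headline: str
    novelty: int    # 1-10, against what has already been seen this week
    relevance: int  # 1-10, against my work in particular


def top_for_brief(scored: list[Scores], k: int = 12, floor: int = 4) -> list[Scores]:
    """Pick what the once-a-day synthesis sees.

    Hypothetical rule: an item must clear a floor on both axes, so a novel but
    irrelevant story cannot ride in on novelty alone, then rank by the sum.
    """
    eligible = [s for s in scored if s.novelty >= floor and s.relevance >= floor]
    eligible.sort(key=lambda s: (s.novelty + s.relevance, s.relevance), reverse=True)
    return eligible[:k]
```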
Those relevance rules live in a config file, not in the model and not in code. The file is where my world is written down: the topics, the peer firms, the vendors and models worth flagging. That is also what makes it easy to retune, since adding a source or changing what counts as signal is a one-line edit, not a deploy.
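Keeping the world in a file means the scoring prompt is assembled from data. This sketch assumes JSON, since the actual format is not stated, and uses placeholder entries. The key names and `relevance_context` are hypothetical.

```python
import json

# Placeholder entries; the real file lists actual sources, firms, and vendors.
EXAMPLE_CONFIG = """
{
  "sources": ["https://example.com/legal-ai/feed.xml"],
  "topics": ["knowledge management at law firms", "GenAI policy"],
  "peer_firms": ["Peer Firm A", "Peer Firm B"],
  "vendors_and_models": ["Vendor C", "Model D"]
}
"""

SECTIONS = [
    ("topics", "Topics I care about"),
    ("peer_firms", "Peer firms to watch"),
    ("vendors_and_models", "Vendors and models my teams might touch"),
]


def relevance_context(config: dict) -> str:
    """Render the 'who I am' block that goes into every scoring prompt."""
    lines = []
    for key, label in SECTIONS:
        entries = config.get(key, [])
        if entries:
            lines.append(f"{label}: {', '.join(entries)}")
    return "\n".join(lines)
```

With this setup, adding a peer firm means adding one more string to a list, with no change to the code.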
Example
A morning line might read: "Two firms published client-facing GenAI policies this week; one ties model use to its existing outside-counsel guidelines, which is the angle worth watching." That is the entire interaction. No feed, no inbox, no skimming.
The whole pipeline costs a couple of dollars a month. The pattern under it is the one I reuse everywhere: cheap code throws out the obvious, a cheap model sorts at volume, and the expensive model is spent only on the final synthesis.