I build production LLM systems — and small tools for my own life.
Small tools I build for my own life. Each one started because something annoyed me enough to write code about it.
Schema migrations have versioning, rollback, and a paper trail. Most prompts in production don't. What changed for me when I started treating them the same way, and where it broke first.
Halfway into a thesis benchmark on 68k bank transactions, the classical baseline is uncomfortably close to BETO. The catch isn't accuracy. It's what you pay per inference, multiplied by traffic you actually have.
Infrastructure traces tell you the request landed. Quality metrics tell you the answer was wrong. Until both share an ID, you're guessing. A pattern set from running this on call, and the one thing I'd build first if I had to start over.
"Tool calling with extra steps" is the wrong framing. The shift is smaller and weirder: the editor reads our Jira, our repo, our traces, and starts answering the questions I used to walk over to a teammate for.
I'm a Senior AI Platform Engineer in El Salvador. I work from the same desk where I started in cloud infrastructure eight years ago. The systems have grown, the teams behind them too — but the principles haven't. Only the layer I apply them to.
I care about systems you can operate, not just demo. About prompts that fail loudly. About observability that survives the third on-call rotation. About cost lines that don't blow up the first time a feature gets traction.
Outside of paid work, I build small tools for my own life — usually with the same disciplines, because I don't know how to build software any other way.