The term "slop" is rapidly entering the conventional English lexicon. In fact, it was Merriam Webster's Word Of the Year for 2025. It refers to low-quality media generated by an LLM. It can be applied to images, text, music, or most relevantly to us, code.
As an open-source maintainer, I frequently both receive and create pull requests that could be labeled as slop. One of my principal jobs is to stand at the gate of the Harper repository and say "thou shalt not pass" to any low-quality or buggy code that wishes to enter. So my question becomes: how can I turn AI-generated code from slop-tier to top-tier?
There are a gazillion guides out there about the various techniques you can use to improve the quality of AI generated code. Those techniques will change, and I'm sure my readership already knows all about them. Instead, I want to talk about some more generally applicable ideas that I've found especially useful in the age of AI. I'm something of a contrarian, so expect some dishing on conventional "vibe coders".
I am often sent stories of developers discovering Claude Code for the first time and using it to build some simple CRUD app from scratch, without ever reading or writing a line of code themselves. Those are impressive stories, and they're a sign of the amazing progress that the nation's frontier AI labs have made in the last few months. They are not, however, examples of how an open-source maintainer should operate.
As I said before, it's critical that maintainers act as a gate, deciding which code enters the repository (and is thus ultimately delivered to users' devices). In order to do that, they need to have a good understanding of how the code already works. Then, and only then, should they allow modifications to it.
This should be pretty obvious advice to most in our industry, but evidently it isn't. I've seen several instances recently of developers vibe-coding a new feature without any foundational understanding of how the original code worked. The result: the new feature works well enough, but its implementation breaks some other part of the application. That's not to mention that it increases the app's overall complexity far more than necessary.
Having a foundational understanding from the get-go is an easy way to prevent this kind of tragic outcome.
LLMs are not currently capable enough to recognize when a pattern in the code they're writing already exists in the codebase.
I've tried pushing them in the right direction with skills and with an AGENTS.md, and they'll pick it up given enough massaging, but the fact is that they still need to be poked.
If I didn't have a good knowledge of my project's internals, I wouldn't know to poke them at all.
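For a sense of what that poking looks like, here's a sketch of the kind of AGENTS.md entry I mean. This is a hypothetical excerpt, not Harper's actual file:

```markdown
<!-- Hypothetical excerpt; not Harper's real AGENTS.md. -->

## Before writing new code

- Search the codebase for an existing implementation first. Most of the
  patterns you need probably already exist somewhere.
- New lint rules should mirror the structure of an existing rule. Read one
  or two before starting.
- If you find yourself duplicating a helper, stop and flag it rather than
  pressing on.
```

Even with a file like this in place, the model still needs the occasional mid-session reminder. It reduces the poking; it doesn't eliminate it.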
Understand your project deeply. Understand its code, its values, and its purpose.
Now that you're an expert on how your project works, you're ready to start reviewing AI-written pull requests. Here's the thing you need to remember: you have better taste than the clanker does. You ultimately get to decide what code gets merged. What should that code look like? How should it be tested? Your answers to those questions should guide you, whether you're prompting a model to revise its work or leaving feedback on a pull request opened by a human.
Here's what your prompt should not look like:
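> I don't like that. Try again.

> This is wrong. Fix it.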
Obviously, you wouldn't want feedback like that if you were the one writing the code. If you're looking at a patch, whoever wrote it did the best job they could. Whether it was an LLM or a human doesn't matter. If the code isn't up to your refined taste as a human, you need to give them a nudge in the right direction. Here are some better versions of the prompts from before:
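> I don't like that this duplicates pattern-matching logic that already exists in the codebase. Find the existing implementation and reuse it.

> This is wrong for empty input. Add a test covering that case, then make it pass.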
For those unaware, you should never say "I don't like that" to a human about their work, either.
It's easy to get the impression that with enough tokens, anything is possible. Maybe that will be true in the future, but I don't think that's true today.
LLMs are limited in intelligence and experience. Even when provided access to the best coding setups in the world, they are still incapable of shipping many (most?) features or fixing many (most?) bugs. The fact that I have to include question marks is a testament to how far we've come, but it's important not to overstate things. If you're reading this, you're pretty smart. It's fine to let the model try to solve a problem for you once or twice, but if it still fails, you'll need to get in there and do it yourself. Don't be afraid to get your hands dirty.
Failing to account for this reality can slow down development and dissuade contributors from sticking around.

This post was proofread by Harper.