Thoughts on slowing the fuck down
2026-03-25
That turtle's expression is how I look at our industry
It's been about a year since coding agents that could actually build you full projects appeared on the scene. There were precursors like Aider and early Cursor, but they were more assistant than agent. The new generation is enticing, and a lot of us have spent a lot of free time building all the projects we always wanted to build but never had time for.
And I think that's fine. Spending your free time building things is super enjoyable, and most of the time you don't really have to care about code quality and maintainability. It also gives you a way to learn a new tech stack if you so want.
During the Christmas break, both Anthropic and OpenAI handed out some freebies to hook people to their addictive slot machines. For many, it was the first time they experienced the magic of agentic coding. The fold's getting bigger.
Coding agents are now also being introduced to production codebases. After 12 months, we're beginning to see the effects of all that "progress". Here's my current view.
Everything is broken
While all of this is anecdotal, it sure feels like software has become a brittle mess, with 98% uptime becoming the norm instead of the exception, including for big services. And user interfaces have the weirdest fucking bugs that you'd think a QA team would catch. I'll give you that that's been the case for longer than agents have existed. But we seem to be accelerating.
We don't have access to the internals of companies. But every now and then something slips through to some news reporter. Like this supposed AI-caused outage at AWS. Which AWS immediately "corrected". Only to then follow up internally with a 90-day reset.
Satya Nadella, the CEO of Microsoft, has been going on about how much code is now being written by AI at Microsoft. While we don't have direct evidence, there sure is a feeling that Windows is going down the shitter. Microsoft itself seems to agree, based on this fine blog post.
Companies claiming 100% of their product's code is now written by AI consistently put out the worst garbage you can imagine. Not pointing fingers, but memory leaks in the gigabytes, UI glitches, broken-ass features, crashes: that is not the seal of quality they think it is. And it's definitely not good advertising for the fever dream of having your agents do all the work for you.
Through the grapevine you hear more and more people, from software companies small and large, saying they have agentically coded themselves into a corner. No code review, design decisions delegated to the agent, a gazillion features nobody asked for. That'll do it.
How we should not work with agents and why
We have basically given up all discipline and agency for a sort of addiction, where your highest goal is to produce the largest amount of code in the shortest amount of time. Consequences be damned.
You're building an orchestration layer to command an army of autonomous agents. You installed Beads, completely oblivious to the fact that it's basically uninstallable malware. The internet told you to. That's how you should work or you're ngmi. You're ralphing the loop. Look, Anthropic built a C compiler with an agent swarm. It's kind of broken, but surely the next generation of LLMs can fix it. Oh my god, Cursor built a browser with a battalion of agents. Yes, of course, it's not really working and it needed a human to spin the wheel a little bit every now and then. But surely the next generation of LLMs will fix it. Pinky promise! Distribute, divide and conquer, autonomy, dark factories, software is solved in the next 6 months. SaaS is dead, my grandma just had her Claw build her own Shopify!
Now again, this can work for your side project barely anyone is using, including yourself. And hey, maybe there's somebody out there who can actually make this work for a software product that's not a steaming pile of garbage and is used by actual humans in anger.
If that's you, more power to you. But at least among my circle of peers I have yet to find evidence that this kind of shit works. Maybe we all have skill issues.
Compounding booboos with zero learning, no bottlenecks, and delayed pain
The problem with agents is that they make errors. Which is fine, humans also make errors. Maybe they are just correctness errors. Easy to identify and fix. Add a regression test on top for bonus points. Or maybe it's a code smell your linter doesn't catch. A useless method here, a type that doesn't make sense, duplicated code over there. On their own, these are harmless. A human will also do such booboos.
But clankers aren't humans. A human makes the same error a few times. Eventually they learn not to make it again. Either because someone starts screaming at them or because they're on a genuine learning path.
An agent has no such learning ability. At least not out of the box. It will continue making the same errors over and over again. Depending on the training data it might also come up with glorious new interpolations of different errors.
Now you can try to teach your agent. Tell it to not make that booboo again in your AGENTS.md. Concoct the most complex memory system and have it look up previous errors and best practices. And that can be effective for a specific category of errors. But it also requires you to actually observe the agent making that error.
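To make that concrete: teaching the agent mostly means writing rules down where it will read them. A minimal sketch of the kind of AGENTS.md entries I mean; the rules, paths, and commands are made-up examples for illustration, not a standard:

```markdown
## Rules learned from past booboos
- Do not write new HTTP client wrappers. Reuse `src/net/http_client.py`.
- Run `make lint test` before declaring any task done.
- No new dependencies without asking a human first.
- If a change touches more than ~10 files, stop and ask.
```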
There's a much more important difference between clanker and human. A human is a bottleneck. A human cannot shit out 20,000 lines of code in a few hours. Even if the human creates such booboos at high frequency, there's only so many booboos the human can introduce in a codebase per day. The booboos will compound at a very slow rate. Usually, if the booboo pain gets too big, the human, who hates pain, will spend some time fixing up the booboos. Or the human gets fired and someone else fixes up the booboos. So the pain goes away.
With an orchestrated army of agents, there is no bottleneck, no human pain. These tiny little harmless booboos suddenly compound at a rate that's unsustainable. You have removed yourself from the loop, so you don't even know that all the innocent booboos have formed a monster of a codebase. You only feel the pain when it's too late.
Then one day you turn around and want to add a new feature. But the architecture, which is largely booboos at this point, doesn't allow your army of agents to make the change in a functioning way. Or your users are screaming at you because something in the latest release broke and deleted some user data.
You realize you can no longer trust the codebase. Worse, you realize that the gazillions of unit, snapshot, and e2e tests you had your clankers write are equally untrustworthy. The only thing that's still a reliable measure of "does this work" is manually testing the product. Congrats, you fucked yourself (and your company).
Merchants of learned complexity
You have zero fucking idea what's going on because you delegated all your agency to your agents. You let them run free, and they are merchants of complexity. They have seen many bad architectural decisions in their training data and throughout their RL training. You have told them to architect your application. Guess what the result is?
An immense amount of complexity, an amalgam of terrible cargo cult "industry best practices", that you didn't rein in before it was too late. But it's worse than that.
Your agents never see each other's runs, never get to see all of your codebase, never get to see all the decisions that were made by you or other agents before they make a change. As such, an agent's decisions are always local, which leads to the exact booboos described above. Immense amounts of code duplication, abstractions for abstractions' sake.
All of this compounds into an unrecoverable mess of complexity. The exact same mess you find in human-made enterprise codebases. Those arrive at that state because the pain is distributed over a massive amount of people. The individual suffering doesn't pass the threshold of "I need to fix this". The individual might not even have the means to fix things. And organizations have super high pain tolerance. But human-made enterprise codebases take years to get there. The organization slowly evolves along with the complexity in a demented kind of synergy and learns how to deal with it.
With agents and a team of 2 humans, you can get to that complexity within weeks.
Agentic search has low recall
So now you hope your agents can fix the mess, refactor it, make it pristine. But your agents can also no longer deal with it. Because the codebase and complexity are too big, and they only ever have a local view of the mess.
And I'm not just talking about context window size or long-context attention mechanisms failing at the sight of a million-line monster of a codebase. Those are obvious technical limitations. It's more devious than that.
Before your agent can try and help fix the mess, it needs to find all the code that needs changing and all existing code it can reuse. We call that agentic search. How the agent does that depends on the tools it has. You can give it a Bash tool so it can ripgrep its way through the codebase. You can give it some queryable codebase index, an LSP server, a vector database. In the end it doesn't matter much. The bigger the codebase, the lower the recall. Low recall means that your agent will, in fact, not find all the code it needs to do a good job.
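To make the recall problem concrete, here's a toy sketch of what agentic search often boils down to: a keyword hunt. The repo layout and the missed duplicate are hypothetical:

```python
from pathlib import Path

def agentic_grep(root: str, needle: str) -> list[str]:
    """Naive keyword search, roughly what an agent does when it
    drives ripgrep through a Bash tool."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if needle in line:
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits

# The agent searches for existing retry logic before writing its own.
# It finds src/util/retry.py, but misses src/net/backoff.py, where an
# earlier run implemented the same thing under the name "reattempt".
# Precision is high; recall shrinks with every synonym, wrapper, and
# copy-paste variant the codebase accumulates.
print(agentic_grep("src", "retry"))
```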
This is also why those code smell booboos happen in the first place. The agent misses existing code, duplicates things, introduces inconsistencies. And then they blossom into a beautiful shit flower of complexity.
How do we avoid all of this?
How we should work with agents (for now, I think)
Coding agents are sirens, luring you in with their speed of code generation and jagged intelligence, often completing a simple task with high quality at breakneck velocity. Things start falling apart when you think: "Oh golly, this thing is great. Computer, do my work!".
There's nothing wrong with delegating tasks to agents, obviously. Good agent tasks share a few properties: they can be scoped so the agent doesn't need to understand the full system. The loop can be closed, that is, the agent has a way to evaluate its own work. The output isn't mission critical, just some ad hoc tool or internal piece of software nobody's life or revenue depends on. Or you just need a rubber duck to bounce ideas against, which basically means bouncing your idea against the compressed wisdom of the internet and synthetic training data. If any of that applies, you found the perfect task for the agent, provided that you as the human are the final quality gate.
Karpathy's auto-research applied to speeding up the startup time of your app? Great! As long as you understand that the code it spits out is not production-ready at all. Auto-research works because you give it an evaluation function that lets the agent measure its work against some metric, like startup time or loss. But that evaluation function only captures a very narrow metric. The agent will happily ignore any metrics not captured by the evaluation function, such as code quality, complexity, or even correctness, if your evaluation function is foobar.
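A minimal sketch of what such an evaluation function looks like, assuming startup time is the metric; the surrounding propose-a-patch-and-keep-the-best loop is where the agent comes in:

```python
import subprocess
import time

def evaluate_startup(cmd: list[str], runs: int = 3) -> float:
    """Score a candidate build by mean wall-clock startup time.
    This number is the ONLY thing the optimization loop sees."""
    total = 0.0
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(cmd, check=True, capture_output=True)
        total += time.monotonic() - start
    return total / runs

# Anything not in this score is invisible: readability, architecture,
# even correctness beyond "exits with status 0" can be traded away
# for a lower number, and the loop will call it progress.
print(f"{evaluate_startup(['python3', '-c', 'pass']):.3f}s")
```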
The point is: let the agent do the boring stuff, the stuff that won't teach you anything new, or try out different things you'd otherwise not have time for. Then you evaluate what it came up with, take the ideas that are actually reasonable and correct, and finalize the implementation. Yes, sure, you can also use an agent for that final step.
And I would like to suggest that slowing the fuck down is the way to go. Give yourself time to think about what you're actually building and why. Give yourself an opportunity to say, fuck no, we don't need this. Set yourself limits on how much code you let the clanker generate per day, in line with your ability to actually review the code.
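Such a limit doesn't need fancy tooling either. If you want a guardrail, a sketch of a daily line-budget check against git is enough; the budget number is a placeholder you'd pick based on your own review capacity:

```python
import subprocess

def lines_added_today() -> int:
    """Sum lines added across today's commits, via git's numstat output."""
    out = subprocess.run(
        ["git", "log", "--since=midnight", "--numstat", "--pretty=tformat:"],
        capture_output=True, text=True, check=True,
    ).stdout
    added = 0
    for line in out.splitlines():
        parts = line.split("\t")  # numstat lines: "added<TAB>deleted<TAB>path"
        if len(parts) == 3 and parts[0].isdigit():  # binary files show "-"
            added += int(parts[0])
    return added

BUDGET = 500  # placeholder: whatever you can actually review in a day
if lines_added_today() > BUDGET:
    raise SystemExit("Daily budget exceeded. Stop generating, start reviewing.")
```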
Anything that defines the gestalt of your system, that is architecture, API, and so on, write it by hand. Maybe use tab completion for some nostalgic feels. Or do some pair programming with your agent. Be in the code. Because the simple act of having to write the thing or seeing it being built up step by step introduces friction that allows you to better understand what you want to build and how the system "feels". This is where your experience and taste come in, something the current SOTA models simply cannot yet replace. And slowing the fuck down and suffering some friction is what allows you to learn and grow.
The end result will be systems and codebases that continue to be maintainable, at least as maintainable as our old systems before agents. Yes, those were not perfect either. Your users will thank you, as your product now sparks joy instead of slop. You'll build fewer features, but the right ones. Learning to say no is a feature in itself.
You can sleep well knowing that you still have an idea what the fuck is going on, and that you have agency. Your understanding allows you to fix the recall problem of agentic search, leading to better clanker outputs that need less massaging. And if shit hits the fan, you are able to go in and fix it. Or if your initial design has been suboptimal, you understand why it's suboptimal, and how to refactor it into something better. With or without an agent, don't fucking care.
All of this requires discipline and agency.
All of this requires humans.