{"id":180,"date":"2026-05-09T15:29:51","date_gmt":"2026-05-09T15:29:51","guid":{"rendered":"https:\/\/blog.chataignon.org\/joseph\/?p=180"},"modified":"2026-05-09T15:29:51","modified_gmt":"2026-05-09T15:29:51","slug":"ai-failures-for-coding","status":"publish","type":"post","link":"https:\/\/blog.chataignon.org\/joseph\/post-180\/ai-failures-for-coding\/","title":{"rendered":"AI failures for coding"},"content":{"rendered":"\n<p>Just like my previous post on AI failures modes for research, this article is about the ways in which AI fails, this time on coding tasks. This is its biggest use on the market after all.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AI and code: a (short) history and big promises<\/h3>\n\n\n\n<p>I remember 4 years ago or so (it seems much longer already) when I saw the first demo of an AI (I think it was GPT-2) writing code, a simple web page based on a simple description. It was really impressive, a nice perspective on a world where I wouldn&rsquo;t need to write CSS ever again.<\/p>\n\n\n\n<p>A few months later, ChatGPT was released and was able to write simple functions. The complexity of the functions it could write kept rising from then until today. Developers&rsquo; tools adapted to the change. Early on, \u00ab\u00a0coding with AI\u00a0\u00bb meant copy-pasting code from a chat interface into an IDE. Extensions to reduce this friction appeared quickly (copilot, cline, continue.dev&#8230;); those would put the chat directly in your IDE and show you a diff of proposed changes that could be approved with a single click. <\/p>\n\n\n\n<p>But the final (well, for now) form of AI coding seems to be terminal-based <em>agents<\/em><sup data-fn=\"96c03ee3-f03a-419b-8544-44a14211d765\" class=\"fn\"><a href=\"#96c03ee3-f03a-419b-8544-44a14211d765\" id=\"96c03ee3-f03a-419b-8544-44a14211d765-link\">1<\/a><\/sup> that can access the system, use the command line, and interact with your computer in pretty much every way a developer can. The human becomes a superviser of AIs doing the coding instead of writing the code, and sometimes instead of understanding it altogether.<\/p>\n\n\n\n<p>The promise of the AI labs is that the complexity of coding tasks that AI can handle will keep rising until eventually, developers are not needed at all as intermediaries between non-technical people and computers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Stupid errors examples<\/h3>\n\n\n\n<p>As an interlude before listing the more general failure modes, here is a list of very concrete errors I encounter when coding with AI:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Erase comments from the code, unrelated to the current task<\/li>\n\n\n\n<li>Import new libraries for everything<\/li>\n\n\n\n<li>Get line numbers wrong:<br>    \u00ab\u00a0look at line 54\u00a0\u00bb for something that is at line 80.<\/li>\n\n\n\n<li>Sometimes lazy:<br>&nbsp;&nbsp;&nbsp;&nbsp; \u00ab\u00a0to implement this feature, just write a function that does this\u00a0\u00bb (this happens less and less)<\/li>\n\n\n\n<li>or for repetitive tasks:<br>&nbsp;&nbsp;&nbsp;&nbsp; \u00ab\u00a0here&rsquo;s the first one, find the others and do the same\u00a0\u00bb (ok this one is quite rare, but it did happen to me a few times)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">General failure modes<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">\u00ab\u00a0nine women can&rsquo;t make a baby in one month\u00a0\u00bb&nbsp;<\/h4>\n\n\n\n<p>\u00ab\u00a0Nine women can&rsquo;t make a baby in one month\u00a0\u00bb is a quote from<em> the mythical man-month<\/em>, a well-known classic of software engineering. It was used to illustrate the fact that when a project is late on its schedule, assigning more programmers to it usually slows down the project instead of accelerating it. Because the existing team has to train the new members, because the new members need to learn how the project works, and because all of them have to learn how each other works and how to communicate together. In addition, communication overhead increases exponentially when people are added to a team. Those team-integration tasks are more difficult than it seems, they significantly slow down teams, at least until the new members are really integrated in the team.<\/p>\n\n\n\n<p>Coding agents run into the same issues real developers do, but with additional problems inherent to their nature. Complex projects are difficult to handle, and getting all the right context to handle a particular issue is difficult. Various coding tools are starting to tackle this \u00ab\u00a0context engineering\u00a0\u00bb problem, finding the right pieces of code to inject in the prompt for the model to take the right design decisions. <\/p>\n\n\n\n<p>But mostly, coding agents are <strong>very bad at teamwork<\/strong>, and <strong>no major lab is working on that<\/strong>. Their communication style is carved in the underlying LLM weights. It can adapt to an interlocutor in the course of a conversation, but every new session resets that. They certainly don&rsquo;t adapt to a whole team because they are usually tied to an individual&rsquo;s machine\/acount.<\/p>\n\n\n\n<p>Why is that not dicussed more ? My guess is that those notions are somewhat fuzzy and difficult to define and evaluate. And companies focus on benchmarks (so, easyly-measurable metrics) that usually involve solving problems alone, not in collaboration with a team.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">An amnesic genius <\/h4>\n\n\n\n<p>The memory of coding agents is un-intuitively different from human memory. Their harness can help them remember word-for-word a conversation you had with them 3 months ago, but it usually doesn&rsquo;t fetch that conversation&rsquo;s memories by itself, or not in a reliable way. Currently a lot of work is focusing on precisely gathering the appropriate context for the task, but the focus is on code rather than past conversations.<\/p>\n\n\n\n<p>As a result, coding agents tend to follow the instructions given immediately before, but without taking into account the rest of the code properly. They&rsquo;re like a coding genius, able to one-shot build simple applications better than a human could, but an amnesic genius that forgot everything about you every time you start a new conversation. They stick to neither your working style nor your coding style. They try to execute your instructions in one shot but often break existing code in the process. The code they add is often not maintainable and ends up adding more work for developers in the long term.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">The new session trick<\/h4>\n\n\n\n<p>A frustrating way in which coding agents&rsquo; memory is not intuitive, is how their memory resets between sessions but that&rsquo;s sometimes an advantage. Sometimes, I spent hours on an issue with a coding agent without solving it; but after restarting a fresh conversation, the agent solved the problem immediately.<\/p>\n\n\n\n<p>This happens because of <em>Context momentum<\/em>: in long chats, the LLM can get stuck in wrong assumptions made at the start, and doesn&rsquo;t question them anymore. You can then spend hours trying to find an error where it isn&rsquo;t, and solve it immediately after starting a new session, because the wrong assumption is not in the agent&rsquo;s context anymore.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Chaotic-Neutral compliance<\/h4>\n\n\n\n<p>LLMs don&rsquo;t have feelings for you, the state of your project, or anything. They are trained to complete sentences, and not just any sentences: the instruction-tuning phase trains them to complete conversations between assistants and users. Therefore, they are very compliant.<\/p>\n\n\n\n<p>That is, they&rsquo;ll do exactly what you said even if it&rsquo;s so stupid that <em>what you said<\/em> is clearly not <em>what you want<\/em> (things like <code>rm -rf \/<\/code> and <code>rm -rf \/.<\/code>), and they usually won&rsquo;t warn you. Because their answers are not deterministic and because their training is not perfect, LLMs have an unequal performance: sometimes they are impressive, sometimes they do something stupid, but always trying to follow your instructions.<\/p>\n\n\n\n<p>That makes them unreliable to some degree. Many developers will use an AI that can reliably implement ideas of a certain complexity rather than one that can execute on much more difficult tasks but sometimes fails in an unforeseeable manner.<\/p>\n\n\n\n<p><\/p>\n\n\n<ol class=\"wp-block-footnotes\"><li id=\"96c03ee3-f03a-419b-8544-44a14211d765\">My philosopher friend insists that they aren&rsquo;t really agents and that the name is misused. <a href=\"#96c03ee3-f03a-419b-8544-44a14211d765-link\" aria-label=\"Aller \u00e0 la note de bas de page 1\">\u21a9\ufe0e<\/a><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>Just like my previous post on AI failures modes for research, this article is about the ways in which AI [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":"[{\"id\":\"96c03ee3-f03a-419b-8544-44a14211d765\",\"content\":\"My philosopher friend insists that they aren't really agents and that the name is misused.\"}]"},"categories":[1],"tags":[],"class_list":["post-180","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/posts\/180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/comments?post=180"}],"version-history":[{"count":8,"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/posts\/180\/revisions"}],"predecessor-version":[{"id":193,"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/posts\/180\/revisions\/193"}],"wp:attachment":[{"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/media?parent=180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/categories?post=180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.chataignon.org\/joseph\/wp-json\/wp\/v2\/tags?post=180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}