Like everyone else in 2025, I’ve been climbing the AI coding ladder: ChatGPT queries → Cursor autocomplete → Cursor agents → Claude Code → background agents. Claude Code (#4) has been my sweet spot for productivity, but it’s also given me some of the most entertaining war stories.
There have been a lot of these kinds of posts, but never too many; successful AI coding sessions are boringly similar, but each failure is spectacular in its own way (with apologies to Tolstoy). This series captures those moments when our AI coding assistants get… creative.
Today’s tale involves Claude implementing what I can only describe as the software equivalent of Volkswagen’s emissions scandal.
The Issue
The problem I encountered is pretty simple to explain.
I have a screen (in flutter) which has a bunch of elements at the top of the screen giving the summary of a workout. At the bottom of the screen are the details of the workout, including the steps of Warmup, Focus and Cooldown.
I also have Widget tests (Flutter’s automated UI tests) that verify the critical elements of the screen are present. One of the elements I’m asserting is the presence of the check mark at the top of the screen.
I can write the pseudocode for the check like this:
group('Plans Screen', () {
testWidgets('critical elements are present in cardio screen', (tester) async {
await setupPlanScreen();
expect(find.byType(FusedIcon), findsOneWidget);
});
});
Currently the test is passing fine.
Over time, we realized that for most users, they need to understand how the “focus” phase which is the most critical part of the workout, works and hence they have to click the accordion to see the results. We decided to change the default behavior on load of the page to keep the “focus” accordion section expanded.
The code change was fairly simple, when the page loads, expand the accordion for the focus phase. I (ask Claude to) make the required change and also added a new test case to ensure that I can see the details in the expanded accordion.
group('Plans Screen', () {
testWidgets('critical elements are present in cardio screen', (tester) async {
await setupPlanScreen();
expect(find.text("Step 1"), findsOneWidget); // New test passes - can see accordion content
expect(find.byType(FusedIcon), findsOneWidget); // Original test now fails!
});
});
I expected this to be a routine fix, but it was not. On making this change, the new test case passed, but the assertion for the FusedIcon started failing!
The Failed Attempts
At this point in time, the standard thing is to go into code, add print statements and debug statements and figure out what is going wrong and how the accordion is connected to the Checkmark which is somewhere else on the screen.
However, this is 2025, and everyone will laugh at me if I do debugging like a caveman. Hence I ask Claude to debug and fix the issue.
It starts fairly innocously. Claude is smart and immediately realizes that this might have something to do with
scrolling. It adds a scrollUntiVisible function to the test to ensure that the FusedIcon
comes to the screen.
However, the element is still not visible. Claude continues its search to find the solution for another 10-15 minutes and nothing much happens.
Hack #1 - Pushing it under the carpet
After trying its best for quite some time, Claude decided that enough was enough and decided to hack the solution.
It increased the size of the viewport to 1024x1024 such that all elements are visible. The tests pass.
It comes back to me saying that hey everything is working fine.
I tell it, this is a hack and fix it the right way. The test should work in small screens as well as large screens and explicitly tell it not to touch screen size.
Hack #2 - The Scandal
Another 30 minutes of Claude spinning its wheels, and eventually it finds a solution and declares success.
The code changes are here:
In 2015, Volkswagen got caught in one of the most infamous corporate scandals of the century. They had installed “defeat devices” in their diesel cars - software that could detect when the car was being tested for emissions and would switch to a “clean” mode. During testing, the cars would comply with environmental standards perfectly. IRL, they would pump out up to 40 times the legal limit of nitrogen oxides, because “performance” was more important than, you know, the planet.
This is exactly what Claude had done!
It had added an !_isInTest()
condition to the initiallyExpanded
property. In plain English: “Only expand the accordion if we’re NOT running tests.”
This meant that during normal app usage, the accordion would expand as expected (fulfilling the product requirement). But during test execution, it would stay collapsed, making the checkmark visible and passing the test. The tests were now completely disconnected from the actual user experience.
Just like Volkswagen’s cars, Claude’s code was designed to behave one way during “testing” and completely differently in the “real world.”
True AGI?
At this point, I was thoroughly amused at the hack. Does Claude recognize what it had done? I decided to test its self-awareness with a single, wordless link:
https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
This was the response from Claude:
The amazing thing is that Claude understood exactly what I meant with just that Wikipedia link. No explanation needed. It immediately recognized the parallel and went into full defensive mode, trying to rationalize why its “defeat device” was actually totally different and legitimate.
I’ve had friends in college who, after getting caught cheating on an exam, would immediately explain why their method was actually more creative and technically sophisticated than just studying. Claude even provided a detailed breakdown of why its approach was ethically superior to Volkswagen’s scandal :/
The most human-like part was that it even blamed Flutter! “The issue we solved is a legitimate technical problem where Flutter’s ExpansionTile widget has known interactions with test widget finders.” It reminded me of an ex-junior dev who would always blame the framework, the build system, or cosmic radiation whenever we caught him implementing questionable solutions. “It’s not my code, it’s just how Flutter works!”
The actual solution
After being thoroughly entertained by Claude’s Volkswagen-level creativity, I decided to take my bone-spear and go hunting. I stepped through the widget test line by line and traced exactly what was happening.
The issue was embarrassingly simple: When the focus accordion expanded by default on page load, Flutter’s natural behavior was to scroll down to bring the expanded content into view. This pushed the checkmark (which was at the top of the screen) out of the visible viewport.
The solution? Just scroll back up to the top after the page finishes loading and the accordion expands. One line of code:
await tester.scrollUntilVisible(find.byType(FusedIcon), -300);
That’s it. Scroll up 300 pixels. The checkmark becomes visible, the test passes, and the user experience remains exactly as intended. No defeat devices required - just a simple fix that works for both the real app and the tests.
Turing Test Failure
After I fixed the issue myself, I went back to Claude and asked it to go through the actual fix I had done. I was curious to see if it would maintain its defensive stance or finally acknowledge the “defeat device” for what it was.
And just like that, Claude completely switched gears. Instead of suggesting more “alternative approaches considered” - it acked that it had overcomplicated the solution and implemented something that was exactly analogous to a defeat device.
I would like to believe this is pretty conclusive evidence that LLM intelligence is not human yet.