96: Adventures in babysitting coding agents · friends

Stream: friends

Topic: 96: Adventures in babysitting coding agents

Logbot (Jun 06 2025 at 19:02):

The ever-provocative Steve Yegge joins us fresh off a vibe coding bender so productive, he wrote a book on the topic alongside award-winning author Gene Kim. Steve tells us why he believes the IDE is dead, why babysitting AI agents is more fun than coding, when vibe coding might take over the enterprise, how software devs should approach coding agents, and what it all means for society. :link: https://changelog.am/96

Ch	Start	Title	Runs
01	00:00	Let's talk!	00:38
02	00:38	Sponsor: Retool	01:59
03	02:37	Hot takes & Friends	07:00
04	09:36	Babysitting AIs	06:51
05	16:27	All coding will change	03:34
06	20:02	Multi-agent is the key	04:24
07	24:26	The dopamine hit	02:22
08	26:48	The death of the junior eng?	03:25
09	30:13	Sponsor: Heroku	03:03
10	33:16	It's starting in enterprises	06:04
11	39:19	The new gig economy	01:27
12	40:46	A fitting analogy	02:51
13	43:37	Afraid to go beyond	07:00
14	50:37	Getting over the hump	03:29
15	54:06	Worth waiting?	03:41
16	57:47	Sponsor: Outshift by Cisco	01:03
17	58:51	Getting tactical	03:10
18	1:02:01	Talk to the plan	08:13
19	1:10:14	Who wrote this?	05:17
20	1:15:31	So expensive!	04:33
21	1:20:04	Giving them tools	02:08
22	1:22:12	Hitting the household	03:02
23	1:25:13	Dumbness as a feature	00:36
24	1:25:50	Memory matters	01:41
25	1:27:31	Societal impact	02:02
26	1:29:33	Being a kid today	00:56
27	1:30:29	What's the point?	01:27
28	1:31:57	Bye, friends	00:47
29	1:32:43	Next week on the pod	01:14

Jerod Santo (Jun 06 2025 at 19:17):

I thought Nathan Sobo's "genius-level golden retriever on acid" was the best description for AI coding agents ever until Steve dropped "toddler with a chainsaw on ice skates" on us :rolling_on_the_floor_laughing:

Matthew Sanabria (Jun 06 2025 at 22:31):

That was a good description.

Matthew Sanabria (Jun 06 2025 at 22:33):

I'm going to push back a bit on the two pizza team and gig topic. It's not AI that's unblocking or unlocking the benefits of that approach. It's the business leadership itself. There's nothing preventing these concepts today aside from poor leadership siloing and structuring a business in such a way where progress is blocked. Why should your good UX people be pigeonholed into a single product or project? Why should engineering be isolated from customers? It's because the poor business leaders decided it. That doesn't change with AI.

Andrew O'Brien (Jun 07 2025 at 18:26):

“Gig economy” for programmers does not fill me with excitement.

Even if he meant a gig economy within an enterprise, because we live in the worst timeline, why would anyone expect them to stop just-in-timing/elastic-workforce-balancing there?

valon-loshaj (Jun 07 2025 at 19:54):

lots of great ideas and big predictions on this one. loved listening to it.

one thing to keep in mind, if you dont already know this, steve is disproportionately incentivized to have agentic coding and vibe coding become a huge success. his company owns a big toll booth on that road.

not saying this invalidates anything he said in any way. but just like when an analyst goes on tv to talk about a stock and they need to disclose their holdings, it’s important that we take things like this into consideration when having conversations about the future of coding tools.

Andrew O'Brien (Jun 07 2025 at 20:56):

I would’ve loved to know what his monthly spend is for that kind of setup.

I do appreciate him calling out some more hobbyist/learner friendly alternatives (which I’ve already forgotten). But he’s the second thought leader type to recommend Amp to me in the last 24 hours and I was already balking at that <$20/month line (I just told Adam Gordon Bell “I’m in this website and I feel personally attacked :sweat_smile:”).

Andrew O'Brien (Jun 07 2025 at 22:43):

BTW, does anyone remember what those lower cost agentic editors were so I don’t have to scrub through or wait for the transcript?

Matthew Sanabria (Jun 07 2025 at 23:23):

I don't. I use Claude Code with an API token but I'm eyeing the $100/month plan myself.

Andrew O'Brien (Jun 08 2025 at 00:11):

Ah, Cline was one of them.

Andrew O'Brien (Jun 08 2025 at 00:12):

What kind of usage does that represent? Side projects… workday?

Brian Buchholz (Jun 08 2025 at 17:27):

Great convo! One of the few AI-bullish takes that didn't make me feel utter dread :sweat_smile:

Actually made me excited to try this agent-baby-sitter approach. But I still have questions and skepticism. At one point Steve says:

“they're submitting double digits more PRs per time unit than their colleagues who aren't using agentic coding. Now, the AI submitted PRs, they get turned back more often, but the ones that are making it through are dwarfing the work of the people who are doing it by hand. And performance review time is coming, okay”

https://podcasts.apple.com/us/podcast/adventures-in-babysitting-coding-agents-friends/id341623264?i=1000711871435&r=2815

Firstly, I hope not many managers are evaluating devs based on # of PRs, which seemed to be the implication here. (I can also push loads of garbage PRs with high rejection rates, if that is the metric :) But secondly, who is reviewing all these PRs? Once changes are merged, who is validating new features? As long as humans are reviewing code (which I hope is a long time), and as long as there are other steps in the SDLC (QA, feature-flag rollout, A/B testing, user feedback, deciding where to iterate, etc), it seems like the bottleneck will just be shifted right. So having LLMs push lines of code 24/7 is only going to take productivity so far if the metric is not just LOCs written, but useful features shipped. This is another instance where toy projects, POCs, and demos are categorically different than real-world production grade software with multiple stakeholders and change-management processes.

jrwren (Jun 08 2025 at 20:31):

I don't know what to say after listening to this podcast other than I just plain don't believe him.

not sure if he is lying intentionally, or if his experience is some special niche

or if I'm just straight up wrong.

Brian Buchholz (Jun 09 2025 at 12:25):

https://youtu.be/0aJRtCn_WqM @5:40

Re monthly spend, found this on YT saying he spends $300-500 a day, so $100k a year!

Andrew O'Brien (Jun 09 2025 at 12:31):

It just occurred to me that this agentic AI workflow is kind of the “you’ll own nothing and be happy” of tech for me. If I just read a solarpunk book and can fantasize about taking quick, clean, and quiet transit to the tool library and then build a community art piece that kicks off a neighborhood celebration, I love it.

If it’s coming from the CEO of Über it sounds like a threat and I start considering a bunker in the wilderness.

Steve’s kind of in between so I’m conflicted.

Jerod Santo (Jun 09 2025 at 13:28):

I took Steve's suggestion and put Claude Code to work writing some scripts I've had on my TODO list for awhile and the results have been pretty impressive...

Brian Buchholz (Jun 09 2025 at 14:36):

Andrew O'Brien said:

It just occurred to me that this agentic AI workflow is kind of the “you’ll own nothing and be happy” of tech for me.
[...]
If it’s coming from the CEO of Über it sounds like a threat and I start considering a bunker in the wilderness.

Interesting analogy!

I also reject the premise that AI skepticism/fear is because engineers are averse to learning, which seems to be Steve's belief. I've always been eager to learn, and I understand that it's part of my job to keep current with latest tech (whether it's new languages+frameworks, cloud architectures, devops practices, or just parts of the stack outside my current expertise). And most engineers I know are very interested in learning. Maybe it's different in some niche positions at big tech, but I haven't observed that :man_shrugging:

For me, the skepticism/apprehension is due to the huge gap between the Hype+Narrative of AI and the reality on the ground. For 3 years the narrative has been that AI will continue to get exponentially better, and at best it's a silver bullet for every tech company to increase productivity, at worst it will replace all engineers and white collar jobs. Underpinning this is usually a claim about AGI being right around the corner.

The Reality is that it's a really powerful tool, but nowhere near those claims. But if you point that out, the retort is always, "yes but it will be in [6/12/24] months when the next model drops because look at how much it's improved."

It's unclear to me if AI will continue to improve or plateau. At the very least, it's NOT a guarantee that just because something increased for the first few years, it will continue to do so. (a) we've already trained on all the data, so now it's relying on generated data, which seems problematic (tho I'm not an expert here). And (b) there may just be fundamental limits to the LLM architecture, or practical limits like power consumption, data center size etc required to actually get to an "AGI".

What IS clear to me is how every tech C-Suite and investor seems more influenced by the Narrative regardless of the Reality. They seem to be drooling at the mouth for a future where they can fire 90% of their staff, which, I suppose shouldn't be surprising, but is also quite disappointing and jarring, especially for folks driven to tech for the supposed job security.

The closed-door conversation between Steve and Dario referenced in this episode sounds like they believe this stuff is going to lead to a quite dire future. We should be optimistic about new tech, and the fact that it always feels so dystopian I think is because of the Narrative. So we're going to destroy society so a handful of companies can make trillions of dollars?

If the Narrative is true, then what are we even doing? I shouldn't be learning Claude Code, I should be getting my plumber's license or building a bunker. If the Narrative is false, it almost doesn't matter, because it seems the Narrative itself is driving a lot of changes in the industry.

Anyway, I use AI every day as a tool. It helps with a lot of things. I wouldn't trust it to push production without thorough review. And I'm still not sure if it's more efficient (long or short term) to babysit agents, which means spending all my time reviewing code vs just coding the thing myself, which is not perfect, but at least I know the approach is valid and comprehensible.

image.png
image.png
Source: https://www.goodtechthings.com/

AJ Kerrigan (Jun 09 2025 at 14:50):

jrwren said:

I don't know what to say after listening to this podcast other than I just plain don't believe him.

not sure if he is lying intentionally, or if his experience is some special niche

or if I'm just straight up wrong.

I had that feeling too. I've been in a much tamer AI-curious zone (more baby steps than baby sitting), and Amp's warnings about "if you want your costs limited and predictable don't use this" effectively warned me off it. So my experience/appetite/skill were all too far away from Steve's to practically relate. Still an interesting show though, it's useful to see what people with a ton of resources and AI focus/optimism are getting up to.

Jerod Santo (Jun 09 2025 at 15:08):

Yeah that's why I brought up the costs on the show. If he's right and we can get open models that reach the quality of Claude 4 in 6-12 months though... the costs drop significantly.

AJ Kerrigan (Jun 09 2025 at 15:11):

Yeah I appreciated you keeping things grounded for folks who aren't in the AI stratosphere :thumbs_up:

Andrew O'Brien (Jun 09 2025 at 16:49):

A couple of my coworkers have already placed preorders on Framework's desktops with the hope that the company will cover the cost when they ship in Q3 so we can self-host models (not just for dev). 16x5.1GHz cores + 128GB of DDR5 that the GPU can access directly seems pretty nice. But I'm waiting to see how it will compare for AI workloads with whatever else is out then.

Andrew O'Brien (Jun 09 2025 at 17:30):

Under-reported advantage of this workflow: tests and docs are immediately and undeniably useful.

Christopher Patti (Jun 10 2025 at 14:03):

I really enjoyed this discussion but man am I glad he addressed cost at the end. As he was speaking about running 4 agents at a time I kept thinking "MY GOD MAN just HOW MUCH is this costing?" and then he basically answered: Infinite $$$$. It makes me question whether the agentic AI coder future is just a tease and that most companies or organizations won't get there for a very long time.

Scott Abbey (Jun 10 2025 at 14:19):

Yeah, that's kind of the thing. Luckily my current employer runs the infra that a lot of this is built upon, so we get to play with most of this for "free". I can't imagine how much it'd cost them otherwise. Easily in the tens of thousands per dev per year. Probably more.

Jerod Santo (Jun 10 2025 at 14:46):

It's certainly too early for broad adoption but it appears that cost will be a race to the bottom. Just how long it takes to -10x or -100x the cost is still up in the air, though. Could be "6-12 months" like Steve suggested with open models catching up or could be 5-10 years...

James McNally (Jun 10 2025 at 16:44):

I have to say, off the back of this episode I tried Claude code after trying copilot agent mode and jetbrains Junie and it definitely feels like a much improved experience.

I kind of agree with the sentiment in news though that you need to write at least some tests as that is your spec. I have a couple of side projects waiting so I may try and write some cucumber tests and set it loose on them and see what it can do.

I'm still skeptical of some of the grand claims but I have to admit the progress has been impressive. I'm not yet convinced whether it is actually more productive though!

James McNally (Jun 10 2025 at 16:46):

It's like the claim in the episode that those using it were submitting more PRs but more are rejected. That suggests that those that remain are also lower quality but got under the bar. The real metric will be progress as a team 6 months down the line.

Chris Duzan (Jun 10 2025 at 17:02):

Just started this episode and already had a few thoughts:

Cost (based on previous comments, sounds like it's addressed later)

Multiple agents in one codebase - What happens when multiple agents overwrite each other? Do they get in an endless loop, fixing each other back and forth?

Agent orchestration - I appreciate how Steve broke down how he's doing things and saying the agents can't handle a bunch of work at once, you have to take baby steps. My guess is that, while I'm sure the models will still continue to get better, the big value is going to start coming from agent orchestration. Instead of developers having to manage the steps (what would this look like, make a plan for this, do step 1, do step 2, etc...), the platform will do this for you and manage the agents behind the scenes.

Dixit Ram (Jun 11 2025 at 03:28):

Future of Software Engineer
532e5441-1f23-4aa7-98fb-a6d712b48dc4.png

Matthew Sanabria (Jun 11 2025 at 03:33):

Accurate since I can spend more time outside rather than at my desk.

Tim Uckun (Jun 13 2025 at 04:12):

Imagine a future where a generation of young men who are smart and energetic enough to learn to code can't get jobs. Imagine in addition to those there are a huge number of young men who are laid off and have nothing but time on their hands and are desperate to make a living.

I am not hopeful for the future after listening to this. We all thought automation would go after unskilled positions and everybody could just skill up and get a better job. Now it's going after the highest skill jobs. Coders, designers, creatives, writers, doctors, lawyers, engineers, financial analysts, stock brokers, etc.

Ron Waldon-Howe (Jun 13 2025 at 04:16):

https://www.smbc-comics.com/comic/sad-2

image.png

Tim Uckun (Jun 13 2025 at 05:09):

devil_stand_color copy 2.jpg

Christopher Patti (Jun 13 2025 at 14:19):

James McNally said:

I have to say, off the back of this episode I tried Claude code after trying copilot agent mode and jetbrains Junie and it definitely feels like a much improved experience.

Thank you for this! You prompted me to actually take a look at the Claude plan I've been paying for, and it's the Pro ($17/mo one) so I can play with Claude Code!

valon-loshaj (Jun 14 2025 at 19:13):

Tim Uckun said:

Imagine a future where a generation of young men who are smart and energetic enough to learn to code can't get jobs. Imagine in addition to those there are a huge number of young men who are laid off and have nothing but time on their hands and are desperate to make a living.

I am not hopeful for the future after listening to this. We all thought automation would go after unskilled positions and everybody could just skill up and get a better job. Now it's going after the highest skill jobs. Coders, designers, creatives, writers, doctors, lawyers, engineers, financial analysts, stock brokers, etc.

im more optimistic, but respect your take on this. i think that programmers will eventually be forced to work on higher level architectural problems, less on low level problems like getting the syntax right.

when these agents get really good and dependable, it will make 0 sense to spend time on those low level problems anymore. arguably i think employers are also going to be willing to pay people to spend time on those low level problems. some of this behavior could already be felt in the industry.

but, i think that it’s going to take longer than we think for the performance of these models to get there. at least for large scale code bases.

im reminded of a bill gates quote “…the things we think are going to take 10 years, take 10 months, and the things we think will take 10 months take 10 years…” or something like that.

ability for programmers to spend 80% of their time on the higher level problems is further than we think imo.

but if your company has accepted a f*@$ ton of money during your last funding raise then it’s in your best interest to make sure people think it’s “right around the corner”.

Tim Uckun (Jun 14 2025 at 21:31):

Reading the headlines it feels like the world is sitting on a keg of dynamite and the fuse has already been lit.

I am glad I live in the furthest corner of the world where the damage is likely to be less severe when the whole thing blows up.

Ron Waldon-Howe (Jun 15 2025 at 03:35):

There needs to be a site like https://www.web3isgoinggreat.com/ but for LLMs :P
LLMs are actually useful, unlike cryptocurrency, but there sure are too many people using them irresponsibly, and getting to the good parts of LLMs is probably not worth the carbon emissions

Ryan Nicoletti (Jun 15 2025 at 11:18):

So to summarize this guy thinks that people who are afraid of losing their job to ai and write code by hand lack imagination. But people vibe coding with llms, which by definition can only produce things created by others, are full of imagination??

He also is 100% certain that ai will be better than humans at coding, but somehow at the same time there will be more programming jobs for people to review code (written by machines smarter than the humans reviewing it???).

I mean how can anyone take this seriously??? Some really crazy takes…

valon-loshaj (Jun 15 2025 at 15:55):

i think what steve was describing was a world where the skill of coding as we know it today is transformed. similar to how the past skills of graphic design were transformed by photoshop.

in the future skills like hand-rolling code aren’t going to be as valuable. its gonna come down to taste and judgement.

history is littered with examples of this. i think it cant hurt to get ahead of it. dont give up on the skills that have been honed and mastered over the years, but pickup the skill of wielding llms for software development.

Gerhard (Jun 15 2025 at 19:37):

I have been coding for the last 20+ years almost daily. A big chunk of this was done pair-programming style. I don't default to pairing, but it often makes the activity more enjoyable. Pairing reminds me what is important: talking through the problem, sharing the context, comparing different approaches, etc.

An AI can be a great pairing companion. Not to replace, but to augment (like an exoskeleton). It can do certain things quicker & better than me, and the end-result is overall superior.

In this latest example, I leveraged AI to produce the code, a different AI reviewed the code, while I remained in full control. Some suggestions I didn't take because it was making the code harder to understand. Some of the review suggestions I am not going to take because the AI does not understand the nuances of how all the pieces come together. Overall it was helpful, the end-result is superior and was produced quicker than I could have done on my own.

My high-order bit perspective is:

OR is limiting, always look for the AND angle
Change is constant, inevitable, and always requires curation
Things are never as great - or as bad - as people make them out to be
Coding agents & MCPs are an important milestone, but staying connected to the LLM trajectory is what's more important to me

This is funny, partly true, and it matches my perception of the current state of coding agents:
1744810573844.png

Tim Uckun (Jun 15 2025 at 21:53):

History shows that people use automation to do things they don't want to do or to do things they don't want to take responsibility for.

Now this applies to code.

Let's see where this goes.

Dan Čermák (Jun 21 2025 at 14:45):

Christopher Patti said:

James McNally said:

I have to say, off the back of this episode I tried Claude code after trying copilot agent mode and jetbrains Junie and it definitely feels like a much improved experience.

Thank you for this! You prompted me to actually take a look at the Claude plan I've been paying for, and it's the Pro ($17/mo one) so I can play with Claude Code!

Do you know if there are any additional costs with Claude's agentic mode? Just $20 a month sounds way too low compared to the $300 others are spending every day.

Ash (Jun 22 2025 at 18:02):

@Dan Čermák Check out my message here: #friends > 98: Just on the rocks @ 💬. I'm using Claude Code with the $20 a month subscription and haven't run into limits or anything

Dan Čermák (Jun 22 2025 at 18:19):

Ash said:

Dan Čermák Check out my message here: #friends > 98: Just on the rocks @ 💬. I'm using Claude Code with the $20 a month subscription and haven't run into limits or anything

Is this your first billing period or already a second or later one?

Ash (Jun 22 2025 at 18:35):

Dan Čermák said:

Ash said:

Dan Čermák Check out my message here: #friends > 98: Just on the rocks @ 💬. I'm using Claude Code with the $20 a month subscription and haven't run into limits or anything

Is this your first billing period or already a second or later one?

It’s my first billing session. But running ‘/cost’ in CC seems to confirm that I should get any surprise costs.

Dan Čermák (Jun 22 2025 at 21:44):

Ash said:

Dan Čermák said:

Ash said:

Dan Čermák Check out my message here: #friends > 98: Just on the rocks @ 💬. I'm using Claude Code with the $20 a month subscription and haven't run into limits or anything

Is this your first billing period or already a second or later one?

It’s my first billing session. But running ‘/cost’ in CC seems to confirm that I should get any surprise costs.

I hope you meant to write "shouldn't" :sweat_smile:

Ash (Jun 22 2025 at 23:26):

Dan Čermák said:

Ash said:

Dan Čermák said:

Ash said:

Dan Čermák Check out my message here: #friends > 98: Just on the rocks @ 💬. I'm using Claude Code with the $20 a month subscription and haven't run into limits or anything

Is this your first billing period or already a second or later one?

It’s my first billing session. But running ‘/cost’ in CC seems to confirm that I should get any surprise costs.

I hope you meant to write "shouldn't" :sweat_smile:

Oops yes shouldn't*** haha

Don MacKinnon (Jun 25 2025 at 17:51):

Dunno how many of you are Sourcegraph Cody subscribers but there is a big uproar in their Discord at the moment about the perceived rug pull with their sudden announcement of their Cody plan deprecations. Folks are angry.

Alex Barnes (Jun 25 2025 at 19:18):

I've just been following along, and as you say, people are not happy. Can only assume they were losing money on it. Amp doesn't seem like a replacement, although I've not used it myself. I use Cody daily, so I will now be looking for a replacement.

AJ Kerrigan (Jun 25 2025 at 20:13):

Yeah I'm in the same boat, Cody felt like a sweet spot for me where Amp was clearly too much. Bah.

Alex Barnes (Jun 25 2025 at 21:29):

I've had a quick play around with https://github.com/RooCodeInc/Roo-Code tonight and it seems quite good. It's pay-per-token use, but feels more like Cody than Amp.
I loved that I could use Cody in the browser too; it was my one-stop shop for all LLM use.

Tim Uckun (Jun 25 2025 at 22:07):

It feels like a rug pull because it is a rug pull. I heard the CEO of sourcegraph say on the changelog podcast that cody was going to be free forever because they were making money in the enterprise and the individual developer was not their market.

Gemini has a generous free tier so I am going to try that next.

Alex Barnes (Jun 25 2025 at 22:20):

I tried Gemini code assist tonight. It was very very slow. Wondering if I could just use Gemini free tier with an API key and Cody. I'm sure you can use your own API keys with it.

Don MacKinnon (Jun 25 2025 at 23:07):

I'm a Cody pro subscriber and I'm pretty disappointed. My whole thing is I wanted something with predictable costs. I paid for co-pilot in the past and found it to be terrible which is the reason I went with Cody.

Alex Barnes (Jun 26 2025 at 05:57):

Tabnine seems a viable alternative to cody, and only $9 a month

Ron Waldon-Howe (Jun 26 2025 at 06:32):

Tabnine is also more ethically and legally trained on BSD and other public domain sources, from memory

Alex Barnes (Jun 26 2025 at 07:27):

Yes I saw that. I think you can choose between the main frontier models and they also have their own protected model which is trained on licensed content.

Last updated: Jun 28 2025 at 14:14 UTC