AI shenanigans · general · Zulip Chat Archive

Stream: general

Topic: AI shenanigans

Chris Duzan (Jun 26 2025 at 22:30):

A colleague just shared this article with me. I find it both hilarious and terrifying! :sweat_smile::scream:

https://www.anthropic.com/research/agentic-misalignment

AJ Kerrigan (Jun 27 2025 at 13:53):

"agentic misalignment" is quite a euphemism

Chris Duzan (Jun 27 2025 at 15:42):

Haha, yeah. A more apt (and dramatic) title might be, "The robots have gone rogue!"

Ron Waldon-Howe (Jun 28 2025 at 04:06):

This is precisely the scenario in https://avogadrocorp.com/ which was a good read
But now it's true, haha

Tim Uckun (Jul 08 2025 at 04:48):

Here is a win for Gemini CLI. I had a directory full of statements which were PDF files and I wanted to print the first page of every statement. I asked Gemini how I could take all the first pages and put them in a separate document and it said it would write a python script to do it and proceeded to do it. It installed the required libraries although it had to try twice because the mac doesn't allow global installs and it had to create an env for it but it did it. The whole process only took five or six minutes. I am not a python programmer but the code looked fine.

Win for Gemini.

Tim Uckun (Jul 08 2025 at 05:53):

Hey I nominate "AI Shenanigans" to be its own channel.

Tim Uckun (Jul 08 2025 at 06:31):

I was bored so I thought I would ask AI some questions. I said that I wanted to build a cross platform desktop app and asked for a recommendation for a language and framework.

Gemini said Rust + Tauri was the best choice. It also said go was a good choice for the backend, and that python with QT and JS with electron were second tier solutions. I asked why it didn't talk about java or kotlin and it said they would be good choices, but it listed some downsides that it felt kept them from being the best tool.

I then asked it what languages it could help me most with and it listed python, javascript and java in that order. It listed go, rust, and kotlin as the second tier.

Tim Uckun (Jul 08 2025 at 06:36):

I then asked the same questions to chatgpt

It listed the best choices as (in order) Rust + Tauri, python with QT, C# with avalonia, and Electron/JS. As a bonus it recommended dart and flutter.

When I asked what it could help me most with it said python, rust and javascript in that order and listed other languages as second tier.

Tim Uckun (Jul 08 2025 at 06:39):

Next I went to deepseek.

Deepseek's list was in this order: electron, flutter, rust, go, c#

It said it could help me best with rust, python and go in that order and other languages as second tier.

Tim Uckun (Jul 08 2025 at 06:48):

Claude only recommended rust + tauri and go + wails. It put javascript, flutter, c++ and Java/kotlin in tier two.

When I asked it what it was best with it said Javascript (excellent), go (very good), rust (good). It used those adverbs.

In the end it said the final choice should be go.

My Recommendation

Start with Go + Wails - I can provide the most comprehensive help with both the backend file operations and the frontend integration. The combination of Go's simplicity and my strong JavaScript knowledge means I can guide you through the entire stack effectively.

Tim Uckun (Jul 08 2025 at 06:50):

Deepseek was very gung ho on rust, it even wrote up a snippet of rust to demonstrate a functionality.

They all offered to immediately start the project of course.

Tim Uckun (Jul 08 2025 at 06:52):

I honestly thought they would all reach for java first to write a cross platform desktop app, but none of them seemed to think it was a good idea. I was also shocked that any of them recommended writing a desktop app in python. That seems nuts to me but clearly they have all been trained on a mountain of python code.

Tim Uckun (Jul 13 2025 at 11:40):

Man I gave myself a headache playing with gemini cli some more. For the last couple of days I have been asking it to write an app in different languages. This time not a web app. I created a simple task. Take a list of directories, analyze the files in those directories, create a catalog in an sqlite file which lists every file and directory, some metadata about the files, and for the directories the number of files in the hierarchy and the total size of the directory (including subdirs). I specifically asked it to take advantage of concurrency features of the language to make it as performant as possible. The languages I asked it to code in were ruby, typescript using bun, go, and rust.

It wrote the ruby code without issues but it wasn't multi threading properly so I asked it to fix it and it did. Then I asked it to also use fibers so I can test the difference and it did. It used unusual strategies to process the directories and to calculate sums. It used SQL queries to calculate sums which was interesting to me.

It wrote the bun code but had no concurrency. I asked it to use workers but it couldn't manage that properly.

It completely fucked up the go code. I tried to work with it going back and forth but got tired of it so I abandoned it.

It wrote the rust code and it ran. I don't know rust so when I opened it up in vs code I asked the gemini vs code plugin to explain it to me and it said "you should change this code to like this" and so I did. I ran the code a few times running into errors and each time it fixed its error so in the end I had a working app.

Here is the headache part. The ruby code runs about as fast as the rust code. Like WTF? They both give different results for the number of files and the total size of all the files. What's worse is that both of these numbers disagree with du -s . and find . -type f | wc -l which in turn disagrees with finder when I click on get info.

I guess a man with four watches never knows what time it is.
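One plausible reason the four numbers disagree is that the tools measure different things. `find . -type f` counts only regular files, `du -s` sums allocated disk blocks (directories included) rather than byte lengths, and Finder, as far as I know, counts folders as items and shows both a logical size and a size on disk. The sketch below is only an illustration of that idea, not a reconstruction of what the generated Ruby or Rust code did. The function name `summarize` and the dictionary keys are made up for this example, and the 512-byte block unit is an assumption that holds on Linux and macOS.

```python
import os
import stat
import tempfile


def summarize(root):
    """Walk root once and collect the different numbers that tools tend to report."""
    totals = {"files": 0, "dirs": 0, "symlinks": 0,
              "apparent_bytes": 0, "allocated_bytes": 0}
    for dirpath, dirnames, filenames in os.walk(root):  # symlinked dirs are not followed
        for name in dirnames:
            if os.path.islink(os.path.join(dirpath, name)):
                totals["symlinks"] += 1
            else:
                totals["dirs"] += 1
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # lstat inspects the link itself
            if stat.S_ISLNK(st.st_mode):
                totals["symlinks"] += 1
            elif stat.S_ISREG(st.st_mode):
                totals["files"] += 1                    # roughly what `find -type f` counts
                totals["apparent_bytes"] += st.st_size  # logical length of the file
                # Assumption: st_blocks is in 512-byte units (Linux, macOS).
                totals["allocated_bytes"] += getattr(st, "st_blocks", 0) * 512
    return totals


def demo():
    """Build a tiny throwaway tree and summarize it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "a.txt"), "wb") as f:
            f.write(b"x" * 10)
        with open(os.path.join(root, "sub", "b.txt"), "wb") as f:
            f.write(b"y" * 5)
        os.symlink(os.path.join(root, "a.txt"), os.path.join(root, "link"))
        return summarize(root)


print(demo())
```

Two programs that each pick a different column here (say, one following symlinks and summing `st_size`, the other summing blocks) will both be "right" and still disagree, so it helps to decide up front which number the catalog is supposed to hold.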

Tim Uckun (Jul 13 2025 at 11:42):

Oh and I should add that it used completely different strategies to traverse the directory and calculate things for every language. None of them were what I would have done so I might have to code it my way and compare that too.

Ron Waldon-Howe (Jul 14 2025 at 00:49):

concurrency is a fun one, i guess if humans so regularly get it wrong then an LLM can probably be forgiven for not making the best use of it, either
except the LLM is convinced it has the correct answer each time

Tim Uckun (Jul 14 2025 at 22:18):

I thought for sure it would be able to do the channels and such properly so that it didn't run into database contention issues. I specifically included instructions to be aware of database contention issues in the specs.
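For reference, one common way to avoid SQLite contention is to let the scanning workers only collect metadata and hand rows to a single writer that owns the database connection. The sketch below shows that pattern in Python rather than Go, using a `queue.Queue` where Go would use a channel. It is a minimal illustration under assumptions: the table layout, the function names (`scan_dir`, `writer`, `build_catalog`, `directory_total`) and the worker count are all invented here and are not taken from any of the generated programs.

```python
import os
import queue
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

SENTINEL = None  # tells the writer that no more rows are coming


def scan_dir(root, out):
    """Worker: stat every file under root and queue (path, size) rows."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            out.put((path, st.st_size))


def writer(db_path, inbox):
    """Single writer: the only thread that ever touches the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER)")
    while True:
        row = inbox.get()
        if row is SENTINEL:
            break
        conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", row)
    conn.commit()
    conn.close()


def build_catalog(roots, db_path, workers=4):
    """Scan the roots concurrently while one thread serializes all writes."""
    inbox = queue.Queue()
    writer_thread = threading.Thread(target=writer, args=(db_path, inbox))
    writer_thread.start()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for future in [pool.submit(scan_dir, r, inbox) for r in roots]:
                future.result()  # re-raise any scanner error
    finally:
        inbox.put(SENTINEL)  # always release the writer, even after an error
        writer_thread.join()


def directory_total(db_path, directory):
    """Return (file_count, total_bytes) for everything under directory, via SQL."""
    prefix = directory.rstrip(os.sep) + os.sep
    conn = sqlite3.connect(db_path)
    try:
        # substr comparison avoids LIKE wildcard surprises in paths containing % or _
        return conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(size), 0) FROM files "
            "WHERE substr(path, 1, ?) = ?",
            (len(prefix), prefix),
        ).fetchone()
    finally:
        conn.close()
```

Calling `build_catalog(["/some/dir", "/other/dir"], "catalog.db")` and then `directory_total("catalog.db", "/some/dir")` would give the file count and byte total for that tree (the paths are placeholders). Because only one thread writes, there is no lock contention on the database to handle, at the cost of the writer becoming the bottleneck if inserts are slower than scanning.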

BTW. One common thing in the generated code was that it didn't follow all the instructions in any of them. It just skipped some things I told it to do for some odd reason. I had to say "what about this feature" or something and it would do it.

One shot is still a dream in gemini. I would try with claude code but I don't want to pay. Gemini does 60-70% after a few go arounds for free.

Ron Waldon-Howe (Jul 15 2025 at 00:51):

with rolling context windows, i believe some systems summarise the existing context to make room for the user's next prompt, so maybe that's how it's throwing away some of your details?

Nabeel S (Jul 15 2025 at 02:16):

I'm surprised it failed worse in go than in other languages, since go overall is a smaller and simpler language.

Ron Waldon-Howe (Jul 15 2025 at 03:28):

Nabeel S said:

I'm surprised it failed worse in go than in other languages, since go overall is a smaller and simpler language.

If we trace "simple" back to the Latin "simplex" -> "one thread, no weaving of multiple threads" then I don't think "simple" describes Go
It's far more magical and inconsistent and surprising than one would expect: https://fasterthanli.me/articles/lies-we-tell-ourselves-to-keep-using-golang

That said, I believe the Go team prioritises legibility, and I agree that it's much easier to read Go code than e.g. Rust or C++

Nabeel S (Jul 15 2025 at 05:02):

Fair! I think "small" is a more objectively accurate description :slight_smile:

Tim Uckun (Jul 15 2025 at 22:03):

I think it keeps failing in go because the ecosystem in go is pretty chaotic. People don't use frameworks so everybody hand rolls their own implementation of everything all of which are slightly different than each other. When the LLM takes a snippet of this and a snippet of that they don't work together.

What doesn't help is that go is too simple as a core language so you can write things in a myriad of ways. There is no pythonic "there is only one way to do things", all you have is a pile of tiny legos and you have to assemble them into an airplane or a construction site. To strain the analogy, other languages give you propellers, wheels, circular bricks etc to build the same thing so it's a lot more obvious with them.

Finally go is ideologically and structurally opposed to building abstractions. There is a reason you don't have classes, or function/method overriding, enums, decorators, macros etc. They want everything to be as explicit and verbose as possible. This means every codebase in go contains a lot of setup and ceremony to accomplish simple things, and when these projects are chunked they may be split up in weird places, where the implementation lands in one chunk but the setup and ceremony land in another, and they need to be together in order to work properly.

Those are my guesses as to why it failed so hard. I honestly thought Gemini would be the best at go given how much go Google has internally to train on.

I might try it with claude to see if it's better.

Ron Waldon-Howe (Jul 15 2025 at 22:28):

Go was the first language I encountered with first class built-in channels, and I think this was the case for a lot of developers
I remember reading somewhere that traditional Mutex approaches in Go tended to have fewer bugs than code using channels
But because channels were the "new thing", there's probably a lot of that in novice code, and a lot of that was probably ingested during LLM training

Tim Uckun (Jul 16 2025 at 22:22):

You bring up another good point. At one time channels were the rage and after a while the community decided they were a footgun to be avoided. So you have a corpus of code that does things both with channels and without. I think there are a couple of other things in the go ecosystem where the community changed its mind. Generators come to mind in this category. For a long time they were the hotness, then people basically stopped using them, and now it's come back around with the likes of templ.

Chris Duzan (Jul 23 2025 at 15:15):

I was listening to an interview with the founder of Railway (https://railway.com/) - episode (https://open.spotify.com/episode/2XHC2BVsNjKyCt7yFFpC0k?si=Pij5ukx3TXOIACSlFEY4IQ)

It got me thinking. With the rise of vibe coding, do the "easy button" tools win? I'm not familiar with Railway specifically. But, do companies like Vercel and Render get a big uptick in usage because non-technical people are building apps and they want the lowest friction path to deploying it to the public?

Chris Duzan (Jul 23 2025 at 15:17):

My thought is yes. But maybe all of these are locally hosted POCs and once someone gets serious about using a tool/app publicly, they actually get technical people around them.

Chris Duzan (Jul 23 2025 at 15:22):

It seems like the company that can build the easiest route to getting things running publicly has a chance for a big payoff as the path to creating an app is getting a lot easier and therefore, more people want to deploy these apps.

And I'm talking about more than just hosting a self-contained app (I think vibe coding tools already do this).

Tim Uckun (Jul 24 2025 at 22:08):

Honestly I think that's not too difficult. With a robust framework like rails you could easily do this. Fine tune the model for rails (which is an effort already under way), pick a handful of gems, craft a good prompt and you could generate any mobile or web app quickly and cheaply. Rails has already done 80% of the work for you.

Last updated: Dec 14 2025 at 10:42 UTC