coldcode 17 hours ago

Xcode's assistant AI suggestions are 99% ridiculous: they often don't make sense, don't compile, or have nothing to do with my project. Every once in a while it suggests a line or so that is fine. The biggest issue is that I want code completion, but it appends extra items that make no sense at all, requiring me to either backspace a bunch or use the mouse to pick from the menu. I haven't used Copilot or one of the others, so I don't know if this is normal or if Apple's is just stupid.

  • sippeangelo 16 hours ago

    So not normal. Try Cursor.

    • zxvkhkxvdvbdxz 12 hours ago

      As long as the code I'm writing matches the training data it is always correct.

      • thbb123 4 hours ago

        Then it's likely an opportunity for refactoring. The ideal is to never have duplicate code.

intrepidsoldier 20 hours ago

So even the LLMs want to keep doing just easy stuff given a choice?

  • mring33621 17 hours ago

    "the model consistently gravitated toward style-related suggestions"

    sounds very similar to many human-attended code reviews

    • dietr1ch 16 hours ago

      I find some stuff really distracting. Sometimes I just get a local copy to get them out of my way as most are trivial to fix, but hard/draining to ignore.

turnsout 19 hours ago

This matches my experience working with LLMs. I've built several applications that require an LLM to consider several factors or "zoom levels," and I have yet to work with a model that can do that in a single shot. Instead, you need to have multiple passes for each area of focus. Rather than "edit this manuscript," you want "find all the typos," then "find all the run-on sentences," etc.
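A minimal sketch of that multi-pass idea, assuming a generic `call_model(prompt) -> str` stand-in for whatever LLM client you actually use (the function name and the pass prompts are illustrative, not from any particular API):

```python
# Run one focused pass per concern instead of a single broad
# "edit this manuscript" prompt, collecting results per pass.

PASSES = [
    "Find all the typos in the following text.",
    "Find all the run-on sentences in the following text.",
    "Find all the unclear pronoun references in the following text.",
]

def multi_pass_review(text, call_model):
    """call_model(prompt) -> str is a hypothetical LLM client wrapper."""
    findings = {}
    for instruction in PASSES:
        prompt = f"{instruction}\n\n---\n{text}"
        findings[instruction] = call_model(prompt)
    return findings
```

Each pass gives the model a single area of focus, which is the workaround described above; the trade-off is one API call per pass instead of one total.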

  • GavCo 18 hours ago

    Interesting. I wonder if this is related to the model architecture and attention mechanism.

    The author seems to be implying it could be: "Even a single mention of ‘code enhancement suggestions’ in our instructions seemed to hijack the model’s attention"

    • jimminyx 18 hours ago

      The attention is probably just latching on to strong statistical patterns. Obvious errors create sharp spikes in attention weights and drown out more subtle signals that can actually matter more.

  • timbilt 18 hours ago

    The weirdness of LLMs is that they're so damn good at so many things, but then you see these glaring gaps that instantly make them seem dumb. We desperately need benchmarks and evals that test these kinds of hard-to-pin-down cognitive abilities.

    • turnsout 18 hours ago

      Absolutely. This is not a new observation, but another thing they struggle with is self-reporting confidence intervals. When I've asked LLMs to classify/tag things along with a confidence metric, the number seems random and has no connection to the quality or difficulty of the classification.
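      One hedged way to check that claim empirically (all names here are illustrative, not from any library): bin predictions by the model's self-reported confidence and compare each bin's average stated confidence to its actual accuracy. If the reported numbers meant anything, the two would roughly track each other.

```python
def calibration_bins(records, n_bins=5):
    """records: list of (confidence in [0, 1], was_correct bool).

    Returns one entry per bin: (avg stated confidence, actual
    accuracy, count), or None for empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, correct in records:
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, correct))
    report = []
    for b in bins:
        if not b:
            report.append(None)
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        report.append((avg_conf, accuracy, len(b)))
    return report
```

If the classifications the comment describes really have "random" confidence numbers, the per-bin accuracy will show no relationship to the per-bin stated confidence.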

itamarcode a day ago

AI code review will very probably have signal-to-noise problems. It is good to see practical solutions aimed at addressing this. I wonder if fine-tuning models would help - this isn't addressed in the blog.

esafak 20 hours ago

Is there a benchmark for AI code suggestions, or some comparative review? I don't have the time to test them all, and I wonder whether I could be using a better one.

archon810 21 hours ago

I'm intrigued by Qodo after discovering them through this blog post.

Does anyone have experience using its free tier, say, with GitHub?

  • gronky_ 18 hours ago

    We started using it recently at my work. The code-changes walkthrough is nice.