coldcode 17 hours ago

Xcode's assistant AI suggestions are 99% ridiculous: they often don't make sense, don't compile, or have nothing to do with my project. Every once in a while it suggests a line or so that is fine. The biggest issue is that I want code completion, but it appends extra items that make no sense at all, requiring me to either backspace a bunch or use the mouse to pick from the menu. I haven't used Copilot or one of the others, so I don't know if this is normal or if Apple's is just stupid.

  • sippeangelo 16 hours ago

    So not normal. Try Cursor.

    • zxvkhkxvdvbdxz 12 hours ago

      As long as the code I'm writing matches the training data it is always correct.

      • thbb123 4 hours ago

        Then it's likely an opportunity for refactoring. The ideal is to never have duplicate code.

intrepidsoldier 20 hours ago

So even the LLMs want to keep doing just easy stuff given a choice?

  • mring33621 17 hours ago

    "the model consistently gravitated toward style-related suggestions"

    sounds very similar to many human-attended code reviews

    • dietr1ch 16 hours ago

      I find some stuff really distracting. Sometimes I just get a local copy to get them out of my way as most are trivial to fix, but hard/draining to ignore.

turnsout 19 hours ago

This matches my experience working with LLMs. I've built several applications that require an LLM to consider several factors or "zoom levels," and I have yet to work with a model that can do that in a single shot. Instead, you need to have multiple passes for each area of focus. Rather than "edit this manuscript," you want "find all the typos," then "find all the run-on sentences," etc.
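A minimal sketch of that multi-pass idea, assuming a generic `call_model(prompt) -> str` stand-in for whatever LLM client you actually use (the function name and the pass prompts are illustrative, not from any particular API):

```python
# Run one focused pass per concern instead of a single broad
# "edit this manuscript" prompt, collecting results per pass.

PASSES = [
    "Find all the typos in the following text.",
    "Find all the run-on sentences in the following text.",
    "Find all the unclear pronoun references in the following text.",
]

def multi_pass_review(text, call_model):
    """call_model(prompt) -> str is a hypothetical LLM client wrapper."""
    findings = {}
    for instruction in PASSES:
        prompt = f"{instruction}\n\n---\n{text}"
        findings[instruction] = call_model(prompt)
    return findings
```

Each pass gives the model a single area of focus, which is the workaround described above; the trade-off is one API call per pass instead of one total.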

  • GavCo 18 hours ago

    Interesting. I wonder if this is related to the model architecture and attention mechanism.

    The author seems to be implying it could be: "Even a single mention of ‘code enhancement suggestions’ in our instructions seemed to hijack the model’s attention"

    • jimminyx 18 hours ago

      The attention is probably just latching on to strong statistical patterns. Obvious errors create sharp spikes in attention weights and drown out more subtle signals that can actually matter more.

  • timbilt 18 hours ago

    The weirdness of LLMs is that they're so damn good at so many things, but then you see these glaring gaps that instantly make them seem dumb. We desperately need benchmarks and evals that test these kinds of hard-to-pin-down cognitive abilities.

    • turnsout 18 hours ago

      Absolutely. This is not a new observation, but another thing they struggle with is self-reporting confidence intervals. When I've asked LLMs to classify/tag things along with a confidence metric, the number seems random and has no connection to the quality or difficulty of the classification.
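      One hedged way to check that claim empirically (all names here are illustrative, not from any library): bin predictions by the model's self-reported confidence and compare each bin's average stated confidence to its actual accuracy. If the reported numbers meant anything, the two would roughly track each other.

```python
def calibration_bins(records, n_bins=5):
    """records: list of (confidence in [0, 1], was_correct bool).

    Returns one entry per bin: (avg stated confidence, actual
    accuracy, count), or None for empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, correct in records:
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, correct))
    report = []
    for b in bins:
        if not b:
            report.append(None)
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        report.append((avg_conf, accuracy, len(b)))
    return report
```

If the classifications the comment describes really have "random" confidence numbers, the per-bin accuracy will show no relationship to the per-bin stated confidence.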

itamarcode a day ago

AI code review will very probably have signal-to-noise problems. It is good to see practical solutions aimed at addressing this. I wonder if fine-tuning models would help - this isn't addressed in the blog.

esafak 20 hours ago

Is there a benchmark for AI code suggestions, or some comparative review? I don't have the time to test them all, and I wonder whether I could be using a better one.

archon810 21 hours ago

I'm intrigued by Qodo after discovering them through this blog post.

Does anyone have experience using its free tier, say, with GitHub?

  • gronky_ 18 hours ago

    We started using it recently at my work. The code-changes walkthrough is nice.