vunderba 2 minutes ago

From the article:

> Claude declared victory and pointed me to the output/result.mmd file, which contained only whitespace. So OCR had worked but the result had failed to be written correctly to disk.

Given the importance of TDD in this style of continual agentic loop, I was a bit surprised to see that the author seems to have provided only an input, not an actual expected output.

Granted, this is more difficult with OCR since you really don't know how well DeepSeek-OCR might perform, but a simple Jaccard sanity test between a very legible input image and its expected output text would have made it a little more interesting and hands-off.
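
A rough sketch of such a check, in Python (expected.txt is a hypothetical reference transcription; output/result.mmd is the file named in the article; whitespace tokenization is the crudest choice that could work):

    # Jaccard sanity check between a reference transcription and the OCR output.
    def jaccard(a: set, b: set) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    expected = set(open("expected.txt").read().split())     # hypothetical reference file
    actual = set(open("output/result.mmd").read().split())  # file named in the article

    score = jaccard(expected, actual)
    assert score > 0.8, f"OCR output diverges from reference: {score:.2f}"

An all-whitespace result.mmd tokenizes to the empty set and scores 0.0, failing immediately, which is exactly the bug the article describes.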

bahmboo 6 hours ago

I see a lot of snark in the comments. Simon is a researcher and I really like seeing his experiments! Sounds like the goal here was to delegate a discrete task to an LLM and have it solve the problem, much as one would task a junior dev with the same.

And like a junior dev it ran into some problems and needed some nudges. Also like a junior dev it consumed energy resources while doing it.

In the end I like that the chunk size of work that we can delegate to LLMs is getting larger.

  • Upvoter33 6 hours ago

    No offense, but I hate all the comparisons to a "junior dev" that I see out there. This process is just like any dev's! I mean, who wouldn't have to tinker around a bit to get some piece of software to work? Is there a human out there who would just magically type all the right things, no errors, first try?

    • solumos 6 hours ago

      > And like a junior dev it ran into some problems and needed some nudges.

      There are people who don't get blocked waiting for external input in order to get tasks like this done, which I think is the intended comparison. There's a level of intuition that junior devs and LLMs don't have that senior devs do.

      • the-grump 4 hours ago

        To offer a counterpoint, I had much better intuition as a junior than I do now, and it was also better than the seniors on my team.

        Sometimes looking at the same type of code and the same infra day in and day out makes you rusty. In my olden days, I did something different every week, and I had more free time to experiment.

        • baq 5 minutes ago

          Hobby coding is imho a high-entropy signal that you joined the workforce with a junior title but basically senior experience. It's what I see from kids who learned programming young out of curiosity versus those who only started learning in university. IOW, I suspect you were not a junior in anything but name and pay.

          There’s also a factor of the young being very confident that they’re right ;)

    • conradev 2 hours ago

      Codex is actually pretty good at getting things working and unblocking itself.

      It’s just that when I review the code, I would do things differently because the agent doesn’t have experience with our codebase. Although it is getting better at in-context learning from the existing code, it is still seeing all of it for the “first time”.

      It’s not a junior dev, it’s just a dev perpetually in their first week at a new job. A pretty skilled one, at that!

      And a lot of things translate. How well do you onboard new engineers? Well-written code is easier to read and modify, tests help maintain correctness while showing examples, etc.

    • pedrosorio an hour ago

      > Is there a human out there who would just magically type all the right things - no errors - first try?

      If they know what they're doing and it's not an exploratory task where the most efficient way to do it is by trial and error? Quite a few. Not always, but often.

      That skill seems to have very little value in today's world though.

    • bahmboo 5 hours ago

      Point taken, and I should have known better. I fully agree with you. I suppose I should say "inexperienced dev" or something more accurate. Having worked with many inexperienced devs, I saw quite a spread in capabilities. Using terms that are dismissive to individuals is not helpful.

amirhirsch 42 minutes ago

I also use Claude Code to install CUDA, PyTorch, and HuggingFace models on my quad-A100 machine. It shouldn't feel like debugging a 2000s Linux driver.

HuggingFace has incredible reach but poor UX, and PyTorch installs remain fragile. There’s real space here for a platform that makes this all seamless, maybe even something that auto-updates a local SSD with fresh models to try every day.

qingcharles 5 hours ago

I did the opposite yesterday. I used GPT-5 to brute-force dotnet into Claude Code for Web, which eventually involved it writing an entire HTTP proxy in Python to download NuGet packages.
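
The core of that trick is small. A toy sketch (hypothetical code, not the actual proxy from this comment; assumes the public nuget.org feed as upstream):

    # Toy sketch: a local HTTP proxy that fetches NuGet packages on behalf
    # of a sandboxed environment that can't reach the feed directly.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen

    UPSTREAM = "https://api.nuget.org"  # assumption: the public NuGet feed

    class Proxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # Relay the request path upstream and echo the bytes back.
            with urlopen(UPSTREAM + self.path) as resp:
                body = resp.read()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("127.0.0.1", 8080), Proxy).serve_forever()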

hkt 21 minutes ago

> There’s honestly so much material in the resulting notes created by Claude that I haven’t reviewed all of it

I've had the same "problem" and feel like this is the major hazard involved. It is tricky to validate the written work Claude (or any other LLM) produces, given its high verbosity and the temptation to say "well, it works!"

As ever though, it is impressive what we can do with these things.

If I were Simon, I might have asked Claude (as a follow-up) to create a minimal Ansible playbook, or something of that nature. That might also be more concise and readable than the notes!

BoredPositron 9 hours ago

Compute well spent... figuring out how to download a version- and hardware-appropriate wheel.

  • prodigycorp 2 hours ago

    Don't ask how many human compute hours are spent figuring this out.

  • Zopieux 7 hours ago

    Gotta keep the hype up!

cat_plus_plus 6 hours ago

No idea why Nvidia has such crusty torch prebuilds on their own hardware. I just finished installing unsloth on a Thor box for some finetuning; it's a lengthy build marathon, thankfully aided for the most part by Grok supplying commands and environment variables (one finishing touch is to install the latest CUDA from the NVIDIA website and then replace the compiler executables in the triton package with newer ones from CUDA).

  • htrp 6 hours ago

    Serious q: why Grok vs another frontier model?

    • cat_plus_plus 2 hours ago

      Grok browses a large number of websites for queries that need recent information, which is super handy for new hardware like Thor.

varispeed 6 hours ago

Am I the only one seeing this Nvidia Spark as meh?

I had it in my cart, but then I watched a few videos from influencers, and it looks like the power of this thing doesn't match the hype.

  • dumbmrblah 6 hours ago

    For inference you might as well get a Strix Halo for half the price.

  • throwaway48476 5 hours ago

    It's also going to be unsupported after a few years.

syntaxing a day ago

Ehh, is it cool and time-saving that it figured it out? Yes. But the solution was to get a “better” prebuilt wheel of PyTorch. This is a relatively “easy” problem to solve (though figuring out that this was the problem does take time). But it’s (probably; I can’t afford one) going to be painful when you want to upgrade the CUDA version or pin a specific one. Unlike a typical PC, you’re going to need to build a new image and flash it. I would be more impressed when an LLM can do this end to end for you.

  • sh3rl0ck a day ago

    PyTorch + CUDA is a headache I've seen a lot of people have at my uni, and one I've never had to deal with, thanks to uv. Good tooling really does go a long way in these things.

    Although I must say that for certain Docker passthrough cases, the debugging logs just aren't as detailed.

    • ComputerGuru 8 hours ago

      uv doesn’t fundamentally solve the issues. It didn’t invent venv or pip.

      What fundamentally solves the issue is to use an ONNX version of the model.

      • simonw 8 hours ago

        Do you know if it's possible to run ONNX versions of models on a Mac?

        I should try those on the NVIDIA Spark; it would be interesting to see if they are easy to work with on ARM64.

        • ComputerGuru 3 hours ago

          Yup. The beauty of it is that the underlying AI accelerator/hardware is completely abstracted away. There’s a CoreML ONNX execution provider, though I haven’t used it.

          No more fighting with hardcoded cuda:0 everywhere.
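
          A minimal sketch of what provider selection looks like (the file name, input name, and shape here are hypothetical, model-specific details):

              import numpy as np
              import onnxruntime as ort

              # onnxruntime works down the provider list and uses the first
              # one available on the machine, so the same script can run on
              # NVIDIA, Apple, or plain-CPU hardware.
              sess = ort.InferenceSession(
                  "model.onnx",
                  providers=[
                      "CUDAExecutionProvider",    # NVIDIA GPUs
                      "CoreMLExecutionProvider",  # Apple hardware
                      "CPUExecutionProvider",     # fallback everywhere
                  ],
              )

              dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # model-specific shape
              outputs = sess.run(None, {"input": dummy})            # model-specific input name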

          The only pain point is that you’ll often have to manually convert a PyTorch model from Hugging Face to ONNX yourself unless it’s very popular.
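
          The manual conversion is often just a few lines with torch.onnx.export (Hugging Face's optimum library also ships exporters for popular architectures). A sketch, assuming a hypothetical model id; real models typically need dynamic axes and a representative example input:

              import torch
              from transformers import AutoModel

              model = AutoModel.from_pretrained("some-org/some-model")  # hypothetical id
              model.config.return_dict = False  # tuple outputs trace more cleanly
              model.eval()

              dummy = torch.zeros(1, 16, dtype=torch.long)  # token ids; model-specific
              torch.onnx.export(model, (dummy,), "model.onnx", input_names=["input_ids"])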

  • cat_plus_plus 6 hours ago

    You can still upgrade CUDA within the forward-compatibility range and install new packages without reflashing.