Show HN: Tool to Automatically Create Organized Commits for PRs

github.com

74 points by edverma2 a day ago

I've found it helps PR reviewers when they can look through a set of commits with clear messages and logically organized changes. Reviewers typically prefer many small changes over a few large ones. Breaking a change into sufficiently small PRs can get really messy, so thoughtful commits are a great way to further subdivide the changes within a PR. Doing this by hand can be pretty time-consuming, though, so this tool automates the process with the help of AI.

The tool sends the diff of your git branch against a base branch to an LLM provider. The LLM provider responds with a set of suggested commits with sensible commit messages, change groupings, and descriptions. When you explicitly accept the proposed changes, the tool rewrites the commit history on your branch to match the LLM's suggestion. Then you can force-push your branch so the remote matches.
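A minimal sketch of what checking the LLM's answer might involve, in Python. It assumes, hypothetically, that the provider returns JSON shaped like `{"commits": [{"message": ..., "files": [...]}]}` and that changes are grouped by whole file; neither the schema nor `validate_plan` comes from the tool itself. What it illustrates is that a proposed split should be rejected unless every changed file lands in exactly one commit, so nothing in the diff is silently dropped or duplicated.

```python
from __future__ import annotations

import json


def validate_plan(raw_response: str, changed_files: set[str]) -> list[dict]:
    """Parse a hypothetical LLM commit plan and check that it covers the diff.

    Assumed schema: {"commits": [{"message": str, "files": [str, ...]}]}.
    Raises ValueError if a file is missing, unknown, or assigned twice.
    """
    plan = json.loads(raw_response)
    commits = plan["commits"]
    seen: set[str] = set()
    for commit in commits:
        if not commit.get("message", "").strip():
            raise ValueError("commit with empty message")
        for path in commit["files"]:
            if path not in changed_files:
                raise ValueError(f"plan mentions a file outside the diff: {path}")
            if path in seen:
                raise ValueError(f"file assigned to two commits: {path}")
            seen.add(path)
    missing = changed_files - seen
    if missing:
        raise ValueError(f"files dropped from the plan: {sorted(missing)}")
    return commits


if __name__ == "__main__":
    changed = {"auth/login.py", "tests/test_login.py", "README.md"}
    response = json.dumps({"commits": [
        {"message": "Add login form handler", "files": ["auth/login.py"]},
        {"message": "Add tests for login handler", "files": ["tests/test_login.py"]},
        {"message": "Document login setup", "files": ["README.md"]},
    ]})
    for c in validate_plan(response, changed):
        print(c["message"])
```

Grouping by whole file is the simplest case; splitting the hunks of one file across several commits would need a finer-grained schema (hunk indices, for example), which this sketch does not attempt.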

The default AI provider is your locally running Ollama server. Cloud providers can be explicitly configured via a CLI argument or in a config file, but keeping local models as the default helps protect against unintentional data sharing. The tool always creates a backup branch so you can easily revert if you change your mind or something goes wrong during the rewrite. Note that rewriting the history of a remote branch requires a force push, which your team/org will need to be OK with. On a feature branch this is usually fine, but it's always worth checking if you are not sure.
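The backup-then-rewrite flow described above could look roughly like the following Python sketch. It only builds the git command lines (as argv lists) rather than running them. The backup branch naming scheme, the mixed reset to the merge base, and the final `git diff --quiet` tree check are assumptions about how such a tool might work, not a description of this tool's code. Using `--force-with-lease` instead of a bare `--force` is a common safeguard: the push is refused if the remote branch moved since you last fetched it.

```python
from __future__ import annotations

import shlex


def rewrite_commands(branch: str, merge_base: str, commits: list[dict],
                     remote: str = "origin") -> list[list[str]]:
    """Build (but do not run) git commands that rewrite `branch` as `commits`.

    Assumes `branch` is checked out with a clean working tree, that each
    commit dict looks like {"message": str, "files": [str, ...]}, and that
    `merge_base` is the output of `git merge-base <base> HEAD`.
    """
    backup = f"{branch}-backup"  # hypothetical naming scheme
    cmds = [
        ["git", "branch", backup],                # safety net: keeps the old history
        ["git", "reset", "--mixed", merge_base],  # drop the commits, keep the working tree
    ]
    for commit in commits:
        # -A also stages deletions and new files for these paths.
        cmds.append(["git", "add", "-A", "--", *commit["files"]])
        cmds.append(["git", "commit", "-m", commit["message"]])
    # The rewritten branch must end at exactly the same tree as before;
    # `git diff --quiet` exits non-zero if it does not.
    cmds.append(["git", "diff", "--quiet", backup, "HEAD"])
    # Refuses to overwrite the remote if someone else pushed in the meantime.
    cmds.append(["git", "push", "--force-with-lease", remote, branch])
    return cmds


if __name__ == "__main__":
    plan = [
        {"message": "Add login handler", "files": ["auth/login.py"]},
        {"message": "Add login tests", "files": ["tests/test_login.py"]},
    ]
    for cmd in rewrite_commands("feature/login", "abc1234", plan):
        print(shlex.join(cmd))
    # To undo everything: git reset --hard feature/login-backup
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order, so the sequence stops at the first failure, including a non-empty tree diff, before anything is pushed.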

9dev a day ago

The idea in itself seems good, but I have a lot of hesitation due to the prompt used to rewrite the commit messages. Even looking at the example in the repo, there’s this sycophantic, pompous way of describing mundane things that adds nothing and only makes it harder to understand what has changed. The commits mentioned don’t "implement a complete auth system" and did not add "comprehensive test coverage". They added parts of an authentication system and some tests.

I’m all for proper commit messages, but only if they add clarity, not take it away.

  • diggan 20 hours ago

    A bit more general complaint: I keep seeing projects that use system/user prompts and don't let you override them other than by manually modifying the project yourself.

    Since models react so differently to different prompts, and people have different requirements, I feel that a basic requirement for a generally useful LLM-based tool is that it lets you override both the system and user prompts wherever they're used.

    I remember coming across this with Aider as well, and seeing some things that were out of place (for me) in the prompts, but with no way of changing it, I had to rewire things myself so I could override them.

scottgg a day ago

Since moving to jj[1] as a git-compatible alternative, I’ve found it so easy to make clean commits that I do it by default for everything: usually 1/ refactor, 2/ impl, 3/ docs. Because you can always just “jj new” on top of an existing change, then squash it down and get an automatic rebase of everything past that point, it’s quick to keep things organised, and it makes review life suck less.

[1] https://github.com/jj-vcs/jj

  • diggan 20 hours ago

    > you can always just “jj new” on top of an existing change then squash it down and get automatic rebase past that point

    Never used jj, but isn't that just `git commit --amend`? It lets you add/remove/change the contents of the previous commit by basically overwriting it with a new changeset+message.

    • scottgg 14 hours ago

      You can do it to any change in the repo, and everything downstream of it gets automatically rebased, so if you have three nicely structured changes you can go back and change the first one if you need to, and it just works.

      I’m selling it short a bit - it does a lot more! There’s a great Steve Klabnik tutorial [1]. To me the main thing is it makes it very easy to think and work in terms of logical changes.

      [1] https://steveklabnik.github.io/jujutsu-tutorial/

    • trust_bt_verify 16 hours ago

      Sounds like it may be closer to ‘git commit --fixup HEAD’ but same idea.
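For git users, the closest plain-git counterpart to the jj workflow discussed in this subthread is probably a fixup commit followed by an autosquash rebase. `git commit --fixup <sha>` records a commit whose subject is `fixup! <original subject>`, and `git rebase -i --autosquash <sha>~1` moves it next to its target, melds it in, and replays the later commits on top. The Python sketch below only builds those commands; setting `GIT_SEQUENCE_EDITOR=true` accepts the generated todo list without opening an editor. It assumes the fix is already staged and that the target is not the root commit. Unlike jj, git will stop mid-rebase if a later commit conflicts.

```python
from __future__ import annotations

import shlex


def fixup_commands(target_sha: str) -> list[tuple[list[str], dict[str, str]]]:
    """Build git commands that fold the staged changes into an earlier commit.

    Returns (argv, extra_env) pairs. Assumes the fix is already staged and
    that `target_sha` is not the root commit (we rebase from its parent).
    """
    return [
        # Creates a commit titled "fixup! <subject of target_sha>".
        (["git", "commit", "--fixup", target_sha], {}),
        # Moves fixup! commits next to their targets and squashes them;
        # `true` as the sequence editor accepts the todo list unchanged.
        (["git", "rebase", "-i", "--autosquash", f"{target_sha}~1"],
         {"GIT_SEQUENCE_EDITOR": "true"}),
    ]


if __name__ == "__main__":
    for argv, env in fixup_commands("abc1234"):
        prefix = " ".join(f"{key}={value}" for key, value in env.items())
        print((prefix + " " if prefix else "") + shlex.join(argv))
```

Each pair could be run with `subprocess.run(argv, env={**os.environ, **extra_env}, check=True)`.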

amenghra a day ago

https://graphite.dev/ provides a way to stack PRs, it's been discussed on HN in the past (e.g. https://news.ycombinator.com/item?id=30681308).

  • mindfulmark 21 hours ago

    I’ve reviewed stacked PRs a couple of times and found it pretty terrible. The only one that ends up making any sense is the first. You're better off with either one single big PR, or not asking anyone to look at the next PR until the first one is merged.

  • adobrawy a day ago

    At my company we use the squash merge strategy for PRs. This makes individual commits irrelevant; only the PR as a whole matters. I use git town for stacked PRs. It's very nice to start another branch when I've finished a logical stage, because small changes get reviewed and I merge often. When I have fixes, "git town sync" propagates them up the stack automatically.

stpedgwdgfhgdd 18 hours ago

Probably not a popular take on HN: valuable information should not be hidden in commits but written down in comments. The WHY in particular is crucial to record in comments.

I’m fine with people squashing as long as the reasoning is recorded in comments and reflected in automated tests (unit AND system/API).

This is also crucial information for AI coding tools.

bredren a day ago

This is a cool idea, though part of what keeps my work organized, and keeps me understanding my own changes, is manually preparing a series of logical commits.

I use PyCharm's interactive partial commits when a single file has changes related to different ideas, and I rewrite history for clarity.

It does matter to me whether someone else has gone to this trouble, and a seemingly sloppy commit history is sometimes a tip-off about a person.

It makes sense that this project exists but I’m also glad it didn’t when I was learning to work in professional environments.

CityOfThrowaway a day ago

I haven't tried this yet (though I plan to).

One thing I would love is being able to give it a hint and have it extract certain types of changes into their own branch that could be split into a new PR.

I often find myself adding a new, re-usable component or doing a small refactor in the middle of a project. When you're a few commits into a project and start doing side-quests, it's super annoying to untangle that work.

The options are one of:

1. A mega PR (which everybody hates)
2. Methodically untangling the side quest post hoc
3. Not doing it

In principle, the "right" thing to do would be to check out main, do the side quest, get it merged, and then continue.

But that's annoying, and I'd rather just jam through, have AI untangle it, and then stack the commits (à la Graphite).

It's easy to verbally explain what stuff is side-quest vs. main quest but it's super annoying to actually do the untangling.

Maybe this tool can magically do that... but I do wonder whether some context hints from the dev would help make it more effective.
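One way to picture the "context hint" idea is a path-based hint, sketched below in Python under the big, hypothetical assumption that the side quest lives in separate files: the developer supplies glob patterns such as `components/*`, and the changed files are split into side-quest and main-quest groups. Real side quests often touch the same files as the main work, which is where an LLM (or `git add -p`) would have to split individual hunks instead; this sketch does not attempt that, and `split_side_quest` is an illustrative name, not part of the tool.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def split_side_quest(changed_files: list[str],
                     hints: list[str]) -> tuple[list[str], list[str]]:
    """Split changed paths into (side_quest, main_quest) using glob hints.

    Note: fnmatch's `*` also matches "/", so "components/*" covers subfolders.
    """
    side: list[str] = []
    main: list[str] = []
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in hints):
            side.append(path)
        else:
            main.append(path)
    return side, main


if __name__ == "__main__":
    changed = ["components/Button.tsx", "components/icons/Close.tsx",
               "pages/checkout.tsx", "api/orders.py"]
    side, main = split_side_quest(changed, ["components/*"])
    print("side quest:", side)
    print("main quest:", main)
```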

  • matijsvzuijlen a day ago

    In such cases, I generally check out main, do the side quest, and rebase my original branch on top of it. Then I can just continue without waiting for the side quest to be merged. Depending on the situation, I may make a separate PR for the side quest, but I don't have to wait for it to get merged.

    Alternatively, if you don't like rebasing, you can merge the side quest branch into your project branch instead.

    Either way, you don't have to wait for the side quest to get merged.

  • edverma2 a day ago

    Interesting! I’ve faced the same problem where I have a mega PR and spend a lot of time breaking that up into separate PRs. I agree that what you are suggesting is a different but related problem to what this tool currently solves. I’ll start thinking through how this would look, and I’ll go ahead and make a GitHub issue if you or anyone else wants to start a discussion there.

  • 2YwaZHXV a day ago

    I haven't tried it much, but this seems like precisely the problem GitButler is trying to solve.
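matijsvzuijlen's approach (branch the side quest off main, then rebase the original branch onto it) can be written out as a short command sequence. This Python sketch only builds the commands, and the branch names are placeholders. `git rebase <upstream> <branch>` checks out `<branch>` and replays its own commits on top of `<upstream>`, so the feature branch picks up the side-quest commits without waiting for a review; the merge-based alternative from the same comment is available via the hypothetical `use_merge=True` flag.

```python
from __future__ import annotations

import shlex


def side_quest_commands(feature: str, side: str, base: str = "main",
                        use_merge: bool = False) -> list[list[str]]:
    """Build (not run) git commands for the "side quest off main" workflow.

    Committing the side-quest work itself happens after the first two
    steps and before the last one; it is left to the developer.
    """
    cmds = [
        ["git", "switch", base],        # git >= 2.23; `git checkout` also works
        ["git", "switch", "-c", side],  # new branch for the side quest
    ]
    if use_merge:
        # The non-rebasing alternative: merge the side quest into the feature.
        cmds += [["git", "switch", feature], ["git", "merge", side]]
    else:
        # Checks out `feature` and replays its own commits on top of `side`.
        cmds.append(["git", "rebase", side, feature])
    return cmds


if __name__ == "__main__":
    for cmd in side_quest_commands("feature/checkout", "side/button-component"):
        print(shlex.join(cmd))
```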

aaronbrethorst a day ago

Very cool, I’ve been looking for something like this, but I’d love to see this flipped around and become an MCP tool that can be consumed by an LLM instead of requiring an API key.

esafak a day ago

I want something that takes a big commit and splits it up!

jaredsohn a day ago

I didn't look at things too closely, but it would be nice if each commit included a ticket number from the branch (such as a Linear ID) and/or the PR ID, for people who do not squash.

One huge advantage of squashing branches is that when you see a commit in `git blame`, you have an idea of where it came from within GitHub/Linear/other systems.

  • figmert a day ago

    Surely this would be the job of a prepare-commit-msg/commit-msg hook?

    • jaredsohn a day ago

      Yeah, could be a separate tool. Another tool could be a GitHub Action that checks whether the IDs are in each commit; I built a prototype of that for work, but it needs some polish and permission for a more general release.

      I think these tools together make non-squash PRs much easier to deal with for people who don't want to spend extra time tidying things up.

      BTW, really excited to see this. I was thinking about the concept in a thread last year https://news.ycombinator.com/context?id=40765134

      "I look forward to the day that I can run a local LLM (for confidentiality reasons) to automatically reorganize commits within my PR. Should be very safe compared to generating code or merging/rebasing other code since it is just changing the grouping of commits and the final code should be unchanged."

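The hook idea figmert and jaredsohn discuss could be sketched as a small Python `prepare-commit-msg` hook. Git runs that hook with the path of the commit-message file as its first argument and, optionally, the message source as its second. Everything else here is an assumption: that ticket IDs look like Linear/Jira-style keys such as `ENG-123`, and that they appear in the branch name (e.g. `eng-123-fix-login`). The same `find_ticket` helper could back the CI check jaredsohn mentions by running it over each commit message in a PR.

```python
#!/usr/bin/env python3
"""Hypothetical prepare-commit-msg hook: prefix commits with the branch's ticket ID."""
from __future__ import annotations

import re
import subprocess
import sys
from typing import Optional

# Assumed ID shape: a letter-led project key, a dash, digits (e.g. ENG-123).
TICKET_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9]*-\d+)\b")

# Subjects git's --autosquash relies on; prefixing them would break it.
AUTOSQUASH_PREFIXES = ("fixup! ", "squash! ", "amend! ")


def find_ticket(text: str) -> Optional[str]:
    """Return the first ticket-like ID in `text`, upper-cased, or None."""
    match = TICKET_RE.search(text)
    return match.group(1).upper() if match else None


def add_ticket(message: str, branch: str) -> str:
    """Prefix `message` with the branch's ticket ID unless it is already there."""
    ticket = find_ticket(branch)
    if ticket is None or message.startswith(AUTOSQUASH_PREFIXES):
        return message
    if ticket in {m.upper() for m in TICKET_RE.findall(message)}:
        return message
    return f"{ticket}: {message}"


def main(msg_path: str) -> None:
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    with open(msg_path, encoding="utf-8") as f:
        message = f.read()
    with open(msg_path, "w", encoding="utf-8") as f:
        f.write(add_ticket(message, branch))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Leave merge and squash messages generated by git itself alone.
    if len(sys.argv) < 3 or sys.argv[2] not in ("merge", "squash"):
        main(sys.argv[1])
```

Saved as `.git/hooks/prepare-commit-msg` and made executable, it would run on every local commit.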
iandanforth a day ago

Can I suggest, don't do this? The sustainable unit of code modification is the ticket, not the commit. When you're ready to merge to main, squash all commits into one that takes the ticket title as its commit message and appends the ticket description as its description. This aligns code changes with planned, scoped, and documented units of work. Anything more granular than that quickly becomes noise. By following the above pattern your main commit history becomes a clean, consistent log of tickets being completed, each linking directly back to your ticket management system.
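iandanforth's convention (ticket title as the subject, ticket description as the body, with a link back to the tracker) could be captured in a small message builder like the Python sketch below. The `Ticket` fields and the URL are hypothetical placeholders; the only formatting convention assumed is git's usual one of a short subject line, a blank line, then a wrapped body.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str          # hypothetical tracker key, e.g. "ENG-123"
    title: str
    description: str
    url: str          # link back to the ticket management system


def squash_message(ticket: Ticket, width: int = 72) -> str:
    """Subject from the ticket title, body from its description plus a link."""
    subject = f"{ticket.key}: {ticket.title}"
    paragraphs = [
        # Re-wrap each description paragraph; never split long words such as URLs.
        textwrap.fill(p, width, break_long_words=False, break_on_hyphens=False)
        for p in ticket.description.split("\n\n")
        if p.strip()
    ]
    body = "\n\n".join(paragraphs + [ticket.url])
    return f"{subject}\n\n{body}\n"


if __name__ == "__main__":
    ticket = Ticket(
        key="ENG-123",
        title="Add password reset flow",
        description="Users can request a reset link by email.\n\nLinks expire after one hour.",
        url="https://tracker.example.com/ENG-123",
    )
    print(squash_message(ticket))
```

After a `git merge --squash`, the result could be fed to `git commit -F -` on stdin, or pasted into a forge's squash-merge dialog.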

  • Fire-Dragon-DoL a day ago

    I don't get it. Assuming your commits have meaningful messages and descriptions, why squash them? You can use git log --merges-only (I think that's the arg name) and all the sub-commits are hidden, yet you can still search through them to get a clear idea of why something was refactored a certain way.

    So where is the noise, if it can be hidden but also used to get additional information on code changes?

    • dakiol a day ago

      In my experience, when people need to know which commit introduced which code, the IDE is the main driver: you see the line of code in the editor and immediately know which commit introduced it (e.g., via the git integration in the gutter). Typically you also want to know all the other lines of code that were committed together (e.g., what does it take to implement feature X? Ah yeah, all these changes).

      I couldn’t care less what changes were introduced in a particular commit (what for? Debugging? Your features should be small enough that debugging them shouldn’t be a pita anyway)

    • baq a day ago

      If you need to use --merges-only you might as well squash everything, why bother.

      • Fire-Dragon-DoL a day ago

        I use merges-only when I need to look at the history "index" and I use the full history when I need to find out why a certain line was changed, kinda like a book has an index but there are also all the chapters with all the content

  • CityOfThrowaway a day ago

    The best practice that I've seen done (and enjoy) is:

    1. Always squash and merge
    2. But have tidy commits
    3. So your squashed commit has a useful list of what is in the commit

  • collingreen a day ago

    I don't agree with most of this but my good faith take is that my experience trends closer to your opinion the more stable and large a codebase is plus how dedicated and multidisciplinary the team is. I disagree the most for young, volatile projects with small eng-only teams that are responsible for many projects at once.

  • bredren a day ago

    This project seems focused on preparing branches for review by giving them clearly communicated commit histories.

    I think the intent is that it be up to the org to decide whether they’ll accept a stream of commits to merge or squash merge after approval.
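On the flag Fire-Dragon-DoL was reaching for: in git, `git log --merges` lists only merge commits, and `git log --first-parent` follows only the first parent of each merge. On a mainline built from merge commits, the latter gives the "index" view described above while the branch commits stay in history for plain `git log` and `git blame` to find. The toy Python model below (an in-memory graph, not git's real object model) shows which commits those walks select.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    """Toy commit node; parents[0] is the commit the merge was made on."""
    sha: str
    subject: str
    parents: list[str] = field(default_factory=list)


def first_parent_log(graph: dict[str, Commit], head: str) -> list[str]:
    """Subjects along the first-parent chain, newest first (like --first-parent)."""
    subjects = []
    current = head
    while current is not None:
        commit = graph[current]
        subjects.append(commit.subject)
        current = commit.parents[0] if commit.parents else None
    return subjects


def reachable(graph: dict[str, Commit], head: str) -> set[str]:
    """Every ancestor via any parent: what a plain `git log` walks."""
    seen: set[str] = set()
    stack = [head]
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(graph[sha].parents)
    return seen


def merge_commits(graph: dict[str, Commit], head: str) -> set[str]:
    """Like `git log --merges`: reachable commits with two or more parents."""
    return {sha for sha in reachable(graph, head) if len(graph[sha].parents) >= 2}


if __name__ == "__main__":
    graph = {
        "a": Commit("a", "Initial commit"),
        "b": Commit("b", "Tweak config", ["a"]),
        "c": Commit("c", "Add parser", ["a"]),
        "d": Commit("d", "Test parser", ["c"]),
        "m": Commit("m", "Merge branch 'parser'", ["b", "d"]),
    }
    print(first_parent_log(graph, "m"))   # the "index" view
    print(sorted(reachable(graph, "m")))  # everything, including c and d
```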

kelseydh a day ago

I would love Github to integrate this, as that is typically where I am writing my squash commit messages (when merging Pull Requests).

Cthulhu_ a day ago

In our org we squash all commits into one anyway, and the main commit title is based on the title of the merge request. We also have an AI code review tool set up (which I usually ignore because it produces a lot of extraneous information) that suggests a new title. The people making the MR often don't consider that the title ends up in the changelog, where it becomes the one line that users of the library rely on to decide whether they need to do something.

3D39739091 20 hours ago

Why not just do your work in an organized way in the first place?

bckr a day ago

I was just talking about writing a tool like this. Bookmarked. Thanks.

maomaomiumiu a day ago

Interesting tool. Automating commit cleanup could save time, especially before reviews. Curious to see how well it handles larger or more complex diffs.

bananapub a day ago

I had a look through the tests, but it doesn’t seem you do any testing of whether this does a good job or not?

How did you collate hundreds or thousands of examples of commits being split up and how did you score the results LLMs gave you? Or did it take more than that?

nrvn a day ago

In my org I have enforced linear history, squashing all commits into one in PRs and roughly following the rule from [1]:

> If the request is accepted, all commits will be squashed, and the final commit description will be composed by concatenating the pull request's title and description.

One less thing to think about.

Less is more, not vice versa.

[1]: https://go.dev/doc/contribute#review

  • f1shy 21 hours ago

    Always squashing is a terrible idea. Where I work, some people insist on doing that, and dozens of times already we have lost valuable information. Let me quasi-quote somebody with some knowledge about git:

    Linus Torvalds generally prefers not to squash commits when merging pull requests into the Linux kernel, especially when the individual commits have valuable information or context. He believes that squashing can discard useful history and make it harder to understand the evolution of the code. However, he also acknowledges that squashing can be useful in certain situations, such as when dealing with a large number of commits that are not relevant to the main development history.

    • scottgg 20 hours ago

      I reckon that nice commits breaking up a big PR that can be merged straight to main is the best outcome.

      But - I also think that always squashing is a natural reaction to "twiddling with the past" being difficult (but possible!) with git - e.g., you start with good intentions, you have your nice commit messages, but inevitably you need to go back and make some changes to changes and the "chore: unfuck it for real this time" style "fixup" commits start creeping in and you throw your hands in the air in despair rather than dare to cross `git rebase` once more.

    • claytonjy 18 hours ago

      I used to like allowing squashing or fast-forward merges. Most PRs would be squashed, because most developers write terrible commit messages and use merge where they could rebase. But, if you had a well-crafted set of commits, we could retain them.

      I’ve recently switched to using conventional commits and release-please everywhere, but that pretty much forces us into a squash-only world, since even the devs who write nice commit messages don’t want to make each commit a conventional commit; it’s much nicer to do it in the PR title, and more visible.
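A rough Python sketch of the kind of PR-title check a conventional-commits plus release-please setup tends to rely on. The regex follows the header shape from the Conventional Commits spec (`type(scope)!: description`), and the version mapping is the spec's own (`fix` maps to a patch, `feat` to a minor, `!` to a major release). The list of allowed types is an assumption, breaking changes declared only in a `BREAKING CHANGE:` footer are not handled, and this is not release-please's actual parser.

```python
from __future__ import annotations

import re
from typing import Optional

# Assumed allow-list; the spec only gives meaning to `feat` and `fix`.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test",
         "build", "ci", "chore", "revert")

# type, optional (scope), optional "!" for breaking changes, ": ", description.
# The spec asks implementations not to treat these units as case sensitive.
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$",
    re.IGNORECASE,
)


def parse_title(title: str) -> Optional[dict]:
    """Split a conventional-commit header into its parts, or return None."""
    match = HEADER_RE.match(title)
    if match is None:
        return None
    return {
        "type": match.group("type").lower(),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }


def version_bump(title: str) -> Optional[str]:
    """SemVer bump implied by a PR title, or None for non-releasing types."""
    parsed = parse_title(title)
    if parsed is None:
        raise ValueError(f"not a conventional commit title: {title!r}")
    if parsed["breaking"]:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(parsed["type"])


if __name__ == "__main__":
    for title in ["feat(auth): add password reset", "fix: handle empty diff",
                  "refactor!: drop legacy config", "docs: fix typo"]:
        print(title, "->", version_bump(title))
```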

h1fra a day ago

I'll never understand people caring about commits in PRs; just push whatever and squash at the end. If commits matter, that means you should have done multiple PRs.

  • jon-wood 21 hours ago

    And I will never understand people not caring about commits and just going "we'll squash it all anyway". I think of these things as different levels of granularity, the PR is a complete feature, while the commits are the steps taken to get to that feature. By splitting your PR into coherent commits you make review easier by allowing each commit to be reviewed in isolation.

    It will forever infuriate me that Github's code review UI buries commits in favour of a big blob of changes without context, more so because that UI is generally considered Good Enough by most people, discouraging any innovation around code review.

    • habosa 19 hours ago

      Actually, at this point GitHub’s total failure to make a decent code review UI has inspired a lot of innovation!

      I think graphite.dev is the most well-known, but I’m also a fan of others like reviewable.io and codepeer.com… I know this space well because I made codeapprove.com to improve code review on GitHub.