Love, Beej
((cd /tmp/t; find . -type f -print) | sort | while read f; do cmp -s {/tmp/t,/tmp/t1}/$f || vim -f -d {/tmp/t,/tmp/t1}/$f 0<&9 || break; done) 9<&0
Typing ^C to vim doesn't get you very far, so if you make a mistake that causes the loop to return thousands of files, you are in for a bit of pain without :cq. The :cq triggers the break, exiting the loop.
Section 5.1 (https://beej.us/guide/bggit/html/split/branches-and-fast-for...)
> The default branch is called main.
> The default branch used to be called master, and still is called that in some older repos.
This is not true. Git still defaults to `master`, but allows you to change the default for future `git init` invocations via `git config --global init.defaultBranch <name>`.
See https://github.com/git/git/blob/bc204b742735ae06f65bb20291c9...
Again, thank you. If I find anything else, I will be sure to post here.
*Update*: I also feel that referring to "older repos" sends the wrong message. *GitHub* decided to make this change, causing others to follow, and finally Git itself allows for the aforementioned configuration. It has little to do with _newer_ or _older_ and much more to do with preference.
> hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 'development'. The just-created branch can be renamed via this command:
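For anyone who wants to check this locally, here is a minimal sketch. It assumes Git 2.28 or later (where `init.defaultBranch` exists), and the branch name `trunk` and the temporary directory are just placeholders. It uses `-c` so that your real global config is left alone.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)

# -c applies the setting to this one invocation only; ~/.gitconfig is untouched.
git -c init.defaultBranch=trunk init -q "$tmp/repo"

# HEAD is a symbolic ref even before the first commit exists.
branch=$(git -C "$tmp/repo" symbolic-ref --short HEAD)
echo "initial branch: $branch"
```

Running plain `git init` without the override should show whatever your installation currently defaults to.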
That's going to make it more "interesting" to write the fix, that's for sure.
Thanks!
All that said, they are really useful. And, honestly, the chapter would be pretty short to get basic usage down... but also if you've gotten as far as grokking how branches work, it's pretty easy to pick up worktrees. The fact that lots of people don't know they exist is points for adding it just for that reason alone.
I'll mull it over. :) Cheers!
Thanks for all your guides over the years. Truly invaluable.
Appreciate the work! Neat to see you still writing pieces like this all these years later!
There are two things I suggest as workflows for people when I teach them about rebase workflows.
> Since rebase “replays” your commits onto the new base one at a time, each replay is a merge conflict opportunity. This means that as you rebase, you might have to resolve multiple conflicts one after another. ... This is why you can conclude a merge with a simple commit, but ...
For multiple conflicts on several commits being replayed, if it's _not_ useful to go through them all one at a time, I suggest that people first do a squash rebase from the fork point (which by definition cannot have conflicts) to collapse their commits into a single commit, and then rebase again onto the branch.
For instance, if forked from main:
git rebase -i `git merge-base main --fork-point`
Squash all of those, and then as usual: git rebase -i main
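Here is a non-interactive sketch of that squash-first flow in a throwaway repo. To keep it scriptable it swaps the interactive squash for `git reset --soft` to the fork point followed by a single commit, which should leave the same tree. The file names, branch names, and commit messages are made up for illustration.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # name the first branch regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo base > file.txt; git add file.txt; git commit -q -m "base"

# Two work-in-progress commits on a feature branch
git checkout -q -b feature
echo one > a.txt; git add a.txt; git commit -q -m "wip 1"
echo two > b.txt; git add b.txt; git commit -q -m "wip 2"

# Meanwhile main moves on
git checkout -q main
echo more >> file.txt; git commit -q -am "main moves on"
git checkout -q feature

# Step 1: collapse everything since the fork point into one commit.
# Nothing is replayed onto new history here, so no conflicts can occur.
fork=$(git merge-base --fork-point main)
git reset -q --soft "$fork"
git commit -q -m "feature: squashed"

# Step 2: rebase the single commit, so conflicts are resolved at most once.
git rebase -q main
count=$(git rev-list --count main..feature)
echo "commits ahead of main: $count"
```

Note that `--fork-point` relies on the reflog of `main`, so it may come back empty in a fresh clone; plain `git merge-base main HEAD` is the usual fallback.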
Second, when rebasing repeatedly onto an evolving branch over time, you'll often find yourself resolving the same merge conflicts over and over again. "rerere" (https://git-scm.com/book/en/v2/Git-Tools-Rerere) will allow git to "reuse recorded resolution" so that you don't have to do them manually each time.
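A small sketch of rerere at work, using a repeated merge because that is the shortest way to hit the same conflict twice (the same recording applies during rebases). The repo, file contents, and branch names are invented for the example.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false
git config rerere.enabled true             # local to this throwaway repo

echo "original" > f.txt; git add f.txt; git commit -q -m "base"
git checkout -q -b topic
echo "topic version" > f.txt; git commit -q -am "topic change"
git checkout -q main
echo "main version" > f.txt; git commit -q -am "main change"

# First merge conflicts. Resolve by hand and commit, which records the resolution.
git merge topic >/dev/null 2>&1 || true
echo "resolved by hand" > f.txt
git add f.txt
git commit -q -m "merge topic"

# Throw the merge away and hit the same conflict again.
git reset -q --hard HEAD~1
git merge topic >/dev/null 2>&1 || true

# rerere has already rewritten the working file with the earlier resolution.
reused=$(cat f.txt)
echo "working file now contains: $reused"
```

The second merge still stops so you can review the result; as far as I know, setting `rerere.autoupdate` also stages the reused resolution for you.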
My gitconfig for these:
[alias]
	forked = "!f() { git merge-base $1 --fork-point; }; f"
	squash-first = "!f() { git rebase -i `git merge-base $1 --fork-point`; }; f"
[rerere]
	enabled = true
Without your tutorials I'm not even sure I would have chosen the career I did. Thank you for all the love and effort you put into your posts; I'm sure there are many other people you've touched in a similar way.
I'm delighted to see that you're still active and still producing guides. Well done!
Along with many others here, your network programming guide helped me so much back in the early days of my education and career. So thanks for that too…
So in figure 5.4 you say we merge 2 commits into a new one and somehow both branches point to the new commit. This will definitely confuse people new to git.
I'd say it's better to write that we merge anotherBranch into someBranch and leave the former where it is. Same for the next merge.
Just a suggestion
Let me see if I can do that and save the clarity.
> But in this section we’re going to be talking about a specific kind of merge: the fast-forward. This occurs when the branch you’re merging from is a direct ancestor of the branch you’re merging into.
Looks like "from" and "into" are swapped: "main" is "into" there, "newbranch" is "from", and "main" is a direct ancestor of "newbranch".
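The direction is easy to check in a scratch repo. This is only a sketch with made-up file contents; the branch names `main` and `newbranch` follow the comment above.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo one > f.txt; git add f.txt; git commit -q -m "first"
git checkout -q -b newbranch
echo two >> f.txt; git commit -q -am "second"
git checkout -q main

# main (the branch merged INTO) is an ancestor of newbranch (the branch merged FROM).
relation="not an ancestor"
if git merge-base --is-ancestor main newbranch; then
  relation="main is an ancestor of newbranch"
fi
echo "$relation"

# --ff-only refuses to do anything other than move the branch pointer forward.
git merge -q --ff-only newbranch
main_tip=$(git rev-parse main)
new_tip=$(git rev-parse newbranch)
```

After the merge both names point at the same commit and no merge commit was created.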
In 9.4 there's no way reallinux/master points to the same commit as master after the merge. It will still be where it was, one commit behind.
in my experience, strong writing and communication skills are one of the best ways to stand out as an engineer -- and your articles are maybe the best example of this out there. keep on setting a great example for us. :)
I've fixed it, and wrote a quick script to list all issues and PRs on all my books so they don't fall through the cracks.
I will be forever grateful for your work and its improvements of my life.
Thank you.
"Let's say you modified foo.txt but didn't add it. You could: <git command>"
Followed by:
"And that would add it and make the commit. You can only do this with files you added before."
Wait, what? So, I modified foo.txt but didn't add it, and then the command to add and commit at the same time can only be done with files I did add before?
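My guess, and it is only a guess since the command is elided above, is that the guide means `git commit -a`, where "added before" means "already tracked in an earlier commit" rather than "staged right now". A sketch under that assumption (`bar.txt` is a hypothetical second file):

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo v1 > foo.txt; git add foo.txt; git commit -q -m "add foo"

echo v2 > foo.txt      # tracked file, modified but not staged
echo new > bar.txt     # brand-new file, never added

# -a stages modifications to files git already tracks, then commits.
git commit -q -a -m "update foo"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
untracked=$(git ls-files --others --exclude-standard)
echo "in the commit: $committed"
echo "still untracked: $untracked"
```

So the modified `foo.txt` goes in without a separate `git add`, while the never-added `bar.txt` is left out.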
Guide was working great to heal years of git trauma up until that point though!
[0] https://beej.us/guide/bgnet/ [1] https://beej.us/guide/bggit/
Thank you Beej.
(I didn't read the IPC guide.)
I didn't even know git switch existed, let alone git checkout was considered the old alternative. I feel old.
To be fair I started learning git a little less than 10 years ago, but whoa, I can't express how it feels that someone learning git today will be confused about why I use git checkout. It's like using old-fashioned language.
More on topic, this guide would've been super useful when I was learning. It is really easy to follow and covers common FAQs.
I fondly remember being intimidated by my first merge conflict, aborting it and just doing some workarounds to prevent the conflict.
Here's, respectively, a discussion from 2021, and a discussion from a few weeks ago. In the latter, it's brought up that `git switch` is still considered experimental by the docs:
I don't think "git checkout" is considered the "old alternative", at least not yet. Last time I checked, `switch` is still experimental, I haven't even considered moving away from the workflows/commands I first learned when I picked up Git ~15 years ago. Everything I want to do still works exactly the same (`git checkout` still does the exact same stuff as before), and I'm able to collaborate with everyone else using git, why change workflow then?
Commits are snapshots of a tree. They have a list of ancestors (usually, but not always, just one). Tags are named pointers to a commit that don't change. Branches are named pointers to a commit that do change. The index is a tiny proto-commit still in progress that you "add" to before committing.
There. That's git. Want to know more? Don't read the guide, just google "how do I switch to a specific git commit without affecting my tree?", or "how do I commit only some of my changed files?", or "how do I copy this commit from another place into my current tree?".
The base abstractions are minimalist and easy. The things you want to do with them are elaborate and complicated. Learn the former, google the latter. Don't read guides.
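For what it's worth, the pointer model described above can be poked at directly. A minimal sketch in a scratch repo (the tag name `v1` and the file are arbitrary):

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo one > f.txt; git add f.txt; git commit -q -m "first"
first=$(git rev-parse HEAD)
git tag v1                                  # a named pointer that stays put

echo two >> f.txt; git commit -q -am "second"

tag_target=$(git rev-parse v1)              # still the first commit
branch_target=$(git rev-parse main)         # moved along with the new commit
parent=$(git rev-parse HEAD^)               # the new commit's ancestor

echo "tag:    $tag_target"
echo "branch: $branch_target"
echo "parent: $parent"
```

`git cat-file -p HEAD` prints the commit object itself, including its `tree` and `parent` lines, if you want to see the snapshot and ancestor list in raw form.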
Commits are sets of files. They form a tree. A branch is a named location in this tree. The index aka staging area is a pre-commit that has no message. Workdir is just workdir, it doesn’t go in the repo unless you stage it. HEAD is where the next commit will put new changes.
Do I understand git? Seems like yes. Let’s run a quiz then! Q? A.
How to make a branch? Git branch -a? Git checkout -b --new? Idk.
How to switch to a branch? Git switch <name>, but not sure what happens to a non-clean workdir. Better make a copy, probably. Also make sure the branch was fetched, or you may create a local branch with the same name.
How to revert a file in a workdir to HEAD? Oh, I know that, git restore <path>! Earlier it was something git reset -hard, but dangerous wrt workdir if you miss a filename, so you just download it from git{hub,lab} and replace it in a workdir.
How to revert a file to what was staged? No idea.
How to return to a few commits back? Hmmm… git checkout <hash>, but then HEAD gets detached, I guess. So you can’t just commit further, you have to… idfk, honestly. Probably move main branch “pointer” to there, no idea how.
If you have b:main with some file and b:br1 with it, and b:br2 with it, and git doesn’t store patches, only files, then when you change b:main/file, then change and merge+resolve b:br1/file, then merge that into b:br2 to make it up-to-date, will these changes, when merged back to already changed b:main become conflicted? Iow, where does git keep track of 3-way diff base for back-and-forth reactualization merges? How does rebase know that? Does it? I have no idea. Better make a copy and /usr/bin/diff [--ignore-pattern] the trees afterwards to make sure the changes were correct.
As demonstrated, knowing the base abstractions doesn’t make you know how to do things in git.
I don’t even disagree, just wanted to say fuck git, I guess. Read guides or not, google or reason, you’re screwed either way.
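For reference, here is one possible set of answers to the quiz questions above, as a sketch in a scratch repo. It assumes Git 2.23 or later for `git switch` and `git restore`; the branch and file names are placeholders, and there are other equally valid ways to do each step.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo v1 > f.txt; git add f.txt; git commit -q -m "v1"
echo v2 > f.txt; git commit -q -am "v2"

# Make a branch and switch to it in one step.
git switch -q -c topic

# Stage v3, then keep editing so the working file says v4.
echo v3 > f.txt; git add f.txt
echo v4 > f.txt

# Revert the working file to what was staged: plain restore reads from the index.
git restore f.txt
staged_copy=$(cat f.txt)

# Revert both the index and the working file to HEAD.
git restore --source=HEAD --staged --worktree f.txt
head_copy=$(cat f.txt)

# Go back one commit without detaching HEAD: move the current branch pointer.
# This discards the newer commit from the branch, so use it with care.
git reset -q --hard HEAD~1
back_copy=$(cat f.txt)

echo "$staged_copy $head_copy $back_copy"
```

As for the 3-way base question: my understanding is that git does not store it anywhere. It recomputes the merge base from the commit graph each time, which is what `git merge-base A B` prints.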
This is about as useful as "A monad is just a monoid in the category of endofunctors."
It's basically a lot of words which make zero sense for a user starting to use git -- even if it happens to be the most succinct explanation once they've understood git.
> The base abstractions are minimalist and easy. The things you want to do with them are elaborate and complicated. Learn the former, google the latter.
You can't really learn the former -- you can't even see it till you've experienced it for a while. The typical user groks what it means after that experience. Correction, actually: the typical user simply gives up in abject frustration. The user who survived many months of using a tool they don't understand might finally be enlightened about the elegant conceptual model of git.
Because the C and PL/SQL people are on CVS, I can fix this with vi on the ,v archive.
First on TFS repositories, and now with git grep I can easily find exposed passwords for many things. But it's just SQL Server!
We will never be able to use git responsibly, so I will peruse this guide with academic interest.
Don't even get me started on secrecy management.
I am looking forward to retirement!
In my experience people come to git and start using it with the centralised paradigm in their heads: that there is one repo and one DAG etc. They think that their master branch is the same as "the" master branch. You just can't get good at git with this wrong understanding.
So much of the porcelain consists of those kinds of tricks, which make otherwise tedious work much simpler.
Imagine if you didn't have interactive rebases! You could trudge through the work that is done in interactive rebases by hand, but there's stuff to help you with that specific workflow, because it is both complicated yet common.
I think jujutsu is a great layer over git precisely because you end up with much simpler answers to "how do I change up the commit graph", though.... the extra complication of splitting up changes from commits ends up making other stuff simpler IMO. But I still really appreciate git.
The evidence that the git UI is awful is _overwhelming_. Yes, yes, I’m sure the people that defend it are very very very very smart, and don’t own a TV, and only listen to albums of Halloween sounds from the 1950s and are happy to type the word “shrug” and go on to tell us how they’ve always found git transparent and easy. The fact is that brilliant people struggle with git every single day, and would almost certainly be better served by something that makes more sense.
1. git clone
2. git checkout
3. git pull
4. git add + commit + push
5. git reset / rebase
Git porcelain stuff's plenty good for probably 95% of users. `rebase -i` comes with a guide on which commands do what, and you could write a couple of paragraphs about how to format `git log`'s output with your own preferences and tradeoffs -- and porcelain usually includes stuff as eclectic as `git gc`, `git fsck`, and `git rev-parse` by most accounts.
Git plumbing's definitely a bit more obscure, and does a bunch of stuff on its own that you can't always easily do with porcelain commands because they're optimized for the common use cases.
TL;DR: while Git's big (huge even), a lot of what it provides is way off the beaten path for most devs.
tldr: even if you never plan to use anything advanced, you’ll end up in some weird situation where you need to do something even if you’re in the “95% of the users”
no shade, yes ofc you “could this, could that” to make things work and we have been stuck with this for so long that an alternative doesn’t even seem plausible
Learning git like this is honestly just hampering yourself
The worst part about Git is the bad defaults. Followed closely by mismanaged storage. Or maybe being designed for a use case most of its users will never have. Or maybe the horrible authentication mechanism. Or maybe the lack of a bug tracker or any sensible feedback from its developers.
None of this can be helped by a GUI. In fact, besides Magit, every front-end to Git I've seen is hands down awful: it teaches you to do the wrong thing, is usually very difficult to use efficiently, and mystifies how things actually work. But, even with Magit, I'd still advise getting familiar with the CLI and configuration files before using it: that makes it easier to understand which operations it is trying to improve.
The command line isn't that hard to use if you've ever used the command line before. Beginners trying to learn git and command line at the same time (which is very common) will get utterly confused, though, and for a lot of beginners that's the case. The only difficult part with git over the command line is fixing merge conflicts, I'd recommend anyone to use any IDE rather than doing that manually.
No IDE will be of any help for getting back to normal when you get into a detached HEAD state, which IDEs will gladly let you do if you click the right button.
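Getting out of a detached HEAD from the command line is short, though. A sketch in a scratch repo, assuming Git 2.23 or later for `git switch`; the branch name `rescue` is arbitrary.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo one > f.txt; git add f.txt; git commit -q -m "first"
echo two >> f.txt; git commit -q -am "second"

# Land in a detached HEAD and commit there, as an IDE button might do for you.
git checkout -q --detach HEAD~1
echo experiment > e.txt; git add e.txt; git commit -q -m "experiment"
orphan=$(git rev-parse HEAD)

# Option 1: keep the work by giving the commit a branch name.
git switch -q -c rescue
on_rescue=$(git symbolic-ref --short HEAD)

# Option 2: just go back to where you were.
git switch -q main
on_main=$(git symbolic-ref --short HEAD)
```

If you switch away without naming the commit first, `git reflog` should still list it for a while so it can be recovered.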
This is just asking for trouble.
I am aware that beej's guides are typically quite comprehensive, but the vast nuances of git truly eluded me until this.
I guess Jujutsu would wind up being a much slimmer guide, or at least one that would be discoverable largely by humans?
And on that note, I feel like the guide covers maybe 10% of Git :), but hopefully 90% of common usage.
guh
I'm just going to be emailing myself versions of files with MyFile.Final.RealFinal2.txt from now on
I've never found that I need to touch most of it in the 15 or so years I've been using it, but it's there if your project needs it.
The universe doesn't owe you an easy 10 minute video solution to everything, it's an annoying educational expectation that people seem to have developed. Some things are just that difficult and you have to learn them regardless.
We have a basic Git cookbook we share with any new joinees so that they start committing code, but most of them just follow it religiously and don't understand what's going on (unsurprisingly).
However, literally everyone who attends the course comes out with a reasonable working understanding of Git so that they know what's actually happening.
That does NOT mean that they know all the commands well, but those can be trivially Googled. As long as your mental model is right, the commands are not a big deal. And yet, the vast majority of the discussion on HN on every single Git post is about the command line.
Funnily enough the class sounds a lot like the alt text of https://xkcd.com/1597/ (Just think of branches as...), the difference is that that is unironically the right way to teach Git to a technical audience, and they will come out with a fundamental understanding of it that they will never forget.
I honestly think it's such a high ROI time investment that it's silly to not do it.
This is precisely why it enrages me when all HN discussion about Git devolves to the same stuff about how it's complex and this and that.
A technical person who has general sense about basic data structures (Leetcode nonsense not needed) can be taught Git in under 2 hours and they will retain this knowledge forever.
If you can't invest that little time to learning a tool you will use everyday and instead will spend hours Googling and blindly copy-pasting Git commands, that's on you, not on Git.
A priori, I would have assumed this was one of those "just understand how every layer and every part of Linux works, and using Linux is easy" type arguments people used to make in the 90s - i.e. theoretically true, practically infeasible for most people.
Thankfully, I was lucky enough to come across a video explaining (some of) the git internal model early on, and it really doesn't take that much or that deep a knowledge of the internals for it to make a big difference. I'd say I know maybe 5% of how git works, and that already gave me a much better understanding of what the commands do and how to use them.
Does the course material (and perhaps any recordings) have any proprietary information or constraints to prevent you from sharing it publicly? Is this based on something that’s publicly available yet concise enough to fit within two hours? If yes, please share (perhaps in this thread and as a post submission on HN).
I’m asking because I believe that there can never be enough variety of training materials that handle a topic with different assumptions, analogies, focus, etc.
Happy to get any feedback.
I've also blogged about it
- https://looselytyped.com/blog/2014/08/31/gits-guts-part-i/
- https://looselytyped.com/blog/2014/10/31/gits-guts-part-ii/
That is, to set your upstream branch to the branch you want to merge into, aka the integration branch. So instead of setting upstream of "feature/foo" to "origin/feature/foo", you would set it to "master" or "origin/master".
This simplifies a lot of things. When you run `git status` it will now tell you how far you have diverged from the integration branch, which is useful. When you run `git rebase` (without any arguments), it will just rebase you on to upstream.
Setting `origin/feature/foo` to upstream is less useful. Developers tend to "own" their branches on the remote too, so it's completely irrelevant to know if you've diverged from it and you'll never want to rebase there.
If you set `push.default` to "current", then `git push` will do what you expect too, namely push `feature/foo` to `origin/feature/foo`.
Why isn't this a more common setup?
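A sketch of that setup against a local bare repository standing in for `origin`. The paths are temporary and I've used `main` as the integration branch; substitute `master` as needed.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false
git remote add origin "$tmp/origin.git"

echo base > f.txt; git add f.txt; git commit -q -m "base"
git push -q origin main

# Feature branch whose upstream is the integration branch, not its own remote copy.
git checkout -q -b feature/foo
git branch --set-upstream-to=origin/main >/dev/null
git config push.default current

echo change >> f.txt; git commit -q -am "feature work"

# A bare `git push` now goes to origin/feature/foo even though upstream is origin/main.
git push -q

upstream=$(git rev-parse --abbrev-ref '@{upstream}')
ahead=$(git rev-list --count '@{upstream}..HEAD')
pushed=$(git ls-remote --heads origin feature/foo)
echo "upstream: $upstream, ahead by: $ahead"
```

With this in place, `git status` reports divergence from `origin/main`, which is the number the comment above cares about.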
Also, reusing branches for GitHub pull requests (vs. creating new branches for each PR) might warrant some discussion in section 17?
Keep on fighting the good fight, Beej.
On one hand you have the ideal world scenario when each and every change is granular and you can annotate and blame every single line of code with description. On the other hand you have a real world where teams are encouraged to squash changes so that every commit corresponds to a business requirement and you have to engage a whole cabal to smuggle a refactor.
A long time ago I implemented a routine to use both SVN and Git, so that I could use Git on file save, and SVN on feature release. I think it was inspired by the Eclipse workflow. Definitely not something I would recommend these days.
I find it hard to judge when things are in a good enough state to commit and especially good enough to have a title.
I might start writing a new function, decide that I want it to be a class, only to give up on the class and want to return to my almost complete function. Snapshots work pretty well for that, but git isn’t really centered around snapshots, and doing good snapshots is not straightforward, at least to me.
What do you guys do?
It has this idea of mutable commits, so essentially you can check out a commit, and then whenever you change a file, the commit is updated with the new file contents. Then internally, whenever a commit gets changed (or any aspect of the repository) that gets recorded in an append-only log. At any point in time, you can scroll through that log and restore any previous repository state, including changes to individual files.
By default, Jujutsu does the snapshotting thing (i.e. updating the commit with the contents of the local files) every time you run the `jj` command. However, you can set up file watchers so that it does the snapshotting every time a file changes in the repository. If you do this, you should be able to browse through the op log and see all of the changes you've made over time, including reverting to any of those stages when necessary.
In fairness, I've not tried the file watcher thing out personally, but being able to review the op log is fantastic for trying to go back to previous versions of your repository without having to do teeny-tiny "wip" commits manually all the time.
Stop worrying about titles and content and commit to your heart’s content.
When ready, restructure those snapshots into a coherent story you want to tell others by squashing commits and giving the remaining ones proper titles and commit messages. I use interactive rebase for that, but there are probably other ways too.
Work in a feature branch. Commit often. Squash away the junk commits at the end.
> ...and especially good enough to have a title.
Who needs a title? It's perfectly fine to rapid-fire commits with no comment, to create quick save points as you work. Bind to a key in your editor.
I treat commits in a private branch the same as the undo log of the text editor. No one cares about the undo log of your editor as they never see it. The same should be true of your private feature branch commits. They are squashed away never to be seen by human eyes again.
And then when everything works, compress commits into a few big commits with squash, and actually try to merge that back into the main branch.
> I might start writing a new function, decide that I want it to be a class only to give up the class and wanting to return to my almost complete function.
For me, that would easily be three commits in my dev branch (one with a first implementation of the function, one with a refactor to a class, then another one back to a single function) and when the function is finished, one squashed commit in a merge request. If everything goes right, it's as if the class file was never there.
It has to be said, relying on squashing doesn't work well when you're working in a team that doesn't pay too close attention to merge requests (accidentally merging the many tiny commits). You also have to be careful not to squash over merges/use rebase wherever possible so your squashed commits don't become huge conflicts during merge trains.
When I work on my own stuff that I don't share, I don't bother squashing and just write tons of tiny commits. Half of them leave the code in a non-compiling state but I don't necessarily care, I use them as reference points before I try something that I'm not sure works.
There is something to be said for carefully picking commit points, though. While finding the source of a bug, git becomes incredibly powerful when you can work git bisect right, and for that you need a combination of granularity and precision. Every commit needs to have fully working code, but every commit should also only contain minimal changes. If you can find that balance, you can find the exact moment a bug was introduced in a program within minutes, even if that program is half a decade old. It rarely works perfectly, but when it does, it's a magical troubleshooting tool.
Edit: Does anyone know a good way to convert one of the HTML pages into an epub for reading on an ereader? The PDFs will definitely work, but wanted to see if anyone knew of any tools for HTML -> EPUB conversion.
Try downloading the single-page HTML and converting it with pandoc:
pandoc index.html -o bla.epub
Maybe it needs some fine tuning, but the result seems good to me.
It's also this sort of work that's becoming less necessary with AI, for better or worse. This appears to be a crazy good guide, but I bet asking e.g. Claude to teach you about git (specific concepts or generate the whole guide outline and go wide on it) would be at least as good.
I also think if you are at the “don’t know what you don’t know” point of learning a topic it’s very hard to direct an AI to generate comprehensive learning material.
The main advantage of LLMs is that you can ask specific questions about things that confuse you, which makes iterating to a correct mental model much faster. It's like having your own personal tutor at your beck and call. Good guidebooks attempt to do this statically... anticipate questions and confusions at the right points, and it's a great skill to do this well. But it's still not the same as full interactivity.
I remember fumbling around for ages when I first started coding trying to work out how to save data from my programs. Obviously I wanted a file but 13 year old me took a surprisingly long time to work that out.
Almost impossible to imagine with AI on hand but we will see more slop-merchants.
I have found that asking AI "You are an expert teacher in X. I'd like to learn about X, where should I start?" is actually wildly effective.
Wild.
Anyway, the comment I really wanted to make was that I tried git lfs for the first time. I downloaded 44TB (https://huggingface.co/datasets/HuggingFaceFW/fineweb/tree/m...) over 3-4 days which was pretty impressive until I noticed that it seems to double disk space (90TB total). I did a little reading just to confirm it, and even learned a new term "git smudge". double disk space isn't an issue, except when you're using git to download terabytes.
I know programmers like everything to be in version control, but AI models and git just aren't compatible.
Branching, making commits, and creating pull requests come easy, but beyond that, I know utterly nothing about it.
I strongly suggest reading Pro Git, the official Git book by Scott Chacon and Ben Straub, available for free here: https://git-scm.com/book/en/v2.
I find it very pleasant to read, and it really changed my perspective not only about Git but about how to write code in general. You don't need to read it entirely, but I suggest at least these sections:
- 1.3 Getting Started - What is Git?: explains a little about snapshots and the three states
- 10.1 ~ 10.3 Plumbing and Porcelain, Git Objects and Git References: this explains Git in its lowest level, which is surprisingly simple but powerful. Those sections were enough for me to write my own "Git" (you can see it here: https://github.com/lucasoshiro/oshit)
I am an Old and we never were taught anything about coding with other people who were also working on the same project. I have had many successful projects but never with another person.
With that as a background, does your guide cover things like:
1) Merging. I was told that merging happens "automagically" and I cannot, for the life of me, understand how a computer program manages to just ... blend two functions or whatever and it "works." Does your guide make sense of this?
2) Apparently there are holy wars (see also vi versus emacs) about the One True Way to ... decide on branches and whatnot. Are there pros and cons laid out anywhere?
3) Everything seems broken down into teensy tiny functions when I look at someone's git repository, just skillions of files all over the place. Is this a git thing, a code repository thing, or simply that, in order for multiple people to work on the same project, everything must be atomized and then reassembled later? What's your opinion?
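On question 1, a rough illustration of why merging usually "just works": git compares both versions against their common ancestor, line by line, and when the two sides changed different regions it keeps both changes. The sketch below uses `git merge-file`, which performs that three-way merge on plain files; the file names and contents are invented.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Common ancestor, then two independent edits to different lines.
printf '%s\n' "title: notes" "line two" "line three" "line four" "author: unknown" > base.txt
printf '%s\n' "title: git notes" "line two" "line three" "line four" "author: unknown" > ours.txt
printf '%s\n' "title: notes" "line two" "line three" "line four" "author: sam" > theirs.txt

# -p prints the merged result instead of overwriting ours.txt.
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
printf '%s\n' "$merged"
```

If both sides had edited the same line, the command would emit conflict markers instead and leave the decision to a human. Nothing here understands the code, so a clean textual merge can still be logically wrong.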
Are you concerned that your git exposition is much longer than the other guides you have produced?
Jujutsu VCS: Introduction and patterns
I've also written a guide, targeting devs with basic Git experience. It is much shorter, maybe you or your team can benefit from it [1]
[1] https://www.augmentedmind.de/2024/04/07/ultimate-git-guide-f...
However I'm seriously thinking about patching something together by grabbing appropriate bits of this.
As a cloud security analyst that is thinking of going back to coding or DevSecOps, if I'm honest with myself, there is nothing new here that I have not seen before... (This is not a criticism or anything. If anything the problem is myself: if I can allocate time to learn this or use Anki to retain this).
Thanks!
I’ve been using Git for years, but I bet that I’ll learn something from this.
Thanks a lot for publishing, Beej.
I usually refer people to https://cbea.ms/git-commit/.
At least, that's why I'm in it. That, and to do my best to help students succeed.
¯\_(ツ)_/¯
208 mentions of GitHub.
4 mentions of Gitea.
3 mentions of GitLab.
Why is it so biased, and why is it helping to continue to teach people to centralize git?