On undoing, fixing, or removing commits in git

(sethrobertson.github.io)

105 points by DanielShir 12y ago | 73 comments

sisk 12y ago

Regarding losing data: it's as simple as diving into the reflog. In order to remove something from your history, you must do so very explicitly by walking your commit history, editing each one. There is an automated workflow to accomplish that (`filter-branch`) but it's definitely not a command anyone I know has committed to memory.
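For readers who have not seen it, here is a minimal sketch of the `filter-branch` workflow mentioned above, run in a throwaway repository. The file names (`app.txt`, `secret.txt`) and the identity values are assumptions made up for the illustration. Note that `filter-branch` keeps the old refs under `refs/original/`, so the old objects stay on disk until those refs and the reflog entries are gone and the GC has run.

```shell
set -eu

# Throwaway repository so nothing real is touched
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

# Three commits; the middle one adds a file that should never have been committed
echo "code" > app.txt
git add app.txt
git commit -q -m "add app"
echo "hunter2" > secret.txt
git add secret.txt
git commit -q -m "oops: add secret"
echo "more code" >> app.txt
git commit -q -am "update app"

# Rewrite every commit reachable from HEAD, dropping secret.txt from each tree.
# --prune-empty discards commits that become empty after the removal.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' HEAD >/dev/null 2>&1

# The rewritten history no longer contains the file
commits=$(git rev-list --count HEAD)
touching=$(git log --oneline HEAD -- secret.txt)
echo "commits left: $commits"
```

If the commits were already pushed, everyone else's clones still contain the file, so any leaked secret should be treated as compromised regardless.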

Accidental mutations can be undone either by `--abort`ing (if the command supports it) or by checking out an earlier revision from the reflog.

The GC in git is pretty conservative and, while it can be triggered manually, still makes you jump through some hoops to actually get rid of something. Steve Klabnik wrote about it[1] a little while back.

In certain cases, you don't have access to the reflog because a change wasn't made locally. Perhaps someone screwed up a remote you pull from and it destroyed your history. You can, even still, find, view, and re-associate orphaned objects. Yeah, it's not terribly intuitive and, again, not a workflow anyone has probably committed to memory, but the fact that you can recover from a disaster of that magnitude is pretty amazing.
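A rough sketch of the "find, view, and re-associate" steps, assuming a throwaway repository in which a branch is deleted by mistake. The branch names `doomed` and `recovered` are made up, and `--no-reflogs` is passed to `git fsck` to imitate the case where the reflog cannot help.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > base.txt
git add base.txt
git commit -q -m "base"
start=$(git symbolic-ref --short HEAD)

# Do some work on a branch, then delete the branch by mistake
git checkout -q -b doomed
echo "work" > work.txt
git add work.txt
git commit -q -m "work worth keeping"
lost=$(git rev-parse HEAD)
git checkout -q "$start"
git branch -D doomed >/dev/null

# Find: list commits that no ref points at, ignoring the reflog
found=$(git fsck --no-reflogs 2>/dev/null |
  awk '$1 == "dangling" && $2 == "commit" { print $3 }')

# View: inspect the orphaned commit
git --no-pager show --stat --oneline "$found"

# Re-associate: give it a name again
git branch recovered "$found"
```

In a long-lived repository `git fsck` may list many dangling commits, so the "view" step is where the actual searching happens.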

git provides us developers with a set of tools, powerful tools, and that comes with a level of responsibility. I'd rather have the ability to responsibly clean my history than the alternative.

[1] - http://words.steveklabnik.com/git-history-modification-and-l...

mikeash 12y ago

"Strongly consider taking a backup of your current working directory and .git to avoid any possibility of losing data as a result of the use or misuse of these instructions."

WTF?

What is the point of a version control system if you have to take backups of it to avoid losing data when performing certain operations?

I use git, I like git, but certain aspects of it are fundamentally broken.

gemma 12y ago

No, that advice from the article is fundamentally broken. Outside of the garbage collection system (which runs by default after what, 30 days? 90?), Git doesn't delete committed content. Any commit you "lose" through rebasing, amending, resetting, etc. can always be recovered. It's a little more complicated than renaming a directory, sure, but it's important, and it's not something a Git tutorial should ignore.

Git IS safe, and ANYTHING involving changes to history can be undone without resorting to backups. Data loss can occur when you're mucking about with uncommitted changes, but that's a risk in most other version control systems as well.

prezjordan 12y ago

Surprised to see no one in the comments has mentioned the reflog [0]. It's really very easy.

[0]: http://jscal.es/2013/08/05/seriously-the-reflog-isnt-that-sc...

crystaln 12y ago

I'm not 100% sure this is true, however it is also a fundamental flaw of git. There should be a way to remove commits permanently in order to remove mistakenly checked in large files or private content.

It's also definitely not true with uncommitted changes, including gitignored files.

crazygringo 12y ago

I understand your puzzlement, I found this confusing too at first. But then I realized it makes sense -- one of git's strengths is that you can rewrite the history. The "point" of a version control system, at least with git, is not backup which retains all history, but rather versioning which retains the history you want to retain.

Obviously, if you choose not to edit the history, then you never need to back up in this sense, and you're free to do that. But then you can't ever go back and change things (like remove accidentally committed passwords, etc.)

But if you choose to rewrite the history, and mess up, then you'll be glad you had a backup. And (in response to other comments), even if there are ways of still retrieving/fixing data, it's often easier to just restore from your backup, especially when you're trying out git commands for the first time, and you're not entirely sure if they'll work exactly how you expect. None of us are git experts from the beginning, and I've resorted to git backups numerous times when trying out a command for the first time, and then discovering it wasn't the right way.

simcop2387 12y ago

An easy way to do that is the way I tend to do it: create a new branch based off the one you're rewriting history in, and that will keep all of it for you even after you rewrite everything. It makes it really easy to restore later with `git reset` if you need it.
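A small sketch of that safety-branch approach, assuming a throwaway repository. The branch name `backup-before-rewrite` is made up, and `git commit --amend` stands in for whatever history rewrite is being attempted.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > file.txt
git add file.txt
git commit -q -m "original message"
before=$(git rev-parse HEAD)

# Mark the current state before touching history
git branch backup-before-rewrite

# The rewrite: amending creates a new commit with a different hash
git commit -q --amend -m "rewritten message"
after_rewrite=$(git rev-parse HEAD)

# Not happy with the result: move the branch back to the saved state
git reset -q --hard backup-before-rewrite
after_restore=$(git rev-parse HEAD)
```

The backup branch is just a ref, so creating it costs nothing, and it can be removed with `git branch -D backup-before-rewrite` once the rewrite looks right.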

zimbatm 12y ago

Actually `git reflog` contains the HEAD history. Even after a rebase it's possible to check out an old commit (unless git has garbage-collected it).
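A minimal sketch of going back through the reflog, assuming a throwaway repository and using `git reset --hard` as the accidental rewrite (the same idea applies after a rebase, where the entry to pick is further down the list). File names and identity values are made up.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" >> file.txt
git commit -q -am "second"
wanted=$(git rev-parse HEAD)

# The mistake: the second commit disappears from the branch
git reset -q --hard HEAD~1

# The reflog still lists every position HEAD has been at;
# HEAD@{1} is where HEAD pointed just before the reset
git --no-pager reflog -n 3

# Move the branch back to that entry
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
```

To only look at the old state without moving the branch, `git checkout 'HEAD@{1}'` can be used instead of the second reset.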

_ikke_ 12y ago

Git is quite safe, and most operations that involve doing things to history can be undone. Unsafe operations happen when the working tree and uncommitted changes are involved.

Also, sometimes it's easier for a user to roll back to an older backup than to untangle the mess they have created.

Third, git itself is not a backup. When your repository gets corrupted, you're out of luck if you don't have backups of those files. So it's still good to take backups of your repositories.

mikeash 12y ago

First, your use of the word "most" is inherently incompatible with the phrase "quite safe".

Second, why would a version control system make it so difficult to roll back to an old version that it's easier to restore from backup? This is insane.

Third, I'm well aware of this, and of course you should be making backups of your git repositories (and everything else). But those backups should be there to protect against hardware failure and other external data-loss events, not protect against git itself.

nknighthb 12y ago

1) Why are you taking a be-careful-don't-blame-me passage from a random article written by some guy as gospel?

2) All version control systems are vulnerable to data loss if you mess around with them in unusual ways. Would you say svn was fundamentally broken if somebody told you to take a backup before you screwed around with the repo?

mikeash 12y ago

1) It's the sort of thing I've heard many times from many people over the years.

2) The difference is that svn does not build this functionality into the main command line tool, and there is no culture of doing terrible things with svnadmin to edit svn repositories the way there is of doing terrible things with git to rewrite git history.

perlgeek 12y ago

One piece of data that is easy to lose, with any version control system I've worked with so far, is uncommitted data. And that's also the only data I've lost with git so far, after using it for several years. (And yes, it was my own stupidity, saying 'git checkout .' and only noticing later that there was something I wanted to keep.)

The advice to take a backup doesn't hurt, and might be helpful if restoring the original state is more effort than doing it with git operations.

oneandoneis2 12y ago

you don't need to make backups - git won't lose your data. I lost all hope that the article might be worth reading at that line.

mateuszf 12y ago

Yep, that's true. Restoring data is as simple as checking latest changes using "git reflog"

mtdewcmu 12y ago

I agree that git is both a great advance and seems fundamentally broken at the same time. One of git's advances is that it treats commits as snapshots of the entire tree rather than diffs[1]. A snapshot might as well be a tarball of the whole directory, except that git uses references to previous snapshots to store it efficiently. So in this aspect, git is like a backup tool plus compression. It's not quite a useful tool just for making compressed backups of source code, though, because data is buried in opaque internal files in the .git directory and can't be untangled from the commit history. You can't get at your data without going through git's tools, which means you might need to make your own backups in case git goes insane, and you can't use the backup functionality without creating indelible history.

I'm thinking that the repository could be moved out of the working directory and placed in its own file that's not invisible. If the repo was reified into a visible file, then repos would be portable and you could ftp them. The backup functionality could be separated from the history-tracking functionality, so you could make backups freely without adding noise to the commit history. A backup would basically be a tarball that you could append to a repo file, taking advantage of previous entries for compression. Commits, however they were implemented, could reference snapshots, but they needn't be 1:1.

[1] http://git-scm.com/book/ch1-3.html

eru 12y ago

> I'm thinking that the repository could be moved out of the working directory and placed in its own file that's not invisible.

Symlinks are your friends.

> then repos would be portable and you could ftp them.

tar might come in handy.

> The backup functionality could be separated from the history-tracking functionality, so you could make backups freely without adding noise to the commit history. A backup would basically be a tarball that you could append to a repo file, taking advantage of previous entries for compression.

You can already do this. You can have commits without ancestors or descendants in your repository, and they will still benefit from delta compression.

mcv 12y ago

Backups are of course always a good idea, but you don't need them specifically to work with git. Git is its own backup system. If you think you might do something potentially harmful, do it in a new branch. If something goes wrong, you can always throw it away.

If something has already gone wrong, and you didn't do it in a separate branch, you can still go back to a previous situation.

Rewriting history in any serious sense (beyond a local reset or rebase for stuff that hasn't been pushed to anyone else yet) is always a bad idea. History is history for a good reason.

Of course any existing commit can always be reverted; that's not rewriting history. A revert is simply a new commit.
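To make the distinction concrete, here is a short sketch in a throwaway repository (file name and commit messages are made up): `git revert` adds a third commit that undoes the second, and the second commit stays in the history.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "good" > file.txt
git add file.txt
git commit -q -m "add file"
echo "bad" >> file.txt
git commit -q -am "bad change"
bad=$(git rev-parse HEAD)

# Undo the bad change by recording a new commit on top of it
git revert --no-edit "$bad" >/dev/null

count=$(git rev-list --count HEAD)
git --no-pager log --oneline
```

Because nothing that was already published changes, a revert is safe to push without `--force`.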

Estragon 12y ago

It makes it fast and easy to back out if you screw anything up. Even if the data is still there, it can be complex to pull it back out and configure it the way it was when you started (as the commands in this tutorial demonstrate.) So a fast, easy snapshot before executing complex commands is a smart move.

zimbatm 12y ago

git is safe but you have to know all the fancy commands like `git reflog`. I remember being puzzled by a merge conflict when I started learning git. I didn't know what it was and `git reset` or `git revert` weren't doing what I expected. All I wanted was to go back to the previous state. In the end it was easier to clone the repo and start over again.

mtdewcmu 12y ago

You probably wanted `git merge --abort`. It's not very clear what the various states are that git can be in. There seems to be a 'fixing merge conflict' state, and it's hard to find documentation that warns you about this state and what your options are once you're in it.
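A sketch of that "fixing merge conflict" state and the way out of it, assuming a throwaway repository with a made-up `feature` branch that edits the same line as the starting branch. While the merge is in progress, git keeps a `MERGE_HEAD` ref, which is one way to tell which state the repository is in.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
start=$(git symbolic-ref --short HEAD)

# Two branches change the same line, so merging them must conflict
git checkout -q -b feature
echo "feature" > file.txt
git commit -q -am "feature change"
git checkout -q "$start"
echo "main" > file.txt
git commit -q -am "main change"
before=$(git rev-parse HEAD)

# The merge stops with a conflict and leaves the repository mid-merge
git merge feature >/dev/null 2>&1 || true
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then
  in_progress=yes
else
  in_progress=no
fi
git status --short

# Give up on the merge and return to the state before it started
git merge --abort
after=$(git rev-parse HEAD)
```

`git status` also reports an unfinished merge in its long form, which is usually the quickest way to find out what git is waiting for.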

rebelidealist 12y ago

Sigh, it seems to me that Git is unnecessarily complicated. I wonder what would have happened if "github" had started with HG.

tytso 12y ago

There are two ways things can be simple or complicated. One is to have a big button labelled "DWIM", which always does the right thing --- until it doesn't, and then you have to go out of your way to work around its assumption of what you want to do.

The other way is to have a number of simple concepts which can be combined in various powerful ways. Once you understand these simple concepts, you can compose them to do whatever you need. Git is simple the same way that RISC is simple, and having a manual transmission is simple. You can do a lot more with a manual transmission car than you can with an automatic --- but if you're not careful you can strip the gears. Yet a manual transmission is simpler to maintain, and more efficient (in the hands of someone who knows how to use it) than an automatic transmission. If you take a look at the post, you'll see that the various recipes only use a handful of git commands. Once you've mastered those commands, things are indeed quite simple.

crystaln 12y ago

That would be true if git's command line interface were not so inconsistent and obtuse. I agree the underlying concepts are simple, which is why the command line interface is so baffling.

mtdewcmu 12y ago

It's hard to know how elegant and simple something can be made. git is a nice tool, but we shouldn't assume that it's the last word and can't be improved upon.

jordigh 12y ago

Hg is working on a feature that is betaish right now:

http://mercurial.selenic.com/wiki/ChangesetEvolution

It's been brewing for some time. Basically, the idea is to be able to make it easy to safely edit history collaboratively, with a consistent UI. Facebook is pumping a lot of money into hg right now, and seems particularly interested in getting this feature off the ground.

A number of pieces have been falling into place for this to occur. The first was to have phases, indicators of which commits are safe to edit collaboratively or not, a feature that some git users have wanted:

https://github.com/peff/git/wiki/SoC-2012-Ideas#published-an...

Mercurial now has this feature and uses it as part of the logic for the evolve extension. With this in place, hg is able to transmit metadata that indicates automatically which commits need to be fixed up if you want to edit a commit that someone else has also edited, or if someone edited a commit on top of which you've based off other commits.

The idea is to make something like "git push --force" obsolete. History is safe to edit, and commits can't get lost, not even by accident:

http://www.infoq.com/news/2013/11/use-the-force

By the way, an epilogue to that Jenkins story is that it wasn't completely trivial to recover all lost history, and at least for some of the smaller repos, they never managed to figure out exactly which version was the canonical one.

RyanZAG 12y ago

I love this kind of attitude: something seems complex? Throw it out and start again!

Unfortunately, it's usually the problem domain that is complex and starting over just means you have to rediscover all of that complexity all over again. HG has more than its fair share of complicated tasks.

pseut 12y ago

Git is designed for project maintainers, and a lot of the complication is necessary for them (that view helps me, at least)

skylan_q 12y ago

Anyone unfamiliar with the most basic of workflows would find this needlessly complex. Just a couple of months ago, I would have.

Now that I have familiarity with local branching, remote branches, how the 3-way merge works (conceptually) and rebasing, this article comes off as a guide on how to do things that you wouldn't have to do too often anyway.

lukasm 12y ago

In many ways hg is superior, but you can't win against Linus's blessing.

rspeer 12y ago

Thanks, this is a useful reference.

I am sad about some of these other comments, which I might paraphrase as "This doesn't help me, and it might help people who are less skilled than me who don't deserve to be helped, therefore it's worthless". It's apparently a common sentiment on this site, but it shouldn't be.

caipre 12y ago

Usability note: after a few clicks through this (so my path had a few entries) I instinctively clicked up a few levels in the path expecting to be taken to that point. Instead, that entry was appended as another child.

crystaln 12y ago

The inability to, in any remotely easy way, remove mistakenly checked in large files and private data has always seemed like a major flaw with git.

pyre 12y ago

Well, the solution in other systems seems to be "it's checked in, therefore it can never be un-checked-in, so deal with it!" (or at least this is the attitude of some vocal proponents of them).

ams6110 12y ago

The flaw is in having this "private" data in a public repo to begin with. If your data are private, don't put your project on github.

crystaln 12y ago

While I'm certain you and your organization have a perfect record of never checking inappropriate things into your git repository, mine does not. Even if all the employees at your company were perfect, there is still a chance of inappropriate information getting into the repository.

mcv 12y ago

Rule number one: if you're not sure what you're doing, do it in a new branch. If things go wrong, you can always delete that branch.

And you can always make a branch out of a previous situation. Gitk/gitx make this particularly easy.

elwell 12y ago

Sentence 2 has typo "or" -> "of"
