Second, why would a version control system make it so difficult to roll back to an old version that it's easier to restore from backup? This is insane.
Third, I'm well aware of this, and of course you should be making backups of your git repositories (and everything else). But those backups should be there to protect against hardware failure and other external data-loss events, not protect against git itself.
The simple options are:
- Remove the hard-coded password, and create a new repository with the current state of the code as a starting point.
- Start a new repository with the current code state, but keep the old repository around under lock-and-key, then perform 'complex' patch operations to move changes between the two repositories (e.g. roll back to a previous version of a file before the cut-off).
- Go back through your history, and manually create a new repository from each patch, but removing the password when you get to that commit.
If git always preserves all history, no matter what, then these are your only options.
While operations like `git filter-branch` sound scary, they don't delete the commit objects from your .git folder. If you created a new branch called (e.g.) master-old before running `git filter-branch` on your repository, then you can always 'roll back' to master-old if the rewrite goes wrong. Or, with slightly more effort, you could use the entries listed in the reflog to 'roll back' the changes.
Next time, rather than just assume that the poster isn't smart enough to realize that a compromised password should be changed, maybe you could consider that it's probably just an example of data that you might want to remove from your history once it has ended up there. I can think of numerous scenarios where someone might want to remove a password from the history even if it's not compromised (e.g. wanting to publish a private repo).
IMO the correct option is to create a new repository that has the same history as the old repository minus the offending commit (or possibly with an edited version of that commit that leaves out the offending string).
Because it creates a new repository, there's no risk of data loss in your old repository. Once you're confident that the operation succeeded, you can swap them.
I haven't had to do this for a long time, but as I recall, this is basically how svn does it. It works fine.
The problem with git is that it makes this far too easy and it works by editing existing repositories rather than creating new ones. So instead of once-in-a-blue-moon repository hacking to get rid of that password you accidentally committed, you get people rewriting history because they think the real history isn't "clean". I know a lot of people who routinely edit their local history before pushing changes to a shared repository because they don't want other people to see their true "dirty" history. This is insane.
Finally, I'm confused about something, so maybe you could clear this up for me. I keep seeing assurances that 1) git does not actually destroy any data, and you can always recover if you screw up and 2) editing history is sometimes a vital necessity for cases like when you commit passwords. You yourself made these assurances in this comment. However, 1 and 2 are obviously mutually exclusive. If you can always recover then you can't actually scrub the repository of accidentally committed passwords and the like. Which one is actually true?
1) This is almost true. Anything that is committed to Git is recoverable. When you "re-write" history, Git is creating a new set of commits in the history, an "alternate history path." It does not destroy the original commits, but there is no named reference to them (unless you created a branch/tag pointing to this line of commits).
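2) In this case, if you want to actually destroy these unreferenced commits, you must run "git gc". This IS a destructive command (gc = garbage collect): it removes unreachable commits from the repository. By default it only prunes objects that are older than a grace period and no longer listed in the reflog, so to scrub something right away you first expire the reflog (`git reflog expire --expire=now --all`) and then run `git gc --prune=now`. Until garbage collection removes them, you have access to anything that was ever committed. It just might be hard to find, since the only reference is the reflog (if it was recent) or the commit hash.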
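To make points 1) and 2) concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not git's real storage format: `ToyRepo`, its methods, and the commit contents are all made up for illustration, and it ignores the reflog and grace period that real git uses to protect recently orphaned commits.

```python
import hashlib


class ToyRepo:
    """Tiny content-addressed commit store (hypothetical; models the idea, not git's formats)."""

    def __init__(self):
        self.objects = {}  # commit id -> (parent id or None, content)
        self.refs = {}     # branch name -> commit id

    def commit(self, parent, content):
        # The id is derived from the parent and the content, so any change gives a new id.
        cid = hashlib.sha1(f"{parent}\n{content}".encode()).hexdigest()
        self.objects[cid] = (parent, content)
        return cid

    def reachable(self):
        # Walk back from every named ref and collect the commits it can reach.
        seen = set()
        for cid in self.refs.values():
            while cid is not None and cid not in seen:
                seen.add(cid)
                cid = self.objects[cid][0]
        return seen

    def gc(self):
        # Destructive step: drop every commit that no ref can reach.
        keep = self.reachable()
        self.objects = {cid: obj for cid, obj in self.objects.items() if cid in keep}


repo = ToyRepo()
a = repo.commit(None, "initial code")
b = repo.commit(a, "add config with password=hunter2")
c = repo.commit(b, "fix bug")
repo.refs["master"] = c

# "Rewrite history": build replacement commits on top of a. The originals are untouched.
b2 = repo.commit(a, "add config without password")
c2 = repo.commit(b2, "fix bug")
repo.refs["master"] = c2

still_there_before_gc = b in repo.objects  # orphaned, but recoverable by its hash
repo.gc()
gone_after_gc = b not in repo.objects      # only now is the password commit really gone
```

So both assurances hold, just at different times: the rewrite alone loses nothing, and the scrub only happens when the unreachable commits are collected.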
This is no more insane than editing a source code file before you save it to the file system. Git is used as a development tool as well as version control, and developers are therefore encouraged to commit often, even if the code does not actually compile yet. There is no more need to fill the published history with all of these WIP commits than there is for me to know about every goddamn keystroke you made while you were dicking around with that config file.
There is no such thing as "an edited version" of a commit. A commit is identified by a SHA-1 hash of its contents: the tree it points to, its parent commits, the author and committer details, and the message. If you change one bit, you get a new commit.
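A short Python sketch of why that is. As far as I know, git names an object by hashing a small header (`<type> <size>` followed by a NUL byte) plus the object's bytes; the helper names `git_object_id` and `commit_body` and the author details below are placeholders I made up for the example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + body, the way git names loose objects."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(message: str) -> bytes:
    """Hypothetical root commit pointing at the empty tree, with placeholder author details."""
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        "author A U Thor <author@example.com> 0 +0000\n"
        "committer A U Thor <author@example.com> 0 +0000\n"
        "\n"
        f"{message}\n"
    ).encode()


# Sanity check against a well-known value: the id of the empty blob.
empty_blob = git_object_id("blob", b"")  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Same tree, same author, same timestamps; only one character of the message differs.
original = git_object_id("commit", commit_body("Add config"))
edited = git_object_id("commit", commit_body("Add config."))
ids_differ = original != edited  # the "edited" commit is simply a different commit
```

Since every child commit records its parent's id, changing one commit changes the ids of all its descendants too, which is why a rewrite produces a whole new line of commits rather than a patched old one.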
You're a C programmer, right? If someone gave you a specification for writing a program to implement git, without telling you what it was, you'd tell them it would take 2 weeks. And that's because you'd reckon it would take 2 hours to knock out a rough version and a couple of days to clean it up.
Seriously, it's that simple. Just go learn how it works.