There is a detailed blog post[1] about it that adds a lot more to understanding what's happening. Certainly more than a blog post about how a diary entry as a commit message made somebody feel.
[1]: http://graysoftinc.com/character-encodings/ruby-19s-three-de...
The reason I ask is that I find myself (with Rails particularly) spending an awful lot of time on maintaining tests that were written by a much younger and much less experienced developer (me, a few years ago).
And I've come to the slow realisation that a lot of what I was testing wasn't really important.
I'm not saying the author shouldn't have written the spec, or that the spec wasn't valuable. But maybe the true bug here was that RSpec matchers don't honour UTF-8 spaces, and perhaps they should? I dunno. Just thinking out loud here.
Do you have any tips for establishing a culture of good commit messages?
This tendency toward nice, neat, clean, architecturally pure, and ultimately worse solutions runs deep in software engineering (and in humanity, in general (see e.g. the failure of pristine housing projects that replaced messy-but-functional slums)) and it's one I have to fight within myself, too.
I don't think there's any great way to combat it, though. You need mature, experienced developers on your team who aren't too tired and burned out to fight it.
I'm a strong proponent of good commit messages, so one thing I've been doing since joining the team is encouraging people to write better ones.
I will offer one nit here, which is that I always start my commit messages with one-line summaries; this makes it really easy to scan commits (and git is tooled to do this!)
-
Remove non-ascii character to fix error from `bundle exec rake`
-
That commit message would add external context (what was broken/where the error was coming from) and intent.
EDIT: Hmm, that doesn't seem to be it, here's the commit in question https://github.com/alphagov/govuk-puppet/commit/63b36f93bf75...
52 ~line~ [edit: character] summary
Then write a book, just being sure to wrap at 72 chars. Never really considered what an optimal commit message length was.
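For anyone who wants to check those limits mechanically, here is a minimal sketch in Python. The function name `lint_commit_message` and its parameters are made up for illustration; the 52 and 72 defaults are just the numbers from this comment, not anything git itself enforces.

```python
def lint_commit_message(message, summary_limit=52, body_limit=72):
    """Return a list of problems found; an empty list means the message passes.

    Hypothetical helper: the limits are the ones mentioned in this thread,
    not a rule that git applies.
    """
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing summary line"]

    problems = []
    if len(lines[0]) > summary_limit:
        problems.append(
            f"summary is {len(lines[0])} chars (limit {summary_limit})"
        )
    # Convention: a blank line separates the summary from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Body starts on line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_limit:
            problems.append(
                f"line {number} is {len(line)} chars (limit {body_limit})"
            )
    return problems
```

Something like this could run from a `commit-msg` hook, though a real hook would probably want to exempt lines that hold long URLs or pasted program output.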
Maybe it's something along the lines of "if you are playing poker and you can't figure out who the easy money is, the easy money is you." I've never seen a commit message, PR description, etc and thought "well that was excessive," so maybe I'm the one...
This one stood out to me: 157 lines of rationale, discussion of alternatives, etc for 22 lines of uncontroversial changes[1]. It's much more likely to be useful in and of itself as a piece of documentation than a one-liner, "Changes to prevent strcpy" though.
[1]: https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...
Commit messages are not going to serve as a knowledge base for state of the art and best practices, and strcpy is universally recognized as problematic. As for the how: it is nice to see that the dev thought a lot about it, did some testing, etc., but the way he did the ban is thoroughly conventional in the end. Also, it might have been even better to put some short explanations in comments instead of only in the commit message, in particular the usefulness of the BANNED macro for preserving some line numbers with gcc. So in this particular case I'm not really convinced that the commit message serves any strong purpose.
But it does not hurt. And it contributes to a culture where useful commit messages are important. So I still (weakly) prefer it to my hypothetical "ban strcpy".
#undef strcpy
Yikes…Commit messages need to be as long as needed, period.
Somewhat related, the largest commit message I know is 857d286123acf87ae4a08528a3eef4ce2fbf8db2 from this repo: https://github.com/nanobox-io/nanobox-pkgsrc-lite (loading it crashes github)
It's 100MB big, and contains... an entire git history by itself.
[0] https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...
(Sadly, the commit message isn't great, but clearly it's worth that hash.)
result:
https://android.googlesource.com/platform/system/bt/+/refs/h...
I've encountered bad UTF-8 when data is copied from Excel or Outlook.
My teammate recently wrote a blog post on how we document Why in git commits and PR descriptions.
https://medium.com/better-programming/daily-habits-to-turn-y...
Would also recommend checking out a related talk by a tech lead at StitchFix: https://confreaks.tv/videos/rubyconf2018-documentation-trade...
There is so much superfluous cruft in the message.
Listing the error, and the fix, would give exactly as much context for the next person to run into this issue.
If your goal is to explain the process to find similar cases, just a couple more lines would have done just as well:
-
Fixes case where `bundle exec rake` would fail with error message:
ArgumentError: invalid byte sequence in US-ASCII
This is caused by the presence of non-ASCII characters in files.
The following command was used to find files with non-ASCII characters:
`find modules -type f -exec file --mime {} \+ | grep utf`
Running `iconv -f UTF8 -t US-ASCII` on listed files will show the exact location of offending characters.
-
No long-winded exposition, no extremely case-specific program output (including the current machine's name...).
I’m not saying this is the worst commit message ever, and I’d appreciate the intention if I came across it, but if you’re caring enough to give this much context, you can save both yourself, and the next person to look for your message, some cognitive load by sticking to what’s needed.
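As an aside, the "show the exact location of offending characters" step in the suggested message can also be sketched without iconv. This is only an illustration in Python, not what the original commit did; `find_non_ascii` is a hypothetical name, and columns are counted in characters rather than bytes.

```python
def find_non_ascii(text):
    """Yield (line_number, column, character) for every non-ASCII character.

    Both positions are 1-based. Lines are split on "\n" only, so unusual
    Unicode line separators are reported instead of being swallowed.
    """
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield line_number, column, char


def find_non_ascii_in_file(path):
    """Same check for a file on disk, assuming the file is valid UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return list(find_non_ascii(handle.read()))
```

A non-breaking space (U+00A0) pasted into an otherwise ASCII file is the kind of thing this would flag, along with its line and column.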
-
And if you’re wondering how to decide “what is needed”, it varies, but I think a good rule of thumb is to ask yourself how one would find this commit message.
What would someone grep for in git's history if they came across this? And if they found it, what information answers their query?
They’re likely to search the command that is failing, and the error.
The answer to that query is what causes the command to error out, and how to find cases of that cause.
They’re not going to search for the program output of find, or iconv, not going to search for the exact file you were modifying when this happened, or the fixes you tried that didn’t work.
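To make the "grep in git's history" idea concrete, here is a rough sketch in Python that wraps `git log --grep`. The helper names are hypothetical; the flags are standard `git log` options (`--oneline`, `--fixed-strings`, `-i`, `--grep`), and it assumes `git` is on the PATH.

```python
import subprocess


def build_log_grep_command(term):
    """Build a `git log` command that searches commit messages for `term`.

    --fixed-strings treats the term literally instead of as a regex,
    -i makes the match case-insensitive, and --oneline prints one
    abbreviated hash plus summary per matching commit.
    """
    return ["git", "log", "--oneline", "--fixed-strings", "-i", f"--grep={term}"]


def search_history(term, repo_path="."):
    """Run the search inside `repo_path` and return the matching lines."""
    result = subprocess.run(
        build_log_grep_command(term),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `search_history("invalid byte sequence in US-ASCII")` in a checkout would list commits whose messages mention that error, which is the query the rule of thumb above has in mind.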
I mean, you're right, but I don't think that's time well spent. This is a commit message, not a blog post. Writing concisely is hard and time consuming.
This was probably written in stream-of-consciousness fashion which strikes a nice balance between time invested and possible future usefulness (if any). I mean, the commit might be useful in the future, but there's also a reasonable chance that nobody except the reviewer of the PR is ever going to read that commit message. (If the OP hadn't blogged about it)
A commit message with such a long explanation helps nobody, because it is not searchable by Google, not even by GitHub[2]. [...] If the commit message requires a blog post to be good (searchable) – it is a poor commit message.
[1]: https://github.com/alphagov/govuk-puppet/commit/63b36f93bf75...
[2]: https://github.com/search?q=invalid+byte+sequence+in+US-ASCI...