I'm a fervent believer in "good code is self-documenting", so I was curious to be proven wrong, clicked randomly until I found code and I saw this.
/*
* Round off to MAX_TIMESTAMP_PRECISION decimal places.
* Note: this is also used for rounding off intervals.
*/
#define TS_PREC_INV 1000000.0
#define TSROUND(j) (rint(((double) (j)) * TS_PREC_INV) / TS_PREC_INV)
Usage of acronyms is one of the worst offenders in bad code. The context makes it clear that TS means timestamp, so that's not too bad (still bad though), but I'm still not sure what INV means, luckily I presume it's the only place it's used.If it was named TIMESTAMP_ROUND, I wouldn't need to know "Round off to MAX_TIMESTAMP_PRECISION decimal places." Now that I've copy/paste that, it seems like the comment is wrong too, it's rounded off based on TS_PREC_INV, so if I was to believe the comment, I wouldn't get the right behaviour.
I'm not saying Postgres codebase isn't good code, just that "good code is self-documenting" is still true. That code was pretty much self-documenting except for the acronyms, but considering it was all used together, it's was fine and I was able to understand what they meant.
For me, comments should only be needed when something isn't clear. Defining what isn't clear is hard to determine for sure, but that's one thing for which code review helps quite a bit.
I'm thinking more of examples like this: https://github.com/postgres/postgres/blob/master/src/backend...
I just picked this at random from the storage subsystem, but I think it highlights what I mean. The comments are mostly about context. The comment for the routine is about when and who calls it, so that someone that reads the routine has that in mind. The specific line I'm linking to highlights in English prose that the correctness checking on the page headers is just a minimum guard and should not be fully trusted.
Back in the day I would have argued "oh, well, but you could break that into a function called "provisional_page_header_check(..)", or something. But.. there is nothing in the compiler that checks that function names stay in sync with their implementation any more than there is for comments. Writing it as a comment lets you use regular English sentences, breaking out a function takes that away and adds no compiler protection.
It's also.. friendly, somehow, to me. Working in this codebase is like participating in an ongoing and very slow conversation, which feels very pleasant.
That does confirm that they are making pretty amazing code. I would have much prefered to get that file instead of the other one :P.
They do have a redundant comment at someplace but it's clearly a tiny minority and they aren't losing any one time.
E.g. the code above is essentially (although somewhat mechanically renamed and moved since), from:
commit 41f1f5b76ad8e177a2b19116cbf41384f93f3851
Author: Thomas G. Lockhart <lockhart@fourpalms.org>
Date: 2000-02-16 17:26:26 +0000
Implement "date/time grand unification".
> Usage of acronyms is one of the worst offenders in bad code.Not really on board with that... It's a balance. Brevity does have it's value too. Everybody is going to understand that TS stands for timestamp, that WAL stands for write ahead log, etc. Especially when dealing with a language that doesn't have namespaces etc, you're going to have to realistically deal with prefixes a good bit. There's plenty of bad abbreviations in postgres code, however, don't get me wrong.
In the above I'm more bothered by the inconsistent naming, which I think is probably one postgres' bigger code quality issues.
> For me, comments should only be needed when something isn't clear.
I pretty strongly disagree. Most of the time comments shouldn't explicitly restate all that code is doing, sure (although there's clearly exceptions to that too). But e.g. stating why an algorithm is doing something, what the higher level goals of some checks are, why some shortcut is reasonable all make a code base a lot more maintainable in the medium to long run.
I work a lot on postres, and occasionally dabble around the corners of linux. For me it's the* defining difference making it much more painful to understand most linux subsystems.
Edit: formatting, typo
E.g. in the above sample “TS” is commented to mean timestamp but that will be lost during scans of code complete options. Also, MAX_TIMESTAMP_PRECISION may not show up in code-complete for timestamp macros/consts, but TIMESTAMP_MAX_PRECISION will.
TSROUND is already as obvious as TIMESTAMP_ROUND. TS is a very common abbreviation of timestamp.
And you would still need to know the decimal places.
The real issue is that it's based on TS_PREC_INV and not MAX_TIMESTAMP_PRECISION as per the comment (though MAX_TIMESTAMP_PRECISION might still agree with the decimals offered by TS_PREC_INV, it's not obvious here, and would need manual work to keep them in sync).
Common != universal. It's known up until someone doesn't know it. We have pretty powerful autocompletes, let use them instead, or just lose 3 seconds writing the 10 letters, it won't be so bad.
> And you would still need to know the decimal places.
Sure, by reading the code and understanding what it does and how it does it. You change a constant that will affect that code, it seems fine to see how it's affected either way.
> The real issue is that it's based on TS_PREC_INV and not MAX_TIMESTAMP_PRECISION as per the comment
Which is bound to happen when your documentation isn't the code directly.
> Common != universal. It's known up until someone doesn't know it. We have pretty powerful autocompletes, let use them instead, or just lose 3 seconds writing the 10 letters, it won't be so bad.
The cost of that compounds though. There's plenty times where there is not just one abbreviation in a symbol name, but multiples. And pretty soon the "logical" lines long enough to contain multiple references to such symbols get considerably slower to read (be it due to long lines, or being broken up into multiple lines).
I've an extremely hard time to believe that the widespread use of ts, xact, wal, ... is a significant factor in how quickly somebody can get started with the postgres code base.
> > The real issue is that it's based on TS_PREC_INV and not MAX_TIMESTAMP_PRECISION as per the comment
> Which is bound to happen when your documentation isn't the code directly.
Hm? Those aren't out of sync? TS_PREV_INC is the relevant factor/divisor to round to MAX_TIMESTAMP_PRECISION here. It'd be nicer if that were explicit in the code by defining TS_PREV_INC based on MAX_TIMESTAMP_PRECISION, sure, but they're in sync. It's just not that trivial to state in C. But they're in sync. Note also that TS_PREC_INV really is just an implementation detail for TSROUND(), it's not used elsewhere. These days we'd just write this in an static inline function, in all likelihood.
Someone might also not know what a timestamp is, or a UNIX timestamp at least, so there's that.
>We have pretty powerful autocompletes, let use them instead, or just lose 3 seconds writing the 10 letters, it won't be so bad.
The problem with the above idea is that it implies "spelling it out fully == better". Which is not necessarily the case, long variable names can make code hard to follow and verbose. Ask the Java community...
Totally agree. Comments should be limited to sections in which the code is unexpected. For example, for a workaround for a bug in another part of the system. That you should comment because if the bug is ever fixed someone reading the code will understand why it looks wonky.
I don't agree with acronyms. They are fine to use as long as you are consistent. For example, if you write "ts" in one place, you can't write "timestamp", "tstamp", "tstmp" for the same data in some other code. In my code, I always use "n" for "length". Since I'm consistent about it, and ever use "n" for anything else, it doesn't make the code harder to read.
With only the name, how would you know how many decimal places? The comment isn't wrong/out of date here btw.
I'm not saying the name is wrong because of the comment, I'm saying it's wrong because of the usage of the acronym.
> The comment isn't wrong/out of date here btw.
Isn't wrong? How? It's true in the sense that they expect TS_PREC_INV to be related to MAX_TIMESTAMP_PRECISION (which would be a perfect example to my mind of a needed comment, if actually it was a requirement), but it's actually false in the sense that it's not what that code does.
You wouldn't get a different rounding if you were to modify MAX_TIMESTAMP_PRECISION, which is what you would expect based on that comment.
I get that. But you also said the comment would be unnecessary with a name change. The comment does communicate more information than your proposed name, IMO, hence is not replaceable by the name. (IMO).
> You wouldn't get a different rounding if you were to modify MAX_TIMESTAMP_PRECISION, which is what you would expect based on that comment.
Good point. The comment isn't wrong with the definition of MAX_TIMESTAMP_PRECISION as it is in the code. If you override it though, the code doesn't do what the comment says.
It's an interesting case: if you trust the comment indicates the desired behavior, then you can see the code may have room to be improved. If you distrust that the comment is correct or has value, then you might just remove the comment, and the code doesn't get better.