dwild6y ago

> Postgres codebase is what got me out of the "good code is self documenting" nonsense.

I'm a fervent believer in "good code is self-documenting", so I was curious to be proven wrong. I clicked around at random until I found some code, and saw this:

    /*
     * Round off to MAX_TIMESTAMP_PRECISION decimal places.
     * Note: this is also used for rounding off intervals.
     */
    #define TS_PREC_INV 1000000.0
    #define TSROUND(j) (rint(((double) (j)) * TS_PREC_INV) / TS_PREC_INV)

Usage of acronyms is one of the worst offenders in bad code. The context makes it clear that TS means timestamp, so that's not too bad (still bad though), but I'm still not sure what INV means. Luckily, I presume this is the only place it's used.

If it was named TIMESTAMP_ROUND, I wouldn't need to know "Round off to MAX_TIMESTAMP_PRECISION decimal places." Now that I've copy-pasted that, it seems like the comment is wrong too: the rounding is based on TS_PREC_INV, so if I were to believe the comment, I wouldn't get the right behaviour.

I'm not saying the Postgres codebase isn't good code, just that "good code is self-documenting" is still true. That code was pretty much self-documenting except for the acronyms, but considering they were all used together, it was fine and I was able to understand what they meant.

For me, comments should only be needed when something isn't clear. Defining what isn't clear is hard to determine for sure, but that's one thing for which code review helps quite a bit.

jakewins6y ago

I mostly agree with this. Though as I wrote that sentence I realized Go has somewhat softened my position on abbreviations. I think the "note" portion is useful; ultimately a test would stop you breaking that secondary use, but the comment stops you spending time in that direction in the first place. But either way, overall I think you're right this would be fine without a comment.

I'm thinking more of examples like this: https://github.com/postgres/postgres/blob/master/src/backend...

I just picked this at random from the storage subsystem, but I think it highlights what I mean. The comments are mostly about context. The comment for the routine is about when and who calls it, so that someone that reads the routine has that in mind. The specific line I'm linking to highlights in English prose that the correctness checking on the page headers is just a minimum guard and should not be fully trusted.

Back in the day I would have argued "oh, well, but you could break that into a function called provisional_page_header_check(..), or something." But.. there is nothing in the compiler that checks that function names stay in sync with their implementation any more than there is for comments. Writing it as a comment lets you use regular English sentences; breaking out a function takes that away and adds no compiler protection.

It's also.. friendly, somehow, to me. Working in this codebase is like participating in an ongoing and very slow conversation, which feels very pleasant.

dwildOP6y ago

That comment is exactly what I mean by when needed.

That does confirm that they are making pretty amazing code. I would have much preferred to get that file instead of the other one :P.

They do have a redundant comment in some places, but it's clearly a tiny minority and they aren't costing anyone any time.

jakewins6y ago

Yep, I think we're in agreement :)

anarazel6y ago

It's like any sizable codebase with quite some history: there's a substantial difference in quality between parts. Some of the worst parts go back to the early days of postgres - the priorities and resources available back then were just very different from today's. Obviously there are also noticeable differences in more recent code, but I don't think to the same degree (although there have definitely been subsystems that worked out better and some that worked out worse).

E.g. the code above is essentially (although somewhat mechanically renamed and moved since) from:

   commit 41f1f5b76ad8e177a2b19116cbf41384f93f3851
   Author: Thomas G. Lockhart <lockhart@fourpalms.org>
   Date:   2000-02-16 17:26:26 +0000

       Implement "date/time grand unification".

> Usage of acronyms is one of the worst offenders in bad code.

Not really on board with that... It's a balance. Brevity does have its value too. Everybody is going to understand that TS stands for timestamp, that WAL stands for write ahead log, etc. Especially when dealing with a language that doesn't have namespaces etc, you're going to have to realistically deal with prefixes a good bit. There are plenty of bad abbreviations in postgres code, however, don't get me wrong.

In the above I'm more bothered by the inconsistent naming, which I think is probably one of postgres' bigger code quality issues.

> For me, comments should only be needed when something isn't clear.

I pretty strongly disagree. Most of the time comments shouldn't explicitly restate all that code is doing, sure (although there are clearly exceptions to that too). But e.g. stating why an algorithm is doing something, what the higher level goals of some checks are, or why some shortcut is reasonable all make a code base a lot more maintainable in the medium to long run.

I work a lot on postgres, and occasionally dabble around the corners of linux. For me it's the defining difference making it much more painful to understand most linux subsystems.

Edit: formatting, typo

inlined6y ago

IMO good code needs to be readable at the call site much more than the function definition. That’s why I believe in the Google C/Go style rule that all mutable parameters must be passed by pointer. A call site with &arg communicates mutability quickly. Also, in my framework designs I consider the impact on code complete heavily:

E.g. in the above sample “TS” is commented to mean timestamp but that will be lost during scans of code complete options. Also, MAX_TIMESTAMP_PRECISION may not show up in code-complete for timestamp macros/consts, but TIMESTAMP_MAX_PRECISION will.
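To illustrate the call-site point about &arg above, here is a minimal C sketch. Both function names are made up for this example; they do not come from Postgres or from any style guide.

```c
/* Hypothetical helper: takes its input by value, so the caller's
 * variable cannot change; the result comes back as the return value. */
static int
clamped_precision(int precision, int max_precision)
{
    if (precision < 0)
        return 0;
    return precision > max_precision ? max_precision : precision;
}

/* Hypothetical helper: mutates its argument in place, so it takes a
 * pointer and every call site has to spell out &arg. */
static void
clamp_precision(int *precision, int max_precision)
{
    *precision = clamped_precision(*precision, max_precision);
}
```

A call then reads clamp_precision(&p, 6), where the & flags that p may change, while clamped_precision(p, 6) visibly leaves p alone.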

coldtea6y ago

>If it was named TIMESTAMP_ROUND, I wouldn't need to know "Round off to MAX_TIMESTAMP_PRECISION decimal places."

TSROUND is already as obvious as TIMESTAMP_ROUND. TS is a very common abbreviation of timestamp.

And you would still need to know the decimal places.

The real issue is that it's based on TS_PREC_INV and not MAX_TIMESTAMP_PRECISION as per the comment (though MAX_TIMESTAMP_PRECISION might still agree with the decimals offered by TS_PREC_INV, it's not obvious here, and would need manual work to keep them in sync).

dwildOP6y ago

> TS is a very common abbreviation of timestamp.

Common != universal. It's known up until someone doesn't know it. We have pretty powerful autocompletes, let's use them instead, or just lose 3 seconds writing the 10 letters, it won't be so bad.

> And you would still need to know the decimal places.

Sure, by reading the code and understanding what it does and how it does it. If you change a constant that affects that code, it seems fine to have to look at how it's affected either way.

> The real issue is that it's based on TS_PREC_INV and not MAX_TIMESTAMP_PRECISION as per the comment

Which is bound to happen when your documentation isn't the code directly.

anarazel6y ago

> > TS is a very common abbreviation of timestamp.

> Common != universal. It's known up until someone doesn't know it. We have pretty powerful autocompletes, let's use them instead, or just lose 3 seconds writing the 10 letters, it won't be so bad.

The cost of that compounds though. There are plenty of times where there is not just one abbreviation in a symbol name, but several. And pretty soon the "logical" lines that contain multiple references to such symbols get considerably slower to read (be it due to long lines, or to being broken up into multiple lines).

I have an extremely hard time believing that the widespread use of ts, xact, wal, ... is a significant factor in how quickly somebody can get started with the postgres code base.

> > The real issue is that it's based on TS_PREC_INV and not MAX_TIMESTAMP_PRECISION as per the comment

> Which is bound to happen when your documentation isn't the code directly.

Hm? Those aren't out of sync? TS_PREC_INV is the relevant factor/divisor to round to MAX_TIMESTAMP_PRECISION here. It'd be nicer if that were explicit in the code by defining TS_PREC_INV based on MAX_TIMESTAMP_PRECISION, sure, but they're in sync. It's just not that trivial to state in C. Note also that TS_PREC_INV really is just an implementation detail for TSROUND(), it's not used elsewhere. These days we'd just write this in a static inline function, in all likelihood.

coldtea6y ago

>Common != universal. It's known up until someone doesn't know it.

Someone might also not know what a timestamp is, or a UNIX timestamp at least, so there's that.

>We have pretty powerful autocompletes, let's use them instead, or just lose 3 seconds writing the 10 letters, it won't be so bad.

The problem with the above idea is that it implies "spelling it out fully == better". That is not necessarily the case: long variable names can make code verbose and hard to follow. Ask the Java community...

bjourne6y ago

> For me, comments should only be needed when something isn't clear. Defining what isn't clear is hard to determine for sure, but that's one thing for which code review helps quite a bit.

Totally agree. Comments should be limited to sections in which the code is unexpected, for example a workaround for a bug in another part of the system. That you should comment, so that if the bug is ever fixed, someone reading the code will understand why it looks wonky.

I don't agree about acronyms. They are fine to use as long as you are consistent. For example, if you write "ts" in one place, you can't write "timestamp", "tstamp", "tstmp" for the same data in some other code. In my code, I always use "n" for "length". Since I'm consistent about it, and never use "n" for anything else, it doesn't make the code harder to read.

tln6y ago

> If it was named TIMESTAMP_ROUND, I wouldn't need to know "Round off to MAX_TIMESTAMP_PRECISION decimal places."

With only the name, how would you know how many decimal places? The comment isn't wrong/out of date here btw.

dwildOP6y ago

> With only the name, how would you know how many decimal places

I'm not saying the name is wrong because of the comment, I'm saying it's wrong because of the usage of the acronym.

> The comment isn't wrong/out of date here btw.

Isn't wrong? How? It's true in the sense that they expect TS_PREC_INV to be related to MAX_TIMESTAMP_PRECISION (which would be a perfect example to my mind of a needed comment, if actually it was a requirement), but it's actually false in the sense that it's not what that code does.

You wouldn't get a different rounding if you were to modify MAX_TIMESTAMP_PRECISION, which is what you would expect based on that comment.

tln6y ago

> I'm not saying the name is wrong because of the comment, I'm saying it's wrong because of the usage of the acronym.

I get that. But you also said the comment would be unnecessary with a name change. IMO the comment communicates more information than your proposed name, and hence is not replaceable by the name.

> You wouldn't get a different rounding if you were to modify MAX_TIMESTAMP_PRECISION, which is what you would expect based on that comment.

Good point. The comment isn't wrong with the definition of MAX_TIMESTAMP_PRECISION as it is in the code. If you override it though, the code doesn't do what the comment says.

It's an interesting case: if you trust the comment indicates the desired behavior, then you can see the code may have room to be improved. If you distrust that the comment is correct or has value, then you might just remove the comment, and the code doesn't get better.

aitchnyu6y ago

Are there guidelines for when abbreviations are ok in Python code? I tend to avoid them except popular abbreviations like admin and ts.

tenebrisalietum6y ago

Possibly INterVal?

IshKebab6y ago

Inverse.

Izkata6y ago

My guess was gonna be "invariant", but that makes more sense.
