`three = 1` in the linux sourcecode | Better HN

`three = 1` in the linux sourcecode | Better HN

81 comments

lifthrasiir12y ago

https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...

Variables `three`, `five` and `seven` are better described as `next_power_of_three`, `next_power_of_five` and `next_power_of_seven`. Since the `ext4_list_backups` function should iterate through 1 (= 3^0 = 5^0 = 7^0), 3, 5, 7, 3^2, 5^2, 3^3, 7^2, ... and 1 should not repeat three times, the initial value of `next_power_of_three` (or any of others) should be 1 and those of others should not be 1. The naming is a bit unfortunate (couldn't they be `threes` etc., for example?) but actually makes sense.

Syssiphus12y ago

A perfect example of a missing code-comment.

dangrossman12y ago

The explanatory comment is 40 lines earlier in the same file where the function those variables are being passed to is defined. It need not be repeated every couple lines; "being clear to outsiders linked to a specific line of a specific file without context" is not a reasonable concern.

mattmanser12y ago

More like a terribly named variable.

CrystalCuckoo12y ago

Not really, the problem lies in its naming. Comments that explain what the variables represent signify that they should have been better named in the first place to avoid any confusion.

simias12y ago

Read the comment above ext4_list_backups right above:

https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...

    /*
     * Iterate through the groups which hold BACKUP superblock/GDT copies in an
     * ext4 filesystem.  The counters should be initialized to 1, 5, and 7 before
     * calling this for the first time.  In a sparse filesystem it will be the
     * sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
     * For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
     */

I'm not sure what's so noteworthy about that. There's ton of arcane code in any kernel, as long as it's commented correctly there's no issue.

h2s12y ago

See this comment: https://news.ycombinator.com/item?id=7296586

Good commenting is no substitute for good naming. For a variable containing the number 1, "three" is a shitty name.

simias12y ago

Okay? Is that really worthy of being submitted to hn however?

The function being called is directly above this declaration. It's a static function not used outside of this file. What are the chances that someone would edit this code without understanding what "three" is used for in this context? Pretty slim I wager.

It's good that people are auditing the linux source code but if you stumble upon some weird looking code (which again, is not the case here in my opinion) the right way to deal with it is not to post it on hacker news. Contact the maintainer (look at the MAINTAINER file at the root of the kernel) if possible with a patch to fix the issue.

thaumasiotes12y ago

But the variable doesn't contain the number 1. As is clearly explained in the comment, it's only initialized to 1; it contains any arbitrary power of 3 (I'm assuming "arbitrary" because the variable is passed by address to a function accepting a pointer; as long as that kind of thing is going on, who knows what's inside the variable). And that's true from initialization onwards, as 1 is the 0th power of 3.

If I adjust your comment to "for a variable whose only purpose is to be passed to another function as the parameter called 'three', 'three' is a shitty name", would you endorse that sentiment?

troels12y ago

I don't know. Upon seeing a variable named "three", I wouldn't immediately assume it contained the literal number 3, since that would be rather pointless use of a variable. It's an odd name, but I don't think it'll be mistaken for its literal meaning.

grannyg00se12y ago

"as long as it's commented correctly there's no issue"

How do you know it is commented correctly? Now you have to understand the code and the comment and mentally confirm them to be in sync. You also have to maintain the comment when you make changes to the code and you have to trust that others are going to do the same.

It would be better if the code itself was expressive enough so that a comment wasn't needed.

aduitsis12y ago

If I'm not mistaken, the first number (1) is not considered a power of three, hence the initial (seemingly strange) three=1.

See http://www.nongnu.org/ext2-doc/ext2.pdf#page15 for a slightly more detailed explanation.

"The first version of ext2 (revision 0) stores a copy at the start of every block group, along with backups of the group descriptor block(s). Because this can consume a considerable amount of space for large filesystems, later revisions can optionally reduce the number of backup copies by only putting backups in specific groups (this is the sparse superblock feature). The groups chosen are 0, 1 and powers of 3, 5 and 7."

(Edit: Yes, I know that 3^0=1, but the wording "0, 1 and powers of 3, 5" does not imply 3^0. Thanks!)

glomph12y ago

1 is the zeroth power of 3

keypusher12y ago

If only there was a system for fixing the code yourself instead of having to post about it to a widely read tech site.

VierScar12y ago

It's not a bug, as shown by the currently top comment. It is however, misleading and could be better documented or named. Still, it is at least a little amusing, worth posting here?

spinlock12y ago

I think its worth posting. It's the first piece of code I've seen on hacker news in a while. It also makes a great point about good variable names making code clear.

cauliturtle12y ago

> There are only two hard things in Computer Science: cache invalidation and naming things.

> -- Phil Karlton

ProblemFactory12y ago

I've always thought that the "naming things" refers to the more subtle problem of giving things unique identifiers in distributed systems, rather than coming up with variable names.

"Naming things" includes systems like MAC address allocation, IP address allocation, DNS, URLs for documents, process IDs, name to inode mapping in filesystems, autoincrement primary keys in databases, etc.

Any ideas on what Karlton really meant with the quote?

bnegreve12y ago

Let me overthink this a little bit:

It seems to me that this quote makes a rather standard usage of the "Zeugma" [1] figure of speech. It consists in putting next to each other two things of very different nature. In that case we have the very technical problem of invalidating caches on the one hand, and the much higher level problem of naming things on the other hand.

That produces a rather stylish quote.

But if Phil Karlton was actually referring to the technical problem of assigning unique identifier as you suggest, the figure of speech is lost and the quote becomes less stylish.

So I prefer tho read "naming things" as the very general problem of naming things.

[1] http://en.wikipedia.org/wiki/Zeugma

omh12y ago

I always assumed it was the more general "coming up with names" - for variables, hostnames, project names etc.

The non-human-readable things like MACs, IPs and auto-increment keys are easy. But when I have to come up with a short, memorable and non-confusing name for something like a script that can take me longer than writing the code.

dexen12y ago

Perhaps the Funarg problem [1]:

>The difficulty only arises if the body of a nested function refers directly (i.e., not via argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call.

Also, the version I know has much more bite:

>There is also a variation on this that says there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

[1] http://en.wikipedia.org/wiki/Funarg_problem

Roonerelli12y ago

there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors

peteretep12y ago

there are three hard things concurrency, in computer science: cache invalidation, naming things, and off-by-one errors

Roonerelli12y ago

http://martinfowler.com/bliki/TwoHardThings.html

Encosia12y ago

I think you were off by one.

koverstreet12y ago

/yawn

https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...

codereflection12y ago

The comment on that function is a bit more entertaining than the original post.

jakobe12y ago

Lot's of people suggesting to add explanatory comments. I don't think that's a good idea, it would be better to change the names of the variables:

    unsigned counter1 = 1;
    unsigned counter2 = 5;
    unsigned counter3 = 7;

or use an array:

    unsigned counter[3] = {1,5,7};

No more confusion. That being said, I doubt that the names of these local variables is actually a real source of confusion and is not a problem that needs to be fixed.

jcalvinowens12y ago

> or use an array

That's absolutely horrible: you've now introduced the possibility of accidental out-of-bounds access where there previously was none, not to mention that you just forced the compiler to put those variables on the stack instead of using registers.

Contrived example:

  int array(int lol)
  {
        unsigned counter[3] = {1,5,7};

        for (int i = 0; i < lol; i++)
                counter[i % 3]++;

        return counter[0] + counter[1] + counter[2];
  }

  int vars(int lol)
  {
        unsigned counter1 = 1;
        unsigned counter2 = 5;
        unsigned counter3 = 7;

        for (int i = 0; i < lol; i++)
                switch(i % 3) {
                case 0: counter1++;
                case 1: counter2++;
                case 2: counter3++;
                }

        return counter1 + counter2 + counter3;
  }

The output is quite different (gcc -c -O3 -std=gnu99):

  Disassembly of section .text:

  0000000000000000 <array>:
   0:   85 ff                   test   %edi,%edi
   2:   c7 44 24 e8 01 00 00    movl   $0x1,-0x18(%rsp)
   9:   00 
   a:   c7 44 24 ec 05 00 00    movl   $0x5,-0x14(%rsp)
  11:   00 
  12:   c7 44 24 f0 07 00 00    movl   $0x7,-0x10(%rsp)
  19:   00 
  1a:   7e 5f                   jle    7b <array+0x7b>
  1c:   41 b8 01 00 00 00       mov    $0x1,%r8d
  22:   31 c9                   xor    %ecx,%ecx
  24:   be 56 55 55 55          mov    $0x55555556,%esi
  29:   eb 1f                   jmp    4a <array+0x4a>
  2b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  30:   89 c8                   mov    %ecx,%eax
  32:   f7 ee                   imul   %esi
  34:   89 c8                   mov    %ecx,%eax
  36:   c1 f8 1f                sar    $0x1f,%eax
  39:   29 c2                   sub    %eax,%edx
  3b:   8d 04 52                lea    (%rdx,%rdx,2),%eax
  3e:   89 ca                   mov    %ecx,%edx
  40:   29 c2                   sub    %eax,%edx
  42:   48 63 c2                movslq %edx,%rax
  45:   44 8b 44 84 e8          mov    -0x18(%rsp,%rax,4),%r8d
  4a:   89 c8                   mov    %ecx,%eax
  4c:   f7 ee                   imul   %esi
  4e:   89 c8                   mov    %ecx,%eax
  50:   c1 f8 1f                sar    $0x1f,%eax
  53:   29 c2                   sub    %eax,%edx
  55:   8d 04 52                lea    (%rdx,%rdx,2),%eax
  58:   89 ca                   mov    %ecx,%edx
  5a:   83 c1 01                add    $0x1,%ecx
  5d:   29 c2                   sub    %eax,%edx
  5f:   39 f9                   cmp    %edi,%ecx
  61:   48 63 c2                movslq %edx,%rax
  64:   41 8d 50 01             lea    0x1(%r8),%edx
  68:   89 54 84 e8             mov    %edx,-0x18(%rsp,%rax,4)
  6c:   75 c2                   jne    30 <array+0x30>
  6e:   8b 44 24 ec             mov    -0x14(%rsp),%eax
  72:   03 44 24 e8             add    -0x18(%rsp),%eax
  76:   03 44 24 f0             add    -0x10(%rsp),%eax
  7a:   c3                      retq   
  7b:   b8 0d 00 00 00          mov    $0xd,%eax
  80:   c3                      retq   
  81:   66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  88:   0f 1f 84 00 00 00 00 
  8f:   00 

  0000000000000090 <vars>:
  90:   85 ff                   test   %edi,%edi
  92:   7e 52                   jle    e6 <vars+0x56>
  94:   31 c9                   xor    %ecx,%ecx
  96:   41 b8 07 00 00 00       mov    $0x7,%r8d
  9c:   41 b9 05 00 00 00       mov    $0x5,%r9d
  a2:   41 bb 01 00 00 00       mov    $0x1,%r11d
  a8:   41 ba 56 55 55 55       mov    $0x55555556,%r10d
  ae:   89 c8                   mov    %ecx,%eax
  b0:   89 ce                   mov    %ecx,%esi
  b2:   41 f7 ea                imul   %r10d
  b5:   c1 fe 1f                sar    $0x1f,%esi
  b8:   89 c8                   mov    %ecx,%eax
  ba:   29 f2                   sub    %esi,%edx
  bc:   8d 14 52                lea    (%rdx,%rdx,2),%edx
  bf:   29 d0                   sub    %edx,%eax
  c1:   83 f8 01                cmp    $0x1,%eax
  c4:   74 09                   je     cf <vars+0x3f>
  c6:   83 f8 02                cmp    $0x2,%eax
  c9:   74 08                   je     d3 <vars+0x43>
  cb:   41 83 c3 01             add    $0x1,%r11d
  cf:   41 83 c1 01             add    $0x1,%r9d
  d3:   83 c1 01                add    $0x1,%ecx
  d6:   41 83 c0 01             add    $0x1,%r8d
  da:   39 f9                   cmp    %edi,%ecx
  dc:   75 d0                   jne    ae <vars+0x1e>
  de:   45 01 d9                add    %r11d,%r9d
  e1:   43 8d 04 01             lea    (%r9,%r8,1),%eax
  e5:   c3                      retq   
  e6:   b8 0d 00 00 00          mov    $0xd,%eax
  eb:   c3                      retq

Arrays mean memory - don't use them unless you actually want the compiler to use memory.

jakobe12y ago

Your example has nothing to do with the actual code in the kernel. In the original code, the counters are passed by reference to a function. That means they need to be in memory anyway, right? The compiler can't pass the address of a register.

userbinator12y ago

This is filesystem code. Any miniscule performance gain would be eclipsed by the actual reads/writes. This code should really be -Os'd to save cache.

adobriyan12y ago

Why don't you all "deserves better comment" people send a patch?

_pmf_12y ago

> Why don't you all "deserves better comment" people send a patch?

Trying to get a patch into Linux as a new developer without personally knowing one of the (sub-)lieutenants is a futile exercise.

apaprocki12y ago

I did it (and repeated a few times) and I didn't know anyone. I did it by reading about the process, formatting my patch correctly, and sending it to the appropriate list. If it were so difficult, Linux would not be a success. Now, just like other large projects, maintainers might dislike comment-only nit patches, but if it objectively increases readability it might work.

ah-12y ago

That doesn't match my experience at all.

Find the maintainer of the relevant subsystem, talk to them via mail or IRC about what's bugging you and how to improve it, implement it, send a patch and it will most likely be accepted.

Where I have had difficulties getting something accepted is if your patch might cause problems later or doesn't solve the problem in a way that aligns with the vision of the maintainer. But while that might be annoying as a developer that wants a problem fixed now, I'd say that it's probably for the better of the whole kernel.

adobriyan12y ago

> Trying to get a patch into Linux as a new developer without [personally] knowing one of the (sub-)lieutenants is a futile exercise.

You're totally wrong about "personal" part.

vanderZwan12y ago

Yes, this opportunity to make a reference to De La Soul should not be wasted!

userbinator12y ago

That's a very... awkward of doing quite a simple thing, generating ascending interleaved powers. I'd probably use one or more arrays of multipliers to maintain the counters. Simpler and more general -- not often that you can get a solution with both these two attributes.

raverbashing12y ago

And here's how the "rainbows and butterflies" world of "code perfectionists" come crashing down.

Yes, the naming is unfortunate. No, there's probably no way to make it better (and if you think code reviews suck, wait until you see a kernel code review) (Well, this came directly from Linus but there's usually some discussion as well)

But in the end this has shipped and is running. (Well, the double goto from Apple as well, but I doubt that went through a code review)

jheriko12y ago

there are many, many problems in this code style which lead to this kind of problem.

localised comments may help but naming things properly would help more (and negate a comment being required at all), actually adopting some better practices would be a better fix going forward.

looking at the called function's comment:

The counters should be initialized to 1, 5, and 7 before * calling this for the first time.

so how about making a function with the same name, ending in '_first' (maybe rename the original _next) which initialises these variables internally before calling the version intended to be called multiple times...?

since the parameters are not well described by their names renaming them would help too... it is not a pointer to a 3 a 5 or a 7. it might not even be pointing at multiples... but current_multiple_of_three etc. would be better. there is some reason why these values are significant in this algorithm (which I do not know) and the best names would capture that.

the code is moderately difficult to read in the function itself - the method of swapping around a local copy of a pointer based on the conditions is not very easy to follow. although makes sense here to avoid excessive code... comments would help make it clearer what is going on.

yusukeshinyama12y ago

At line 655: "... In a sparse filesystem it will be the sequence of powers of 3, 5, and 7: ..."

So I think this means three = pow(3, 0);

rockdoe12y ago

Doesn't explain why it's not 1, 1, 1 or 1, 5, 49.

thaumasiotes12y ago

It does, you're just not paying attention. The comment appears over the function that uses those variables; it's not 1,1,1 because they want to hit 1 only once in the sequence 1,3,5,7,9,25,27,49,81,125, etc.

It doesn't explain why it's not 3,1,7 or 3,5,1, but that's because there is no reason for that.

chrisBob12y ago

Wasn't there a similar thread a few months back about "two" being corrected to equal 2?

adobriyan12y ago

Speaking of...

  kernel/sysctl.c

  #ifdef CONFIG_LOCKUP_DETECTOR
  static int sixty = 60;
  #endif
  
  static int __maybe_unused neg_one = -1;
  
  static int zero;
  static int __maybe_unused one = 1;
  static int __maybe_unused two = 2;
  static int __maybe_unused three = 3;
  static unsigned long one_ul = 1;
  static int one_hundred = 100;
  #ifdef CONFIG_PRINTK
  static int ten_thousand = 10000;
  #endif

Vextasy12y ago

Seems like a perfectly reasonable naming scheme to me. Anyone foolish enough to think that the variables are there to hold the values 3, 5 and 7 is probably wasting their time on the code in the first place.

Besides, it made you read the code and the comments.

blueskin_12y ago

Reminds me of 'how to write unmaintainable code':

http://mindprod.com/jgloss/unmaincamouflage.html

athenagoras12y ago

Trinitarian theology?

Trindaz12y ago

I finally have an answer to the "what's your favorite line of linux source code" question.

virtualSatai12y ago

This is a great one too: https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...

zamalek12y ago

Is this because there is some inconsistency in the GDT documentation?

wazoox12y ago

That deserves an explanation in the code comments...

h2s12y ago

Better still, it deserves to be replaced with a better name. Comments should not be our first port of call when we notice an issue like this in code. Fix the name. If you can't fix the name, you hang your head in shame and then you write a comment.

dkersten12y ago

https://news.ycombinator.com/item?id=7296583

j / k navigate · click thread line to collapse