What the heck is the value of “-n % n” in programming languages? (opens in new tab)

(lemire.me)

27 pointsvii5y ago29 comments

29 comments

Did I miss something? I got to the end and still don't know what -n % n actually accomplishes in practice, nor why it's mainly used in high-performance code

edflsafoiewq5y ago

It computes (2^b) % n, assuming n is an unsigned b-bit integer. You can't do this directly, since 2^b itself doesn't fit into a b-bit integer.

bertr4nd5y ago

First it's important to note that `n` is unsigned; if it's signed the value of `-n % n` is 0, intuitively.

For unsigned n, the value is: MAX - n + 1 (where max is the maximum representable value in the type of n, e.g., UINT_MAX). The article explains this nicely. (I thought of 2's complement when reasoning through this, but you don't actually need to assume 2's complement to follow the reasoning).

So, `-n % n` computes `(MAX - n + 1) % n` efficiently, without needing to worry about corner cases.

I suspect this is useful when you want to generate random numbers with a limited range, where the range doesn't cleanly divide UINT_MAX. You need to cut a bit off the top from your underlying random number generator.

kazinator5y ago

-n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.

It's calculating ((-n) mod (2^bits)) mod n, where bits is compiler/platform-dependent.

  ;; TXR Lisp
  1> (defun -n%n (n bits)
       (mod (mod (- n) (expt 2 bits)) n))
  -n%n
  2> (-n%n 7 8)
  4
  3> (-n%n 7 16)
  2
  4> (-n%n 7 17)
  4
  5> (-n%n 7 18)
  1
  6> (-n%n 7 32)
  4
  7> (-n%n 7 64)
  2

I would not use this, except possibly with a precise-width type like uint32_t.

wahern5y ago

> -n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.

In C a "computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." (C11 6.2.5p9) In other words, it behaves as-if it were two's complement. So the "problem" of assuming two's complement isn't actually a problem at all. (On machines that use signed-magnitude representation for integer types, for example, such as some only recently retired Unisys mainframes, C compilers actually emulate unsigned arithmetic in the generated code.)

While the number of value and padding bits of integer types (other than the fixed-width types) is implemented-defined, that dilemma is precisely what the construct -n % n is intended to deal with, and to do so in a type-safe manner (i.e. no risk of somebody changing the underlying types without changing UFOO_MAX to UBAR_MAX, presuming the macros even exist).

Your alternative construct just assumes "bits" out of thin air, but because the number of padding bits is implementation-defined in C you can only deduce this from UFOO_MAX[1]. In the next C version there should be new macros that specify the number of value bits, but there's still the type-safety problem of the compiler unable to enforce the constraint that a macro (or any other parameter, for that matter) represents a characteristic of the type your primary expression actually uses. This can be problematic even in languages where you can directly query properties of a type.

-n % n is well suited to its purpose. Elegant, even. And not just in the context of C.

[1] EDIT: Or, alternatively, (unsigned type)-1.

bertr4nd5y ago

Woah, there's even a scary gotcha for types that aren't unsigned. E.g., check out what happens with "unsigned short": https://gcc.godbolt.org/z/4xWo1G. It looks like -n is implicitly converted (to signed int, maybe?) so unless you explicitly re-cast it to "unsigned short", you get something unexpected.

kazinator5y ago

If n is unsigned short then -n means n is first promoted to either int, or unsigned int: the first of those two types which holds all values of unsigned short. Then the - operation is taking place on the resulting value in that promoted type.

An example where unsigned short promotes to unsigned int are platforms where sizeof(short) == sizeof(int). E.g. 16 bit systems where short and int is 16 bits, and long is 32: compilers for 8086 and such.

If the promotion goes to int, the subsequent - is safe; there is no way the resulting value can be that problematic most negative int of two's complement.

On two's complement systems, even if the - is signed, if you convert the value back to unsigned short, you will get the two's complement value, as if the calculation was done in the unsigned type all along.

For instance a 16 bit unsigned short value of 65535 (0xFFFF) will go to 65535 of type int, which will go to -65535, which will then map to 1 if converted to unsigned short.

1 more reply

seppel5y ago

Well, it computes "(MAX - n + 1) % n", where MAX obviously depends on the type.

r-w5y ago

  let threshold = (0 &- range) % range

> It is somewhat inconvenient that Swift forces us to write so much code, but we must admit that the result is probably less likely to confuse a good programmer.

Oh come on, it's basically just one extra character.

kazinator5y ago

> warning C4146: unary minus operator applied to unsigned type, result still unsigned

   #ifdef _MSVC
   x = ~x + 1; // "Manual" two's complement to avoid warning.
   #else
   x = -x; // Regular two's complement any good C coder knows
   #endif

Arnavion5y ago

If you're going to go to the effort of writing the complement version instead of pragma-suppressing the warning on that line, just remove the ifdef and use that with all compilers.

kazinator5y ago

Then I would look like someone who doesn't know that unary minus on an unsigned int obtains the two's complement. That aspect can be taken care of by a comment. Why I might go for the #ifdef is to avoid changing working code, except for that one compiler that is complaining.

gumby5y ago

This only works on two's complement machines. The C++ standard only required two's complement with c++20.

That being said I am not sure I've ever programmed a one's complement machine.

Arnavion5y ago

>This only works on two's complement machines.

No it does not. The behavior of -n where n is an unsigned integer is defined by the standard to be the result of subtracting n from 2^(number of bits in the type), independent of the characteristics of the machine.

This is true of both C and C++; and has been true since at least C99 and C++03

Random C++ standard draft from 2005: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n190... Section 5.3.1

>The operand of the unary - operator shall have arithmetic or enumeration type and the result is the negation of its operand. Integral promotion is performed on integral or enumeration operands. The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand. The type of the result is the type of the promoted operand.

Random C standard draft from 2007: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf Section 6.2.5

>The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

gumby5y ago

I don’t read those extracts as implying the same result as you do. The result type (promoted type in the C++ case) will still be unsigned, yet will be in most cases (especially small ones) full of 1s where there were zeros, and won’t compute the expected modulus. Draw out the bit patterns of 1’s vs 2’s complement for (say) n=5 and n=65535 and see if I have it wrong.

Arnavion5y ago

Not sure what you're trying to say. The point of what I wrote, and what the article says, is that `-n` is equivalent to `2^N - n`, and thus `-n % n` is equivalent to `(2^N - n) % n`.

If 2^N - n happens to be a number that's "full of 1s where there were zeros", then that's what it's supposed to be. I don't understand why you think it "won't compute the expected modulus". That's exactly what it will do.

If N is 32 and n is 5, then -n is 4294967291 and -n % n is 1. It doesn't matter whether that was calculated on a ones complement or twos complement machine.

lights01235y ago

> The ampersand (%) in this expression

s/ampersand/percent sign/

baconshirt5y ago

I read that too. And it was also were I decided to stop.

nkurz5y ago

> And it was also were I decided to stop.

I can see why you might think this is a good heuristic---don't trust blog authors who don't know the right names for common typographic symbols---but in this case I think you went wrong. Daniel is an excellent computer scientist, with a talent for writing straightforward papers about difficult topics. He's also a non-native English speaker who is usually very gracious about accepting corrections to his normally excellent prose. In this case, it was probably just a typo. It's already been corrected---likely because of a comment on the post. But if you ignore everything that contains small temporary errors like this, you're probably missing out on a lot of otherwise good articles.

jhayward5y ago

Well, in this instance your knee-jerk isn't helping you. Take a look at Dr. Lemire's list of papers:

https://lemire.me/en/#publications

There's a lot of high-performance machine code knowledge to be gained there.

musicale5y ago

It's zero in Python, which makes perfect sense to me. Quotient is negative, remainder is zero.

zamadatix5y ago

Because:

> When the variable range is a signed integer, then the expression -range % range is zero. In a programming language with only signed integers, like Java, this expression is always zero.

Not because of your intuition on the quotient. E.g. try -8 % 5, -8 % 6, and -8 % 7 and this feeling of perfect sense compared to the languages with unsigned types should melt away.

hexo5y ago

int main(void) { unsigned int n = 7; printf("%i", (-n % n)); return 0; }

this gives 4 :(

kazinator5y ago

Note that %i requires an int argument, not unsigned; you want %u.

Arnavion5y ago

If your int is 32 bits wide, 4 is exactly what it should give.

foldr5y ago

Apart from the overall oddness of this, the result of applying the modulo operator to signed arguments is technically undefined behavior in C.

shric5y ago

This isn't correct, section 6.5.5 states merely that "The operands of the % operator shall have integer type."

foldr5y ago

Ah, my bad, it seems this was the case for K&R C but the behavior was since standardized.

j / k navigate · click thread line to collapse

29 comments

brundolf5y ago

Did I miss something? I got to the end and still don't know what -n % n actually accomplishes in practice, nor why it's mainly used in high-performance code

edflsafoiewq5y ago

It computes (2^b) % n, assuming n is an unsigned b-bit integer. You can't do this directly, since 2^b itself doesn't fit into a b-bit integer.

bertr4nd5y ago

First it's important to note that `n` is unsigned; if it's signed the value of `-n % n` is 0, intuitively.

So, `-n % n` computes `(MAX - n + 1) % n` efficiently, without needing to worry about corner cases.

kazinator5y ago

-n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.

It's calculating ((-n) mod (2^bits)) mod n, where bits is compiler/platform-dependent.

  ;; TXR Lisp
  1> (defun -n%n (n bits)
       (mod (mod (- n) (expt 2 bits)) n))
  -n%n
  2> (-n%n 7 8)
  4
  3> (-n%n 7 16)
  2
  4> (-n%n 7 17)
  4
  5> (-n%n 7 18)
  1
  6> (-n%n 7 32)
  4
  7> (-n%n 7 64)
  2

I would not use this, except possibly with a precise-width type like uint32_t.

wahern5y ago

> -n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.

-n % n is well suited to its purpose. Elegant, even. And not just in the context of C.

[1] EDIT: Or, alternatively, (unsigned type)-1.

bertr4nd5y ago

kazinator5y ago

If the promotion goes to int, the subsequent - is safe; there is no way the resulting value can be that problematic most negative int of two's complement.

For instance a 16 bit unsigned short value of 65535 (0xFFFF) will go to 65535 of type int, which will go to -65535, which will then map to 1 if converted to unsigned short.

1 more reply

seppel5y ago

Well, it computes "(MAX - n + 1) % n", where MAX obviously depends on the type.

r-w5y ago

  let threshold = (0 &- range) % range

> It is somewhat inconvenient that Swift forces us to write so much code, but we must admit that the result is probably less likely to confuse a good programmer.

Oh come on, it's basically just one extra character.

kazinator5y ago

> warning C4146: unary minus operator applied to unsigned type, result still unsigned

   #ifdef _MSVC
   x = ~x + 1; // "Manual" two's complement to avoid warning.
   #else
   x = -x; // Regular two's complement any good C coder knows
   #endif

Arnavion5y ago

If you're going to go to the effort of writing the complement version instead of pragma-suppressing the warning on that line, just remove the ifdef and use that with all compilers.

kazinator5y ago

gumby5y ago

This only works on two's complement machines. The C++ standard only required two's complement with c++20.

That being said I am not sure I've ever programmed a one's complement machine.

Arnavion5y ago

>This only works on two's complement machines.

This is true of both C and C++; and has been true since at least C99 and C++03

Random C++ standard draft from 2005: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n190... Section 5.3.1

Random C standard draft from 2007: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf Section 6.2.5

gumby5y ago

Arnavion5y ago

Not sure what you're trying to say. The point of what I wrote, and what the article says, is that `-n` is equivalent to `2^N - n`, and thus `-n % n` is equivalent to `(2^N - n) % n`.

If N is 32 and n is 5, then -n is 4294967291 and -n % n is 1. It doesn't matter whether that was calculated on a ones complement or twos complement machine.

lights01235y ago

> The ampersand (%) in this expression

s/ampersand/percent sign/

baconshirt5y ago

I read that too. And it was also were I decided to stop.

nkurz5y ago

> And it was also were I decided to stop.

jhayward5y ago

Well, in this instance your knee-jerk isn't helping you. Take a look at Dr. Lemire's list of papers:

https://lemire.me/en/#publications

There's a lot of high-performance machine code knowledge to be gained there.

musicale5y ago

It's zero in Python, which makes perfect sense to me. Quotient is negative, remainder is zero.

zamadatix5y ago

Because:

> When the variable range is a signed integer, then the expression -range % range is zero. In a programming language with only signed integers, like Java, this expression is always zero.

Not because of your intuition on the quotient. E.g. try -8 % 5, -8 % 6, and -8 % 7 and this feeling of perfect sense compared to the languages with unsigned types should melt away.

hexo5y ago

int main(void) { unsigned int n = 7; printf("%i", (-n % n)); return 0; }

this gives 4 :(

kazinator5y ago

Note that %i requires an int argument, not unsigned; you want %u.

Arnavion5y ago

If your int is 32 bits wide, 4 is exactly what it should give.

foldr5y ago

Apart from the overall oddness of this, the result of applying the modulo operator to signed arguments is technically undefined behavior in C.

shric5y ago

This isn't correct, section 6.5.5 states merely that "The operands of the % operator shall have integer type."

foldr5y ago

Ah, my bad, it seems this was the case for K&R C but the behavior was since standardized.

j / k navigate · click thread line to collapse