(n & ((n & 1) - 1)) + ((n ^ (n >> 1)) & 1)
Or a much more readable version: [n, 1, n + 1, 0][n % 4]
which makes it clear that this function cycles through a pattern of length four.

Why this works can be seen if we start with some n that is divisible by four, i.e. it has the two least significant bits clear, and then keep XORing it with its successors. We start with xxxxxx00, which is our n. Then we XOR it with n + 1, which is xxxxxx01, and that clears all the x's and leaves us with 00000001. Now we XOR it with n + 2, which is xxxxxx10, and that yields xxxxxx11, which is n + 3. The cycle finishes when we XOR it with n + 3, which yields 00000000. So we get n, 1, n + 3, 0, and then the cycle repeats, as we are back at zero and at n + 4, which is again divisible by four.
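A quick brute-force check of the period-4 pattern described above (a sketch, not from the original comment): the running XOR of 0..n should equal the four-entry table for every n.

```python
# Verify that XOR of 0..n follows the pattern [n, 1, n + 1, 0][n % 4].
def xor_upto(n):
    acc = 0
    for i in range(n + 1):
        acc ^= i
    return acc

for n in range(1000):
    assert xor_upto(n) == [n, 1, n + 1, 0][n % 4]
print("pattern holds for n < 1000")
```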
My offhand solution not using xor is to subtract from the sum of 1 to n, which has a closed form solution. The closed form roughly halves the execution time, as we only have to iterate over the range once.
Good to know there's a similar speedup available on the xor path...
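Both closed-form speedups can be sketched side by side. This assumes the classic problem variant where the array holds 0..n with one element removed (names are mine, for illustration):

```python
# Missing number via the arithmetic closed form n*(n+1)/2, and via the
# XOR closed form [n, 1, n + 1, 0][n % 4] - each needs only one pass
# over the array, not over the full range.
def missing_by_sum(arr, n):
    return n * (n + 1) // 2 - sum(arr)

def missing_by_xor(arr, n):
    acc = [n, 1, n + 1, 0][n % 4]  # XOR of 0..n without a loop
    for v in arr:
        acc ^= v
    return acc

nums = [v for v in range(101) if v != 57]
print(missing_by_sum(nums, 100))  # 57
print(missing_by_xor(nums, 100))  # 57
```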
xxxxxxx0 ^ xxxxxxx1 = 00000001
If we start at a number divisible by four and do this twice, we get one twice. xxxxxx00 ^ xxxxxx01 = 00000001
xxxxxx10 ^ xxxxxx11 = 00000001
And combining the two of course yields zero and we are right back at the start.

xor =: (16 + 2b0110) b.
f =: 3 : 'xor/ y + i. 4'
f"0 ] 2 * 1 + i. 100
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Summing a hundred million: +/ f"0 ] 2 * i. 100000000 gives zero (it takes a few seconds). So it seems the stated property holds for every even n.
So the cycle of (N, 1, N+3, 0) corresponds to (A) and (B) being: (0,0), (0,1), (1,1), (1, 0) - i.e. the 4 possible combinations of these states.
[(n & ~3), 1, (n & ~3) + 3, 0][n % 4]
where the (n & ~3) makes sure those lower 2 bits are cleared. But note that we only ever can look at the first element when n % 4 == 0. In that case, (n & ~3) == n already. And further, we only ever can look at the third element when n % 4 == 2. In that case (n & ~3) == n - 2, so (n & ~3) + 3 == n + 1. Hence the array can be simplified to the one given in the other comment.
The problem is that in vector sets, the HNSW graph has the invariant that each node has bidirectional links to a set of N nodes. If A links to B, then B links to A. This is unlike most other HNSW implementations. In mine, it is required that links are reciprocal, otherwise you get a crash.
Now, combine this with another fact: for speed concerns, Redis vector sets are not serialized as
element -> vector
And then reloaded and added back to the HNSW. This would be slow. Instead, what I do is serialize the graph itself: each node with its unique ID and all its links. But when I load the graph back, I must be sure it is "sane" and will not crash my systems, and reciprocal links are one of the things to check.

Checking that all the links are reciprocal could be done with a hash table (as in the post's problem), but that would be slower and memory consuming, so how do we use XOR instead? Each time I see a link A -> B, I normalize it, swapping A and B in case A > B. So if links are reciprocal I'll see A -> B two times, and if I use a register to accumulate the IDs and XOR them, at the end, if the register is NOT zero, I've got issues: some link may not be reciprocal.

However, in this specific case, there is a problem: collisions. The register may be 0 even if there are non-reciprocal links, in case they are fancy, that is, the non-reciprocal links are few and happen to XOR to 0. So, to fix this part, I use a strong (and large) hash function that makes a collision extremely unlikely.
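A small sketch of the check described above (names and hash choice are mine, not from the Redis source): normalize each directed link so the smaller id comes first, hash the pair, and XOR all the hashes together. A reciprocal pair contributes the same hash twice and cancels out.

```python
import hashlib

def links_look_reciprocal(links):
    acc = 0
    for a, b in links:
        if a > b:
            a, b = b, a          # normalize so A->B and B->A hash alike
        digest = hashlib.sha256(f"{a}-{b}".encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    # Zero means every normalized link was seen an even number of times;
    # with a 256-bit hash, a collision is astronomically unlikely.
    return acc == 0

good = [(1, 2), (2, 1), (3, 7), (7, 3)]
bad = [(1, 2), (2, 1), (3, 7)]   # 3->7 has no 7->3 back-link
print(links_look_reciprocal(good))  # True
print(links_look_reciprocal(bad))   # False
```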
It is nice to see this post now, since I was not aware of this algorithm when I used it a few weeks ago. Sure, at this point I'm old enough that I never pretend I invented something, so I was sure this was already used in the past, but well, in case it was not used for reciprocal-link testing before, this is a new interview question you may want to use for advanced candidates.
For every normalized link id x:
y = (x << k) | h(x) # append a k-bit hash to the id
acc ^= y
If acc is zero, all links are reciprocal (same guarantee as before).

If acc is non-zero, split it back into (x', h'):
* Re-compute h(x').
* If it equals h', exactly one link is unpaired and x' tells you which one (or an astronomically unlikely collision). Otherwise there are >= 2 problems.
This has collision-resistance like the parent comment and adds the ability to pinpoint a single offending link without a second pass or a hash table.
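The scheme above can be sketched as follows; the mixer function and k = 16 are placeholders of mine, not part of the parent comment:

```python
K = 16
def h(x):
    return (x * 0x9E3779B1) & 0xFFFF  # toy k-bit hash, illustrative only

def check(links):
    acc = 0
    for x in links:                   # x = normalized link id
        acc ^= (x << K) | h(x)        # append a k-bit hash to the id
    if acc == 0:
        return "all reciprocal"
    x2, h2 = acc >> K, acc & 0xFFFF   # split acc back into (x', h')
    if h(x2) == h2:
        return f"single unpaired link: {x2}"
    return ">= 2 problems"

print(check([5, 5, 9, 9]))  # all reciprocal
print(check([5, 5, 9]))     # single unpaired link: 9
```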
The XOR solution was a valid answer, but not the only answer we would have happily accepted.
The interview question was chosen such that it's very easy to understand and quick to solve, meaning it would indicate the candidate knew at least the basics of programming in C#. Almost surprisingly, we actually had candidates applying for "senior" level positions who struggled with this.
It could be solved in a multitude of ways, e.g.:
- XOR as above
- Use of a HashSet<int>
- Use a for loop and a List which contains each number and its count.
- Use LINQ to group the numbers and then find the one with the odd count.
As long as what they did worked, it was a "valid" answer, we could then often discuss the chosen solution with the candidate and see how they reacted when we let them know of other valid solutions.
It was really great for not being a "one clever trick" question and could act as a springboard to slightly deeper discussions into their technical thought processes and understanding.
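Two of the non-XOR approaches listed above can be sketched quickly (assuming the common variant where every value appears twice except one; function names are mine):

```python
from collections import Counter

def single_via_set(nums):
    seen = set()
    for v in nums:
        # toggle membership: a second sighting removes the value,
        # so the leftover element is the answer
        seen.symmetric_difference_update({v})
    return seen.pop()

def single_via_counts(nums):
    return next(v for v, c in Counter(nums).items() if c == 1)

data = [4, 1, 2, 1, 2]
print(single_via_set(data))     # 4
print(single_via_counts(data))  # 4
```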
a := a + b;
b := a - b;
a := a - b;
I'm still proud of little me and I always remember this solution when I encounter XOR tricks. I didn't know about bitwise arithmetic at that time, but sometimes a simple `+` can work just as well.

It's an array of integers, so it fits in memory (otherwise it wouldn't be called an array). As it fits in memory, n cannot be that big. I'd still ask for more requirements, TopCoder problem style: I want to know how big n can be such that the array fits in memory.
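The addition swap above shares the aliasing hazard of the XOR swap: applied to a slot and itself, it destroys the value. A minimal demo (Python ints don't overflow, which sidesteps the other caveat of the `+` version):

```python
def add_swap(arr, i, j):
    arr[i] = arr[i] + arr[j]
    arr[j] = arr[i] - arr[j]
    arr[i] = arr[i] - arr[j]

v = [3, 5]
add_swap(v, 0, 1)
print(v)            # [5, 3]

w = [7]
add_swap(w, 0, 0)   # i == j: the slot becomes 2a, then 0
print(w)            # [0]
```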
I didn't know that XOR trick. My solution would be a bit array with n bits and two for loops: one to light the bit corresponding to each number, and one to find the missing number.
And if my bit array doesn't fit in memory, then neither does the array from the problem (and certainly not the HashSet etc.).
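The two-loop bit-array idea can be sketched with a Python int standing in for the bit array (assuming the array holds 0..n with one value missing):

```python
def missing_via_bits(arr, n):
    bits = 0
    for v in arr:           # loop 1: light the bit for each number seen
        bits |= 1 << v
    for i in range(n + 1):  # loop 2: find the bit that never lit up
        if not bits & (1 << i):
            return i

print(missing_via_bits([0, 1, 3, 4], 4))  # 2
```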
That makes the problem harder which makes it more interesting, a lot of the solutions wouldn't work anymore (this isn't necessarily a good interview question though)
Basically xor swapping a[i] with a[j] triggered the evil logic when i was equal to j.
The state of RC4 consists of a random permutation of bytes. Whenever it outputs a value, it further permutes the state by swapping some bytes of the state. The xor swap trick sets one of these values to zero whenever RC4 attempts to swap an item with itself within the permutation. This gradually zeros out the state, until RC4 outputs the plaintext.
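The bug in miniature: an in-place XOR swap zeroes the element whenever i == j, because x ^ x == 0 at the first step.

```python
def xor_swap(a, i, j):
    a[i] ^= a[j]
    a[j] ^= a[i]
    a[i] ^= a[j]

state = [10, 20, 30]
xor_swap(state, 0, 2)
print(state)          # [30, 20, 10]
xor_swap(state, 1, 1) # same index: the slot is destroyed
print(state)          # [30, 0, 10]
```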
> We can thus search for u by applying this idea to one of the partitions and finding the missing element, and then find v by applying it to the other partition.
Since you already have u^v, you need only search for u, which immediately gives you v.
tromp is saying the last step can be simplified. There is no need to use the "XOR of all elements" method on the second partition to find v, since the earlier steps have given us u^v and u, so simply XORing those two values together gives v.
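The whole two-missing-numbers routine, with tromp's shortcut at the end, can be sketched like this (assuming the array holds 1..n with u and v removed):

```python
def two_missing(arr, n):
    uv = 0                        # will hold u ^ v
    for x in range(1, n + 1):
        uv ^= x
    for x in arr:
        uv ^= x
    bit = uv & -uv                # a bit where u and v differ
    u = 0
    for x in range(1, n + 1):     # partition by that bit, XOR one side
        if x & bit:
            u ^= x
    for x in arr:
        if x & bit:
            u ^= x
    return u, uv ^ u              # no second partition pass needed

nums = [x for x in range(1, 11) if x not in (3, 8)]
print(sorted(two_missing(nums, 10)))  # [3, 8]
```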
If you’re working on an architecture where a single multiplication and a bit shift is cheaper than N xor’s, and where xor, add, and sub are all the same cost, then you can get a performance win by computing the sum as N(N+1)/2; and you don’t need a blog post to understand why it works.
Example 1: Find the missing number
xor =: (16 + 2b0110) b.
iota1000 =: (i. 1000)
missingNumber =: (xor/ iota1000) xor (xor/ iota1000 -. 129)
echo 'The missing number is ' , ": missingNumber
This prints 'The missing number is 129'.

Example 2: Using a random permutation, find the missing number.
permuted =: (1000 ? 1000)
missingNumber =: (xor/ permuted) xor (xor/ permuted -. ? 1000)
Example 3: find the missing number in this matrix.

_ (< 2 2) } 5 5 $ (25 ? 25)
12 9 1 20 19
6 18 3 4 8
24 7 _ 15 23
11 21 10 2 5
0 16 17 22 14
Final test: repeat 10 times the example 3 (random matrices) and collect the time it takes you to solve it in a list of times, then compute the linear regression best fit by times %. (1 ,. i. 10)
Did you get better at solving it by playing more times?

I am not affiliated with J, but in case you want to try some J code there is a playground: https://jsoftware.github.io/j-playground/bin/html2/
Edited: It seems I am procrastinating a lot about something I have to do but don't want to.
f =: ]`1:`>:`0:@.(4&|)"0
Then: (,. ; #: ; [: #: f) i.16
0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 1
2 0 0 1 0 0 0 1 1
3 0 0 1 1 0 0 0 0
4 0 1 0 0 0 1 0 0
5 0 1 0 1 0 0 0 1
6 0 1 1 0 0 1 1 1
7 0 1 1 1 0 0 0 0
8 1 0 0 0 1 0 0 0
9 1 0 0 1 0 0 0 1
...

The parent's comment (also mine) has a style that was designed not to scare non-J programmers. One should also consider that some people dislike J code, so downvotes are the usual result, except when the post provides some additional insight.
Finally, thank you for this small J lesson; it is a pleasure to find fellow J programmers here.
That is, one should also prove a ^ (b ^ c) = (a ^ b) ^ c. Instinctive, but non-trivial.
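Since XOR acts bitwise and independently per bit, checking associativity on single bits is all that's needed; an exhaustive check is tiny:

```python
# Verify a ^ (b ^ c) == (a ^ b) ^ c for all single-bit inputs.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert a ^ (b ^ c) == (a ^ b) ^ c
print("associativity holds")
```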
> If more than two elements are missing (or duplicated), then analyzing the individual bits fails because there are several combinations possible for both 0 and 1 as results. The problem then seems to require more complex solutions, which are not based on XOR anymore.
If you consider XOR to be a little more general, I think you can still use something like the partitioning algorithm. That is to say, considering that XOR on a bit level behaves like XOR_bit(a,b) = (a+b) % 2, you might consider a generalized XOR_bit(a,b,k) = (a+b) % k. With this I think you can decide partitions with up to k missing numbers, but I'm too tired to verify/implement this right now.
They're really evil on modern CPUs.
In a typical error-correcting code usage, you have an encoder which takes your message, and adds some extra symbols at the end which are calculated so that the syndrome is zero. Then when receiving your message, the receiver calculates the syndrome and if it's not zero, they know that at least one error has occurred. By using the code's decoding algorithm, they can figure out the fewest (and thus hopefully most likely) number of changes which would result in that error syndrome, and use this information to (hopefully) correct the transmission error.
For the missing numbers problem, you can set x_i to "how many times does the number i appear?". Then since the syndrome is sum(x_i * G_i), you can compute the syndrome on an unordered list of the i's. You are expecting the syndrome to be the same as the syndrome of full set 1...n, so when it is not, you can figure out which few x_i's are wrong that would lead to the syndrome you observed. You have an advantage because you know how many numbers are missing, but it's only a slight one.
The author's solution is called the Hamming code: you set F(i) = i, and you do the additions by XORing. Using error-correcting codes generalizes to more missing numbers as well, including using xor, but the math becomes more complicated: you would want to use a fancier code such as a BCH or Goppa code. These also use xor, but in more complicated ways.
And sometimes even faster than a load immediate, hence XOR AX, AX instead of MOV AX, 0.
Shorter usually means faster, even if the instruction itself isn't faster.
Shorter basically means you can fit more in instruction cache, which should in theory improve performance marginally.
> Shorter usually means faster
It depends, so spouting generalities doesn't mean anything. Instruction cache line filling vs. cycle reduction vs. reservation station ordering is typically a compiler constraints optimization problem(s).
XOR is equivalent to addition over the finite field F_2^m. So, in this field, we're calculating the sum. If we have two numbers missing, we calculate the sum and sum of squares, so we know:
x + y
x^2 + y^2
From which we can solve for x and y. (Note all the multiplications are Galois Field multiplications, not integer!)
Similarly for k numbers we calculate sums of higher powers and get a higher order polynomial equation that gives our answer. Of course, the same solution works over the integers and I'd imagine modular arithmetic as well (I haven't checked though).
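The integer version hinted at above can be made concrete: recover two missing numbers from the plain sum and sum of squares, then solve the resulting quadratic (a sketch of mine, using ordinary integer arithmetic rather than Galois Field operations):

```python
from math import isqrt

def two_missing_by_sums(arr, n):
    # arr holds 1..n with x and y removed
    s1 = n * (n + 1) // 2 - sum(arr)                               # x + y
    s2 = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in arr)  # x^2 + y^2
    p = (s1 * s1 - s2) // 2                                        # x * y
    d = isqrt(s1 * s1 - 4 * p)                                     # |x - y|
    return (s1 - d) // 2, (s1 + d) // 2

nums = [v for v in range(1, 11) if v not in (4, 9)]
print(two_missing_by_sums(nums, 10))  # (4, 9)
```

Note this relies on dividing by 2, which is exactly what fails over F_2^m, as a later comment in this thread points out.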
This is also how BCH error-correction codes work (see https://en.wikipedia.org/wiki/BCH_code): a valid BCH codeword has sum(x^i where bit x is set in the codeword) = 0 for t odd powers i=1,3,5, ... Then if some bits get flipped, you will get a "syndrome" s_i := sum(x^i where bit x was flipped) for those odd powers. Solving from the syndrome to get the indices of the flipped bits is the same problem as here.
The general decoding algorithm is a bit involved, as you can see in the Wikipedia article, but it's not horribly difficult:
• First, extend the syndrome: it gives sum(x^i) for odd i, but you can compute the even powers s_2i = s_i^2.
• The syndrome is a sequence of field values s_i, but we can imagine it as a "syndrome polynomial" S(z) := sum(s_i z^i). This is only a conceptual step, not a computational one.
• We will find a polynomial L(z) which is zero at all errors z=x and nowhere else. This L is called a "locator" polynomial. It turns out (can be checked with some algebra) that L(z) satisfies a "key equation" where certain terms of L(z) * S(z) are zero. The key equation is (almost) linear: solve it with linear algebra (takes cubic time in the number of errors), or solve it faster with the Berlekamp-Massey algorithm (quadratic time instead, maybe subquadratic if you're fancy).
• Find the roots of L(z). There are tricks for this if its degree is low. If the degree is high then you usually just iterate over the field. This takes O(#errors * size of domain) time. It can be sped up by a constant factor using Chien's search algorithm, or by a logarithmic factor using an FFT or AFFT.
You can of course use a different error-correcting code if you prefer (e.g. binary Goppa codes).Edit: bullets are hard.
Further edit just to note: the "^" in the above text refers to powers over the finite field, not the xor operator.
> constant factor using Chien's search algorithm
Chien's search is only really reasonable for small field sizes... which I think doesn't really make sense in this application, where the list is long and the missing elements are relatively few.
Fortunately, in characteristic 2 it's quite straightforward and fast to just factor the polynomial using the Berlekamp trace algorithm.
L(z) = z^2 - (x+y)z + xy.
You already have x+y, but what's xy? You can compute it as ((x+y)^2 - (x^2 + y^2))/2. This technique generalizes to higher powers, though I forget the exact details: basically you can generate the coefficients of L from the sums of powers with a recurrence.

Then you solve for the roots of L, either using your finite field's variant of the quadratic formula, or e.g. just by trying everything in the field.
* But wait, this doesn't actually work! *
Over fields of small characteristic, such as F_2^m, you need to modify the approach and use different powers. For example, in the equations above, I divided by 2. But over F_2^m in the example shown above, you cannot divide by 2, since 2=0. In fact, you cannot solve for (x,y) at all with only x+y and x^2 + y^2, because
(x+y)^2 = x^2 + y^2 + 2xy = x^2 + y^2 + 0xy (since 2=0) = x^2 + y^2
So having that second polynomial gives you no new information. So you need to use other powers such as cubes (a BCH code), or some other technique (e.g. a Goppa code). My sibling comment to yours describes the BCH case.

So you don't actually need the first loop (at least for the set of integers 1..n example), but bringing that up is probably out of scope for this article.
It's all theoretical though. On real world data sets that aren't small I don't see why you wouldn't hand these tasks off to C/Zig/Rust unless you're only running them once or twice.
It's silly to ask a web dev these questions and expect these XOR approaches.
Low-level developers ("bare metal" as the kids say), on the other hand? They should have a deep enough understanding of binary representation and bitwise operations to approach these problems with logic gates.
The epitome of turning technical interviews into a trivia contest to make them feel smart. Because isn't that the point of a tech interview?
I believe you under-estimate what a good interviewer is trying to do with questions such as these:
Either you've seen the trick before, in which case you get an opportunity to show the interviewer that you're an honest person by telling them you have. Huge plus, and the interview can move on to other topics.

Or you haven't, and you can demonstrate your analytical skills to the interviewer by dissecting the problem step by step and understanding what the code actually does and how.
Bonus if you can see the potential aliasing problem when used to swap two variables.
Not a trivia question at all.
However, then it is clearly still easier to just phrase everything in terms of = (equality) instead!
Equality for binary inputs is also called XNOR, biconditional, iff, ↔, etc., which is the negation of XOR. But thinking of it immediately as "=" is much more straightforward.
Another advantage of = over ≠/xor is that equality is not just commutative and associative, it's intuitively obvious that it is associative. The associativity of ≠/xor is less obvious. Moreover, equality is also transitive, unlike inequality/xor.
Overall, equality seems a much more natural concept to reason with, yet I don't know of any languages which have a bitwise equality/XNOR/↔ operator, i.e. one that operates on integers rather than Booleans.
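Lack of a native operator aside, bitwise XNOR on m-bit integers is a one-liner: complement the XOR and mask to the width (a sketch, with the width as an explicit parameter since integers aren't fixed-width in Python):

```python
def xnor(a, b, bits=8):
    # set bit i exactly where a and b agree in bit i
    return ~(a ^ b) & ((1 << bits) - 1)

print(bin(xnor(0b1100, 0b1010, 4)))  # 0b1001: the two agree in bits 3 and 0
```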
For those who do/did assembly, this is the common way to set a register to zero in x86 assembly (probably not only) because the instruction does not need an operand, so is shorter, and executes in one cycle only.
One aspect of XOR is that it is the same as binary addition without carry, and therefore it does not overflow.
`XOR[0...n] = 0 ^ 1 .... ^ n = [n, 1, n + 1, 0][n % 4]`
XOR[0...x] = (x&1^(x&2)>>1)+x*(~x&1)
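The branchless expression can be checked against the mod-4 table directly (this works unchanged in Python, where `~x & 1` on non-negative x behaves as in C):

```python
def xor_upto_branchless(x):
    # (bit0 ^ bit1) contributes the "+1"/"1" cases; x*(~x & 1) keeps x
    # only when x is even.
    return (x & 1 ^ (x & 2) >> 1) + x * (~x & 1)

for x in range(1000):
    assert xor_upto_branchless(x) == [x, 1, x + 1, 0][x % 4]
print("branchless form matches the table")
```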
Actually I found something through Gemini based on the table mod 4 idea in previous post. Thanks.
xor ax, ax
Than: mov ax, 0h

This is nonsensical: where does the second truth table come from? Instead you just observe that, by definition, 1^0 == 0^1.