To be pedantic, not guaranteed. The xor of multiple elements may erroneously have a passing checksum, resulting in an undetected false decode. You can make the probability of this as low as you like by using a larger checksum, but the initial HN example was IIRC a list of 32-bit integers, so using a (say) 128 bit checksum to make false decodes 'cryptographically unlikely' would come at a rather high cost since it's a size added to each bucket.
If your sent members are multi-kilobyte contact database entries or something than the overhead required to make false decode impossible would be insignificant.
This limitation also applies somewhat to the alternative algebraic approach in my comments-- an overfull sketch could be falsely decoded--, except the added size needed to make a false decode cryptographically unlikely is very small and goes down relative to the size of the sketch as the sketch grows instead of being linear in the size of the sketch.
I haven't looked at your implementation but it can be useful to have at least 1 bit counter or just make the LSB of your checksum always 1. Doing so prevents falsely decoding an overfull bucket with an even number of members in it, and since the distribution of members to bucket is binomial 2 is an extremely common number for overfull buckets. You can use a counter bigger than 1 bit (and combine it with addition in its ring rather than xor), but the tradeoff vs just having more checksum bits is less obvious.
It's probably an interesting open question about the existence of checksums such that the xor of 2..N valid codewords is unlikely to be a valid codeword... the "always emit 1" function has perfect performance for even values but are there schemes that still contribute distance even in cases were the N isn't completely precluded?