All recordings were successfully compressed.
Original size (bytes): 146,800,526
Compressed size (bytes): 123,624
Compression ratio: 1187.47
The eval.sh script was downloaded, and the files were encoded and decoded without loss, as verified using diff.
What do you think? Is this true?
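For reference, a lossless round-trip check like the one described boils down to something like this sketch (the ./compress and ./decompress binaries and the filenames are placeholders, not the challenge's actual interface):

```python
import filecmp
import subprocess

# Hypothetical round trip: compress, decompress, then byte-compare.
subprocess.run(["./compress", "data.wav", "data.bin"], check=True)
subprocess.run(["./decompress", "data.bin", "data.out.wav"], check=True)

# shallow=False compares actual file contents, like diff does.
assert filecmp.cmp("data.wav", "data.out.wav", shallow=False), "round trip is lossy"
```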
https://www.linkedin.com/pulse/neuralink-compression-challen... context: https://www.youtube.com/watch?v=X5hsQ6zbKIo
Until this A/D linearity problem is fixed, there is no point in pursuing compression schemes. The data is so badly mangled that it's nearly impossible to find patterns.
As a trivial example, if your dataset is one trillion binary digits of pi, it is essentially incompressible by any regular compressor, but a generator that produces it fits in well under 1 kB.
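To make that concrete (the parent says binary digits; this streams decimal digits, but the point is identical): Gibbons' unbounded spigot algorithm generates pi from a few hundred bytes of source.

```python
from itertools import islice

def pi_digits():
    # Gibbons' unbounded spigot algorithm: yields decimal digits of pi
    # one at a time, using only integer arithmetic.
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # next digit is certain; emit it and shift the state
            q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

print(list(islice(pi_digits(), 8)))  # [3, 1, 4, 1, 5, 9, 2, 6]
```

A general-purpose compressor sees that digit stream as noise, yet the program above regenerates any prefix of it exactly.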
Why didn't every other company think of this?
Yup:
"Submit with source code and build script."
But hey, the reward is a job. Maybe.
I mean, not everyone can be privileged enough to experience Ultra Hardcore™ toxic work culture.
The sample data compresses poorly. Very simple first-order difference encoding plus a decent Huffman coder easily gets it down to about 4.5 bits per sample, but that's only ~2.2x from the original 10 bits.
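A minimal sketch of how you'd sanity-check that figure, assuming the samples are already loaded into a NumPy integer array (file parsing omitted). The empirical entropy of the first differences is within 1 bit/sample of what a Huffman code over the delta alphabet can achieve:

```python
import numpy as np

def delta_entropy_bits(samples: np.ndarray) -> float:
    # First-order difference: represent each sample as (x[i] - x[i-1]).
    deltas = np.diff(samples.astype(np.int64))
    # Empirical (order-0) entropy of the deltas, in bits per sample --
    # a lower bound for any symbol-by-symbol coder such as Huffman.
    _, counts = np.unique(deltas, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```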
However, let's assume there is massive cross-correlation between the 1024 channels. In the extreme, they are all the same, meaning if we encode 1 channel we get the other 1023 for free. That puts a lower limit of 4.5/1024 ≈ 0.0044 bits per sample, or a compression ratio of about 2275. Voilà!
If data patterns exist and can be found, then more sophisticated coding algorithms could achieve better compression, or tolerate more variation (i.e., less cross-correlation) between channels.
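As a sketch of what exploiting that redundancy could look like -- assuming the channels arrive as an aligned (channels × samples) integer array; the function names here are made up:

```python
import numpy as np

def to_residuals(channels: np.ndarray) -> np.ndarray:
    # Keep channel 0 verbatim; store every other channel as its
    # difference from channel 0. Perfectly invertible, so still lossless.
    # If the channels are near-copies, the residuals are near-zero and
    # entropy-code far below the 4.5 bits/sample of a lone channel.
    out = channels.astype(np.int64).copy()
    out[1:] -= out[0]
    return out

def from_residuals(res: np.ndarray) -> np.ndarray:
    # Exact inverse: add the reference channel back.
    out = res.copy()
    out[1:] += out[0]
    return out
```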
We may never know unless Neuralink releases a full data set, i.e. 1024 channels at 20 kHz and 10 bits for 1 hour. That's roughly 92 GB, but if they want serious analysis they should release serious data.
Finally, there is no apparent reason to require lossless compression. The end result -- correct data to control the cursor and so on -- is what matters. Neuralink should let challengers submit DATA to a test engine that compares cursor output for the original data against cursor output for the submitted data, reports a match score, and maybe a graph or something. That sort of feedback might allow participants to create a satisfactory lossy compression scheme.
It's 2275X
That's the compression ratio for complete cross-correlation. It's (10 bits uncompressed / 4.5 bits compressed on 1 channel) × 1024 channels.
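Spelled out:

```latex
\frac{10\ \text{bits/sample}}{4.5\ \text{bits} / 1024\ \text{channels}}
  = \frac{10 \times 1024}{4.5} \approx 2275
```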
I'm all for challenges, but it is fairly standard to have prizes.
Here’s why: https://x.com/raffi_hotter/status/1795910298936705098
https://x.com/JohnSmi48253239/status/1794328213923188949?t=_...
Does that mean the radio uses a portion of this 10 mW? If so, how much?