All those class variables are already in __slots__, so in theory it shouldn't matter. Your advice is good
self.shift_index -= 16
shift_byte = (self.shift >> self.shift_index) & 0x5555
shift_byte = (shift_byte + (shift_byte >> 1)) & 0x3333
shift_byte = (shift_byte + (shift_byte >> 2)) & 0x0F0F
self.shift_byte = (shift_byte + (shift_byte >> 4)) & 0x00FF
but only for about 2-4 milliseconds per 1 million pulses :) Declaring a local variable in a tight loop forces Python into a cycle of memory allocations and garbage collection, negating the potential gains :(
SWAR : 0.288 seconds -> 0.33 MiB/s
SWAR local : 0.284 seconds -> 0.33 MiB/s
This whole snippet is maybe, what, 50-100 x86 opcodes? Native code runs at >100 MB/s while Python 3.14 struggles at around 300 KB/s. Python 3.4 (a hardcoded Sigrok requirement) is even worse:
SWAR : 0.691 seconds -> 0.14 MiB/s
SWAR local : 0.648 seconds -> 0.14 MiB/s
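For anyone who wants to poke at this outside the decoder, here is a minimal standalone sketch of the same SWAR step. It keeps bits 0, 2, 4, ... 14 of a 16-bit window and packs them into one byte. The function names are made up for illustration and are not what the repo uses, and the naive loop is only there to cross-check the result. It avoids f-strings so it should still run on Python 3.4.

```python
def compact_even_bits(window):
    """Pack bits 0, 2, 4, ... 14 of a 16-bit value into one byte (SWAR)."""
    x = window & 0x5555           # keep only the even-numbered bits
    x = (x + (x >> 1)) & 0x3333   # pair up neighbours: 2 useful bits per nibble
    x = (x + (x >> 2)) & 0x0F0F   # 4 useful bits per byte
    return (x + (x >> 4)) & 0x00FF  # 8 useful bits in the low byte


def compact_even_bits_naive(window):
    """Reference version, one bit at a time (hypothetical helper for checking)."""
    out = 0
    for k in range(8):
        out |= ((window >> (2 * k)) & 1) << k
    return out


if __name__ == "__main__":
    # Exhaustive check over every 16-bit input.
    for value in range(1 << 16):
        assert compact_even_bits(value) == compact_even_bits_naive(value)
    print("SWAR and naive versions agree on all 65536 inputs")
```

The additions never carry, because the two operands of each + have no set bits in common, so + behaves like | here. Wrapping either function in timeit.timeit gives a quick way to compare variants without loading a capture file.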
You can try your luck:
https://github.com/raszpl/sigrok-disk/tree/main/benchmarks
I would appreciate pull requests if anyone manages to speed this up. I gave up at ~2 seconds per RLL HDD track.
This is what I get right now decoding single tracks on an i7-4790 platform:
fdd_fm.sr 0.9385 seconds
fdd_mfm.sr 1.4774 seconds
fdd_fm.sr 0.8711 seconds
fdd_mfm.sr 1.2547 seconds
hdd_mfm_RQDX3.sr 1.9737 seconds
hdd_mfm_RQDX3.sr 1.9749 seconds
hdd_mfm_AMS1100M4.sr 1.4681 seconds
hdd_mfm_WD1003V-MM2.sr 1.8142 seconds
hdd_mfm_WD1003V-MM2_int.sr 1.8067 seconds
hdd_mfm_EV346.sr 1.8215 seconds
hdd_rll_ST21R.sr 1.9353 seconds
hdd_rll_WD1003V-SR1.sr 2.1984 seconds
hdd_rll_WD1003V-SR1.sr 2.2085 seconds
hdd_rll_WD1003V-SR1.sr 2.2186 seconds
hdd_rll_WD1003V-SR1.sr 2.1830 seconds
hdd_rll_WD1003V-SR1.sr 2.2213 seconds
HDD_11tracks.sr 17.4245 seconds <- 11 tracks, 6 RLL + 5 MFM interpreted as RLL
HDD_11tracks.sr 12.3864 seconds <- 11 tracks, 6 RLL + 5 MFM interpreted as MFM