Do you think some form of dynamic time warping, applied to the scanpaths within each distinct group, could help give a better overall view of that group's behavior in a static heatmap image? This presumes that participants within a group share similar scanpaths.
Just a thought; I understand the data may be too noisy or have too much variance for this to work in practice.
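
To make the idea a bit more concrete, here is a rough sketch of classic dynamic time warping between two scanpaths. It assumes each scanpath is a non-empty list of (x, y) fixation coordinates and uses plain Euclidean distance between fixations; the function name `dtw_align` and the sample coordinates are made up for illustration and are not taken from any particular eye-tracking toolkit.

```python
import math

def dtw_align(path_a, path_b):
    """Classic DTW between two scanpaths given as lists of (x, y) fixations.

    Returns (total_cost, alignment), where alignment is a list of
    (index_in_a, index_in_b) pairs along the cheapest warping path.
    Hypothetical helper for illustration; assumes both paths are non-empty.
    """
    n, m = len(path_a), len(path_b)
    # cost[i][j] = cheapest cost of aligning the first i points of a
    # with the first j points of b; row/column 0 act as a border.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat point of b
                                 cost[i][j - 1],      # repeat point of a
                                 cost[i - 1][j - 1])  # advance both
    # Walk back from the end to recover which fixations were matched.
    i, j = n, m
    alignment = []
    while i > 0 and j > 0:
        alignment.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    alignment.reverse()
    return cost[n][m], alignment

# Two made-up scanpaths: the second participant lingers on the first point.
scan_a = [(0, 0), (1, 0), (2, 0)]
scan_b = [(0, 0), (0, 0), (1, 0), (2, 0)]
total_cost, alignment = dtw_align(scan_a, scan_b)
```

The alignment pairs could then be used to bring each participant's fixations onto a common timeline before accumulating them into the group heatmap, though whether that actually helps would depend on how similar the scanpaths within a group really are.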