> specialized workloads like matrix multiplications
They're not so specialized in the scope of supercomputers.
Matrix multiplications underpin pretty much any "simulation of reality": nuclear explosions, weather modeling, finite element analysis (aka simulated car crashes), protein folding, atom-to-atom chemical simulations, and more.
> how much faster would it be at SAT solving?
Dumb WalkSAT is embarrassingly parallel and probably would scale to GPUs very easily. I admit that I'm not in the SAT field, but there seems to be plenty of research into how to get SAT solvers onto GPUs.
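To make the "embarrassingly parallel" claim concrete, here's a minimal WalkSAT sketch (my own toy code, not from any particular paper): each restart with a different seed is completely independent, so on a GPU you'd just run one restart per thread.

```python
import random

def walksat(clauses, n_vars, max_flips=10_000, p=0.5, seed=0):
    """One independent WalkSAT restart. Clauses are lists of non-zero
    ints; positive i means variable i, negative i means its negation."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # 1-indexed

    def satisfied(clause):
        return any((lit > 0) == assign[abs(lit)] for lit in clause)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign  # found a model
        clause = rng.choice(unsat)
        if rng.random() < p:
            var = abs(rng.choice(clause))  # random-walk move
        else:
            # greedy move: flip the variable that leaves the fewest
            # clauses unsatisfied (a simple break-count proxy)
            def broken(lit):
                v = abs(lit)
                assign[v] = not assign[v]
                b = sum(not satisfied(c) for c in clauses)
                assign[v] = not assign[v]
                return b
            var = abs(min(clause, key=broken))
        assign[var] = not assign[var]
    return None  # this restart gave up

# The embarrassingly parallel part: every seed is an independent
# restart, so thousands of these can run at once, one per GPU thread.
cnf = [[1, 2], [-1, 3], [-2, -3], [1, 3]]
model = next(m for m in (walksat(cnf, 3, seed=s) for s in range(64)) if m)
```

A real GPU version would keep the clause database in shared memory and use bit-packed assignments, but the control flow is this simple.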
I've personally been playing with BDDs (or really, MDDs for my particular application). Traversal of BDDs / MDDs is embarrassingly parallel, assuming your BDD / MDD is wide enough (which for any "complicated" function, it should be).
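One way to see the parallelism (a toy sketch with made-up node names, not my actual MDD code): every evaluation of a BDD is an independent, read-only walk from the root to a terminal, so batches of queries map one-per-thread onto a GPU, in addition to the layer-width parallelism mentioned above.

```python
# A toy BDD for f(x1, x2, x3) = (x1 AND x2) OR x3, stored as a node
# table. Each node is (variable index, low child, high child); the
# integers 0 and 1 are the terminal nodes.
bdd = {
    "n1": (1, "n3", "n2"),  # branch on x1
    "n2": (2, "n3", 1),     # branch on x2
    "n3": (3, 0, 1),        # branch on x3
}

def evaluate(bdd, root, assignment):
    """Follow one path from the root down to a 0/1 terminal."""
    node = root
    while node not in (0, 1):
        var, low, high = bdd[node]
        node = high if assignment[var] else low
    return node

# Embarrassingly parallel across queries: each traversal only reads
# the shared node table, so each can run in its own GPU thread.
inputs = [{1: a, 2: b, 3: c} for a in (0, 1) for b in (0, 1) for c in (0, 1)]
results = [evaluate(bdd, "n1", x) for x in inputs]
# results == [0, 1, 0, 1, 0, 1, 1, 1], i.e. (x1 AND x2) OR x3
```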
BDD / MDD based traversal of SAT / constraint satisfaction is likely going to be a growing area of research, since the problems are closely related. I've also seen some BDD/MDD "estimate" methodologies, akin to how arc-consistency / path-consistency provides estimates inside a SAT/constraint solver.
In effect, if you build one BDD that underestimates the function and a second BDD that overestimates it, you can use BDDs / MDDs to get both a lower bound and an upper bound. And those structures can scale from kilobytes to gigabytes, depending on how much or how little "estimation" you want.
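The under/over-estimate idea can be sketched on a tiny 0/1 knapsack (my own illustration, under the usual restricted/relaxed decision-diagram scheme): build the diagram layer by layer, and whenever a layer exceeds a width budget, a *restricted* diagram drops the worst states (feasible, so a lower bound on the maximum) while a *relaxed* diagram merges them into one looser state (an over-approximation, so an upper bound). The width budget is exactly the "how much estimation" knob.

```python
def knapsack_dd_bound(weights, values, capacity, max_width, relaxed):
    """Layer-by-layer decision diagram for 0/1 knapsack (maximize value).
    Each layer maps remaining-capacity -> best value reaching it.
    Over the width budget, restricted drops states; relaxed merges them."""
    layer = {capacity: 0}
    for w, v in zip(weights, values):
        nxt = {}
        for cap, val in layer.items():
            for take in (0, 1):
                if take and w > cap:
                    continue  # item doesn't fit
                c2, v2 = cap - take * w, val + take * v
                if v2 > nxt.get(c2, -1):
                    nxt[c2] = v2
        if len(nxt) > max_width:
            items = sorted(nxt.items(), key=lambda kv: kv[1], reverse=True)
            # reserve one slot for the merged state in the relaxed case
            keep, rest = items[:max_width - 1], items[max_width - 1:]
            nxt = dict(keep)
            if relaxed:
                # merge the rest: loosest capacity + best value is a
                # valid over-approximation of every merged state
                mc = max(c for c, _ in rest)
                mv = max(v for _, v in rest)
                nxt[mc] = max(nxt.get(mc, -1), mv)
            # restricted: the rest are simply dropped
        layer = nxt
    return max(layer.values())

w, v, cap = [3, 4, 5, 6], [4, 5, 6, 7], 10
lo = knapsack_dd_bound(w, v, cap, max_width=2, relaxed=False)   # lower bound
hi = knapsack_dd_bound(w, v, cap, max_width=2, relaxed=True)    # upper bound
exact = knapsack_dd_bound(w, v, cap, max_width=10**9, relaxed=False)
# lo <= exact <= hi always holds; widening max_width tightens both.
```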
Does such a thing exist yet? I don't think so. The papers I've read on this subject are from 2018, and they weren't from parallel programmers who know much about high-speed GPU programming. But I definitely believe there's some synergy here if researchers combine the two fields: use the GPU to calculate the BDD (embarrassingly parallel) at every step, to guide a sequential-algorithm search over the SAT / constraint-satisfaction problem.
Using BDDs (instead of arc-consistency) as your search "heuristic" is still a relatively unexplored field, but it is "very obviously" a way to utilize a GPU if you know how they work.
-----------
Speaking of embarrassingly parallel, I'm pretty sure arc-consistency is also an embarrassingly parallel problem, but arc-consistency is "too inflexible", leading to data structures that are "too small" (not efficiently utilizing the GBs or TBs of RAM we have today).
Extending arc-consistency to path-consistency or k-consistency increases the size of the data structure (and therefore the parallelism and work involved), but not very smoothly.
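The parallel structure of arc-consistency looks like this (a toy sketch, with a made-up two-variable example): within one sweep, every arc revision is an independent read of the shared domains, so all of them can run at once; the sequential part is only the merge and the fixed-point loop.

```python
def revise(domains, constraints, xi, xj):
    """Keep only values of xi with some support in xj under the
    constraint on (xi, xj). Each revision is read-only on the shared
    domains: this is the embarrassingly parallel part of one sweep."""
    allowed = constraints[(xi, xj)]
    return {a for a in domains[xi] if any(allowed(a, b) for b in domains[xj])}

def parallel_ac(domains, constraints):
    """Sweep all arcs until a fixed point. In a GPU version, the dict
    comprehension below would be one kernel launch over all arcs."""
    changed = True
    while changed:
        revised = {(xi, xj): revise(domains, constraints, xi, xj)
                   for (xi, xj) in constraints}
        changed = False
        for (xi, _), new in revised.items():
            if new != domains[xi]:
                domains[xi] &= new  # intersect the concurrent results
                changed = True
    return domains

# Tiny illustrative example: x < y with both domains {1, 2, 3}.
doms = {"x": {1, 2, 3}, "y": {1, 2, 3}}
cons = {("x", "y"): lambda a, b: a < b,
        ("y", "x"): lambda a, b: b < a}
result = parallel_ac(doms, cons)
# prunes to x in {1, 2} and y in {2, 3}
```

The "too small" complaint above is visible here: the whole state is just one small set per variable, so there isn't much data to spread across a GPU.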
Instead, the estimated-BDD/MDD methodology seems like a magical methodology that scales to different memory sizes far better: relaxed BDDs and restricted BDDs for the upper-bound and lower-bound estimations.