This definitely seems like the "Kafka" way to solve this problem, but I fear this partitioning scheme has implications I'd love to see addressed. For example, partition counts aren't infinite, and they aren't easily adjusted after the fact. So if you originally choose, say, 10 partitions for a SKU space that is nearly infinite, then in reality you can only handle 10 parallel streams of work. Any SKU that lands in a partition behind a slow piece of work is then blocked by that work.
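To make the head-of-line blocking concrete, here's a minimal sketch (md5 is just a stand-in for Kafka's actual default partitioner, which uses a murmur2 hash of the key bytes; the SKU names are invented):

```python
# Rough sketch of key-based partitioning. md5 stands in for Kafka's
# murmur2 default; the point is only that the mapping is fixed.
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash so the mapping is deterministic across runs,
    # unlike Python's builtin hash().
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# A "nearly infinite" SKU space still collapses into only 10 streams of work.
skus = [f"SKU-{i}" for i in range(1000)]
assignments = {}
for sku in skus:
    assignments.setdefault(partition_for(sku, 10), []).append(sku)

# Every SKU in a partition is serialized behind whatever is at its head:
# one slow message delays all of its partition-mates.
print(sorted((p, len(group)) for p, group in assignments.items()))
```

With 1000 keys across 10 partitions you get ~100 SKUs per partition, all sharing one consumer's fate.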
It's doable to repartition to 100 partitions or more, but you basically need to replay the log that was written under the 10-partition scheme onto the new 100 partitions, and that operation gets more expensive over time. Then, of course, you're basically stuck again once traffic grows enough that the original problem returns. If the unit of horizontal scaling is the partition, but the partition count can't easily be changed, then from my perspective consumers in Kafka eventually lose their horizontal scalability.
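A quick sketch of why the replay is unavoidable under hash partitioning (again, md5 is a stand-in for Kafka's murmur2 default and the SKU keys are invented):

```python
# Changing the partition count changes where most keys hash to,
# which is why per-key ordering can only be kept by replaying the old log.
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

skus = [f"SKU-{i}" for i in range(1000)]
moved = sum(
    1 for sku in skus if partition_for(sku, 10) != partition_for(sku, 100)
)

# Roughly 90% of keys land on a different partition after going 10 -> 100:
# a key keeps its partition only when hash % 100 happens to equal hash % 10.
print(f"{moved} of {len(skus)} SKUs changed partition")
```

So the replay isn't an implementation quirk of any one tool; it falls directly out of `hash(key) % N` changing for almost every key when N changes.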