Whoa, yeah, that's a serious problem. Data loss under that scenario is nothing to sneeze at.
You should be running Kafka in multiple DCs/AZs for high availability and scalability.
And in that scenario, fsync is nice to have but not necessary.
Or I suppose just pay for a managed service from someone who does that for you.
If you need reliable data storage do not use Kafka or similar technologies.
Just checked Kafka’s homepage; it mentions mission-critical, durable, fault-tolerant, stores data safely, zero message loss, trusted… Seems they’ve moved on from their original goal.
Why would I not just turn on fsync or deploy in a distributed pattern for reliability so I can just continue using it instead of ripping it out, benchmarking something new, teaching the entire org something new, potentially negotiating a new contract, and then executing a huge migration?
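For reference, the knobs in question are standard Kafka settings; something like this (a sketch, not a tuned config — the rack names and replica counts are placeholder assumptions):

```properties
# Broker side: spread replicas across AZs (assumes a 3-broker,
# 3-AZ cluster; broker.rack differs per broker).
broker.rack=us-east-1a
default.replication.factor=3
min.insync.replicas=2

# Broker side: fsync after every message instead of relying on
# the OS page cache (trades throughput for single-node durability).
log.flush.interval.messages=1

# Producer side: don't consider a write successful until all
# in-sync replicas have acknowledged it.
acks=all
```

With `acks=all` and `min.insync.replicas=2`, an acknowledged write survives the loss of any single broker even without per-message fsync, which is the usual argument for the distributed pattern over fsync.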
Instead of reading the marketing claims I like to read what @aphyr has to say about data storage systems.
I suppose that might have been the original goal, but the current tag line includes "data integration" and "mission-critical".
"Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications."