"don't trust customer input without stripping or escaping it" feels obvious, but I don't think it stands up to scrutiny. What exactly do you strip or escape when you're trying to prevent an unknown multitude of legacy spreadsheet clients that you don't control from mishandling data in an unknown variety of ways? How do you know you're not disrupting downstream customer data flows with your escaping? The core issue, as I understand it, stems from possible unintended formula execution – which can be prevented by prefixing certain cells with a space or some invisible character (mentioned in the linked post above). This _does_ modify customer data, but hopefully in a way that unobtrusive enough to be acceptable. All in all, it seems to be a problem without a perfect solution.
Definitely agree there's no perfect solution. There's some escaping that seems to work ok, but that's going to break CSV-imports.
An imperfect solutions is that applications should be designed with task-driven UIs so that they know the intended purpose of a CSV export and can make the decision to escape/not escape then. Libraries can help drive this by designing their interfaces in a similar manner. Something like `export_csv_for_eventual_import()`, `export_csv_for_spreadsheet_viewing()`.
Another imperfect solution would be to ... ugh...generate exports in Excel format rather than CSV. I know, I know, but it does solve the problem.
Or we could just get everyone in the world to switch to emacs csv-mode as a csv viewer. I'm down with that as well.
The intention-based philosophy of all this makes a lot of sense, was eye opening, and I agree it should be the first approach. Unfortunately after considering our use cases, we quickly realized that we'd have no way of knowing how customers intend to use the csv exports they've requested - we've talked to some of them and it's a mix. We could approach things case by case but we really just want a setup which works well 99% of the time and mitigates known risk. We settled on the prefixing approach and have yet to receive any complaints about it, specifically using a space character with the mind that something unobtrusive (eg. easily strippable) but also visible, would be best - to avoid quirks stemming from something completely hidden.
Thank again for your writing and thoughts, like I said above I haven't found much else of quality on the topic.
Or you could just use the ISO standard .xlsx, which is a widely supported format that is not Excel specific but has first class support in Excel.
That's honestly a good solution and the end users prefer it anyway.
Then, the user doesn’t have Excel has the default program for opening the file and has to jump through a couple safety hoops
Nah, they just quickly learn to rename it .csv or .xls and excel will open it
I need to interface with a lot of non-technical people who exclusively use Excel. I give them .xlsx files. It's just as easy to export .xlsx as it is to export .CSV and my customers are happy.
If xlsx works for all your use cases that's great, a much better solution that trying to sidestep these issues by lightly modifying the data. It's not an option for us, and (I'd imagine) a large contingent of export tools which can't make assumptions about downstream usage.
I responded that this was Excel's problem, not ours, and that nobody would assign a CVE to our product for such a "vulnerability". How naive I was! They forwarded me several such CVEs assigned to products that create CSVs that are "unsafe" for Excel.
Terrible precedent. Ridiculous security theater.
I agree with the characterization ("security theater") of these bug reports. The problem is that the intentions of these reports don't make the potential risk less real, depending on the setting, and I worry that the "You're just looking for attention" reaction (a very fair one!) leads to a concerning downplaying of this issue across the web.
As a library author, I agree this very well may not be something that needs to be addressed. But as someone working in a company responsible for customers, employees, and their sensitive information, disregarding this issue disregards the reality of the tools these people will invariably use, downstream of software we _are_ responsible for. Aiming to make this downstream activity as safe as possible seems like a worthy goal.