Meanwhile, RFC4180 takes less time to read than this entire article.
So true about RFC 4180. Admittedly, this post got out a little early; support for the format was slated for the first of next month...
CSV is a simple storage format for data. Its simplicity, readability, and portability make it popular. I think that any attempt to improve it will be a failure.
I must say that CSV generally suffices for tabular data. The only annoyance is that conventions differ internationally, as the comma is often used as the decimal separator. I think CSV should always be implemented with a comma as the column separator and a dot as the decimal separator, regardless of country. But applications such as Excel do not accept this format internationally.
It's a problem solved decades ago with solutions we've failed to adopt. Weird, buggy, poorly parsable CSV is still somehow the norm.
Not saying you should, but if you want to change, the answer is already there. Change has to start somewhere...
> We tried using the control characters, and also tried configuring various editors to show the control characters by rendering the control picture characters.
> First, we encountered many difficulties with editor configurations, attempting to make each editor treat the invisible zero-width characters by rendering with the visible letter-width characters.
> Second, we encountered problems with copy/paste functionality, where it often didn't work because the editor implementations and terminal implementations copied visible letter-width characters, not the underlying invisible zero-width characters.
> Third, users were unable to distinguish between the rendered control picture characters (e.g. the editor saw ASCII 31 and rendered Unicode Unit Separator) versus the control picture characters being in the data content (e.g. someone actually typed Unicode Unit Separator into the data content).
https://github.com/SixArm/usv/tree/main/doc/faq#why-use-cont...
CSO is a stormwater industry term for "Combined Sewer Overflow." They happen in older cities where storm runoff and raw sewage (poop) go into the same sewer system. When there is a lot of rain, the wastewater treatment plants overflow, and then raw sewage runs into waterways.
https://en.wikipedia.org/wiki/Combined_sewer#Combined_sewer_...
If you're typing in CSV manually, escape with \
If you're exporting to CSV, the program already knows which part is data and which part is the next cell, so again the program can escape with \
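A minimal sketch of the backslash-escaping idea above (note this is not standard RFC 4180 CSV; the field and function names here are hypothetical):

```python
# Backslash-escaped separated values: escape the separator and the
# backslash itself with "\". Not RFC 4180; just the scheme proposed above.
def escape_field(field: str, sep: str = ",") -> str:
    return field.replace("\\", "\\\\").replace(sep, "\\" + sep)

def parse_line(line: str, sep: str = ",") -> list[str]:
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # take the escaped char literally
            i += 2
        elif ch == sep:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields

line = ",".join(escape_field(f) for f in ["a,b", "c\\d", "plain"])
print(parse_line(line))  # round-trips the original fields
```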
Most good implementations are flexible enough that they might be configurable to your proposed pseudo CSV. (Or even DSV. Or USV. Etc.) But I'd rather just not need to, and the sanest default for any CSV library is the standard format.
(Or even better … just emit newline-terminated JSON. Richer format, less craziness than CSV, parsers still abound.)
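For illustration, a sketch of the newline-delimited JSON approach, where each record is one JSON value per line and the format's own string escaping handles commas, quotes, and newlines in the data:

```python
import json

# Newline-delimited JSON (NDJSON): one JSON object per line.
# json.dumps escapes embedded newlines as \n, so each record
# stays on a single line with no CSV-style quoting rules needed.
rows = [
    {"name": "Ada, Countess of Lovelace", "note": 'said "hello"\nand left'},
    {"name": "plain", "note": "nothing special"},
]
lines = [json.dumps(row) for row in rows]
print("\n".join(lines))

# Reading back is one json.loads per line.
parsed = [json.loads(line) for line in lines]
assert parsed == rows
```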
¹(RFC 4180: "," is the field separator, CRLF is the row separator. A comma or a CRLF can be escaped by surrounding the entire field in double quotes, and a double quote itself is escaped by quoting the field and doubling the internal double quote.)
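Python's standard-library `csv` module follows these quoting rules, so a quick sketch can show them in action:

```python
import csv
import io

# With QUOTE_MINIMAL, only fields containing the delimiter, the quote
# character, or a line terminator get double-quoted; embedded double
# quotes are doubled, per RFC 4180.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerow(["plain", "has,comma", 'has "quotes"'])
print(buf.getvalue())
# plain,"has,comma","has ""quotes"""
```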
And why would you "highly prefer you just emit standard CSV"? What is the benefit of insisting on adherence to the original standard, especially if the modification fixes something that is broken?
n.b. not worth your time. tl;dr: let's replace the comma with the poop emoji because commas occur in data.
There's already a solution to that (obviously). The best argument a contrarian could make is that you'll "learn about Unicode," by which they'd mean the words "basic multilingual plane" are included at one point.