skip
The csv module contains a Sniffer class which is supposed to deduce the
delimiter and quote character as well as the presence or absence of a header
in a sample taken from the start of a purported CSV file. I no longer
remember who wrote it, and I've never been a big fan of it. It determines
the delimiter based almost solely on character frequencies. It doesn't
consider what the actual structure of a CSV file is or that delimiters and
quote characters are almost always taken from the set of punctuation or
whitespace characters. Consequently, it can cause some occasional
head-scratching:
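>>> import csv
>>> sniffer = csv.Sniffer()
>>> d = sniffer.sniff("""\
... abc8def
... def8ghi
... ghi8jkl
... """)
>>> d.delimiter
'8'
>>> d = sniffer.sniff("""\
... a8bcdef
... ab8cdef
... abc8def
... abcd8ef
... """)
>>> d.delimiter
'f'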
It's not clear to me that people use letters or digits very often as
delimiters. Both samples above probably represent data from single-column
files, not double-column files with '8' or 'f' as the delimiter.
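For anyone who relies on the Sniffer today, one possible workaround is the
optional delimiters argument to sniff(), which restricts the candidates it
will consider. The following is only a rough sketch: the helper name
sniff_punctuation_only and the CANDIDATES constant are made up for
illustration, and the candidate set (punctuation plus space and tab) is an
assumption, not something the csv module does by default.

```python
import csv
import string

# Assumption: only punctuation, space, and tab are plausible delimiters.
CANDIDATES = string.punctuation + " \t"


def sniff_punctuation_only(sample):
    """Hypothetical helper: return the sniffed delimiter, or None."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATES)
    except csv.Error:
        # sniff() raises csv.Error when it cannot settle on a delimiter.
        return None
    return dialect.delimiter
```

With this restriction, both samples above should come back as None (that is,
"probably a single column") rather than '8' or 'f', while an ordinary
comma-separated sample is still recognized.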
I would be happy to get rid of it in 3.0, but I'm also aware that some
people use it. I'd like feedback from the Python community about this. If
I removed it, is there someone out there who wants it badly enough to
maintain it on PyPI?
Thanks,