csv.Sniffer: wrong detection of the end of line delimiter

Laurent Laporte · Dec 28, 2005

hello,

I'm using the csv standard module under Python 2.3 / 2.4 to read a CSV
file. The file is opened in binary mode, so I keep the end of line
terminator.

It appears that csv.Sniffer forces the line terminator to be
'\r\n'. That's fine under Windows but wrong under Linux or
Macintosh.
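
A quick way to check this report, as a minimal sketch: in CPython the dialect class built inside Sniffer.sniff() sets lineterminator to '\r\n' regardless of the input. The sample string below is made up for illustration and is not from the original post.

```python
import csv

# Hypothetical sample that uses Unix-style '\n' line endings only.
sample = "a,b,c\n1,2,3\n4,5,6\n"

dialect = csv.Sniffer().sniff(sample)

# The delimiter is guessed from the data, but the line terminator is not:
# the sniffed dialect reports '\r\n' although the sample never contains it.
print(repr(dialect.delimiter))
print(repr(dialect.lineterminator))
```

Note that, per the csv documentation, the reader ignores lineterminator and recognises '\r' or '\n' on its own, so the forced value mostly matters when the sniffed dialect is reused for writing.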

More about this line terminator: Potential bug in the
_guess_delimiter() method.
The first line of code does a wrong splitting:
data = filter(None, data.split('\n'))
It doesn't take care of the real line terminator!
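
To illustrate the splitting problem described above, here is a small sketch with a made-up sample that uses '\r' as its only terminator. The variable names are chosen for the example; str.splitlines() is shown only as one possible terminator-agnostic alternative, not as what the csv module does.

```python
# Hypothetical sample with classic Mac-style '\r' line endings.
data = "a;b\rc;d\re;f\r"

# Splitting on '\n' only, as in the quoted line, finds no line breaks,
# so the whole sample is treated as a single "line".
rows_lf = list(filter(None, data.split('\n')))

# str.splitlines() treats '\r\n', '\n' and '\r' as line boundaries.
rows_any = data.splitlines()

print(len(rows_lf))
print(rows_any)
```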

Here is a patch (not a perfect one):
# ------- begin of patch -------
import csv

class PatchedSniffer(csv.Sniffer):

    def __init__(self):
        csv.Sniffer.__init__(self)

    def sniff(self, p_data, p_delimiters = None):
        t_dialect = csv.Sniffer.sniff(self, p_data, p_delimiters)
        # Override the hard-coded terminator with the one found in the data.
        t_dialect.lineterminator = self._guessLineTerminator(p_data)
        return t_dialect

    def _guessLineTerminator(self, p_data):
        # '\r\n' must be tested first, since it contains both other candidates.
        for t_lineTerminator in ['\r\n', '\n', '\r']:
            if t_lineTerminator in p_data:
                return t_lineTerminator
        else:
            return '\r\n' # Windows default (Excel)

    def _formatDataForGuess(self, p_data):
        # Normalize to '\n' so the base class can split the lines correctly.
        t_lineTerminator = self._guessLineTerminator(p_data)
        return '\n'.join(p_data.split(t_lineTerminator))

    def _guess_delimiter(self, p_data, p_delimiters):
        t_data = self._formatDataForGuess(p_data)

        (t_delimiter, t_skipInitialSpace) = \
            csv.Sniffer._guess_delimiter(self, t_data, p_delimiters)

        # Fall back to a tab if nothing was detected but tabs are present.
        if t_delimiter == '' and '\t' in p_data:
            t_delimiter = '\t'

        return (t_delimiter, t_skipInitialSpace)
# ------- end of patch -------

Bye.
------- Laurent.

Steve Holden · Dec 29, 2005

Laurent said:
> hello,
>
> I'm using the csv standard module under Python 2.3 / 2.4 to read a CSV
> file. The file is opened in binary mode, so I keep the end of line
> terminator.

It's not advisable to open a file like a CSV, intended for use as text,
in binary mode.

> It appears that csv.Sniffer forces the line terminator to be
> '\r\n'. That's fine under Windows but wrong under Linux or
> Macintosh.

Perhaps you should try opening the file in text mode, as this will
normally end up giving you a "\n" terminator on all platforms: that's
what text mode is intended to ensure, and that's probably why the csv
module assumes that splitting on "\n" is safe.
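
As a rough illustration of the text-mode normalization described above, the sketch below uses Python 3, where the default newline=None enables universal newlines when reading. This is an assumption about a newer interpreter; plain text mode in the Python 2.3 / 2.4 versions discussed here may behave differently depending on platform and mode flags. The file contents are made up for the example.

```python
import os
import tempfile

# Write rows with three different terminators, byte for byte.
raw = b"a,b\r\nc,d\re,f\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Default text mode (newline=None) translates '\r\n' and '\r' to '\n'.
    with open(path, "r") as fh:
        text = fh.read()
finally:
    os.remove(path)

print(repr(text))
```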

> More about this line terminator: Potential bug in the
> _guess_delimiter() method.
> The first line of code does a wrong splitting:
> data = filter(None, data.split('\n'))
> It doesn't take care of the real line terminator!
> [...]

I suspect it's not supposed to be trying to!

regards
Steve

Marc 'BlackJack' Rintsch · Dec 29, 2005

Steve Holden said:
> It's not advisable to open a file like a CSV, intended for use as text,
> in binary mode.

But the docs "demand" this explicitly and all examples in the docs fulfill
that demand.

From http://docs.python.org/lib/csv-contents.html :

> If csvfile is a file object, it must be opened with the 'b' flag on
> platforms where that makes a difference.

I guess the reason is the same as for the "text" pickle format: If you don't
use binary mode the file is not platform independent anymore because some
OSes "manipulate" the data in text mode.
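
For readers on Python 3, where the csv module works on text rather than bytes, the documented counterpart of the 'b' requirement is to open the file with newline=''. The following is a minimal sketch under that assumption; the file contents are invented for the example and include a quoted field with an embedded line break to show why untranslated newlines matter.

```python
import csv
import os
import tempfile

# Hypothetical CSV bytes: '\r\n' row terminators plus a quoted field
# that itself contains a '\r\n'.
raw = b'a,"x\r\ny"\r\nc,d\r\n'
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline='' disables newline translation, so the csv reader sees
    # the terminators exactly as they are stored in the file.
    with open(path, "r", newline="") as fh:
        rows = list(csv.reader(fh))
finally:
    os.remove(path)

print(rows)
```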

Ciao,
Marc 'BlackJack' Rintsch
