how can i change the text delimiter

sonald

Hi,
Can anybody tell me how to change the text delimiter in the FastCSV parser?
By default the text delimiter is the double quote (").
I want to change it to something else, say a pipe (|).
Can anyone please tell me how to go about it?
 
Amit Khemka

sonald said:
Hi,
Can anybody tell me how to change the text delimiter in FastCSV Parser
?
By default the text delimiter is double quotes(")
I want to change it to anything else... say a pipe (|)..
can anyone please tell me how do i go about it?

You can use the parser constructor to specify the field separator:
Python >>> parser(ms_double_quote = 1, field_sep = ',', auto_clear = 1)

cheers,
amit.

--
 
Amit Khemka

Btw, I forgot to mention that, AFAIK, in CSV the '"' (double quote) is
*not* the field separator character!

',' (comma) is the default field separator, and my earlier suggestion
assumed that you wanted to change the default value of the
field separator.

http://en.wikipedia.org/wiki/Comma-separated_values

cheers,
amit.
--
 
sonald

Hi Amit,
Thanks for a quick response...
E.g record is: "askin"em"

This entire text should be extracted as one string, but because the text
qualifier is the double quote ("), the fastcsv parser is unable to parse it.

If we could change the text qualifier to a pipe (|), the string would
look like this:
|askin"em|

But for this, the default text qualifier in the fastcsv parser needs to be
changed to a pipe (|). How do I do this?

Also please note that the string cannot be modified at all. Thanks.
 
Fredrik Lundh

sonald said:
Thanks for a quick response...
E.g record is: "askin"em"

that's usually stored as "askin""em" in a CSV file, and the csv module
has no problem handling that:
[['askin"em']]

to use another quote character, use the quotechar option to the reader
function:
>>> source = StringIO.StringIO('|askin"em|\n')
>>> list(csv.reader(source, quotechar='|'))
[['askin"em']]
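For readers on current Python 3, a minimal sketch of the same two cases follows. It assumes `io.StringIO` in place of the Python 2 `StringIO` module used in the thread; the third call is an addition, not part of the original post, and shows that the unescaped record from the question is rejected when the reader is asked to be strict.

```python
import csv
import io

# Standard CSV: an embedded quote is written as two quotes.
standard = io.StringIO('"askin""em"\n')
rows_standard = list(csv.reader(standard))

# Pipe-quoted data: the double quote is then an ordinary character.
piped = io.StringIO('|askin"em|\n')
rows_piped = list(csv.reader(piped, quotechar='|'))

# The unescaped record from the question is ambiguous; with strict=True
# the reader refuses to guess and raises csv.Error.
try:
    list(csv.reader(io.StringIO('"askin"em"\n'), strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = exc

print(rows_standard)
print(rows_piped)
print("strict mode rejected the unescaped record:", strict_error is not None)
```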

sonald said:
Also please note that the string cannot be modified at all.

not even by the Python program that reads the data? sounds scary.

what's fastcsv, btw? the only thing google finds with that name is a
Ruby library...

</F>
 
skip

sonald> Can anybody tell me how to change the text delimiter in FastCSV
sonald> Parser? By default the text delimiter is double quotes(") I
sonald> want to change it to anything else... say a pipe (|).. can
sonald> anyone please tell me how do i go about it?

I'm not familiar with a FastCSV module in Python. Google suggests there is
a FastCsv module in Ruby.

Just in case what you are really referring to is the csv module in Python,
here's how you do it. Suppose your CSV file looks like an Excel-generated
CSV file save for the weird quote character:

import csv

class PipeQuote(csv.excel):
    quotechar='|'

...

reader = csv.reader(open("somefile", "rb"), dialect=PipeQuote)
for row in reader:
    print row

Full documentation is here:

http://docs.python.org/dev/lib/module-csv.html
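As a rough Python 3 counterpart to the dialect approach above, the sketch below subclasses `csv.excel` and registers it under a name. The dialect name `pipequote` and the sample data are made up for illustration, and `io.StringIO` stands in for a real file.

```python
import csv
import io

class PipeQuote(csv.excel):
    """Excel-style CSV, except that fields are quoted with '|'."""
    quotechar = '|'

# Registering the dialect is optional; it lets callers refer to it by name.
csv.register_dialect("pipequote", PipeQuote)

# io.StringIO stands in for a real file here; with a real file, open it
# in text mode with newline="" as the csv documentation recommends.
sample = io.StringIO('|askin"em|,|a,b|,plain\r\n')
rows = list(csv.reader(sample, dialect="pipequote"))
print(rows)
```

Because the pipe is the quote character here, the double quote inside the first field and the comma inside the second are both treated as ordinary data.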

Skip
 
sonald

Hi ,
thanks for the reply...

fast csv is the csv module for Python...
and actually the string cannot be modified because
it is received from a third party and we are not supposed to modify the
data in any way..


for details on the fast CSV module please visit

www.object-craft.com.au/projects/csv/ or

import fastcsv
csv = fastcsv.parser(strict = 1,field_sep = ',')  # part of configuration

and somewhere in the code... we are using

data = csv.parse(line)

All I mean to say is that csv.reader is nowhere in the code,
and somehow we have to modify the existing code.

Looking forward to your kind reply...




Fredrik said:
sonald said:
Thanks for a quick response...
E.g record is: "askin"em"

that's usually stored as "askin""em" in a CSV file, and the csv module
has no problem handling that:
[['askin"em']]

to use another quote character, use the quotechar option to the reader
function:
source = StringIO.StringIO('|askin"em|\n')
list(csv.reader(source, quotechar='|'))
[['askin"em']]

Also please note that the string cannot be modified at all.

not even by the Python program that reads the data? sounds scary.

what's fastcsv, btw? the only thing google finds with that name is a
Ruby library...

</F>
 
Fredrik Lundh

sonald said:
fast csv is the csv module for Python...

no, it's not. the csv module for Python is called "csv".

sonald said:
and actually the string cannot be modified because
it is received from a third party and we are not supposed to modify the
data in any way..

that doesn't prevent you from using Python to modify it before you pass it to
the csv parser, though.

sonald said:
for details on the fast CSV module please visit

www.object-craft.com.au/projects/csv/ or

that module is called "csv", not "fastcsv". and as it says on that page, a much
improved version of that module was added to Python in version 2.3.

what Python version are you using?

</F>
 
skip

sonald> fast csv is the the csv module for Python... and actually the
sonald> string cannot be modified because it is received from a third
sonald> party and we are not supposed to modify the data in any way..

sonald> for details on the fast CSV module please visit

sonald> www.object-craft.com.au/projects/csv/ or

sonald> import fastcsv
sonald> csv = fastcsv.parser(strict = 1,field_sep = ',') // part of
sonald> configuration

sonald> and somewhere in the code... we are using

sonald> data = csv.parse(line)

sonald> all i mean to say is, csv.reader is nowhere in the code and
sonald> somehow we got to modify the existing code.

You're using Object Craft's old csv module. They are also the primary
authors of the csv module that's been shipped with Python since 2.3. I
suggest you convert your code to use the newer module if possible. If
that's not possible (it's not all that hard - I did it a couple years ago),
try posting a note to (e-mail address removed). The Object Craft folks are on that
list and may be able to help you out. I think they're here as well, though
may not watch this group/list as closely.
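One way to do such a conversion with little change to the calling code is a thin wrapper that keeps the old `parse(line)` call shape on top of the standard `csv` module. This is only a sketch: the class name `LineParser` and its keyword names are hypothetical, chosen to resemble the old constructor, and records spanning several lines are not handled.

```python
import csv

class LineParser:
    """Hypothetical stand-in for the old parser object: parse(line) -> list of fields."""

    def __init__(self, field_sep=',', quote_char='"', strict=False):
        # Map the old-style keyword names onto the standard csv options.
        self._options = dict(delimiter=field_sep, quotechar=quote_char, strict=strict)

    def parse(self, line):
        # csv.reader accepts any iterable of lines, so a one-element list works.
        # Unlike the old parser, this sketch does not support records that
        # span several lines.
        return next(csv.reader([line], **self._options))

parser = LineParser(field_sep=',', quote_char='|', strict=True)
fields = parser.parse('|askin"em|,42\n')
print(fields)
```

Existing code that does `data = csv.parse(line)` would then only need its constructor line changed, assuming it relies on nothing beyond `parse()`.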

Skip
 
sonald

Hi,
I am using
Python version python-2.4.1 and along with this there are other
installables
like:
1. fastcsv-1.0.1.win32-py2.4.exe
2. psyco-1.4.win32-py2.4.exe
3. scite-1.63-setup.exe

We are new here and are now handling this
module, which validates data files provided in a
predefined format by the third party.
The data files are provided in comma-separated format.

The fastcsv package is imported in the code...
import fastcsv
and
csv = fastcsv.parser(strict = 1,field_sep = ',')

Can you please tell me where to find the definition of the parser function
(used above), so that, if possible, I can pass a parameter for the
text qualifier (text separator, text delimiter),
just like {field_sep = ','} above?

I want to handle strings containing double quotes ("),
but the problem is that the default text qualifier is the double quote.

If I could change the default text qualifier to, say, a pipe (|),
the double quote inside the string might be ignored.
Please refer to the example given in my previous query.

Thanks..
 
Fredrik Lundh

sonald said:
Python version python-2.4.1 and along with this there are other
installables like:
1. fastcsv-1.0.1.win32-py2.4.exe

I get zero hits for that file on google. are you sure that's not an
in-house tool ? asking comp.lang.python for help on internal tools
isn't exactly optimal.

any reason you cannot switch to the built-in "csv" module instead, so
you can use the solutions you've already gotten ?

</F>
 
Amit Khemka

As Fredrik and Skip mentioned earlier, the csv module you are using (from
www.object-craft.com.au/projects/csv/) is obsolete, and an improved version
is part of the standard Python distribution. You should use the standard
module, in the way already suggested.

You may like to do the following:
1. Convert all your data files to use, say, '|' as the quotechar (text delimiter).
2. Use (import) the *standard* Python csv module.

A sample code would look like:

import csv # standard csv module distributed with python

inputFile = "csv_file_with_pipe_as_quotchar.txt"
reader = csv.reader(open(inputFile), quotechar='|')

for row in reader:
    # do whatever you would like to do
    pass
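Step 1 of the suggestion (re-quoting the data with '|') could look roughly like the sketch below in Python 3. It assumes the input is well-formed standard CSV, which may not hold for the data described in this thread, and it writes a converted copy rather than touching the original. The sample contents are invented and `io.StringIO` stands in for real files.

```python
import csv
import io

# In-memory stand-ins for the original file and the converted copy.
original = io.StringIO('id,comment\r\n1,"askin""em"\r\n2,"a,b"\r\n')
converted = io.StringIO()

reader = csv.reader(original)                  # default quotechar is '"'
writer = csv.writer(converted, quotechar='|')  # re-quote with '|'
writer.writerows(reader)

print(converted.getvalue())

# Reading the copy back with quotechar='|' recovers the same fields.
round_trip = list(csv.reader(io.StringIO(converted.getvalue()), quotechar='|'))
print(round_trip)
```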

cheers,
amit.
 
John Machin

sonald said:
Hi,
I am using
Python version python-2.4.1 and along with this there are other
installables
like:
1. fastcsv-1.0.1.win32-py2.4.exe

Well, you certainly didn't get that from the object-craft website --
just go and look at their download page
http://www.object-craft.com.au/projects/csv/download.html -- stops dead
in 2002 and the latest windows kit is a .pyd for Python 2.2. As you
have already been told and as the object-craft csv home-page says,
their csv was the precursor of the Python csv module.

sonald said:
2. psyco-1.4.win32-py2.4.exe
3. scite-1.63-setup.exe

We are freshers here, joined new... and are now into handling this
module which validates the data files, which are provided in some
predefined format from the third party.
The data files are provided in the comma separated format.

The fastcsv package is imported in the code...
import fastcsv
and
csv = fastcsv.parser(strict = 1,field_sep = ',')

Aha!! Looks like some misguided person has got a copy of the
object-craft code, renamed it fastcsv, and compiled it to run with
Python 2.4 ... so you want some docs. The simplest thing to do is to
ask it, e.g. like this, but with Python 2.4 (not 2.2) and call it
fastcsv (not csv):

.... command-prompt...>\python22\python
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> help(csv.parser)
Help on built-in function parser:

parser(...)
    parser(ms_double_quote = 1, field_sep = ',',
           auto_clear = 1, strict = 0,
           quote_char = '"', escape_char = None) -> Parser

    Constructs a CSV parser object.

    ms_double_quote
        When True, quotes in a fields must be doubled up.

    field_sep
        Defines the character that will be used to separate
        fields in the CSV record.

    auto_clear
        When True, calling parse() will automatically call
        the clear() method if the previous call to parse() raised
        an exception during parsing.

    strict
        When True, the parser will raise an exception on
        malformed fields rather than attempting to guess the right
        behavior.

    quote_char
        Defines the character used to quote fields that
        contain the field separator or newlines. If set to None
        special characters will be escaped using the escape_char.
        ##### That's what you are looking for #####

    escape_char
        Defines the character used to escape special
        characters. Only used if quote_char is None.

>>> help(csv)
Help on module csv:

NAME
    csv - This module provides class for performing CSV parsing and
    writing.

FILE
    SOMEWHERE\csv.pyd

DESCRIPTION
    The CSV parser object (returned by the parser() function) supports
    the following methods:

    clear()
        Discards all fields parsed so far. If auto_clear is set to
        zero. You should call this after a parser exception.

    parse(string) -> list of strings
        Extracts fields from the (partial) CSV record in string.
        Trailing end of line characters are ignored, so you do not
        need to strip the string before passing it to the parser. If
        you pass more than a single line of text, a csv.Error
        exception will be raised.

    join(sequence) -> string
        Construct a CSV record from a sequence of fields. Non-string
        elements will be converted to string.

    Typical usage:

        import csv
        p = csv.parser()
        file = open('afile.csv')
        while 1:
            line = file.readline()
            if not line:
                break
            fields = p.parse(line)
            if not fields:
                # multi-line record
                continue
            # process the fields
[snip remainder of docs]
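The same "ask the module" approach can also be scripted. The sketch below uses the standard `csv` module as a stand-in, since fastcsv is not generally available; with it installed, one would presumably substitute `fastcsv` and `fastcsv.parser`, which is an assumption based on the session above.

```python
import csv
import inspect

# inspect.getdoc returns the cleaned-up docstring that help() would display.
reader_doc = inspect.getdoc(csv.reader)
module_doc = inspect.getdoc(csv)

print(reader_doc)

# Listing the public names is a quick way to spot constructor-like functions.
public_names = [name for name in dir(csv) if not name.startswith('_')]
print(public_names)
```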

sonald said:
can u plz tell me where to find the parser function definition, (used
above)
so that if possible i can provide a parameter for
text qualifier or text separator or text delimiter..
just as {field_sep = ','} (as given above)

I want to handle string containing double quotes (")
but the problem is that the default text qualifier is double quote

Now if I can change the default text qualifier... to say pipe (|)
the double quote inside the string may be ignored...
plz refer to the example given in my previous query...

It *appears* from this message that you have data already in a file,
and that data is *NOT* (as some one has already told you) in standard
CSV format.

Let me explain: The magic spell for quoting a field in standard CSV
format is:
quote = '"'
sep = ','
twoquotes = quote + quote
if quote in fld:
    fld = quote + fld.replace(quote, twoquotes) + quote
elif sep in fld:
    fld = quote + fld + quote

Note carefully that if the quote character appears in the raw input
data, it must be *doubled* in the output. If it is not, the standard
reader can't decode the input unambiguously. It is possible that
using ms_double_quote=0 with the [fast]csv module will do the job for
you. If not, it is possible, if the original data contains *pairs* of
quotes e.g. -- He said "Hello" to his friend -- to decode that using a
different state machine. If that's what you've got, e-mail me; I may be
able to help. However the example you gave had just one quote :-(
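The quoting rule above can be wrapped in a small function and checked against the standard reader. This is a sketch in Python 3; the function name `quote_field` is made up for illustration, and like the rule as stated it ignores fields that contain newlines.

```python
import csv
import io

def quote_field(fld, quote='"', sep=','):
    """Quote one field following the rule described above."""
    if quote in fld:
        # Embedded quotes are doubled, then the whole field is wrapped.
        return quote + fld.replace(quote, quote + quote) + quote
    if sep in fld:
        return quote + fld + quote
    return fld

fields = ['askin"em', 'a,b', 'plain']
line = ','.join(quote_field(f) for f in fields)
print(line)

# The standard reader recovers the original fields from the encoded line.
decoded = next(csv.reader(io.StringIO(line + '\n')))
print(decoded)
```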

*But* are you reading or writing this data? On one hand you say that
you are getting the data from a 3rd party and can't change it [which
implies that you are reading] but on the other hand you want to know
how to tell the [fast]csv module to use a "|" as the quote character; that
would be appropriate under two circumstances (1) you are reading a file
that already has "pipe" as the quote character (2) you want to create a
file that quotes using "pipe" ... IOW, it's not guaranteed to work for
reading an existing file that uses " as the quote character. If there
is a pipe character in the original data, it will fail. If (more
likely) there are commas in the original data, then you will get one
extra field per comma.
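The "one extra field per comma" effect is easy to reproduce with the standard `csv` module in Python 3. The sample line below is invented for illustration.

```python
import csv

# A line that was written with '"' as the quote character.
line = '"Smith, John","askin""em"\n'

as_written = next(csv.reader([line]))                # quotechar='"' (default)
with_pipe = next(csv.reader([line], quotechar='|'))  # wrong quote character

print(as_written)
print(with_pipe)
```

With the wrong quote character, the double quotes are kept as data and the comma inside the first field splits it in two, so the record comes back with three fields instead of two.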

A quick simple question: after the above csv = fastcsv.parser(.......),
does it do csv.parse(.....) or csv.join(...)???? Can you see any
fread() or fwrite() calls in the code??? If so, which???

HTH -- but you will have to describe what's going on a lot more
precisely.

Cheers,
John
 
sonald

Hi,
Thanks a lot for the snips you have included in your post...
those were quite helpful...

And about the 3rd party data....
we receive the data in csv format ... but we are not supposed to modify
the files provided by the user directly...

Instead we make another file with the same name & different
extensions... and use the new files created by the python for further
processing....

John said:
quote_char
Defines the character used to quote fields that
contain the field separator or newlines. If set to None
special characters will be escaped using the escape_char.
##### That's what you are looking for #####

Yes you got me right....
I was indeed looking for the quote_char...

John said:
Aha!! Looks like some misguided person has got a copy of the
object-craft code, renamed it fastcsv, and compiled it to run with
Python 2.4 ... so you want some docs. The simplest thing to do is to
ask it, e.g. like this, but with Python 2.4 (not 2.2) and call it
fastcsv (not csv):

I guess... that's true... ;)

Thank you very much.




Thanks a lot for the response
 
