very high-level IO functions?

York

Hi,

The R language has very high-level IO functions; its read.table can read a
whole .csv file and recognize the type of each column. write.table can do
the reverse.

R's MySQL interface has high-level functions, too, e.g. dbWriteTable can
automatically build a MySQL table and write a table of R data
into it.

Are there any python packages that do similar things?


-York
 
Caleb Hattingh

York

Short answer: yes

We use python and R at work, and in general you will find python syntax a
little cleaner for functionality they have in common. R is better for
some of the more hard-wired stats stuff, though.
 
Larry Bates

While it may "attempt" to recognize the types, it in fact cannot
be more correct than the programmer. Example:

data="""0X1E04 111"""

That "looks" lile a hex and an int. But wait. What if it is
instead two strings?

In Python you can easily write a class with an iterator that can
read the data from the file/table and return the PROPER data types
as lists, tuples, or dictionaries that are easy to manipulate.
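
For instance, something along these lines (a rough, untested sketch;
the file name and the choice of converters are made up, and you would
pick them to match your own data):

import csv

class TypedReader:
    """Iterate over a csv file, converting each column with a
    caller-supplied function, so the programmer decides the types."""
    def __init__(self, fp, converters):
        self.reader = csv.reader(fp)
        self.converters = converters
    def __iter__(self):
        return self
    def next(self):
        row = self.reader.next()
        # apply each column's converter to its value
        return [conv(value) for conv, value in zip(self.converters, row)]

fp = open('data.txt', 'r')
# first column stays a string (so "0X1E04" is NOT silently turned into
# a number), second column becomes an int
for row in TypedReader(fp, [str, int]):
    print row
fp.close()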

-Larry Bates
 
York

Caleb said:
York

Short answer: yes

Brilliant! And what are they?
We use python and R at work, and in general you will find python syntax
a little cleaner for functionality they have in common. R is better
for some of the more hard-wired stats stuff, though.

I love python. However, as a biologist, I like some of the high-level
functions in R. I don't want to spend my time parsing a data file. In my
python script, I call R to read the data file and write it into a MySQL
table. If python can do this easily, I don't need R at all.

Cheers,

-York
 
York

You are right, a program cannot be smarter than its programmer. However,
I need a program that can parse any table-format data file a user offers.
R offers such a function, and I hope python has one too.

-York
 
Larry Bates

It's so easy (using the csv module) that there's no need to build it in.
You can wrap it in a class if you want to make it even easier (see the
sketch after the example below). The same can be done for tables from a
SQL database.

import csv
fp=open(r'C:\test.txt', 'r')
#
# test.txt contains:
#
# "record","value1","value2"
# "1","2","3"
# "2","4","5"
# "3","6","7"
#
table=csv.DictReader(fp)
for record in table:
    #
    # record is a dictionary with keys as fieldnames
    # and values of the data in each record
    #
    print "record #=%s, value1=%s, value2=%s" % \
          (record['record'], record['value1'], record['value2'])

fp.close()
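
A minimal (untested) sketch of the class wrapper I mean -- the class
name is made up:

import csv

class Table:
    """Thin wrapper around csv.DictReader: iterating gives one
    dictionary per record, keyed by the fieldnames."""
    def __init__(self, filename):
        self.fp = open(filename, 'r')
        self.reader = csv.DictReader(self.fp)
    def __iter__(self):
        return iter(self.reader)
    def close(self):
        self.fp.close()

table = Table(r'C:\test.txt')
for record in table:
    print record['record'], record['value1'], record['value2']
table.close()

And reading a table out of a SQL database works much the same way; a
rough sketch assuming the MySQLdb package and made-up connection details
and table name:

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="test")
cur = conn.cursor()
cur.execute("SELECT * FROM mytable")
# cursor.description has one entry per column; item [0] is the column name
fieldnames = [d[0] for d in cur.description]
for row in cur.fetchall():
    record = dict(zip(fieldnames, row))
    print record
conn.close()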

-Larry Bates
 
Tom Anderson

An earlier reply to York's post said:
(snip)

So you don't need R at all.

Did you even read the OP's post? Specifically, this bit:

The R language has very high-level IO functions; its read.table can read a
whole .csv file and recognize the type of each column.
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python's csv module gives you tuples of strings; it makes no effort to
recognise the types of the data. AFAIK, python doesn't have any IO
facilities like this.

Larry's point that automagical type detection is risky because it can make
mistakes is a good one, but that doesn't mean that magic is useless - on
the contrary, for the majority of cases, it works fine, and is extremely
convenient.

The good news is that it's reasonably easy to write such a function: you
just need a function 'type_convert' which takes a string and returns an
object of the right type; then you can do:

import csv

def read_table(f):
    for row in csv.reader(f):
        yield map(type_convert, row)

This is a very, very rough cut - it doesn't do comment stripping, skipping
blank lines, dealing with the presence of a header line or the use of
different separators, etc, but all that's pretty easy to add. Also, note
that this returns an iterator rather than a list; use list(read_table(f))
if you want an actual list, or change the implementation of the function.

type_convert is itself fairly simple:

def _bool(s): # helper method for booleans
    s = s.lower()
    if (s == "true"): return True
    elif (s == "false"): return False
    else: raise ValueError, s

types = (int, float, complex, _bool, str)

def type_convert(s):
    for type in types:
        try:
            return type(s)
        except ValueError:
            pass
    raise ValueError, s
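
For example (the file name is made up; assume 'data.csv' holds
comma-separated values with no header line):

f = open('data.csv')
for row in read_table(f):
    print row   # e.g. a line "1,2.5,true,spam" comes back as [1, 2.5, True, 'spam']
f.close()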

This whole thing isn't quite as sophisticated as R's read.table; R reads
the whole table in, then tries to find a type for each column which will
fit all the values in that column, whereas I do each cell individually.
Again, it wouldn't be too hard to do this the other way round.
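
If you do want R's behaviour, here's a rough (untested) sketch of the
column-wise version; it reuses the csv import and the 'types' tuple
above, and assumes every row has the same number of columns:

def read_table_columnwise(f):
    rows = [row for row in csv.reader(f) if row]
    columns = zip(*rows)    # transpose, so we get one tuple per column
    converters = []
    for column in columns:
        for type in types:
            try:
                for value in column:
                    type(value)     # does this type fit every value?
                converters.append(type)
                break
            except ValueError:
                pass
    return [[conv(value) for conv, value in zip(converters, row)]
            for row in rows]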

Anyway, hope this helps. Bear in mind that there are python bindings for
the R engine, so you could just use R's version of read.table in python.

tom
 
York

Thank you, Tom.


-York


Tom said:
(snip)
 
