emulating read and readline methods

Sean Davis

I have a large file that I would like to transform and then feed to a
function (psycopg2 copy_from) that expects a file-like object (needs
read and readline methods).

I have a class like so:

import gzip
import re

class GeneInfo():
    def __init__(self):
        #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline() #deal with header line

    def _read(self,n=1):
        for line in self.fh:
            if line=='':
                break
            line=line.strip()
            line=re.sub("\t-","\t",line)
            rowvals = line.split("\t")
            yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"

    def readline(self,n=1):
        return self._read().next()

    def read(self,n=1):
        return self._read().next()

    def close(self):
        self.fh.close()

and I use it like so:

a=GeneInfo()
cur.copy_from(a,"gene_info")
a.close()

It works well except that the end of file is not caught by copy_from.
I get errors like:

psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
during .read() call
CONTEXT: COPY gene_info, line 1000: ""

for a 1000 line test file. Any ideas what is going on?

Thanks,
Sean
 
Diez B. Roggisch

I'm a bit lost why the above actually works - as _read() appears to be
re-created instead of re-used for each invocation, and thus can't work IMHO.

Anyway, I think the real problem is that you don't follow the
readline protocol: it returns "" when there is no more line to read,
whereas you raise StopIteration.

Diez
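
A minimal sketch of what Diez means (hypothetical; it assumes the transformed
lines come from a single generator stored as self._lines, a name that is not
in the original code):

    def readline(self, size=-1):
        # File-like protocol: hand back the next transformed line,
        # or "" once the generator is exhausted.
        try:
            return next(self._lines)
        except StopIteration:
            return ""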
 
MRAB

I have a large file that I would like to transform and then feed to a
function (psycopg2 copy_from) that expects a file-like object (needs
read and readline methods).

I have a class like so:

class GeneInfo():
    def __init__(self):
        #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline() #deal with header line

    def _read(self,n=1):
        for line in self.fh:
            if line=='':
                break
            line=line.strip()
            line=re.sub("\t-","\t",line)
            rowvals = line.split("\t")
            yield "\t".join([rowvals for i in
[0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"

    def readline(self,n=1):
        return self._read().next()

    def read(self,n=1):
        return self._read().next()

Each time readline() and read() call self._read() they are creating a
new generator. They then get one value from the newly-created
generator and then discard that generator. What you should do is
create the generator in __init__ and then use it in readline() and
read().
    def close(self):
        self.fh.close()

and I use it like so:

a=GeneInfo()
cur.copy_from(a,"gene_info")
a.close()

It works well except that the end of file is not caught by copy_from.
I get errors like:

psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
during .read() call
CONTEXT:  COPY gene_info, line 1000: ""

for a 1000 line test file.  Any ideas what is going on?
I wonder whether it's expecting readline() and read() to return an
empty string at the end of the file instead of raising StopIteration.
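
Putting both suggestions together - create the generator once in __init__ and
return "" at end of file - a corrected class might look roughly like this
(an untested sketch in the thread's Python 2 style, not the poster's final code):

import gzip
import re

class GeneInfo(object):
    def __init__(self):
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline()            # skip the header line
        self._lines = self._read()    # one generator, created once and reused

    def _read(self):
        for line in self.fh:
            line = line.strip()
            line = re.sub("\t-", "\t", line)
            rowvals = line.split("\t")
            yield "\t".join([rowvals[i] for i in
                             [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]]) + "\n"

    def readline(self, size=-1):
        # Return "" at end of file instead of letting StopIteration escape.
        try:
            return next(self._lines)
        except StopIteration:
            return ""

    # copy_from also calls read(); returning one transformed line per call
    # was enough for the original poster, so read() can simply delegate.
    read = readline

    def close(self):
        self.fh.close()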
 
MRAB

Diez B. Roggisch wrote:
I'm a bit lost why the above actually works - as _read() appears to be
re-created instead of re-used for each invocation, and thus can't work IMHO.

Each generator that's created reads a single line from the file
(self.fh), yields the result, and is then discarded; none of the
individual generators reads more than one line from the file.
 
John Machin

MRAB wrote:
I wonder whether it's expecting readline() and read() to return an
empty string at the end of the file instead of raising StopIteration.


Don't wonder; ReadTheFantasticManual:

read([size])

    ... An empty string is returned when EOF is encountered
    immediately. ...

readline([size])

    ... An empty string is returned only when EOF is encountered
    immediately.
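
That contract is easy to check with an ordinary file object (the file name
here is purely hypothetical):

fh = open("/tmp/one_line.txt")   # suppose it contains a single line of text
first = fh.readline()            # e.g. "some text\n"
again = fh.readline()            # "" -- EOF is an empty string, not an exception
fh.close()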
 
Sean Davis

Thanks. This was indeed my problem--not reading the manual closely
enough.

And the points about the iterator being re-instantiated were also
right on point. Interestingly, in this case, the code was working
because read() and readline() were still returning the next line each
time since the file handle was being read one line at a time.

Sean
 
MRAB

After further thought, do you actually need a generator? read() and
readline() could just call _read(), which would read a line from the
file and return the result or an empty string. Or the processing could
be done in readline() and read() could just call readline().
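
A generator-free variant along those lines might look like this (again only a
sketch, untested against copy_from):

import gzip
import re

class GeneInfo(object):
    def __init__(self):
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline()       # skip the header line

    def readline(self, size=-1):
        line = self.fh.readline()
        if line == "":
            return ""            # end of file: pass the empty string through
        line = re.sub("\t-", "\t", line.strip())
        rowvals = line.split("\t")
        return "\t".join([rowvals[i] for i in
                          [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]]) + "\n"

    read = readline              # read() just delegates to readline()

    def close(self):
        self.fh.close()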
 
