[postgres] Is there a way to avoid having the library slurp-read the whole result-set?

H

Helge Elvik

Hi,

I'm currently implementing a database application in Ruby, and I use
PostgreSQL for the data store. At first I tried using DBI to make the
data store as platform/database independent as possible, but when I came
back to work this morning I found that my computer was starved for
virtual memory, and that Ruby was using about 700MBs of RAM and rising.

What I found out is that DBI slurp reads the whole result set into
memory before it sends it back from the library. I poked around in the
code, but found no way to instead return one-and-one result serially as
the server sends them back.

So I poked around in the documentation for the postgres-library, but
sadly with no luck there either. No method there accepts a block so that
you can just receive the results serially. I also had a quick poke at
postgres-pr, but the situation seems to be the same there.

Am I missing something, or is the really now way of receiving the
results serially without the libraries slurping everything up in RAM
first? The table I'm having trouble with currently has about 2 million
rows, so reading everything into RAM is absolutely impossible. I'm
looking for something like this:

db.execute("select * from huge_table").each_row do |row|
# process row here
end

PS: Reading these 2 million rows is not a normal day-to-day operation,
but I do this when I normalize some data from a CSV file. Basically I
restructure the data into several tables, and for now I've mostly done
this with "CREATE TABLE t AS SELECT". I've found that I need to compute
some of the columns with a Ruby-function though, so I basically need to
loop through them all.

Regards,
Helge Elvik
 
H

hubert depesz lubaczewski

Am I missing something, or is the really now way of receiving the
results serially without the libraries slurping everything up in RAM
first? The table I'm having trouble with currently has about 2 million

use cursor's. (in posrtgresql docs look for:
declare
fetch
move
close

depesz
 
A

ara.t.howard

Hi,

I'm currently implementing a database application in Ruby, and I use
PostgreSQL for the data store. At first I tried using DBI to make the
data store as platform/database independent as possible, but when I came
back to work this morning I found that my computer was starved for
virtual memory, and that Ruby was using about 700MBs of RAM and rising.

What I found out is that DBI slurp reads the whole result set into
memory before it sends it back from the library. I poked around in the
code, but found no way to instead return one-and-one result serially as
the server sends them back.

So I poked around in the documentation for the postgres-library, but
sadly with no luck there either. No method there accepts a block so that
you can just receive the results serially. I also had a quick poke at
postgres-pr, but the situation seems to be the same there.

Am I missing something, or is the really now way of receiving the
results serially without the libraries slurping everything up in RAM
first? The table I'm having trouble with currently has about 2 million
rows, so reading everything into RAM is absolutely impossible. I'm
looking for something like this:

db.execute("select * from huge_table").each_row do |row|
# process row here
end

PS: Reading these 2 million rows is not a normal day-to-day operation,
but I do this when I normalize some data from a CSV file. Basically I
restructure the data into several tables, and for now I've mostly done
this with "CREATE TABLE t AS SELECT". I've found that I need to compute
some of the columns with a Ruby-function though, so I basically need to
loop through them all.

Regards,
Helge Elvik

it's been a while since i used the ruby postgres bindings but, when i did,
both 'query' and 'exec' took blocks to iterate over results sets.

have you tried?

-a
 
F

Francis Cianfrocca

it's been a while since i used the ruby postgres bindings but, when i did,
both 'query' and 'exec' took blocks to iterate over results sets.

have you tried?

Not sure that solves his problem, Ara- he wants to keep the library
from yanking the whole result set into RAM before it starts iterating
over his code.
 
A

ara.t.howard

Not sure that solves his problem, Ara- he wants to keep the library
from yanking the whole result set into RAM before it starts iterating
over his code.

yes - that's what it does (used to do). looking at the source and samples/ it
looks like it works this way now

from samples/test1.rb

res = conn.exec("FETCH ALL in myportal")

...

for fld in res.fields
printf("%-15s",fld)
end
printf("\n\n")

...

res.result.each do |tupl| # we iterating here
tupl.each do |fld|
printf("%-15s",fld)
end
printf("\n")
end

dunno how to do that from dbi - never used it.

regards.

-a
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top