Hey all...
I'm working on a massive Rails site that does heavy data import daily.
A lot of this data is in XML files of various sizes, ranging from 100 KB
to 400 MB and totaling around 2 GB across all sources. I'd like to keep
the entire project in Ruby.
At first, I wrote my parsers using REXML, but found that to be DOG
SLOW, especially for the large files. I tried REXML::parse_stream but
couldn't find any good documentation for handling parsing that way. It
was taking around 30 minutes to an hour to even _open_ the larger files
on a p4 1.8ghz test machine.
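
For context, stream-style parsing in REXML is normally driven through `REXML::Document.parse_stream` together with a listener object that mixes in `REXML::StreamListener`. Below is a minimal sketch of that pattern, not a tuned importer. The `RecordListener` class, the `parse_names` helper, and the `<record>`/`<name>` element names are all made up for illustration; the real files will have their own layout.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: collects the text of every <name> element.
# The element name is an assumption, not taken from the real data.
class RecordListener
  include REXML::StreamListener

  attr_reader :names

  def initialize
    @names = []
    @in_name = false
    @buffer = +''
  end

  # Called for each opening tag as the parser reaches it.
  def tag_start(name, _attrs)
    return unless name == 'name'
    @in_name = true
    @buffer = +''
  end

  # Called for text nodes; may fire more than once per element.
  def text(data)
    @buffer << data if @in_name
  end

  # Called for each closing tag; store the collected text here.
  def tag_end(name)
    return unless name == 'name'
    @names << @buffer.strip
    @in_name = false
  end
end

# Hypothetical helper: accepts an IO (or String) and returns the names found.
def parse_names(io)
  listener = RecordListener.new
  REXML::Document.parse_stream(io, listener)
  listener.names
end
```

With a real file this would be called as `File.open(path) { |f| parse_names(f) }`, so the document is consumed as a stream of events and no full tree is built in memory. In a real import, `tag_end` would likely hand each finished record to the database code instead of accumulating everything in an array.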
After that exercise I switched to libxml, which is a lot speedier, but
still slow (no numbers to back it up yet; I can just tell by the speed of
data inserts into my DB).
I'm wondering if there's some other, faster lib out there that I'm
missing? Can someone point me in the right direction?
Are there any "gotchas" with using libxml that I should be aware of
speed-wise?
Any and all help is much appreciated...thanks!