Looking for an efficient way to parse a file

Eric Martin · Jan 12, 2008

Hello,

I have a file with the following data structure:
#category
item name
data1
data2
item name
data1
data2
#category
item name
data1
data2
... etc.

Any line that starts with # indicates a new category. Between
categories there can be any number of items, each with its associated
data. Each item has exactly two data properties.

My plan was to just get an array that contained the index of each of
the categories and then parse each item from there, since they are in
a set format... but I was wondering whether there were any suggestions
for a more efficient way.
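For comparison, here is roughly what the index-based plan above could look like in Perl. This is a minimal sketch, assuming the whole file fits in memory and that every item has exactly two data lines; parse_by_index and the sample input are hypothetical. It walks the lines twice (once to find the headers, once to parse them), which a single pass over the file avoids.

```perl
use strict;
use warnings;

# Index-based sketch: record where every '#' header sits, then parse the
# fixed three-line items between one header and the next.
# parse_by_index() is a hypothetical helper, not code from this thread.
sub parse_by_index {
    my @lines = @_;
    chomp @lines;

    # Indexes of the category header lines.
    my @cat_idx = grep { substr( $lines[$_], 0, 1 ) eq '#' } 0 .. $#lines;

    my %data;
    for my $n ( 0 .. $#cat_idx ) {
        my $start = $cat_idx[$n];
        # A section ends just before the next header, or at the last line.
        my $end = $n < $#cat_idx ? $cat_idx[ $n + 1 ] - 1 : $#lines;
        my $cat = substr $lines[$start], 1;
        # Each item is a name line followed by exactly two data lines.
        for ( my $i = $start + 1; $i + 2 <= $end; $i += 3 ) {
            $data{$cat}{ $lines[$i] } = [ @lines[ $i + 1, $i + 2 ] ];
        }
    }
    return \%data;
}

# In real use @input would come from: my @input = <$fh>;
my @input = (
    "#category1\n", "item1\n", "data1\n", "data2\n",
    "item2\n",      "data3\n", "data4\n",
    "#category2\n", "item1\n", "data5\n", "data6\n",
);
my $parsed = parse_by_index(@input);
print "$_: ", join( ', ', sort keys %{ $parsed->{$_} } ), "\n"
    for sort keys %$parsed;
```

This prints the item names found under each category.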

Gunnar Hjalmarsson · Jan 12, 2008

Eric said:
> I have a file with the following data structure:
> #category
> item name
> data1
> data2
> item name
> data1
> data2
> #category
> item name
> data1
> data2
> ... etc.
>
> Any line that starts with #, indicates a new category. Between
> categories, there can be any number of items, with associated data.
> Each item has exactly two data properties.
>
> My plan was to just get an array that contained the index of each of
> the categories and then parse each item from there, since they are in
> a set format...

Not sure what you mean by that. Could you please expand?

> but I was wondering if there were any suggestions for a
> more efficient way...

Efficient - in what sense?

To me, the described data structure would suggest a HoHoA (hash of
hashes of arrays):

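use strict;
use warnings;
use Data::Dumper;

my (%HoHoA, $cat);
while ( <DATA> ) {
    chomp;
    # A line starting with '#' opens a new category.
    if ( substr($_, 0, 1) eq '#' ) {
        $cat = substr $_, 1;
        next;
    }
    # Any other line is an item name; its two data lines follow,
    # so read them straight from the same filehandle.
    for my $item ( 0, 1 ) {
        chomp( $HoHoA{$cat}{$_}[$item] = <DATA> );
    }
}
print Dumper \%HoHoA;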

__DATA__
#category1
item1
data1
data2
item2
data1
data2
#category2
item1
data1
data2
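Getting the data back out of a HoHoA is a pair of nested loops over the keys. A short sketch, using a hand-built hash with the same shape the loop above produces (the sample values are made up); the keys are sorted because Perl's hash order is otherwise unpredictable:

```perl
use strict;
use warnings;

# Hand-built HoHoA shaped like the one the parsing loop produces.
my %HoHoA = (
    category1 => { item1 => [ 'data1', 'data2' ], item2 => [ 'data1', 'data2' ] },
    category2 => { item1 => [ 'data1', 'data2' ] },
);

# Walk categories, then items, then unpack the two data values.
my @report;
for my $cat ( sort keys %HoHoA ) {
    for my $item ( sort keys %{ $HoHoA{$cat} } ) {
        my ( $d1, $d2 ) = @{ $HoHoA{$cat}{$item} };
        push @report, "$cat/$item: $d1, $d2";
    }
}
print "$_\n" for @report;
```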

xhoster · Jan 12, 2008

Eric Martin said:
> Hello,
>
> I have a file with the following data structure:
> #category
> item name
> data1
> data2
> item name
> data1
> data2
> #category
> item name
> data1
> data2
> ... etc.
>
> Any line that starts with #, indicates a new category. Between
> categories, there can be any number of items, with associated data.
> Each item has exactly two data properties.
>
> My plan was to just get an array that contained the index of each of
> the categories

That suggests the categories are already in an array; otherwise, what
would the index be an index into? I'd probably not bother loading them
into an array in the first place, and would just parse the data on the
fly. Or maybe I would, depending on where it was coming from and how big
I expected it to plausibly get.
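A minimal sketch of parsing on the fly, assuming the goal is to keep memory use flat: each item is handed to a callback as soon as its two data lines have been read, and nothing is stored by the parser itself. parse_stream, the callback, and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Streaming sketch: pass each (category, item, data1, data2) record to a
# callback as soon as it is complete, so memory use does not grow with
# the file size. parse_stream() is a hypothetical name.
sub parse_stream {
    my ( $fh, $on_item ) = @_;
    my $cat;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^#(.*)/ ) {
            $cat = $1;
            next;
        }
        die "Item '$line' appears before any category\n" unless defined $cat;
        my @data;
        for ( 1 .. 2 ) {
            my $d = <$fh>;
            die "Item '$line' is missing data lines\n" unless defined $d;
            chomp $d;
            push @data, $d;
        }
        $on_item->( $cat, $line, @data );
    }
}

# Usage with an in-memory filehandle; a real file opened with
# open(my $fh, '<', $path) works the same way.
my $text = "#category1\nitem1\ndata1\ndata2\n#category2\nitem1\ndata3\ndata4\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @seen;
parse_stream( $fh, sub {
    my ( $cat, $item, $d1, $d2 ) = @_;
    push @seen, "$cat/$item/$d1/$d2";
    print "$cat: $item => $d1, $d2\n";
} );
close $fh;
```

The trade-off is convenience: the callback only ever sees one item, so anything that needs the whole data set at once (sorting, cross-category lookups) still calls for building a hash.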

> and then parse each item from there, since they are in
> a set format...but I was wondering if there were any suggestions for a
> more efficient way...

Efficient in what sense? Memory? CPU time? Programmer maintenance time?

Xho


Jürgen Exner · Jan 13, 2008

Eric Martin said:
> I have a file with the following data structure:
> #category
> item name
> data1
> data2
> item name
> data1
> data2
> #category
> item name
> data1
> data2
> ... etc.
>
> Any line that starts with #, indicates a new category. Between
> categories, there can be any number of items, with associated data.
> Each item has exactly two data properties.

That suggests to me a hash (keyed by category) of hashes (keyed by item
name) of arrays (holding the two data elements).
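Written out literally, that structure might look like the sketch below (keys and values are illustrative). lookup is a hypothetical helper that checks each level with exists, because simply reading $data{category3}{item1}[0] would autovivify an empty category3 entry as a side effect.

```perl
use strict;
use warnings;

# Hash (category) of hash (item name) of array (the two data elements).
my %data = (
    category1 => {
        item1 => [ 'data1', 'data2' ],
        item2 => [ 'data1', 'data2' ],
    },
    category2 => { item1 => [ 'data1', 'data2' ] },
);

# Direct lookup: category, then item name, then data index (0 or 1).
my $second = $data{category1}{item2}[1];

# Hypothetical helper: check level by level with exists so that probing
# for a missing category does not create an empty entry for it.
sub lookup {
    my ( $h, $cat, $item, $idx ) = @_;
    return unless exists $h->{$cat} && exists $h->{$cat}{$item};
    return $h->{$cat}{$item}[$idx];
}

my $missing = lookup( \%data, 'category3', 'item1', 0 );
print defined $missing ? "found\n" : "not found\n";
```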

> My plan was to just get an array that contained the index of each of
> the categories and then parse each item from there, since they are in

What's an index of a category?

> a set format...but I was wondering if there were any suggestions for a
> more efficient way...

Reading the file line by line, in a single linear pass, is about as
efficient as you can possibly get: you need to read each item at least
once, and this way you never read it more than once. The suggested data
structure can be filled in during that single pass, too.

jue
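The same single linear pass can be written so that the while loop reads every line exactly once, with a small counter tracking how many data lines the current item still needs. A sketch under the thread's assumptions (exactly two data lines per item); build_hohoa and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Single-pass sketch: every line is read exactly once by the while loop,
# and a counter tracks how many data lines the current item still needs.
# build_hohoa() is a hypothetical name.
sub build_hohoa {
    my ($fh) = @_;
    my ( %hohoa, $cat, $item );
    my $pending = 0;    # data lines still expected for $item
    while ( my $line = <$fh> ) {
        chomp $line;
        if ($pending) {
            push @{ $hohoa{$cat}{$item} }, $line;
            $pending--;
        }
        elsif ( $line =~ /^#(.*)/ ) {
            $cat = $1;
        }
        else {
            die "Line $.: item '$line' appears before any category\n"
                unless defined $cat;
            $item = $line;
            $hohoa{$cat}{$item} = [];
            $pending = 2;
        }
    }
    die "Input ended in the middle of item '$item'\n" if $pending;
    return \%hohoa;
}

my $text = "#fruit\napple\nred\nround\nbanana\nyellow\nlong\n#veg\npea\ngreen\nsmall\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $tree = build_hohoa($fh);
close $fh;
print join( ', ', @{ $tree->{fruit}{banana} } ), "\n";    # yellow, long
```

A side benefit of the counter is that truncated input is reported as an error instead of quietly leaving an undefined value in the structure.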

Eric Martin · Jan 13, 2008

Gunnar Hjalmarsson said:
> Not sure what you mean by that. Could you please expand?

I was thinking of loading the file into an array, iterating over it to
find the index of each category, and then parsing the data between
consecutive categories using that array of indexes. However, your
suggestion to use a HoHoA, together with your code sample, proved to be
exactly what I needed.

> Efficient - in what sense?

I probably should have said "effective".

Thanks for the code sample; it worked great! I didn't realize that
reading <DATA> inside the while block would advance to the next record
of the data file.

-Eric
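The behaviour Eric describes can be seen directly: a read from the filehandle inside the loop body consumes the next line, so the while condition picks up after it. A small demonstration, using a made-up in-memory file in place of DATA and Perl's $. variable (the number of the line most recently read):

```perl
use strict;
use warnings;

# Sample input with made-up values; an in-memory filehandle stands in
# for DATA or a real file.
my $text = "item1\ndata1\ndata2\nitem2\ndata3\ndata4\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @log;
while ( my $item = <$fh> ) {    # reads an item-name line
    chomp $item;
    push @log, sprintf 'line %d: item %s', $., $item;
    for ( 1 .. 2 ) {
        # This read consumes the next line too, so the while condition
        # never sees these data lines.
        chomp( my $data = <$fh> );
        push @log, sprintf 'line %d: data %s', $., $data;
    }
}
close $fh;
print "$_\n" for @log;
```

The log shows the loop condition only ever landing on lines 1 and 4, the item names.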

Similar Threads

Looking for help for someone to Help Build me a programme	7	Feb 11, 2024
Image shifts to the right when export the page to pdf	4	May 5, 2023
efficient way to process data	0	Jan 12, 2014
Is there a way to input a unique number for each array output?	4	Aug 31, 2022
Looking for someone to take alook at this code and help	2	Mar 10, 2023
Hi, I am a webflow user. I am looking for CSS code that can KEEP ALL ELEMENTS POSITIONED in the SAME spot across all resolutions	0	Oct 27, 2023
PHP RSS Feed Aggregator changing to todays date everytime feed is aggregated	1	Jan 11, 2022
How to sort a CSV file with merge sort JAVA	7	May 6, 2021
