Adam Sanderson
I was wondering if anyone would be interested in, or knows of, a generic
parsing library. I am continually faced with reading in bizarre text
files and parsing them. They tend to have regular structures, though
(at the whim of the researcher who made them). I'd like to write up some
sort of declarative code to parse these files. There's a lot of room
for reuse.
The data tends to be structured, but not rigorously, and it changes
whenever someone feels like it. It's not hard to parse manually, but
wouldn't it be nice to do a little metaprogramming at the top of a
class and say something like this? (Not a rigorous example.)
  class LoopDetector
    one  :header, :hash, :start_after=>/^\*+$/,
         :end_before=>/^\*+$/, :split=>/:\s+/
    many :days, LoopData, :start_after=>:header,
         :end_before=>/\n\n\n/
  end
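One way the `one` declaration above could be wired up is a small mixin that records each declaration and then applies them in order. This is only a sketch: the `Declarative` module and its `sections` and `parse` methods are made-up names, and it handles just `one ... :hash`. `many` and the `:start_after=>:header` form are left out.

```ruby
# Hypothetical mixin: records `one` declarations and applies them in order.
module Declarative
  def sections
    @sections ||= []
  end

  # Declare a single section; the option names mirror the ones in the post.
  def one(name, kind, opts = {})
    sections << [name, kind, opts]
    attr_reader name
  end

  def parse(text)
    obj = new
    lines = text.lines.map(&:chomp)
    sections.each do |name, kind, opts|
      # Skip up to and including the first line matching :start_after.
      start = lines.index { |l| l =~ opts[:start_after] }
      raise ArgumentError, "no start marker for #{name}" unless start
      rest = lines[(start + 1)..-1]
      # Collect lines until one matches :end_before (or the input ends).
      stop = rest.index { |l| l =~ opts[:end_before] } || rest.size
      body = rest[0...stop]
      lines = rest[stop..-1]
      value = body
      if kind == :hash
        # Split each line once on :split; lines that do not split are dropped.
        pairs = body.map { |l| l.split(opts[:split], 2) }
        value = pairs.select { |p| p.size == 2 }.to_h
      end
      obj.instance_variable_set("@#{name}", value)
    end
    obj
  end
end

class LoopDetector
  extend Declarative
  one :header, :hash, :start_after => /^\*+$/,
      :end_before => /^\*+$/, :split => /:\s+/
end
```

With this in place, `LoopDetector.parse(File.read(path)).header` would give a Hash keyed by the text before each colon, assuming the file starts with a starred header block like the sample.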
Most of the data can be broken down into:
- Spacer lines
- Hashes
- Tables
- Garbage (no, seriously, a lot of these files have completely pointless
information in them)
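The four categories above can be told apart, roughly, one line at a time. A minimal sketch follows; the `classify` name, the patterns, and the "at least half the fields look numeric" threshold are all assumptions guessed from the sample data in this post, not something an existing library provides.

```ruby
# Rough line classifier (hypothetical helper, heuristics only).
def classify(line)
  fields = line.split
  # Blank lines and rules made only of *, = or - count as spacers.
  return :spacer if fields.empty? || line =~ /\A\s*[*=-]+\s*\z/
  # "Key: value" with whitespace after the colon looks like a hash entry.
  return :hash if line =~ /\A[^:]+:\s+\S/
  # Three or more fields, at least half of them numeric-looking: a table row.
  numeric = fields.count { |f| f =~ /\A[\d.:%]+\z/ }
  return :table if fields.size >= 3 && numeric * 2 >= fields.size
  # Everything else, including table header rows, falls through to here.
  :garbage
end
```

Note that a column header line such as `Time Vol Occ Flg nPds` ends up as `:garbage` under these rules, so a real parser would want a declared header or a smarter test.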
Any ideas, folks?
.adam sanderson
Here's one example of the type of data I get to play with (in reality
it goes from 00:00 -> 23:55 for each set of Loop Data, and there are
about 200 sets of Raw Loop Data). For anyone who's interested, this is
loop detector data, which measures the amount of traffic on freeways.
***********************************
Filename: 0076ON04.cdl
Extracted by: CDR_Auto version 3.31 BETA g
Creation Date: Mar27/05 (Sun)
Creation Time: 20:23:09
File Type: TEXT
***********************************
ES-076R:_CN_O_1 I-5 MLK Jr Way-NB 157.13
01/01/04 (Thu)
---Raw Loop Data Listing---
Time Vol Occ Flg nPds
00:00 5 0.4% 1 15
00:05 11 1.2% 1 15
00:10 14 1.2% 1 15
23:50 3 0.5% 2 15
23:55 3 0.4% 1 15
ES-076R:_CN_O_1 I-5 MLK Jr Way-NB 157.13
01/02/04 (Fri)
---Raw Loop Data Listing---
Time Vol Occ Flg nPds
00:00 0 0.0% 0 0
00:05 0 0.0% 0 0
00:10 0 0.0% 0 0
00:15 0 0.0% 0 0
23:50 0 0.0% 0 0
23:55 26 3.8% 2 10
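For comparison, parsing the listing above by hand might look like the following. It's a sketch, assuming every data row has the five columns shown (Time, Vol, Occ, Flg, nPds) and that a date line such as `01/01/04 (Thu)` starts a new day. `Reading`, `ROW`, `parse_row`, and `parse_days` are names invented here, and the date is kept as a string because the post doesn't say how it should be interpreted.

```ruby
# One table row: time, volume, occupancy (percent), flag, number of periods.
Reading = Struct.new(:time, :vol, :occ, :flg, :npds)

ROW = /\A(\d\d:\d\d)\s+(\d+)\s+(\d+(?:\.\d+)?)%\s+(\d+)\s+(\d+)\z/

# Returns a Reading, or nil when the line is not a data row.
def parse_row(line)
  m = ROW.match(line.strip)
  return nil unless m
  Reading.new(m[1], m[2].to_i, m[3].to_f, m[4].to_i, m[5].to_i)
end

# Groups rows under the most recent date line; other lines are ignored.
def parse_days(text)
  days = {}
  current = nil
  text.each_line do |line|
    if line =~ %r{\A\s*(\d\d/\d\d/\d\d)\s+\(\w+\)}
      current = days[$1] = []
    elsif current && (reading = parse_row(line))
      current << reading
    end
  end
  days
end
```

Rows that appear before any date line are skipped, which also keeps the starred file header out of the result.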