Can I perform this Parse in perl? (Non standard address)


Steve

Can't do this in Excel. Can perl do it?

Ok here is my goal.

On Thursdays my local newspaper posts garage sale ads for the upcoming
weekend.
I've found these sales are an excellent source of merchandise to sell
on eBay. And the prices are awesome.


If I open the paper's site in Excel I get cells that look like the
samples below (50 - 100 ads).


How can I parse out just the time and address of each sale so I can plan
my routes and which days to visit which house?
(Folks mark the stuff down on the last day.)


1. Come see at: 4785 SE 133rd Dr, City, State 12345 Off Holgate, take
a right on 134th, (Aspen Meadows), stop sign take a right, take a
left
on 133rd and 5th house on the left. Saturday August 11, 2007 10am to
5pm only


2. Lots of Name Brands!! Tons of Clothes for Girls and Boys. Dog
House, Animal Kennel, toys, lego table, infant chairs, girl HOPE TO
SEE YOU THERE, THANK YOU!!!! This Friday & Saturday!! 8/10 & 8/11
10am
- 6pm 2100 SE 118th Ave City, St 12345


3. Lots of Name Brands!! Tons of Clothes for Girls and Boys. Dog
House, Animal Kennel, toys, lego table, infant chairs, girl HOPE TO
SEE YOU THERE, THANK YOU!!!! Fri & Sat Aug 10 & Aug 11 10 am - 6 pm
2100 SE 120th Ave City, St no zip


Steve
 

Skye Shaw!@#$

Can't do this in Excel. Can perl do it?
More or less.
Ok here is my goal.

<snip>

How can I parse out just the time and address of the sale so I can plan
my routes and which days to visit which house?


Dates and times are easy, for the most part. Addresses can be tricky,
especially if you want to break them up into parts.

The best thing to do is look at each description for something that
will tell you, "Hey, there's an address on this line". Street names and
numbers can be complicated, so we won't try to recognize those. The state
(well, the zip too) is the simplest part to spot, so we'll look for that.
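
As a minimal sketch of that idea: the state, optionally followed by a
five-digit zip, acts as the right-hand anchor, and the first run of two
or more digits as the left-hand one. The sub name find_address and the
one-state pattern are made up for this illustration; a real list would
need every state you expect to see.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny state pattern; extend as needed.
my $STATE = qr/\b(?:CA|California)\b/i;

# Return the text from the first house-number-like run of digits up to
# the state (and zip, when present), or undef if the line has no state.
sub find_address {
    my ($line) = @_;
    return $1 if $line =~ /(\d{2,}.+?\s$STATE(?:\s+\d{5})?)/;
    return;
}

my $found = find_address("Come see at: 4785 SE 133rd Dr, City, CA 12345 Off Holgate");
print "$found\n" if defined $found;
```

The lazy .+? stops at the first state name after the number, so
driving directions that follow the address are left out.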

In my example, I use a text file with the sample addresses
you posted ("state" switched with "California").

Using Text::CSV to iterate over the spreadsheet's rows, and
Text::Sentence to iterate over the description's lines, is left as an
exercise...

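[sshaw@localhost ~]$ cat bs.pl
use strict;
use warnings;

# Each value matches the short or the long form of a day name.
my %DAYS = (Monday   => qr!\bMon(?:\.|(?:day))?\b!i,
            Tuesday  => qr!\bTues(?:\.|(?:day))?\b!i,
            #...
            Friday   => qr!\bFri(?:\.|(?:day))?\b!i,
            Saturday => qr!\bSat(?:\.|(?:urday))?\b!i);

# A month name or month number, followed by the day of the month.
my %DATE = (August => qr!(?:(?:Aug(?:\.|(?:ust))?)|0?8[-/])\s*\d{1,2}!i,
            #...
           );

my %STATE = (California => qr!\bCa(?:lifornia)?\b!i,
             #...
            );

my $addr;
my (@days, @dates, @times);

# Build one alternation per category out of the individual patterns.
my $day   = join "|", values %DAYS;
my $date  = join "|", values %DATE;
my $state = join "|", values %STATE;

# Print whatever was collected for the current ad, then reset.
sub flush_ad {
    return unless defined($addr) || @days || @dates || @times;
    local $" = " - ";
    $addr = "(no address found)" unless defined $addr;
    print "$addr: @days, @dates @times\n";
    undef $addr;
    (@days, @dates, @times) = ();
}

while (<>) {

    # A blank line ends the current ad.
    if (/^\s*$/) {
        flush_ad();
        next;
    }

    # print $_;

    while (/($day)/igo) {
        push @days, $1;
    }

    while (/($date)/goi) {
        push @dates, $1;
    }

    while (/(\d{1,2}\s*[ap]m)/goi) {
        push @times, $1;
    }

    # Two or more digits, then everything up to the first state name.
    # (The original [^$state] was a character class, not a negated pattern.)
    if (/(\d{2,}.+?\s+(?:$state))/oi) {
        $addr = $1;
    }

}

flush_ad();    # the last ad may not be followed by a blank line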


[sshaw@localhost ~]$ perl bs.pl descs
4785 SE 133rd Dr, City, CA: Saturday, August 11 10am - 5pm
2100 SE 118th Ave City, California: Friday - Saturday, 8/10 - 8/11 10am - 6pm
2100 SE 120th Ave City, Ca: Fri - Sat, Aug 10 - Aug 11 10 am - 6 pm


Of course, this example will not work if an address spans two lines, or
if an ad contains several times or other messages that mention a time,
e.g. "Everything must go by 2pm".

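One possible way around the two-line problem, sketched under the
assumption that ads are separated by at least one truly empty line as in
the sample file: read in paragraph mode and squash each ad onto a single
line before matching. The sub name read_ads is made up for this example.

```perl
use strict;
use warnings;

# Read blank-line-separated ads from a filehandle and return each one as
# a single line, so a pattern can match across the original line breaks.
sub read_ads {
    my ($fh) = @_;
    local $/ = "";              # paragraph mode: one record per ad
    my @ads;
    while (my $ad = <$fh>) {
        $ad =~ s/\s+/ /g;       # newlines and runs of spaces become one space
        $ad =~ s/^ | $//g;      # trim the ends
        push @ads, $ad if length $ad;
    }
    return @ads;
}

# Demo on an in-memory file; the address is split over two lines.
my $sample = "1. Come see at: 4785 SE 133rd Dr, City,\nCA 12345\n\n\n"
           . "2. Fri & Sat 10 am - 6 pm\n2100 SE 120th Ave City, Ca\n";
open my $fh, '<', \$sample or die "can't open in-memory file: $!";
print "$_\n" for read_ads($fh);
```

Each returned string can then be fed to the same regexes as before. Note
that a separator line containing only spaces does not count as empty in
paragraph mode.
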
If you want more detailed address parsing, i.e. extracting
street, address, state and zip into their own fields, check out
Geo::StreetAddress::US. Geo::StreetAddress::US can't extract the
address from a paragraph, but once you have (or think you have <:^| )
an address, you can pass the value to it for parsing.

Or you can use its RegExes.
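
A rough sketch of handing a candidate address to the module's
parse_address method. The wrapper name split_address and the sample
address are made up for this example, and since the module comes from
CPAN rather than core Perl, the sketch loads it at run time and simply
gives up if it isn't installed.

```perl
use strict;
use warnings;

# Geo::StreetAddress::US is not a core module, so try to load it at run
# time and degrade gracefully when it is missing.
my $HAVE_GEO = eval { require Geo::StreetAddress::US; 1 };

# Return a hash ref of address parts, or undef when the module is not
# installed or the string does not parse as an address.
sub split_address {
    my ($text) = @_;
    return undef unless $HAVE_GEO;
    return scalar(Geo::StreetAddress::US->parse_address($text));
}

# Made-up address modelled on the sample ads.
my $parts = split_address("2100 SE 118th Ave, City, CA 12345");
if ($parts) {
    # Print only the fields the parser filled in.
    for my $field (sort keys %$parts) {
        print "$field: $parts->{$field}\n" if defined $parts->{$field};
    }
}
else {
    print "no result (module missing or address not recognized)\n";
}
```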

I'm curious to see other suggestions.
 

Skye Shaw!@#$

More or less.


<snip>

<reformatted post>






In my example

my %DAYS = (Monday=>qr!\bMon(?:\.|(?:day))?\b!i,
Tuesday=>qr!\bTues(?:\.|(?:day))?\b!i,
#...
Friday=>qr!\bFri(?:\.|(?:day))?\b!i,
Saturday=>qr!\bSat(?:\.|(?:urday))?\b!i);
my $day = join "|",values %DAYS;
while(/($day)/igo) {

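Oops, the "i" modifier on that match is superfluous: every qr// pattern
in %DAYS was already compiled with /i, and the flag travels with the
pattern when it is interpolated.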
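
A quick way to convince yourself, as a small sketch using two of the day
patterns (the test string is made up):

```perl
use strict;
use warnings;

# Two of the day patterns from the script, each compiled with /i.
my @patterns = (qr!\bFri(?:\.|(?:day))?\b!i, qr!\bSat(?:\.|(?:urday))?\b!i);
my $day = join "|", @patterns;

# No /i on the match itself, yet upper- and lower-case input still match.
my @found = "THIS FRIDAY & saturday!!" =~ /($day)/g;
print "@found\n";
```

Both "FRIDAY" and "saturday" are captured, because each stringified
qr// keeps its own case-insensitive flag inside the joined pattern.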
 
