Can I perform this Parse in perl? (Non standard address)

Discussion in 'Perl Misc' started by Steve, Aug 12, 2007.

  1. Steve

    Steve Guest

    Can't do this in Excel. Can Perl do it?

    OK, here is my goal.

    On Thursday my local newspaper posts garage sale ads for the upcoming
    weekend. I've found these sales are an excellent source of merchandise
    to sell on eBay, and the prices are awesome.


    If I open the paper's site in Excel, I get cells that look like the
    ones below (50 - 100 ads).


    How can I parse out just the time and address of each sale so I can
    plan my routes and which days to visit which house?
    (Folks mark the stuff down on the last day.)


    1. Come see at: 4785 SE 133rd Dr, City, State 12345 Off Holgate, take
    a right on 134th, (Aspen Meadows), stop sign take a right, take a
    left
    on 133rd and 5th house on the left. Saturday August 11, 2007 10am to
    5pm only


    2. Lots of Name Brands!! Tons of Clothes for Girls and Boys. Dog
    House, Animal Kennel, toys, lego table, infant chairs, girl HOPE TO
    SEE YOU THERE, THANK YOU!!!! This Friday & Saturday!! 8/10 & 8/11
    10am
    - 6pm 2100 SE 118th Ave City, St 12345


    3. Lots of Name Brands!! Tons of Clothes for Girls and Boys. Dog
    House, Animal Kennel, toys, lego table, infant chairs, girl HOPE TO
    SEE YOU THERE, THANK YOU!!!! Fri & Sat Aug 10 & Aug 11 10 am - 6 pm
    2100 SE 120th Ave City, St no zip


    Steve
     
    Steve, Aug 12, 2007
    #1

  2. On Aug 11, 6:23 pm, Steve wrote:
    > Can't do this in Excel. Can perl do it?

    More or less.

    > OK, here is my goal.


    <snip>

    <reformatted post>

    > On Thursday my local newspaper posts garage sale ads for the upcoming
    > weekend.


    > If I open the paper's site in Excel, I get cells that look like this:


    > 1. Come see at: 4785 SE 133rd Dr, City, State 12345 Off Holgate, take
    > a right on 134th, (Aspen Meadows), stop sign take a right, take a
    > left
    > on 133rd and 5th house on the left. Saturday August 11, 2007 10am to
    > 5pm only
    >
    > 2. Lots of Name Brands!! Tons of Clothes for Girls and Boys. Dog
    > House, Animal Kennel, toys, lego table, infant chairs, girl HOPE TO
    > SEE YOU THERE, THANK YOU!!!! This Friday & Saturday!! 8/10 & 8/11
    > 10am
    > - 6pm 2100 SE 118th Ave City, St 12345
    >
    > 3. Lots of Name Brands!! Tons of Clothes for Girls and Boys. Dog
    > House, Animal Kennel, toys, lego table, infant chairs, girl HOPE TO
    > SEE YOU THERE, THANK YOU!!!! Fri & Sat Aug 10 & Aug 11 10 am - 6 pm
    > 2100 SE 120th Ave City, St no zip



    > How can parse out just the time and address of the sale so I can plan
    > my routes and which days to visit which house.



    Dates and times are easy, for the most part. Addresses can be
    tricky, especially if you want to break them up into parts.

    The best thing to do is look at each description for something that
    will tell you, "Hey, there's an address on this line". Street names
    and numbers can be complicated, so we won't bother matching those. The
    state (well, the zip too) is the simplest part, so that's what we'll
    look for.

    In my example, I use a text file with the sample addresses
    you posted (with the placeholder state switched to California).

    Using Text::CSV to iterate over the spreadsheet's rows and
    Text::Sentence to iterate over each description's lines is left as an
    exercise...

    [sshaw@localhost ~]$ cat bs.pl
    use strict;
    use warnings;

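    # Each pattern is compiled with /i and also accepts the abbreviated form.
    my %DAYS = (
        Monday   => qr!\bMon(?:\.|(?:day))?\b!i,
        Tuesday  => qr!\bTues(?:\.|(?:day))?\b!i,
        #...
        Friday   => qr!\bFri(?:\.|(?:day))?\b!i,
        Saturday => qr!\bSat(?:\.|(?:urday))?\b!i,
    );


    # Month name (or its number followed by - or /), then the day of the month.
    my %DATE = (
        August => qr!(?:(?:Aug(?:\.|(?:ust))?)|0?8[-/])\s*\d{1,2}!i,
        #...
    );


    my %STATE = (
        California => qr!\bCa(?:lifornia)?\b!i,
        #...
    );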


    my $addr;
    my (@days,@dates,@times);

    my $day = join "|",values %DAYS;
    my $date = join "|",values %DATE;
    my $state = join "|",values %STATE;



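    while (<>) {

        # A blank line ends an ad: print what was collected and reset.
        # Note that the last ad is only printed if the input ends with
        # a blank line.
        if (/^$/) {
            local $" = " - ";
            print "$addr: @days, @dates @times\n";
            (@days, @dates, @times) = ();
            next;
        }

        # print $_;

        while (/($day)/igo) {
            push @days, $1;
        }

        while (/($date)/goi) {
            push @dates, $1;
        }

        while (/(\d{1,2}\s*[ap]m)/goi) {
            push @times, $1;
        }

        # Address: a run of digits, then anything up to the state name.
        # (?:...) keeps the "|" in $state from splitting the whole pattern.
        if (/(\d{2,}.+\s+(?:$state))/oi) {
            $addr = $1;
        }

    }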


    [sshaw@localhost ~]$ perl bs.pl descs
    4785 SE 133rd Dr, City, CA: Saturday, August 11 10am - 5pm
    2100 SE 118th Ave City, California: Friday - Saturday, 8/10 - 8/11 10am - 6pm
    2100 SE 120th Ave City, Ca: Fri - Sat, Aug 10 - Aug 11 10 am - 6 pm


    Of course, this example will not work if an address spans two lines, or
    if an ad contains several times or other messages that mention a time,
    e.g. "Everything must go by 2pm".

    If you want more detailed address parsing, i.e. extracting the
    street, state, zip, etc. into their own fields, check out
    Geo::StreetAddress::US. It can't extract an address from a paragraph,
    but once you have (or think you have <:^| ) an address, you can pass
    the value to it for parsing.

    Or you can use its regexes.
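
    On the two-line limitation mentioned above, here is a rough sketch of
    one possible workaround: read each ad as a whole paragraph and undo the
    line wrapping before matching. The helper name extract_ad, the sample
    text and the California-only pattern are assumptions made for
    illustration. It does nothing about stray times such as "must go by 2pm".

```perl
use strict;
use warnings;

# Assumption: only California is recognised, as in the script above.
my $state = qr/\b(?:CA|California)\b/i;

# Hypothetical helper: takes one whole ad (possibly several lines) and
# returns the address followed by every time found in it.
sub extract_ad {
    my ($ad) = @_;
    $ad =~ s/\s*\n\s*/ /g;                         # undo the line wrapping
    my ($addr) = $ad =~ /(\d{2,}\s.+?\s$state)/;   # house number ... state
    my @times  = $ad =~ /(\d{1,2}\s*[ap]m)/gi;
    return ($addr, @times);
}

# Made-up sample data: two ads separated by a blank line.
my $text = "Tons of toys! Fri 10am\n- 6pm 2100 SE 118th Ave\nCity, CA 12345\n"
         . "\n"
         . "Come see at: 4785 SE 133rd Dr, City,\nCalifornia 12345 Sat 9 am to 5pm\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
{
    local $/ = "";    # paragraph mode: each read returns one whole ad
    while (my $ad = <$fh>) {
        my ($addr, @times) = extract_ad($ad);
        print defined $addr ? $addr : "(no address found)", ": @times\n";
    }
}
close $fh;
```

    Because the whole ad is matched as one string, the blank-line
    bookkeeping in the loop above is no longer needed, and the last ad is
    handled even if the file does not end with a blank line.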

    I'm curious to see other suggestions.
     
    Skye Shaw!@#$, Aug 12, 2007
    #2

  3. On Aug 11, 11:10 pm, "Skye Shaw!@#$" wrote:

    <snip>

    > my %DAYS = (Monday=>qr!\bMon(?:\.|(?:day))?\b!i,
    > Tuesday=>qr!\bTues(?:\.|(?:day))?\b!i,
    > #...
    > Friday=>qr!\bFri(?:\.|(?:day))?\b!i,
    > Saturday=>qr!\bSat(?:\.|(?:urday))?\b!i);
    >


    > my $day = join "|",values %DAYS;


    > while(/($day)/igo) {


    Oops, the "i" modifier there is superfluous: the qr// patterns in
    %DAYS are already compiled with /i.
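
    A small sketch to check this (the sample string is made up): the /i
    flag compiled into a qr// pattern still applies when that pattern is
    interpolated into a match that has no /i of its own.

```perl
use strict;
use warnings;

# Same pattern as the Saturday entry in %DAYS.
my $sat = qr!\bSat(?:\.|(?:urday))?\b!i;

# No /i on the outer match: the flag stored in $sat is still honoured,
# so both the upper-case and the lower-case spelling are found.
my @found = "SATURDAY and sat." =~ /($sat)/g;
print "@found\n";
```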
     
    Skye Shaw!@#$, Aug 12, 2007
    #3
