sorting text in a file

Discussion in 'Ruby' started by Adam Akhtar, Mar 26, 2008.

  1. Adam Akhtar

    Adam Akhtar Guest

    Hi, I've been hacking away at this all morning and getting nowhere fast.
    I'm relatively new to Ruby and I'm not so hot at regex.

    I'm trying to grab text data from a website that lists events and then
    put each event into its own class. I figured out how to get the
    screen-scraped text into a clean state; it's just processing it into my
    class that I'm having problems with.

    Here are a few events in their natural format:

    ---start of file----
    Toto and Boz Scaggs

    Seminal American rock band with the talented blues-rock musician. Mar
    21, 7pm, ¥13,000. JCB Hall, Suidobashi. Tel: Udo 03-3402-5999.



    Kreva

    Hip-hop track maker. Mar 21, 7pm, ¥5,000. Akasaka Blitz.

    Tel: Disk Garage 03-5436-9600.



    Blood Red Shoes

    Rock duo from the UK. Mar 21, 7pm, ¥5,000. Shibuya Club Quattro.
    Tel: Creativeman 03-3462-6969.



    etc. etc. etc.
    ---end-----

    First I read the file into a string. As all the concerts are separated
    by 4 newlines (3 blank lines), I use

    concertevents = filetext.split(/\n\n\n\n/)

    to get an array of events.
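    A minimal runnable check of this split step (not from the thread): the
    string is a trimmed copy of the sample events, and the "\r\n" to "\n"
    normalization is an assumption about how the scraped text was saved,
    since a file with Windows line endings would never match /\n\n\n\n/.

```ruby
# Trimmed copy of the sample: events separated by 3 blank lines (4 newlines).
filetext = "Toto and Boz Scaggs\n\nSeminal American rock band. Mar\n21, 7pm.\n\n\n\n" \
           "Kreva\n\nHip-hop track maker. Mar 21, 7pm.\n\nTel: Disk Garage 03-5436-9600.\n\n\n\n" \
           "Blood Red Shoes\n\nRock duo from the UK. Mar 21, 7pm.\n"

# Assumption: the scraped file may use Windows line endings, so normalize them
# first; "\r\n\r\n\r\n\r\n" contains no "\n\n" and would not be split otherwise.
filetext = filetext.gsub("\r\n", "\n")

concertevents = filetext.split(/\n\n\n\n/)
puts concertevents.length  # => 3
```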

    I'd then like to process these further by keeping the group name separate
    from the rest of the details. So I thought I'd do

    artist = conevt.slice(/[^\n]*/) #get artist info

    which assumes the group name will only be on one line. Fine for this
    prototype.
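    For comparison, a small sketch (not from the original post) showing that
    slice(/[^\n]*/) returns the first line, next to a regex-free equivalent
    built on String#lines; the event string is a trimmed copy of the Kreva
    entry.

```ruby
# Trimmed copy of the Kreva event from the sample above.
conevt = "Kreva\n\nHip-hop track maker. Mar 21, 7pm. Akasaka Blitz.\n\nTel: Disk Garage 03-5436-9600."

# Regex version from the post: the longest run of non-newline characters at the start.
artist = conevt.slice(/[^\n]*/)

# Regex-free equivalent: the first line with its trailing newline removed.
artist_no_regex = conevt.lines.first.chomp

puts artist           # => Kreva
puts artist_no_regex  # => Kreva
```

    One caveat: if an event chunk happens to start with a newline, both
    versions return an empty string, so calling strip on the chunk first is
    a cheap safeguard.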

    The details are a bit trickier, as some spill onto a second line (but
    separated by a blank line). The second event is like that. I tried

    description = conevt.slice(/.*\n\n(.*\n\n.*)/,1) #get desc

    Although my RegexCoach program says it works with the first event, when
    I run the program, slice seems to return nil for description. It
    definitely works for the second event, which takes up 3 lines.
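    A sketch of why Ruby returns nil here (not part of the original post;
    RegexCoach's settings are unknown, so this only covers Ruby's side): the
    pattern needs two blank-line gaps, but the Toto event has only one,
    because its description wraps with a single newline. Without /m, Ruby's
    . also stops at newlines. One alternative, shown last, is to capture
    everything after the first blank line with /m and then collapse the
    wrapping.

```ruby
# Two trimmed events from the sample, as they look after split(/\n\n\n\n/).
toto  = "Toto and Boz Scaggs\n\nSeminal American rock band. Mar\n21, 7pm. JCB Hall, Suidobashi."
kreva = "Kreva\n\nHip-hop track maker. Mar 21, 7pm. Akasaka Blitz.\n\nTel: Disk Garage 03-5436-9600."

# The pattern from the post needs TWO "\n\n" gaps, and without /m . stops at newlines.
old_re = /.*\n\n(.*\n\n.*)/
p toto.slice(old_re, 1)   # => nil (only one blank line in this event)
p kreva.slice(old_re, 1)  # => "Hip-hop track maker. Mar 21, 7pm. Akasaka Blitz.\n\nTel: Disk Garage 03-5436-9600."

# Alternative: everything after the first blank line; /m lets . match newlines.
new_re = /\n\n(.*)/m
[toto, kreva].each do |conevt|
  # Collapse the wrapped lines and blank lines into single spaces.
  puts conevt.slice(new_re, 1).gsub(/\s+/, " ")
end
```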

    So my first question is: how should I alter the above regex to make it
    work for the cases above? Any hints, tips or, if you feel like it,
    answers are welcome. At this stage I'm up for easier, longer ways rather
    than shorter, more cryptic ones.

    Second, am I going about this the right way? Should I have just avoided
    regex and simply read the file line by line, using if structures to
    figure out which lines belong to which event?

    Does anyone know of any good resources (e.g. tutorials) on this subject,
    i.e. screen scraping, cleaning the grabbed text and then processing it
    into your own classes?


    Wow, it's a long post... I'll leave it at that.
    --
    Posted via http://www.ruby-forum.com/.

    Adam Akhtar, Mar 26, 2008
    #1

  2. 7stud --

    7stud -- Guest

    Adam Akhtar wrote:
    > The details are a bit trickier as some spill onto a second line (but
    > seperated by a blank line).


    Then you should have posted an example file with all the possibilities.

    > Second am i going about this the write way. S
    >


    Probably not.

    > Should I have just avoided
    > regex and simply read the file line by line using if structures to
    > figure out which lines are with which event???
    >


    That is one way. On the website, the information is probably contained
    in different html tags. So scraping the website, then joining all the
    data together, then trying to separate the data again is not a good
    plan. You should be able to pick the pieces out of the website directly.
    However, you have to know how html is written and how it is structured.
    Ruby has several gems, e.g. Hpricot, that make it easy to pick out
    pieces of information on a website, but you sort of have to know how
    html works in order to pick out the data you want.

    If that sounds too confusing, then just deal with the text file you
    have, and YES, you should avoid regexes whenever possible. Reading the
    file line by line would be much easier.
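    A minimal sketch of the line-by-line approach, under stated assumptions:
    the Event class is hypothetical (the thread never shows the poster's own
    class), and the rule "a non-blank line after 3 or more blank lines starts
    a new event" is inferred from the sample above, not from the website's
    real layout.

```ruby
require 'stringio'

# Hypothetical class for one event; the thread never shows the original class.
Event = Struct.new(:artist, :details)

# Reads line by line. The first non-blank line, or the first non-blank line
# after 3+ blank lines, starts a new event (assumption taken from the sample
# layout); every other non-blank line belongs to the current event's details.
def parse_events(io)
  events = []
  blank_run = 0
  io.each_line do |line|
    text = line.strip
    if text.empty?
      blank_run += 1
      next
    end
    if events.empty? || blank_run >= 3
      events << Event.new(text, [])
    else
      events.last.details << text
    end
    blank_run = 0
  end
  # Join the wrapped detail lines of each event into a single string.
  events.map { |e| Event.new(e.artist, e.details.join(" ")) }
end

sample = "Kreva\n\nHip-hop track maker. Mar 21, 7pm. Akasaka Blitz.\n\n" \
         "Tel: Disk Garage 03-5436-9600.\n\n\n\n" \
         "Blood Red Shoes\n\nRock duo from the UK. Mar 21, 7pm.\nTel: Creativeman 03-3462-6969.\n"

parse_events(StringIO.new(sample)).each do |e|
  puts "#{e.artist}: #{e.details}"
end
```

    To read the real file instead, pass a File object, e.g.
    File.open('events.txt') { |f| parse_events(f) } (the file name is a
    placeholder). No regex is involved; the only state kept is the count of
    consecutive blank lines.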


    7stud --, Mar 26, 2008
    #2

  3. 7stud --

    7stud -- Guest

    Ack! Let's try that again:

    That is one way. On the website, the data is probably contained in
    different html tags. So scraping the website, then joining all the data
    together, then trying to separate the data back out again is not a very
    good plan. You should be able to pick out the pieces of the data you
    want directly from the html. However, you have to know how html is
    written and how html is structured. Ruby has several gems, e.g.
    Hpricot, that make it easy to pick out pieces of information from a page
    of html.

    If that sounds too confusing, then just deal with the text file you have
    already, and YES, you should avoid regexes whenever possible. Reading
    the file line by line would be better and probably easier.
    7stud --, Mar 26, 2008
    #3
  4. Zoltan Dezso

    Zoltan Dezso Guest

    Adam Akhtar wrote:
    > Hi Ive been hacking away at this all morning and getting nowhere fast.
    > Im relatively new to ruby and im not so hot at regex.


    Hi,

    How about something like this quick script? Don't forget the /m
    modifier for multiline matching mode: in Ruby it makes . match newlines
    as well. (It assumes that there is no newline in the artist name part,
    though.)

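    # Read the whole file, split it into events on 4 newlines, then split
    # each event into the artist (first line) and the description (the rest).
    File.open('events.txt', 'r') do |f|
      contents = f.read
      contents.split(/\n\n\n\n/).each do |conevt|
        # /m makes . match newlines too, so (.*) captures a multi-line description
        if conevt =~ /([^\n]*)\n\n(.*)/m
          artist = $1
          description = $2
          print "ARTIST: #{artist}\nDESC: #{description}\n\n"
        end
      end
    end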
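    A variation on the script above (a sketch, not Zaki's code): it runs the
    same /m pattern on an in-memory sample, uses String#match instead of the
    $1/$2 globals, and stores each event in a hypothetical Concert struct,
    which covers the "own class" step from the original question. strip
    removes the trailing newline that the (.*) capture can keep.

```ruby
# Hypothetical class for one event (not from the thread).
Concert = Struct.new(:artist, :description)

contents = "Toto and Boz Scaggs\n\nSeminal American rock band. Mar\n21, 7pm.\n\n\n\n" \
           "Kreva\n\nHip-hop track maker. Mar 21, 7pm.\n\nTel: Disk Garage 03-5436-9600.\n"

concerts = contents.split(/\n\n\n\n/).map { |conevt|
  # Same pattern as above: first line, a blank line, then the rest (/m).
  m = conevt.match(/([^\n]*)\n\n(.*)/m)
  # Chunks that don't fit the layout give nil and are dropped by compact.
  Concert.new(m[1], m[2].strip) if m
}.compact

concerts.each { |c| print "ARTIST: #{c.artist}\nDESC: #{c.description}\n\n" }
```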

    > Second am i going about this the write way. Should I have just avoided
    > regex and simply read the file line by line using if structures to
    > figure out which lines are with which event???


    From what I can see, I guess this is the format you have to deal with.
    In this case, I believe regexps are the way to go, and you will only
    hurt yourself in the long term with switch-case spaghetti :)

    In case you can get your hands on other formats, or if you are in
    charge of creating the data in the first place, I wouldn't recommend
    plain text at all; use a structured format instead (yaml, xml, ini,
    whichever you like best). But I think that is not an option for you.
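    If the data could be produced in a structured format, here is a minimal
    sketch of the YAML option using Ruby's standard yaml library. The field
    names (artist, description, date, venue, tel) are assumptions, not
    anything the site provides.

```ruby
require 'yaml'

# Hypothetical YAML layout for two of the sample events.
yaml_text = <<~YAML
  - artist: Kreva
    description: Hip-hop track maker.
    date: Mar 21, 7pm
    venue: Akasaka Blitz
    tel: Disk Garage 03-5436-9600
  - artist: Blood Red Shoes
    description: Rock duo from the UK.
    date: Mar 21, 7pm
    venue: Shibuya Club Quattro
    tel: Creativeman 03-3462-6969
YAML

# safe_load only builds plain types (arrays, hashes, strings, numbers, ...).
events = YAML.safe_load(yaml_text)
events.each { |e| puts "#{e['artist']} at #{e['venue']} (#{e['date']})" }
```

    Each event arrives as a Hash with string keys, so no regex or line
    counting is needed to tell the artist from the details.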

    Zaki
    Zoltan Dezso, Mar 26, 2008
    #4
  5. Adam Akhtar

    Adam Akhtar Guest

    Thanks very much for your replies, 7stud and Zaki. I'm tinkering with
    Hpricot now; I'll see how working with the html tags in place goes.

    Adam Akhtar, Mar 27, 2008
    #5

Similar Threads
  1. JJ
    Replies: 13, Views: 527
  2. Replies: 2, Views: 1,465
    James Kanze, Jul 6, 2010
  3. Jason
    Replies: 0, Views: 398
    Jason, Oct 4, 2006
  4. Tom Kirchner

    sorting by multiple criterias (sub-sorting)

    Tom Kirchner, Oct 11, 2003, in forum: Perl Misc
    Replies: 3, Views: 485
    Michael Budash, Oct 11, 2003
  5. Νικόλαος Κούρας

    Sorting a set works, sorting a dictionary fails ?

    Νικόλαος Κούρας, Jun 10, 2013, in forum: Python
    Replies: 12, Views: 168
    Ulrich Eckhardt, Jun 10, 2013