Can I get a little help with my program? (string searching and regex)

Discussion in 'Ruby' started by michael.hincke@gmail.com, Jan 8, 2009.

  1. Guest

    So here's my issue: I'm trying to find a way that isn't insanely
    roundabout to accomplish the following.

    I am scraping book information off of a website. I was able to do this
    quite easily, but I'm having problems when the site returns more than
    one book. I need a way to say:

    for each regex match on the page
    store info into the next unused Excel line

    (I will be doing separate searches for each piece of info (author,
    ISBN, etc.) because of the way the HTML is set up.)
    *Note: I am using Watir, but I believe the issues I'm having are core
    Ruby issues.

    <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle" style="font-weight: bold;">Book title 1</span>
    <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle" style="font-weight: bold;">Another book title</span>

    Notice the slight difference in the second ctl0#. Depending on the
    number of books on the page, the second number just iterates. I have
    yet to see a search return 10+ books; I would imagine the leading 0
    would increment in that case, but I'm not positive.

    Then the corresponding author is:

    <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtAuthor">author 1</span>
    <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtAuthor">author 2</span>

    with the ctl0# matching the one on the title.
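
    One way the "for each match on the page" idea could be sketched in
    plain Ruby is with String#scan, which returns every match rather than
    only the first. This assumes the page source is available as a string
    (with Watir, possibly via browser.html); the helper name
    field_by_index is made up for illustration, and \d+ is used instead of
    \d\d only as a guess at how 10+ results might be numbered.

```ruby
# Sample markup copied from the post; a real script would use the page's HTML.
html = <<~HTML
  <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle" style="font-weight: bold;">Book title 1</span>
  <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle" style="font-weight: bold;">Another book title</span>
  <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtAuthor">author 1</span>
  <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtAuthor">author 2</span>
HTML

# Hypothetical helper: returns { "00" => "text", "01" => "text", ... }
# for one field name ("Title", "Author", ...).
def field_by_index(html, field)
  pattern = /id="rptCourses_ctl00_rptItems_ctl(\d+)_lblItemTxt#{field}"[^>]*>([^<]*)</
  # scan yields one [index, text] pair per matching span on the page.
  html.scan(pattern).to_h
end

titles  = field_by_index(html, "Title")
authors = field_by_index(html, "Author")

# Pair on the shared ctl index, so a missing author becomes nil instead of
# shifting every later author up by one.
books = titles.map { |idx, title| { index: idx, title: title, author: authors[idx] } }

books.each { |b| puts "#{b[:index]}: #{b[:title]} / #{b[:author].inspect}" }
```

    Because each field is keyed by its own ctl index, the separate
    per-field searches can still be lined up afterwards.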

    HOWEVER, when I am done pulling info from the page and go to the next
    page, the first book is reset back to ctl00.

    This is what I have been using, but it never tests the regex a second
    time around, so I never get more than one book's data per search:

    # Do some search stuff based on an Excel list of 4-digit numbers.
    # The website will return 0 to many books (currently the script crashes
    # if 0 books are returned).
    while contLoop do
      colVal = worksheet.Cells(row, 'a').Value
      if (colVal) then
        browser.goto("http://www.website.com/searchterm=" + colVal)
        for i in 1...browser.spans.length
          if (browser.span(:id, /rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle/).text) then
            var = browser.span(:id, /rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle/).text
            worksheet.Cells(row, 'b').value = var
          end
          if (browser.span(:id, /rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtAuthor/).text) then
            var = browser.span(:id, /rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtAuthor/).text
            worksheet.Cells(row, 'c').value = var
          end
        end
      else
        contLoop = false
      end
      row += 1
      sleep 1
    end
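
    A rough sketch of how the loop might be restructured so that each book
    gets its own output line and a search with no results is harmless. The
    browser and the worksheet are replaced by stand-ins here: fake_site,
    fetch_books, output and out_row are all hypothetical names, not part
    of Watir or Excel.

```ruby
# Hypothetical stand-in for the website: search term => books found.
fake_site = {
  "0001" => [["Book title 1", "author 1"], ["Another book title", "author 2"]],
  "0002" => [],                       # a search that returns no books
  "0003" => [["Third book", nil]]     # a book with no author listed
}

# Hypothetical lookup; a real script would drive the browser here.
fetch_books = ->(term) { fake_site.fetch(term, []) }

search_terms = ["0001", "0002", "0003"]  # would be read from column A
output  = []                             # stand-in for the results sheet
out_row = 1                              # next unused output row

search_terms.each do |term|
  books = fetch_books.call(term)
  # An empty array simply skips the inner loop, so zero results cannot crash.
  books.each do |title, author|
    output << [out_row, term, title, author]
    out_row += 1   # advances once per stored book, not once per search term
  end
end

output.each { |stored_row| p stored_row }
```

    The point of the separate out_row counter is that the row being read
    (the search term) and the row being written (the next free line) no
    longer share one variable.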

    I'm worried that if it doesn't find an author or something for a book,
    the list will get out of sync. The other way I think it could be done
    is to make the number that iterates in the regex a variable and go
    through that, but this might cause issues on subsequent pages.
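
    The variable-index idea might look roughly like this. It is only a
    sketch under assumptions: that every book has a title span, that the
    counter is always two digits (so a tenth book would be ctl09 and an
    eleventh ctl10, which is unconfirmed), and that the page can be
    treated as a lookup from span id to text. The names page, span_id and
    books_on are invented for the example.

```ruby
# Stand-in for one results page: span id => text. The second book has no author.
page = {
  "rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle"  => "Book title 1",
  "rptCourses_ctl00_rptItems_ctl00_lblItemTxtAuthor" => "author 1",
  "rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle"  => "Another book title",
  "rptCourses_ctl00_rptItems_ctl02_lblItemTxtTitle"  => "Third title",
  "rptCourses_ctl00_rptItems_ctl02_lblItemTxtAuthor" => "author 3"
}

# Builds the exact id for book number i; "%02d" zero-pads, so 1 becomes "01".
def span_id(i, field)
  format("rptCourses_ctl00_rptItems_ctl%02d_lblItemTxt%s", i, field)
end

# Walks ctl00, ctl01, ... until no title is found for the next index.
def books_on(page)
  books = []
  i = 0  # starts at 0 on every call, matching the reset on each new page
  while (title = page[span_id(i, "Title")])
    books << [title, page[span_id(i, "Author")]]  # author is nil when absent
    i += 1
  end
  books
end

p books_on(page)
```

    Since the author is looked up by the same index as its title, a
    missing author leaves a gap for that book only. With Watir, the hash
    lookup would presumably become a check that the span with that exact
    id exists before reading its text.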

    That is the main problem. The other problem I have to tackle is making
    a cross-reference list for each book found (this is done on a separate
    sheet), i.e.

    (BookID is just a simple 1 through however many books, created when
    the book is stored into the spreadsheet.)

    Searchterm | BookID
    0001       | 1
    0001       | 2
    0001       | 3
    0002       | 4
    0003       | 5
    0004       | 1

    This would denote that 3 books were found when searching for 0001,
    referenced by BookID (1, 2, 3), and one book each for 0002 and 0003.
    BookID 1 comes up when searching for both 0001 and 0004, so I also
    need a way to make sure that another BookID is not created for the
    same book when 0004 comes around.

    I believe this is easiest done when storing the book, but I haven't
    tried to tackle that yet.
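
    A minimal sketch of the cross-reference idea, using a Hash to remember
    which BookID each book already has. The data is made up to reproduce
    the table above, and using the title as the key is an assumption (an
    ISBN would probably be a safer key); results, book_ids and cross_refs
    are illustrative names.

```ruby
# Hypothetical search results: search term => keys of the books found.
results = {
  "0001" => ["Book A", "Book B", "Book C"],
  "0002" => ["Book D"],
  "0003" => ["Book E"],
  "0004" => ["Book A"]   # same book as the first hit for 0001
}

book_ids   = {}  # book key => BookID, one entry per distinct book
cross_refs = []  # [search term, BookID] rows for the second sheet

results.each do |term, keys|
  keys.each do |key|
    # ||= assigns a new ID only the first time a key is seen.
    book_ids[key] ||= book_ids.size + 1
    cross_refs << [term, book_ids[key]]
  end
end

cross_refs.each { |term, id| puts "#{term} | #{id}" }
```

    Doing the lookup at the moment a book is stored means the book sheet
    and the cross-reference sheet can be filled in the same pass.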

    To sum up my problems:

    1) Getting info from more than one book when searching
    2) Crashing when no books are found
    3) Creating the reference list
    4) Not double-storing in the reference list


    Any insight or sample code you can provide would be awesome. I don't
    particularly want to code this, find out it doesn't work, and have to
    recode it 15 times.

    Mike
     
    , Jan 8, 2009