Elegant Solution to a Seemingly Simple Problem?

Discussion in 'Ruby' started by Derek Cannon, Apr 18, 2010.

  1. Derek Cannon

    Derek Cannon Guest

    Hello everyone. It's me: Derek, again! Sorry for writing a novel here,
    but I'd really appreciate some help.

    I'm still working on the same program -- a way to show valid course
    combinations for my school schedule, using an HTML file that contains
    all the courses for the semester.

    I have a rough draft copy of it working, but I'd like to see an example
    of a more elegant coding style than my own.

    Here's a (simplified) example of the data I'm working with:

    <tr>
    <td>Intro to Programming<td> #title
    <td>MW</td> #days
    <td>9:00am-10:30am</td> #time
    <td>Dr. Smith</td> # professor
    </tr>
    <tr>
    <td>Intro to Knitting<td>
    <td>TR</td>
    <td>9:00am-10:30am</td>
    <td>Dr. Mittens</td>
    </tr>

    Earlier, someone on the forum showed me a very elegant way to collect
    this information (I use Nokogiri). It was:

    doc = Nokogiri::HTML(open(url))

    raw_course_list = doc.css("tr").collect { |row|
    row.css("td").collect { |column|
    column.text.strip
    }
    }

    This would give me an array of arrays in the format
    [[courseA,data,data], [courseB, data, data]].

    E.g., in this case it would yield:
    [["Intro to Programming", "MW", "9:00am-10:30am", "Dr. Smith"], ["Intro
    to Knitting", "TR", "9:00am-10:30am", "Dr. Mittens"]]

    This works perfectly, except in 3 main cases.

    *** Problem 1: The <tr> does not contain course information. (It's some
    irrelevant part of the HTML). In this case, I did the following:
    raw_course_data.reject! { |i| i.size != 4 }, would filtered out
    non-courses. Note: no tables without course data had the size of one
    with course data (in the non-simplified version, the size is actually
    much larger).

    So, already I think it's ugly coding! It firsts loads ALL <tr> contents
    into arrays, then rejects them after creation.

    *** Problem 2: In a few cases, some courses do not have specified days
    and times yet. In those cases, the course days reads "TBA" (to be
    announced), and there is no column for time. Thus, the array of such
    courses is 1 less than the normal expected case.

    E.g.:

    <tr>
    <td>Algebra<td>
    <td>TBA</td> # notice there is now 1 <td> for day/time now
    <td>Dr. Calculator</td>
    </tr>

    Thus, I create ANOTHER time that Ruby goes back over the elements of
    raw_course_list again. This time, the code is put right before problem
    1's fix:

    raw_course_list.each { |i|
    if i.size == 3
    i.insert(2, "")
    end
    }

    So again, if an array has a size of 3, I figure it's a valid course,
    just with no time assigned, so I create a blank element between the day
    and professor, just to satisfy the Course class, which these array
    elements of the outer array will ultimately become. E.g. of call:
    Course.new(title, day, time, professor)

    *** Problem 3: Some rows of the HTML are actually a continued
    description of the course in the row above. For example, a course that
    has a lab might look like this:

    <tr>
    <td>Chemistry /w Lab<td>
    <td>TR</td>
    <td>9:00am-10:30am</td>
    <td>Dr. Chemicals</td>
    </tr>
    <tr>
    <td><td> # Empty, since the above row provides the course name
    <td>R</td> # Day of the lab
    <td>11:00am-12:30pm</td> # Time of the lab
    <td>Dr. Chemicals</td> # Lab professor
    </tr>

    The good news is it's the same length as a normal class. So for this, I
    add a bit more code to problem 2's code (the each block), changing the
    each method to .each_with_index:

    raw_course_list.each_with_index { |i, index|
    if i.size == 3
    i.insert(2, "")
    end

    # NEW CODE FOR LABS (still working out the kinks, but hopefully I
    won't need this)
    # lab will always have a size of 4 and a empty first element so:
    if i[0].empty?
    # add all the data from the lab to the previous course:
    raw_course_list[index-1].push(i.each { |element| element })

    # then remove lab from raw_course_list
    raw_course_list.pop(index)

    # index has to go back one to avoid skipping an element (since we
    popped one)
    index -= 1
    end
    }

    ==================================

    So there you have it. Can anyone think of a way where I can improve the
    quality and elegance of this code?
    --
    Posted via http://www.ruby-forum.com/.
     
    Derek Cannon, Apr 18, 2010
    #1
    1. Advertising

  2. [Note: parts of this message were removed to make it a legal post.]

    On 18 April 2010 08:23, Derek Cannon <> wrote:

    > Hello everyone. It's me: Derek, again! Sorry for writing a novel here,
    > but I'd really appreciate some help.
    >
    > I'm still working on the same program -- a way to show valid course
    > combinations for my school schedule, using an HTML file that contains
    > all the courses for the semester.
    >
    > I have a rough draft copy of it working, but I'd like to see an example
    > of a more elegant coding style than my own.
    >
    > Here's a (simplified) example of the data I'm working with:
    >
    > <tr>
    > <td>Intro to Programming<td> #title
    > <td>MW</td> #days
    > <td>9:00am-10:30am</td> #time
    > <td>Dr. Smith</td> # professor
    > </tr>
    > <tr>
    > <td>Intro to Knitting<td>
    > <td>TR</td>
    > <td>9:00am-10:30am</td>
    > <td>Dr. Mittens</td>
    > </tr>
    >
    > Earlier, someone on the forum showed me a very elegant way to collect
    > this information (I use Nokogiri). It was:
    >
    > doc = Nokogiri::HTML(open(url))
    >
    > raw_course_list = doc.css("tr").collect { |row|
    > row.css("td").collect { |column|
    > column.text.strip
    > }
    > }
    >
    > This would give me an array of arrays in the format
    > [[courseA,data,data], [courseB, data, data]].
    >
    > E.g., in this case it would yield:
    > [["Intro to Programming", "MW", "9:00am-10:30am", "Dr. Smith"], ["Intro
    > to Knitting", "TR", "9:00am-10:30am", "Dr. Mittens"]]
    >
    > This works perfectly, except in 3 main cases.
    >
    > *** Problem 1: The <tr> does not contain course information. (It's some
    > irrelevant part of the HTML). In this case, I did the following:
    > raw_course_data.reject! { |i| i.size != 4 }, would filtered out
    > non-courses. Note: no tables without course data had the size of one
    > with course data (in the non-simplified version, the size is actually
    > much larger).
    >
    > So, already I think it's ugly coding! It firsts loads ALL <tr> contents
    > into arrays, then rejects them after creation.
    >
    > *** Problem 2: In a few cases, some courses do not have specified days
    > and times yet. In those cases, the course days reads "TBA" (to be
    > announced), and there is no column for time. Thus, the array of such
    > courses is 1 less than the normal expected case.
    >
    > E.g.:
    >
    > <tr>
    > <td>Algebra<td>
    > <td>TBA</td> # notice there is now 1 <td> for day/time now
    > <td>Dr. Calculator</td>
    > </tr>
    >
    > Thus, I create ANOTHER time that Ruby goes back over the elements of
    > raw_course_list again. This time, the code is put right before problem
    > 1's fix:
    >
    > raw_course_list.each { |i|
    > if i.size == 3
    > i.insert(2, "")
    > end
    > }
    >
    > So again, if an array has a size of 3, I figure it's a valid course,
    > just with no time assigned, so I create a blank element between the day
    > and professor, just to satisfy the Course class, which these array
    > elements of the outer array will ultimately become. E.g. of call:
    > Course.new(title, day, time, professor)
    >
    > *** Problem 3: Some rows of the HTML are actually a continued
    > description of the course in the row above. For example, a course that
    > has a lab might look like this:
    >
    > <tr>
    > <td>Chemistry /w Lab<td>
    > <td>TR</td>
    > <td>9:00am-10:30am</td>
    > <td>Dr. Chemicals</td>
    > </tr>
    > <tr>
    > <td><td> # Empty, since the above row provides the course name
    > <td>R</td> # Day of the lab
    > <td>11:00am-12:30pm</td> # Time of the lab
    > <td>Dr. Chemicals</td> # Lab professor
    > </tr>
    >
    > The good news is it's the same length as a normal class. So for this, I
    > add a bit more code to problem 2's code (the each block), changing the
    > each method to .each_with_index:
    >
    > raw_course_list.each_with_index { |i, index|
    > if i.size == 3
    > i.insert(2, "")
    > end
    >
    > # NEW CODE FOR LABS (still working out the kinks, but hopefully I
    > won't need this)
    > # lab will always have a size of 4 and a empty first element so:
    > if i[0].empty?
    > # add all the data from the lab to the previous course:
    > raw_course_list[index-1].push(i.each { |element| element })
    >
    > # then remove lab from raw_course_list
    > raw_course_list.pop(index)
    >
    > # index has to go back one to avoid skipping an element (since we
    > popped one)
    > index -= 1
    > end
    > }
    >
    > ==================================
    >
    > So there you have it. Can anyone think of a way where I can improve the
    > quality and elegance of this code?
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >

    Do you have any control over the HTML at all? Some semantic HTML classes
    would go a long way to simplifying this.
     
    Elliot Crosby-McCullough, Apr 18, 2010
    #2
    1. Advertising

  3. Derek Cannon

    Derek Cannon Guest

    > Do you have any control over the HTML at all? Some semantic HTML
    > classes would go a long way to simplifying this.


    The HTML comes from a page on my schools website :(
    --
    Posted via http://www.ruby-forum.com/.
     
    Derek Cannon, Apr 18, 2010
    #3
  4. Derek Cannon wrote:
    > Here's a (simplified) example of the data I'm working with:


    Can you post a complete sample page somewhere?

    There are lots of ways to identify more precisely which part of the HTML
    you want, using CSS selectors. Most easily, if the rows are inside
    <table id='courses'> then seomthing like 'table#courses tr' could do it.

    But otherwise, you can select the table based on its location in the
    page relative to other elements (nth, nth-child).
    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Apr 18, 2010
    #4
  5. Derek Cannon

    gf Guest

    For your contemplation:


    doc = Nokogiri::HTML(open(url))
    raw_course_list = doc.css("tr").collect { |row|
    t_row = row.css("td").collect { |column| column.text.strip }
    t_row.insert(2, "") if (t_row[1] == "TBA")
    }.reject{ |i| i.size != 4 }


    This isn't guaranteed to work because we're dealing with pieces of the
    HTML you are parsing. Give us a sample with ALL the variations in it
    and we can stop shooting in the dark.
     
    gf, Apr 18, 2010
    #5
  6. Derek Cannon

    Derek Cannon Guest

    Sure, I'll post HTML examples. In this non-simplified version, there are
    20 columns per row which are:

    availability, course_reference_number, subject, course_number, section,
    campus, credit_hours, title, days, time, cap, registered, remaining,
    xl_cap, xl_registered, xl_remaining, professor, date, location,
    attributes


    Here's three specific examples of the HTML that cover all the
    possibilities (normal class, course with TBA day, and labs):


    <!--1. Normal class-->


    <TR>
    <TD class="dddefault"><ABBR title="Not available for
    registration">NR</ABBR></TD>
    <TD class="dddefault"><A
    href="https://ggc.gabest.usg.edu/pls/B400/bwckschd.p_disp_listcrse?term_in=201008&subj_in=ACCT&crse_in=2101&crn_in=80983"
    onmouseover="window.status='Detail'; return true"
    onfocus="window.status='Detail'; return true"
    onmouseout="window.status=''; return true"
    onblur="window.status=''; return true">80983</A></TD>
    <TD class="dddefault">ACCT</TD>
    <TD class="dddefault">2101</TD>
    <TD class="dddefault">01</TD>
    <TD class="dddefault">A</TD>
    <TD class="dddefault">3.000</TD>
    <TD class="dddefault">Intro to Financial Accounting</TD>
    <TD class="dddefault">MW</TD>
    <TD class="dddefault">08:00 am-09:15 am</TD>
    <TD class="dddefault">30</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">30</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault"><ABBR title="To Be Announced">TBA</ABBR></TD>
    <TD class="dddefault">08/23-12/09</TD>
    <TD class="dddefault">A 1880</TD>
    <TD class="dddefault">&nbsp;</TD>
    </TR>
    <TR>


    <!--2. No day/time separation due to class days being "TBA":-->


    <TR>
    <TD class="dddefault"><ABBR title="Closed">C</ABBR></TD>
    <TD class="dddefault"><A
    href="https://ggc.gabest.usg.edu/pls/B400/bwckschd.p_disp_listcrse?term_in=201008&subj_in=BUSA&crse_in=4700&crn_in=81085"
    onmouseover="window.status='Detail'; return true"
    onfocus="window.status='Detail'; return true"
    onmouseout="window.status=''; return true"
    onblur="window.status=''; return true">81085</A></TD>
    <TD class="dddefault">BUSA</TD>
    <TD class="dddefault">4700</TD>
    <TD class="dddefault">01</TD>
    <TD class="dddefault">A</TD>
    <TD class="dddefault">3.000</TD>
    <TD class="dddefault">Selected Topics in Business</TD>
    <TD colspan="2" class="dddefault"><ABBR title="To Be
    Announced">TBA</ABBR></TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault"><ABBR title="To Be Announced">TBA</ABBR></TD>
    <TD class="dddefault">08/23-12/09</TD>
    <TD class="dddefault"><ABBR title="To Be Announced">TBA</ABBR></TD>
    <TD class="dddefault">&nbsp;</TD>
    </TR>


    <!--3. Class with lab. First row is class, second row is lab details -->


    <TR>
    <TD class="dddefault"><ABBR title="Not available for
    registration">NR</ABBR></TD>
    <TD class="dddefault"><A
    href="https://ggc.gabest.usg.edu/pls/B400/bwckschd.p_disp_listcrse?term_in=201008&subj_in=CHEM&crse_in=1151K&crn_in=80073"
    onmouseover="window.status='Detail'; return true"
    onfocus="window.status='Detail'; return true"
    onmouseout="window.status=''; return true"
    onblur="window.status=''; return true">80073</A></TD>
    <TD class="dddefault">CHEM</TD>
    <TD class="dddefault">1151K</TD>
    <TD class="dddefault">01</TD>
    <TD class="dddefault">A</TD>
    <TD class="dddefault">4.000</TD>
    <TD class="dddefault">Survey of Chemistry I w/Lab</TD>
    <TD class="dddefault">MF</TD>
    <TD class="dddefault">11:00 am-12:15 pm</TD>
    <TD class="dddefault">20</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">20</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">0</TD>
    <TD class="dddefault">David Pursell (<ABBR
    title="Primary">P</ABBR>)</TD>
    <TD class="dddefault">08/23-12/09</TD>
    <TD class="dddefault">A 1400</TD>
    <TD class="dddefault">&nbsp;</TD>
    </TR>
    <TR>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">W</TD>
    <TD class="dddefault">11:00 am-01:45 pm</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">&nbsp;</TD>
    <TD class="dddefault">David Pursell (<ABBR
    title="Primary">P</ABBR>)</TD>
    <TD class="dddefault">08/23-12/09</TD>
    <TD class="dddefault">A 1290</TD>
    <TD class="dddefault">&nbsp;</TD>
    </TR>
    --
    Posted via http://www.ruby-forum.com/.
     
    Derek Cannon, Apr 18, 2010
    #6
  7. Derek Cannon

    Derek Cannon Guest

    > There are lots of ways to identify more precisely which part of the HTML
    > you want, using CSS selectors. Most easily, if the rows are inside
    > <table id='courses'> then seomthing like 'table#courses tr' could do it.


    Since my original post, I've been playing around with the code some
    more. I made a new way of getting courses that automatically filters out
    "non-course" rows. The code is:

    table = doc.css("tr").collect { |row|
    row.css(".dddefault").collect { |column|
    column.text.strip
    }
    }

    This way, "non-courses" appear as empty arrays. I still don't know how
    to neatly get rid of the empty arrays... I tried .compact! but that
    doesn't seem to work.

    >doc = Nokogiri::HTML(open(url))
    >raw_course_list = doc.css("tr").collect { |row|
    > t_row = row.css("td").collect { |column| column.text.strip }
    > t_row.insert(2, "") if (t_row[1] == "TBA")
    >}.reject{ |i| i.size != 4 }


    Excellent example, I think this is much better than what I had earlier.
    I guess I could now replace your reject with i.empty?, right?

    PS - I changed raw_course_list to table to make it more readable.
    --
    Posted via http://www.ruby-forum.com/.
     
    Derek Cannon, Apr 18, 2010
    #7

  8. > This way=2C "non-courses" appear as empty arrays. I still don't know how=

    =20
    > to neatly get rid of the empty arrays... I tried .compact! but that=20
    > doesn't seem to work.


    Try #flatten! instead. #compact! just gets rid of nil entries=2C and an emp=
    ty array is not the same as nil.

    - Ehsan
    =20
    _________________________________________________________________
    Hotmail has tools for the New Busy. Search=2C chat and e-mail from your inb=
    ox.
    http://www.windowslive.com/campaign/thenewbusy?ocid=3DPID28326::T:WLMTAGL:O=
    N:WL:en-US:WM_HMP:042010_1=
     
    Ehsanul Hoque, Apr 18, 2010
    #8
  9. Hi --

    On Sun, 18 Apr 2010, Derek Cannon wrote:

    >> There are lots of ways to identify more precisely which part of the HTML
    >> you want, using CSS selectors. Most easily, if the rows are inside
    >> <table id='courses'> then seomthing like 'table#courses tr' could do it.

    >
    > Since my original post, I've been playing around with the code some
    > more. I made a new way of getting courses that automatically filters out
    > "non-course" rows. The code is:
    >
    > table = doc.css("tr").collect { |row|
    > row.css(".dddefault").collect { |column|
    > column.text.strip
    > }
    > }
    >
    > This way, "non-courses" appear as empty arrays. I still don't know how
    > to neatly get rid of the empty arrays... I tried .compact! but that
    > doesn't seem to work.
    >
    >> doc = Nokogiri::HTML(open(url))
    >> raw_course_list = doc.css("tr").collect { |row|
    >> t_row = row.css("td").collect { |column| column.text.strip }
    >> t_row.insert(2, "") if (t_row[1] == "TBA")
    >> }.reject{ |i| i.size != 4 }

    >
    > Excellent example, I think this is much better than what I had earlier.
    > I guess I could now replace your reject with i.empty?, right?
    >
    > PS - I changed raw_course_list to table to make it more readable.


    I think having the condition and the reject be the last things in the
    code are going to make it hard to follow it later. I wouldn't rule out
    doing something a tiny bit more procedural but maybe a little easier
    to parse visually, like this:

    doc = Nokogiri::HTML(open(url))
    table = []
    doc.css("tr").each do |row|
    cells = row.css("td").map {|cell| cell.text.strip }
    next unless cells.size == 4
    next unless cells[1] == "TBA"
    cells.insert(2, "")
    table << cells
    end

    You could also extract some methods, and end up with something like:

    table = doc.css("tr").
    select {|row| valid_row?(row) }.
    map {|row| prepare_row(row) }

    (The above is all untested.)


    David

    --
    David A. Black, Senior Developer, Cyrus Innovation Inc.

    THE Ruby training with Black/Brown/McAnally
    COMPLEAT Coming to Chicago area, June 18-19, 2010!
    RUBYIST http://www.compleatrubyist.com
     
    David A. Black, Apr 18, 2010
    #9
  10. Derek Cannon

    Phrogz Guest

    On Apr 18, 1:23 am, Derek Cannon <> wrote:
    > [...]
    > Earlier, someone on the forum showed me a very elegant way to collect
    > this information (I use Nokogiri). It was:
    >
    > doc = Nokogiri::HTML(open(url))
    >
    > raw_course_list = doc.css("tr").collect { |row|
    >   row.css("td").collect { |column|
    >     column.text.strip
    >   }
    >
    > }
    > [...]
    > This works perfectly, except in 3 main cases.
    >
    > *** Problem 1: The <tr> does not contain course information. (It's some
    > irrelevant part of the HTML). In this case, I did the following:
    > raw_course_data.reject! { |i| i.size != 4 }, would filtered out
    > non-courses. Note: no tables without course data had the size of one
    > with course data (in the non-simplified version, the size is actually
    > much larger).
    >
    > So, already I think it's ugly coding! It firsts loads ALL <tr> contents
    > into arrays, then rejects them after creation.
    > [...]


    Generalized, you have an array of values and you want to map a subset
    of them to new array. There are (at least) four patterns you can use
    to handle this sort of situation:

    1) Map the unwanted elements to a 'broken' value and then reject the
    broken values later. (What you are doing now.) This can be hard if you
    don't have a way of creating a broken value. For example, you might be
    mapping all values directly to an object, but you don't have enough
    information for the object constructor and no way of making up clearly
    spurious values. Further, it's inefficient as you do the work and use
    the memory of creating the object only to throw it out later.

    2) Map the unwanted elements to nil and compact the array afterwards.
    In your case, you'd need to look at the TDs in your row and decide if
    you wanted to map the row to the mapping of them or nil. This is
    convenient in terms of one-liners, but still slightly inefficient
    because you're creating an intermediary array packed with nils that
    you don't want. (You should be clear, though, that computational
    inefficiency is not always more important than programmer convenience
    of code clarity.)

    3) Instead of using map (or the same effect under the longer name
    'collect', as Robert apparently likes) to create a new array from your
    original, explicitly create the new array and push values only as
    valid. This is basically the same as above, but without the nil values
    and the later compact. For example:
    raw_course_list = []
    doc.css("tr").each { |row|
    tds = row.css("td")
    if tds.have_the_values_I_want
    raw_course_list << tds.map{ |col| ... }
    end
    }

    4) Use map (collect) on the array as in #1 or #2, but before that do a
    pass through your source array and sanitize it. Sanitization might be
    mapping values to nil and then compacting (thus very similar to #2),
    or fixing values (as in your TBA or continued description case). This
    feels cleaner, but note that this has you doing one (or two, in the
    case of map+compact) passes on your data before you get around to
    mapping it.


    Here's (very roughly) what I might do given what you wrote:
    # Assuming you're using Ruby 1.9
    course_info = []
    trs = doc.css('tr')
    trs.each.with_index{ |row,i|
    tds = row.css('td')
    title = ...
    prof = ...
    days = ...
    times = ...
    desc = ...
    next_row = trs[i+1]
    if next_row && next_row.is_a_continuation?
    # Add content from next_row to description
    # If needed, invalidate next_row so it will be skipped
    elsif title && prof && days # If you have all the information you
    need
    course_info << Course.new( title, prof, days )
    end
    }

    Regardless of the approach you use, remember that even though you're
    annoyed that you are 'processing' (in one form or another) invalid
    entries, you have to touch every row to find out if you like it or
    not. It's up to you for how you detect which are invalid and handle
    them.
     
    Phrogz, Apr 18, 2010
    #10
  11. Derek Cannon

    Derek Cannon Guest

    >doc = Nokogiri::HTML(open(url))
    > table = []
    > doc.css("tr").each do |row|
    > cells = row.css("td").map {|cell| cell.text.strip }
    > next unless cells.size == 4
    > next unless cells[1] == "TBA"
    > cells.insert(2, "")
    > table << cells
    > end


    This is interesting. So nil is returns for elements that activate the
    unless statement?


    > # Assuming you're using Ruby 1.9
    > course_info = []
    > trs = doc.css('tr')
    > trs.each.with_index{ |row,i|
    > tds = row.css('td')
    > title = ...
    > prof = ...
    > days = ...
    > times = ...
    > desc = ...
    > next_row = trs[i+1]
    > if next_row && next_row.is_a_continuation?
    > # Add content from next_row to description
    > # If needed, invalidate next_row so it will be skipped
    > elsif title && prof && days # If you have all the information you
    > need
    > course_info << Course.new( title, prof, days )
    > end
    > }


    I like this code a lot, however, is it considered less efficient to look
    at the next row to see if it is a lab for the previous row rather than
    checking each row to see if it's a lab, and if it is, adding it to the
    last row instead? That way, every row won't need to check with the next
    row.
    --
    Posted via http://www.ruby-forum.com/.
     
    Derek Cannon, Apr 19, 2010
    #11
  12. Derek Cannon

    Phrogz Guest

    On Apr 18, 9:15 pm, Derek Cannon <> wrote:
    > >doc = Nokogiri::HTML(open(url))
    > >   table = []
    > >   doc.css("tr").each do |row|
    > >     cells = row.css("td").map {|cell| cell.text.strip }
    > >     next unless cells.size == 4
    > >     next unless cells[1] == "TBA"
    > >     cells.insert(2, "")
    > >     table << cells
    > >   end

    >
    > This is interesting. So nil is returns for elements that activate the
    > unless statement?


    No. Next inside a block is like an early return. The ruby code:
    do_this unless something
    is the same as:
    unless something
    do_this
    end
    is the same as:
    if !something
    do_this
    end

    So what's happening in the above is that you're creating an array
    (table), and then looping through the list of rows returned from
    Nokogiri. For each row, if certain criteria are met (there aren
    exactly four cells, or the second cell is "TBA") you immediately stop
    processing that row and move on to the next. ("Next!" shouts the
    government worker, dismissing you and moving on.) If you didn't bail
    out early, however, you inject an extra entry into the array of
    strings and then shove that whole array as a new entry onto the end of
    the table array.

    Because you're using the pattern of creating an array and
    conditionally populating during a traversal, there are no embedded
    nils to clean up later.

    For comparison, here's similar code that WOULD leave you with a table
    with some nil entries (that you'd probably want to #compact
    afterwards):

    table = doc.css("tr").map do |row|
    cells = row.css("td").map {|cell| cell.text.strip }
    if cells.size==4 && cells[1]=="TBA"
    cells.insert(2, "")
    cells
    end
    end

    In the code immediately above, the value of an if statement that
    matches is the value of the last expression, while the value of an if
    statement that is not matches is nil.

    Here's a simpler example showing the difference between map and each-
    with-injecting:

    irb(main):001:0> digits = (1..9).to_a
    => [1, 2, 3, 4, 5, 6, 7, 8, 9]

    irb(main):002:0> odds = []
    => []

    irb(main):003:0> digits.each{ |n| if n%2==1 then odds<<n end }
    => [1, 2, 3, 4, 5, 6, 7, 8, 9]

    irb(main):004:0> odds
    => [1, 3, 5, 7, 9]

    irb(main):005:0> evens = digits.map{ |n| if n%2==0 then n end }
    => [nil, 2, nil, 4, nil, 6, nil, 8, nil]

    irb(main):006:0> evens.compact
    => [2, 4, 6, 8]
     
    Phrogz, Apr 20, 2010
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Tom Johnson
    Replies:
    4
    Views:
    392
    red floyd
    Aug 15, 2003
  2. python_only
    Replies:
    2
    Views:
    327
    bruno modulix
    Apr 26, 2005
  3. Guilherme Polo

    Re: seemingly simple list indexing problem

    Guilherme Polo, Jul 28, 2008, in forum: Python
    Replies:
    3
    Views:
    282
    John Krukoff
    Jul 30, 2008
  4. John Krukoff
    Replies:
    6
    Views:
    295
    John Krukoff
    Jul 30, 2008
  5. ThePacific
    Replies:
    0
    Views:
    376
    ThePacific
    Sep 26, 2012
Loading...

Share This Page