A (minor) coding challenge

Discussion in 'Ruby' started by rubyhacker, Oct 12, 2005.

  1. rubyhacker

    rubyhacker Guest

    This is one of those things that "anyone can do" and it doesn't
    take that long. But it's always fun/educational to see how
    different people would do it.

    Given: A text is in two languages (say, English and French) --
    assume separate files or whatever is convenient. They're
    formatted properly, so that paragraphs correspond to each other
    predictably. We define a "paragraph" as simply a group of non-blank
    lines followed by one or more blank lines or end of file. (Thus
    even a simple title or heading would count.) Assume a page length
    N (lines per page).

    Reformat both texts such that:

    1. Corresponding paragraphs start on corresponding lines of the
    page.

    2. If either paragraph is shorter than the other, it will be padded
    with blank lines so that the next paragraphs coincide.

    3. Preserve any "extra" blank lines that were already there
    between paragraphs.

    4. Neither text may have a page break in the middle of a paragraph.
    If a paragraph won't fit on the current page in either text, insert
    a page break in both.

    5. If you want to simplify output, represent a page break as "----"
    or the equivalent.
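
    Rules 1-3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not any poster's solution: the helper names `paragraphs` and `align` are hypothetical, both texts are assumed to contain the same number of paragraphs, and pagination (rules 4-5) is left out.

```ruby
# Hypothetical helpers; a sketch of rules 1-3 only (no pagination).

# Split text into [paragraph_lines, trailing_blank_count] pairs.
def paragraphs(text)
  result = []
  text.each_line do |raw|
    line = raw.chomp
    if line.strip.empty?
      # A blank line is counted against the paragraph before it, if any.
      result.last[1] += 1 unless result.empty?
    elsif result.empty? || result.last[1] > 0
      result << [[line], 0]     # first line of a new paragraph
    else
      result.last[0] << line    # continuation of the current paragraph
    end
  end
  result
end

# Pad each pair of paragraphs to the same height (rule 2) and keep the
# larger of the two blank-line gaps after it (rule 3), so that the next
# pair starts on the same line in both outputs (rule 1).
def align(text_a, text_b)
  paras_a, paras_b = paragraphs(text_a), paragraphs(text_b)
  raise ArgumentError, "paragraph counts differ" unless paras_a.size == paras_b.size

  out_a, out_b = [], []
  paras_a.zip(paras_b).each do |(lines_a, gap_a), (lines_b, gap_b)|
    height = [lines_a.size, lines_b.size].max + [gap_a, gap_b].max
    out_a.concat(lines_a + [""] * (height - lines_a.size))
    out_b.concat(lines_b + [""] * (height - lines_b.size))
  end
  [out_a, out_b]
end
```

    Calling `align(english, french)` on two strings returns two arrays of lines of equal length. Leading blank lines before the first paragraph are dropped in this sketch.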


    I'll be playing at this in my spare minutes.

    Let the games begin.


    Hal
     
    rubyhacker, Oct 12, 2005
    #1


  2. Lines_per_page = 60

    def grab( i )
    IO.read( ARGV[i] ).split( /^((?:[ \t]*\n)*[ \t]*\n)/ ).map{ |s|
    s.scan( /.*?\n|.+$/ ) }
    end

    texts = grab(0).zip(grab(1)).map{ |x|
    m = [ x.first.size, x.last.size ].max
    2.times { |i| x[i] += Array.new( m - x[i].size ) { "" } }
    x
    }

    class Array
    def page_break
    each { |handle| handle.puts "----" }
    end
    end

    handles = []
    2.times {|i| handles << File.open( "out-junk#{ i }", "w" ) }
    count = 0

    texts.each_with_index { |x,n|
    psize = [ x.first.size, x.last.size ].max
    if n % 2 == 0
    if psize > Lines_per_page - count
    handles.page_break
    count = 0
    end
    2.times { |i| handles[i].puts x[i] }
    count += psize
    else
    psize.times { |i|
    if Lines_per_page == count
    handles.page_break
    count = 0
    end
    2.times { |j|
    handles[j].puts x[j]
    }
    count += 1
    }
    end
    }

    handles.each { |h| h.close }
     
    William James, Oct 13, 2005
    #2

  3. rubyhacker

    Hal Fulton Guest

    [snip solution]

    That does indeed work. FWIW, here is my
    unfinished one below.


    Hal



    lines1 = File.readlines("f1")
    lines2 = File.readlines("f2")

    def show(l1,l2)
    l1.each_with_index do |x,i|
    printf "%-38s | %-38s\n", x.chomp, l2[i].to_s.chomp
    end
    end

    N = 15

    def canonize(lines)
    arr = [0]
    lines.each do |line|
    if line=="\n"
    if arr[-1].is_a?(Integer)
    arr[-1]+=1
    else
    arr << 1
    end
    else # it's part of a paragraph
    if arr[-1].is_a?(Array)
    arr[-1]<<line
    else
    arr << [line]
    end
    end
    end
    arr
    end

    def fixup(a1,a2)
    r1 = ""
    r2 = ""
    a1.each_with_index do |a,i|
    b = a2[i]
    raise "mismatch" if a.class != b.class
    case a
    when Integer # run of blank lines: keep the longer run
    blanks = [a,b].max
    blanks.times { r1 << "\n"; r2 << "\n" }
    when Array # paragraph: pad the shorter side with blank lines
    #p [a.size,b.size]
    psize = [a.size,b.size].max
    0.upto(psize-1) do |j|
    r1 << (a[j] || "\n")
    r2 << (b[j] || "\n")
    end
    end
    end
    [r1.split("\n"),r2.split("\n")]
    end

    class Fixnum
    def size
    self # duh
    end
    end


    # show(lines1,lines2)

    p1 = canonize(lines1)
    p2 = canonize(lines2)

    r1,r2 = fixup(p1,p2)

    # paginate(p1,p2)
     
    Hal Fulton, Oct 13, 2005
    #3
  4. Hi --

    Quite the brute force approach, and probably full of holes, but anyway:

    PARAGRAPH_RE = /.*?\n(?:\n+|\z)/m

    def parallelize(a,b)
    a,b = a.dup,b.dup
    short,long = [a,b].sort_by {|text| text.lines.to_a.size }
    short << "\n" until short.lines.to_a.size == long.lines.to_a.size
    return a,b # original order, whichever one was padded
    end

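    def pagify(text,n)
    paragraphs = text.scan(PARAGRAPH_RE)
    line = 1
    paragraphs.each do |para|
    height = para.lines.to_a.size # count lines, not characters
    if line + height - 1 > n
    para.replace("----\n#{para}")
    line = 1
    end
    line += height # advance past this paragraph
    end
    paragraphs.join
    end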

    # Sample usage

    english = File.read....
    french = File.read....

    eng_final = ""
    fr_final = ""

    english.scan(PARAGRAPH_RE).zip(french.scan(PARAGRAPH_RE)).each do |e,f|
    ep,fp = parallelize(e,f)
    eng_final << ep
    fr_final << fp
    end

    puts pagify(eng_final,60), pagify(fr_final,60)


    David
     
    David A. Black, Oct 13, 2005
    #4
  5. rubyhacker

    Jacob Fugal Guest

    Question regarding the combination of rules 3 and 4:

    Assume paragraphs A and B, where B follows A directly, with some
    number of extra newlines. After reformatting, a page break must be
    inserted between A and B. Should the extra newlines be 1) before the
    page break, 2) after the page break or 3) consumed in the page break?
    In the case of 1), what if all the newlines don't fit? Should they
    span the page break?

    Just want to make sure I've got the requirements right before making
    an attempt. :)

    Jacob Fugal
     
    Jacob Fugal, Oct 13, 2005
    #5
  6. [ deleted lines ]
    Added a few comments and made improvements.

    Lines_per_page = 60

    # Read and parse a file.
    def grab( i )
    IO.read( ARGV[i] ).split( /^((?:[ \t]*\n)*[ \t]*\n)/ ).map{ |s|
    s.scan( /.*?\n|.+$/ ) }
    end

    texts = grab(0).zip(grab(1)).inject([]){ |arr,pair|
    m = [ pair.first.size, pair.last.size ].max
    if pair.first.first =~ /\S/
    # Equalize lengths of parallel paragraphs.
    2.times { |i| pair[i] += Array.new( m - pair[i].size ) { "" } }
    arr << pair
    else
    # Equalize runs of blank lines and make them breakable.
    m.times { arr << [ [""], [""] ] }
    end
    arr
    }

    class Array
    def page_break
    each { |handle| handle.puts "----" }
    end
    end

    handles = []
    2.times {|i| handles << File.open( "out-junk#{ i }", "w" ) }
    count = 0

    texts.each { |x|
    psize = x.first.size
    # Print paragraph or blank line.
    if psize > Lines_per_page - count
    handles.page_break
    count = 0
    end
    2.times { |i| handles[i].puts x[i] }
    count += psize
    }

    handles.each { |h| h.close }
     
    William James, Oct 13, 2005
    #6
  7. rubyhacker

    rubyhacker Guest

    I knew somebody would bring that up. :)

    My gut feeling is that a page break can take the place
    of an arbitrary number of newlines. So I guess that means
    they are "consumed" in the page break. Additionally it
    seems "wrong" to start a page with blank lines.

    The only reason for preserving the extra blank lines is
    in case the text happened to use them significantly, e.g.,
    to separate sections or before/after an inset quotation.

    I also haven't addressed the issue of paragraphs that are
    longer than the page. Fortunately, most/all of the text won't
    be Faulkner. ;)
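
    The policy described above (a page break consumes the blank lines before it, and no page starts with blank lines) might be sketched like this. The `paginate` helper, the `PAGE_BREAK` constant, and the `[left_lines, right_lines, gap]` input shape are assumptions made for illustration. Paragraphs longer than a page are not handled: they are simply allowed to overflow.

```ruby
PAGE_BREAK = "----"

# units: array of [left_lines, right_lines, gap], where gap is the number
# of blank lines that followed the paragraph pair in the source.
# Returns the two paginated line arrays.
def paginate(units, page_len)
  left, right = [], []
  used = 0         # lines already placed on the current page
  pending_gap = 0  # blank lines owed before the next paragraph
  units.each do |l, r, gap|
    height = [l.size, r.size].max
    if used > 0 && used + pending_gap + height > page_len
      # The break takes the place of the pending blank lines.
      left << PAGE_BREAK
      right << PAGE_BREAK
      used = 0
    elsif used > 0
      pending_gap.times { left << ""; right << "" }
      used += pending_gap
    end
    # Pad the shorter paragraph so both sides advance together.
    left.concat(l + [""] * (height - l.size))
    right.concat(r + [""] * (height - r.size))
    used += height
    pending_gap = gap
  end
  [left, right]
end
```

    Because blank lines are only emitted just before a paragraph that fits on the current page, a page never begins with blanks, and any gap after the final paragraph is dropped.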

    Now to learn a bit of PDF::Writer... at/after the conf, of course.


    Hal
     
    rubyhacker, Oct 13, 2005
    #7
  8. Hi --

    Oh, so NOW you tell us :) This reminds me of the eating whitespace
    issues in scanf.... :)
    Hmmm... in that case, what's the reason for not normalizing to one
    blank line after each pair of paragraphs, however many blank lines
    either text had? In other words, given:

    para1
    <blank>
    <blank>

    and

    para1
    <blank>

    why pad the second text with another blank line?
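
    The two options being contrasted here can be written as a small policy switch. The `gap_after` helper and the `:preserve` / `:normalize` names are hypothetical, chosen only to make the difference concrete.

```ruby
# Hypothetical helper: how many blank lines to emit after a pair of
# paragraphs, given the blank lines each text had at that point.
def gap_after(gap_a, gap_b, policy)
  longest = [gap_a, gap_b].max
  case policy
  when :preserve  then longest               # pad the shorter gap up
  when :normalize then longest.zero? ? 0 : 1 # collapse any gap to one line
  else raise ArgumentError, "unknown policy: #{policy}"
  end
end
```

    For the example above (two blank lines in one text, one in the other), `:preserve` yields 2 and `:normalize` yields 1.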


    David
     
    David A. Black, Oct 13, 2005
    #8

  9. Lines_per_page = 60

    # Read and parse a file.
    def grab( i )
    IO.read( ARGV[i] ).split( /^((?:[ \t]*\n)*[ \t]*\n)/ ).map{ |s|
    s.scan( /.*?\n|.+$/ ) }
    end

    texts = grab(0).zip(grab(1)).inject([]){ |arr,pair|
    m = [ pair.first.size, pair.last.size ].max
    if pair.first.first =~ /\S/
    # Equalize lengths of parallel paragraphs.
    2.times { |i| pair[i] += Array.new( m - pair[i].size ) { "" } }
    arr << pair
    else
    # Equalize runs of blank lines and make them breakable.
    m.times { arr << [ [""], [""] ] }
    end
    arr
    }

    class Array
    def page_break
    each { |handle| handle.puts "----" }
    end
    end

    handles = []
    2.times {|i| handles << File.open( "out-junk#{ i }", "w" ) }
    count = 0

    texts.each { |x|
    psize = x.first.size
    # Print paragraph or blank line.
    if psize > Lines_per_page - count
    handles.page_break
    count = 0
    end
    # Don't print blank lines at top of page.
    if count > 0 or x.first.first =~ /\S/
    2.times { |i| handles[i].puts x[i] }
    count += psize
    end
    }

    handles.each { |h| h.close }
     
    William James, Oct 13, 2005
    #9
