A (minor) coding challenge

rubyhacker

This is one of those things that "anyone can do" and it doesn't
take that long. But it's always fun/educational to see how
different people would do it.

Given: A text is in two languages (say, English and French) --
assume separate files or whatever is convenient. They're
formatted properly, so that paragraphs correspond to each other
predictably. We define a "paragraph" as simply a group of non-blank
lines followed by one or more blank lines or end of file. (Thus
even a simple title or heading would count.) Assume a page length
N (lines per page).

Reformat both texts such that:

1. Corresponding paragraphs start on corresponding lines of the
page.

2. If either paragraph is shorter than the other, it will be padded
with blank lines so that the next paragraphs coincide.

3. Preserve any "extra" blank lines that were already there
between paragraphs.

4. Neither text will allow a page break in the middle of a paragraph.
If it won't fit in either case, do a page break for both.

5. If you want to simplify output, represent a page break as "----"
or the equivalent.
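
Rules 1 and 2 can be illustrated with a minimal sketch. The helper names `paragraphs` and `align` are hypothetical (they do not appear in the thread). The sketch assumes both texts contain the same number of paragraphs, and it deliberately ignores rule 3 (extra blank lines) and pagination.

```ruby
# Hypothetical helper: split text into paragraphs, each an array of lines.
# The number of blank lines between paragraphs is discarded here.
def paragraphs(text)
  text.split(/\n(?:[ \t]*\n)+/).map { |para| para.lines.map(&:chomp) }
end

# Pad the shorter paragraph of each pair with "" so both sides end up
# with the same number of lines (rule 2), which keeps the following
# paragraphs starting on the same line (rule 1).
def align(left, right)
  paragraphs(left).zip(paragraphs(right)).map do |a, b|
    b ||= []   # assumption: a missing counterpart is treated as empty
    height = [a.size, b.size].max
    [a + [""] * (height - a.size), b + [""] * (height - b.size)]
  end
end

english = "Title\n\nOne line.\n"
french  = "Titre\n\nLigne un\nLigne deux\n"
align(english, french).each { |en, fr| p [en, fr] }
```

Each element of the result is a pair of equal-length line arrays, so writing the two sides to separate outputs keeps them in step.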


I'll be playing at this in my spare minutes.

Let the games begin.


Hal
 
William James

Lines_per_page = 60

# Read file number i from the command line and split it into alternating
# chunks (paragraph, blank run, paragraph, ...), each an array of lines.
def grab( i )
  IO.read( ARGV[i] ).split( /^((?:[ \t]*\n)*[ \t]*\n)/ ).map{ |s|
    s.scan( /.*?\n|.+$/ ) }
end

# Pad the shorter chunk of each pair with empty lines.
texts = grab(0).zip(grab(1)).map{ |x|
  m = [ x.first.size, x.last.size ].max
  2.times { |i| x[i] += Array.new( m - x[i].size ) { "" } }
  x
}

class Array
  def page_break
    each { |handle| handle.puts "----" }
  end
end

handles = []
2.times {|i| handles << File.open( "out-junk#{ i }", "w" ) }
count = 0

texts.each_with_index { |x,n|
  psize = [ x.first.size, x.last.size ].max
  if n % 2 == 0
    # Paragraph: keep it on one page.
    if psize > Lines_per_page - count
      handles.page_break
      count = 0
    end
    2.times { |i| handles[i].puts x[i] }
    count += psize
  else
    # Run of blank lines: may be split across pages.
    psize.times { |i|
      if Lines_per_page == count
        handles.page_break
        count = 0
      end
      2.times { |j|
        handles[j].puts x[j][i]
      }
      count += 1
    }
  end
}

handles.each { |h| h.close }
 
Hal Fulton

[snip solution]

That does indeed work. FWIW, here is my
unfinished one below.


Hal



lines1 = File.readlines("f1")
lines2 = File.readlines("f2")

def show(l1,l2)
  l1.each_with_index do |x,i|
    printf "%-38s | %-38s\n", x, l2[i]
  end
end

N = 15

# Turn lines into an alternating list: blank-line counts (Integer) and
# paragraphs (Array of lines). Integer stands in for the original Fixnum,
# which was removed in Ruby 3.2.
def canonize(lines)
  arr = [0]
  lines.each do |line|
    if line=="\n"
      if arr[-1].is_a?(Integer)
        arr[-1]+=1
      else
        arr << 1
      end
    else # it's part of a paragraph
      if arr[-1].is_a?(Array)
        arr[-1]<<line
      else
        arr << [line]
      end
    end
  end
  arr
end

def fixup(a1,a2)
  r1 = ""
  r2 = ""
  a1.each_with_index do |a,i|
    b = a2[i]
    raise "mismatch" if a.class != b.class
    case a
    when Integer
      blanks = [a,b].max
      blanks.times { r1 << "\n"; r2 << "\n" }
    when Array
      #p [a.size,b.size]
      psize = [a.size,b.size].max
      0.upto(psize-1) do |j|
        r1 << (a[j] || "\n")
        r2 << (b[j] || "\n")
      end
    end
  end
  [r1.split("\n"),r2.split("\n")]
end

class Integer
  def size
    self # duh
  end
end


# show(lines1,lines2)

p1 = canonize(lines1)
p2 = canonize(lines2)

r1,r2 = fixup(p1,p2)

# paginate(p1,p2)
 
David A. Black

Hi --

Quite the brute force approach, and probably full of holes, but anyway:

PARAGRAPH_RE = /.*?\n(?:\n+|\z)/m

# Pad the shorter paragraph with newlines until both have the same number
# of lines. Results come back in argument order (a, then b).
def parallelize(a,b)
  a,b = a.dup,b.dup
  short,long = [a,b].sort_by {|text| text.lines.size }
  short << "\n" until short.lines.size == long.lines.size
  return a,b
end

# Put "----" before any paragraph that would run past line n of the page.
def pagify(text,n)
  paragraphs = text.scan(PARAGRAPH_RE)
  line = 1
  paragraphs.each do |para|
    size = para.lines.size
    if line + size - 1 > n
      para.replace("----\n#{para}")
      line = 1
    end
    line += size
  end
  paragraphs.join
end

# Sample usage

english = File.read....
french = File.read....

eng_final = ""
fr_final = ""

english.scan(PARAGRAPH_RE).zip(french.scan(PARAGRAPH_RE)).each do |e,f|
  ep,fp = parallelize(e,f)
  eng_final << ep
  fr_final << fp
end

puts pagify(eng_final,60), pagify(fr_final,60)


David
 
Jacob Fugal

Reformat both texts such that:

3. Preserve any "extra" blank lines that were already there
between paragraphs.

4. Neither text will allow a page break in the middle of a paragraph.
If it won't fit in either case, do a page break for both.

Question regarding the combination of rules 3 and 4:

Assume paragraphs A and B, where B follows A directly, with some
number of extra newlines. After reformatting, a page break must be
inserted between A and B. Should the extra newlines be 1) before the
page break, 2) after the page break or 3) consumed in the page break?
In the case of 1), what if all the newlines don't fit? Should they
span the page break?

Just want to make sure I've got the requirements right before making
an attempt. :)

Jacob Fugal
 
William James

William said:
Lines_per_page = 60

def grab( i )

[ deleted lines ]
handles.each { |h| h.close }

Added a few comments and made improvements.

Lines_per_page = 60

# Read and parse a file.
def grab( i )
  IO.read( ARGV[i] ).split( /^((?:[ \t]*\n)*[ \t]*\n)/ ).map{ |s|
    s.scan( /.*?\n|.+$/ ) }
end

texts = grab(0).zip(grab(1)).inject([]){ |arr,pair|
  m = [ pair.first.size, pair.last.size ].max
  if pair.first.first =~ /\S/
    # Equalize lengths of parallel paragraphs.
    2.times { |i| pair[i] += Array.new( m - pair[i].size ) { "" } }
    arr << pair
  else
    # Equalize runs of blank lines and make them breakable.
    m.times { arr << [ [""], [""] ] }
  end
  arr
}

class Array
  def page_break
    each { |handle| handle.puts "----" }
  end
end

handles = []
2.times {|i| handles << File.open( "out-junk#{ i }", "w" ) }
count = 0

texts.each { |x|
  psize = x.first.size
  # Print paragraph or blank line.
  if psize > Lines_per_page - count
    handles.page_break
    count = 0
  end
  2.times { |i| handles[i].puts x[i] }
  count += psize
}

handles.each { |h| h.close }
 
rubyhacker

I knew somebody would bring that up. :)

My gut feeling is that a page break can take the place
of an arbitrary number of newlines. So I guess that means
they are "consumed" in the page break. Additionally it
seems "wrong" to start a page with blank lines.
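
A minimal sketch of that policy, under stated assumptions: the input is a list of `[left_lines, right_lines]` pairs already padded to equal length, each run of blank lines is a single pair of empty strings, and no paragraph is longer than a page. The names `paginate` and `PAGE_BREAK` are hypothetical. Blank lines at the very end of the text are dropped.

```ruby
PAGE_BREAK = "----"

# units: [left_lines, right_lines] pairs already padded to equal length.
# A pair whose lines are all "" is a run of blank lines.
def paginate(units, page_len)
  out = [[], []]
  used = 0          # lines already placed on the current page
  pending = nil     # blank run held back until the next paragraph is placed
  units.each do |left, right|
    if (left + right).all?(&:empty?)
      pending = [left, right] if used > 0   # never start a page with blanks
      next
    end
    gap = pending ? pending[0].size : 0
    if used > 0 && used + gap + left.size > page_len
      out.each { |side| side << PAGE_BREAK }  # the break replaces the blank run
      used = 0
      gap = 0
      pending = nil
    end
    if pending
      out[0].concat(pending[0])
      out[1].concat(pending[1])
    end
    out[0].concat(left)
    out[1].concat(right)
    used += gap + left.size
    pending = nil
  end
  out
end

units = [
  [["a1", "a2"], ["b1", ""]],
  [[""], [""]],
  [["c1", "c2"], ["d1", "d2"]],
]
left, right = paginate(units, 4)
puts left
```

With a page length of 4 the second paragraph does not fit after the blank line, so the blank line is swallowed by the break; with a page length of 5 it is kept.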

The only reason for preserving the extra blank lines is
in case the text happened to use them significantly, e.g.,
to separate sections or before/after an inset quotation.

I also haven't addressed the issue of paragraphs that are
longer than the page. Fortunately, most/all of the text won't
be Faulkner. ;)

Now to learn a bit of PDF::Writer... at/after the conf, of course.


Hal
 
David A. Black

Hi --

I knew somebody would bring that up. :)

My gut feeling is that a page break can take the place
of an arbitrary number of newlines. So I guess that means
they are "consumed" in the page break. Additionally it
seems "wrong" to start a page with blank lines.

Oh, so NOW you tell us :) This reminds me of the eating whitespace
issues in scanf.... :)

The only reason for preserving the extra blank lines is
in case the text happened to use them significantly, e.g.,
to separate sections or before/after an inset quotation.

Hmmm... in that case, what's the reason for not normalizing to one
blank line for every longest-of-the-two paragraphs? In other words,
given:

para1
<blank>
<blank>

and

para1
<blank>

why pad the second text with another blank line?
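
The two policies being compared can be put side by side in a small sketch. The helper `equalize_gap` and its `policy` keyword are hypothetical; the arguments are the number of blank lines following the corresponding paragraph in each text.

```ruby
# Hypothetical helper: given the lengths of two corresponding blank runs,
# return the blank lines to emit on each side.
#   :longest -> keep the longer run in both texts (rule 3 as posted)
#   :single  -> normalize every gap to one blank line (the alternative)
def equalize_gap(left, right, policy: :longest)
  n = policy == :single ? 1 : [left, right].max
  [[""] * n, [""] * n]
end

p equalize_gap(2, 1)
p equalize_gap(2, 1, policy: :single)
```

Either way both texts get the same number of blank lines, so alignment is preserved; the policies differ only in whether the longer run survives.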


David
 
William James

Lines_per_page = 60

# Read and parse a file.
def grab( i )
  IO.read( ARGV[i] ).split( /^((?:[ \t]*\n)*[ \t]*\n)/ ).map{ |s|
    s.scan( /.*?\n|.+$/ ) }
end

texts = grab(0).zip(grab(1)).inject([]){ |arr,pair|
  m = [ pair.first.size, pair.last.size ].max
  if pair.first.first =~ /\S/
    # Equalize lengths of parallel paragraphs.
    2.times { |i| pair[i] += Array.new( m - pair[i].size ) { "" } }
    arr << pair
  else
    # Equalize runs of blank lines and make them breakable.
    m.times { arr << [ [""], [""] ] }
  end
  arr
}

class Array
  def page_break
    each { |handle| handle.puts "----" }
  end
end

handles = []
2.times {|i| handles << File.open( "out-junk#{ i }", "w" ) }
count = 0

texts.each { |x|
  psize = x.first.size
  # Print paragraph or blank line.
  if psize > Lines_per_page - count
    handles.page_break
    count = 0
  end
  # Don't print blank lines at top of page.
  if count > 0 or x.first.first =~ /\S/
    2.times { |i| handles[i].puts x[i] }
    count += psize
  end
}

handles.each { |h| h.close }
 
