normalizing newlines


Alan Munn

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Thanks

Alan
 

Xavier Noria

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Is the script line oriented, or does it slurp the entire text?
 

7stud --

Alan said:
I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Look at the original strings in the examples:

----------------------------------------------------------- String#chomp
str.chomp(separator=$/) => new_str
------------------------------------------------------------------------
Returns a new +String+ with the given record separator removed from
the end of _str_ (if present). If +$/+ has not been changed from
the default Ruby record separator, then +chomp+ also removes
carriage return characters (that is it will remove +\n+, +\r+, and
+\r\n+).

"hello".chomp #=> "hello"
"hello\n".chomp #=> "hello"
"hello\r\n".chomp #=> "hello"
"hello\n\r".chomp #=> "hello\n"
"hello\r".chomp #=> "hello"
"hello \n there".chomp #=> "hello \n there"
"hello".chomp("llo") #=> "he"
 

Bertram Scharpf

Hi,


Am Montag, 14. Sep 2009, 03:10:12 +0900 schrieb Alan Munn:
I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

str.gsub /(\r?\n)|\r/, $/

Bertram
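
A quick sketch of what that substitution does, assuming the record separator $/ still has its default value of "\n": every CR/LF pair, lone LF, and lone CR collapses to a single LF.

```ruby
# Normalize all three line-ending conventions (CR/LF, LF, CR) to the
# current record separator, as in the one-liner above.
text = "one\r\ntwo\rthree\nfour"
normalized = text.gsub(/(\r?\n)|\r/, $/)
normalized  # => "one\ntwo\nthree\nfour"
```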
 

Alan Munn

Xavier Noria said:
Is the script line oriented, or does it slurp the entire text?

I'm not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document. I first split the lines and then use
csv methods to do the rest.

Alan
 

Alan Munn

Bertram Scharpf said:
Hi,


Am Montag, 14. Sep 2009, 03:10:12 +0900 schrieb Alan Munn:

str.gsub /(\r?\n)|\r/, $/

Bertram

Thanks, that got me on the right track.

Alan
 

Alan Munn

7stud -- said:
Look at the original strings in the examples:

----------------------------------------------------------- String#chomp
str.chomp(separator=$/) => new_str
------------------------------------------------------------------------
Returns a new +String+ with the given record separator removed from
the end of _str_ (if present). If +$/+ has not been changed from
the default Ruby record separator, then +chomp+ also removes
carriage return characters (that is it will remove +\n+, +\r+, and
+\r\n+).

"hello".chomp #=> "hello"
"hello\n".chomp #=> "hello"
"hello\r\n".chomp #=> "hello"
"hello\n\r".chomp #=> "hello\n"
"hello\r".chomp #=> "hello"
"hello \n there".chomp #=> "hello \n there"
"hello".chomp("llo") #=> "he"

Thanks. I'd prefer to normalize the lines and then use .split on \r
rather than chomp each line.

Alan
 

Xavier Noria

I'm not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document. I first split the lines and then use
csv methods to do the rest.

I mean, do you have the entire CSV in memory? I guess the split means
you do. In that case you could split with a regexp like this:

lines = csv.split(/[\015\012]+/)

The exact regexp depends on what you need. That one, in particular,
prevents blank items in lines, so it may do.

That said, I'd normally use a CSV library to handle the input.
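
For instance, under the assumption that the whole clipboard text is in one string (the sample data here is invented), that split yields the lines directly, with no blank items even when separators double up:

```ruby
# Split on runs of CR (\015) and LF (\012), so CR/LF pairs and blank
# lines never produce empty entries.
csv = "a,b\r\nc,d\re,f\n\ng,h"
lines = csv.split(/[\015\012]+/)
lines  # => ["a,b", "c,d", "e,f", "g,h"]
```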
 

Alan Munn

Xavier Noria said:
I'm not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document.  I first split the lines and then use
csv methods to do the rest.

I mean, do you have the entire CSV in memory? I guess the split means
you do. In that case you could split with a regexp like this:

lines = csv.split(/[\015\012]+/)

The exact regexp depends on what you need. That one, in particular,
prevents blank items in lines, so it may do.

Yes, this will do what I want.
That said, I'd normally use a CSV library to handle the input.

Well the basic idea is to simply allow cut and paste functionality from
a spreadsheet to a LaTeX document, so I'm never dealing with huge
amounts of data generally.

Thanks for the help.

Alan
 

Xavier Noria

Well the basic idea is to simply allow cut and paste functionality from
a spreadsheet to a LaTeX document, so I'm never dealing with huge
amounts of data generally.

The problem is not the size of the data. CSV is brittle: record
separators, field separators, quoting, escaping. With a library you
may write about the same amount of code AND have a robust script for
the same price.

If the data is known and dead simple, a split would do, though.
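
A minimal sketch of the brittleness being described, using the stdlib csv library for comparison (the sample row is invented): a quoted field that contains the field separator defeats a naive split but parses cleanly.

```ruby
require 'csv'

row = 'name,"Doe, Jane",42'
naive  = row.split(',')        # breaks apart the quoted field
parsed = CSV.parse_line(row)   # respects the quoting
naive.length  # => 4
parsed        # => ["name", "Doe, Jane", "42"]
```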
 

Alan Munn

Xavier Noria said:
The problem is not the size of the data. CSV is brittle: record
separators, field separators, quoting, escaping. With a library you
may write about the same amount of code AND have a robust script for
the same price.

If the data is known and dead simple, a split would do, though.

I see. I think that for its intended use, the split should work fine
(and it certainly hasn't failed in testing yet). If I begin getting
reports of failures on more complicated data, I may need to rethink
things.

Thanks for your comments.

Alan
 

David Masover

I see. I think that for what its intended use will be, the split should
work fine (and certainly in testing hasn't failed yet.) If I begin
getting reports of failure on more complicated data I may need to
rethink things.

I suppose so, but take a look at CSV in the standard library. It's really
incredibly simple to use, and it's in the standard library, so it's not as if
this is another dependency.

Even if you want to slurp the whole thing into RAM, you could do:

require 'csv'
rows = []
CSV.foreach('foo.csv') { |row| rows << row }

Parsing a string you've got in RAM isn't any harder.

Look at it another way, too -- if there is some weird failure case (as an
example, try putting a double quote in the CSV and see what happens), you can
work with the standard library people to get it fixed.
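
For a string already in memory, a sketch along the same lines (CSV.parse is the stdlib call for parsing a string rather than a file):

```ruby
require 'csv'

# CSV.parse turns a CSV string into an array of rows,
# each row an array of fields.
data = "a,b\nc,d\n"
rows = CSV.parse(data)
rows  # => [["a", "b"], ["c", "d"]]
```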
 
