normalizing newlines


Alan Munn

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Thanks

Alan
 

Xavier Noria

I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Is the script line oriented, or does it slurp the entire text?
 

7stud --

Alan said:
I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

Look at the original strings in the examples:

----------------------------------------------------------- String#chomp
str.chomp(separator=$/) => new_str
------------------------------------------------------------------------
Returns a new +String+ with the given record separator removed from
the end of _str_ (if present). If +$/+ has not been changed from
the default Ruby record separator, then +chomp+ also removes
carriage return characters (that is it will remove +\n+, +\r+, and
+\r\n+).

"hello".chomp #=> "hello"
"hello\n".chomp #=> "hello"
"hello\r\n".chomp #=> "hello"
"hello\n\r".chomp #=> "hello\n"
"hello\r".chomp #=> "hello"
"hello \n there".chomp #=> "hello \n there"
"hello".chomp("llo") #=> "he"
 

Bertram Scharpf

Hi,


Am Montag, 14. Sep 2009, 03:10:12 +0900 schrieb Alan Munn:
I have a script that processes text from the clipboard. Depending on the
application the text is copied from, (even on the same platform) lines
sometimes end in lf, cr, or possibly cr/lf. What's the best way of
normalizing the lines so that my script will work independent of the
actual newlines chosen?

str.gsub /(\r?\n)|\r/, $/

Bertram
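
A quick sketch of what that substitution does, assuming the record separator $/ still has its default value of "\n": every CR/LF pair, lone LF, and lone CR collapses to a single LF.

```ruby
# Normalize all three line-ending conventions (CR/LF, LF, CR) to the
# current record separator, as in the one-liner above.
text = "one\r\ntwo\rthree\nfour"
normalized = text.gsub(/(\r?\n)|\r/, $/)
normalized  # => "one\ntwo\nthree\nfour"
```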
 

Alan Munn

Xavier Noria said:
Is the script line oriented, or does it slurp the entire text?

I'm not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document. I first split the lines and then use
csv methods to do the rest.

Alan
 

Alan Munn

Bertram Scharpf said:
Hi,


Am Montag, 14. Sep 2009, 03:10:12 +0900 schrieb Alan Munn:

str.gsub /(\r?\n)|\r/, $/

Bertram

Thanks, that got me on the right track.

Alan
 

Alan Munn

7stud -- said:
Look at the original strings in the examples:

----------------------------------------------------------- String#chomp
str.chomp(separator=$/) => new_str
------------------------------------------------------------------------
Returns a new +String+ with the given record separator removed from
the end of _str_ (if present). If +$/+ has not been changed from
the default Ruby record separator, then +chomp+ also removes
carriage return characters (that is it will remove +\n+, +\r+, and
+\r\n+).

"hello".chomp #=> "hello"
"hello\n".chomp #=> "hello"
"hello\r\n".chomp #=> "hello"
"hello\n\r".chomp #=> "hello\n"
"hello\r".chomp #=> "hello"
"hello \n there".chomp #=> "hello \n there"
"hello".chomp("llo") #=> "he"

Thanks. I'd prefer to normalize the lines and then use .split on \r
rather than chomp each line.

Alan
 

Xavier Noria

I'm not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document. I first split the lines and then use
csv methods to do the rest.

I mean, do you have the entire CSV in memory? I guess the split means
you do. In that case you could split with a regexp like this:

lines = csv.split(/[\015\012]+/)

The exact regexp depends on what you need. That one, in particular,
prevents blank items in lines, so it may do.

That said, I'd normally use a CSV library to handle the input.
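
For instance, under the assumption that the whole clipboard text is in one string (the sample data here is invented), that split yields the lines directly, with no blank items even when separators double up:

```ruby
# Split on runs of CR (\015) and LF (\012), so CR/LF pairs and blank
# lines never produce empty entries.
csv = "a,b\r\nc,d\re,f\n\ng,h"
lines = csv.split(/[\015\012]+/)
lines  # => ["a,b", "c,d", "e,f", "g,h"]
```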
 

Alan Munn

Xavier Noria said:
I'm not sure I know exactly what you mean. The script takes copied cells
in csv format and massages them into different LaTeX table styles for
insertion into a LaTeX document.  I first split the lines and then use
csv methods to do the rest.

I mean, do you have the entire CSV in memory? I guess the split means
you do. In that case you could split with a regexp like this:

lines = csv.split(/[\015\012]+/)

The exact regexp depends on what you need. That one, in particular,
prevents blank items in lines, so it may do.

Yes, this will do what I want.
That said, I'd normally use a CSV library to handle the input.

Well the basic idea is to simply allow cut and paste functionality from
a spreadsheet to a LaTeX document, so I'm never dealing with huge
amounts of data generally.

Thanks for the help.

Alan
 

Xavier Noria

Well the basic idea is to simply allow cut and paste functionality from
a spreadsheet to a LaTeX document, so I'm never dealing with huge
amounts of data generally.

The problem is not the size of the data. CSV is brittle: record
separators, field separators, quoting, escaping. With a library you
may write about the same amount of code AND have a robust script for
the same price.

If the data is known and dead simple, a split would do, though.
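
A minimal sketch of the brittleness being described, using the stdlib csv library for comparison (the sample row is invented): a quoted field that contains the field separator defeats a naive split but parses cleanly.

```ruby
require 'csv'

row = 'name,"Doe, Jane",42'
naive  = row.split(',')        # breaks apart the quoted field
parsed = CSV.parse_line(row)   # respects the quoting
naive.length  # => 4
parsed        # => ["name", "Doe, Jane", "42"]
```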
 

Alan Munn

Xavier Noria said:
The problem is not the size of the data. CSV is brittle: record
separators, field separators, quoting, escaping. With a library you
may write about the same amount of code AND have a robust script for
the same price.

If the data is known and dead simple, a split would do, though.

I see. I think that for its intended use, the split should work fine
(and it certainly hasn't failed in testing yet). If I begin getting
reports of failures on more complicated data, I may need to rethink
things.

Thanks for your comments.

Alan
 

David Masover

I see. I think that for what its intended use will be, the split should
work fine (and certainly in testing hasn't failed yet.) If I begin
getting reports of failure on more complicated data I may need to
rethink things.

I suppose so, but take a look at CSV in the standard library. It's really
incredibly simple to use, and it's in the standard library, so it's not as if
this is another dependency.

Even if you want to slurp the whole thing into RAM, you could do:

require 'csv'
rows = []
CSV.foreach('foo.csv') { |row| rows << row }

Parsing a string you've got in RAM isn't any harder.

Look at it another way, too -- if there is some weird failure case (as an
example, try putting a double quote in the CSV and see what happens), you can
work with the standard library people to get it fixed.
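
For a string already in memory, a sketch along the same lines (CSV.parse is the stdlib call for parsing a string rather than a file):

```ruby
require 'csv'

# CSV.parse turns a CSV string into an array of rows,
# each row an array of fields.
data = "a,b\nc,d\n"
rows = CSV.parse(data)
rows  # => [["a", "b"], ["c", "d"]]
```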
 
