Regular expression for Unix and DOS line endings wanted

  • Thread starter Franz Steinhaeusler

Franz Steinhaeusler

Hello, I need a regular expression which trims trailing whitespace.

It works with Unix line endings,
but not with Windows (DOS) CRLFs:

import re
retrailingwhitespace = re.compile(r'(?<=\S)[ \t]+$', re.MULTILINE)

1) Windows
'erewr \r\nafjdskl'

2) Unix
'erewr\nafjdskl'

Who can help me (with a regular expression which works for both cases)?

Thank you in advance!
 

Martin Franklin

Why not use the string methods strip, rstrip and lstrip?
 

gene tani

universal newlines:
http://www.python.org/doc/2.3.3/whatsnew/node7.html
http://mail.python.org/pipermail/python-list/2006-February/324410.html
 

Franz Steinhaeusler

Martin said:
why not use string methods strip, rstrip and lstrip

Because this removes only the last spaces:
'erewr \r\nafjdskl'

I want:
'erewr\r\nafjdskl'

or, for Unix line endings:
'erewr\nafjdskl'
 

gene tani


Franz Steinhaeusler

gene said:
If multiple end-of-line markers are present (\r, \r\n and/or \n), use
the file's newlines attribute to see what they are. I think the thread
linked above touched on that. Otherwise newlines should tell you what
the end of line is in that file (os.linesep only gives the current
platform's convention).

http://docs.python.org/lib/bltin-file-objects.html
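
A small sketch of the newlines attribute mentioned above, written for Python 3, where open() reads with universal newlines by default. The helper name detect_newlines and the temporary demo file are made up for illustration and are not from the thread.

```python
import os
import tempfile


def detect_newlines(path):
    """Return the line endings seen while reading the file at path.

    The result is None, a single string, or a tuple of strings,
    as reported by the file object's newlines attribute.
    """
    # newline=None (the default) turns on universal newlines: CR, LF and
    # CRLF are all translated to LF while reading, and the endings that
    # were actually encountered are recorded in f.newlines.
    with open(path, newline=None) as f:
        f.read()
        return f.newlines


# Write a small CRLF file in binary mode so nothing is translated on output.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"erewr \r\nafjdskl \r\n")

found = detect_newlines(demo_path)
os.remove(demo_path)
print(repr(found))
```

Text read this way comes back with its endings already translated to '\n', so the detected value would be needed to write them back unchanged.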

Thank you for your info.

I need it for a file whose line endings I don't know.

I wrote this script for DrPython
(using the styled text control and wxPython), and it works,
but I'm looking for a shorter way:

===================================================================
#drscript
#RemoveTrailingWhitespaces

import re
import wx

eol = DrDocument.GetEndOfLineCharacter()
regex = re.compile(r'\s+' + eol, re.MULTILINE)

#lookarounds also catch a '\n' at the very start and a '\r' at the very end
relewin = re.compile(r'\r\n', re.M)
releunix = re.compile(r'(?<!\r)\n', re.M)
relemac = re.compile(r'\r(?!\n)', re.M)

text = DrDocument.GetText()

#check line endings
win = unix = mac = 0
if relewin.search(text):
    win = 1
if releunix.search(text):
    unix = 1
if relemac.search(text):
    mac = 1
mixed = win + unix + mac

#correct the lineendings before
if mixed > 1:
    wx.MessageDialog(DrFrame, "Line endings mixed",
                     "Remove trailing Whitespace",
                     wx.ICON_EXCLAMATION).ShowModal()

#ok to remove
else:
    lines = text.split(eol)
    new_lines = []
    nr_lines = 0
    nr_clines = 0
    first_cline = -1
    for line in lines:
        nr_lines += 1
        result = regex.search(line + eol)
        if result is not None:
            end = result.start()
            nr_clines += 1
            if first_cline == -1:
                first_cline = nr_lines
            new_lines.append(line[:end])
        else:
            new_lines.append(line)

    #file has trailing whitespaces
    if nr_clines > 0:
        d = wx.MessageDialog(DrFrame,
            "%d of %d lines have trailing whitespaces (First:%d)\nCorrect?"
            % (nr_clines, nr_lines, first_cline),
            "Remove trailing Whitespace",
            wx.OK | wx.CANCEL | wx.ICON_QUESTION)
        answer = d.ShowModal()
        d.Destroy()
        if answer == wx.ID_OK:
            newtext = eol.join(new_lines)
            #save current line
            curline = DrDocument.GetCurrentLine()
            DrDocument.SetText(newtext)
            #jump to saved current line
            DrDocument.GotoLine(curline)

    #no need to change the file
    else:
        wx.MessageDialog(DrFrame, "File ok!", "Remove trailing Whitespace",
                         wx.ICON_EXCLAMATION).ShowModal()

===========================================================================
 

Franz Steinhaeusler

Franz said:
I need it for a file whose line endings I don't know.

I wrote this script for DrPython
(using the styled text control and wxPython), and it works,
but I'm looking for a shorter way:

Ah, sorry, let me try to make this clearer:

(DrDocument is an instance of a styled text control)

import re
import wx.stc

def GetEndOfLineCharacter():
    emode = DrDocument.GetEOLMode()
    if emode == wx.stc.STC_EOL_CR:
        return '\r'
    elif emode == wx.stc.STC_EOL_CRLF:
        return '\r\n'
    return '\n'

text = DrDocument.GetText()

eol = GetEndOfLineCharacter()
regex = re.compile(r'\s+' + eol, re.MULTILINE)

lines = text.split(eol)

new_lines = []
for line in lines:
    result = regex.search(line + eol)
    if result is not None:
        end = result.start()
        new_lines.append(line[:end])
    else:
        new_lines.append(line)


newtext = eol.join(new_lines)
DrDocument.SetText(newtext)
 

Martin Franklin

Franz said:
because this removes only the last spaces,
'erewr \r\nafjdskl'

I want:
'erewr\r\nafjdskl'

or for unix line endings
'erewr\nafjdskl'


How about one of these variations:

print('erewr \r\nafjdskl '.replace(" ", ""))
print('erewr \r\nafjdskl '.strip(" \t"))
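
To see what the two variations do, here is a short check in Python 3. The second sample string with spaces inside a line is my own addition, there only to show how replace() behaves on ordinary text.

```python
sample = 'erewr \r\nafjdskl '

# replace() removes every space, including the ones inside a line.
no_spaces = 'one two \r\nthree '.replace(" ", "")

# strip() only looks at the two ends of the whole string, so the
# space in front of '\r\n' is left alone.
stripped = sample.strip(" \t")

print(repr(no_spaces))
print(repr(stripped))
```

So neither call removes trailing whitespace on a per-line basis, which is what the question asks for.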
 

John Zenger

How about r"\s+[\n\r]+|\s+$" ?
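
A quick check of that pattern in Python 3, assuming it is meant to be used with re.sub and an empty replacement (the post does not say). Because \s also matches '\r' and '\n', the line ending itself is removed. The second pattern is a possible variation, not something from the thread: it matches only spaces and tabs and merely looks ahead for the line ending.

```python
import re

# The pattern suggested above, used with an empty replacement (an assumption).
suggested = re.compile(r"\s+[\n\r]+|\s+$")

# A variation: match only spaces and tabs, and look ahead for a line
# ending (or the end of the text) without consuming it.
trailing = re.compile(r"[ \t]+(?=[\r\n]|\Z)")

win = 'erewr \r\nafjdskl '
unix = 'erewr \nafjdskl '

print(repr(suggested.sub('', win)))
print(repr(trailing.sub('', win)))
print(repr(trailing.sub('', unix)))
```

Unlike the pattern in the original question, the variation has no (?<=\S) lookbehind, so it also empties lines that consist only of spaces and tabs.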

 

Franz Steinhaeusler

Franz said:
Hello, I need a regular expression which trims trailing whitespace.

It works with Unix line endings,
but not with Windows (DOS) CRLFs:

Thank you all for the replies.
But I still don't have a solution.

Of course with more lines it is possible,
but it would be fine to have a "oneliner".
 

Martin Franklin

Franz said:
Thank you all for the replies.
But I still don't have a solution.

Of course with more lines it is possible,
but it would be fine to have a "oneliner".


Then I clearly don't understand your problem. It seems we gave
you several ways of skinning your cat, but none of them 'worked'?
I find that hard to believe. Perhaps you can restate your problem
or show us your more-than-one-line solution (so that we might learn
from it).


Martin
 

Steven D'Aprano

Franz said:
because this removes only the last spaces,
'erewr \r\nafjdskl'

I want:
'erewr\r\nafjdskl'

or for unix line endings
'erewr\nafjdskl'


# Untested
def whitespace_cleaner(s):
    """Clean whitespace from string s, returning new string.

    Strips all trailing whitespace from the end of the string, including
    linebreaks. Removes whitespace except for linebreaks from everywhere
    in the string. Internal linebreaks are converted to whatever is
    appropriate for the current platform.
    """

    from os import linesep
    from string import whitespace
    s = s.rstrip()
    for c in whitespace:
        if c in '\r\n':
            continue
        s = s.replace(c, '')
    if linesep == '\n': # Unix, Linux, Mac OS X, etc.
        # the order of the replacements is important
        s = s.replace('\r\n', '\n').replace('\r', '\n')
    elif linesep == '\r': # classic Macintosh
        s = s.replace('\r\n', '\r').replace('\n', '\r')
    elif linesep == '\r\n': # Windows
        s = s.replace('\r\n', '\r').replace('\n', '\r')
        s = s.replace('\r', '\r\n')
    else: # weird platforms?
        print("Unknown line separator, skipping.")
    return s
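
The line-ending conversion at the end of that function can also be written as a single substitution. This is only a sketch for Python 3; normalize_newlines and its eol parameter are made-up names, and it covers just the conversion step, not the whitespace removal.

```python
import os
import re


def normalize_newlines(s, eol=os.linesep):
    """Replace every CRLF, lone CR or lone LF in s with eol."""
    # CRLF is listed first so that the pair is treated as one line
    # ending rather than as two separate ones. A function is used as the
    # replacement so that eol is inserted literally.
    return re.sub(r'\r\n|\r|\n', lambda m: eol, s)


print(repr(normalize_newlines('a\r\nb\rc\nd', '\n')))
```

Passing eol explicitly avoids depending on the platform the script happens to run on.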
 

Franz Steinhaeusler

Martin said:
how about one of these variations

print('erewr \r\nafjdskl '.replace(" ", ""))
print('erewr \r\nafjdskl '.strip(" \t"))

Version 1:

It replaces all spaces, not only the trailing whitespace.


Version 2:

I found a solution (not the most beautiful, but good enough
for my purpose).
Given: the file has no mixed line endings, so it is either
a DOS or a Unix file (Mac line endings are not handled).


swin = "erewr \r\nafjdskl "
sunix = "erewr \nafjdskl "

DOS line endings (at least one '\r' included)?
r is the contents of a file:

helpchar = ''
if r.find('\r') != -1:
    helpchar = '\r'
retrailingwhitespacelf = re.compile(r'(?<=\S)[ \t' + helpchar + r']+$',
                                    re.MULTILINE)
# use the pattern compiled above; a single substitution already counts
newtext, n = retrailingwhitespacelf.subn(helpchar, r)
if n > 0:
    r = newtext
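
The idea of writing the '\r' back can also be expressed with a capture group, which makes the helpchar detection unnecessary. A minimal sketch for Python 3, keeping the (?<=\S) lookbehind from the original pattern; trailing_ws and strip_trailing are made-up names.

```python
import re

# The original pattern with one addition: an optional '\r' is captured
# just before the end of the line and written back by the replacement.
trailing_ws = re.compile(r'(?<=\S)[ \t]+(\r?)$', re.MULTILINE)


def strip_trailing(text):
    """Return (new_text, number_of_substitutions)."""
    return trailing_ws.subn(r'\1', text)


swin = "erewr \r\nafjdskl "
sunix = "erewr \nafjdskl "

print(strip_trailing(swin))
print(strip_trailing(sunix))
```

Since at least one space or tab is required, lines without trailing whitespace are not counted, so the returned count can be used directly to decide whether anything changed.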
 

John Zenger

Franz said:
Thank you all for the replies.
But I still don't have a solution.

Of course with more lines it is possible,
but it would be fine to have a "oneliner".

re.sub(r"\s+[\n\r]+",
       lambda x: x.expand(r"\g<0>").lstrip(" \t\f\v"),
       text).rstrip()

...where "text" is the unsplit block of text with mysterious line endings.

But I think your code is a lot easier to read. :)
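
For readers who find the one-liner hard to follow, here is the same logic spelled out in Python 3. This is a sketch; the names keep_line_endings and clean are mine, and match.group(0) stands in for the expand() call.

```python
import re


def keep_line_endings(match):
    # match.group(0) is the whole matched run, the same text that
    # x.expand(r"\g<0>") returns in the one-liner above.
    run = match.group(0)
    # Drop the horizontal whitespace in front, keep the '\r' / '\n' characters.
    return run.lstrip(" \t\f\v")


def clean(text):
    # The final rstrip() also removes whitespace (and any line ending)
    # at the very end of the text.
    return re.sub(r"\s+[\n\r]+", keep_line_endings, text).rstrip()


print(repr(clean('erewr \r\nafjdskl ')))
print(repr(clean('erewr \nafjdskl ')))
```

Indentation at the start of the following line is left alone, because the pattern has to end in a line-ending character.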
 

Sion Arrowsmith

Franz Steinhaeusler said:
why not use string methods strip, rstrip and lstrip
because this removes only the last spaces,
[given r = 'erewr \r\nafjdskl ']
I want:
'erewr\r\nafjdskl'

os.linesep.join(l.rstrip() for l in r.split(os.linesep))
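
As far as I can tell, the one-liner above relies on the text using the same line ending as os.linesep on the machine running it. A possible variation for Python 3 that keeps whatever ending each line already has is sketched below; rstrip_lines is a made-up name, and only spaces and tabs are treated as trailing whitespace.

```python
def rstrip_lines(text):
    """Strip trailing spaces and tabs from every line, keeping each
    line's own ending ('\\n', '\\r\\n' or '\\r') untouched."""
    cleaned = []
    # keepends=True leaves the line ending attached to each piece.
    for line in text.splitlines(keepends=True):
        body = line.rstrip('\r\n')      # the line without its ending
        ending = line[len(body):]       # whatever ending it had, if any
        cleaned.append(body.rstrip(' \t') + ending)
    return ''.join(cleaned)


print(repr(rstrip_lines('erewr \r\nafjdskl ')))
print(repr(rstrip_lines('erewr \nafjdskl ')))
```

Note that str.splitlines() also splits on a few less common boundaries such as form feed, so text containing those characters may need extra care.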
 

Franz Steinhäusler

Franz said:
Thank you all for the replies.
But I still don't have a solution.

Of course with more lines it is possible,
but it would be fine to have a "oneliner".

re.sub(r"\s+[\n\r]+", lambda x: x.expand("\g<0>"). \
lstrip(" \t\f\v"),text).rstrip()

...where "text" is the unsplit block of text with mysterious line-endings.

But I think your code is a lot easier to read. :)

Hello John,

perfect, thank you,

but as you said, this is somehow not so easy to grasp.
(At least for me). :)
 

Franz Steinhaeusler

Franz Steinhaeusler said:
why not use string methods strip, rstrip and lstrip
because this removes only the last spaces,
[given r = 'erewr \r\nafjdskl ']
I want:
'erewr\r\nafjdskl'

os.linesep.join(l.rstrip() for l in r.split(os.linesep))

Hello Sion,

Thank you, I like your solution the most!
(It is clean and one doesn't have to use re's.)
 
