How to remove empty lines with re?

Tim Haynes · Oct 10, 2003

ted said:
f = open("old_site/index.html")
for line in f:
    line = re.sub(r'^\s+$|\n', '', line) # }
    print line # }

If you set a variable to an empty string and then print it, you still
get an empty line printed.
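
A minimal sketch of that point, written for Python 3 (an assumption; the code in this thread uses the Python 2 print statement). The sample line and the io.StringIO buffer are made up for illustration; the buffer only stands in for standard output so the result can be inspected.

```python
import io
import re

# A whitespace-only line, as it would come back from iterating over a file.
line = "   \n"

# The substitution from the question reduces it to an empty string...
cleaned = re.sub(r'^\s+$|\n', '', line)

# ...but printing an empty string still emits a newline of its own.
buffer = io.StringIO()
print(cleaned, file=buffer)
output = buffer.getvalue()
```

Here cleaned is empty, yet output still holds one newline, which is the blank line that shows up on screen.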

~Tim
--
Product Development Consultant
OpenLink Software
Tel: +44 (0) 20 8681 7701
Web: <http://www.openlinksw.com>
Universal Data Access & Data Integration Technology Providers

ted · Oct 10, 2003

I'm having trouble using the re module to remove empty lines in a file.

Here's what I thought would work, but it doesn't:

import re
f = open("old_site/index.html")
for line in f:
    line = re.sub(r'^\s+$|\n', '', line)
    print line

Also, when I try to remove some HTML tags, I get even more empty lines:

import re
f = open("old_site/index.html")
for line in f:
    line = re.sub('<.*?>', '', line)
    line = re.sub(r'^\s+$|\n', '', line)
    print line

I don't know what I'm doing. Any help appreciated.

TIA,
Ted

Peter Otten · Oct 10, 2003

ted said:
I'm having trouble using the re module to remove empty lines in a file.

Here's what I thought would work, but it doesn't:

import re
f = open("old_site/index.html")
for line in f:
    line = re.sub(r'^\s+$|\n', '', line)
    print line

Try:

import sys
f = open("old_site/index.html")
for line in f:
    # Skip lines that are empty once whitespace is stripped.
    if line.strip():
        sys.stdout.write(line)

Background: lines read from the file keep their trailing "\n", and the print
statement adds a second newline.
The strip() method returns a copy of the string with all leading and trailing
whitespace removed. Every string except the empty one evaluates to True in the
if statement.
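
The same idea can be sketched as a small reusable function in Python 3 (an assumption; the thread predates it). The function name write_nonblank_lines and the sample text are hypothetical, and io.StringIO stands in for the real file and for standard output.

```python
import io
import sys

def write_nonblank_lines(source, dest=sys.stdout):
    """Copy lines from source to dest, skipping whitespace-only lines."""
    for line in source:
        # strip() returns "" for a blank line, and "" is false in a condition.
        if line.strip():
            # write() adds nothing, so the line's own "\n" is the only newline.
            dest.write(line)

# Hypothetical sample input with one empty and one whitespace-only line.
sample = io.StringIO("<html>\n\n   \n<body>\n")
result = io.StringIO()
write_nonblank_lines(sample, result)
```

Passing an open file as source and leaving dest at its default would print the filtered lines directly.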

Peter

Bror Johansson · Oct 10, 2003

ted said:
I'm having trouble using the re module to remove empty lines in a file.

Here's what I thought would work, but it doesn't:

import re
f = open("old_site/index.html")
for line in f:
    line = re.sub(r'^\s+$|\n', '', line)
    print line

nonempty = [x for x in f if x.strip()]

/BJ

Anand Pillai · Oct 10, 2003

To do this, you need to modify your re to just
this

empty=re.compile('^$')

This of course looks for a pattern where there is beginning just
after end, ie the line is empty

Here is the complete code.

import re

empty=re.compile('^$')
for line in open('test.txt').readlines():
    if empty.match(line):
        continue
    else:
        print line,

The comma at the end of the print is to avoid printing another newline,
since the 'readlines()' method gives you the line with a '\n' at the end.

Also, don't forget to compile your regexps for efficiency's sake.
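
One caveat worth illustrating: '^$' matches a line that holds only "\n" (because $ also matches just before a trailing newline), but not a line that holds spaces or tabs. The sketch below, in Python 3 syntax, compares it with a '^\s*$' variant; that second pattern and the sample lines are assumptions added for illustration, not part of the code above.

```python
import re

empty = re.compile(r'^$')      # pattern from the post above
blank = re.compile(r'^\s*$')   # assumed variant that also accepts spaces and tabs

# Hypothetical sample: one truly empty line and one whitespace-only line.
lines = ["first\n", "\n", "   \n", "second\n"]

# Keep every line the pattern does NOT match.
kept_by_empty = [ln for ln in lines if not empty.match(ln)]
kept_by_blank = [ln for ln in lines if not blank.match(ln)]
```

With these inputs, kept_by_empty still contains the whitespace-only line, while kept_by_blank drops both blank lines.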

HTH

-Anand Pillai

Anand Pillai · Oct 10, 2003

Errata:

I meant "there is end just after the beginning" of course.

-Anand

Klaus Alexander Seistrup · Oct 10, 2003

Anand said:
Here is the complete code.

import re

empty=re.compile('^$')
for line in open('test.txt').readlines():
    if empty.match(line):
        continue
    else:
        print line,

The .readlines() method retains any line terminators, and using the
builtin print will suffix an extra line terminator to every line,
thus effectively producing an empty line for every non-empty line.
You'd want to use e.g. sys.stdout.write() instead of print.

// Klaus

ted · Oct 11, 2003

Thanks Anand, works great.

Anand Pillai · Oct 12, 2003

You probably did not read my posting completely.

I have added a comma after the print statement and mentioned
a comment specifically on this.

The 'print line,' statement with a trailing comma does not print
a newline (what you call a line terminator), whereas
'print' without a trailing comma does.

No wonder Python sometimes feels like high-level pseudocode ;-)
It has that ultra-intuitive feel for most of its tricks.

In this case, the comma is usually used when you have more than
one item to print, and Python puts a newline after all the items.
So it follows quite intuitively that a trailing comma alone will not
print a newline! That is better than making the programmer use
another print function to avoid newlines, as many
other 'un-pythonic' languages do.
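
For readers on Python 3 (an assumption; this thread is about the Python 2 print statement), the trailing comma no longer works this way, and the closest equivalent is the end argument of the print() function. A minimal sketch, with made-up sample lines and io.StringIO buffers standing in for standard output:

```python
import io

# Hypothetical lines, each keeping its trailing newline as read from a file.
lines = ["first\n", "second\n"]

default_out = io.StringIO()
no_newline_out = io.StringIO()

for line in lines:
    # Default behaviour: print() appends "\n", giving a blank line after each.
    print(line, file=default_out)
    # end="" suppresses the extra newline, like the trailing comma above.
    print(line, end="", file=no_newline_out)
```

After the loop, default_out holds a blank line after every input line, while no_newline_out reproduces the input unchanged.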

-Anand

Klaus Alexander Seistrup · Oct 12, 2003

Anand said:
You probably did not read my posting completely.

I have added a comma after the print statement and mentioned
a comment specifically on this.

You are completely right, I missed an important part of your posting.
I didn't know about the comma feature, so thanks for teaching me!

Cheers,

// Klaus
