How to remove empty lines with re?

Discussion in 'Python' started by Tim Haynes, Oct 10, 2003.

  1. Tim Haynes

    Tim Haynes Guest

    "ted" <> writes:

    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub(r'^\s+$|\n', '', line) # }
    >     print line # }



    If you set a variable to an empty string and then print it, you still
    get an empty line printed ;)
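
A minimal sketch of that point, written for Python 3 rather than the Python 2 used in the thread. The names `sample` and `printed` are made up for illustration; `printed` records what `print(line)` would write for each line.

```python
import re

# Hypothetical sample input: two text lines separated by a blank line.
sample = ["first\n", "\n", "second\n"]

printed = []
for line in sample:
    # The substitution strips the newline (or a whitespace-only line),
    # but the loop still goes on to emit whatever is left.
    line = re.sub(r'^\s+$|\n', '', line)
    printed.append(line + "\n")  # what print(line) would write

print("".join(printed), end="")
```

The blank line survives because the empty string is still printed, and print adds its own newline.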

    ~Tim
    --
    Product Development Consultant
    OpenLink Software
    Tel: +44 (0) 20 8681 7701
    Web: <http://www.openlinksw.com>
    Universal Data Access & Data Integration Technology Providers
     
    Tim Haynes, Oct 10, 2003
    #1

  2. Tim Haynes

    ted Guest

    I'm having trouble using the re module to remove empty lines in a file.

    Here's what I thought would work, but it doesn't:

    import re
    f = open("old_site/index.html")
    for line in f:
        line = re.sub(r'^\s+$|\n', '', line)
        print line

    Also, when I try to remove some HTML tags, I get even more empty lines:

    import re
    f = open("old_site/index.html")
    for line in f:
        line = re.sub('<.*?>', '', line)
        line = re.sub(r'^\s+$|\n', '', line)
        print line

    I don't know what I'm doing. Any help appreciated.

    TIA,
    Ted
     
    ted, Oct 10, 2003
    #2

  3. Tim Haynes

    Peter Otten Guest

    ted wrote:

    > I'm having trouble using the re module to remove empty lines in a file.
    >
    > Here's what I thought would work, but it doesn't:
    >
    > import re
    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub(r'^\s+$|\n', '', line)
    >     print line


    Try:

    import sys
    for line in f:
        if line.strip():
            sys.stdout.write(line)

    Background: lines read from the file keep their trailing "\n", and a
    second newline is inserted by the print statement.
    The strip() method creates a copy of the string with all leading/trailing
    whitespace chars removed. All but the empty string evaluate to True in the
    if statement.
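
A self-contained sketch of the same idea for Python 3. The function name `write_nonblank` and the `io.StringIO` objects standing in for the file and for stdout are assumptions made for illustration, not part of the original post.

```python
import io
import sys

def write_nonblank(lines, out=sys.stdout):
    """Write every line that contains non-whitespace text, unchanged."""
    for line in lines:
        if line.strip():      # '' (falsy) for blank or whitespace-only lines
            out.write(line)   # the line already ends in '\n'

# Hypothetical input standing in for the opened file.
source = io.StringIO("<html>\n\n   \n<body>\n")
buffer = io.StringIO()
write_nonblank(source, buffer)
```

Passing an open file object instead of `source`, and leaving `out` at its default, would write the filtered lines to standard output.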

    Peter
     
    Peter Otten, Oct 10, 2003
    #3
  4. "ted" <> wrote in message
    news:...
    > I'm having trouble using the re module to remove empty lines in a file.
    >
    > Here's what I thought would work, but it doesn't:
    >
    > import re
    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub(r'^\s+$|\n', '', line)
    >     print line
    >


    nonempty = [x for x in f if x.strip()]

    /BJ
     
    Bror Johansson, Oct 10, 2003
    #4
  5. Tim Haynes

    Anand Pillai Guest

    To do this, you need to modify your re to just
    this

    empty=re.compile('^$')

    This of course looks for a pattern where there is beginning just
    after end, ie the line is empty :)

    Here is the complete code.

    import re

    empty=re.compile('^$')
    for line in open('test.txt').readlines():
        if empty.match(line):
            continue
        else:
            print line,

    The comma at the end of the print is to avoid printing another newline,
    since the 'readlines()' method gives you the line with a '\n' at the end.

    Also, don't forget to compile your regexps for efficiency's sake.
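
One caveat worth illustrating: '^$' matches a line that holds only a newline, but not a line made of spaces or tabs. The sketch below (Python 3) compares it with '^\s*$', which is an assumption added here and not taken from the post; `drop_blank`, `strict` and `loose` are hypothetical names.

```python
import re

# '^$' matches a line holding only '\n'; '^\s*$' also covers spaces and tabs.
empty = re.compile(r'^$')
blank = re.compile(r'^\s*$')

def drop_blank(lines, pattern=blank):
    """Return the lines that do not match the given blank-line pattern."""
    return [line for line in lines if not pattern.match(line)]

lines = ["a\n", "\n", "  \t\n", "b\n"]
strict = drop_blank(lines, empty)   # keeps the whitespace-only line
loose = drop_blank(lines, blank)    # drops it as well
```

Which pattern is appropriate depends on whether whitespace-only lines should count as empty.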

    HTH

    -Anand Pillai


    "ted" <> wrote in message news:<>...
    > I'm having trouble using the re module to remove empty lines in a file.
    >
    > Here's what I thought would work, but it doesn't:
    >
    > import re
    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub(r'^\s+$|\n', '', line)
    >     print line
    >
    > Also, when I try to remove some HTML tags, I get even more empty lines:
    >
    > import re
    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub('<.*?>', '', line)
    >     line = re.sub(r'^\s+$|\n', '', line)
    >     print line
    >
    > I don't know what I'm doing. Any help appreciated.
    >
    > TIA,
    > Ted
     
    Anand Pillai, Oct 10, 2003
    #5
  6. Tim Haynes

    Anand Pillai Guest

    Errata:

    I meant "there is end just after the beginning" of course.

    -Anand

    "ted" <> wrote in message news:<>...
    > I'm having trouble using the re module to remove empty lines in a file.
    >
    > Here's what I thought would work, but it doesn't:
    >
    > import re
    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub(r'^\s+$|\n', '', line)
    >     print line
    >
    > Also, when I try to remove some HTML tags, I get even more empty lines:
    >
    > import re
    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub('<.*?>', '', line)
    >     line = re.sub(r'^\s+$|\n', '', line)
    >     print line
    >
    > I don't know what I'm doing. Any help appreciated.
    >
    > TIA,
    > Ted
     
    Anand Pillai, Oct 10, 2003
    #6
  7. Anand Pillai wrote:

    > Here is the complete code.
    >
    > import re
    >
    > empty=re.compile('^$')
    > for line in open('test.txt').readlines():
    >     if empty.match(line):
    >         continue
    >     else:
    >         print line,


    The .readlines() method retains any line terminators, and using the
    builtin print will suffix an extra line terminator to every line,
    thus effectively producing an empty line for every non-empty line.
    You'd want to use e.g. sys.stdout.write() instead of print.


    // Klaus

    --
    ><> unselfish actions pay back better
     
    Klaus Alexander Seistrup, Oct 10, 2003
    #7
  8. Tim Haynes

    ted Guest

    Thanks Anand, works great.


    "Anand Pillai" <> wrote in message
    news:...
    > To do this, you need to modify your re to just
    > this
    >
    > empty=re.compile('^$')
    >
    > This of course looks for a pattern where there is beginning just
    > after end, ie the line is empty :)
    >
    > Here is the complete code.
    >
    > import re
    >
    > empty=re.compile('^$')
    > for line in open('test.txt').readlines():
    >     if empty.match(line):
    >         continue
    >     else:
    >         print line,
    >
    > The comma at the end of the print is to avoid printing another newline,
    > since the 'readlines()' method gives you the line with a '\n' at the end.
    >
    > Also dont forget to compile your regexps for efficiency sake.
    >
    > HTH
    >
    > -Anand Pillai
    >
    >
    > "ted" <> wrote in message news:<>...
    > > I'm having trouble using the re module to remove empty lines in a file.
    > >
    > > Here's what I thought would work, but it doesn't:
    > >
    > > import re
    > > f = open("old_site/index.html")
    > > for line in f:
    > >     line = re.sub(r'^\s+$|\n', '', line)
    > >     print line
    > >
    > > Also, when I try to remove some HTML tags, I get even more empty lines:
    > >
    > > import re
    > > f = open("old_site/index.html")
    > > for line in f:
    > >     line = re.sub('<.*?>', '', line)
    > >     line = re.sub(r'^\s+$|\n', '', line)
    > >     print line
    > >
    > > I don't know what I'm doing. Any help appreciated.
    > >
    > > TIA,
    > > Ted
     
    ted, Oct 11, 2003
    #8
  9. Tim Haynes

    Anand Pillai Guest

    You probably did not read my posting completely.

    I added a comma after the print statement and commented on it
    specifically.

    The 'print line,' statement with a trailing comma does not print
    a newline (what you call a line terminator), whereas
    'print' without a comma at the end does just that.

    No wonder python sometimes feels like high-level pseudocode ;-)
    It has that ultra intuitive feel for most of its tricks.

    In this case, the comma is usually put when you have more than
    one item to print, and python puts a newline after all items.
    So it very intuitively follows that just putting a comma will not
    print a newline! It is better than telling the programmer to use
    another print function to avoid newlines, which you find in many
    other 'un-pythonic' languages.
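
For readers on Python 3, where print is a function and the trailing comma no longer has this effect, the rough equivalent is the `end` keyword. This is a sketch under that assumption; the `buffer` object only stands in for stdout so the result can be inspected.

```python
import io

buffer = io.StringIO()
for line in ["one\n", "two\n"]:
    # end='' stops print from appending its own newline,
    # so only the newline already in `line` is written.
    print(line, end='', file=buffer)
```

Dropping the `file=buffer` argument writes to standard output instead.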

    -Anand

    Klaus Alexander Seistrup <> wrote in message news:<>...
    > Anand Pillai wrote:
    >
    > > Here is the complete code.
    > >
    > > import re
    > >
    > > empty=re.compile('^$')
    > > for line in open('test.txt').readlines():
    > >     if empty.match(line):
    > >         continue
    > >     else:
    > >         print line,

    >
    > The .readlines() method retains any line terminators, and using the
    > builtin print will suffix an extra line terminator to every line,
    > thus effectively producing an empty line for every non-empty line.
    > You'd want to use e.g. sys.stdout.write() instead of print.
    >
    >
    > // Klaus
     
    Anand Pillai, Oct 12, 2003
    #9
  10. Anand Pillai wrote:

    > You probably did not read my posting completely.
    >
    > I have added a comma after the print statement and mentioned
    > a comment specifically on this.


    You are completely right, I missed an important part of your posting.
    I didn't know about the comma feature, so thanks for teaching me!

    Cheers,

    // Klaus

    --
    ><> unselfish actions pay back better
     
    Klaus Alexander Seistrup, Oct 12, 2003
    #10
