How to remove empty lines with re?

Discussion in 'Python' started by Tim Haynes, Oct 10, 2003.

  1. Tim Haynes

    Tim Haynes Guest


    If you set a variable to an empty string and then print it, you will
    get an empty line printed ;)

    ~Tim
    --
    Product Development Consultant
    OpenLink Software
    Tel: +44 (0) 20 8681 7701
    Web: <http://www.openlinksw.com>
    Universal Data Access & Data Integration Technology Providers
     
    Tim Haynes, Oct 10, 2003
    #1

  2. ted

    ted Guest

    I'm having trouble using the re module to remove empty lines in a file.

    Here's what I thought would work, but it doesn't:

    import re
    f = open("old_site/index.html")
    for line in f:
        line = re.sub(r'^\s+$|\n', '', line)
        print line

    Also, when I try to remove some HTML tags, I get even more empty lines:

    import re
    f = open("old_site/index.html")
    for line in f:
        line = re.sub('<.*?>', '', line)
        line = re.sub(r'^\s+$|\n', '', line)
        print line

    I don't know what I'm doing. Any help appreciated.

    TIA,
    Ted
     
    ted, Oct 10, 2003
    #2

  3. Peter Otten

    Peter Otten Guest

    Try:

    import sys
    f = open("old_site/index.html")
    for line in f:
        # skip lines that are empty or contain only whitespace
        if line.strip():
            sys.stdout.write(line)

    Background: lines read from the file keep their trailing "\n", a second
    newline is inserted by the print statement.
    The strip() method creates a copy of the string with all leading/trailing
    whitespace chars removed. All but the empty string evaluate to True in the
    if statement.
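
    As a minimal sketch of the same idea applied to the tag-stripping case from
    the question (written for Python 3; the helper name nonblank_lines and its
    strip_tags parameter are made up for illustration): a line that held only a
    tag becomes "\n" once the tag is removed, so the strip() test has to run
    after the substitution.

```python
import re
import sys

# Same non-greedy tag pattern as in the question; not a real HTML parser.
TAG = re.compile(r"<.*?>")

def nonblank_lines(lines, strip_tags=False):
    """Yield only the lines that still contain non-whitespace text."""
    for line in lines:
        if strip_tags:
            line = TAG.sub("", line)
        # strip() returns "" for empty or whitespace-only lines, which is false
        if line.strip():
            yield line

sample = ["<p>\n", "hello\n", "   \n", "\n", "<b>x</b>\n"]
# Each kept line still ends in "\n", so write it without adding another one.
sys.stdout.writelines(nonblank_lines(sample, strip_tags=True))
```

    With a real file, the sample list would be replaced by the open file object.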

    Peter
     
    Peter Otten, Oct 10, 2003
    #3
  4. nonempty = [x for x in f if x.strip()]

    /BJ
     
    Bror Johansson, Oct 10, 2003
    #4
  5. Anand Pillai

    Anand Pillai Guest

    To do this, you need to change your regular expression to just
    this:

    empty=re.compile('^$')

    This of course looks for a pattern where there is beginning just
    after end, ie the line is empty :)

    Here is the complete code.

    import re

    empty=re.compile('^$')
    for line in open('test.txt').readlines():
        if empty.match(line):
            continue
        else:
            print line,

    The comma at the end of the print is to avoid printing another newline,
    since the 'readlines()' method gives you the line with a '\n' at the end.

    Also, don't forget to compile your regexps for efficiency's sake.
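
    One caveat worth illustrating: '^$' matches a line that is only "\n", but
    not a line holding spaces or tabs. The sketch below (Python 3; the names
    BLANK, BLANK_LINE and remove_blank_lines are hypothetical) compares the two
    patterns and shows one possible way to drop blank lines from a whole string
    in a single substitution using re.MULTILINE.

```python
import re

EMPTY = re.compile(r"^$")      # the pattern from the post
BLANK = re.compile(r"^\s*$")   # also accepts whitespace-only lines

# In MULTILINE mode ^ matches at the start of every line, so this removes
# each line made up solely of spaces/tabs, together with its terminator.
BLANK_LINE = re.compile(r"^[ \t]*\r?\n", re.MULTILINE)

def remove_blank_lines(text):
    """Return text with empty and whitespace-only lines removed."""
    return BLANK_LINE.sub("", text)

cleaned = remove_blank_lines("a\n\n  \nb\n")
```

    A trailing blank line with no final newline is left alone by this pattern.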

    HTH

    -Anand Pillai
     
    Anand Pillai, Oct 10, 2003
    #5
  6. Anand Pillai

    Anand Pillai Guest

    Errata:

    I meant "there is end just after the beginning" of course.

    -Anand
     
    Anand Pillai, Oct 10, 2003
    #6
  7. The .readlines() method retains any line terminators, and using the
    builtin print will suffix an extra line terminator to every line,
    thus effectively producing an empty line for every non-empty line.
    You'd want to use e.g. sys.stdout.write() instead of print.
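
    The doubling can be seen in a small sketch, assuming Python 3, where print
    is a function and end="" plays roughly the role of the trailing comma in
    Python 2. The helper name render is made up for illustration.

```python
import io

def render(lines, end):
    """Return the text print() emits for each line with the given end."""
    buf = io.StringIO()
    for line in lines:
        print(line, end=end, file=buf)
    return buf.getvalue()

lines = ["one\n", "two\n"]            # terminators kept, as readlines() does
doubled = render(lines, end="\n")     # print's default adds a second newline
plain = render(lines, end="")         # nothing appended after each line
```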


    // Klaus

    --
     
    Klaus Alexander Seistrup, Oct 10, 2003
    #7
  8. ted

    ted Guest

    Thanks Anand, works great.


     
    ted, Oct 11, 2003
    #8
  9. Anand Pillai

    Anand Pillai Guest

    You probably did not read my posting completely.

    I have added a comma after the print statement and mentioned
    a comment specifically on this.

    The 'print line,' statement with a comma after it does not print
    a newline (what you call a line terminator), whereas
    'print' without a comma at the end does just that.

    No wonder python sometimes feels like high-level pseudocode ;-)
    It has that ultra intuitive feel for most of its tricks.

    In this case, the comma is usually put when you have more than
    one item to print, and python puts a newline after all items.
    So it very intuitively follows that just putting a comma will not
    print a newline! It is better than telling the programmer to use
    another print function to avoid newlines, which you find in many
    other 'un-pythonic' languages.

    -Anand
     
    Anand Pillai, Oct 12, 2003
    #9
  10. You are completely right, I missed an important part of your posting.
    I didn't know about the comma feature, so thanks for teaching me!

    Cheers,

    // Klaus

    --
     
    Klaus Alexander Seistrup, Oct 12, 2003
    #10