do a sed / awk filter with python tools (at least as fast)

Discussion in 'Python' started by Mathieu Prevot, Jul 7, 2008.

  1. Hi,

    I use the following filter in a Bourne shell script:

    sed '/watch?v=/! d;s/.*v=//;s/\(.\{11\}\).*/\1/' \
    | sort | uniq | awk 'ORS=" "{print $1}'

    It gives me every 11-character string that follows the "watch?v="
    pattern. I would like to do this in Python on the stdout of a
    subprocess.Popen instance, using Python tools rather than sed, awk, etc.
    How can I do this? Can I expect it to be as fast?
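
    For reference, here is one possible stage-by-stage reading of the filter
    above, written as a minimal Python sketch. The function name
    sed_like_filter is hypothetical, and the sketch assumes plain byte-order
    sorting (sort may order differently depending on locale) and omits the
    trailing space that awk emits after the last item.

```python
import re


def sed_like_filter(lines):
    """Mirror the sed | sort | uniq | awk pipeline on an iterable of lines."""
    found = set()                       # a set plays the role of sort | uniq
    for line in lines:
        line = line.rstrip("\n")
        if "watch?v=" not in line:      # sed: /watch?v=/! d
            continue
        # sed: s/.*v=//  (greedy, so everything up to the LAST "v=" is dropped)
        line = re.sub(r".*v=", "", line, count=1)
        # sed: s/\(.\{11\}\).*/\1/  (shorter lines pass through unchanged)
        line = line[:11]
        fields = line.split()           # awk: $1 is the first whitespace field
        found.add(fields[0] if fields else "")
    return " ".join(sorted(found))      # awk: ORS=" " joins items with spaces


if __name__ == "__main__":
    import sys
    print(sed_like_filter(sys.stdin))
```

    Note that, like the sed version, this keeps lines with fewer than 11
    characters after "v=" instead of discarding them.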

    Thanks,
    Mathieu
     
    Mathieu Prevot, Jul 7, 2008
    #1

  2. Peter Otten, Guest

    Mathieu Prevot wrote:

    > I use the following filter in a Bourne shell script:
    >
    > sed '/watch?v=/! d;s/.*v=//;s/\(.\{11\}\).*/\1/' \
    > | sort | uniq | awk 'ORS=" "{print $1}'
    >
    > It gives me every 11-character string that follows the "watch?v="
    > pattern. I would like to do this in Python on the stdout of a
    > subprocess.Popen instance, using Python tools rather than sed, awk, etc.
    > How can I do this? Can I expect it to be as fast?


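    You should either do it in Python, e.g.:

    def process(lines):
        # Keep the text after the first "/watch?v=" on each line.
        candidates = (line.rstrip().partition("/watch?v=") for line in lines)
        # Take the 11 characters that follow; skip lines that are too short.
        matches = (c[:11] for a, b, c in candidates if len(c) >= 11)
        # De-duplicate, sort, and print space-separated (sort | uniq | awk).
        print(" ".join(sorted(set(matches))))

    if __name__ == "__main__":
        import sys
        process(sys.stdin)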

    or invoke your shell script via subprocess.Popen(). Invoking a python script
    via subprocess doesn't make sense IMHO.
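
    If the lines come from another program, the same idea can be applied
    directly to the stdout of a subprocess.Popen instance. The following is a
    minimal sketch under stated assumptions: the names extract_ids and
    ids_from_command are hypothetical, and the demo command is a stand-in
    child process that prints two made-up lines.

```python
import subprocess
import sys


def extract_ids(lines, marker="/watch?v=", width=11):
    """Return the sorted, de-duplicated strings of `width` chars after marker."""
    ids = set()
    for line in lines:
        # partition() splits on the first occurrence; [2] is the text after it.
        tail = line.rstrip().partition(marker)[2]
        if len(tail) >= width:
            ids.add(tail[:width])
    return sorted(ids)


def ids_from_command(argv):
    """Run argv and filter its stdout line by line as it arrives."""
    # universal_newlines=True makes stdout yield str lines instead of bytes.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        return extract_ids(proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    # Hypothetical producer: a child Python process printing sample lines.
    demo = [sys.executable, "-c",
            "print('http://example.invalid/watch?v=abcdefghijk&x=1');"
            "print('no match here')"]
    print(" ".join(ids_from_command(demo)))
```

    Iterating over proc.stdout avoids reading the whole output into memory
    first, which matters mostly when the child produces a lot of text.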

    Peter
     
    Peter Otten, Jul 7, 2008
    #2

  3. 2008/7/7 Peter Otten:
    > Mathieu Prevot wrote:
    >
    >> I use the following filter in a Bourne shell script:
    >>
    >> sed '/watch?v=/! d;s/.*v=//;s/\(.\{11\}\).*/\1/' \
    >> | sort | uniq | awk 'ORS=" "{print $1}'
    >>
    >> It gives me every 11-character string that follows the "watch?v="
    >> pattern. I would like to do this in Python on the stdout of a
    >> subprocess.Popen instance, using Python tools rather than sed, awk, etc.
    >> How can I do this? Can I expect it to be as fast?

    >
    > You should either do it in Python, e.g.:
    >
    > def process(lines):
    >     candidates = (line.rstrip().partition("/watch?v=") for line in lines)
    >     matches = (c[:11] for a, b, c in candidates if len(c) >= 11)
    >     print(" ".join(sorted(set(matches))))
    >
    > if __name__ == "__main__":
    >     import sys
    >     process(sys.stdin)
    >
    > or invoke your shell script via subprocess.Popen(). Invoking a python script
    > via subprocess doesn't make sense IMHO.


    :) Thanks.
    Mathieu
     
    Mathieu Prevot, Jul 7, 2008
    #3