Re: How to Write grep in Emacs Lisp (tutorial)

Discussion in 'Python' started by Petter Gustad, Feb 8, 2011.

  1. Xah Lee <> writes:

    > problem with find xargs is that they spawn grep for each file, which
    > becomes too slow to be usable.


    find . -maxdepth 2 -name '*.html' -print0 | xargs -0 grep whatever

    will call grep with a list of filenames given by find, only a single
    grep process will run.
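
    (To double check that only one grep is spawned, you can have xargs
    echo the command lines it builds instead of running them; each
    output line is one grep invocation:

    find . -maxdepth 2 -name '*.html' -print0 | xargs -0 echo grep whatever | wc -l

    xargs only splits the list into several invocations when it would
    exceed the system's argument length limit.)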

    //Petter
    --
    ..sig removed by request.
     
    Petter Gustad, Feb 8, 2011
    #1

  2. Xah Lee Guest

    hi Tass,

    Xah wrote:
    〈How to Write grep in Emacs Lisp〉
    http://xahlee.org/emacs/elisp_grep_script.html

    On Feb 8, 12:22 am, Tassilo Horn <> wrote:
    > Hi Xah,
    >
    > > • Often, the string i need to search is long, containing 300 hundred
    > > chars or more. You could put your search string in a file with grep,
    > > but it is not convenient.

    >
    > Well, you seem to encode the search string in your script, so I don't
    > see how that is better than relying on your shell history, which is
    > managed automatically, searchable, editable...


    not sure what you meant above. I made a mistake there: i meant to say
    my search string is a few hundred chars, usually a snippet of html
    code that may contain javascript and also unicode chars.

    e.g.

    <div class="chtk"><script type="text/
    javascript">ch_client="thoucm";ch_width=550;ch_height=90;ch_type="mpu";ch_sid="Chitika
    Default";ch_backfill=1;ch_color_site_link="#00C";ch_color_title="#00C";ch_color_border="#FFF";ch_color_text="#000";ch_color_bg="#FFF";</
    script><script src="http://scripts.chitika.net/eminimalls/amm.js"
    type="text/javascript"></script></div>

    > > • grep can't really deal with directories recursively. (there's -r,
    > > but then you can't specify file pattern such as “*\.html” (maybe it is
    > > possible, but i find it quite frustrating to trial error man page loop
    > > with unix tools.))

    >
    > You can rely on shell globbing, so that grep gets a list of all files in
    > all subdirectories.  For example, I can grep all header files of the
    > linux kernel using
    >
    >   % grep FOO /usr/src/linux/**/*.h


    say, i want to search in the dir
    ~/web/xahlee_org/

    but no more than 2 levels deep, and only files ending in “.html”. This
    is not a toy question. I actually need to do that.

    > However, on older systems or on windows, that may produce a too long
    > command line.  Alternatively, you can use the -R option to grep a
    > directory recursively, and specify an include globbing pattern (or many,
    > and/or one or many exclude patterns).
    >
    >   % grep -R FOO --include='*.h' /usr/src/linux/
    >
    > You can also use a combination of `find', `xargs' and `grep' (with some
    > complications for allowing spaces in file names [-print0 to find]), or,
    > when using zsh, you can use
    >
    >   % zargs /usr/src/linux/**/*.h -- grep FOO
    >
    > which does all relevant quoting and stuff for you.


    problem with find xargs is that they spawn grep for each file, which
    becomes too slow to be usable.
    To not use xargs but “find ... -exec” instead is possible of course
    but i always have problems with the syntax...

    > > • unix grep and associated tool bag (sort, wc, uniq, pipe, sed, awk,
    > > …) is not flexible. When your need is slightly more complex, unix
    > > shell tool bag can't handle it. For example, suppose you need to find
    > > a string in HTML file only if the string happens inside another tag.
    > > (extending the limit of unix tool bag is how Perl was born in 1987.)

    >
    > There are many things you can also do with a plain shell script.  I'm
    > always amazed how good and concise you can do all sorts of file/text
    > manipulation using `zsh' builtins.


    never really got into bash for shell scripting... sometimes tried but
    the ratio power/syntax isn't tolerable. Knowing perl well pretty much
    killed any possible incentive left.

    .... in the late 1990s, my thought was that i'd just learn perl well
    and never need to learn another lang or shell for any text processing
    and sys admin tasks for personal use. The thinking was that it'd be
    efficient in the sense of not having to waste time learning multiple
    langs for doing the same thing (not counting job requirements in a
    company). So i wrote a lot of perl scripts for find & replace and
    file management stuff, and tried to make them as general as possible.
    lol. But what turned out is that, over the years, for one reason or
    another, i just learned python, php, then in 2007 elisp. Maybe the
    love for languages inevitably won over my
    one-powerful-coherent-system efficiency obsession. But also, i ended
    up rewriting many of my text processing scripts in each lang. I guess
    part of it is exercise when learning a new lang.

    .... anyway, i guess i am randomly babbling, but one thing i learned
    is that for misc text processing scripts, the idea of writing one
    generic, flexible, powerful script once and for all just doesn't
    work, because the coverage is too wide and the tasks that need to be
    done at any one time are too specific. (and i think this makes sense,
    because the idea of one language or one generic script for everything
    is mostly ideology, not practical need. If we look at the real world,
    it's almost always a disparate mess of components and systems.)

    my text processing scripts end up being a mess. There are several
    versions in different langs. A few are general, but most were
    basically used once, or in one particular year only (many of them do
    more or less the same thing). When i need to do some particular task,
    i find it easier just to write a new one in whatever lang is
    currently in my brain's memory than to spend time fishing out and
    revisiting old scripts.

    some concrete example...

    e.g. i wrote this general script in 2000, intended to be a one-stop
    shop for all find/replace needs:

    〈Perl: Find & Replace on Multiple Files〉
    http://xahlee.org/perl-python/find_replace_perl.html

    in 2005, while i was learning python, i wrote (several) versions in
    python. e.g.

    〈Python: Find & Replace Strings in Unicode Files〉
    http://xahlee.org/perl-python/find_replace_unicode.html

    it's not a port of the perl code. The python version doesn't have as
    many features as the perl one, but for some reason i stopped using
    the perl version. I didn't need all the perl version's features, and
    when i do need them, i have several other python scripts that each
    address a particular need (e.g. one for unicode, one for multiple
    pairs in one shot, one for regex, one for plain text, one for find
    only, one for find+replace, several for find/replace only if a
    particular condition is met, etc.).

    then in 2006, i fell into the emacs hole and started to learn elisp.
    In the process, i realized that elisp for text processing is more
    powerful than perl or python. That's not due to lisp the lang, but
    more due to emacs the text-editing environment and system. I tried to
    explain this in a few places, but mostly here:

    〈Text Processing: Emacs Lisp vs Perl〉
    http://xahlee.org/emacs/elisp_text_processing_lang.html

    so, all my new scripts for text processing are in elisp. A few of my
    python scripts i still use, but almost everything is now in elisp.

    also, sometime in 2008, i grew a shell script that processes weblogs
    using the usual unix bag of cat, grep, awk, sort, uniq. It's about
    100 lines. You can see it here:

    http://xahlee.org/comp/weblog_process.sh

    at one time i wondered why i was doing it. Didn't i think that perl
    would replace all shell scripts? I gave it a little thought, and the
    conclusion is that for this task, the shell script is actually more
    efficient and simpler to write. Possibly if i had started with perl
    for this task i might have ended up with well-structured code that's
    not necessarily less efficient... but you know, things in life aren't
    all planned. It began when i just needed a few lines of grep to see
    something in my web log. Then, over the years, i added another line,
    another line, then another, all need-based. If at any of those times
    i had thought “let's scratch this and restart with perl”, that would
    have been a waste of time. Besides that, i have some doubt that perl
    would do a better job here. With shell tools, each line just does one
    simple thing, with piping. To do it in perl, one would have to read
    in the huge log file, then maintain some data structure and try to
    parse it... too much memory and thinking involved. And if i coded the
    perl by emulating the shell code line-by-line, it would make no sense
    to do it in perl, since it's just the shell bag in perl.
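
    to give the flavor, a typical line in such a script is just one
    small pipeline, e.g. (hypothetical log format, for illustration
    only, not a line from the actual script):

    # top requesting IPs for one page, counted and sorted
    grep 'GET /emacs/' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head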

    Also note, this shell script can't be replaced by elisp, because elisp
    is not suitable when the file size is large.

    well, that's my story — extempore! ☺

    Xah Lee
     
    Xah Lee, Feb 8, 2011
    #2

  3. On Tue, 08 Feb 2011 13:51:54 +0100, Petter Gustad wrote:

    > Xah Lee <> writes:
    >
    >> problem with find xargs is that they spawn grep for each file, which
    >> becomes too slow to be usable.

    >
    > find . -maxdepth 2 -name '*.html' -print0 | xargs -0 grep whatever
    >
    > will call grep with a list of filenames given by find, only a single
    > grep process will run.
    >
    > //Petter


    This is getting off-topic for the listed newsgroups and into
    comp.unix.shell (although the question was originally posed in a MS
    windows context).

    The 'modern' way to do this is
    find . -maxdepth 2 -name '*.html' -exec grep whatever {} +

    The key thing which makes this 'modern' is the '+' at the end of the
    command, rather than '\;'. This causes find to execute the grep once per
    group of files, rather than once per file.
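
    For example, the same search both ways:

    find . -maxdepth 2 -name '*.html' -exec grep whatever {} \;   # one grep per file
    find . -maxdepth 2 -name '*.html' -exec grep whatever {} +    # one grep per batch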
     
    Icarus Sparry, Feb 8, 2011
    #3
  4. Icarus Sparry <> writes:

    > The 'modern' way to do this is
    > find . -maxdepth 2 -name '*.html' -exec grep whatever {} +


    Agreed, I've noticed that recent versions of find have the + option.
    I remember in the old days the exec method was considered bad since
    it would fork grep once for each file, so I've got used to using
    xargs. I always used to quote "{}" as well, but this does not seem to
    be required in later versions of find.

    In terms of the number of forks, the above will be similar to xargs,
    as they both have to make sure that they don't overflow the
    command-line length limit.


    Petter
    --
    ..sig removed by request.
     
    Petter Gustad, Feb 8, 2011
    #4
  5. Xah Lee Guest

    On Feb 8, 9:32 am, Icarus Sparry <> wrote:
    > On Tue, 08 Feb 2011 13:51:54 +0100, Petter Gustad wrote:
    > > Xah Lee <> writes:

    >
    > >> problem with find xargs is that they spawn grep for each file, which
    > >> becomes too slow to be usable.

    >
    > > find . -maxdepth 2 -name '*.html' -print0 | xargs -0 grep whatever

    >
    > > will call grep with a list of filenames given by find, only a single
    > > grep process will run.

    >
    > > //Petter

    >
    > This is getting off-topic for the listed newsgroups and into
    > comp.unix.shell (although the question was originally posed in a MS
    > windows context).
    >
    > The 'modern' way to do this is
    > find . -maxdepth 2 -name '*.html' -exec grep whatever {} +
    >
    > The key thing which makes this 'modern' is the '+' at the end of the
    > command, rather than '\;'. This causes find to execute the grep once per
    > group of files, rather than once per file.


    Nice. When was the + introduced?

    Xah
     
    Xah Lee, Feb 8, 2011
    #5
  6. On Tue, 08 Feb 2011 14:30:53 -0800, Xah Lee wrote:

    > On Feb 8, 9:32 am, Icarus Sparry <> wrote:

    [snip]
    >> The 'modern' way to do this is
    >> find . -maxdepth 2 -name '*.html' -exec grep whatever {} +
    >>
    >> The key thing which makes this 'modern' is the '+' at the end of the
    >> command, rather than '\;'. This causes find to execute the grep once
    >> per group of files, rather than once per file.

    >
    > Nice. When was the + introduced?


    Years ago! The posix spec for find lists it in the page which has a
    copyright of 2001-2004.

    http://pubs.opengroup.org/onlinepubs/009695399/utilities/find.html

    Using google, I have come up with this reference from 2001

    https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=3067

    in which David Korn reports writing the code in 1987.
     
    Icarus Sparry, Feb 9, 2011
    #6
  7. [Icarus Sparry <>]

    > The 'modern' way to do this is
    > find . -maxdepth 2 -name '*.html' -exec grep whatever {} +


    Actually, I think it should be

    find . -maxdepth 2 -name '*.html' -exec grep whatever /dev/null {} +

    because grep behaves differently when given only one filename as opposed
    to several.
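
    Alternatively, GNU grep's -H option forces the filename prefix even
    for a single file, so this should work too (-H is a common
    extension, but not in every older grep):

    find . -maxdepth 2 -name '*.html' -exec grep -H whatever {} +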

    --
    * Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
    - It is undesirable to believe a proposition
    when there is no ground whatsoever for supposing it is true.
    -- Bertrand Russell
     
    Harald Hanche-Olsen, Feb 9, 2011
    #7
  9. Tassilo Horn Guest

    Xah Lee <> writes:

    >> You can rely on shell globbing, so that grep gets a list of all files in
    >> all subdirectories.  For example, I can grep all header files of the
    >> linux kernel using
    >>
    >>   % grep FOO /usr/src/linux/**/*.h

    >
    > say, i want to search in the dir
    > ~/web/xahlee_org/
    >
    > but no more than 2 levels deep, and only files ending in “.html”. This
    > is not a toy question. I actually need to do that.


    % grep FOO ~/web/xahlee_org/*{,/*}.html

    That'll grep files like ~/web/xahlee_org/bla.html as well as
    ~/web/xahlee_org/bla/bla.html, but not any deeper.
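
    In zsh you can preview exactly what that glob expands to before
    grepping (print -rl is a zsh builtin; one file per line):

    % print -rl -- ~/web/xahlee_org/*{,/*}.html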

    >> However, on older systems or on windows, that may produce a too long
    >> command line.  Alternatively, you can use the -R option to grep a
    >> directory recursively, and specify an include globbing pattern (or many,
    >> and/or one or many exclude patterns).
    >>
    >>   % grep -R FOO --include='*.h' /usr/src/linux/
    >>
    >> You can also use a combination of `find', `xargs' and `grep' (with some
    >> complications for allowing spaces in file names [-print0 to find]), or,
    >> when using zsh, you can use
    >>
    >>   % zargs /usr/src/linux/**/*.h -- grep FOO
    >>
    >> which does all relevant quoting and stuff for you.

    >
    > problem with find xargs is that they spawn grep for each file, which
    > becomes too slow to be usable.


    I can see no speed difference between find | xargs grep and grep with a glob...

    > To not use xargs but “find ... -exec” instead is possible of course
    > but i always have problems with the syntax...


    Yeah, there are so many ways. ;-)

    >> There are many things you can also do with a plain shell script.  I'm
    >> always amazed how good and concise you can do all sorts of file/text
    >> manipulation using `zsh' builtins.

    >
    > never really got into bash for shell scripting... sometimes tried but
    > the ratio power/syntax isn't tolerable. Knowing perl well pretty much
    > killed any possible incentive left.


    Yeah, perl is a swiss army knife, but I never got comfortable with it.

    > Also note, this shell script can't be replaced by elisp, because elisp
    > is not suitable when the file size is large.


    You could chunk the file and handle the parts separately, in order
    not to have everything in one emacs buffer and thus run out of RAM.
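
    A sketch of the chunking idea from the shell side (assuming a
    line-oriented log; split is POSIX):

    % split -l 100000 access.log part-   # 100k-line chunks: part-aa, part-ab, ...

    then run the elisp over each part- file in turn.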

    Bye,
    Tassilo
     
    Tassilo Horn, Feb 9, 2011
    #9
  10. At 09:39 PM 2/9/2011, Rob Warnock wrote:
    >Harald Hanche-Olsen <> wrote:

    [snip]
    >Years & years ago, right after I learned about "xargs", I got burned
    >several times on "find | xargs grep pat" when the file list was long
    >enough that "xargs" fired up more than one "grep"... and the last
    >invocation was given only one arg!! IT FOUND THE PATTERN, BUT DIDN'T
    >TELL ME WHAT !@^%!$@#@! FILE IT WAS IN!! :-{
    >
    >The trailing "/dev/null" fixes that. ;-}


    I find that I need periodic review of the grep -l, -L, -h and -H
    options. I'm surprised when other people forget about these too. The
    -H option is your heart's desire.
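
    For reference (-l is in POSIX; the others are common GNU/BSD
    extensions):

    grep -l pat *.html   # print only names of files with matches
    grep -L pat *.html   # print only names of files without matches
    grep -h pat *.html   # suppress filename prefixes in the output
    grep -H pat *.html   # always print filename prefixes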

    >-Rob
    >
    >-----
    >Rob Warnock <>
    >627 26th Avenue <URL:http://rpw3.org/>
    >San Mateo, CA 94403 (650)572-2607
     
    Thomas L. Shinnick, Feb 10, 2011
    #10
  11. (Rob Warnock) writes:

    > invocation was given only one arg!! IT FOUND THE PATTERN, BUT DIDN'T
    > TELL ME WHAT !@^%!$@#@! FILE IT WAS IN!! :-{


    Sounds frustrating, but grep -H will always print the filename, even
    when given a single filename on the command line.

    //Petter
    --
    ..sig removed by request.
     
    Petter Gustad, Feb 10, 2011
    #11
  12. Hello,

    On Thu, Feb 10, 2011 at 07:52:34AM +0100, Petter
    Gustad wrote:
    > (Rob Warnock) writes:
    > > invocation was given only one arg!! IT FOUND
    > > THE PATTERN, BUT DIDN'T TELL ME WHAT
    > > !@^%!$@#@! FILE IT WAS IN!! :-{

    >
    > Sounds frustrating, but grep -H will always
    > print the filename, even when given a single
    > filename on the command line.


    not on HP-UX, though. Not even with "export
    UNIX_STD=2003"

    P.S. "find ... -exec ... +" works there just fine.

    --
    With best regards,
    xrgtn
     
    Alexander Gattin, Feb 11, 2011
    #12
  13. Hello,

    On Tue, Feb 08, 2011 at 05:32:05PM +0000, Icarus
    Sparry wrote:
    > The key thing which makes this 'modern' is the
    > '+' at the end of the command, rather than '\;'.
    > This causes find to execute the grep once per
    > group of files, rather than once per file.


    many thanks to you, man!

    I'm surprised to find out that this works on HP-UX
    B.11.31 and SunOS 5.9 (but not on HP Tru64 UNIX
    V5.1B).

    --
    With best regards,
    xrgtn
     
    Alexander Gattin, Feb 11, 2011
    #13
  14. Xah Lee Guest

    On Feb 11, 2:06 am, Alexander Gattin <> wrote:
    > Hello,
    >
    > On Tue, Feb 08, 2011 at 05:32:05PM +0000, Icarus
    >
    > Sparry wrote:
    > > The key thing which makes this 'modern' is the
    > > '+' at the end of the command, rather than '\;'.
    > > This causes find to execute the grep once per
    > > group of files, rather than once per file.

    >
    > many thanks to you, man!
    >
    > I'm surprised to find out that this works on HP-UX
    > B.11.31 and SunOS 5.9 (but not on HP Tru64 UNIX
    > V5.1B).


    Is HP-UX still alive?

    lol. in 2000 i ported our ecommerce web app from Solaris to it. I was
    not exactly thrilled. At the time, i vaguely recall, the HP sales
    guys came to us and told us they had this heart-beat technology ...

    Xah
     
    Xah Lee, Feb 11, 2011
    #14
