Efficient grep using Python?

Discussion in 'Python' started by Jane Austine, Dec 16, 2004.

  1. Jane Austine

    Jane Austine Guest

    [Fredrik Lundh]
    >>> bdict = dict.fromkeys(open(bfile).readlines())
    >>>
    >>> for line in open(afile):
    >>>     if line not in bdict:
    >>>         print line,
    >>>
    >>> </F>
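Here is one possible Python 3 port of Fredrik's filter. It is a sketch, not his code. `lines_only_in` is a hypothetical name, and the `with` blocks (which close the files promptly) are an addition. `print line,` becomes `print(line, end="")`. The `.readlines()` call is kept so the port matches the original.

```python
import sys


def lines_only_in(afile, bfile):
    """Yield each line of afile that does not appear verbatim in bfile.

    Hypothetical helper: a Python 3 port of the dict.fromkeys() approach.
    """
    # Build a dict whose keys are the lines of bfile (every value is None);
    # only the keys matter, since they give fast membership tests.
    with open(bfile) as b:
        bdict = dict.fromkeys(b.readlines())
    # Stream afile one line at a time, keeping lines missing from bfile.
    with open(afile) as a:
        for line in a:
            if line not in bdict:
                yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    for line in lines_only_in(sys.argv[1], sys.argv[2]):
        # Lines already end in "\n", so suppress print()'s own newline.
        print(line, end="")
```

Run it as `python script.py afile bfile` (the script name is hypothetical). Lines are compared with their trailing newline included, so a final line in `afile` that lacks one will not match an otherwise identical line in `bfile`.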


    [Tim Peters]
    >> Note that an open file is an iterable object, yielding the lines in
    >> the file. The "for" loop exploited that above, but fromkeys() can
    >> also exploit it. That is,
    >>
    >> bdict = dict.fromkeys(open(bfile))
    >>
    >> is good enough (there's no need for the .readlines()).
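The following small sketch illustrates Tim's point. It uses `io.StringIO` as a stand-in for a real file, which is an assumption made so the example runs without touching disk. Both spellings end up with the same keys. The only difference is whether a complete list of lines is built first.

```python
import io

# io.StringIO stands in for an open text file: iterating over it yields
# one line at a time, trailing "\n" included.
text = "alpha\nbeta\nalpha\ngamma\n"

# Old spelling: readlines() builds a full list of lines first.
with_list = dict.fromkeys(io.StringIO(text).readlines())

# Tim's spelling: hand the file object itself to fromkeys(), which pulls
# lines from it one at a time.
direct = dict.fromkeys(io.StringIO(text))

# Same keys either way; the duplicate "alpha\n" collapses to one key.
print(list(direct))
```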


    [/F]
    > (sigh. my brain knows that, but my fingers keep forgetting)
    >
    > and yes, for this purpose, "dict.fromkeys" can be replaced
    > with "set".
    >
    > bdict = set(open(bfile))
    >
    > (and then you can save a few more bytes by renaming the
    > variable...)
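For this use, swapping `set` in for `dict.fromkeys` changes only the spelling. The hedged sketch below checks that both containers answer the `in` test the same way. As before, `io.StringIO` stands in for `bfile`.

```python
import io

text = "alpha\nbeta\ngamma\n"

# dict.fromkeys() builds a dict whose values are all None; only the keys
# are ever consulted by the membership test.
bdict = dict.fromkeys(io.StringIO(text))

# set() keeps just the keys, which is all the filter needs.
bset = set(io.StringIO(text))

for probe in ("beta\n", "delta\n", "beta"):
    # Both containers agree. "beta" without its newline is not a match,
    # because lines are stored with their trailing "\n".
    assert (probe in bdict) == (probe in bset)
    print(repr(probe), probe in bset)
```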


    [Tim Peters]
    > Except the latter two are just shallow spelling changes. Switching
    > from fromkeys(open(f).readlines()) to fromkeys(open(f)) is much more
    > interesting, since it can allow major reduction in memory use. Even
    > if all the lines in the file are pairwise distinct, not materializing
    > them into a giant list can be a significant win. I wouldn't have
    > bothered replying if the only point were that you can save a couple
    > bytes of typing <wink>.
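The next sketch gives a rough view of the cost Tim describes when all lines are pairwise distinct. In that case the dict ends up holding every line string under either spelling, so the extra memory from `.readlines()` is the temporary list itself, which has to exist alongside the growing dict. The in-memory buffer and the line count are arbitrary choices, and the sketch measures only that list container.

```python
import io
import struct
import sys

# 100,000 pairwise-distinct lines, held in memory as a stand-in for a file.
n = 100_000
text = "".join(f"line {i}\n" for i in range(n))

# The readlines() spelling materializes this list before fromkeys() runs.
lines = io.StringIO(text).readlines()

# sys.getsizeof() reports the list object itself: its header plus one
# pointer slot per element (the strings are not counted). The pointer
# array alone is at least n * pointer-size bytes.
pointer_size = struct.calcsize("P")
list_bytes = sys.getsizeof(lines)
print(f"{len(lines)} lines, list container alone: {list_bytes} bytes")

# Passing the file object directly never builds that container.
keys = dict.fromkeys(io.StringIO(text))
```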


    fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
    equivalent.

    When I pass an iterator instance (or a generator iterator) to
    dict.fromkeys, it is expanded at that moment, thus fromkeys(open(f))
    is effectively the same as fromkeys(list(open(f))) and
    fromkeys(open(f).readlines()).

    Am I missing something?

    Jane
     
    Jane Austine, Dec 16, 2004
    #1
    1. Advertising

  2. Tim Peters

    Tim Peters Guest

    [Jane Austine]
    > fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
    > equivalent.


    Semantically, yes; pragmatically, no, in the way explained before.

    > When I pass an iterator instance (or a generator iterator) to
    > dict.fromkeys, it is expanded at that moment,


    I don't know what "expanded at that moment" means to you. The CPython
    implementation of dict.fromkeys() alternates between getting the next
    value from its iterable argument, and storing that value as a dict
    key. It does that regardless of whether a list, or any other kind of
    iterable object, is passed to it. So the difference isn't in
    fromkeys(), it's in what's passed to fromkeys().
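The alternation Tim describes in CPython's dict.fromkeys() can be observed directly. This illustration is not from the thread. `Key` and `produce` are hypothetical helpers that log when an item is produced and when the dict hashes it for insertion. The recorded order reflects CPython's behavior, and other Python implementations might differ.

```python
log = []


class Key:
    """Hypothetical key type that records when the dict hashes it."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        log.append(f"hash {self.n}")
        return hash(self.n)

    def __eq__(self, other):
        return isinstance(other, Key) and self.n == other.n


def produce(count):
    """Generator that records each value just before handing it out."""
    for i in range(count):
        log.append(f"next {i}")
        yield Key(i)


# Passing the generator directly: fromkeys() asks for one item, inserts it
# (hashing it), then asks for the next one.
dict.fromkeys(produce(3))
lazy_order = log[:]

# Wrapping it in list() first: every item is produced before any insert.
log.clear()
dict.fromkeys(list(produce(3)))
eager_order = log[:]

print(lazy_order)
print(eager_order)
```

In the first run, "next" and "hash" entries interleave. In the second, all three "next" entries come before any "hash". That is the pragmatic difference Tim points to: what gets built before fromkeys() starts, not anything inside fromkeys().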

    > thus fromkeys(open(f)) is effectively the same as
    > fromkeys(list(open(f))) and fromkeys(open(f).readlines()).


    Semantically, yes; and the last two are pragmatically the same too.
    The first is pragmatically different.

    > Am I missing something?


    You at least were <wink>.

    Build a file containing a million long identical lines (so the dict
    only has 1 entry in the end). Try all 3 spellings and watch their
    memory use. Report what you find.
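One way to run Tim's experiment today is with the standard `tracemalloc` module. This is a hedged sketch. The line length, the line count (scaled down from a million so it runs quickly), and the helper names are all assumptions. `tracemalloc` also only sees allocations made through Python's allocator, so the numbers are indicative rather than exact.

```python
import os
import tempfile
import tracemalloc

# Hypothetical sizes: raise N toward a million to match Tim's suggestion.
N = 100_000
LINE = "x" * 200 + "\n"  # one long line, repeated N times


def peak_bytes(build):
    """Return the peak traced allocation (bytes) while build() runs."""
    tracemalloc.start()
    try:
        build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write(LINE * N)


def via_readlines():
    with open(path) as f:
        return dict.fromkeys(f.readlines())


def via_list():
    with open(path) as f:
        return dict.fromkeys(list(f))


def via_file():
    with open(path) as f:
        return dict.fromkeys(f)


try:
    results = {fn.__name__: peak_bytes(fn)
               for fn in (via_readlines, via_list, via_file)}
finally:
    os.remove(path)

for name, peak in results.items():
    print(f"{name:>14}: peak {peak / 1e6:.1f} MB")
```

The first two spellings hold all N line strings in a list at once, so their peaks grow with the file size. When the file object is passed directly, each duplicate line can be freed as soon as its insert finds the existing key, so that peak stays small.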
     
    Tim Peters, Dec 16, 2004
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. sf
    Replies: 15, Views: 802
    Christos TZOTZIOY Georgiou, Dec 17, 2004
  2. Milo Thurston

    Using array.select with grep

    Milo Thurston, Aug 1, 2008, in forum: Ruby
    Replies: 15, Views: 182
    David A. Black, Aug 1, 2008
  3. Mmcolli00 Mom
    Replies: 3, Views: 145
    Joel VanderWerf, May 14, 2009
  4. Dan King

    Using popen & grep

    Dan King, Apr 12, 2010, in forum: Ruby
    Replies: 2, Views: 209
    Robert Klemme, Apr 13, 2010
  5. Simon Harrison

    Using grep on subarrays - help!

    Simon Harrison, Apr 3, 2011, in forum: Ruby
    Replies: 11, Views: 218
    7stud --, Apr 5, 2011
Loading...

Share This Page