Efficient grep using Python?


Jane Austine

[Fredrik Lundh]
[Tim Peters]
[/F]
(sigh. my brain knows that, but my fingers keep forgetting)

and yes, for this purpose, "dict.fromkeys" can be replaced
with "set".

bdict = set(open(bfile))

(and then you can save a few more bytes by renaming the
variable...)
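For readers who want to see the set-based lookup in context, here is a minimal sketch. It assumes the goal is roughly "print the lines of one file that do not occur in another", which this excerpt does not spell out; the helper name `lines_not_in` and the demo file names are made up for illustration.

```python
import os
import tempfile


def lines_not_in(a_path, b_path):
    """Yield lines of a_path that do not occur in b_path (hypothetical helper)."""
    with open(b_path) as bf:
        b_lines = set(bf)          # iterate the file directly; no intermediate list
    with open(a_path) as af:
        for line in af:            # stream the first file line by line
            if line not in b_lines:
                yield line


# Tiny demo files; names and contents are invented for this example.
with tempfile.TemporaryDirectory() as tmpdir:
    a_path = os.path.join(tmpdir, "a.txt")
    b_path = os.path.join(tmpdir, "b.txt")
    with open(a_path, "w") as f:
        f.write("one\ntwo\nthree\n")
    with open(b_path, "w") as f:
        f.write("two\nfour\n")
    kept = list(lines_not_in(a_path, b_path))

print(kept)
```

Lines are compared with their trailing newlines intact, so a last line without a newline would not match an otherwise identical line that has one.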

[Tim Peters]
Except the latter two are just shallow spelling changes. Switching
from fromkeys(open(f).readlines()) to fromkeys(open(f)) is much more
interesting, since it can allow major reduction in memory use. Even
if all the lines in the file are pairwise distinct, not materializing
them into a giant list can be a significant win. I wouldn't have
bothered replying if the only point were that you can save a couple
bytes of typing <wink>.

fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
equivalent.

When I pass an iterator instance (or a generator iterator) to
dict.fromkeys, it is expanded at that moment, thus fromkeys(open(f))
is effectively the same as fromkeys(list(open(f))) and
fromkeys(open(f).readlines()).

Am I missing something?

Jane
 

Tim Peters

[Jane Austine]
fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
equivalent.

Semantically, yes; pragmatically, no, in the way explained before.

[Jane Austine]
When I pass an iterator instance (or a generator iterator) to
dict.fromkeys, it is expanded at that moment,

I don't know what "expanded at that moment" means to you. The CPython
implementation of dict.fromkeys() alternates between getting the next
value from its iterable argument, and storing that value as a dict
key. It does that regardless of whether a list, or any other kind of
iterable object, is passed to it. So the difference isn't in
fromkeys(), it's in what's passed to fromkeys().
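The alternation described above can be sketched in pure Python. This is only an illustration of the behavior, not CPython's actual C code; `fromkeys_sketch`, `noisy_lines`, and the `log` list are hypothetical names introduced here to make the interleaving visible.

```python
def fromkeys_sketch(iterable, value=None, log=None):
    """Hypothetical pure-Python stand-in for dict.fromkeys."""
    d = {}
    for key in iterable:          # get the next item from the iterable...
        d[key] = value            # ...and store it as a key before asking again
        if log is not None:
            log.append(("stored", key))
    return d


def noisy_lines(lines, log):
    """Yield items one at a time, recording when each one is produced."""
    for line in lines:
        log.append(("produced", line))
        yield line


log = []
data = ["a\n", "b\n", "a\n"]
result = fromkeys_sketch(noisy_lines(data, log), log=log)

for event in log:
    print(event)
```

The log shows "produced" and "stored" events strictly interleaved. Passing `list(noisy_lines(data, log))` instead would record every "produced" event before the first "stored" one, because the list is fully built before the sketch ever sees it.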

[Jane Austine]
thus fromkeys(open(f)) is effectively the same as
fromkeys(list(open(f))) and fromkeys(open(f).readlines()).

Semantically, yes; and the last two are pragmatically the same too.
The first is pragmatically different.

[Jane Austine]
Am I missing something?

You at least were <wink>.

Build a file containing a million long identical lines (so the dict
only has 1 entry in the end). Try all 3 spellings and watch their
memory use. Report what you find.
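One possible way to run that experiment is sketched below. It makes a few assumptions: the file is scaled down from a million lines so it finishes quickly, `tracemalloc` peak figures are used as a stand-in for watching process memory, and `peak_bytes` is a hypothetical helper name.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(build, path):
    """Return (peak traced bytes, number of keys) for one way of building the dict."""
    tracemalloc.start()
    try:
        with open(path) as f:
            d = build(f)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, len(d)


# Scaled down from the suggested million lines; raise N_LINES to taste.
N_LINES = 20_000
LINE = "x" * 200 + "\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(LINE for _ in range(N_LINES))
    path = tmp.name

spellings = {
    "fromkeys(f)": lambda f: dict.fromkeys(f),
    "fromkeys(list(f))": lambda f: dict.fromkeys(list(f)),
    "fromkeys(f.readlines())": lambda f: dict.fromkeys(f.readlines()),
}
results = {name: peak_bytes(build, path) for name, build in spellings.items()}
os.remove(path)

for name, (peak, n_keys) in results.items():
    print(f"{name:26} keys={n_keys} peak={peak:,} bytes")
```

All three spellings end with a one-key dict. The two that build a list first should report a peak roughly proportional to the file size, while iterating the file directly should stay near the size of a line plus the read buffer. Exact figures will vary by Python version and platform.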
 
