Sharing: File Reader Generator with & w/o Policy

Mark H Harris

hi folks, I am posting to share a File Reader Generator which I have
been playing with, that simplifies reading of text files on-demand:
like log files, config files, small record flat data-bases, &c.

I have two generators to share, one with & one without "policy".
The idea is to have the generator open and close the file (with error
checking: a try/finally block) and then maintain its state for on-demand
reading, either into memory (as a list or dict) or for in-line processing.

I will demonstrate the generators here, and then post the code
following. The generator will be reading a path+filename of a local disk
file and printing it as in this simple case without policy:
print(record)

The quick brown fox jumped
over the lazy dog's tail.

Now is the time for all
good women to come to the
aid of computer science!

The second generator adds "policy" to the generator processing and
yields tuples, rather than strings. Each tuple contains the record
number (from zero), and record length (minus the line end), and the
record itself (stripped of the line end):
print(record)

(0, 26, 'The quick brown fox jumped')
(1, 25, "over the lazy dog's tail.")
(2, 0, '')
(3, 23, 'Now is the time for all')
(4, 25, 'good women to come to the')
(5, 24, 'aid of computer science!')

I will now share the source by allowing the fName(filename) utility
to expose itself. Enjoy:
print(record)

#---------------------------------------------------------
# fName(filename) generator: file reader iterable
#---------------------------------------------------------
def fName(filename):
    try:
        fh = open(filename, 'r')
    except FileNotFoundError as err_code:
        print (err_code)
    else:
        while True:
            linein = fh.readline()
            if (linein!=''):
                yield(linein.strip('\n'))
            else:
                break
        fh.close()
    finally:
        None

#---------------------------------------------------------
# fnName(filename) generator: file reader iterable
#---------------------------------------------------------
def fnName(filename):
    try:
        fh = open(filename, 'r')
    except FileNotFoundError as err_code:
        print (err_code)
    else:
        line_count = 0
        while True:
            linein = fh.readline()
            if (linein!=''):
                lineout = linein.strip('\n')
                length = len(lineout)
                yield((line_count, length, lineout))
                line_count+=1
            else:
                break
        fh.close()
    finally:
        None

#---------------------------------------------------------
# {next util}
#---------------------------------------------------------
mark h harris
 
MRAB

#---------------------------------------------------------
# fName(filename) generator: file reader iterable
#---------------------------------------------------------
def fName(filename):
    try:
        fh = open(filename, 'r')
    except FileNotFoundError as err_code:
        print (err_code)
    else:
        while True:
            linein = fh.readline()
            if (linein!=''):
                yield(linein.strip('\n'))
            else:
                break
        fh.close()
    finally:
        None

I don't like how it always swallows the exception, so you can't tell
whether the file doesn't exist or exists but is empty, and there's no
way to specify the file's encoding.

Why do you have the 'finally' clause with 'None' in it? Instead of None
you should have 'pass', or, better yet, omit the clause entirely.

You can also shorten it somewhat:

def fName(filename):
    try:
        with open(filename, 'r') as fh:
            for linein in fh:
                yield linein.strip('\n')
    except FileNotFoundError as err_code:
        print(err_code)

[snip]
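A minimal sketch of the two complaints above, not code from the thread: the name read_lines and its encoding parameter are assumptions made for illustration. Nothing is caught inside the generator, so a missing file and an empty file can be told apart by the caller.

```python
import os
import tempfile


def read_lines(filename, encoding="utf-8"):
    """Yield each line of a text file without its trailing newline.

    Hypothetical variant of fName(): no exception is caught here, so a
    missing file raises FileNotFoundError in the caller on first iteration.
    """
    with open(filename, "r", encoding=encoding) as fh:
        for line in fh:
            yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    # An existing but empty file simply yields nothing.
    empty_path = os.path.join(tmp, "empty.txt")
    open(empty_path, "w", encoding="utf-8").close()
    empty_result = list(read_lines(empty_path))

    # A missing file raises, and the caller decides what to do about it.
    missing_path = os.path.join(tmp, "no_such_file.txt")
    try:
        list(read_lines(missing_path))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(empty_result, missing_raised)
```

Because the function is a generator, open() is not called until the first item is requested, so the exception surfaces where the caller iterates rather than where read_lines() is called.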
 
Mark H Harris

I don't like how it always swallows the exception, so you can't tell
whether the file doesn't exist or exists but is empty, and no way to
specify the file's encoding.

Yes, the error handling needs more robustness, and instead of printing
the error code, my actual model on the system will log it.

Why do you have the 'finally' clause with 'None' in it? Instead of None
you should have 'pass', or, better yet, omit the clause entirely.

It's a stub-in really, and that's all at this point. The 'finally'
happens regardless of whether the exception occurs, and I don't need
anything there yet; I just don't want to forget it.

I've been playing around with wrapping generators within generators for
readability and simplicity. Like this, where I'm going to wrap the
fnName(filename) generator within a getnumline(filename) wrapper:
def getnumline(filename):
    for record in fnName(filename):
        yield(record)

Or this, where I put it all in memory as a dict:
d1[line[0]]=(line[1], line[2])
print (d1[key])

(26, 'The quick brown fox jumped')
(25, "over the lazy dog's tail.")
(0, '')
(23, 'Now is the time for all')
(25, 'good women to come to the')
(24, 'aid of computer science!')
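One way the in-memory dict above might be built, as a sketch only: the dict name d1 and the indexing follow the fragment shown, while numbered_lines is a hypothetical stand-in for fnName() that reads from a list of strings instead of a disk file so the example is self-contained.

```python
SAMPLE = [
    "The quick brown fox jumped\n",
    "over the lazy dog's tail.\n",
    "\n",
    "Now is the time for all\n",
    "good women to come to the\n",
    "aid of computer science!\n",
]


def numbered_lines(lines):
    """Yield (record number, length, text) like fnName(), from any iterable."""
    for count, raw in enumerate(lines):
        text = raw.strip("\n")
        yield (count, len(text), text)


# Key each record by its number; the value keeps the length and the text.
d1 = {}
for line in numbered_lines(SAMPLE):
    d1[line[0]] = (line[1], line[2])

for key in d1:
    print(d1[key])
```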
marcus
 
Mark H Harris

def fName(filename):
    try:
        with open(filename, 'r') as fh:
            for linein in fh:
                yield linein.strip('\n')
    except FileNotFoundError as err_code:
        print(err_code)

[snip]

The "with" confuses me because I am not sure specifically what happens
in the context manager. I'm taking it for granted in this case that
__exit__() closes the file?

I am finding many examples of file handling using the context manager,
but none so far that wrap it in a generator; more often they use a file
object. Is there a preference for file object over generator?

marcus
 
Mark H Harris

You can also shorten it somewhat:

Thanks, I like it... I shortened the fnName() also:

#---------------------------------------------------------
# fn2Name(filename) generator: file reader iterable
#---------------------------------------------------------
def fn2Name(filename):
    try:
        with open(filename, 'r') as fh:    <=========== can you tell me
            line_count = 0
            for linein in fh:
                lineout = linein.strip('\n')
                length = len(lineout)
                yield((line_count, length, lineout))
                line_count+=1
    except FileNotFoundError as err_code:
        print(err_code)

#---------------------------------------------------------


.... where I can go to find out (for specific contexts) what the
__enter__() and __exit__() are actually doing, like for instance in this
case does the file get closed in __exit__(), and also if errors
occur does the file close automatically? thanks

marcus
 
Mark Lawrence

Start here
http://docs.python.org/3/library/stdtypes.html#context-manager-types
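As a quick check of the question being answered here, the following sketch shows that leaving a with block closes the file, both on a normal exit and when an exception escapes the block. The temporary file and the variable names are assumptions made for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w") as fh:
        fh.write("one\ntwo\n")

    # Normal exit: the file object's __exit__() closes the file.
    with open(path, "r") as fh:
        first = fh.readline()
    closed_after_normal_exit = fh.closed

    # Exit through an exception: the file is still closed.
    try:
        with open(path, "r") as fh2:
            raise ValueError("simulated failure while reading")
    except ValueError:
        pass
    closed_after_error = fh2.closed

print(first.strip(), closed_after_normal_exit, closed_after_error)
```

When the with block sits inside a generator, it is only exited once the generator is exhausted, explicitly closed, or garbage-collected, so a half-consumed generator keeps its file open until then.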
 
Steven D'Aprano

hi folks, I am posting to share a File Reader Generator which I have
been playing with, that simplifies reading of text files on-demand: like
log files, config files, small record flat data-bases, &c.

Reading from files is already pretty simple. I would expect that it will
be harder to learn the specific details of custom, specialised, file
readers that *almost*, but not quite, do what you want, than to just
write a couple of lines of code to do what you need when you need it.
Particularly for interactive use, where robustness is less important than
ease of use.

I have two generators to share, one with & one without "policy".

What's "policy"?

The idea is to have the generator open and close the file (with error
checking: try-finish block) and then maintain its state for on-demand
reading either into memory (as list or dict) or for in-line processing.

I will demonstrate the generators here, and then post the code
following. The generator will be reading a path+filename of a local disk
file and printing it as in this simple case without policy:

print(record)

The quick brown fox jumped
over the lazy dog's tail.

What's "fName" mean? "File name"? That's a horribly misleading name,
since it *takes* a file name as argument, it doesn't return one. That
would be like renaming the len() function to "list", since it takes a
list as argument. Function and class names should be descriptive, giving
at least a hint as to what they do.

It looks to me that this fName just iterates over the lines in a file,
which makes it pretty close to just:

for line in open(path + "my_fox"):
    print(line)


The second generator adds "policy" to the generator processing and
yields tuples, rather than strings. Each tuple contains the record
number (from zero), and record length (minus the line end), and the
record itself (stripped of the line end):

I presume that "record" here means "line", rather than an actual record
from a flat file with fixed-width fields, or some delimiter other than
newlines.

for i, line in enumerate(open(pathname + "my_fox")):
    print((i, len(line), line))
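
Note that len(line) in the two-liner above counts the trailing newline, so each length comes out one larger than in the tuples from the original post. A sketch that reproduces the posted tuples, using io.StringIO as a stand-in for the my_fox file (an assumption, so the example runs without a disk file):

```python
import io

# Stand-in for open(pathname + "my_fox"); a real file object iterates the same way.
fox = io.StringIO(
    "The quick brown fox jumped\n"
    "over the lazy dog's tail.\n"
    "\n"
    "Now is the time for all\n"
)

records = []
for i, line in enumerate(fox):
    text = line.rstrip("\n")  # drop the line end before measuring
    records.append((i, len(text), text))

for record in records:
    print(record)
```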

print(record)

What's "fnName" mean? Perhaps "filename name"? "function name"? Again,
the name gives no hint as to what the function does.

def fName(filename):
    try:
        fh = open(filename, 'r')
    except FileNotFoundError as err_code:
        print (err_code)

For interactive use, this is *just barely* acceptable as a (supposedly)
user-friendly alternative to a stack trace.

[Aside: I don't believe that insulating programmers from tracebacks does
them any favours. Like the Dark Side of the Force, hiding errors is
seductively attractive, but ultimately harmful, since error tracebacks
are intimidating to beginners but an essential weapon in the battle
against buggy code. But reading tracebacks is a skill programmers have to
learn. Hiding tracebacks does them no favours, it just makes it harder
for them to learn good debugging skills, and encourages them to treat
errors as *something to hide* rather than *something to fix*.]

But as a reusable tool for use in non-interactive code, this function
fails badly. By capturing the exception, it makes it painfully difficult
for the caller to have control over error-handling. You cannot let the
exception propagate to some other part of the application for handling;
you cannot log the exception, or ignore it, or silently swallow the
exception and try another file. The fName function makes the decision for
you: it will print the error to standard output (not even standard
error!) no matter what you want. That's the very essence of *user-
hostile* for library code.

Worse, it's inconsistent! Some errors are handled normally, with an
exception. It's only FileNotFoundError that is captured and printed. So
if the user wants to re-use this function and do something with any
exceptions, she has to use *two* forms of error handling:

(1) wrap it in try...except handler to capture any exception other
than FileNotFoundError; and

(2) intercept writes to standard out, capture the error message, and
reverse-engineer what went wrong.


instead of just one.
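
A sketch of the single-mechanism alternative: if the generator catches nothing, the caller chooses whether to log, ignore, or fall back to another file. The names iter_lines and first_readable are hypothetical, and the fallback policy shown is only one of many a caller might pick.

```python
import logging
import os
import tempfile


def iter_lines(filename):
    """Yield the lines of a file; any OSError propagates to the caller."""
    with open(filename, "r") as fh:
        yield from fh


def first_readable(candidates):
    """Return the lines of the first candidate that opens, logging failures."""
    for name in candidates:
        try:
            return list(iter_lines(name))
        except OSError as err:  # FileNotFoundError, PermissionError, ...
            logging.warning("skipping %s: %s", name, err)
    return []


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.txt")
    with open(good, "w") as fh:
        fh.write("hello\n")
    missing = os.path.join(tmp, "missing.txt")
    lines = first_readable([missing, good])

print(lines)
```

Here the logging call sends the message to standard error by default, and a different caller could just as easily re-raise or ignore the error, because the decision is no longer baked into the reader.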

    else:
        while True:
            linein = fh.readline()
            if (linein!=''):
                yield(linein.strip('\n'))
            else:
                break
        fh.close()

Apart from stripping newlines, which is surely better left to the user
(what if they need to see the newline? by stripping them automatically,
the user cannot distinguish between a file which ends with a newline
character and one which does not), this part is just a re-invention of
the existing wheel. File objects are already iterable, and yield the
lines of the file.
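
The information loss described above can be seen in a small sketch. The two io.StringIO objects are stand-ins for files (an assumption for a self-contained example); they differ only in whether the last line ends with a newline.

```python
import io

with_newline = io.StringIO("alpha\nbeta\n")
without_newline = io.StringIO("alpha\nbeta")

# File-like objects are already iterable and yield lines with their endings.
raw_a = list(with_newline)
raw_b = list(without_newline)

stripped_a = [line.strip("\n") for line in raw_a]
stripped_b = [line.strip("\n") for line in raw_b]

print(raw_a == raw_b)            # False: the raw lines preserve the difference
print(stripped_a == stripped_b)  # True: stripping hides it
```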

    finally:
        None

The finally clause is pointless, and not even written idiomatically as a
do-nothing statement ("pass").

def fnName(filename):
    try:
        fh = open(filename, 'r')
    except FileNotFoundError as err_code:
        print (err_code)
    else:
        line_count = 0
        while True:
            linein = fh.readline()
            if (linein!=''):
                lineout = linein.strip('\n')
                length = len(lineout)
                yield((line_count, length, lineout))
                line_count+=1
            else:
                break
        fh.close()
    finally:
        None


This function re-implements the fName function, except for a simple
addition. It could be written as:

def fnName(filename):
    for count, line in enumerate(fName(filename)):
        yield (count, len(line), line)
 
Mark H Harris


Thanks Mark. I have three books open, and that doc, and wading through.
You might like to know (as an aside) that I'm done with gg. Got back up
here with a real news reader and server. All is good that way. gg has
not been stable over the past three weeks, and this weekend it
completely quit working. It looks like this reader|client handles the
line wrapping correctly. whoohoo.

marcus
 
Mark H Harris

Reading from files is already pretty simple. I would expect that it will
be harder to learn the specific details of custom, specialised, file
readers that *almost*, but not quite, do what you want, than to just
write a couple of lines of code to do what you need when you need it.
Particularly for interactive use, where robustness is less important than
ease of use.

Yes. What I'm finding is that I'm coding the same 4-6 lines of code
with every file open (I do want error handling, at least for
FileNotFoundError) and I only want it to be two lines, read the file
into a list with error handling.

What's "policy"?

Part of what I personally struggle with (frequently) is whether to
place the policy in the generator or handle it on the outside. For
instance, I normally strip the line-end and I want to know the record
lengths. I also may want to know the record number from arrival
sequence. This policy can be handled in the generator, although I could
have handled it outside too.


for i, line in enumerate(open(pathname + "my_fox")):
    print((i, len(line), line))

I like it... and this is where I've always been, when I finally said to
myself, yuk. Yes, it technically works very well. But it's ugly. And I
don't mean it's technically ugly, I mean it's aesthetically ugly and not
user-easy-to-read. (I know that's all subjective.)

for line in getnumline(path+"my_foxy"):
    print(line)

In this case getnumline() is a generator wrapper around fName(). It of
course doesn't do anything different than the two lines you listed, but
it is immediately easier to tell what is happening; even if you're not
an experienced python programmer.

[Aside: I don't believe that insulating programmers from tracebacks does
them any favours.

Yes. I think you're right about that. But what if they're not
programmers; what if they're just application users that don't have a
clue what a trace-back is, and just want to know that the file does not
exist? And right away they realize that, oops, I spelled the filename
wrong. Yeaah, I struggle with this as I'm trying to simplify, because
personally I want to see the trace back info.

Worse, it's inconsistent! Some errors are handled normally, with an
exception. It's only FileNotFoundError that is captured and printed. So
if the user wants to re-use this function and do something with any
exceptions, she has to use *two* forms of error handling:

Yes. The exception handling needs to handle all normal errors.

(1) wrap it in try...except handler to capture any exception other
than FileNotFoundError; and

(2) intercept writes to standard out, capture the error message, and
reverse-engineer what went wrong.

Ok.


Apart from stripping newlines, which is surely better left to the user
(what if they need to see the newline? by stripping them automatically,
the user cannot distinguish between a file which ends with a newline
character and one which does not), this part is just a re-invention of
the existing wheel. File objects are already iterable, and yield the
lines of the file.

Yes, this is based on my use case, which never needs the line-ends, in
fact they are a pain. These files are variable record length and the
only thing the newline is used for is delimiting the records.

def fnName(filename):
    for count, line in enumerate(fName(filename)):
        yield (count, len(line), line)

I like this, thanks! enumerate and I are becoming friends.

I like this case philosophically because it is a both | and. The policy
is contained in the wrapper generator using enumerate() and len()
leaving the fName() generator to produce the line.

And you are right about another thing, I just want to use this thing
over and over.

for line in getnumline(filename):
    {whatever}

There does seem to be just one way of doing this (file reads) but
there are actually many ways of doing this. Is a file object really
better than a generator, are there good reasons for using the generator,
are there absolute cases for using a file object?

marcus
 
Chris Angelico

And you are right about another thing, I just want to use this thing over
and over.

for line in getnumline(filename):
    {whatever}

There does seem to be just one way of doing this (file reads) but there
are actually many ways of doing this. Is a file object really better than a
generator, are there good reasons for using the generator, are there
absolute cases for using a file object?

I recommend you read up on the Rule of Three. Not the comedic
principle - although that's worth knowing about too - but the
refactoring rule. [1]

As a general rule, code should be put into a function when it's been
done three times the same way. It depends a bit on how similar the
versions are, of course; having two places where the exact same thing
is done might well be enough to refactor, and sometimes you need to
see four or five places doing something only broadly similar before
you can figure out what the common part is, but most of the time,
three usages is the point to give it a name.

There's a cost to refactoring. Suddenly there's a new primitive on the
board - a new piece of language. If you can't give it a good name,
that's potentially a high cost. Splitting out all sorts of things into
generators when you could use well-known primitives like enumerate
gets expensive fast - what's the difference between fName and fnName?
I certainly wouldn't be able to call that, without actually looking
them up.

Let your use-cases justify your refactoring.

ChrisA


[1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)
 
Mark H Harris

There's a cost to refactoring. Suddenly there's a new primitive on the
board - a new piece of language . . . Splitting out all sorts of things into
generators when you could use well-known primitives like enumerate
gets expensive fast {snip}


[1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)

Very good to remember. I am finding the temptation to make all kinds of
generators (as you noted above). It's just that the Python generator
makes it so easy to define a function that maintains state between calls
(of next() in this case) and so it's also so easy to want to use them...
almost forgetting about primitives!

And the rule of three is one of those things that sneaks up on oneself.
I have actually coded about seven (7) such cases when I discovered that
they were all identical. I am noticing that folks code the same file
reader cases "with open() as fh: yadda yadda" and I've noticed that
they are all pretty close to the same. Wouldn't it be nice to have one
simpler getline() or getnumline() name that does this one simple thing
once and for all. But as simple as it is, it isn't. Well, as you say,
use cases need to determine code refactoring.

The other thing I'm tempted to do is to find names (even new names) that
read like English closely (whatever I mean by that) so that there is no
question about what is going on to a non expert.

for line in getnumline(file):
    {whatever}

Well, what if there were a project called SimplyPy, or some such, that
boiled the python language down to a (Rexx like) or (BASIC like) syntax
and usage so that ordinary folks could code out problems (like they did
in 1964) and expert users could use it too including everything else
they know about python? Would it be good?

A SimplyPy coder would use constructs similar to other procedural
languages (like Rexx, Pascal, even C) and without knowing the plethora
of Python intrinsics could solve problems, yet not be an "expert".

SimplyPy would be a structured subset of the normal language for
learning and use (very small book/tutorial/ think the Rexx handbook, or
the K&R).

It's a long way off, and I'm just now experimenting. I'm trying to get my
hands around context managers (and other things). This is an idea I got
from Anthony Briggs' Hello Python! (foreword by Steve Holden) from Manning
books. It's very small, lightweight, handles real work, but it's still
too big. I am wanting to condense it even further, providing the minimal
basic core language as an end application product rather than the
"expert" computer science language that will run under it.

or, over it, as you like.

(you think this is a nutty idea?)

marcus
 
Chris Angelico

There's a cost to refactoring. Suddenly there's a new primitive on the
board - a new piece of language . . . Splitting out all sorts of things
into generators when you could use well-known primitives like enumerate
gets expensive fast {snip}


[1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)


Very good to remember. I am finding the temptation to make all kinds of
generators (as you noted above). Its just that the python generator makes it
so easy to define a function that maintains state between calls (of next()
in this case) and so its also so easy to want to use them... almost
forgetting about primitives!

General rule of thumb: Every object in the same namespace should be
readily distinguishable by name alone. And if doing that makes your
names so long that the function signature is longer than the function
body, it might be better to not have that as a function :)

Also, I'd consider something code smell if a name is used in only one
place. Maybe not so much with local variables, as there are other
reasons to separate things out, but a function that gets called from
only one place probably doesn't need to exist at top-level. (Of
course, a published API will often seem to have unused or little-used
functions, because they're being provided to the caller. They don't
count.)

And the rule of three is one of those things that sneaks up on oneself. I
have actually coded about seven (7) such cases when I discovered that they
were all identical. I am noticing that folks code the same file reader cases
"with open() as fh: yadda yadda" and I've noticed that they are all pretty
close to the same. Wouldn't it be nice to have one simpler getline() or
getnumline() name that does this one simple thing once and for all. But as
simple as it is, it isn't. Well, as you say, use cases need to determine
code refactoring.

If getline() is doing nothing that the primitive doesn't, and
getnumline is just enumerate, then they're not achieving anything
beyond shielding you from the primitives.

The other thing I'm tempted to do is to find names (even new names) that
read like English closely (whatever I mean by that) so that there is no
question about what is going on to a non expert.

for line in getnumline(file):
    {whatever}

The trouble is that your idea of getnumline(file) might well differ
from someone else's idea of getnumline(file). Using Python's
primitives removes that confusion - if you see enumerate(file), you
know exactly what it's doing, even in someone else's code.

Well, what if there were a project called SimplyPy, or some such, that
boiled the python language down to a (Rexx like) or (BASIC like) syntax and
usage so that ordinary folks could code out problems (like they did in 1964)
and expert users could use it too including everything else they know about
python? Would it be good?

A SimplyPy coder would use constructs similar to other procedural languages
(like Rexx, Pascal, even C) and without knowing the plethora of Python
intrinsics could solve problems, yet not be an "expert".

SimplyPy would be a structured subset of the normal language for learning
and use (very small book/tutorial/ think the Rexx handbook, or the K&R).

Its a long way off, and I'm just now experimenting. I'm trying to get my
hands around context managers (and other things). This is an idea I got from
Anthony Briggs' Hello Python! (forward SteveHolden) from Manning books. Its
very small, lite weight, handles real work, but--- its still too big. I am
wanting to condense it even further, providing the minimal basic core
language as an end application product rather than the "expert" computer
science language that will run under it.

or, over it, as you like.

(you think this is a nutty idea?)

To be quite frank, yes I do think it's a nutty idea. Like most nutty
things, there's a kernel of something good in it, but that's not
enough to build a system on :)

Python is already pretty simple. The trouble with adding a layer of
indirection is that you'll generally be limiting what the code can do,
which is usually a bad idea for a general purpose programming
language, and also forcing you to predict everything the programmer
might want to do. Or you might have an "escape clause" that lets the
programmer drop to "real Python"... but as soon as you allow that, you
suddenly force the subsequent reader to comprehend all of Python,
defeating the purpose.

We had a discussion along these lines a little while ago, about
designing a DSL [1] for window creation. On one side of the debate was
"hey look how much cleaner the code is if I use this DSL", and on the
other side was "hey look how much work you don't have to do if you
just write code directly". The more cushioning between the programmer
and the language, the more the cushion has to be built to handle
everything the programmer might want to do. Python is a buffer between
me and C. C is itself a buffer between me and assembly language. Each
of them provides something that I want, but each of them has to be so
extensive as to be able to handle _anything_ I might want to write.
(Or, pretty much anything. Sometimes I find that a high level language
lacks some little thing - recently I was yearning for a beep feature -
and find that I can shell out to some external utility to do it for
me.) Creating SimplyPy would put the onus on you to make it possible
to write general code in it, and I think you'll find it's just not
worth trying - more and more you'll want to add features from Python
itself, until you achieve the inner-platform effect. [2]

Note that there are times when this sort of cushioning and limiting
are absolutely appropriate. Sometimes you want to limit end users to
certain pieces of functionality only, and the easiest way to do it is
to create a special-purpose language that is interpreted by a (say)
Python script. Or maybe you want to write a dice roller that takes a
specific notation like "2d8 + d6 (fire) + 2d6 (sneak attack) + 4
(STR)" [3] and interprets that appropriately. But the idea isn't to
simplify general programming, then, and if you're doing that sort of
thing, you still might want to consider a general-purpose language
(Lua's good for that, as is JavaScript/ECMAScript). That's not what
you're suggesting.
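
To make the special-purpose-notation case concrete, here is a minimal sketch of a parser for the dice notation quoted above. It is an illustration only and not the Minstrel Hall implementation; the names TERM, parse and roll, and the tuple layout, are assumptions.

```python
import random
import re

# One term: optional sign, then either NdM dice or a flat number,
# optionally followed by a parenthesised label such as "(fire)".
TERM = re.compile(r"\s*([+-]?)\s*(?:(\d*)d(\d+)|(\d+))\s*(?:\(([^)]*)\))?")


def parse(notation):
    """Return a list of (sign, count, sides, label); sides == 0 means a constant."""
    notation = notation.strip()
    terms = []
    pos = 0
    while pos < len(notation):
        match = TERM.match(notation, pos)
        if not match:
            raise ValueError(f"cannot parse {notation[pos:]!r}")
        sign, count, sides, flat, label = match.groups()
        factor = -1 if sign == "-" else 1
        if sides is not None:
            # "d6" with no count means a single die.
            terms.append((factor, int(count or 1), int(sides), label))
        else:
            terms.append((factor, int(flat), 0, label))
        pos = match.end()
    return terms


def roll(notation, rng=random):
    """Roll every dice term, add the constants, and return the signed total."""
    total = 0
    for factor, count, sides, _label in parse(notation):
        if sides:
            total += factor * sum(rng.randint(1, sides) for _ in range(count))
        else:
            total += factor * count
    return total


spec = "2d8 + d6 (fire) + 2d6 (sneak attack) + 4 (STR)"
parsed = parse(spec)
print(parsed)
print(roll(spec, random.Random(0)))
```

The point of the sketch is the limited scope: the notation can express dice, constants and labels and nothing else, which is what makes a small interpreter like this practical.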

ChrisA

[1] Domain-specific language. I'm never sure whether to footnote these
kinds of acronyms, but I want to clarify that I am not talking about a
copper-based internet connection here.
[2] http://thedailywtf.com/Articles/The_Inner-Platform_Effect.aspx and
https://en.wikipedia.org/wiki/Inner-platform_effect
[3] I actually have a dice roller that does exactly that as part of
Minstrel Hall - http://minstrelhall.com/
 
Mark H Harris

Good stuff Chris, and thanks for the footnotes, I appreciate it.

If getline() is doing nothing that the primitive doesn't, and
getnumline is just enumerate, then they're not achieving anything
beyond shielding you from the primitives.

Yes. getline(fn) is returning the raw line minus the newline \n.
And getnumline(fn) is 1) creating a name that is easily recognizable,
and 2) shielding the 'user' from the primitives; yup.

The trouble is that your idea of getnumline(file) might well differ
from someone else's idea of getnumline(file). Using Python's
primitives removes that confusion

I am seeing that; esp for folks used to seeing the primitives; don't
want confusion.

To be quite frank, yes I do think it's a nutty idea. Like most nutty
things, there's a kernel of something good in it, but that's not
enough to build a system on :)

Thanks for your candor. I appreciate that too. Well, like I said,
I'm just experimenting with the idea right now, just playing around
really. In the process I'm coming more up-to-speed with python3.3 all
the time. :)

Python is already pretty simple.

statement == True

We had a discussion along these lines a little while ago, about
designing a DSL [1] for window creation. On one side of the debate was
"hey look how much cleaner the code is if I use this DSL", and on the
other side was "hey look how much work you don't have to do if you
just write code directly".

Was that on python-dev, or python-ideas, or here? I'd like to read
through it sometime.

Well just for grins, here is the updated my_utils.py for comparison with
where I started tonight, ending like before with the code:
print (line)

The quick brown fox jumped
over the lazy dog's tail.

Now is the time for all
good women to come to the
aid of computer science!

print (line)

(0, 26, 'The quick brown fox jumped')
(1, 25, "over the lazy dog's tail.")
(2, 0, '')
(3, 23, 'Now is the time for all')
(4, 25, 'good women to come to the')
(5, 24, 'aid of computer science!')

print (line)

#---------------------------------------------------------
# __fOpen__(filename) generator: file open internal
#---------------------------------------------------------
def __fOpen__(filename):
    try:
        with open(filename, 'r') as fh:
            for linein in fh:
                yield linein.strip('\n')
    except FileNotFoundError as err_code:
        print(err_code)
        # think about error handling, logging
    finally:
        pass

#---------------------------------------------------------
# getnumline(filename) generator: enumerated file reader
#---------------------------------------------------------
def getnumline(filename):
    for count, line in enumerate(__fOpen__(filename)):
        yield((count, len(line), line))

#---------------------------------------------------------
# getline(filename) generator: raw file reader iterable
#---------------------------------------------------------
def getline(filename):
    for line in __fOpen__(filename):
        yield(line)

#---------------------------------------------------------
# {next util}
#---------------------------------------------------------
 
Chris Angelico

Thanks for your candor. I appreciate that too. Well, like I said, I'm
just experimenting with the idea right now, just playing around really. In
the process I'm coming more up-to-speed with python3.3 all the time. :)

Good, glad you can take it the right way :) Learning is not by doing
whatever you like and being told "Oh yes, very good job" like in
kindergarten. Learning is by doing something (or proposing doing
something) and getting solid feedback. Of course, that feedback may be
wrong - your idea might be brilliant even though I believe it's a bad
one - and you need to know when to stick to your guns and drive your
idea forward through the hail of oncoming ... okay, this metaphor's
getting a bit tangled in its own limbs... okay, the meta-metaphor is
getting... alright I'm stopping now.

We had a discussion along these lines a little while ago, about
designing a DSL [1] for window creation. On one side of the debate was
"hey look how much cleaner the code is if I use this DSL", and on the
other side was "hey look how much work you don't have to do if you
just write code directly".

Was that on python-dev, or python-ideas, or here? I'd like to read
through it sometime.

Was here on python-list:

https://mail.python.org/pipermail/python-list/2014-January/664617.html
https://mail.python.org/pipermail/python-list/2014-January/thread.html#664617

The thread rambled a bit, but if you like reading, there's some good
content in there. You'll see some example code from my Pike MUD
client, Gypsum, and Steven D'Aprano and I discuss it. If you'd rather
skip most of the thread and just go to the bits I'm talking about,
here's my explanation of the Pike code:

https://mail.python.org/pipermail/python-list/2014-January/665286.html

And here's Steven's take on it:

https://mail.python.org/pipermail/python-list/2014-January/665356.html

And keep reading from there. TL;DR: It's not perfect as a DSL, but
it's jolly good as something that is already there and takes no
effort.

ChrisA
 

Steven D'Aprano

The other thing I'm tempted to do is to find names (even new names) that
read like English closely (whatever I mean by that) so that there is no
question about what is going on to a non expert.

for line in getnumline(file):
    {whatever}

I'm not an expert on your code, and I have very little idea what that is
supposed to do. Judging by the name "getnumline", my first guess is that
the function takes a line number n, and it will return the nth line of
some source:

getnumline(source, 5)
=> returns the 5th line from source

But that's not how you use it. You pass it a "file". Is that a file
object, or a file name? My guess is that it would be a file object, since
if you wanted a file name you would have written getnumline(filename). Is
that a file object that is open for reading or writing? I'd have to guess
that it's open for reading, since you're (probably?) "getting" from the
file rather than "putting".

So... something like this:

file = open("some thing")
for line in getnumline(file):
    ...

Presumably it iterates over the lines of the file, but what it does with
the lines is hard to say. If I had to guess, I'd say... maybe it's
extracting the lines that start with a line number? Something like this
perhaps?

def getnumline(file_object):
    count = 0  # Or start at 1?
    while True:
        line = file_object.readline()
        if line == '':
            break
        if line.startswith(str(count)):
            yield line
        count += 1

But this is only a guess, based on the assumption that while the function
name is misleading, it's not *entirely* misleading. I'm puzzled why the
function claims to do something with "line" singular, when you're
obviously using it to iterate over lines plural.

Contrast that with an example from the Python built-ins: enumerate. What
you get is exactly what it says on the tin: the function is called
enumerate, and enumerate is what it does:


enumerate
v 1: specify individually; "She enumerated the many obstacles
she had encountered"; "The doctor recited the list of
possible side effects of the drug" [syn: enumerate,
recite, itemize, itemise]
2: determine the number or amount of; "Can you count the books
on your shelf?"; "Count your change" [syn: count, number,
enumerate, numerate]
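
To make the contrast concrete, here is a minimal sketch of enumerate doing what its name says: pairing each item with a running count. The sample list is made up for illustration.

```python
lines = ["The quick brown fox jumped", "over the lazy dog's tail."]

# enumerate pairs each item with its position, counting from zero by default.
numbered = list(enumerate(lines))

# The optional start argument changes the first count.
numbered_from_one = list(enumerate(lines, start=1))

for count, line in numbered:
    print(count, line)
```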
 
