Is a Pythonic version of scanf() or sscanf() planned?

ryniek90

Hi

I know that in Python we can do the same with regexps or *.split()*,
but that's a longer and less practical method than *scanf()*. I also found
this recipe ( http://code.activestate.com/recipes/502213/ ), but the code
doesn't look so simple for beginners. So, is a core Python implementation
of *scanf()* planned, or has one ever been planned? (Preferably as a
batteries-included method.)

Thanks for any helpful answers.
 
Grant Edwards

So, is a core Python implementation of *scanf()* planned, or has one
ever been planned?

One of the first things I remember being taught as a C programmer
was to never use scanf. Programs that use scanf tend to fail
in rather spectacular ways when presented with simple typos and
other forms of unexpected input.

Given the bad behavior and general fragility of scanf(), I
doubt there's much demand for something equally broken for
Python.

Thanks for any helpful answers.

Not sure if mine was helpful...
 
Martien Verbruggen

One of the first things I remember being taught as a C programmer
was to never use scanf. Programs that use scanf tend to fail
in rather spectacular ways when presented with simple typos and
other forms of unexpected input.

That's right. One shouldn't use scanf() if the input is unpredictable
or comes from people, because the pitfalls are many and hard to avoid.
However, for input that is formatted, scanf() is perfectly fine, and
nice and fast.

fgets() followed by sscanf() gives you better control if your input is
not guaranteed to be well formed.

Given the bad behavior and general fragility of scanf(), I
doubt there's much demand for something equally broken for
Python.

scanf() is not broken. It's just hard to use correctly for unpredictable
input.

Having something equivalent in Python would be nice where most or all of
your input is numerical, i.e. floats or integers. Using the re module,
or splitting and converting everything with int() or float() slows down
your program rather spectacularly. If those conversions could be done
internally, and before storing the input as Python strings, the speed
improvements could be significant.

There is too much storing, splitting, regex matching and converting
going on if you need to read numerical data from columns in a file.
scanf() and friends make this sort of task rather quick and easy.

For example, if your data is the output of a numerical analysis program
or data coming from a set of measuring probes, it often takes the form
of one or more columns of numbers, and there can be many of them. If you
want to take one of these output files, and transform the data, Python
can be terribly slow.

It doesn't have to be scanf(), but something that would allow the direct
reading of text input as numerical data would be nice.

On the other hand, if something really needs to be fast, I generally
write it in C anyway :)

Martien
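To make the column-reading task Martien describes concrete, here is a minimal pure-Python sketch that uses only the standard library. The function name `read_columns` is hypothetical. Every field is first stored as a `str` by `split()` and only then converted with `float()`. That two-step store-then-convert work is exactly what Martien is objecting to. No speed claims are made for this sketch.

```python
import io
from typing import Iterable, List


def read_columns(lines: Iterable[str]) -> List[List[float]]:
    """Hypothetical helper: turn whitespace-separated numeric text into
    one list of floats per column.

    Each line is first split into str fields, then each field is converted
    with float(); blank lines are skipped.
    """
    rows = (map(float, line.split()) for line in lines if line.strip())
    # zip(*rows) transposes rows into columns (it stops at the shortest row)
    return [list(column) for column in zip(*rows)]


# Usage: two columns of measurements, with one blank line in between
data = io.StringIO("1 2.5\n3 4\n\n5 6.25\n")
print(read_columns(data))
```

Because `zip()` stops at the shortest row, ragged input is silently truncated. A real reader for probe data might prefer to reject such lines instead.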
 
Simon Forman

I haven't tried it but couldn't you use scanf from ctypes?
 
Martien Verbruggen

I haven't tried it but couldn't you use scanf from ctypes?

I have just tried it. I wasn't aware of ctypes, being relatively new to
Python. :)

However, using ctypes makes the simple test program I wrote
actually slower, rather than faster. Probably the extra conversions
needed between ctypes' internal types and Python's types eat up more time.
Built-in scanf()-like functionality would not need to convert the same
information two or three times. It would parse the bytes coming in from
the input stream directly, and set the values of the appropriate Python
variables directly.

A contrived example:
Assume an input file with two integers and three floats per line,
separated by spaces. The output should be the same two integers, followed
by the average of the three floats.

In pure Python, there is string manipulation (file.readline() and
split()) and float conversion going on:

import sys

for line in sys.stdin:
    # Split the line into five string fields, then convert the three floats
    a, b, u, v, w = line.split()
    print(a, b, (float(u) + float(v) + float(w)) / 3.0)

(17.57s user 0.07s system 99% cpu 17.728 total)

With ctypes, it becomes something like:

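import sys
from ctypes import byref, c_float, c_int, cdll
from ctypes.util import find_library

libc = cdll.LoadLibrary(find_library('c'))
a = c_int()
b = c_int()
u = c_float()
v = c_float()
w = c_float()
# sscanf() takes a char *, so feed it bytes (sys.stdin.buffer), not str
for line in sys.stdin.buffer:
    # sscanf() returns the number of fields it converted; skip bad lines
    if libc.sscanf(line, b"%d%d%f%f%f",
                   byref(a), byref(b), byref(u), byref(v), byref(w)) != 5:
        continue
    print("{0} {1} {2}".format(a.value, b.value,
                               (u.value + v.value + w.value) / 3.0))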

(22.21s user 0.10s system 98% cpu 22.628)

We no longer need split() and the three string-to-float conversions, but
now we have the five ctypes objects and the .value dereferences at the
end. And that makes it slower, unfortunately. (Maybe I'm still doing
things a bit clumsily and it could be faster.)

It's not really a big deal: As I said before, if I really need the
speed, I'll write C:

#include <stdio.h>

int main(void)
{
    int a, b;
    float u, v, w;

    while (scanf("%d%d%f%f%f", &a, &b, &u, &v, &w) == 5)
        printf("%d %d %f\n", a, b, (u + v + w) / 3.0);

    return 0;
}

(5.96s user 0.06s system 99% cpu 6.042 total)

Martien
 
TerryP

In the last 4 years, I have never missed functions like .*scanf() or
atoi().

It's probably a greeaaat thing that Python provides neither as
built-ins (per se).
 
Dennis Lee Bieber

In the last 4 years, I have never missed functions like .*scanf() or
atoi().

It's probably a greeaaat thing that Python provides neither as
built-ins (per se).

Uhm... Isn't the second one spelled "int()" <G>
 
ryniek90

Uhm... Isn't the second one spelled "int()" <G>

OK, thanks all for the answers. Apart from .split() methods and regexps,
there's nothing interesting.
But I remember that lambda was also unwelcome in Python, yet in the end
it's there and doing well. So maybe someday someone will decide to put
an alternative, really great implementation of scanf() into Python?
 
Ben Sizer

Hi

I know that in Python we can do the same with regexps or *.split()*,
but that's a longer and less practical method than *scanf()*. I also found
this recipe (http://code.activestate.com/recipes/502213/), but the code
doesn't look so simple for beginners. So, is a core Python implementation
of *scanf()* planned, or has one ever been planned? (Preferably as a
batteries-included method.)

Perhaps struct.unpack is close to what you need? Admittedly that
doesn't read from a file, but that might not be a problem in most
cases.
 
Simon Brunning

2009/10/8 ryniek90 said:
OK, thanks all for the answers. Apart from .split() methods and regexps,
there's nothing interesting.
But I remember that lambda was also unwelcome in Python, yet in the end
it's there and doing well. So maybe someday someone will decide to put
an alternative, really great implementation of scanf() into Python?

Write one, post it on Google Code, the Python cookbook or somewhere,
and if the world beats a path to your door then we'll talk.
 
Terry Reedy

ryniek90 said:
OK, thanks all for the answers. Apart from .split() methods and regexps,
there's nothing interesting.
But I remember that lambda was also unwelcome in Python, yet in the end
it's there and doing well. So maybe someday someone will decide to put
an alternative, really great implementation of scanf() into Python?

scanf does three things: parses string fields out of text, optionally
converts strings to numbers, and puts the results into pointed-to boxes.
Since Python does not have pointer types, a Python function cannot really
do the last, so it has to return a tuple of objects instead. However, if a
format string has named rather than positional fields, a Python function
could either return a dict or set attributes on an object. That could be
useful.

If I were doing this, I would look into using the new str.format()
strings rather than %-formatted strings.
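Here is one way Terry's idea could look, as a minimal sketch. It reads field names and specs out of a `str.format()`-style string with the standard library's `string.Formatter().parse()`, builds a regular expression with named groups, and returns a dict. The names `scan`, `_SPECS` and `_literal_to_regex`, and the choice of `d`/`f`/`s` specs, are assumptions made for illustration. This is not an existing library API. It requires named fields and does not handle field widths, `%`-style formats, or positional `{}` fields.

```python
import re
from string import Formatter

# Hypothetical table: format spec -> (regex for the field, converter).
# Only a few scanf()-like conversions are supported in this sketch.
_SPECS = {
    "d": (r"[-+]?\d+", int),
    "f": (r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?", float),
    "s": (r"\S+", str),
    "": (r"\S+", str),
}


def _literal_to_regex(literal):
    # A run of whitespace in the format matches one or more whitespace
    # characters in the input; all other literal text must match exactly.
    return r"\s+".join(re.escape(part) for part in re.split(r"\s+", literal))


def scan(fmt, text):
    """Hypothetical scanf()-like parser using str.format()-style fields.

    Fields must be named, e.g. "{count:d} {mean:f}". Returns a dict of
    converted values, or None if the text does not match the format.
    """
    regex = []
    converters = {}
    for literal, name, spec, _conversion in Formatter().parse(fmt):
        regex.append(_literal_to_regex(literal))
        if name is None:  # trailing literal text with no field after it
            continue
        pattern, convert = _SPECS[spec]
        regex.append("(?P<%s>%s)" % (name, pattern))
        converters[name] = convert
    match = re.fullmatch("".join(regex), text.strip())
    if match is None:
        return None
    return {name: converters[name](value)
            for name, value in match.groupdict().items()}


print(scan("{a:d} {b:d} {u:f}", "1 2 3.5\n"))
```

Returning a dict keeps the sketch short. Setting attributes on an object, as Terry also suggests, would only change the last line, for example by building a `types.SimpleNamespace` from the dict.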
 
Dennis Lee Bieber

Perhaps struct.unpack is close to what you need? Admittedly that
doesn't read from a file, but that might not be a problem in most
cases.

I suspect the biggest drawback is that it doesn't do string->numeric
conversions, so one still has to run int(), float(), whatever() on the
fields.

It works great though if one needs to split up fixed-width records
which may not have delimiters, or is working with binary records in
which the data is already numeric.
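The following minimal sketch covers both uses Dennis mentions, using only the standard `struct` module. The record layouts (`5s5s8s` for a fixed-width text record, `<if` for a binary one) and the names `TEXT_RECORD`, `BINARY_RECORD` and `parse_text_record` are made up for illustration.

```python
import struct

# Hypothetical fixed-width text layout: two 5-byte integer fields followed
# by one 8-byte float field, with no delimiters (18 bytes per record).
TEXT_RECORD = struct.Struct("5s5s8s")


def parse_text_record(record):
    # unpack() only slices the bytes into fields; the str->number
    # conversion still has to be done with int() and float().
    a, b, x = TEXT_RECORD.unpack(record)
    return (int(a.decode("ascii")), int(b.decode("ascii")),
            float(x.decode("ascii")))


# For binary records the data is already numeric, so unpack() returns
# numbers directly. "<if" = little-endian 4-byte int + 4-byte float.
BINARY_RECORD = struct.Struct("<if")

print(parse_text_record(b"   12   -3  2.5000"))  # fields padded to width
print(BINARY_RECORD.unpack(BINARY_RECORD.pack(7, 1.5)))
```

`unpack()` requires the input to be exactly `TEXT_RECORD.size` bytes long, so a line read from a file would need its trailing newline stripped first.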
 
Joshua Kugler

ryniek90 said:
So maybe someday someone will decide to put
an alternative, really great implementation of scanf() into Python?

My idea of a "great scanf() function" would be a clever combination of
re.match(), int(), and float().

j
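Here is a minimal sketch of that combination. It assumes the line format from Martien's test program earlier in the thread (two integers and three floats). `scan_line`, `LINE_RE` and `CONVERTERS` are hypothetical names. `re.match()` does the field splitting and validation, and `int()` and `float()` do the conversion. Returning `None` for a non-matching line plays roughly the role of checking `sscanf()`'s return value.

```python
import re

# Hypothetical field patterns for the line format used in Martien's test:
# two integers followed by three floats, separated by whitespace.
INT = r"[-+]?\d+"
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
LINE_RE = re.compile(
    r"\s*" + r"\s+".join("(%s)" % p for p in (INT, INT, FLOAT, FLOAT, FLOAT))
    + r"\s*$")
CONVERTERS = (int, int, float, float, float)


def scan_line(line, pattern=LINE_RE, converters=CONVERTERS):
    """Match the line with re.match(), then convert each captured group.

    Returns a tuple of converted values, or None if the line does not
    match the expected format.
    """
    match = pattern.match(line)
    if match is None:
        return None
    return tuple(convert(field)
                 for convert, field in zip(converters, match.groups()))


print(scan_line("1 2 3.0 4.5 6\n"))
```

Because the regex already rejects malformed numbers, `int()` and `float()` only ever see well-formed fields. This avoids the `ValueError` handling a plain `split()` approach would need.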
 
r

One of the first things I remember being taught as a C programmer
was to never use scanf. Programs that use scanf tend to fail
in rather spectacular ways when presented with simple typos and
other forms of unexpected input.

Given the bad behavior and general fragility of scanf(), I
doubt there's much demand for something equally broken for
Python.

I don't think you can blame scanf() for that. It's more the "bad behavior"
of humans and the "uncanny" ability of human fingers to press the
wrong damn keys.
 
Aahz

But I remember that lambda was also unwelcome in Python, yet in the end
it's there and doing well. So maybe someday someone will decide to put
an alternative, really great implementation of scanf() into Python?

How long have you been using Python? lambda has been there almost from
the beginning.
--
Aahz <*> http://www.pythoncraft.com/

"To me vi is Zen. To use vi is to practice zen. Every command is a
koan. Profound to the user, unintelligible to the uninitiated. You
discover truth everytime you use it."
 
