eval to dict problems NEWB going crazy!


manstey

Hi,

I have a text file called a.txt:

# comments
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

I read it using this:

import codecs

filAnsMorph = codecs.open('a.txt', 'r', 'utf-8') # Initialise input file
dicAnsMorph = {}
for line in filAnsMorph:
    if line[0] != '#': # Get rid of comment lines
        x = eval(line)
        dicAnsMorph[x[0][1]] = x[1][1] # recid is key, parse dict is value

But it crashes every time on x = eval(line). Why is this? If I change
a.txt to:

# comments
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

it works fine. Why doesn't it work with multiple lines? it's driving me
crazy!

Thanks,
Matthew
 

Bruno Desthuilliers

manstey said:
But it crashes every time on x = eval(line). Why is this? If I change
a.txt to:

# comments
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

it works fine. Why doesn't it work with multiple lines? it's driving me
crazy!

try with:
x = eval(line.strip('\n'))
 

manstey

That doesn't work. I just get an error:

x = eval(line.strip('\n'))
File "<string>", line 1
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

SyntaxError: unexpected EOF while parsing


any other ideas?

 

Eric Deveaud

manstey said:
That doesn't work. I just get an error:

x = eval(line.strip('\n'))
File "<string>", line 1
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

SyntaxError: unexpected EOF while parsing

Is the last line of your file empty?

What about:

for line in filAnsMorph:
    # remove any leading and trailing whitespace (this includes the \n)
    line = line.strip()
    # Get rid of comment lines
    if line.startswith('#'):
        continue
    # Get rid of blank lines
    if line == '':
        continue
    # do the job
    x = eval(line)


NB: by default strip() removes leading and trailing whitespace from the target
string, with whitespace defined as string.whitespace = '\t\n\x0b\x0c\r '

Eric
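For reference, here is the OP's loop with Eric's strip-and-skip fix folded in, as a self-contained Python 3 sketch (the thread itself uses Python 2). The function name `load_records` is hypothetical, and an in-memory list of lines with Windows `\r\n` endings and a blank last line stands in for a.txt. It keeps `eval` only to match the code above, so it should only ever be fed trusted input.

```python
def load_records(lines):
    """Build {recId: parse_dict} from lines shaped like the ones in a.txt.

    Hypothetical helper: mirrors the thread's loop, including its use of
    eval, so it must only be given trusted data.
    """
    records = {}
    for line in lines:
        # remove leading/trailing whitespace, including '\r' and '\n'
        line = line.strip()
        # skip blank lines and comment lines
        if not line or line.startswith('#'):
            continue
        x = eval(line)
        records[x[0][1]] = x[1][1]  # recId is the key, the parse dict the value
    return records


sample = [
    "# comments\r\n",
    "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n",
    "[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n",
    "[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n",
    "\r\n",
]
print(load_records(sample))
```

Stripping both ends before any test means the comment check, the blank-line check and `eval` all see the same cleaned string.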
 

Fredrik Lundh

manstey said:
That doesn't work. I just get an error:

x = eval(line.strip('\n'))
File "<string>", line 1
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

SyntaxError: unexpected EOF while parsing

any other ideas?

hint 1:
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\n")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

hint 2:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 0

^
SyntaxError: unexpected EOF while parsing

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1

^
SyntaxError: unexpected EOF while parsing

hint 3: adding a "print" statement *before* the offending line is often a good way
to figure out why something's not working. "repr()" is also a useful thing:

if line[0] != '#': # Get rid of comment lines
    print repr(line) # DEBUG: let's see what we're trying to evaluate
    x = eval(line)
    dicAnsMorph[x[0][1]] = x[1][1] # recid is key, parse dict is value

</F>
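The same debugging trick in Python 3, where `print` is a function. The two sample lines are made up to show what `repr()` exposes: a Windows line ending and a blank line, both invisible when the string is printed directly.

```python
lines = [
    "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n",
    "\n",
]
for line in lines:
    # repr() shows escape sequences such as \r and \n instead of raw whitespace
    print(repr(line))

# prints:
# "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"
# '\n'
```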
 

Roel Schroeven

manstey wrote:
But it crashes every time on x = eval(line). Why is this? If I change
a.txt to:

# comments
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]

it works fine. Why doesn't it work with multiple lines? it's driving me
crazy!

It looks like it's because of the trailing newline. When you read a file
like that, the newline at the end of each line is still in line. You can
strip it e.g. with rstrip, like so:

x = eval(line.rstrip('\n'))
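A quick Python 3 check of why removing only `'\n'` may not be enough: if the line still carries a Windows line ending (an assumption about how the file was written and read), `rstrip('\n')` leaves a `'\r'` behind. Passing `'\r\n'` to `rstrip`, or calling `strip()` with no argument, removes both characters.

```python
line = "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n"

only_lf = line.rstrip('\n')    # removes '\n' but leaves the '\r'
both = line.rstrip('\r\n')     # argument is a set of characters: strips any mix
everything = line.strip()      # strips all leading and trailing whitespace

print(repr(only_lf[-1]))   # '\r'
print(both == everything)  # True
```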
 

Fredrik Lundh

hint 1:

hint 1b:
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\n")
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
>>> eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1
    [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
^
SyntaxError: invalid syntax

</F>
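One plausible way a `'\r\n'` ending survives into `line` (an assumption about the OP's setup, not something the thread confirms): `codecs.open` always opens the underlying file in binary mode, so no newline translation happens, while the built-in `open` in text mode translates `'\r\n'` to `'\n'` by default. A Python 3 sketch using a temporary file; newer Python 3 releases recommend plain `open(..., encoding=...)` over `codecs.open` anyway.

```python
import codecs
import os
import tempfile

# Write one record with a Windows line ending; newline="" disables translation
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
    f.write("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\r\n")

# codecs.open reads the file in binary mode underneath: '\r\n' is kept
with codecs.open(path, "r", "utf-8") as f:
    codecs_line = f.readline()

# text-mode open uses universal newlines by default: '\r\n' becomes '\n'
with open(path, "r", encoding="utf-8") as f:
    text_line = f.readline()

os.remove(path)
print(repr(codecs_line[-2:]), repr(text_line[-2:]))
```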
 

Steven D'Aprano

manstey said:
But it crashes every time on x = eval(line). Why is this?

Some people have incorrectly suggested the solution is to remove the
newline from the end of the line. Others have already pointed out one
possible solution.

I'd like to ask, why are you using eval in the first place?

The problem with eval is that it is simultaneously too finicky and too
powerful. It is finicky -- it has problems with lines ending with a
carriage return, empty lines, and probably other things. But it is also
too powerful. Your program wants a specific piece of data, but eval
will accept any string which is a valid Python expression. eval is quite
capable of giving you a dictionary, or an int, or just about anything --
and, depending on your code, you might not find out for a long time,
leading to hard-to-debug bugs.
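A minimal Python 3 sketch of guarding against that: whatever `eval` (or any other parser) returns, check its shape before using it, so an unexpected int or dict fails immediately instead of much later. The helper name `is_record` is hypothetical, and the expected shape is inferred from the three sample lines in a.txt.

```python
def is_record(value):
    """Return True if value looks like [('recId', int), ('parse', {str: str})]."""
    if not (isinstance(value, list) and len(value) == 2):
        return False
    first, second = value
    if not (isinstance(first, tuple) and len(first) == 2
            and first[0] == 'recId' and isinstance(first[1], int)):
        return False
    if not (isinstance(second, tuple) and len(second) == 2
            and second[0] == 'parse' and isinstance(second[1], dict)):
        return False
    # every key and value of the parse dict must be a string
    return all(isinstance(k, str) and isinstance(v, str)
               for k, v in second[1].items())


good = [('recId', 3), ('parse', {'pos': 'np', 'gen': 'm'})]
print(is_record(good))          # True
print(is_record({'recId': 3}))  # False
print(is_record(42))            # False
```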

Is your data under your control? Could some malicious person inject data
into your file a.txt? If so, you should be aware of the security
implications:

# comment
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
# line injected by a malicious user
"__import__('os').system('echo if I were bad I could do worse')"
[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

Now, if the malicious user can only damage their own system, maybe you
don't care -- but the security hole is there. Are you sure that no
malicious third party, given *only* write permission to the file a.txt,
could compromise your entire system?

Personally, I would never use eval on any string I didn't write myself. If
I was thinking about evaluating a user-string, I would always write a
function to parse the string and accept only the specific sort of data I
expected. In your case, a quick-and-dirty untested function might be:

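def parse(s):
    """Parse string s, and return a two-item list like this:

    [tuple(string, integer), tuple(string, dict(string: string))]
    """

    def split_items(s):
        """Split s on commas that are not nested inside (), [] or {}.
        Quick-and-dirty: commas or brackets inside quoted strings confuse it."""
        items, depth, start = [], 0, 0
        for i, c in enumerate(s):
            if c in "([{":
                depth += 1
            elif c in ")]}":
                depth -= 1
            elif c == "," and depth == 0:
                items.append(s[start:i].strip())
                start = i + 1
        items.append(s[start:].strip())
        return items

    def parse_string(s):
        """Parse a quoted string such as 'np' or u'np' (no escapes)."""
        s = s.strip()
        if s[:1] in ("u", "U"):
            s = s[1:]
        assert len(s) >= 2 and s[0] == s[-1] and s[0] in "'\""
        return s[1:-1]

    def parse_tuple(s):
        """Parse a tuple with two items exactly."""
        s = s.strip()
        assert s.startswith("(")
        assert s.endswith(")")
        a, b = split_items(s[1:-1])
        return (a, b)

    def parse_dict(s):
        """Parse a dict with two items exactly."""
        s = s.strip()
        assert s.startswith("{")
        assert s.endswith("}")
        a, b = split_items(s[1:-1])
        key1, value1 = a.split(":")
        key2, value2 = b.split(":")
        return {parse_string(key1): parse_string(value1),
                parse_string(key2): parse_string(value2)}

    def parse_list(s):
        """Parse a list with two items exactly."""
        s = s.strip()
        assert s.startswith("[")
        assert s.endswith("]")
        a, b = split_items(s[1:-1])
        return [a, b]

    # Expected format is something like:
    # [tuple(string, integer), tuple(string, dict(string: string))]
    L = parse_list(s)
    T0 = parse_tuple(L[0])
    T1 = parse_tuple(L[1])
    T0 = (parse_string(T0[0]), int(T0[1]))
    T1 = (parse_string(T1[0]), parse_dict(T1[1]))
    return [T0, T1]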


That's a bit more work than eval, but I believe it is worth it.
 

Ant

[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
# line injected by a malicious user
"__import__('os').system('echo if I were bad I could do worse')"
[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

I'm curious, if you disabled import, could you make eval safe?

For example:
if I were bad I could do worse
0
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 0, in ?
AttributeError: 'NoneType' object has no attribute 'system'

So, it seems to be possible to disable access to imports, but is this
enough? Are there other ways to access modules, or do damage via
built-in commands?

It seems that there must be a way to use eval safely, as there are
plenty of apps that embed python as a scripting language - and what's
the point of an eval function if it's impossible to use safely, and you have
to write your own Python parser!!
 

Fredrik Lundh

Ant said:
It seems that there must be a way to use eval safely, as there are
plenty of apps that embed python as a scripting language - and what's
the point of an eval function if it's impossible to use safely, and you have
to write your own Python parser!!

embedding python != accepting scripts from anywhere.

</F>
 

Fredrik Lundh

Steven said:
Personally, I would never use eval on any string I didn't write myself. If
I was thinking about evaluating a user-string, I would always write a
function to parse the string and accept only the specific sort of data I
expected. In your case, a quick-and-dirty untested function might be:

for a more robust approach, you can use Python's tokenize module,
together with the iterator-based approach described here:

http://online.effbot.org/2005_11_01_archive.htm#simple-parser-1

here's a (tested!) variant that handles lists and dictionaries as well:

import cStringIO, tokenize

def sequence(next, token, end):
    out = []
    token = next()
    while token[1] != end:
        out.append(atom(next, token))
        token = next()
        if token[1] == "," or token[1] == ":":
            token = next()
    return out

def atom(next, token):
    if token[1] == "(":
        return tuple(sequence(next, token, ")"))
    elif token[1] == "[":
        return sequence(next, token, "]")
    elif token[1] == "{":
        seq = sequence(next, token, "}")
        res = {}
        for i in range(0, len(seq), 2):
            res[seq[i]] = seq[i+1] # seq alternates key, value, key, value...
        return res
    elif token[0] in (tokenize.STRING, tokenize.NUMBER):
        return eval(token[1]) # safe use of eval!
    raise SyntaxError("malformed expression (%s)" % token[1])

def simple_eval(source):
    src = cStringIO.StringIO(source).readline
    src = tokenize.generate_tokens(src)
    # skip line breaks, including the NEWLINE produced by a trailing "\n"
    src = (token for token in src
           if token[0] not in (tokenize.NL, tokenize.NEWLINE))
    res = atom(src.next, src.next())
    if src.next()[0] != tokenize.ENDMARKER:
        raise SyntaxError("bogus data after expression")
    return res

(now waiting for paul to post the obligatory pyparsing example).

</F>
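The tokenizer code above is Python 2 (`cStringIO`, `.next()`). A possible Python 3 adaptation keeps the same structure, assuming Python 3.3 or later so that `u'...'` literals tokenize as strings. Two deliberate changes, both this sketch's choices rather than the original's: `ast.literal_eval` replaces `eval` on the individual STRING and NUMBER tokens, and the unused `token` argument of `sequence` is dropped.

```python
import ast
import io
import tokenize


def sequence(tokens, end):
    """Collect atoms up to the closing bracket `end`, skipping ',' and ':'."""
    out = []
    token = next(tokens)
    while token.string != end:
        out.append(atom(tokens, token))
        token = next(tokens)
        if token.string in (",", ":"):
            token = next(tokens)
    return out


def atom(tokens, token):
    """Convert `token` (plus any tokens nested inside it) to a Python value."""
    if token.string == "(":
        return tuple(sequence(tokens, ")"))
    if token.string == "[":
        return sequence(tokens, "]")
    if token.string == "{":
        seq = sequence(tokens, "}")
        # seq alternates key, value, key, value, ...
        return {seq[i]: seq[i + 1] for i in range(0, len(seq), 2)}
    if token.type in (tokenize.STRING, tokenize.NUMBER):
        # literal_eval accepts only literals, so no code can run here
        return ast.literal_eval(token.string)
    raise SyntaxError("malformed expression (%s)" % token.string)


def simple_eval(source):
    readline = io.StringIO(source).readline
    skip = (tokenize.NL, tokenize.NEWLINE)  # line breaks carry no data here
    tokens = (t for t in tokenize.generate_tokens(readline) if t.type not in skip)
    result = atom(tokens, next(tokens))
    if next(tokens).type != tokenize.ENDMARKER:
        raise SyntaxError("bogus data after expression")
    return result


print(simple_eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\n"))
```

Anything that is not a bracket, a separator or a literal (a name such as `__import__`, for example) hits the final `raise`, which is what makes this approach safer than calling `eval` on the whole line.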
 

Steven D'Aprano

[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
# line injected by a malicious user
"__import__('os').system('echo if I were bad I could do worse')"
[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

I'm curious, if you disabled import, could you make eval safe?

Safer, but possibly not safe.
For example:

if I were bad I could do worse
0
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 0, in ?
AttributeError: 'NoneType' object has no attribute 'system'

So, it seems to be possible to disable access to imports, but is this
enough? Are there other ways to access modules, or do damage via
built-in commands?

Does your code already import os? Then there is no need for the import at
all.

eval("os.system('echo BOOM!')",{'__import__': lambda x:None})

Or, we can do this:

bomb = """eval("__import__('os').system('echo BOOM!')", __builtins__)"""
eval(bomb, {'__import__': None})

The obvious response is to block eval:

eval(bomb, {'__import__': None, 'eval': None})

Does this make it safe now? I don't know -- I've hunted around for ten
minutes trying to break it, and haven't, but that might just mean I'm not
enough of a hacker or thinking deviously enough. Possibly eval() is more
limited, and therefore "safer", than exec, but I wouldn't want to risk
real data on that assumption.

Of course, this approach only protects against one class of attacks.
Suppose Evil J. Cracker has write access to your file, and is happy enough
with just a denial of service attack:

[('recId', 3), ('parse', {'pos': u'np'*1024**4, 'gen': u'm'})]

Do you have a couple of terabytes of free memory on your system?

Of course, if your code is only going to be used by *trusted* users, then
you don't have to worry about malicious attacks. You do have to worry
about accidental bugs though. What if one of the lines is missing a
delimiter or otherwise malformed? The call to eval() will fail, and your
code will halt. Is that what you want, or is it better to skip over the
bad data and continue? (A try...except... block could be useful here.)
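A Python 3 sketch of that try...except idea, skipping malformed lines and recording where they were. It swaps in `ast.literal_eval` for `eval` (a standard-library function newer than the Python versions used in this thread) so that only literals are accepted; the function name and the choice to collect line numbers are assumptions for illustration.

```python
import ast


def load_records_skipping_bad(lines):
    """Return ({recId: parse_dict}, [line numbers that could not be parsed])."""
    records, bad = {}, []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        try:
            x = ast.literal_eval(line)
            records[x[0][1]] = x[1][1]
        except (ValueError, SyntaxError, TypeError, IndexError, KeyError):
            # malformed text or an unexpected structure: note it and carry on
            bad.append(lineno)
    return records, bad


sample = [
    "# comments",
    "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]",
    "[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'}",  # missing ")]"
    "[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]",
]
records, bad = load_records_skipping_bad(sample)
print(sorted(records), bad)  # [3, 7] [3]
```

Whether to skip, log, or stop on the first bad line is a policy decision; collecting the line numbers keeps that decision with the caller.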

Anyway, eval is a legitimate tool to use, although it is often overkill
for the tasks people use it for. In the Original Poster's example, he
doesn't really want to evaluate an arbitrary Python expression, he wants
to evaluate a specific data structure.

It seems that there must be a way to use eval safely,

"Must" does not mean "I wish there was".

as there are
plenty of apps that embed python as a scripting language -

As Fredrik points out, embedded Python isn't the same as running
untrusted code. The reality is, Python has not been designed for running
untrusted code safely. There was an attempt at a restricted-execution
module, but Guido decided to remove it -- see this thread here for his
reasoning:

http://mail.python.org/pipermail/python-dev/2002-December/031160.html

and what's
the point of an eval function if it's impossible to use safely, and you have
to write your own Python parser!!

As for eval, it's a sledge-hammer. Sledge-hammers are legitimate tools,
for when you need one. eval is for evaluating arbitrary Python
expressions -- my rule of thumb (yours may be different) is that any time
I expect arbitrary data, eval is the right tool for the job, but if I
expect *specific* data, I use something else.

Imagine if the only way to get an integer was by calling eval on the
string -- I think we'd all agree that would be a bad move. Instead we have
a function which does nothing but convert strings (well, any object
really) to integers: int. It would be great if Python included tools to do
the same for dicts and lists, reducing the need for people to use a
sledge-hammer.
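A tool along these lines did arrive in later versions: `ast.literal_eval` (added in Python 2.6) evaluates only literal structures such as strings, numbers, tuples, lists, dicts, booleans and None, and raises `ValueError` for anything else. A Python 3 sketch against the lines discussed in this thread, including the injected line and the memory-bomb line; the variable names are illustrative.

```python
import ast

record = "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]"
injected = "__import__('os').system('echo if I were bad I could do worse')"
memory_bomb = "[('recId', 3), ('parse', {'pos': u'np'*1024**4, 'gen': u'm'})]"

print(ast.literal_eval(record))
# [('recId', 3), ('parse', {'pos': 'np', 'gen': 'm'})]

for text in (injected, memory_bomb):
    try:
        ast.literal_eval(text)
    except ValueError as exc:
        # a function call or a '*' operator is not a literal, so the text is
        # rejected after parsing, before anything is executed or allocated
        print("rejected:", exc)
```

It is still not a full substitute for a validating parser: it will happily return an int or a dict where a record was expected, so checking the shape of the result remains the caller's job.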

Anyway, my point was that you, the developer, have to weigh up the costs
and benefits of eval over a custom parser. The benefit is that eval is
already there, built-in and debugged. The costs are that it can be
insecure, and that it doesn't give you fine control over what data you
parse or how forgiving the parser is.

After that, the decision is yours.
 

Sion Arrowsmith

Fredrik Lundh said:
embedding python != accepting scripts from anywhere.

And also using eval (or exec or execfile) != accepting scripts from
anywhere. You've got to consider where the data can have come from
and what (broad) context it's being eval()'d in. Last time I did
something like this was with execfile for advanced configuration of
a server, and if a hostile party were in a position to inject
malicious code into *that* then subversion of our program would be
the least of anyone's concern.
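Python 3 no longer has `execfile`; the usual replacement is `exec` on the compiled file contents with a fresh namespace dict. A minimal sketch of the trusted-configuration use case described here, assuming (as Sion stresses) that only trusted people can write the config; `load_config` and the sample settings are hypothetical, and the source is passed as a string to keep the example self-contained.

```python
def load_config(source, filename="<config>"):
    """Execute trusted config source and return the names it defines."""
    namespace = {}
    # compiling with a filename makes tracebacks point at the config file
    exec(compile(source, filename, "exec"), namespace)
    # exec inserts '__builtins__' into the namespace; drop dunder names
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


settings = load_config("port = 8080\nhosts = ['alpha', 'beta']\n")
print(settings)  # {'port': 8080, 'hosts': ['alpha', 'beta']}
```

For a real file, something like `load_config(open(path).read(), path)` would do. This runs arbitrary code, so it is only appropriate where whoever can edit the file could already do worse.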
 

Ant

As Fredrik points out, embedded Python isn't the same as running
untrusted code. The reality is, Python has not been designed for running
untrusted code safely.

So how do python apps typically embed python? For example things like
Zope and idle are scripted using Python - presumably they restrict the
execution of the scripts to a restricted set of modules/objects - but
how is this done?

Perhaps idle doesn't require safety from untrusted code, but surely
Zope does. So there must be some way of executing arbitrary untrusted
code in an app within some kind of sandbox...
 

Fredrik Lundh

Ant said:
So how do python apps typically embed python? For example things like
Zope and idle are scripted using Python - presumably they restrict the
execution of the scripts to a restricted set of modules/objects - but
how is this done?

why? anyone capable of adding code to idle already has access to
everything that code can access...

Perhaps idle doesn't require safety from untrusted code, but surely
Zope does. So there must be some way of executing arbitrary untrusted
code in an app within some kind of sandbox...

afaik, zope uses a custom parser.

</F>
 

Dennis Lee Bieber

So how do python apps typically embed python? For example things like
Zope and idle are scripted using Python - presumably they restrict the

Zope, at least the version I'd played with, required nearly all
Python code to be "installed" as a "Product" before it could be used.
This installation required the Zope administrator -- said administrator,
thereby, had to vet the code as "trusted".

DTML did have a sort of one-line pass-through of Python, though it
wasn't really needed for most things; one could access variables through
DTML notation itself.

Zope had also been a few Python versions behind, as it made use of
modules that had been removed as unsafe.


Idle is not an "embedded" situation; it is a full-up Python
application.
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff)
HTTP://www.bestiaria.com/
 
