Is this a bug in int()?

MartinRinehart

>>> int('0x', 16)
0

I'm working on a tokenizer and I'm thinking about returning a
MALFORMED_NUMBER token (1.2E, .5E+)
 
Fredrik Lundh

> >>> int('0x', 16)
> 0
>
> I'm working on a tokenizer and I'm thinking about returning a
> MALFORMED_NUMBER token (1.2E, .5E+)

Somewhat surprisingly, "0x" is a valid integer literal in Python:
>>> 0x
0

</F>
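
How a particular interpreter treats this input can be probed directly. The following is a minimal sketch, assuming a hypothetical helper named `probe` (not part of the thread) that uses only the built-ins `int`, `compile`, and `eval`. The result for `'0x'` depends on the Python version in use, so no output is shown for it.

```python
def probe(text="0x"):
    """Report how this interpreter treats `text` via int() and as a source literal."""
    results = {}
    try:
        # Same call as in the original post.
        results["int"] = int(text, 16)
    except ValueError as exc:
        results["int"] = exc
    try:
        # Compile the text as an expression, i.e. as it would appear in source code.
        results["literal"] = eval(compile(text, "<probe>", "eval"))
    except SyntaxError as exc:
        results["literal"] = exc
    return results


# A well-formed literal is accepted by both paths.
print(probe("0x10"))  # {'int': 16, 'literal': 16}
# The case under discussion; the outcome is version-dependent.
print(probe("0x"))
```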
 
Duncan Booth

(e-mail address removed) wrote under the subject line "Is this a
bug in int()?":
I think it is a general problem in the tokenizer, not just the 'int'
constructor. The syntax for integers says:

hexinteger ::= "0" ("x" | "X") hexdigit+

but 0x appears to be accepted in source code as an integer.

If I were you, I'd try reporting it as a bug.

> I'm working on a tokenizer and I'm thinking about returning a
> MALFORMED_NUMBER token (1.2E, .5E+)

Why would you return a token rather than throwing an exception?
 
Terry Reedy

| (e-mail address removed) wrote under the subject line "Is this a
| bug in int()?":
| > >>> int('0x', 16)
| > 0
| >
| I think it is a general problem in the tokenizer, not just the 'int'
| constructor. The syntax for integers says:
|
| hexinteger ::= "0" ("x" | "X") hexdigit+
|
| but 0x appears to be accepted in source code as an integer.
|
| If I were you, I'd try reporting it as a bug.

The mismatch between doc and behavior certainly is a bug.
One should change.
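
The documented rule can be checked mechanically, which makes the mismatch concrete. This is a minimal sketch, assuming a hypothetical helper named `is_hexinteger` that applies only the `hexinteger` production quoted above (no sign, no surrounding whitespace); it is not how the interpreter itself tokenizes.

```python
import re

# hexinteger ::= "0" ("x" | "X") hexdigit+
HEXINTEGER = re.compile(r"0[xX][0-9a-fA-F]+")


def is_hexinteger(text):
    """Return True if `text` matches the quoted hexinteger production exactly."""
    return HEXINTEGER.fullmatch(text) is not None


print(is_hexinteger("0x"))    # False: no hex digit after the prefix
print(is_hexinteger("0x1F"))  # True
print(is_hexinteger("0X0"))   # True
```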
 
MartinRinehart

Duncan said:
> Why would you return a token rather than throwing an exception?

Tokenizers have lots of uses. Colorizing text in an editor, for
example. We've got a MALFORMED_NUMBER when you type '0x'. We've got an
INTEGER when we get your next keystroke (probably).
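
A minimal sketch of that idea follows. The token names `INTEGER` and `FLOAT`, the helper `classify_number`, and the regular expressions are all assumptions made for illustration; only `MALFORMED_NUMBER` is named in the thread. It covers just hex and simple decimal forms and classifies a lexeme instead of raising, so an editor can re-run it on every keystroke.

```python
import re

# Hypothetical token kinds; only MALFORMED_NUMBER comes from the thread.
INTEGER, FLOAT, MALFORMED_NUMBER = "INTEGER", "FLOAT", "MALFORMED_NUMBER"

HEX = re.compile(r"0[xX][0-9a-fA-F]+")
DEC = re.compile(r"[0-9]+")
# Digits with an optional fraction, then an optional exponent that must be complete.
FLT = re.compile(r"([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?")


def classify_number(text):
    """Classify a number-like lexeme without raising on incomplete input."""
    if HEX.fullmatch(text) or DEC.fullmatch(text):
        return INTEGER
    if FLT.fullmatch(text):
        return FLOAT
    return MALFORMED_NUMBER


# Simulate an editor re-tokenizing after each keystroke.
for typed in ("0", "0x", "0x1", "0x1F"):
    print(typed, classify_number(typed))
# 0 INTEGER
# 0x MALFORMED_NUMBER
# 0x1 INTEGER
# 0x1F INTEGER
```

The incomplete forms mentioned at the start of the thread, `1.2E` and `.5E+`, come out as `MALFORMED_NUMBER` under the same rules, because the exponent pattern requires at least one digit.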
 
MartinRinehart

Tokenizer accepts "0x" as zero. Spec says it's an error not to have at
least one hex digit after "0x".

This is a more serious bug than I had originally thought. Consider
this:

Joe types "security_code = 0x" and then goes off to the Guardian-of-
the-Codes to get the appropriate hex string. Returning to computer,
Joe's boss grabs him. Tells him that effective immediately he's on the
"rescue us from this crisis" team; his other project can wait.

Some hours, days or weeks later Joe returns to the first project. At
this point Joe has a line of code that says "security_code = 0x". I
think Joe would be well-served by a compiler error on that line. As is
now, Joe's program assigns 0 to security_code and compiles without
complaint. I'm pretty sure any line of the form "name = 0x" was a
product of some form of programmer interruptus.
 
George Sakkis

> Tokenizer accepts "0x" as zero. Spec says it's an error not to have at
> least one hex digit after "0x".
>
> This is a more serious bug than I had originally thought. Consider
> this:
>
> Joe types "security_code = 0x" and then goes off to the Guardian-of-
> the-Codes to get the appropriate hex string. Returning to computer,
> Joe's boss grabs him. Tells him that effective immediately he's on the
> "rescue us from this crisis" team; his other project can wait.
>
> Some hours, days or weeks later Joe returns to the first project. At
> this point Joe has a line of code that says "security_code = 0x". I
> think Joe would be well-served by a compiler error on that line. As is
> now, Joe's program assigns 0 to security_code and compiles without
> complaint. I'm pretty sure any line of the form "name = 0x" was a
> product of some form of programmer interruptus.

:) Are you a fiction writer by any chance? Nice story, but I somehow
doubt that the number of lines of the form "name = 0x" ever written in
Python is greater than a single digit (with zero the most likely one).

George
 
Steven D'Aprano

> Tokenizer accepts "0x" as zero. Spec says it's an error not to have at
> least one hex digit after "0x".
>
> This is a more serious bug than I had originally thought. Consider this:
>
> Joe types "security_code = 0x" and then goes off to the Guardian-of-
> the-Codes to get the appropriate hex string.

Which is *hard coded* in the source code??? How do you revoke a
compromised code, or add a new one?

Let me guess... the Guardian of the Codes stores them on a Post-it note
stuck to the side of the fridge in the staff lunchroom? Written
backwards, so nobody can guess what they really are.

> Returning to computer,
> Joe's boss grabs him. Tells him that effective immediately he's on the
> "rescue us from this crisis" team; his other project can wait.

Serves him right for writing in hex, when everybody knows that for *real*
security you should store your security codes as octal.

> Some hours, days or weeks later Joe returns to the first project. At
> this point Joe has a line of code that says "security_code = 0x". I
> think Joe would be well-served by a compiler error on that line.

*shrug*

Maybe so, but how is that worse than if he had written "security_code =
123456" intending to come back and put the actual code in later, and
forgot?

> As is
> now, Joe's program assigns 0 to security_code and compiles without
> complaint.

Which Joe will *instantly* discover, the first time he tries to test the
program and discovers that entering the *actual* security code doesn't
work.

> I'm pretty sure any line of the form "name = 0x" was a
> product of some form of programmer interruptus.

There's no doubt that 0x is a bug, according to the current specs. It
probably should be fixed (as opposed to changing the specs). But trying
to up-sell a trivial bug into an OMG The Sky Is Falling This Is A Huge
Security Risk Panic Panic Panic just makes you seem silly.
 
