readline tokenizer newline sticky wicket

Arthur

Given a "linemess.py" file with inconsistent line endings:

line 1\r\n
line 2\n

tokenized as per:

import tokenize
f = open('linemess.py', 'r')
tokens = tokenize.generate_tokens(f.readline)
for t in tokens:
    print t

I get output as follows:

(1, 'line', (1, 0), (1, 4), 'line 1\r\n')
(2, '1', (1, 5), (1, 6), 'line 1\r\n')
(4, '\r\n', (1, 6), (1, 8), 'line 1\r\n')
(1, 'line', (2, 0), (2, 4), 'line 2\n')
(2, '2', (2, 5), (2, 6), 'line 2\n')
(4, '\n', (2, 6), (2, 7), 'line 2\n')
(0, u'', (3, 0), (3, 0), u'')

So the Windows \r\n is tokenized as a single literal '\r\n' token rather
than being normalized to '\n' under the universal-newline convention.
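The same contrast can be sketched in modern Python 3 (the original post
uses Python 2 file objects; the in-memory source string below stands in
for linemess.py):

```python
import io
import tokenize

# Mixed line endings, mirroring the linemess.py example.
source = "line 1\r\nline 2\n"

# io.StringIO's default newline='\n' does no translation, so readline
# hands the tokenizer a raw '\r\n' -- reproducing the behaviour above.
raw = list(tokenize.generate_tokens(io.StringIO(source).readline))

# newline=None enables universal-newline translation on read, so the
# tokenizer only ever sees '\n'.
uni = list(tokenize.generate_tokens(io.StringIO(source, newline=None).readline))

print(repr(raw[2].string))  # NEWLINE token keeps the carriage return: '\r\n'
print(repr(uni[2].string))  # normalized NEWLINE token: '\n'
```

So whether the tokenizer sees '\r\n' depends entirely on whether the
readline it is handed does universal-newline translation.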

Isn't this a problem?

I think this must have been at the root of the issue I ran into, where a
file with messy, inconsistent line endings that nonetheless compiled and
ran without a problem was rejected by tokenize.py as having an indent
issue.

On the theory that if the tokenizer needs to fail when crap is thrown at
it, it should do so more gracefully - is this a reportable bug?
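In the meantime, the problem can be sidestepped by normalizing line
endings before tokenizing (in Python 2, opening the file in 'rU' mode
achieved this). Below is a hypothetical helper for Python 3 - the name
tokens_with_universal_newlines is my own, not a stdlib function:

```python
import io
import tokenize

def tokens_with_universal_newlines(path):
    """Tokenize a file after normalizing \\r\\n and \\r to \\n.

    A workaround sketch: read the whole file, replace Windows and old-Mac
    line endings with '\\n', then tokenize the cleaned-up text.
    """
    with open(path, 'rb') as f:
        text = f.read().decode('utf-8')
    normalized = text.replace('\r\n', '\n').replace('\r', '\n')
    return list(tokenize.generate_tokens(io.StringIO(normalized).readline))
```

With this, a file with mixed endings yields no token whose string
contains a carriage return.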

Art