readline tokenizer newline sticky wicket

Arthur

Given a "linemess.py" file with inconsistent line endings:

line 1\r\n
line 2\n
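
To reproduce it, the file can be written in binary mode so the mixed
endings survive untranslated:

# 'wb' keeps \r\n and \n exactly as given
open('linemess.py', 'wb').write('line 1\r\nline 2\n')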

Tokenizing it like this:

import tokenize

f = open('linemess.py', 'r')
tokens = tokenize.generate_tokens(f.readline)
for t in tokens:
    print t

I get this output:

(1, 'line', (1, 0), (1, 4), 'line 1\r\n')
(2, '1', (1, 5), (1, 6), 'line 1\r\n')
(4, '\r\n', (1, 6), (1, 8), 'line 1\r\n')
(1, 'line', (2, 0), (2, 4), 'line 2\n')
(2, '2', (2, 5), (2, 6), 'line 2\n')
(4, '\n', (2, 6), (2, 7), 'line 2\n')
(0, u'', (3, 0), (3, 0), u'')
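
For reference, the integer at the front of each tuple maps to a token
name through the stdlib token module:

import token

# the first field of each token tuple is the token type
for n in (1, 2, 4, 0):
    print n, token.tok_name[n]

which prints NAME, NUMBER, NEWLINE and ENDMARKER.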

So the Windows \r\n gets tokenized as a literal '\r\n' token rather
than being normalized to '\n' per the universal-newline convention.
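
For what it's worth, handing generate_tokens a universal-newline file
object seems to sidestep this - a minimal sketch, relying on Python 2's
'rU' open mode translating \r and \r\n to \n before readline() ever
sees them:

import tokenize

# 'rU' = universal newlines: every line readline() returns ends in \n
f = open('linemess.py', 'rU')
for t in tokenize.generate_tokens(f.readline):
    print t
f.close()

With that, both NEWLINE tokens should come back as plain '\n'.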

Isn't this a problem?

I think this must have been at the root of the issue I ran into when a
file with messy, inconsistent line endings, which nonetheless compiled
and ran without a problem, was rejected by tokenize.py as having an
indentation issue.
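
In case anyone wants to diagnose the same thing, a rough sketch (the
helper name is mine, not from any library) that counts the terminators
in a file's raw bytes:

import re

def ending_counts(path):
    # lone \r and lone \n are counted separately from \r\n pairs
    data = open(path, 'rb').read()
    return {'\\r\\n': len(re.findall(r'\r\n', data)),
            '\\r': len(re.findall(r'\r(?!\n)', data)),
            '\\n': len(re.findall(r'(?<!\r)\n', data))}

print ending_counts('linemess.py')

For the linemess.py above, that reports one \r\n and one lone \n.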

On the theory that if the tokenizer needs to fail when crap is thrown
at it, it should at least do so more gracefully - is this worth
reporting as a bug?

Art
 
