Tokenizer inconsistency wrt to new lines in comments

George Sakkis · Apr 4, 2008

The tokenize.generate_tokens function seems to handle the newline after
a comment in a context-sensitive manner:

... # hello world
... x = (
... # hello world
... )
... '''

... print repr(t[1])
...
'\n'
'# hello world\n'
'x'
'='
'('
'\n'
'# hello world'
'\n'
')'
'\n'
''

Is there a reason that the newline is included in the first comment
but not in the second, or is it a bug?

George
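
For anyone who wants to repeat the experiment, here is a minimal sketch that prints the token type next to each string. It is written for Python 3 (the session above is Python 2), and the SOURCE string is reconstructed from the printed output rather than copied from the original session, so treat both as assumptions. The helper name dump_tokens is hypothetical. The exact tokens depend on the interpreter version, so no expected output is given.

```python
import io
import tokenize

# Assumed input: rebuilt from the token strings printed in the post above.
SOURCE = "\n# hello world\nx = (\n# hello world\n)\n"


def dump_tokens(source):
    """Return (type name, token string) pairs for the given source text."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok[0]], tok[1])
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    for name, text in dump_tokens(SOURCE):
        # The type name shows whether a given '\n' is a NEWLINE, an NL,
        # or part of a COMMENT token.
        print("%-10s %r" % (name, text))
```

Printing the type name makes it easy to see whether a comment token carries its own line ending on the interpreter at hand.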

Kay Schluehr · Apr 4, 2008

I guess it's just an artifact of handling line continuations within
expressions, where a different rule is applied. For compilation
purposes, both the newlines within expressions and the comments
are irrelevant. There are even two different tokens, NEWLINE and
NL, that are produced for newlines. NL and COMMENT are ignored;
NEWLINE is relevant to the parser.

If it were a bug, it would have to violate a functional requirement. I
can't see which one.

Kay
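
The NEWLINE/NL distinction Kay describes can be illustrated with a small filter that drops NL and COMMENT tokens and keeps the rest. This is only a sketch for Python 3; the name significant_tokens is hypothetical, and "significant" here just means "not NL and not COMMENT", which is a simplification of what a real parser does.

```python
import io
import tokenize

# Token types that, per the explanation above, do not matter for compilation.
IGNORED = {tokenize.NL, tokenize.COMMENT}


def significant_tokens(source):
    """Yield every token except NL and COMMENT tokens."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type not in IGNORED:
            yield tok


if __name__ == "__main__":
    src = "\n# hello world\nx = (\n# hello world\n)\n"
    for tok in significant_tokens(src):
        print(tokenize.tok_name[tok.type], repr(tok.string))
```

With the comments and non-logical newlines filtered out, whether a comment token includes its line ending makes no difference to what remains, which is consistent with the point that the compiler does not care.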

George Sakkis · Apr 4, 2008

Perhaps it's not a functional requirement, but it came up as a real
problem in a source colorizer I use. I count on newlines generating
token.NEWLINE or tokenize.NL tokens in order to produce <br> tags. It
took me some time and head scratching to find out why some comments
were joined together with the following line. Now I have to check
whether a comment ends in a newline and, if it does, output an extra
<br> tag. It works, but it's a kludge.

George
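
The workaround George describes might look roughly like the sketch below. His actual colorizer code is not shown in the thread, so the function names split_comment_newline and to_html are hypothetical, the code targets Python 3, and the rendering is deliberately crude: it ignores the spacing between tokens and only demonstrates where the <br> tags come from.

```python
import html
import io
import tokenize


def split_comment_newline(tokens):
    """Yield (type, string) pairs, splitting a trailing line ending off comments.

    A COMMENT token that carries its own line ending is emitted as a bare
    COMMENT followed by an NL token, so later stages see newlines uniformly.
    """
    for tok in tokens:
        tok_type, text = tok[0], tok[1]
        if tok_type == tokenize.COMMENT and text.endswith(("\n", "\r")):
            stripped = text.rstrip("\r\n")
            yield tokenize.COMMENT, stripped
            yield tokenize.NL, text[len(stripped):]
        else:
            yield tok_type, text


def to_html(source):
    """Rough rendering: escape token text, turn each NEWLINE/NL into <br>."""
    readline = io.StringIO(source).readline
    parts = []
    for tok_type, text in split_comment_newline(tokenize.generate_tokens(readline)):
        if tok_type in (tokenize.NEWLINE, tokenize.NL):
            parts.append("<br>\n")
        else:
            parts.append(html.escape(text))
    return "".join(parts)


if __name__ == "__main__":
    print(to_html("\n# hello world\nx = (\n# hello world\n)\n"))
```

Normalizing the token stream once, before rendering, keeps the special case out of the code that emits HTML. On an interpreter whose comment tokens never include the line ending, the first function simply passes everything through unchanged.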

Fredrik Lundh · Apr 4, 2008

well, the real kludge here is of course that you're writing your own
colorizer, when you can just go and grab Pygments:

http://pygments.org/

or, if you prefer something tiny and self-contained, something like the
colorizer module in this directory:

http://svn.effbot.org/public/stuff/sandbox/pythondoc/

(the element_colorizer module in the same directory gives you XHTML in
an ElementTree instead of raw HTML, if you want to postprocess things)

</F>

George Sakkis · Apr 4, 2008

First off, I didn't write it from scratch; I just tweaked a single-module
colorizer I had found online. Second, whether I or someone else
had to deal with it is irrelevant; the point is that generate_tokens()
is not consistent with respect to newlines after comments.

George
