Block comments

MartinRinehart

Tomorrow is block comment day. I want them to nest. I think the reason
that they don't routinely nest is that it's a lot of trouble to code.
Two questions:

1) Given a start and end location (line position and char index) in an
array of lines of text, how do you Pythonly extract the whole block
comment? (Goal: not to have Bruno accusing me - correctly - of writing
C in Python.)

2) My tokenizer has a bunch of module-level constants including ones
that define block comment starts/ends. Suppose I comment that code
out. This is the situation:

/* start of block comment
....
BLOCK_COMMENT_END_CHARS = '*/'
....
end of block comment */

Is this the reason for """?

(If this is a good test of tokenizer smarts, cpp and javac flunked.)
 
Bruno Desthuilliers

(e-mail address removed) wrote:
Tomorrow is block comment day. I want them to nest. I think the reason
that they don't routinely nest is that it's a lot of trouble to code.

Indeed.

Two questions:

1) Given a start and end location (line position and char index) in an
array of lines of text, how do you Pythonly extract the whole block
comment? (Goal: not to have Bruno accusing me - correctly - of writing
C in Python.)

Is the array of lines the appropriate data structure here?

2) My tokenizer has a bunch of module-level constants including ones
that define block comment starts/ends. Suppose I comment that code
out. This is the situation:

/* start of block comment
...
BLOCK_COMMENT_END_CHARS = '*/'
...
end of block comment */

Is this the reason for """?

Triple-quoted strings are not comments; they are a way to build
multiline string literals. They are commonly used for docstrings,
for obvious reasons, but it's the position of the string literal
that makes it a docstring, not the fact that it's triple-quoted.

With regard to your example above, making it a legal construct implies
that you should not treat the block start/end markers as comment markers
when they are enclosed in string-literal markers.

Now this doesn't solve the problem of nested block comments. Here, I
guess the solution would be to only allow fully nested block comments -
that is, the nested block *must* be opened *and* closed within the
parent block. In which case it should not be harder to parse than any
other nested construct.
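
The two points above (ignore markers inside string literals, and require nested blocks to open and close within their parent) can be sketched with a depth counter. This is only an illustration, not anyone's actual tokenizer: the function name `find_block_comment_end`, the `/*` and `*/` defaults, and the string-literal rule (single or double quotes with backslash escapes) are all assumptions. One known weakness of this rule is that an apostrophe in ordinary comment prose would be read as the start of a string.

```python
def find_block_comment_end(text, start, open_mark="/*", close_mark="*/"):
    """Return the index just past the close marker matching the opener at `start`.

    Nested comments are tracked with a depth counter. Markers inside
    quoted string literals are skipped (an assumed rule, see above).
    """
    if not text.startswith(open_mark, start):
        raise ValueError("no block comment starts at this index")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "'\"":
            # Skip the whole string literal so markers inside it are not counted.
            i += 1
            while i < len(text) and text[i] != ch:
                i += 2 if text[i] == "\\" else 1
            i += 1
        elif text.startswith(open_mark, i):
            depth += 1
            i += len(open_mark)
        elif text.startswith(close_mark, i):
            depth -= 1
            i += len(close_mark)
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")


src = "a /* outer /* inner */ X = '*/' still outer */ b"
begin = src.index("/*")
end = find_block_comment_end(src, begin)
comment = src[begin:end]
```

Here `comment` covers everything from the first `/*` through the final `*/`, including the inner comment and the quoted `'*/'`.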

While we're at it, you may not know it, but there are already a couple
of Python packages for building tokenizers/parsers. Could it be that
you're guilty of ReinventingTheSquaredWheel(tm) ?-)

My 2 cents...
 
MartinRinehart

Bruno said:
Is the array of lines the appropriate data structure here ?

I've done tokenizers both as an array of lines and as a long string.
The former has seemed easier when the language treats EOL as a
statement separator.
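
For the first question in this thread (pulling a block out of an array of lines given start and end positions), slicing avoids a character-by-character loop. A minimal sketch, assuming 0-based `(line, column)` pairs, an exclusive end position, and lines stored without their trailing newlines; the name `extract_block` is made up for illustration.

```python
def extract_block(lines, start, end):
    """Return the text between two (line, column) positions.

    `start` is inclusive and `end` is exclusive, so `end` should point
    just past the closing marker. Both are 0-based (an assumption).
    """
    (start_line, start_col), (end_line, end_col) = start, end
    if start_line == end_line:
        return lines[start_line][start_col:end_col]
    parts = [lines[start_line][start_col:]]        # tail of the first line
    parts.extend(lines[start_line + 1:end_line])   # whole lines in between
    parts.append(lines[end_line][:end_col])        # head of the last line
    return "\n".join(parts)


lines = ["x = 1 /* start", "middle", "end */ y = 2"]
block = extract_block(lines, (0, 6), (2, 6))
```

With the long-string representation the same job is a single slice, `text[start:end]`, which is one argument for that data structure.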

Re not letting literal strings in code terminate blocks: I think it's
the tokenizer-writer's job to be nice to the tokenizer users, the
first one of which will be me, and I'll definitely have string
literals that enclose what would otherwise be a block end marker.

While we're at it, you may not know but there are already a couple
Python packages for building tokenizers/parsers

The tokenizer in the Python library is pretty close to what I want,
but it returns tuples, where I want an array of Token objects. It also
reads the source a line at a time, which seems a bit out of date.
Maybe two or three decades out of date.
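
If the only objection to the library tokenizer is the tuples, wrapping its output in objects takes a few lines. This is a sketch, not a recommendation: the `Token` class and `tokens_from_source` are hypothetical names, and it relies on the standard `tokenize.generate_tokens` from current Python 3, which yields named tuples with `type`, `string`, `start` and `end` fields. Note that this module tokenizes Python source only, so it would not handle `/* ... */` comments.

```python
import io
import tokenize
from dataclasses import dataclass


@dataclass
class Token:
    kind: str      # token type name, e.g. "NAME" or "OP"
    text: str      # the token's source text
    start: tuple   # (row, column); tokenize rows are 1-based
    end: tuple


def tokens_from_source(source):
    """Tokenize Python source and return a list of Token objects."""
    readline = io.StringIO(source).readline
    return [
        Token(tokenize.tok_name[tok.type], tok.string, tok.start, tok.end)
        for tok in tokenize.generate_tokens(readline)
    ]


toks = tokens_from_source("x = 1\n")
```

The list also contains the trailing `NEWLINE` and `ENDMARKER` tokens, which a caller may want to filter out.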

Actually, it takes about a day to write a reasonable tokenizer. (That
is, if you are writing using a language that you know.) Since I know
the problem thoroughly, it seemed like a good starting point for
learning Python.

There's a tokenizer I wrote in Java at http://www.MartinRinehart.com/src/language/Tokenizer.html.
Actually, that's an HTML page written by my "javasrc" (parallel to
Sun's javadoc) based on the Tokenizer's tokenizing of its own source.

Have I got those quotes right?
 
