Endless loop "--(end of buffer or a NULL)" in Flex++ (Windows)


张杨 Santa

I am currently developing an SQL parser with Bison++ and Flex++ under
Windows. The SQL statements we parse may contain Chinese characters in
strings. Sometimes the Chinese characters are parsed successfully, but
most often the lexer fails, goes into an endless loop, and prints
"--(end of buffer or a NULL)" continuously.

Since we parse statements from a std::string rather than from an input
stream, we define some macros in our lexer.l. The INPUT_CODE macro is
defined as:

%define INPUT_CODE \
  result = 0; \
  while (result < max_size && pos < (int)inputText.length()) { \
    buffer[result] = inputText[pos]; \
    pos++; \
    result++; \
  } \
  return 0;


inputText is a member of Lexer that holds the SQL statement to be
parsed, and pos is an int member of Lexer that stores the number of
characters we have lexed so far.
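The copy loop in INPUT_CODE can be exercised on its own, outside the generated scanner. The following is a minimal stand-alone sketch, not Flex++ code: `StringInput` and `fill` are hypothetical names, and `pos` is a `std::size_t` here (instead of the `int` member described above) to avoid a signed/unsigned comparison. It assumes the usual flex convention that a refill reporting 0 bytes means end of input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the Lexer members inputText and pos.
struct StringInput {
    std::string inputText;  // the SQL statement to be scanned
    std::size_t pos = 0;    // bytes already handed to the scanner

    // Same copy loop as INPUT_CODE: write at most max_size bytes into
    // buffer and return how many were written; 0 means no input left.
    int fill(char* buffer, int max_size) {
        int result = 0;
        while (result < max_size && pos < inputText.length()) {
            buffer[result] = inputText[pos];
            pos++;
            result++;
        }
        return result;
    }
};
```

With inputText set to "SELECT 1" and max_size = 4, successive calls return 4 ("SELE"), 4 ("CT 1") and then 0.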


Could somebody please tell us how to deal with the endless loop
"--(end of buffer or a NULL)"? Thank you very much! :)

张杨 Santa

Thanks very much, Helde :).

Chinese characters can be stored in a std::string; each Chinese
character takes 2 chars.
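The encoding questions in this thread (how many bytes per character, whether zero bytes occur) can be checked directly on the std::string. This sketch assumes the text is GB2312 (EUC-CN), where both bytes of a Chinese character lie in 0xA1..0xFE; the helper names `hasZeroByte` and `allHighBytes` are hypothetical. As an example, 张杨 (the poster's name) is the four GB2312 bytes D5 C5 D1 EF, i.e. 4 chars for 2 characters. Because these bytes have the high bit set, reading them through a plain `char`, which is signed by default with MSVC, yields negative values.

```cpp
#include <string>

// True if s contains an embedded zero byte.
bool hasZeroByte(const std::string& s) {
    return s.find('\0') != std::string::npos;
}

// True if every byte of s has its high bit set (value >= 0x80).
// The cast matters: plain char may be signed, e.g. with MSVC.
bool allHighBytes(const std::string& s) {
    for (char c : s) {
        if (static_cast<unsigned char>(c) < 0x80) {
            return false;
        }
    }
    return true;
}
```

For `std::string name("\xD5\xC5\xD1\xEF")`, `name.size()` is 4, `hasZeroByte(name)` is false and `allHighBytes(name)` is true. Under GBK (Windows code page 936) the second byte of a character may be as low as 0x40, so `allHighBytes` can be false there, although GBK never uses a zero byte either.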

As I mentioned before, this code is for Flex++, which is why it looks a
bit strange. The initialization of inputText is not posted, but we are
fairly sure it is correct. max_size and buffer are variables provided
by Flex++, and Flex++ ensures that buffer is long enough to accommodate
the Chinese characters.


张杨 Santa said:
I am currently developing an SQL parser with Bison++ and Flex++ under
Windows. The SQL statements we parse may contain Chinese characters in
strings. Sometimes the Chinese characters are parsed successfully, but
most often the lexer fails, goes into an endless loop, and prints
"--(end of buffer or a NULL)" continuously.
Since we parse statements from a std::string rather than from an input
stream, we define some macros in our lexer.l.

std::string is defined on 8-bit chars, which cannot hold any Chinese
character code by themselves, so you must be using some kind of
multi-byte encoding. Which one? Does it contain zero bytes?

How do you initialise inputText?

The INPUT_CODE macro is defined as:
%define INPUT_CODE \
  result = 0; \
  while (result < max_size && pos < (int)inputText.length()) { \
    buffer[result] = inputText[pos]; \
    pos++; \
    result++; \
  } \
  return 0;

What's max_size? The length of the buffer? Is the allocated buffer long
enough to accommodate the extra bytes for the Chinese characters,
whatever the encoding?

hth
Paavo


