Possible bug in string handling (with kludgy work-around)

Charles Hixson · Dec 26, 2011

This doesn't cause a crash, but rather incorrect results.

    self.wordList = ["The", "quick", "brown", "fox", "carefully",
                     "jumps", "over", "the", "lazy", "dog", "as", "it",
                     "stealthily", "wends", "its", "way", "homewards", '\b.']
    for i in range (len (self.wordList) ):
        if not isinstance(self.wordList[i], str):
            self.wordList = ""
        elif self.wordList[i] != "" and self.wordList[i][0] == "\b":
            print ("0: wordList[", i, "] = \"", self.wordList[i], "\"", sep = "")
            print ("0a: wordList[", i, "][1] = \"", self.wordList[i][1], "\"", sep = "")
            tmp = self.wordList[i][1]  ## !! Kludge -- remove tmp to see the error
            self.wordList[i] = tmp + self.wordList[i][1:-1]  ## !! Kludge -- remove tmp + to see the error
            print ("1: wordList[", i, "] = \"", self.wordList[i], "\"", sep = "")
            print ("len(wordList[", i, "]) = ", len(self.wordList[i]) )
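A possible explanation for the result described, offered as a sketch rather than a confirmed diagnosis: '\b.' is a two-character string (a backspace followed by a full stop), so the slice [1:-1] of it is empty. The names word, without_kludge and with_kludge are illustrative and do not appear in the post.

```python
word = '\b.'                 # backspace followed by a full stop

print(len(word))             # 2
print(repr(word[1]))         # '.'
print(repr(word[1:-1]))      # '' : starts at index 1, stops before the last character

without_kludge = word[1:-1]            # the assignment once "tmp +" is removed
with_kludge = word[1] + word[1:-1]     # what the posted line computes

print(repr(without_kludge))  # ''
print(repr(with_kludge))     # '.'
print(repr(word[1:]))        # '.' : drops only the leading backspace
```

If the intent was simply to strip the leading backspace, word[1:] would do that without the temporary variable.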

Rick Johnson · Dec 26, 2011

Handy rules for reporting bugs:

1. Always format code properly.
2. Always trim excess fat from code.
3. Always include relative dependencies ("self.wordlist" is only valid
inside a class. In this case, change the code to a state that is NOT
dependent on a class definition.)

Most times after following these simple rules, you'll find egg on your
face BEFORE someone else has a chance to see it and ridicule you.

Chris Angelico · Dec 26, 2011

4. Don't take it personally when a known troll insults you. His
advice, in this case, is valid; but don't feel that you're going to be
ridiculed. We don't work that way on this list.

ChrisA

Steven D'Aprano · Dec 27, 2011

This doesn't cause a crash, but rather incorrect results.

Charles, your code is badly formatted and virtually unreadable. You have
four spaces between some tokens, lines are too long to fit in an email or
News post without word-wrapping. It is a mess of unidiomatic code filled
with repeated indexing and unnecessary backslash escapes.

You also don't tell us what result you expect, or what result you
actually get. What is the intention of the code? What are you trying to
do, and what happens instead?

The code as given doesn't run -- what's self?

Despite all these problems, I can see one obvious problem in your code:
you test to see if self.wordList[i] is a string, and if not, you replace
the *entire* wordList with the empty string. That is unlikely to do what
you want, although I admit I'm guessing what you are trying to do (since
you don't tell us).
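The difference between replacing one element and replacing the whole list can be sketched as follows. The sample data and the names words, fixed and clobbered are made up for illustration; in the posted list every entry is already a string, so that branch would never actually run there.

```python
words = ["fox", 42, "\b."]      # hypothetical data with one non-string entry

# Element-wise replacement: only the offending entry is changed.
fixed = list(words)
for i in range(len(fixed)):
    if not isinstance(fixed[i], str):
        fixed[i] = ""
print(fixed)                     # ['fox', '', '\x08.']

# Rebinding the name instead throws the whole list away.
clobbered = list(words)
for i in range(len(clobbered)):
    if not isinstance(clobbered[i], str):
        clobbered = ""           # the list is gone from here on
        break                    # further indexing would raise IndexError
print(repr(clobbered))           # ''
```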

Some hints for you:

(1) Python has two string delimiters, " and ' and you should use them
both. Instead of hard-to-read backslash escapes, just swap delimiters:

    print("A string including a \" quote mark.")  # No!
    print('A string including a " quote mark.')  # Yes, much easier to read.

The only time you should backslash-escape a quotation mark is if you need
to include both sorts in a single string:

    print("Python has both single ' and double \" quotation marks.")
    print('Python has both single \' and double " quotation marks.')

(2) Python is not Pascal, or whatever language you seem to be writing in
the style of. You almost never should write for-loops like this:

    for i in range(len(something)):
        print(something[i])

Instead, you should just iterate over "something" directly:

    for obj in something:
        print(obj)

If you also need the index, use the enumerate function:

    for i, obj in enumerate(something):
        print(obj, i)

If you are forced to use an ancient version of Python without enumerate,
do yourself a favour and write your loops like this:

    for i in range(len(something)):
        obj = something[i]
        print(obj, i)

instead of repeatedly indexing the list over and over and over and over
again, as you do in your own code. The use of a temporary variable makes
the code much easier to read and understand.
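Applied to the loop from the original post, the enumerate idiom might look like the sketch below. It assumes the intent was to strip a leading backspace from each word, which the post never states; word_list is a shortened stand-in for self.wordList.

```python
word_list = ["The", "quick", "lazy", "dog", "\b."]   # shortened sample data

for i, word in enumerate(word_list):
    if not isinstance(word, str):
        word_list[i] = ""           # replace only this element
    elif word.startswith("\b"):
        word_list[i] = word[1:]     # assumed intent: drop the leading backspace
        print('wordList[{}] = "{}"'.format(i, word_list[i]))

print(word_list)
```

Assigning to word_list[i] inside the loop is safe here because the length of the list never changes.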

Dennis Lee Bieber · Dec 27, 2011

The only time you should backslash-escape a quotation mark is if you need
to include both sorts in a single string:

    print("Python has both single ' and double \" quotation marks.")
    print('Python has both single \' and double " quotation marks.')

You can get by without the backslash in this situation too, by using
triple quoting:

    print("""Python has both single ' and double " quotation marks.""")
(substitute ''' for """ if it looks better to you, as long as you use
the same marker at both ends. I find """ clearer, ''' could be a " and '
packed tightly in some fonts, "', whereas """ can only be one construct)
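As a quick check, the escaped and triple-quoted spellings above all produce the same string. The variable names are illustrative only.

```python
escaped_double = "Python has both single ' and double \" quotation marks."
escaped_single = 'Python has both single \' and double " quotation marks.'
triple_double = """Python has both single ' and double " quotation marks."""
triple_single = '''Python has both single ' and double " quotation marks.'''

# All four literals denote the same text; only the source spelling differs.
print(escaped_double == escaped_single == triple_double == triple_single)  # True
```

One caveat: a triple-quoted string whose last character is the same kind of quote as its delimiter still needs a backslash before that final quote.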

Rick Johnson · Dec 27, 2011

--
Note: superfluous indention removed for clarity!
--

You can get by without the backslash in this situation too, by using
triple quoting:

I would not do that because:
1. Because Python already has TWO string literal delimiters (' and ")
2. Because triple quote string literals are SPECIFICALLY created to
solve the "multi-line issue"
3. Because you can confuse the hell out of someone who is reading
Python code and they may miss the true purpose of triple quotes in
Python

But this brings up a very important topic. Why do we even need triple
quote string literals to span multiple lines? Good question, and one I
have never really mused on until now. It's amazing how much BS we just
accept blindly! WE DON'T NEED TRIPLE QUOTE STRINGS! What we need is
single quote strings that span multiple lines and triple quotes then
become superfluous! For the problem of embedding quotes in string
literals, we should be using markup. A SIMPLISTIC MARKUP!

" This is a multi line
string with a single quote --> <SQ>
and a double quote --> <DQ>. Here is an
embedded newline --> <NL>. And a backspace <BS>.

Now we can dispense with all the BS!
"

I find """ clearer, ''' could be a " and '
packed tightly in some fonts, "', whereas """ can only be one construct)

Another reason to ONLY use fixed width font when viewing code! Why
would you use ANY font that would obscure chars SO ubiquitous as " and
'?
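For concreteness, the markup proposed above could be emulated today with an ordinary function. This is purely a hypothetical sketch: Python has no such syntax, and expand_markup and the MARKUP table are invented here to mirror the tokens in the post.

```python
# Hypothetical tokens taken from the post; this is not a Python feature.
MARKUP = {"<SQ>": "'", "<DQ>": '"', "<NL>": "\n", "<BS>": "\b"}


def expand_markup(text):
    """Replace each markup token in text with the character it stands for."""
    for token, char in MARKUP.items():
        text = text.replace(token, char)
    return text


sample = "single <SQ> double <DQ> newline <NL> backspace <BS>"
print(repr(expand_markup(sample)))
```

Note that such a scheme still needs some way to write a literal "<SQ>", so it moves the escaping problem rather than removing it.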

Lie Ryan · Dec 27, 2011

Ok, you're trolling.

Terry Reedy · Dec 27, 2011

But this brings up a very important topic. Why do we even need triple
quote string literals to span multiple lines? Good question, and one I
have never really mused on until now.

I have, and the reason I thought of is that people, including me, too
often forget or accidentally fail to properly close a string literal,
and type something like 'this is a fairly long single line string"
and wonder why they get a syntax error lines later, or, in interactive
mode, why the interpreter does not respond to a newline.

Color coding editors make it easier to catch such errors, but they were
less common in 1991. And there is still uncolored interactive mode.

There may also be a technical reason as to how the lexer works.
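The failure mode described above can be reproduced without an editor by compiling source text directly. This is a small sketch; syntax_error_for is a helper invented for the example, and the exact error messages vary between Python versions, so none are shown.

```python
def syntax_error_for(source):
    """Return the SyntaxError raised when compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc
    return None


# Mismatched delimiters on an ordinary string: detected at the end of that line.
single = "s = 'this is a fairly long single line string\"\nprint(s)\n"
# The same slip with a triple-quoted string: the literal swallows every
# following line, so the problem is only detected at end of file.
triple = "s = '''this is a fairly long string\"\"\"\nprint(s)\n"

for source in (single, triple):
    err = syntax_error_for(source)
    print(type(err).__name__, "-", err.msg)
```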

Rick Johnson · Dec 28, 2011

I have, and the reason I thought of is that people, including me, too
often forget or accidentally fail to properly close a string literal,

Yes, agreed.

Color coding editors make it easier to catch such errors, but they were
less common in 1991.

I would say the need for triple quote strings has passed long ago.
Like you say, since color lexers are ubiquitous now we don't need
them.

And there is still uncolored interactive mode.

I don't see interactive command line programming as a problem. I mean,
who drops into a cmd line and starts writing paragraphs of string
literals? Typically, one would just make a few one-liner calls here or
there. Also, un-terminated string literal errors can be very
aggravating. Not because they are difficult to fix, no, but because
they are difficult to find! -- and sending me an error message
like...

"Exception: Un-terminated string literal meets EOF! line: 50,466,638"

... is about as helpful as a bullet in my head!

If the interpreter finds itself at EOF BEFORE a string closes, don't
you think it would be more helpful to include the currently "opened"
string's START POSITION also? Heck, it would be wonderful to only have
the start position, since the likelihood of a string legitimately
ending at EOF is astronomically small!

As an intelligent lad must know, the distance from any given string's
start position to its end position is far more likely to be shorter
than the distance from the string's beginning to the freaking EOF!
Ruby and Python are both guilty of this atrocity.
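To see what a given interpreter actually reports for an unclosed string, the SyntaxError object can be inspected directly. This is a sketch only; which line lineno refers to (the start of the string or the end of the file) may depend on the Python version, so no expected output is shown. The name reported_line is illustrative.

```python
# A four-line source whose triple-quoted string opens on line 2 and never closes.
source = "\n".join([
    "a = 1",
    "s = '''this string is never closed",
    "b = 2",
    "c = 3",
    "",
])

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    reported_line = exc.lineno
    print("message:", exc.msg)
    print("reported line:", reported_line)
```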
