Possible bug in string handling (with kludgy work-around)

Charles Hixson

This doesn't cause a crash, but rather incorrect results.

self.wordList    =    ["The", "quick", "brown", "fox", "carefully",
                 "jumps", "over", "the", "lazy", "dog", "as", "it",
                 "stealthily", "wends", "its", "way", "homewards", '\b.']
for    i    in    range (len (self.wordList) ):
    if    not isinstance(self.wordList[i], str):
        self.wordList = ""
    elif self.wordList[i] != "" and self.wordList[i][0] == "\b":
        print ("0: wordList[", i, "] = \"", self.wordList[i], "\"", sep = "")
        print ("0a: wordList[", i, "][1] = \"", self.wordList[i][1], "\"", sep = "")
        tmp    =    self.wordList[i][1]            ## !! Kludge -- remove tmp to see the error
        self.wordList[i]    =    tmp + self.wordList[i][1:-1]  ## !! Kludge -- remove tmp + to see the error
        print ("1: wordList[", i, "] = \"", self.wordList[i], "\"", sep = "")
        print    ("len(wordList[", i, "]) = ", len(self.wordList[i]) )
 
Rick Johnson

Handy rules for reporting bugs:

1. Always format code properly.
2. Always trim excess fat from code.
3. Always include relative dependencies ("self.wordlist" is only valid
inside a class. In this case, change the code to a state that is NOT
dependent on a class definition.)

Most times after following these simple rules, you'll find egg on your
face BEFORE someone else has a chance to see it and ridicule you.
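
As an illustration of rule 3, a class-free reduction of the posted snippet might look like the sketch below. The names `word`, `sliced`, `kludged`, and `tail` are hypothetical, and the element is assumed to be the two-character string '\b.' from the end of the original list.

```python
# Hypothetical class-free reduction of the reported snippet.
word = "\b."                      # backspace followed by a full stop; len(word) == 2

sliced = word[1:-1]               # drops the first AND the last character
kludged = word[1] + word[1:-1]    # the "kludge": prepend word[1] to the same slice
tail = word[1:]                   # drops only the leading backspace

print(repr(sliced))   # ''
print(repr(kludged))  # '.'
print(repr(tail))     # '.'
```

With the code reduced this far, the difference between the `[1:-1]` and `[1:]` slices is easy to inspect in isolation.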
 
Chris Angelico

4. Don't take it personally when a known troll insults you. His
advice, in this case, is valid; but don't feel that you're going to be
ridiculed. We don't work that way on this list.

ChrisA
 
Steven D'Aprano

> This doesn't cause a crash, but rather incorrect results.

Charles, your code is badly formatted and virtually unreadable. You have
four spaces between some tokens, and the lines are too long to fit in an
email or News post without word-wrapping. It is a mess of unidiomatic code
filled with repeated indexing and unnecessary backslash escapes.

You also don't tell us what result you expect, or what result you
actually get. What is the intention of the code? What are you trying to
do, and what happens instead?

The code as given doesn't run -- what's self?

Despite all these problems, I can see one obvious problem in your code:
you test to see if self.wordList[i] is a string, and if not, you replace
the *entire* wordList with the empty string. That is unlikely to do what
you want, although I admit I'm guessing what you are trying to do (since
you don't tell us).
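
A minimal sketch of the difference, using a hypothetical stand-alone list `word_list` in place of `self.wordList`. The "probable intent" branch is only a guess at what the code was meant to do:

```python
# Hypothetical stand-alone list; the original code used self.wordList.
word_list = ["The", 42, "fox"]

# What the posted code does: rebinding the name discards the whole list.
replaced = word_list
if not isinstance(replaced[1], str):
    replaced = ""

# Probable intent (an assumption): blank out only the offending element.
fixed = list(word_list)
for i, word in enumerate(fixed):
    if not isinstance(word, str):
        fixed[i] = ""

print(repr(replaced))  # ''
print(fixed)           # ['The', '', 'fox']
```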

Some hints for you:

(1) Python has two string delimiters, " and ' and you should use them
both. Instead of hard-to-read backslash escapes, just swap delimiters:

print("A string including a \" quote mark.")  # No!
print('A string including a " quote mark.')   # Yes, much easier to read.

The only time you should backslash-escape a quotation mark is if you need
to include both sorts in a single string:

print("Python has both single ' and double \" quotation marks.")
print('Python has both single \' and double " quotation marks.')


(2) Python is not Pascal, or whatever language you seem to be writing in
the style of. You almost never should write for-loops like this:


for i in range(len(something)):
    print(something[i])


Instead, you should just iterate over "something" directly:


for obj in something:
    print(obj)


If you also need the index, use the enumerate function:


for i, obj in enumerate(something):
    print(obj, i)


If you are forced to use an ancient version of Python without enumerate,
do yourself a favour and write your loops like this:


for i in range(len(something)):
    obj = something[i]
    print(obj, i)


instead of repeatedly indexing the list over and over and over and over
again, as you do in your own code. The use of a temporary variable makes
the code much easier to read and understand.
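
The three loop styles can be compared side by side in a small runnable sketch. The list `words` is hypothetical sample data, and the code assumes Python 3, where `print` is a function:

```python
words = ["quick", "brown", "fox"]   # hypothetical sample data

# Direct iteration: no index needed.
for word in words:
    print(word)

# enumerate() yields (index, item) pairs.
indexed = list(enumerate(words))
for i, word in indexed:
    print(i, word)

# Index-based loop with a temporary variable, for comparison.
manual = []
for i in range(len(words)):
    word = words[i]
    manual.append((i, word))
```

Both the `enumerate` version and the index-based version visit the same (index, item) pairs; the first simply says so with less machinery.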
 
Dennis Lee Bieber

> The only time you should backslash-escape a quotation mark is if you need
> to include both sorts in a single string:
>
> print("Python has both single ' and double \" quotation marks.")
> print('Python has both single \' and double " quotation marks.')

You can get by without the backslash in this situation too, by using
triple quoting:

print("""Python has both single ' and double " quotation marks.""")

(substitute ''' for """ if it looks better to you, as long as you use
the same marker at both ends. I find """ clearer, ''' could be a " and '
packed tightly in some fonts, "', whereas """ can only be one construct)
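
As a quick check, the four spellings discussed so far can be compared directly. This is only a sketch with hypothetical variable names; it shows that the choice of delimiter changes the source text, not the resulting string:

```python
escaped_double = "Python has both single ' and double \" quotation marks."
escaped_single = 'Python has both single \' and double " quotation marks.'
triple_double = """Python has both single ' and double " quotation marks."""
triple_single = '''Python has both single ' and double " quotation marks.'''

# All four literals produce the same string object contents.
all_equal = escaped_double == escaped_single == triple_double == triple_single
print(all_equal)
print(triple_double)
```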
 
Rick Johnson

--
Note: superfluous indention removed for clarity!
--

> You can get by without the backslash in this situation too, by using
> triple quoting:

I would not do that because:
1. Because Python already has TWO string literal delimiters (' and ")
2. Because triple quote string literals are SPECIFICALLY created to
solve the "multi-line issue"
3. Because you can confuse the hell out of someone who is reading
Python code and they may miss the true purpose of triple quotes in
Python

But this brings up a very important topic. Why do we even need triple
quote string literals to span multiple lines? Good question, and one I
have never really mused on until now. It's amazing how much BS we just
accept blindly! WE DON'T NEED TRIPLE QUOTE STRINGS! What we need is
single quote strings that span multiple lines and triple quotes then
become superfluous! For the problem of embedding quotes in string
literals, we should be using markup. A SIMPLISTIC MARKUP!

" This is a multi line
string with a single quote --> <SQ>
and a double quote --> <DQ>. Here is an
embedded newline --> <NL>. And a backspace <BS>.

Now we can dispense with all the BS!
"

> I find """ clearer, ''' could be a " and '
> packed tightly in some fonts, "', whereas """ can only be one construct)

Another reason to ONLY use fixed width font when viewing code! Why
would you use ANY font that would obscure chars SO ubiquitous as " and
'?
 
Lie Ryan

Ok, you're trolling.
 
Terry Reedy

> But this brings up a very important topic. Why do we even need triple
> quote string literals to span multiple lines? Good question, and one I
> have never really mused on until now.

I have, and the reason I thought of is that people, including me, too
often forget or accidentally fail to properly close a string literal,
and type something like 'this is a fairly long single line string"
and wonder why they get a syntax error lines later, or, in interactive
mode, why the interpreter does not respond to a newline.

Color coding editors make it easier to catch such errors, but they were
less common in 1991. And there is still uncolored interactive mode.

There may also be a technical reason as to how the lexer works.
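
The effect of the single-line restriction can be observed with the built-in `compile()`. This is a sketch with hypothetical source strings; the exact error message differs between Python versions, so only the reported line number is examined:

```python
# Hypothetical source text with mismatched delimiters on line 1.
bad_source = "s = 'this is a fairly long single line string\"\nprint(s)\n"

try:
    compile(bad_source, "<example>", "exec")
except SyntaxError as exc:
    # An ordinary quoted literal must close on the line where it opens,
    # so the error is pinned to that line.
    error_line = exc.lineno
    print("SyntaxError reported on line", error_line)
else:
    error_line = None

# A triple-quoted literal is allowed to span lines, so this compiles.
ok_source = "s = '''line one\nline two'''\n"
code = compile(ok_source, "<example>", "exec")
```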
 

Rick Johnson

> I have, and the reason I thought of is that people, including me, too
> often forget or accidentally fail to properly close a string literal,

Yes, agreed.

> Color coding editors make it easier to catch such errors, but they were
> less common in 1991.

I would say the need for triple quote strings has passed long ago.
Like you say, since color lexers are ubiquitous now we don't need
them.

> And there is still uncolored interactive mode.

I don't see interactive command line programming as a problem. I mean,
who drops into a cmd line and starts writing paragraphs of string
literals? Typically, one would just make a few one-liner calls here or
there. Also, un-terminated string literal errors can be very
aggravating. Not because they are difficult to fix, no, but because
they are difficult to find! -- and sending me an error message
like...

"Exception: Un-terminated string literal meets EOF! line: 50,466,638"

... is about as helpful as a bullet in my head!

If the interpreter finds itself at EOF BEFORE a string closes, don't
you think it would be more helpful to also include the START POSITION of
the currently "opened" string? Heck, it would be wonderful to have only
the start position, since the likelihood of a string ending at EOF is
astronomically small!

As an intelligent lad must know, the distance from any given string's
start position to its end position is far more likely to be shorter than
the distance from the string's beginning to the freaking EOF! Ruby and
Python are both guilty of this atrocity.
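
Which position a particular interpreter reports can be checked directly. The sketch below uses a hypothetical three-line source and makes no assumption about the answer, since the reported line and message vary between Python versions:

```python
# Hypothetical three-line source whose triple-quoted literal opens on line 2
# and is never closed.
source = 'x = 1\ns = """never closed\ny = 2\n'

try:
    compile(source, "<example>", "exec")
    error = None
except SyntaxError as exc:
    error = exc

if error is not None:
    # Whether this is the start of the literal or the end of the file depends
    # on the interpreter version, so the value is printed rather than assumed.
    print("message:", error.msg)
    print("reported line:", error.lineno)
```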
 
