Strange behavior

Discussion in 'Python' started by light1quark@gmail.com, Aug 14, 2012.

  1. Guest

    Hi, I am migrating from PHP to Python and I am slightly confused.

    I am making a function that takes a startingList, finds all the strings in the list that begin with 'x', removes those strings and puts them into a xOnlyList.

    However if you run the code you will notice only one of the strings beginning with 'x' is removed from the startingList.
    If I comment out 'startingList.remove(str);' the code runs with both strings beginning with 'x' being put in the xOnlyList.
    Using the print statement I noticed that the second string that begins with 'x' isn't even identified by the function. Why does this happen?

    def testFunc(startingList):
        xOnlyList = [];
        for str in startingList:
            if (str[0] == 'x'):
                print str;
                xOnlyList.append(str)
                startingList.remove(str) #this seems to be the problem
        print xOnlyList;
        print startingList
    testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])

    #Thanks for your help!
    , Aug 14, 2012
    #1

  2. writes:

    > However if you run the code you will notice only one of the strings
    > beginning with 'x' is removed from the startingList.


    >
    > def testFunc(startingList):
    > xOnlyList = [];
    > for str in startingList:
    > if (str[0] == 'x'):
    > print str;
    > xOnlyList.append(str)
    > startingList.remove(str) #this seems to be the problem
    > print xOnlyList;
    > print startingList
    > testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])
    >
    > #Thanks for your help!


    Try with ['xasd', 'sefwr', 'xjkl', 'dfsews'] and you'll understand what
    happens. Also, have a look at:

    http://docs.python.org/reference/compound_stmts.html#the-for-statement

    You can't modify the list you're iterating on, better use another list
    to collect the result.
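
A minimal sketch of what the loop actually visits, written for Python 3. The helper name `visit_and_remove` is made up for illustration and is not code from the thread. It records every item the `for` loop yields while removing the ones that start with 'x':

```python
def visit_and_remove(words):
    """Remove 'x' words while looping; return the items the loop visited."""
    visited = []
    for word in words:          # the loop tracks a position, not the items
        visited.append(word)
        if word.startswith('x'):
            words.remove(word)  # later items shift one slot to the left
    return visited

original = ['xasd', 'xjkl', 'sefwr', 'dfsews']
print(visit_and_remove(original))   # ['xasd', 'sefwr', 'dfsews']
print(original)                     # ['xjkl', 'sefwr', 'dfsews']

reordered = ['xasd', 'sefwr', 'xjkl', 'dfsews']
print(visit_and_remove(reordered))  # ['xasd', 'xjkl']
print(reordered)                    # ['sefwr', 'dfsews']
```

After each removal, the item that slides into the vacated slot is never visited. With the original order that skipped item is 'xjkl'; with the reordered list the skipped items happen not to start with 'x', so the result only looks right.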

    -- Alain.

    P/S: str is a builtin, you'd better avoid assigning to it.
    Alain Ketterlin, Aug 14, 2012
    #2

  3. Terry Reedy Guest

    On 8/14/2012 11:59 AM, Alain Ketterlin wrote:
    > writes:
    >
    >> However if you run the code you will notice only one of the strings
    >> beginning with 'x' is removed from the startingList.

    >
    >>
    >> def testFunc(startingList):
    >> xOnlyList = [];
    >> for str in startingList:
    >> if (str[0] == 'x'):
    >> print str;
    >> xOnlyList.append(str)
    >> startingList.remove(str) #this seems to be the problem
    >> print xOnlyList;
    >> print startingList
    >> testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])
    >>
    >> #Thanks for your help!

    >
    > Try with ['xasd', 'sefwr', 'xjkl', 'dfsews'] and you'll understand what
    > happens. Also, have a look at:
    >
    > http://docs.python.org/reference/compound_stmts.html#the-for-statement
    >
    > You can't modify the list you're iterating on,


    Except he obviously did ;-).
    (Changing the size of a set or dict while iterating over it raises RuntimeError.)

    Indeed, people routinely *replace* items while iterating.

    def squarelist(lis):
        for i, n in enumerate(lis):
            lis[i] = n*n
        return lis

    print(squarelist([0,1,2,3,4,5]))
    # [0, 1, 4, 9, 16, 25]

    Removals can be handled by iterating in reverse. This works even with
    duplicates because if the item removed is not the one tested, the one
    tested gets retested.

    def removeodd(lis):
        for n in reversed(lis):
            if n % 2:
                lis.remove(n)
            print(n, lis)

    ll = [0,1, 5, 5, 4, 5]
    removeodd(ll)
    >>>

    5 [0, 1, 5, 4, 5]
    5 [0, 1, 4, 5]
    5 [0, 1, 4]
    4 [0, 1, 4]
    1 [0, 4]
    0 [0, 4]

    > better use another list to collect the result.


    If there are very many removals, a new list will be faster, even if one
    needs to copy the new list back into the original: k removals from a
    list of length n cost O(k*n), versus O(n) for building a new list and
    copying it back.
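
A rough Python 3 sketch of the "new list plus copy back" approach, assuming the goal is the same as in `removeodd` above. The function name `remove_odd_via_new_list` is hypothetical:

```python
def remove_odd_via_new_list(numbers):
    """Drop odd numbers from `numbers` in place with one pass and one copy."""
    kept = [n for n in numbers if n % 2 == 0]  # one pass, no shifting of items
    numbers[:] = kept                          # copy back into the same list object
    return numbers

ll = [0, 1, 5, 5, 4, 5]
remove_odd_via_new_list(ll)
print(ll)  # [0, 4]
```

Each `lis.remove(n)` call has to search the list and shift the tail, which is where the O(k*n) cost comes from; the version here touches each item a constant number of times.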

    > P/S: str is a builtin, you'd better avoid assigning to it.


    Agreed. People have actually posted code doing something like

    ....
    list = [1,2,3]
    ....
    z = list(x)
    ....
    and wondered and asked why it does not work.
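
A small Python 3 sketch of that failure, kept inside a function so the shadowing stays local. The name `shadowing_demo` is made up for illustration:

```python
def shadowing_demo():
    """Return the error message produced by calling a shadowed `list`."""
    list = [1, 2, 3]        # this local name now hides the built-in list type
    try:
        return list('abc')  # tries to call the list object itself
    except TypeError as exc:
        return str(exc)

print(shadowing_demo())  # 'list' object is not callable
```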

    --
    Terry Jan Reedy
    Terry Reedy, Aug 14, 2012
    #3
  4. Guest

    , Aug 14, 2012
    #4
  5. On 2012-08-14 17:38, wrote:
    > Hi, I am migrating from PHP to Python and I am slightly confused.
    >
    > I am making a function that takes a startingList, finds all the strings in the list that begin with 'x', removes those strings and puts them into a xOnlyList.
    >
    > However if you run the code you will notice only one of the strings beginning with 'x' is removed from the startingList.
    > If I comment out 'startingList.remove(str);' the code runs with both strings beginning with 'x' being put in the xOnlyList.
    > Using the print statement I noticed that the second string that begins with 'x' isn't even identified by the function. Why does this happen?
    >
    > def testFunc(startingList):
    > xOnlyList = [];
    > for str in startingList:
    > if (str[0] == 'x'):
    > print str;
    > xOnlyList.append(str)
    > startingList.remove(str) #this seems to be the problem
    > print xOnlyList;
    > print startingList
    > testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])
    >
    > #Thanks for your help!


    You might find the following useful:

    def testFunc(startingList):
        xOnlyList = []; j = -1
        for xl in startingList:
            if (xl[0] == 'x'):
                xOnlyList.append(xl)
            else:
                j += 1
                startingList[j] = xl
        # Delete the leftover tail in place so the caller's list is updated.
        if j == -1:
            del startingList[:]
        else:
            del startingList[j+1:]

        return(xOnlyList)


    testList1 = ['xasd', 'xjkl', 'sefwr', 'dfsews']
    testList2 = ['xasd', 'xjkl', 'xsefwr', 'xdfsews']
    testList3 = ['xasd', 'jkl', 'sefwr', 'dfsews']
    testList4 = ['asd', 'jkl', 'sefwr', 'dfsews']

    xOnlyList = testFunc(testList1)
    print 'xOnlyList = ',xOnlyList
    print 'testList = ',testList1
    xOnlyList = testFunc(testList2)
    print 'xOnlyList = ',xOnlyList
    print 'testList = ',testList2
    xOnlyList = testFunc(testList3)
    print 'xOnlyList = ',xOnlyList
    print 'testList = ',testList3
    xOnlyList = testFunc(testList4)
    print 'xOnlyList = ',xOnlyList
    print 'testList = ',testList4

    And here is another version using list comprehension that I prefer

    testList1 = ['xasd', 'xjkl', 'sefwr', 'dfsews']
    testList2 = ['xasd', 'xjkl', 'xsefwr', 'xdfsews']
    testList3 = ['xasd', 'jkl', 'sefwr', 'dfsews']
    testList4 = ['asd', 'jkl', 'sefwr', 'dfsews']

    def testFunc2(startingList):
        return([x for x in startingList if x[0] == 'x'],
               [x for x in startingList if x[0] != 'x'])

    xOnlyList,testList = testFunc2(testList1)
    print xOnlyList
    print testList
    xOnlyList,testList = testFunc2(testList2)
    print xOnlyList
    print testList
    xOnlyList,testList = testFunc2(testList3)
    print xOnlyList
    print testList
    xOnlyList,testList = testFunc2(testList4)
    print xOnlyList
    print testList
    Virgil Stokes, Aug 14, 2012
    #5
  6. On Wed, Aug 15, 2012 at 1:38 AM, <> wrote:
    > def testFunc(startingList):
    > xOnlyList = [];
    > for str in startingList:
    > if (str[0] == 'x'):
    > print str;
    > xOnlyList.append(str)
    > startingList.remove(str) #this seems to be the problem
    > print xOnlyList;
    > print startingList
    > testFunc(['xasd', 'xjkl', 'sefwr', 'dfsews'])


    Other people have explained the problem with your code. I'll take this
    example as a way of introducing you to one of Python's handy features
    - it's an idea borrowed from functional languages, and is extremely
    handy. It's called the "list comprehension", and can be looked up in
    the docs under that name,

    def testFunc(startingList):
        xOnlyList = [strng for strng in startingList if strng[0] == 'x']
        startingList = [strng for strng in startingList if strng[0] != 'x']
        print(xOnlyList)
        print(startingList)

    It's a compact notation for building a list from another list. (Note
    that I changed "str" to "strng" to avoid shadowing the built-in name
    "str", as others suggested.)

    (Unrelated side point: Putting parentheses around the print statements
    makes them compatible with Python 3, in which 'print' is a function.
    Unless something's binding you to Python 2, consider working with the
    current version - Python 2 won't get any more features added to it any
    more.)
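
For readers new to list comprehensions, here is a rough Python 3 sketch of how the first comprehension in the function above relates to an explicit loop. The variable names are chosen for illustration only:

```python
starting_list = ['xasd', 'xjkl', 'sefwr', 'dfsews']

# Explicit loop: start empty, test each item, append the matches.
x_only_loop = []
for strng in starting_list:
    if strng[0] == 'x':
        x_only_loop.append(strng)

# The same thing as a list comprehension.
x_only_comp = [strng for strng in starting_list if strng[0] == 'x']

print(x_only_comp)                 # ['xasd', 'xjkl']
print(x_only_loop == x_only_comp)  # True
```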

    Python's an awesome language. You may have to get your head around a
    few new concepts as you shift thinking from PHP's, but it's well worth
    while.

    Chris Angelico
    Chris Angelico, Aug 14, 2012
    #6
  7. On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:

    > You might find the following useful:
    >
    > def testFunc(startingList):
    > xOnlyList = []; j = -1
    > for xl in startingList:
    > if (xl[0] == 'x'):


    That's going to fail if the starting list contains an empty string. Use
    xl.startswith('x') instead.


    > xOnlyList.append(xl)
    > else:
    > j += 1
    > startingList[j] = xl


    Very cunning, but I have to say that your algorithm fails the "is this
    obviously correct without needing to study it?" test. Sometimes that is
    unavoidable, but for something like this, there are simpler ways to solve
    the same problem.


    > if j == -1:
    > startingList = []
    > else:
    > del startingList[j:-1]
    > return(xOnlyList)



    > And here is another version using list comprehension that I prefer


    > def testFunc2(startingList):
    > return([x for x in startingList if x[0] == 'x'], [x for x in
    > startingList if x[0] != 'x'])


    This walks over the starting list twice, doing essentially the same thing
    both times. It also fails to meet the stated requirement that
    startingList is modified in place, by returning a new list instead.
    Here's an example of what I mean:

    py> mylist = mylist2 = ['a', 'x', 'b', 'xx', 'cx']  # two names for one list
    py> result, mylist = testFunc2(mylist)
    py> mylist
    ['a', 'b', 'cx']
    py> mylist2 # should be same as mylist
    ['a', 'x', 'b', 'xx', 'cx']

    Here is the obvious algorithm for extracting and removing words starting
    with 'x'. It walks the starting list only once, and modifies it in place.
    The only trick needed is list slice assignment at the end.

    def extract_x_words(words):
        words_with_x = []
        words_without_x = []
        for word in words:
            if word.startswith('x'):
                words_with_x.append(word)
            else:
                words_without_x.append(word)
        words[:] = words_without_x # slice assignment
        return words_with_x


    The only downside of this is that if the list of words is so enormous
    that you can fit it in memory *once* but not *twice*, this may fail. But
    the same applies to the list comprehension solution.
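
A short Python 3 sketch of why the slice assignment matters, contrasting it with plain rebinding. Both function names are hypothetical:

```python
def drop_x_by_rebinding(words):
    # Rebinds the local name only; the caller's list object is untouched.
    words = [w for w in words if not w.startswith('x')]
    return words

def drop_x_by_slice_assignment(words):
    # Replaces the contents of the existing list object.
    words[:] = [w for w in words if not w.startswith('x')]
    return words

mylist = alias = ['a', 'x', 'b', 'xx', 'cx']  # two names for one list

drop_x_by_rebinding(mylist)
print(alias)  # ['a', 'x', 'b', 'xx', 'cx']

drop_x_by_slice_assignment(mylist)
print(alias)            # ['a', 'b', 'cx']
print(alias is mylist)  # True
```

Every name that refers to the original list sees the change only in the slice-assignment version.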



    --
    Steven
    Steven D'Aprano, Aug 15, 2012
    #7
  8. Chris Angelico <> writes:

    > Other people have explained the problem with your code. I'll take this
    > example as a way of introducing you to one of Python's handy features
    > - it's an idea borrowed from functional languages, and is extremely
    > handy. It's called the "list comprehension", and can be looked up in
    > the docs under that name,
    >
    > def testFunc(startingList):
    > xOnlyList = [strng for strng in startingList if strng[0] == 'x']
    > startingList = [strng for strng in startingList if strng[0] != 'x']
    > print(xOnlyList)
    > print(startingList)
    >
    > It's a compact notation for building a list from another list. (Note
    > that I changed "str" to "strng" to avoid shadowing the built-in name
    > "str", as others suggested.)


    Fully agree with you: list comprehension is, imo, the most useful
    program construct ever. Extremely useful.

    But not when it makes the program traverse twice the same list, where
    one traversal is enough.

    -- Alain.
    Alain Ketterlin, Aug 15, 2012
    #8
  9. writes:

    > I got my answer by reading your posts and referring to:
    > http://docs.python.org/reference/compound_stmts.html#the-for-statement
    > (particularly the shaded grey box)


    Note that the problem is not specific to Python (if you erase the current
    element while traversing an STL list in C++, you'll get a crash as well).

    > I guess I should have (obviously) looked at the doc's before posting
    > here; but im a noob.


    Python has several surprising features. I think it is a good idea to
    take some time to read the language reference, from cover to cover
    (before or after the various tutorials, depending on your background).

    -- Alain.
    Alain Ketterlin, Aug 15, 2012
    #9
  10. Peter Otten Guest

    Virgil Stokes wrote:

    >>> def testFunc(startingList):
    >>>xOnlyList = []; j = -1
    >>>for xl in startingList:
    >>>if (xl[0] == 'x'):

    >> That's going to fail in the starting list contains an empty string. Use
    >> xl.startswith('x') instead.

    > Yes, but this was by design (tacitly assumed that startingList was both a
    > list and non-empty).


    You misunderstood: it will fail if the list contains an empty string, not if
    the list itself is empty:

    >>> words = ["alpha", "", "xgamma"]
    >>> [word for word in words if word[0] == "x"]

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    IndexError: string index out of range

    The startswith() version:

    >>> [word for word in words if word.startswith("x")]

    ['xgamma']

    Also possible:

    >>> [word for word in words if word[:1] == "x"]

    ['xgamma']

    > def testFunc1(startingList):
    > '''
    > Algorithm-1
    > Note:
    > One should check for an empty startingList before
    > calling testFunc1 -- If this possibility exists!
    > '''
    > return([x for x in startingList if x[0] == 'x'],
    > [x for x in startingList if x[0] != 'x'])
    >
    >
    > I would be interested in seeing code that is faster than algorithm-1


    In pure Python? Perhaps the messy variant:

    def test_func(words):
        nox = []
        append = nox.append
        withx = [x for x in words if x[0] == 'x' or append(x)]
        return withx, nox
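
The variant above appears to rely on `list.append` returning `None`. A minimal Python 3 sketch of that mechanism follows; `split_with_or_trick` is a hypothetical name, and it swaps in the `word[:1]` test shown earlier so that empty strings do not raise:

```python
def split_with_or_trick(words):
    without_x = []
    append = without_x.append
    # For an 'x' word the left operand is True, so `or` short-circuits and
    # the comprehension keeps the word. Otherwise append(word) runs, stores
    # the word in without_x, and returns None (falsy), so the word is skipped.
    with_x = [word for word in words if word[:1] == 'x' or append(word)]
    return with_x, without_x

print([].append('a'))  # None
print(split_with_or_trick(['alpha', '', 'xgamma', 'xa']))
# (['xgamma', 'xa'], ['alpha', ''])
```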
    Peter Otten, Aug 16, 2012
    #10
  11. On Thu, 16 Aug 2012 13:18:59 +0200, Virgil Stokes wrote:

    > On 15-Aug-2012 02:19, Steven D'Aprano wrote:
    >> On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:
    >>
    >>> You might find the following useful:
    >>>
    >>> def testFunc(startingList):
    >>> xOnlyList = []; j = -1
    >>> for xl in startingList:
    >>> if (xl[0] == 'x'):

    >> That's going to fail in the starting list contains an empty string. Use
    >> xl.startswith('x') instead.

    >
    > Yes, but this was by design (tacitly assumed that startingList was both
    > a list and non-empty).


    As Peter already pointed out, I said it would fail if the list contains
    an empty string, not if the list was empty.


    >>> xOnlyList.append(xl)
    >>> else:
    >>> j += 1
    >>> startingList[j] = xl

    >>
    >> Very cunning, but I have to say that your algorithm fails the "is this
    >> obviously correct without needing to study it?" test. Sometimes that is
    >> unavoidable, but for something like this, there are simpler ways to
    >> solve the same problem.

    >
    > Sorry, but I do not sure what you mean here.


    In a perfect world, you should be able to look at a piece of code, read
    it once, and see whether or not it is correct. That is what I mean by
    "obviously correct". For example, if I have a function that takes an
    argument, doubles it, and prints the result:

    def f1(x):
        print(2*x)


    that is obviously correct. Whereas this is not:

    import sys

    def f2(x):
        y = (x + 5)**2 - (x + 4)**2
        sys.stdout.write(str(y - 9) + '\n')


    because you have to study it to see whether or not it works correctly.
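
As a sketch of the "study" the second function demands: (x + 5)**2 - (x + 4)**2 expands to 2*x + 9, so subtracting 9 leaves 2*x. The Python 3 check below uses hypothetical names and returns values instead of printing them, so the two versions can be compared:

```python
def doubled_obviously(x):
    return 2 * x

def doubled_cleverly(x):
    # (x**2 + 10*x + 25) - (x**2 + 8*x + 16) == 2*x + 9
    y = (x + 5)**2 - (x + 4)**2
    return y - 9

for x in range(-3, 4):
    assert doubled_cleverly(x) == doubled_obviously(x)

print(doubled_cleverly(21))  # 42
```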

    Not all programs are simple enough to be obviously correct. Sometimes you
    have no choice but to write something which requires cleverness to get
    the right result. But this is not one of those cases. You should almost
    always prefer simple code over clever code, because the greatest expense
    in programming (time, effort and money) is to make code correct.

    Most code does not need to be fast. But all code needs to be correct.


    [...]
    > This can meet the requirement that startingList is modified in place via
    > the call to this function (see the attached code).


    Good grief! See, that's exactly the sort of thing I'm talking about.
    Without *detailed* study of your attached code, how can I possibly know
    what it does or whether it does it correctly?

    Your timing code calculates the mean using a recursive algorithm. Why
    don't you calculate the mean the standard way: add the numbers and divide
    by the total? What benefit do you gain from a more complicated algorithm
    when a simple one will do the job just as well?

    You have spent a lot of effort creating a complicated, non-obvious piece
    of timing code, with different random seeds for each run, and complicated
    ways of calculating timing statistics... but unfortunately the most
    important part of any timing test, the actual *timing*, is not done
    correctly. Consequently, your code is not correct.

    With an average time of a fraction of a second, none of those timing
    results are trustworthy, because they are vulnerable to interference from
    other processes, the operating system, and other random noise. You spend
    a lot of time processing the timing results, but it is Garbage In,
    Garbage Out -- the results are not trustworthy, and if they are correct,
    it is only by accident.

    Later in your post, you run some tests, and are surprised by the result:

    > Why is algorithm-2A slower than algorithm-2?


    It isn't slower. It is physically impossible, since 2A does *less* work
    than 2. This demonstrates that you are actually taking a noisy
    measurement: the values you get have random noise, and you don't make any
    effort to minimise that noise. Hence GIGO.

    The right way to test small code snippets is with the timeit module. It
    is carefully written to overcome as much random noise as possible. But
    even there, the authors of the timeit module are very clear that you
    should not try to calculate means, let alone higher order statistics like
    standard deviation. The only statistic which is trustworthy is to run as
    many trials as you can afford, and select the minimum value.

    So here is my timing code, which is much shorter and simpler and doesn't
    try to do too much. You do need to understand the timeit.Timer class:

    timeit.Timer creates a timer object; timer.repeat does the actual timing.
    The specific arguments to them are not vital to understand, but you can
    read the documentation if you wish to find out what they mean.

    First, I define the two functions. I compare similar functions that have
    the same effect. Neither modifies the input argument in place. Copy and
    paste the following block into an interactive interpreter:

    # Start block

    def f1(startingList):
        return ([x for x in startingList if x[0] == 'x'],
                [x for x in startingList if x[0] != 'x'])

    # Note that the above function is INCORRECT, it will fail if a string is
    # empty; nevertheless I will use it for timing purposes anyway.


    def f2(startingList):
        words_without_x = []
        words_with_x = []
        for word in startingList:
            if word.startswith('x'):
                words_with_x.append(word)
            else:
                words_without_x.append(word)
        return (words_with_x, words_without_x)

    # Set up some test data. There's no point being too clever about this.
    # Keep it simple.

    import random
    data = ['aa', 'bb', 'cb', 'xa', 'xb', 'xc']*1000000
    random.shuffle(data)

    # Set up two timers.
    from timeit import Timer
    setup = "from __main__ import data, f1, f2"
    t1 = Timer("a, b = f1(data)", setup)
    t2 = Timer("a, b = f2(data)", setup)

    # and run the timers
    best1 = min(t1.repeat(number=1, repeat=10))
    best2 = min(t2.repeat(number=1, repeat=10))

    # End block


    On my computer, here are the results. Yours may differ.

    best1: 3.5199968814849854
    best2: 3.515479803085327

    No significant difference. And that is to be expected: the bulk of the
    time is spent building up two lists of three million items each.

    So let's run it again with less data:

    data = data[:10000]
    best1 = min(t1.repeat(number=200, repeat=10))/200
    best2 = min(t2.repeat(number=200, repeat=10))/200

    which gives results:

    best1: 0.0037816047668457033
    best2: 0.005841898918151856

    The double list comp solution is faster, but it's also incorrect -- it
    fails if there is an empty string in the list. What happens if we replace
    it with a version that doesn't have the empty string bug?

    def f1(startingList):
        return ([x for x in startingList if x.startswith('x')],
                [x for x in startingList if not x.startswith('x')])

    best1 = min(t1.repeat(number=200, repeat=10))/200
    best2 = min(t2.repeat(number=200, repeat=10))/200


    which gives these results:

    best1: 0.008604295253753662
    best2: 0.005863149166107178


    So there's the first lesson: it's easy to be fast if you don't mind
    writing buggy code.

    Can we do better? Try this:


    def f3(startingList):
        words_with_x = []
        words_without_x = []
        append_with = words_with_x.append
        append_without = words_without_x.append
        for word in iter(startingList):
            if word[:1] == 'x':
                append_with(word)
            else:
                append_without(word)
        return (words_with_x, words_without_x)

    t3 = Timer('a, b = f3(data)', 'from __main__ import f3, data')
    best3 = min(t3.repeat(number=200, repeat=10))/200

    And the result:

    best3: 0.0033271098136901855


    which is even faster than your original version.

    Or is it? No, I can't conclude that. The difference between the original
    f1 function (0.00378s) and my f3 function (0.00332s) is too small to be
    sure it is real from just ten trials of each. A better statistician than
    me could probably estimate the number of trials needed to be confident
    that one is better than the other.

    But then, with a difference that small, who cares? In the real world, a
    difference that small is lost in the noise. Because of the noise,
    probably 50% of the time the slower code will finish first.


    [...]
    > Suppose words was not a list --- you have tacitly assumed that words is
    > a list.


    Actually, no I have not. I have assumed it is an iterable object, such as
    a list, a tuple, or an iterator. So what? You have done the same thing.
    Doing an isinstance type check at the beginning of both functions will
    just slow them both down by the same amount.
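
A brief Python 3 sketch of what "any iterable" means for a single-pass splitter of this kind. The function name `split_words` and the sample data are assumptions for illustration:

```python
def split_words(words):
    """Single-pass split; `words` may be any iterable of strings."""
    with_x, without_x = [], []
    for word in words:
        if word.startswith('x'):
            with_x.append(word)
        else:
            without_x.append(word)
    return with_x, without_x

data = ['aa', 'xa', 'bb', 'xb']
print(split_words(data))                 # (['xa', 'xb'], ['aa', 'bb'])
print(split_words(tuple(data)))          # (['xa', 'xb'], ['aa', 'bb'])
print(split_words(iter(data)))           # (['xa', 'xb'], ['aa', 'bb'])
print(split_words(w for w in data))      # (['xa', 'xb'], ['aa', 'bb'])
```

No type check is needed because the loop only asks the argument to be iterable.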



    --
    Steven
    Steven D'Aprano, Aug 16, 2012
    #11