dictionary initialization

Discussion in 'Python' started by Weiguang Shi, Nov 25, 2004.

  1. Weiguang Shi

    Weiguang Shi Guest

    Hi,

    With awk, I can do something like
    $ echo 'hello' | awk '{a[$1]++} END {for (i in a) print i, a[i]}'

    That is, a['hello'] was not there but allocated and initialized to
    zero upon reference.

    With Python, I got
    >>> b={}
    >>> b[1] = b[1] +1

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    KeyError: 1

    That is, I have to initialize b[1] explicitly in the first place.

    Personally, I think

    a[i]++

    in awk is much more elegant than

    if i in a: a[i] += 1
    else: a[i] = 1

    I wonder how the latter is justified in Python.

    Thanks,
    Weiguang
    Weiguang Shi, Nov 25, 2004
    #1

  2. Weiguang Shi

    Weiguang Shi Guest

    Hi,

    In article <>, Caleb Hattingh wrote:
    > ...
    >Dict entries accessed with 'string' keys,

    Not necessarily. And doesn't make a difference in my question.

    > ...
    >
    >Which feature specifically do you want justification for?

    Have it your way: string-indexed dictionaries.

    >>> a={}
    >>> a['1']+=1

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    KeyError: '1'

    a['1'], when it is referenced, is detected as nonexistent but is not
    automatically initialized so that it exists before 1 is added to its
    value.

    Weiguang
    Weiguang Shi, Nov 25, 2004
    #2

  3. On Thu, 25 Nov 2004 18:38:17 +0000 (UTC), (Weiguang Shi) wrote:

    >Hi,
    >
    >With awk, I can do something like
    > $ echo 'hello' | awk '{a[$1]++} END {for (i in a) print i, a[i]}'
    >
    >That is, a['hello'] was not there but allocated and initialized to
    >zero upon reference.
    >
    >With Python, I got
    > >>> b={}
    > >>> b[1] = b[1] +1

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > KeyError: 1
    >
    >That is, I have to initialize b[1] explicitly in the first place.
    >
    >Personally, I think
    >
    > a[i]++
    >
    >in awk is much more elegant than
    >
    > if i in a: a[i] += 1
    > else: a[i] = 1
    >
    >I wonder how the latter is justified in Python.
    >

    You wrote it, so you have to "justify" it ;-)

    While I agree that ++ and -- are handy abbreviations, and creating a key by default
    makes for concise notation, a++ means you have to make some narrow assumptions -- i.e.,
    that you want to create a zero integer start value. You can certainly make a dict subclass
    that behaves that way if you want it:

    >>> class D(dict):
    ...     def __getitem__(self, i):
    ...         if i not in self: self[i] = 0
    ...         return dict.__getitem__(self, i)
    ...
    >>> dink = D()
    >>> dink

    {}
    >>> dink['a'] +=1
    >>> dink

    {'a': 1}
    >>> dink['a'] +=1
    >>> dink

    {'a': 2}
    >>> dink['b']

    0
    >>> dink['b']

    0
    >>> dink

    {'a': 2, 'b': 0}
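For readers on later Python versions (2.5 and up), the same behaviour is available without overriding `__getitem__`. The following is only a sketch: `collections.defaultdict` and the `__missing__` hook are standard, while the name `ZeroDict` is made up for illustration.

```python
from collections import defaultdict

# defaultdict(int) calls int() -> 0 for any key that is missing on lookup.
counts = defaultdict(int)
counts['a'] += 1
counts['a'] += 1
counts['b']          # a bare lookup also creates the key, as with D above

print(dict(counts))


class ZeroDict(dict):
    """Hypothetical name: a dict that fills in 0 through the __missing__ hook."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        self[key] = 0
        return 0


z = ZeroDict()
z['x'] += 1
print(z)
```

Either way the default (an integer zero) is chosen explicitly by whoever creates the dictionary, which is the "narrow assumption" mentioned above made visible.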


    Otherwise the usual ways are along the lines of

    >>> d = {}
    >>> d.setdefault('hello',[0])[0] += 1
    >>> d

    {'hello': [1]}
    >>> d.setdefault('hello',[0])[0] += 1
    >>> d

    {'hello': [2]}

    Or
    >>> d['hi'] = d.get('hi', 0) + 1
    >>> d

    {'hi': 1, 'hello': [2]}
    >>> d['hi'] = d.get('hi', 0) + 1
    >>> d

    {'hi': 2, 'hello': [2]}
    >>> d['hi'] = d.get('hi', 0) + 1
    >>> d

    {'hi': 3, 'hello': [2]}

    Or
    >>> for x in xrange(3):
    ...     try: d['yo'] += 1
    ...     except KeyError: d['yo'] = 1
    ...     print d
    ...
    {'hi': 3, 'hello': [2], 'yo': 1}
    {'hi': 3, 'hello': [2], 'yo': 2}
    {'hi': 3, 'hello': [2], 'yo': 3}
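If counting is the whole job, later Python versions (2.7 and 3.1 onward) also ship `collections.Counter`, which was not available when this thread was written. A minimal sketch, with a made-up word list as input:

```python
from collections import Counter

# Made-up sample data for illustration.
words = "hello hi hello yo hi hello".split()

tally = Counter()
for w in words:
    tally[w] += 1      # a missing key reads as 0, so no initialisation is needed

print(tally['hello'])
print(tally['absent'])   # reads as 0; the lookup does not create the key
```

Unlike the dict subclass shown earlier, a plain lookup of a missing key on a `Counter` returns 0 without inserting it.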

    Regards,
    Bengt Richter
    Bengt Richter, Nov 25, 2004
    #3
  4. Weiguang Shi

    Weiguang Shi Guest

    Hi,

    In article <>, Bengt Richter wrote:
    > On Thu, 25 Nov 2004 18:38:17 +0000 (UTC),
    > (Weiguang Shi) wrote:
    >You wrote it, so you have to "justify" it ;-)

    I guess :)

    >While I agree that ++ and -- are handy abbreviations, and creating a
    >key by default makes for concise notation, a++ means you have to
    >make some narrow assumptions ...

    Right, though generalization can be painful for the uninitiated/newbie.

    >You can certainly make a dict subclass that behaves that way if you
    >want it:
    > ...

    This is nice even for someone as hopelessly lazy as me.

    >
    >Otherwise the usual ways are along the lines of
    >...

    I would happily avoid them all.

    Thanks a lot,
    Weiguang
    Weiguang Shi, Nov 25, 2004
    #4
  5. Weiguang Shi

    Dan Perl Guest

    I don't know awk, so I don't know how your awk statement works.

    Even when it comes to the python statements, I'm not sure exactly what the
    design intentions were in this case, but I can see at least one
    justification. Python being dynamically typed, b[1] can be of any type, so
    you have to initialize b[1] to give it a type and only then adding something
    to it makes sense. Otherwise, the 'add' operation not being implemented for
    all types, 'b[1]+1' may not even be allowed.

    You're saying that in awk a['hello'] is initialized to 0. That would not be
    justified in python. The type of b[1] is undetermined until initialization
    and I don't see why it should be an int by default.

    Dan

    "Weiguang Shi" <> wrote in message
    news:...
    > Hi,
    >
    > With awk, I can do something like
    > $ echo 'hello' | awk '{a[$1]++} END {for (i in a) print i, a[i]}'
    >
    > That is, a['hello'] was not there but allocated and initialized to
    > zero upon reference.
    >
    > With Python, I got
    > >>> b={}
    > >>> b[1] = b[1] +1

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > KeyError: 1
    >
    > That is, I have to initialize b[1] explicitly in the first place.
    >
    > Personally, I think
    >
    > a[i]++
    >
    > in awk is much more elegant than
    >
    > if i in a: a[i] += 1
    > else: a[i] = 1
    >
    > I wonder how the latter is justified in Python.
    >
    > Thanks,
    > Weiguang
    Dan Perl, Nov 25, 2004
    #5
  6. Weiguang Shi

    Weiguang Shi Guest

    In article <>, Dan Perl wrote:
    >I don't know awk, so I don't know how your awk statement works.

    It doesn't hurt to give it a try :)

    >
    >Even when it comes to the python statements, I'm not sure exactly what the
    > ...

    I see your point.

    >
    >You're saying that in awk a['hello'] is initialized to 0.

    More than that; I said awk recognizes a['hello']++ as an
    arithmetic operation, initializes a['hello'] to 0, and adds one to
    it. (This is all a guess. I didn't implement gawk. But you see my point.)

    > That would not be justified in python. The type of b[1] is
    > undetermined until initialization and I don't see why it should be
    > an int by default.

    In my example, it was b[1]+=1. "+=1" should at least tell Python two
    things: this is an add operation and one of the operands is an
    integer. Based on these, shouldn't Python be able to insert the pair
    "1:0" into a{} before doing the increment?

    Weiguang
    Weiguang Shi, Nov 25, 2004
    #6
  7. Weiguang Shi

    Weiguang Shi Guest

    Hi,

    In article <>, Caleb Hattingh wrote:
    > ...
    > ***
    > # You *must* use a={}, just start as below
    > '>>> a={}

    Yeah I know. I can live with that.

    > '>>> a['1']=0
    > '>>> a['1']+=1

    Right here. You have to say a['1'] = 0 before you can say a['1'] +=1
    Python does not do the former for you. That's what I'm asking
    justifications for.

    Regards,
    Weiguang
    Weiguang Shi, Nov 25, 2004
    #7
  8. Weiguang Shi

    Peter Hansen Guest

    Weiguang Shi wrote:
    > In article <>, Dan Perl wrote:
    >>That would not be justified in python. The type of b[1] is
    >>undetermined until initialization and I don't see why it should be
    >>an int by default.

    >
    > In my example, it was b[1]+=1. "+=1" should at least tell Python two
    > things: this is an add operation and one of the operands is an
    > integer.


    Why would it tell Python that?

    >>> b = {1: 2.5}
    >>> b[1] += 1
    >>> b

    {1: 3.5}

    So at this point, it can clearly be either an integer or
    a float. Doubtless it could also be an object which
    overloads the += operator with integer arguments, though
    what it might actually do is anyone's guess.
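To make that last case concrete, here is a small sketch. The class `Log` is hypothetical and exists only to show that `+= 1` need not be integer arithmetic at all:

```python
class Log:
    """Hypothetical class: += with an integer appends an entry instead of adding."""

    def __init__(self):
        self.entries = []

    def __iadd__(self, other):
        self.entries.append(other)
        return self          # the dict slot is rebound to whatever is returned


b = {1: 2.5, 2: Log(), 3: "1"}
b[1] += 1          # float + int gives 3.5
b[2] += 1          # calls Log.__iadd__, no arithmetic involved
try:
    b[3] += 1      # str + int is not defined
except TypeError as exc:
    error = exc

print(b[1], b[2].entries, type(error).__name__)
```

The same statement form does three different things depending on what is already stored, so the `1` on the right says nothing definite about what a missing entry should have been.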

    -Peter
    Peter Hansen, Nov 25, 2004
    #8
  9. (Weiguang Shi) writes:

    > Hi,
    >
    > With awk, I can do something like
    > $ echo 'hello' | awk '{a[$1]++} END {for (i in a) print i, a[i]}'
    >
    > That is, a['hello'] was not there but allocated and initialized to
    > zero upon reference.
    >
    > With Python, I got
    > >>> b={}
    > >>> b[1] = b[1] +1

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > KeyError: 1
    >
    > That is, I have to initialize b[1] explicitly in the first place.
    >
    > Personally, I think
    >
    > a[i]++
    >
    > in awk is much more elegant than
    >
    > if i in a: a[i] += 1
    > else: a[i] = 1
    >
    > I wonder how the latter is justified in Python.


    It isn't :)

    >>> a={}
    >>> a[1] = a.get(1, 0) + 1
    >>> a

    {1: 1}
    >>> a[1] = a.get(1, 0) + 1
    >>> a

    {1: 2}
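Applied to the original awk one-liner, the `get` idiom might look like the sketch below. The function name `count_first_fields` and the sample input are assumptions made for illustration; `io.StringIO` stands in for standard input.

```python
import io


def count_first_fields(lines):
    """Count how often each first whitespace-separated field occurs."""
    a = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue            # skip blank lines (awk would count an empty key here)
        key = fields[0]
        a[key] = a.get(key, 0) + 1
    return a


# Made-up sample input; in a real script this could be sys.stdin.
sample = io.StringIO("hello\nworld again\nhello there\n")
for key, n in count_first_fields(sample).items():
    print(key, n)
```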

    Regards
    Berthold
    --
    / <http://höllmanns.de/>
    / <http://starship.python.net/crew/bhoel/>
    Berthold Höllmann, Nov 25, 2004
    #9
  10. (Weiguang Shi) wrote:
    >
    > In article <>, Dan Perl wrote:
    > >I don't know awk, so I don't know how your awk statement works.

    > It doesn't hurt to give it a try :)
    >
    > >
    > >Even when it comes to the python statements, I'm not sure exactly what the
    > > ...

    > I see your point.
    >
    > >
    > >You're saying that in awk a['hello'] is initialized to 0.

    > More than that; I said awk recognizes a['hello']++ as an
    > arithmetic operation and initializes a['hello'] to 0 and add one to
    > it. (This is all guess. I didn't implement gawk. But you see my point.)
    >
    > > That would not be justified in python. The type of b[1] is
    > > undetermined until initialization and I don't see why it should be
    > > an int by default.

    > In my example, it was b[1]+=1. "+=1" should at least tell Python two
    > things: this is an add operation and one of the operands is an
    > integer. Based on these, shouldn't Python be able to insert the pair
    > "1:0" into a{} before doing the increment?


    As Peter has already mentioned, since b[1] doesn't exist until you
    assign it, the type of b[1] is ambiguous.

    The reason Python doesn't do automatic assignments on unknown access is
    due to a few Python 'Zens'

    >>> import this

    The Zen of Python, by Tim Peters

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

    Specifically:
    Explicit is better than implicit.
    (you should assign what you want, not expect Python to know what you
    want)
    Special cases aren't special enough to break the rules.
    (incrementing nonexistent values in a dictionary shouldn't be any
    different from accessing nonexistent values)
    In the face of ambiguity, refuse the temptation to guess.
    (what class/value should the nonexistent value initialize to?)
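The ambiguity in that last point can be illustrated with `collections.defaultdict` (added in Python 2.5, so after this thread). This is only a sketch with arbitrary variable names: each dictionary below uses the same `+=` statement form, but a different, explicitly chosen default.

```python
from collections import defaultdict

as_int = defaultdict(int)      # missing key -> 0
as_float = defaultdict(float)  # missing key -> 0.0
as_str = defaultdict(str)      # missing key -> ''
as_list = defaultdict(list)    # missing key -> []

as_int['k'] += 1
as_float['k'] += 1
as_str['k'] += '1'
as_list['k'] += [1]

print(as_int['k'], as_float['k'], repr(as_str['k']), as_list['k'])
```

All four are reasonable starting values for some program, which is why the choice is left to the programmer rather than guessed.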


    Learn the zens. Any time you have a design question about Python,
    check the zens, then check google, then check here.

    - Josiah
    Josiah Carlson, Nov 25, 2004
    #10
  11. Weiguang Shi

    Weiguang Shi Guest

    I see.

    Thanks
    Weiguang
    Weiguang Shi, Nov 25, 2004
    #11
  12. Weiguang Shi wrote:

    > With awk, I can do something like
    > $ echo 'hello' | awk '{a[$1]++} END {for (i in a) print i, a[i]}'
    >
    > That is, a['hello'] was not there but allocated and initialized to
    > zero upon reference.
    >
    > With Python ... <snip>
    > I have to initialize b[1] explicitly in the first place.


    You could use the dictionary's setdefault method, if your value is mutable:

    >>> b={}
    >>> for n in xrange(100):
    ...     b.setdefault('foo', [0])[0] += 1
    ...
    >>> b['foo'][0]

    100
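Why the value has to be mutable for this trick: `setdefault` returns the stored object, so an in-place change to a list is visible through the dictionary, while an immutable integer has to be assigned back. A short sketch in Python 3 syntax (`range` instead of `xrange`), with `c` as an extra illustrative dictionary:

```python
b = {}
for n in range(100):
    # setdefault returns the stored list; the item assignment mutates it in place.
    b.setdefault('foo', [0])[0] += 1

c = {}
for n in range(100):
    # An int cannot be updated in place, so the new value is assigned back
    # to the key explicitly.
    c['foo'] = c.setdefault('foo', 0) + 1

print(b['foo'][0], c['foo'])
```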

    Jeffrey
    Jeffrey Froman, Nov 26, 2004
    #12
  13. Hmm :)

    "b[1]" looks like a List (but you created a Dict).
    "b['1']" looks more like a Dict (but this is not what you used).

    If lists are your thing:

    >>> a = []
    >>> a.append(1)
    >>> a

    [1]
    >>> a[0] += 1
    >>> a

    [2]

    If dicts are your thing:

    >>> b = {}
    >>> b['1'] = 1
    >>> b

    {'1': 1}
    >>> b['1'] += 1
    >>> b

    {'1': 2}

    Lists are ordered, Dicts are not.
    Dict entries accessed with 'string' keys, List entries accessed with a
    position integer.

    Which feature specifically do you want justification for?

    thx
    Caleb





    > With Python, I got
    > >>> b={}
    > >>> b[1] = b[1] +1

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > KeyError: 1
    >
    > That is, I have to initialize b[1] explicitly in the first place.
    >
    > Personally, I think
    >
    > a[i]++
    >
    > in awk is much more elegant than
    >
    > if i in a: a[i] += 1
    > else: a[i] = 1
    >
    > I wonder how the latter is justified in Python.
    >
    > Thanks,
    > Weiguang
    Caleb Hattingh, Nov 26, 2004
    #13
  14. Hi

    I apologise, but I don't actually know what the problem is? If you could
    restate it a little, that would help.

    I didn't check the code I posted earlier; This below is checked:
    ***
    # Dont use a={}, just start as below
    '>>> a['1']=0
    '>>> a['1']+=1
    '>>> a
    {'1': 1}
    ***

    Like I said, I am unsure of what your specific problem is?

    Thanks
    Caleb


    On Thu, 25 Nov 2004 19:27:46 +0000 (UTC), Weiguang Shi
    <> wrote:

    > Hi,
    >
    > In article <>, Caleb Hattingh wrote:
    >> ...
    >> Dict entries accessed with 'string' keys,

    > Not necessarily. And doesn't make a difference in my question.
    >
    >> ...
    >>
    >> Which feature specifically do you want justification for?

    > Have it your way: string-indexed dictionaries.
    >
    > >>> a={}
    > >>> a['1']+=1

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > KeyError: '1'
    >
    > a['1'] when it referenced, is detected non-existent but not
    > automatically initialized so that it exists before adding 1 to its
    > value.
    >
    > Weiguang
    Caleb Hattingh, Nov 26, 2004
    #14
  15. And I haven't even been drinking!

    I apologise once more, this is better:

    ***
    # You *must* use a={}, just start as below
    '>>> a={}
    '>>> a['1']=0
    '>>> a['1']+=1
    '>>> a
    {'1': 1}
    ***

    Like I said, I am unsure of what your specific problem is?

    Thanks
    Caleb
    Caleb Hattingh, Nov 26, 2004
    #15
  16. Weiguang Shi

    Gerrit Guest

    Peter Hansen wrote:
    > >In my example, it was b[1]+=1. "+=1" should at least tell Python two
    > >things: this is an add operation and one of the operands is an
    > >integer.

    >
    > Why would it tell Python that?


    Well, the rhs of 'foo+=1' is always an integer.

    Gerrit.

    --
    Weather in Lulea / Kallax, Sweden 26/11 17:20:
    -8.0°C wind 6.7 m/s NW (34 m above NAP)
    --
    In the councils of government, we must guard against the acquisition of
    unwarranted influence, whether sought or unsought, by the
    military-industrial complex. The potential for the disastrous rise of
    misplaced power exists and will persist.
    -Dwight David Eisenhower, January 17, 1961
    Gerrit, Nov 26, 2004
    #16
  17. Weiguang Shi

    Weiguang Shi Guest

    Just received an email from Batista, Facundo. Below are some quotes and
    my reply.

    On Fri, Nov 26, 2004 at 09:09:46AM -0300, Batista, Facundo wrote:
    > ...
    > >>> a = {}
    > >>> a['1'] = 5
    > >>> a['1'] *= 2
    > >>> a['1']

    > 10
    >
    > >>> a['1'] = "blah"
    > >>> a['1'] *= 2
    > >>> a['1']

    > 'blahblah'
    >
    > >>> a['1'] = ['a', 8]
    > >>> a['1'] *= 2
    > >>> a['1']

    > ['a', 8, 'a', 8]
    >
    > The type of the right-hand operand has nothing to do with the type
    > of the left operand!
    >


    You mean in Python, of course. I can see this is going the religious
    direction now.

    All in all, I've realized when a language generalizes and abstracts,
    it loses convenience. Because of this, however powerful other
    languages become, awk always has its place as long as the application
    is there.

    Weiguang
    Weiguang Shi, Nov 26, 2004
    #17
  18. Weiguang Shi

    Weiguang Shi Guest

    Caleb,

    In article <>, Caleb Hattingh wrote:
    > ...
    >And then have x=1? Is this the question of debate here? One line of
    >initialisation to specify the type?

    Right.

    >
    >IF this is the point you are making, and the awk functionality
    >demonstrated in this particular example is a really significant
    >feature for you in your specific problem domain, then I must concede
    >that awk is probably right for you, and you shouldn't waste your
    >time with Python.

    Thanks for the advice. I'll stay with awk and shell for most of my
    text processing (simple but, hey, 90% of the time I'm not doing
    anything complex) and go Python for binary data processing and larger
    projects. BTW, I think learning Python is a good use of my time.

    Weiguang
    Weiguang Shi, Nov 26, 2004
    #18
  19. Weiguang Shi

    Dan Perl Guest

    "Caleb Hattingh" <> wrote in message
    news:...
    > Hi Weiguang
    >
    > I know how it is when discussion becomes religious, and I want to avoid
    > that. First, I want to clarify exactly what it is that you are saying:
    >
    > Would I be correct in saying that your point is that with awk, you can
    > just do something like (ignore the syntax)
    >
    > (x not existing yet)
    > x+=1
    >
    > And have x = 1, while in Python you have to do
    >
    > (x not existing yet)
    > x=0
    > x+=1
    >
    > And then have x=1? Is this the question of debate here? One line of
    > initialisation to specify the type?
    >
    > IF this is the point you are making, and the awk functionality demonstrated
    > in this particular example is a really significant feature for you in your
    > specific problem domain, then I must concede that awk is probably right
    > for you, and you shouldn't waste your time with Python.


    And just like that, the discussion turned religious. It's hard to assess
    someone's tone when it comes in writing, but, Caleb, you sound sarcastic and
    belligerent to me.

    Yes, 2 lines instead of 1 is an issue. And it is not the only example where
    the "explicit is better than implicit" principle shows a downside. However,
    addressing Weiguang's statements, I wouldn't say that python is less
    convenient than other languages (particularly awk, although I don't know
    that language), because I am sure we can find examples where python can
    implement something in a simpler way.

    Dan

    > Keep well
    > Caleb
    Dan Perl, Nov 26, 2004
    #19
  20. Weiguang Shi

    Terry Reedy Guest

    "Dan Perl" <> wrote in message
    news:...
    >
    > "Caleb Hattingh" <> wrote in message
    > news:...
    >> IF this is the point you are making, and the awk functionality
    >> demonstrated in this particular example is a really significant feature
    >> for you in your specific problem domain, then I must concede that awk is
    >> probably right for you, and you shouldn't waste your time with Python.

    >
    > And just like that, the discussion turned religious. It's hard to assess
    > someone's tone when it comes in writing, but, Caleb, you sound sarcastic
    > and belligerent to me.


    To me, Caleb was being only slightly and possibly sarcastic in the process
    of giving friendly good advice to the effect of "better to use Awk and
    produce than to beat your head against a wall trying to change a basic
    Python design decision."

    Almost every design decision has plusses and minuses for designers and
    others to weigh. No matter what the designer decides, there will be users
    who weigh the factors enough differently to really wish that the decision
    was otherwise. In fact, there will probably be another language whose
    designer did decide otherwise. And in this case, with regard to the
    handling of uninitialized variables, there is.

    A Python religion fanatic might have made the opposite suggestion --
    something like 'your factor weighting is wrong; see the light and bow to
    the superior wisdom of how Python does it'.

    Terry J. Reedy
    Terry Reedy, Nov 26, 2004
    #20
