dictionary initialization


Weiguang Shi

Hi,

With awk, I can do something like
$ echo 'hello' | awk '{a[$1]++} END {for (i in a) print i, a[i]}'

That is, a['hello'] was not there but allocated and initialized to
zero upon reference.

With Python, I got
>>> b = {}
>>> b[1] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 1

That is, I have to initialize b[1] explicitly in the first place.

Personally, I think

a[i]++

in awk is much more elegant than

if i in a: a[i] += 1
else: a[i] = 1

I wonder how the latter is justified in Python.

Thanks,
Weiguang
 

Weiguang Shi

Hi,

...
Caleb Hattingh said:
Dict entries accessed with 'string' keys,

Not necessarily. And it doesn't make a difference to my question.
...

Caleb Hattingh said:
Which feature specifically do you want justification for?

Have it your way: string-indexed dictionaries.
>>> a = {}
>>> a['1'] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: '1'

a['1'], when it is referenced, is detected as non-existent but is not
automatically initialized so that it exists before 1 is added to its
value.

Weiguang
 

Bengt Richter

Weiguang Shi said:
Personally, I think

a[i]++

in awk is much more elegant than

if i in a: a[i] += 1
else: a[i] = 1

I wonder how the latter is justified in Python.

You wrote it, so you have to "justify" it ;-)

While I agree that ++ and -- are handy abbreviations, and creating a key by default
makes for concise notation, a[i]++ means you have to make some narrow assumptions -- i.e.,
that you want to create a zero integer start value. You can certainly make a dict subclass
that behaves that way if you want it:
>>> class D(dict):
...     def __getitem__(self, i):
...         if i not in self: self[i] = 0
...         return dict.__getitem__(self, i)
...
>>> dink = D()
>>> dink
{}
>>> dink['a'] += 1
>>> dink
{'a': 1}
>>> dink['a'] += 1
>>> dink
{'a': 2}
>>> dink['b']
0
>>> dink['b']
0
>>> dink
{'a': 2, 'b': 0}
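
A side note for readers on later Python versions: the standard library now covers this case, so a hand-written subclass like the one above is rarely needed. A minimal sketch, assuming Python 2.5 or newer for `collections.defaultdict` and the `__missing__` hook; `ZeroDict` and the sample words are made-up names for illustration only:

```python
from collections import defaultdict

# defaultdict(int) calls int() for a missing key, so the default is 0.
counts = defaultdict(int)
for word in ["hello", "world", "hello"]:  # illustrative input
    counts[word] += 1

# A dict subclass can get the same effect through the __missing__ hook,
# which dict.__getitem__ calls when the key is absent.
class ZeroDict(dict):
    def __missing__(self, key):
        self[key] = 0
        return 0

z = ZeroDict()
z["a"] += 1
z["b"]  # merely reading creates the key, just like the subclass above

print(dict(counts))
print(z)
```

Either way the zero default is something the programmer asks for explicitly; a plain dict still raises KeyError.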


Otherwise the usual ways are along the lines of
>>> d = {}
>>> d.setdefault('hello',[0])[0] += 1
>>> d
{'hello': [1]}
>>> d.setdefault('hello',[0])[0] += 1
>>> d
{'hello': [2]}

Or
>>> d['hi'] = d.get('hi', 0) + 1
>>> d
{'hi': 1, 'hello': [2]}
>>> d['hi'] = d.get('hi', 0) + 1
>>> d
{'hi': 2, 'hello': [2]}
>>> d['hi'] = d.get('hi', 0) + 1
>>> d
{'hi': 3, 'hello': [2]}

Or
>>> for i in range(3):
...     try: d['yo'] += 1
...     except KeyError: d['yo'] = 1
...     print d
...
{'hi': 3, 'hello': [2], 'yo': 1}
{'hi': 3, 'hello': [2], 'yo': 2}
{'hi': 3, 'hello': [2], 'yo': 3}
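
Applied to the original awk task, the `get` idiom might look like the sketch below. It is only an illustration: `count_first_fields` and the sample lines are hypothetical, and unlike awk it skips blank lines rather than counting an empty first field.

```python
def count_first_fields(lines):
    """Count occurrences of the first whitespace-separated field per line."""
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: skip blank lines (awk would count an empty $1)
        key = fields[0]
        # get() supplies the explicit default that awk supplies implicitly
        counts[key] = counts.get(key, 0) + 1
    return counts

sample = ["hello", "hello world", "bye now"]  # stands in for stdin
counts = count_first_fields(sample)
for key in counts:
    print(key, counts[key])
```

The final loop plays the role of the END block in the awk one-liner.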

Regards,
Bengt Richter
 

Weiguang Shi

Hi,

Bengt Richter said:
You wrote it, so you have to "justify" it ;-)

I guess :)

Bengt Richter said:
While I agree that ++ and -- are handy abbreviations, and creating a
key by default makes for concise notation, a[i]++ means you have to
make some narrow assumptions ...

Right, though generalization can be painful for the uninitiated/newbie.

Bengt Richter said:
You can certainly make a dict subclass that behaves that way if you
want it:
...

This is nice even for someone as hopelessly lazy as me.

Bengt Richter said:
Otherwise the usual ways are along the lines of
...

I would happily avoid them all.

Thanks a lot,
Weiguang
 

Dan Perl

I don't know awk, so I don't know how your awk statement works.

Even when it comes to the Python statements, I'm not sure exactly what the
design intentions were in this case, but I can see at least one
justification. Python being dynamically typed, b[1] can be of any type, so
you have to initialize b[1] to give it a type, and only then does adding something
to it make sense. Otherwise, since the 'add' operation is not implemented for
all types, 'b[1]+1' may not even be allowed.

You're saying that in awk a['hello'] is initialized to 0. That would not be
justified in python. The type of b[1] is undetermined until initialization
and I don't see why it should be an int by default.
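
To make the typing argument concrete, here is a small sketch (the `outcomes` list and the choice of start values are illustrative assumptions) showing that `+= 1` succeeds or fails depending on what the slot was initialized with:

```python
outcomes = []
for start in (0, 0.0, "", []):
    b = {1: start}          # explicit initialization fixes the type
    try:
        b[1] += 1
        outcomes.append(repr(b[1]))
    except TypeError:
        # str and list do not accept an int on the right of +=
        outcomes.append("TypeError")

print(outcomes)
```

An int or float start value accepts the increment; a string or list does not, so no single implicit default would suit every caller.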

Dan

 

Weiguang Shi

Dan Perl said:
I don't know awk, so I don't know how your awk statement works.

It doesn't hurt to give it a try :)

Dan Perl said:
Even when it comes to the python statements, I'm not sure exactly what the
...

I see your point.

Dan Perl said:
You're saying that in awk a['hello'] is initialized to 0.

More than that; I said awk recognizes a['hello']++ as an
arithmetic operation, initializes a['hello'] to 0, and adds one to
it. (This is all a guess. I didn't implement gawk. But you see my point.)

Dan Perl said:
That would not be justified in python. The type of b[1] is
undetermined until initialization and I don't see why it should be
an int by default.

In my example, it was b[1]+=1. "+=1" should at least tell Python two
things: this is an add operation and one of the operands is an
integer. Based on these, shouldn't Python be able to insert the pair
"1:0" into a{} before doing the increment?

Weiguang
 

Weiguang Shi

Hi,

...
***
Caleb Hattingh said:
# You *must* use a={}, just start as below
'>>> a={}

Yeah I know. I can live with that.

Caleb Hattingh said:
'>>> a['1']=0
'>>> a['1']+=1

Right here. You have to say a['1'] = 0 before you can say a['1'] += 1.
Python does not do the former for you. That's what I'm asking
justification for.

Regards,
Weiguang
 

Peter Hansen

Weiguang said:
That would not be justified in python. The type of b[1] is
undetermined until initialization and I don't see why it should be
an int by default.

In my example, it was b[1]+=1. "+=1" should at least tell Python two
things: this is an add operation and one of the operands is an
integer.

Why would it tell Python that?
>>> b = {1: 2.5}
>>> b[1] += 1
>>> b
{1: 3.5}

So at this point, it can clearly be either an integer or
a float. Doubtless it could also be an object which
overloads the += operator with integer arguments, though
what it might actually do is anyone's guess.
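
A sketch of that last case, using a hypothetical `Log` class (not from any library) whose `+=` appends instead of adding:

```python
class Log:
    """Hypothetical type whose += records the operand rather than adding it."""

    def __init__(self):
        self.entries = []

    def __iadd__(self, other):
        self.entries.append(other)
        return self  # += rebinds the slot to whatever __iadd__ returns

b = {1: 2.5, 2: Log()}
b[1] += 1   # float addition
b[2] += 1   # calls Log.__iadd__, no arithmetic at all

print(b[1], b[2].entries)
```

The same `+= 1` means two different things here, which is why the right-hand 1 alone cannot tell Python what the missing value should have been.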

-Peter
 
Berthold Höllmann

Weiguang Shi said:
Personally, I think

a[i]++

in awk is much more elegant than

if i in a: a[i] += 1
else: a[i] = 1

I wonder how the latter is justified in Python.


It isn't :)
>>> a = {}
>>> a[1] = a.get(1, 0) + 1
>>> a
{1: 1}
>>> a[1] = a.get(1, 0) + 1
>>> a
{1: 2}

Regards
Berthold
 

Josiah Carlson

Weiguang Shi said:
In my example, it was b[1]+=1. "+=1" should at least tell Python two
things: this is an add operation and one of the operands is an
integer. Based on these, shouldn't Python be able to insert the pair
"1:0" into a{} before doing the increment?

As Peter has already mentioned, since b[1] doesn't exist until you
assign it, the type of b[1] is ambiguous.

The reason Python doesn't do automatic assignments on unknown access is
due to a few Python 'Zens':
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Specifically:
Explicit is better than implicit.
(you should assign what you want, not expect Python to know what you
want)
Special cases aren't special enough to break the rules.
(incrementing nonexistent values in a dictionary shouldn't be any
different from accessing nonexistent values)
In the face of ambiguity, refuse the temptation to guess.
(what class/value should the nonexistent value initialize to?)
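
A quick sketch of the second point (the helper names `read` and `increment` are illustrative): on a plain dict, reading and incrementing a missing key fail in exactly the same way, and the failed increment leaves the dict untouched.

```python
d = {}

def read():
    return d["x"]

def increment():
    d["x"] += 1  # has to read d["x"] first, so it fails the same way

raised = []
for action in (read, increment):
    try:
        action()
    except KeyError:
        raised.append(action.__name__)

print(raised, d)
```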


Learn the zens. Any time you have a design question about Python,
check the zens, then check Google, then check here.

- Josiah
 

Jeffrey Froman

Weiguang said:
With awk, I can do something like
$ echo 'hello' | awk '{a[$1]++} END {for (i in a) print i, a[i]}'

That is, a['hello'] was not there but allocated and initialized to
zero upon reference.

With Python ... <snip>
I have to initialize b[1] explicitly in the first place.


You could use the dictionary's setdefault method, if your value is mutable:
.... b.setdefault('foo', [0])[0] += 1
....
100
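
Why the value has to be mutable, as a hedged sketch (the key names are made up): `setdefault` returns the stored object, so an in-place change to a list is visible through the dict, while `+=` on a returned int only rebinds a local name.

```python
b = {}

n = b.setdefault("count", 0)  # int is immutable
n += 1                        # rebinds n only; b["count"] is unchanged

b.setdefault("boxed", [0])[0] += 1  # mutates the list stored in the dict

print(n, b)
```

That is why the counter is wrapped in a one-element list in the setdefault version.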

Jeffrey
 

Caleb Hattingh

Hmm :)

"b[1]" looks like a List (but you created a Dict).
"b['1']" looks more like a Dict (but this is not what you used).

If lists are your thing:
>>> a = []
>>> a.append(1)
>>> a
[1]
>>> a[0] += 1
>>> a
[2]

If dicts are your thing:
>>> b = {}
>>> b['1'] = 1
>>> b
{'1': 1}
>>> b['1'] += 1
>>> b
{'1': 2}

Lists are ordered, Dicts are not.
Dict entries accessed with 'string' keys, List entries accessed with a
position integer.

Which feature specifically do you want justification for?

thx
Caleb




 

Caleb Hattingh

Hi

I apologise, but I don't actually know what the problem is? If you could
restate it a little, that would help.

I didn't check the code I posted earlier; This below is checked:
***
# Dont use a={}, just start as below
'>>> a['1']=0
'>>> a['1']+=1
'>>> a
{'1': 1}
***

Like I said, I am unsure of what your specific problem is?

Thanks
Caleb


 

Caleb Hattingh

And I haven't even been drinking!

I apologise once more, this is better:

***
# You *must* use a={}, just start as below
'>>> a={}
'>>> a['1']=0
'>>> a['1']+=1
'>>> a
{'1': 1}
***

Like I said, I am unsure of what your specific problem is?

Thanks
Caleb
 

Gerrit

Peter said:
In my example, it was b[1]+=1. "+=1" should at least tell Python two
things: this is an add operation and one of the operands is an
integer.

Why would it tell Python that?

Well, the rhs of 'foo+=1' is always an integer.

Gerrit.

--
Weather in Lulea / Kallax, Sweden 26/11 17:20:
-8.0°C wind 6.7 m/s NW (34 m above NAP)
--
In the councils of government, we must guard against the acquisition of
unwarranted influence, whether sought or unsought, by the
military-industrial complex. The potential for the disastrous rise of
misplaced power exists and will persist.
-Dwight David Eisenhower, January 17, 1961
 

Weiguang Shi

Just received an email from Batista, Facundo. Below are some quotes and
my reply.

Facundo Batista said:
...
>>> a = {}
>>> a['1'] = 5
>>> a['1'] *= 2
>>> a['1']
10
>>> a['1'] = "blah"
>>> a['1'] *= 2
>>> a['1']
'blahblah'
>>> a['1'] = ['a', 8]
>>> a['1'] *= 2
>>> a['1']
['a', 8, 'a', 8]

The type of the right-hand operand has nothing to do with the
type of the left operand!

You mean in Python, of course. I can see this is going the religious
direction now.

All in all, I've realized when a language generalizes and abstracts,
it loses convenience. Because of this, however powerful other
languages become, awk always has its place as long as the application
is there.

Weiguang
 

Weiguang Shi

Caleb,

Caleb Hattingh said:
...
And then have x=1? Is this the question of debate here? One line of
initialisation to specify the type?

Right.

Caleb Hattingh said:
IF this is the point you are making, and the awk functionality
demonstrated in this particular example is a really significant
feature for you in your specific problem domain, then I must concede
that awk is probably right for you, and you shouldn't waste your
time with Python.

Thanks for the advice. I'll stay with awk and shell for most of my
text processing (simple but, hey, 90% of the time I'm not doing
anything complex) and go with Python for binary data processing and larger
projects. BTW, I think learning Python is a good use of my time.

Weiguang
 

Dan Perl

Caleb Hattingh said:
Hi Weiguang

I know how it is when discussion becomes religious, and I want to avoid
that. First, I want to clarify exactly what it is that you are saying:

Would I be correct in saying that your point is that with awk, you can
just do something like (ignore the syntax)

(x not existing yet)
x+=1

And have x = 1, while in Python you have to do

(x not existing yet)
x=0
x+=1

And then have x=1? Is this the question of debate here? One line of
initialisation to specify the type?

IF this is the point you are making, and the awk functionality demonstrated
in this particular example is a really significant feature for you in your
specific problem domain, then I must concede that awk is probably right
for you, and you shouldn't waste your time with Python.

And just like that, the discussion turned religious. It's hard to assess
someone's tone when it comes in writing, but, Caleb, you sound sarcastic and
belligerent to me.

Yes, 2 lines instead of 1 is an issue. And it is not the only example where
the "explicit is better than implicit" principle shows a downside. However,
addressing Weiguang's statements, I wouldn't say that python is less
convenient than other languages (particularly awk, although I don't know
that language), because I am sure we can find examples where python can
implement something in a simpler way.

Dan
 

Terry Reedy

Dan Perl said:
And just like that, the discussion turned religious. It's hard to assess
someone's tone when it comes in writing, but, Caleb, you sound sarcastic
and belligerent to me.

To me, Caleb was being only slightly and possibly sarcastic in the process
of giving friendly good advice to the effect of "better to use Awk and
produce than to beat your head against a wall trying to change a basic
Python design decision."

Almost every design decision has plusses and minuses for designers and
others to weigh. No matter what the designer decides, there will be users
who weigh the factors enough differently to really wish that the decision
was otherwise. In fact, there will probably be another language whose
designer did decide otherwise. And in this case, with regard to the
handling of uninitialized variables, there is.

A Python religion fanatic might have made the opposite suggestion --
something like 'your factor weighting is wrong; see the light and bow to
the superior wisdom of how Python does it'.

Terry J. Reedy
 
