Difficulty w/json keys

Red

My apologies for what is probably a simple question to most on this
group. However, I am new to python and very new to json.

I am trying to read in a json file from a twitter download. There are,
generally, two types of lines: those lines with "text" and the other
lines. I am only interested in the lines with "text". I am also only
interested in lines with "lang":"en", but I haven't gotten far enough
to implement that condition in the code snippets below.

I have gotten Option 2 below to sort of work. It works fine for
'text', but doesn't work for 'lang'.

FWIW I am using Python 2.6.4

Can someone tell me what I'm doing wrong with the following code
snippets and/or point me toward an example of an implementation?

Many thanks for your patience.

---------------------------------

import sys
import json

f = open(sys.argv[1])

#option 1

for line in f:
    j = json.loads(line)
    try:
        'text' in j
        print "TEXT: ", j
    except:
        print "EXCEPTION: ", j
        continue
    else:
        text=j['text']
----snip --------

#option 2 does basically the same thing as option 1, but also looks for 'lang'

for line in f:
    j = json.loads(line)
    if 'text' in j:
        if 'lang' in j:
            lang = j['lang']
            print "language", lang
        text = j['text']
----snip --------

------ Two Sample Twitter lines -------------

{"text":"tech managers what size for your teams? better to have 10-20
ppl per manager or 2-5 and have the managers be more hands
on?","in_reply_to_user_id":null,"coordinates":null,"geo":null,"created_at":"Thu
Apr 22 17:35:42 +0000 2010","contributors":null,"source":"<a href=
\"http://twitterfeed.com\" rel=\"nofollow\">twitterfeed</
a>","in_reply_to_status_id":null,"place":null,"truncated":false,"in_reply_to_screen_name":null,"user":
{"favourites_count":
0,"profile_text_color":"000000","time_zone":"Eastern Time (US &
Canada)","created_at":"Tue Oct 27 19:50:51 +0000
2009","statuses_count":
286,"notifications":null,"profile_link_color":"0000ff","description":"I
write code and talk to people.
","lang":"en","profile_background_image_url":"http://s.twimg.com/a/
1271891196/images/themes/theme1/bg.png","profile_image_url":"http://
s.twimg.com/a/1271891196/images/
default_profile_0_normal.png","location":"Near the
water.","contributors_enabled":false,"following":null,"geo_enabled":false,"profile_sidebar_fill_color":"e0ff92","profile_background_tile":false,"screen_name":"sstatik","profile_sidebar_border_color":"87bc44","followers_count":
40,"protected":false,"verified":false,"url":"http://
elliotmurphy.com/","name":"statik","friends_count":18,"id":
85646316,"utc_offset":-18000,"profile_background_color":"9ae4e8"},"id":
12651537502,"favorited":false}
{"delete":{"status":{"id":12650137902,"user_id":128090723}}}
 
Jim Byrnes

I can't help you directly with your problem but have you seen this:

http://arstechnica.com/open-source/...itters-new-real-time-stream-api-in-python.ars

Regards, Jim
 
Rolando Espinoza La Fuente

> for line in f:
>     j = json.loads(line)
>     if 'text' in j:
>         if 'lang' in j:
>             lang = j['lang']
>             print "language", lang
>         text = j['text']

The "lang" key is inside the nested "user" dict, not at the top level:
'tech managers what size for your teams? better to have 10-20 ppl per
manager or 2-5 and have the managers be more hands on?'
[...]
KeyError: 'lang'
'en'

~Rolando
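
To make the point about the nested key concrete, here is a minimal sketch. It assumes Python 3 syntax (the thread itself uses Python 2.6 print statements), and the `line` string is a heavily abbreviated stand-in for the sample tweet, keeping only the keys that matter here.

```python
import json

# Abbreviated stand-in for one tweet line; most keys are omitted.
line = ('{"text": "tech managers what size for your teams?", '
        '"user": {"lang": "en", "screen_name": "sstatik"}, '
        '"id": 12651537502}')

j = json.loads(line)

# "lang" is not a top-level key, so this membership test is False.
top_level_has_lang = 'lang' in j

# The key lives one level down, inside the "user" dict.
user_lang = j['user']['lang']

print(top_level_has_lang)
print(user_lang)
```

This is why the `if 'lang' in j:` branch in Option 2 never runs for the sample data: the test is made against the outer dict only.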
 
J. Cliff Dyer

You need to know what your input data actually looks like, and the best
thing for that is a little bit of formatting. I bet you can figure out
the problem yourself, once you see the structure of your data more
clearly. I've reformatted the JSON for you to help out.


------ Two Sample Twitter lines -------------
{
    "text":"tech managers what size for your teams? better to have 10-20 ppl per manager or 2-5 and have the managers be more hands on?",
    "in_reply_to_user_id":null,
    "coordinates":null,
    "geo":null,
    "created_at":"Thu Apr 22 17:35:42 +0000 2010",
    "contributors":null,
    "source":"<a href=\"http://twitterfeed.com\" rel=\"nofollow\">twitterfeed</a>",
    "in_reply_to_status_id":null,
    "place":null,
    "truncated":false,
    "in_reply_to_screen_name":null,
    "user": {
        "favourites_count":0,
        "profile_text_color":"000000",
        "time_zone":"Eastern Time (US & Canada)",
        "created_at":"Tue Oct 27 19:50:51 +0000 2009",
        "statuses_count": 286,
        "notifications":null,
        "profile_link_color":"0000ff",
        "description":"I write code and talk to people.",
        "lang":"en",
        "profile_background_image_url":"http://s.twimg.com/a/1271891196/images/themes/theme1/bg.png",
        "profile_image_url":"http://s.twimg.com/a/1271891196/images/default_profile_0_normal.png",
        "location":"Near the water.",
        "contributors_enabled":false,
        "following":null,
        "geo_enabled":false,
        "profile_sidebar_fill_color":"e0ff92",
        "profile_background_tile":false,
        "screen_name":"sstatik",
        "profile_sidebar_border_color":"87bc44",
        "followers_count": 40,
        "protected":false,
        "verified":false,
        "url":"http://elliotmurphy.com/",
        "name":"statik",
        "friends_count":18,
        "id":85646316,
        "utc_offset":-18000,
        "profile_background_color":"9ae4e8"
    },
    "id": 12651537502,
    "favorited":false
}
{
    "delete": {
        "status":{
            "id":12650137902,
            "user_id":128090723
        }
    }
}
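
The same kind of reformatting can be done by the json module itself rather than by hand. A small sketch, assuming Python 3 syntax (the thread uses 2.6) and using the second sample line as input; `raw` and `pretty` are just illustrative names:

```python
import json

# The short "delete" record from the sample data, as it appears on one line.
raw = '{"delete":{"status":{"id":12650137902,"user_id":128090723}}}'

# Parse the line, then dump it again with indentation so the nesting is visible.
pretty = json.dumps(json.loads(raw), indent=4, sort_keys=True)
print(pretty)
```

Running the same thing on a tweet line should make it easy to see which keys sit at the top level and which sit inside "user".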
 
Terry Reedy

> My apologies for what is probably a simple question to most on this
> group. However, I am new to python and very new to json.
>
> I am trying to read in a json file from a twitter download. There are,
> generally, two types of lines: those lines with "text" and the other
> lines. I am only interested in the lines with "text". I am also only
> interested in lines with "lang":"en", but I haven't gotten far enough
> to implement that condition in the code snippets below.
>
> I have gotten Option 2 below to sort of work. It works fine for
> 'text', but doesn't work for 'lang'.

You do not define 'work', 'sort of work', and "doesn't work".

> FWIW I am using Python 2.6.4
>
> Can someone tell me what I'm doing wrong with the following code
> snippets and/or point me toward an example of an implementation?
>
> Many thanks for your patience.
>
> ---------------------------------
>
> import sys
> import json
>
> f = open(sys.argv[1])
>
> #option 1
>
> for line in f:
>     j = json.loads(line)
>     try:
>         'text' in j

This does not raise an exception when false

>         print "TEXT: ", j

so this should always print.
Forget this option.

>     except:
>         print "EXCEPTION: ", j
>         continue
>     else:
>         text=j['text']
> ----snip --------
>
> #option 2 does basically the same thing as option 1 ,

Not at all when 'text' is not in j.

> but also looks
> for 'lang'
>
> for line in f:
>     j = json.loads(line)
>     if 'text' in j:
>         if 'lang' in j:
>             lang = j['lang']
>             print "language", lang
>         text = j['text']
> ----snip --------

tjr
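
The difference between a membership test and a key lookup can be seen in a few lines. This is only a sketch in Python 3 syntax (the thread uses 2.6), with `record` standing in for the parsed "delete" line from the sample data:

```python
# Parsed form of the sample "delete" line; it has no "text" key.
record = {"delete": {"status": {"id": 12650137902, "user_id": 128090723}}}

# A membership test only evaluates to True or False; nothing is raised,
# so wrapping it in try/except never reaches the except branch.
has_text = 'text' in record
print(has_text)

# Subscripting a missing key is what actually raises KeyError.
try:
    text = record['text']
except KeyError:
    text = None
print(text)
```

So a try/except version of Option 1 would need `j['text']` inside the try, while the `if 'text' in j:` form in Option 2 needs no exception handling at all.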
 
Red

Thanks to Cliff and Rolando who saw where my real problem was.
Terry, Jim: I had not seen the tutorial before, so I'll have to dig
into that as well. So little time.

Cheers
 
Red

I need to think about the logic here again and see what I'm missing
beyond my original question. Thanks for taking the time to explain.
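
Pulling the replies together, one possible shape for the whole filter is sketched below. It assumes Python 3 syntax (the thread uses 2.6), that each input line is one complete JSON object, and that the language of interest is the one stored under "user". The function name `english_texts` is made up for illustration, and the third sample line is invented to show a non-English record being skipped.

```python
import json

def english_texts(lines):
    """Yield the "text" of each record whose user language is "en"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if 'text' not in record:
            continue  # e.g. the {"delete": ...} records
        # "lang" sits inside the nested "user" dict, not at the top level.
        user = record.get('user') or {}
        if user.get('lang') == 'en':
            yield record['text']

# Abbreviated sample lines; the "es" record is invented for this example.
sample = [
    '{"text": "tech managers what size for your teams?", "user": {"lang": "en"}}',
    '{"delete": {"status": {"id": 12650137902, "user_id": 128090723}}}',
    '{"text": "hola", "user": {"lang": "es"}}',
]

for text in english_texts(sample):
    print(text)
```

Because the function only iterates over lines, an open file object such as `open(sys.argv[1])` could be passed in place of `sample`.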
 
