Sorting Countries by Region

P

patrick.waldo

Nov 16, 2007

#1

Hi all,

I'm analyzing some data that has a lot of country data. What I need
to do is sort through this data and output it into an excel doc with
summary information. The countries, though, need to be sorted by
region, but the way I thought I could do it isn't quite working out.
So far I can only successfully get the data alphabetically.

Any ideas?

import xlrd
import pyExcelerator

def get_countries_list(list):
countries_list=[]
for country in countries:
if country not in countries_list:
countries_list.append(country)

EU = ["Austria","Belgium", "Cyprus","Czech Republic",
"Denmark","Estonia", "Finland"]
NA = ["Canada", "United States"]
AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
"Japan"]
Regions_tot = {'European Union':EU, 'North America':NA, 'Asia
Pacific':AP,}

path_file = "c:\\1\country_data.xls"
book = xlrd.open_workbook(path_file)
Counts = book.sheet_by_index(1)
countries= Counts.col_values(0,start_rowx=1, end_rowx=None)

get_countries_list(countries)

wb=pyExcelerator.Workbook()
matrix = wb.add_sheet("matrix")

n=1
for country in unique_countries:
matrix.write(n,1, country)
n = n+1

wb.save('c:\\1\\matrix.xls')

S

Sergio Correia

Nov 16, 2007

#2

Just a few notes:

1) get_countries_list

What is the purpose of that function? Besides a few errors (an
argument named list, no value returned), it seems you just want to
remove duplicates from a list called countries. You can do that
transforming the list to a 'set'.

new_countries = list( set(countries) )

2) I would suggest using countries.sort(...) or sorted(countries,...),
specifying cmp or key options too sort by region instead.

3) Instead of doing this:

for country in unique_countries:
matrix.write(n,1, country)
n = n+1

Do something like

for i, cou in enumerate(unique_countries):
matrix.write(1+i, 1, cou)

(so you dont need to increment the 'i' variable manually)

M

martyw

Nov 16, 2007

#3

Hi all,

I'm analyzing some data that has a lot of country data. What I need
to do is sort through this data and output it into an excel doc with
summary information. The countries, though, need to be sorted by
region, but the way I thought I could do it isn't quite working out.
So far I can only successfully get the data alphabetically.

Any ideas?

import xlrd
import pyExcelerator

def get_countries_list(list):

it isn't a good idea to use a built-in object as a variable name

countries_list=[]
for country in countries:ii
if country not in countries_list:
countries_list.append(country)

EU = ["Austria","Belgium", "Cyprus","Czech Republic",
"Denmark","Estonia", "Finland"]
NA = ["Canada", "United States"]
AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
"Japan"]
Regions_tot = {'European Union':EU, 'North America':NA, 'Asia
Pacific':AP,}

i would create a class to capture country information, e.g.
class country(object):
def __init__(self, name, size = 0, population = 0):
self.name = name
self.size = size
self.poplation = population

def __cmp__(self, other):
if self.name < other.name:
return -1
elif self.name > other.name:
return 1
else:
return 0

then you can set up the world as

world = {'NA': [country("United States"), country("Canada")], \
'Europe': [country("Belgium"), country("Austria")]}

now you can sort and print it easy

for region in world:
print region
lands = world[region]
lands.sort()
for land in lands:
print land.name

the sort works because the country objects have a method __cmp__

path_file = "c:\\1\country_data.xls"
book = xlrd.open_workbook(path_file)
Counts = book.sheet_by_index(1)
countries= Counts.col_values(0,start_rowx=1, end_rowx=None)

get_countries_list(countries)

wb=pyExcelerator.Workbook()
matrix = wb.add_sheet("matrix")

n=1
for country in unique_countries:
matrix.write(n,1, country)
n = n+1

wb.save('c:\\1\\matrix.xls')

i'm not familiar with the excel modules so i can't help you with that

A

Alan

Nov 16, 2007

#4

i would create a class to capture country information, e.g.

<snip>

What is the advantage of this:

def __cmp__(self, other):
if self.name < other.name:
return -1
elif self.name > other.name:
return 1
else:
return 0

over
def __cmp__(self,other):
return cmp(self.name,other.name)

?

P

patrick.waldo

Nov 16, 2007

#5

Great, this is very helpful. I'm new to Python, so hence the
inefficient or nonsensical code!

2) I would suggest using countries.sort(...) or sorted(countries,...),
specifying cmp or key options too sort by region instead.

I don't understand how to do this. The countries.sort() lists
alphabetically and I tried to do a lambda x,y: cmp() type function,
but it doesn't sort correctly. Help with that?

For martyw's example, I don't need to get any sort of population
info. I'm actually getting the number of various types of documents.
So the entry is like this:

Argentina Food and Consumer Products Food Additives Color
Additives 1
Argentina Food and Consumer Products Food Additives Flavors 1
Argentina Food and Consumer Products Food Additives
General 6
Argentina Food and Consumer Products Food Additives labeling 1
Argentina Food and Consumer Products Food Additives Prohibited
Additives 1
Argentina Food and Consumer Products Food Contact Cellulose 1
Argentina Food and Consumer Products Food Contact Food
Packaging 1
Argentina Food and Consumer Products Food Contact Plastics 4
Argentina Food and Consumer Products Food Contact
Waxes 1
Belize
etc...

So I'll need to add up all the entries for Food Additives and Food
contacts, the other info like Color Additives isn't important.

So I will have an output like this
Food Additives Food Contact
Argentina 10 7
Belize
etc...

Thanks so much for the help!

S

Sergio Correia

Nov 17, 2007

#6

About the sort:

Check this (also on http://pastebin.com/f12b5b6ca )

def make_regions():

# Values you provided
EU = ["Austria","Belgium", "Cyprus","Czech Republic",
"Denmark","Estonia", "Finland"]
NA = ["Canada", "United States"]
AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
"Japan"]
regions = {'European Union':EU, 'North America':NA, 'Asia Pacific':AP}

ans = {}
for reg_name, reg in regions.items():
for cou in reg:
ans[cou] = reg_name
return ans

def cmp_region(cou1, cou2):
ans = cmp(regions[cou1], regions[cou2])
if ans: # If the region is the same, sort by country
return cmp(cou1, cou2)
else:
return ans

regions = make_regions()
some_countries = ['Austria', 'Canada', 'China', 'India']

print 'Old:', some_countries
some_countries.sort(cmp_region)
print 'New:', some_countries

Why that code?
Because the first thing I want is a dictionary where the key is the
name of the country and the value is the region. Then, I just make a
quick function that compares considering the region and country.
Finally, I sort.

Btw, the code is just a quick hack, as it can be improved -a lot-.

About the rest of your code:
- martyw's example is much more useful than you think. Why? because
you can just iterate across your document, adding the values you get
to the adequate object property. That is, instead of using size or
pop, use the variables you are interested in.

Best, and good luck with python,
Sergio

P

patrick.waldo

Nov 17, 2007

#7

This is how I solved it last night in my inefficient sort of way and
after re-reading some of my Python books on dictionaries. So far this
gets the job done. However, I'd like to test if there are any
countries in the excel input that are not represented, ie the input is
all the information I have and the dictionary functions as the
information I expect. What I did worked yesterday, but doesn't work
anymore more...see comment

Otherwise I tried doing this:

for i, country in countries_list:
if country in REGIONS_COUNTRIES['European Union']:
matrix.write(i+2, 1, country)
but I got "ValueError: too many values to unpack"

Again, this has been a great help. Any ideas of how I can make this a
bit more efficient, as I'm dealing with 5 regions and numerous
countries, would be greatly appreciated. Here's the code:

#keeping all the countries short
REGIONS_COUNTRIES = {'European Union':["Austria","Belgium"," "France",
"Germany", "Greece"],\
'North America':["Canada", "United States"]}

path_file = "c:\\1\\build\\data\\matrix2\\Update_oct07a.xls"

book = xlrd.open_workbook(path_file)
Counts = book.sheet_by_index(1)
wb=pyExcelerator.Workbook()
matrix = wb.add_sheet("matrix")

countries = Counts.col_values(0,start_rowx=1, end_rowx=None)
countries_list = list(set(countries))
countries_list.sort()

#This seems to not work today and I don't know why
#for country in countries_list:
# if country not in REGIONS_COUNTRIES['European Union'] or not in
REGIONS_COUNTRIES['North America']:
# print "%s is not in the expected list", country

#This sorts well
n=2
for country in countries_list:
if country in REGIONS_COUNTRIES['European Union']:
matrix.write(n, 1, country)
n=n+1
for country in countries_list:
if country in REGIONS_COUNTRIES['North America']:
matrix.write(n, 1, country)
n=n+1

wb.save('c:\\1\\matrix.xls')

About the sort:

Check this (also onhttp://pastebin.com/f12b5b6ca)

def make_regions():

# Values you provided
EU = ["Austria","Belgium", "Cyprus","Czech Republic",
"Denmark","Estonia", "Finland"]
NA = ["Canada", "United States"]
AP = ["Australia", "China", "Hong Kong", "India", "Indonesia",
"Japan"]
regions = {'European Union':EU, 'North America':NA, 'Asia Pacific':AP}

ans = {}
for reg_name, reg in regions.items():
for cou in reg:
ans[cou] = reg_name
return ans

def cmp_region(cou1, cou2):
ans = cmp(regions[cou1], regions[cou2])
if ans: # If the region is the same, sort by country
return cmp(cou1, cou2)
else:
return ans

regions = make_regions()
some_countries = ['Austria', 'Canada', 'China', 'India']

print 'Old:', some_countries
some_countries.sort(cmp_region)
print 'New:', some_countries

Why that code?
Because the first thing I want is a dictionary where the key is the
name of the country and the value is the region. Then, I just make a
quick function that compares considering the region and country.
Finally, I sort.

Btw, the code is just a quick hack, as it can be improved -a lot-.

About the rest of your code:
- martyw's example is much more useful than you think. Why? because
you can just iterate across your document, adding the values you get
to the adequate object property. That is, instead of using size or
pop, use the variables you are interested in.

Best, and good luck with python,
Sergio

Great, this is very helpful. I'm new to Python, so hence the
inefficient or nonsensical code!

Click to expand...

I don't understand how to do this. The countries.sort() lists
alphabetically and I tried to do a lambda x,y: cmp() type function,
but it doesn't sort correctly. Help with that?

Click to expand...

For martyw's example, I don't need to get any sort of population
info. I'm actually getting the number of various types of documents.
So the entry is like this:

Click to expand...

Argentina Food and Consumer Products Food Additives Color
Additives 1
Argentina Food and Consumer Products Food Additives Flavors 1
Argentina Food and Consumer Products Food Additives
General 6
Argentina Food and Consumer Products Food Additives labeling 1
Argentina Food and Consumer Products Food Additives Prohibited
Additives 1
Argentina Food and Consumer Products Food Contact Cellulose 1
Argentina Food and Consumer Products Food Contact Food
Packaging 1
Argentina Food and Consumer Products Food Contact Plastics 4
Argentina Food and Consumer Products Food Contact
Waxes 1
Belize
etc...

Click to expand...

So I'll need to add up all the entries for Food Additives and Food
contacts, the other info like Color Additives isn't important.

Click to expand...

So I will have an output like this
Food Additives Food Contact
Argentina 10 7
Belize
etc...

Click to expand...

Thanks so much for the help!
--

Click to expand...

http://mail.python.org/mailman/listinfo/python-list

Click to expand...

G

Gabriel Genellina

Nov 17, 2007

#8

En Sat said:
for i, country in countries_list:
if country in REGIONS_COUNTRIES['European Union']:
matrix.write(i+2, 1, country)
but I got "ValueError: too many values to unpack"

Remove the i, and try again...

J

J. Clifford Dyer

Nov 17, 2007

#9

#This seems to not work today and I don't know why
#for country in countries_list:
# if country not in REGIONS_COUNTRIES['European Union'] or not in
REGIONS_COUNTRIES['North America']:
# print "%s is not in the expected list", country

This snippet fails because you want 'and' rather than 'or'. In this
one, you test to see that country is not in EU, and if you have, say
'France' as your country, the first half evaluates false, so the or
tells it to try again with North America. Lo and behold, france is not
in North America, so the second half returns true. False or True
returns True, so your code prints "France is not in the expected list."

Cheers,
Cliff

M

martyw

Nov 19, 2007

#10

<snip>

What is the advantage of this:

over
def __cmp__(self,other):
return cmp(self.name,other.name)

?

yours is better!

Yet Another Tabular Data Question	2	Nov 29, 2007
action_page.php form	2	Oct 25, 2020
xlrd and cPickle.dump	6	Apr 1, 2008
Help Free business education courses in 80 countries feasibility study marketing plan	1	Mar 28, 2006
asp newbie question	3	Oct 30, 2007
Code Performance Issue	3	Feb 13, 2004
Opportunity of a lifetime to Attend a Amazing Event.	0	Apr 12, 2008
CountrySelector control	0	Feb 29, 2004

patrick.waldo

Sergio Correia

martyw

Alan

patrick.waldo

Sergio Correia

patrick.waldo

Gabriel Genellina

J. Clifford Dyer

martyw

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads