Creating a dictionary from a .txt file

C.T.

Hello,

I'm currently working on a homework problem that requires me to create a dictionary from a .txt file that contains some of the worst cars ever made. The file looks something like this:

1958 MGA Twin Cam
1958 Zunndapp Janus
1961 Amphicar
1961 Corvair
1966 Peel Trident
1970 AMC Gremlin
1970 Triumph Stag
1971 Chrysler Imperial LeBaron Two-Door Hardtop

The car manufacturer should be the key and a tuple containing the year and the model should be the key's value. I tried the following to just get the contents of the file into a list, but only the very last line in the txt file is shown as a list with three elements (ie, ['2004', 'Chevy', 'SSR']) when I print temp.

d={}
car_file = open('worstcars.txt', 'r')
for line in car_file:
    temp = line.split()
print (temp)
car_file.close()

After playing around with the code, I came up with the following code to get everything into a list:

d=[]
car_file = open('worstcars.txt', 'r')
for line in car_file:
    d.append(line.strip('\n'))
print (d)
car_file.close()

Every line is now an element in list d. The question I have now is how can I make a dictionary out of the list d with the car manufacturer as the key and a tuple containing the year and the model should be the key's value. Here is a sample of what list d looks like:

['1899 Horsey Horseless', '1909 Ford Model T', '1911 Overland OctoAuto', '2003 Hummer H2', '2004 Chevy SSR']

Any help would be appreciated!
 
Chris Angelico

After playing around with the code, I came up with the following code to get everything into a list:

d=[]
car_file = open('worstcars.txt', 'r')
for line in car_file:
    d.append(line.strip('\n'))
print (d)
car_file.close()

Every line is now an element in list d. The question I have now is how can I make a dictionary out of the list d with the car manufacturer as the key and a tuple containing the year and the model should be the key's value.

Ah, a nice straight-forward text parsing problem!

The question is how to recognize the manufacturer. Is it guaranteed to
be the second blank-delimited word, with the year being the first? If
so, you were almost there with .split().

car_file = open('worstcars.txt', 'r')
# You may want to consider the 'with' statement here - no need to close()
for line in car_file:
    temp = line.split(None, 2)
    if len(temp)==3:
        year, mfg, model = temp
        # Now do something with these three values
        print("Manufacturer: %s Year: %s Model: %s"%(mfg,year,model))

That's sorted out the parsing side of things. Do you know how to build
up the dictionary from there?

What happens if there are multiple entries in the file for the same
manufacturer? Do you need to handle that?

ChrisA
 
Mark Janssen

Every line is now an element in list d. The question I have now is how can
I make a dictionary out of the list d with the car manufacturer as the key
and a tuple containing the year and the model should be the key's value.
Here is a sample of what list d looks like:

['1899 Horsey Horseless', '1909 Ford Model T', '1911 Overland OctoAuto',
'2003 Hummer H2', '2004 Chevy SSR']

Any help would be appreciated!

As long as your data is consistently ordered, just use list indexing. d[2]
is your key, and (d[1],d[3]) the key's value.

Mark
Tacoma, Washington
 
Roy Smith

C.T. said:
Hello,

I'm currently working on a homework problem that requires me to create a
dictionary from a .txt file that contains some of the worst cars ever made.
The file looks something like this:

1958 MGA Twin Cam
1958 Zunndapp Janus
1961 Amphicar
1961 Corvair
1966 Peel Trident
1970 AMC Gremlin
1970 Triumph Stag
1971 Chrysler Imperial LeBaron Two-Door Hardtop

The car manufacturer should be the key and a tuple containing the year and
the model should be the key's value. I tried the following to just get the
contents of the file into a list, but only the very last line in the txt file
is shown as a list with three elements (ie, ['2004', 'Chevy', 'SSR']) when I
print temp.

d={}
car_file = open('worstcars.txt', 'r')
for line in car_file:
    temp = line.split()
print (temp)
car_file.close()

Yup. Because you run through the whole file, putting each line into
temp, overwriting the previous temp value.

d=[]
car_file = open('worstcars.txt', 'r')
for line in car_file:
    d.append(line.strip('\n'))
print (d)
car_file.close()

You could do most of that with just:

car_file = open('worstcars.txt', 'r')
d = car_file.readlines()

but there's no real reason to read the whole file into a list. What you
probably want to do is something like:

d = {}
car_file = open('worstcars.txt', 'r')
for line in car_file:
    year, manufacturer, model = parse_line(line)
    d[manufacturer] = (year, model)

One comment about the above; it assumes that there's only a single entry
for a given manufacturer in the file. If that's not true, the above
code will only keep the last one. But let's assume it's true for the
moment.

Now, we're just down to writing parse_line(). This takes a string and
breaks it up into 3 strings. I'm going to leave this as an exercise for
you to work out. The complicated part is going to be figuring out some
logic to deal with anything from multi-word model names ("Imperial
LeBaron Two-Door Hardtop"), to lines like the Corvair where there is no
manufacturer (or maybe there's no model?).
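
One possible starting point for parse_line(), as a minimal sketch rather than a full answer. It assumes the year is always the first word and the manufacturer is exactly one word, which the thread later shows is not guaranteed. The 'Unknown' placeholder for two-word lines is also an assumption.

```python
def parse_line(line):
    """Split a line such as '1970 AMC Gremlin' into (year, manufacturer, model).

    Assumption: year is the first word, the manufacturer is exactly one
    word, and everything after that is the model.
    """
    parts = line.strip().split(None, 2)  # at most three pieces
    if len(parts) == 3:
        year, manufacturer, model = parts
    elif len(parts) == 2:
        # Lines such as '1961 Corvair': treat the lone word as the model.
        year, model = parts
        manufacturer = "Unknown"
    else:
        raise ValueError("cannot parse line: %r" % line)
    return year, manufacturer, model


sample = [
    "1958 MGA Twin Cam\n",
    "1961 Corvair\n",
    "1971 Chrysler Imperial LeBaron Two-Door Hardtop\n",
]

d = {}
for line in sample:
    year, manufacturer, model = parse_line(line)
    d[manufacturer] = (year, model)

print(d)
```

Multi-word manufacturers (Aston Martin, for example) would still be split incorrectly by this version.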
 
C.T.

Every line is now an element in list d. The question I have now is how can I make a dictionary out of the list d with the car manufacturer as the key and a tuple containing the year and the model should be the key's value. Here is a sample of what list d looks like:

['1899 Horsey Horseless', '1909 Ford Model T', '1911 Overland OctoAuto', '2003 Hummer H2', '2004 Chevy SSR']

Any help would be appreciated!

As long as your data is consistently ordered, just use list indexing. d[2] is your key, and (d[1],d[3]) the key's value.

Mark
Tacoma, Washington

Thank you, Mark! My problem is the data isn't consistently ordered. I can use slicing and indexing to put the year into a tuple, but because a car manufacturer could have two names (e.g., Aston Martin) or a car model could have two names (e.g., Iron Duke), it's harder to use slicing and indexing for those two. I've added the following, but the output is still not what I need it to be.


t={}
for i in d :
    t[d[d.index(i)][5:]]= tuple(d[d.index(i)][:4])

print (t)

The output looks something like this:

{'Ford Model T': ('1', '9', '0', '9'), 'Mosler Consulier GTP': ('1', '9', '8', '5'), 'Scripps-Booth Bi-Autogo': ('1', '9', '1', '3'), 'Morgan Plus 8 Propane': ('1', '9', '7', '5'), 'Fiat Multipla': ('1', '9', '9', '8'), 'FordPinto': ('1', '9', '7', '1'), 'Triumph Stag': ('1', '9', '7', '0'), 'BMW 7-series': ('2', '0', '0', '2')}


Here the key is the car manufacturer and car model, and the value is a tuple containing the year split into single characters separated by commas. (Not sure why that is?)
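
The split-up year comes from how tuple() works: it iterates over its argument, and iterating over a string yields one character at a time. A short sketch of the difference, using a two-entry stand-in for list d:

```python
entry = "1909 Ford Model T"

# tuple() iterates over the string, so each character becomes an element.
as_chars = tuple(entry[:4])
print(as_chars)   # ('1', '9', '0', '9')

# A trailing comma builds a one-element tuple holding the whole string.
as_whole = (entry[:4],)
print(as_whole)   # ('1909',)

# d[d.index(i)] is simply i, so the loop can use each entry directly.
d = ["1909 Ford Model T", "1970 Triumph Stag"]
t = {}
for entry in d:
    t[entry[5:]] = (entry[:4],)
print(t)
```

This still leaves the manufacturer and model joined in the key; it only explains the year.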
 
C.T.

Thank you, Chris! I could use slicing and indexing to build the dictionary, but the problem is with the car manufacturer and the car model. Either or both could be multiple names.
 
Chris Angelico

Thank you, Chris! I could use slicing and indexing to build the dictionary, but the problem is with the car manufacturer and the car model. Either or both could be multiple names.

Then you're going to need some other form of magic to recognize where
the manufacturer ends and the model starts. Do you have, say, tabs
between the fields and spaces within?

ChrisA
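
If the file did turn out to have tabs between the fields and spaces only within them, the parsing becomes trivial. A minimal sketch under that assumption; the sample lines below are made up for illustration and are not from the real worstcars.txt:

```python
def parse_tabbed(line):
    # Assumes exactly three tab-separated fields: year, manufacturer, model.
    year, manufacturer, model = line.rstrip("\n").split("\t")
    return year, manufacturer, model


# Hypothetical tab-delimited lines.
print(parse_tabbed("1971\tChrysler\tImperial LeBaron Two-Door Hardtop\n"))
print(parse_tabbed("2000\tAston Martin\tExample Model\n"))
```

A line with more or fewer than three fields raises ValueError on the unpacking, which is a reasonable way to notice bad input early.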
 
C.T.

Roy, thank you so much! I'll do some more research to see how I can achieve this. Thank you!
 
Dave Angel

Thank you, Mark! My problem is the data isn't consistently ordered. I can use slicing and indexing to put the year into a tuple, but because a car manufacturer could have two names (ie, Aston Martin) or a car model could have two names(ie, Iron Duke), its harder to use slicing and indexing for those two. I've added the following, but the output is still not what I need it to be.

So the correct answer is "it cannot be done," and an explanation.

Many times I've been given impossible conditions for a problem. And
invariably the correct solution is to press bac on the supplier of the
constraints.

Unless there are some invisible characters in that file, lie tabs in
between the fields, it loocs liec you're out of luc. Or you could
manually edit the file before running the program.

[The character after 'j' is broccen on this cceyboard.]
 
Roy Smith

Dave Angel said:
Thank you, Mark! My problem is the data isn't consistently ordered. I can
use slicing and indexing to put the year into a tuple, but because a car
manufacturer could have two names (ie, Aston Martin) or a car model could
have two names(ie, Iron Duke), its harder to use slicing and indexing for
those two. I've added the following, but the output is still not what I
need it to be.

So the correct answer is "it cannot be done," and an explanation.

Many times I've been given impossible conditions for a problem. And
invariably the correct solution is to press [back] on the supplier of the
constraints.

In real life, you often have to deal with crappy input data (and bogus
project requirements). Sometimes you just need to be creative.

There's only a small set of car manufacturers. A good start would be
mining wikipedia's [[List of automobile manufacturers]]. Once you've
got that list, you could try matching portions of the input against the
list.

Depending on how much effort you wanted to put into this, you could
explore all sorts of fuzzy matching (ie "delorean" vs "delorean motor
company"), but even a simple search is better than giving up.

And, this is a good excuse to explore some of the interesting
third-party modules. For example, mwclient ("pip install mwclient")
gives you a neat Python interface to wikipedia. And there's a whole
landscape of string matching packages to explore.

We deal with this every day at Songza. Are Kesha and Ke$ha the same
artist? Pushing back on the record labels to clean up their catalogs
isn't going to get us very far.
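
A simple version of the "match against a known list" idea could look like the sketch below. KNOWN_MAKERS is a short, hand-written, hypothetical list standing in for whatever you mine from Wikipedia, and split_maker() is a made-up helper name.

```python
# Hypothetical list; a real one would be much longer.
KNOWN_MAKERS = ["Aston Martin", "AMC", "Chrysler", "Ford", "Triumph"]


def split_maker(rest):
    """Return (manufacturer, model) for the text that follows the year."""
    lowered = rest.lower()
    # Try longer names first so a multi-word maker is matched as a whole.
    for maker in sorted(KNOWN_MAKERS, key=len, reverse=True):
        if lowered == maker.lower():
            return maker, ""
        if lowered.startswith(maker.lower() + " "):
            return maker, rest[len(maker):].strip()
    return "Unknown", rest


def parse_line(line):
    # The year is assumed to be the first space-delimited word.
    year, _, rest = line.strip().partition(" ")
    maker, model = split_maker(rest.strip())
    return year, maker, model


print(parse_line("1970 AMC Gremlin"))
print(parse_line("1961 Corvair"))
print(parse_line("2000 Aston Martin Example Model"))
```

Anything not on the list falls through to 'Unknown', so the quality of the result depends entirely on how complete the list is.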
 
Terry Jan Reedy

Hello,

I'm currently working on a homework problem that requires me to create a dictionary from a .txt file that contains some of the worst cars ever made. The file looks something like this:

1958 MGA Twin Cam
1958 Zunndapp Janus
1961 Amphicar
1961 Corvair
1966 Peel Trident
1970 AMC Gremlin
1970 Triumph Stag
1971 Chrysler Imperial LeBaron Two-Door Hardtop

The car manufacturer should be the key and a tuple containing the year and the model should be the key's value. I tried the following to just get the contents of the file into a list, but only the very last line in the txt file is shown as a list with three elements (ie, ['2004', 'Chevy', 'SSR']) when I print temp.

d={}
car_file = open('worstcars.txt', 'r')
for line in car_file:
    temp = line.split()

If all makers are one word (Aston-Martin would be OK), and if the file
is otherwise consistently 'year maker model-words', then adding
'maxsplit=2' to the split call would be all the parsing you need.
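
A quick sketch of how maxsplit behaves. It counts splits, not pieces, so three fields (year, maker, model) need maxsplit=2. The keyword form assumes Python 3.3 or later; line.split(None, 2) is the positional equivalent.

```python
line = "1971 Chrysler Imperial LeBaron Two-Door Hardtop"

every_word = line.split()
three_fields = line.split(maxsplit=2)
short_line = "1961 Corvair".split(maxsplit=2)

print(every_word)    # six separate words
print(three_fields)  # ['1971', 'Chrysler', 'Imperial LeBaron Two-Door Hardtop']
print(short_line)    # ['1961', 'Corvair']
```

The last case shows why checking len() on the result is still needed: a two-word line yields only two pieces.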
 
Dave Angel

Dave Angel said:
On Sunday, March 31, 2013 12:20:25 PM UTC-4, zipher wrote:
<SNIP>


Thank you, Mark! My problem is the data isn't consistently ordered. I can
use slicing and indexing to put the year into a tuple, but because a car
manufacturer could have two names (ie, Aston Martin) or a car model could
have two names(ie, Iron Duke), its harder to use slicing and indexing for
those two. I've added the following, but the output is still not what I
need it to be.

So the correct answer is "it cannot be done," and an explanation.

Many times I've been given impossible conditions for a problem. And
invariably the correct solution is to press [back] on the supplier of the
constraints.

In real life, you often have to deal with crappy input data (and bogus
project requirements). Sometimes you just need to be creative.

There's only a small set of car manufacturers. A good start would be
mining wikipedia's [[List of automobile manufacturers]]. Once you've
got that list, you could try matching portions of the input against the
list.

Depending on how much effort you wanted to put into this, you could
explore all sorts of fuzzy matching (ie "delorean" vs "delorean motor
company"), but even a simple search is better than giving up.

And, this is a good excuse to explore some of the interesting
third-party modules. For example, mwclient ("pip install mwclient")
gives you a neat Python interface to wikipedia. And there's a whole
landscape of string matching packages to explore.

We deal with this every day at Songza. Are Kesha and Ke$ha the same
artist? Pushing back on the record labels to clean up their catalogs
isn't going to get us very far.

I agree with everything you've said, although in your case, presumably
the record labels are not your client/boss, so that's not who you push
back against. The client should know when the data is being fudged, and
have a say in how it's to be done.

But this is a homework assignment. I think the OP is learning Python,
not how to second-guess a client.
 
Neil Cerutti

And, this is a good excuse to explore some of the interesting
third-party modules. For example, mwclient ("pip install
mwclient") gives you a neat Python interface to wikipedia. And
there's a whole landscape of string matching packages to
explore.

We deal with this every day at Songza. Are Kesha and Ke$ha the
same artist? Pushing back on the record labels to clean up
their catalogs isn't going to get us very far.

I tried searching for Frost*, an interesting artist I recently
learned about. His name, in combination with a similarly named
rap artist, breaks most search tools.

My guess is this homework is simply borken.
 
Steven D'Aprano

I tried searching for Frost*, an interesting artist I recently learned
about.

"Interesting artist" -- is that another term for "wanker"?

*wink*

His name, in combination with a similarly named rap artist,
breaks most search tools.

As far as I'm concerned, anyone in the 21st century who names themselves
or their work (a movie, book, programming language, etc.) something which
breaks search tools is just *begging* for obscurity, and we ought to
respect their wishes.
 
C.T.

Thanks for all the help everyone! After I manually edited the txt file, this is what I came up with:

car_dict = {}
car_file = open('cars.txt', 'r')

for line in car_file:
    temp = line.strip().split(None, 2)
    temp2 = line.strip().split('\t')

    if len(temp)==3:
        year, manufacturer, model = temp[0] ,temp2[0][5:], temp2[1]
        value = (year, model)
        if manufacturer in car_dict:
            car_dict.setdefault(manufacturer,[]).append(value)
        else:
            car_dict[manufacturer] = [value]

    elif len(temp)==2:
        year, manufacturer, model = temp[0], 'Unknown' , temp2[1]
        value = (year, model)
        if manufacturer in car_dict:
            car_dict.setdefault(manufacturer,[]).append(value)
        else:
            car_dict[manufacturer] = [value]

car_file.close()

print (car_dict)

It may not be the most Pythonic way of doing this, but it works for me. I am learning Python, and this problem was probably the most challenging so far. Thank you all, again!
 
Dave Angel

Thanks for all the help everyone! After I manually edited the txt file, this is what I came up with:

car_dict = {}
car_file = open('cars.txt', 'r')

for line in car_file:
    temp = line.strip().split(None, 2)
    temp2 = line.strip().split('\t')

    if len(temp)==3:
        year, manufacturer, model = temp[0] ,temp2[0][5:], temp2[1]
        value = (year, model)
        if manufacturer in car_dict:
            car_dict.setdefault(manufacturer,[]).append(value)

That's rather redundant. Once you've determined that the particular key
is already there, why bother with the setdefault() call? Or to put it
another way, why bother to test if it's there when you're going to use
setdefault to handle the case where it's not?

        else:
            car_dict[manufacturer] = [value]

    elif len(temp)==2:
        year, manufacturer, model = temp[0], 'Unknown' , temp2[1]
        value = (year, model)
        if manufacturer in car_dict:
            car_dict.setdefault(manufacturer,[]).append(value)
        else:
            car_dict[manufacturer] = [value]

car_file.close()

print (car_dict)

It may not be the most Pythonic way of doing this, but it works for me. I am learning Python, and this problem was probably the most challenging so far. Thank you all, again!
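
The point about setdefault() can be sketched as follows: either call setdefault() unconditionally, or let collections.defaultdict create the empty list. The records list here is a small made-up stand-in for the parsed lines.

```python
from collections import defaultdict

# Stand-in for (year, manufacturer, model) triples parsed from the file.
records = [
    ("1970", "Triumph", "Stag"),
    ("1909", "Ford", "Model T"),
    ("1971", "Ford", "Pinto"),
]

# Option 1: setdefault() alone covers both "key present" and "key missing".
car_dict = {}
for year, manufacturer, model in records:
    car_dict.setdefault(manufacturer, []).append((year, model))

# Option 2: defaultdict(list) creates the empty list on first access.
car_dict2 = defaultdict(list)
for year, manufacturer, model in records:
    car_dict2[manufacturer].append((year, model))

print(car_dict)
print(dict(car_dict2))
```

Both versions produce the same mapping of manufacturer to a list of (year, model) tuples, with no if/else needed.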
 
Neil Cerutti

"Interesting artist" -- is that another term for "wanker"?

*wink*

hee-hee. It depends on how much of a hankering you have for
pretentious progressive synth-rock.

As far as I'm concerned, anyone in the 21st century who names
themselves or their work (a movie, book, programming language,
etc.) something which breaks search tools is just *begging* for
obscurity, and we ought to respect their wishes.

I do think it's something he did on purpose. The asterisk, I
believe, symbolizes the exclusive genius of his fans.
 
