Help with inverted dictionary

rorley

I'm new to Python and I'm struggling. I have a text file (*.txt) with
a couple thousand entries, each on its own line (similar to a phone
book). How do I write a script that builds something like an inverted
dictionary, so that I can look up "robert" and create a new text
file of all of the lines that contain "robert"?


Thanks so much.
 
ajikoe

Hello,

First, I'm not entirely clear about your problem, but you can do the
following steps:


1. Read your file into a list of lines (list1)
2. Use a regex to find 'robert' in every member of list1 and add each
matching line to list2
3. Write list2 out to a file

pujo
 
Devan L

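import re

name = "Robert"
# Read the phone book, dropping the trailing newline from each line.
with open('phonebook.txt', 'r') as f:
    lines = [line.rstrip("\n") for line in f]
# Case-insensitive match; re.escape treats the name as literal text.
pat = re.compile(re.escape(name), re.I)
related_lines = [line for line in lines if pat.search(line)]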

And then you write the lines in related_lines to a file. I don't really
write text to files much so, um, yeah.
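For the write-out step, a minimal sketch, assuming Python 3. The helper name write_matches and the output filename robert.txt are illustrative choices, not something from this thread:

```python
import re

def write_matches(lines, name, out_path):
    """Write every line containing `name` (case-insensitive) to out_path."""
    pat = re.compile(re.escape(name), re.I)
    related_lines = [line for line in lines if pat.search(line)]
    # "w" creates the file, or overwrites it if it already exists.
    with open(out_path, "w") as out:
        for line in related_lines:
            out.write(line + "\n")
    return related_lines
```

With the list of stripped lines from the snippet above, you would call it as write_matches(lines, "Robert", "robert.txt"). Note that this is a substring match, so "Roberta" also counts as a hit.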
 
rorley

OK, so my problem is I have a text file with all of these instances,
for example 5000 facts about animals. I need to go through the file
and put all of the facts (lines) that contain the word lion into a file
called lion.txt. If I come across an animal or other word for which a
file does not yet exist I need to create a file for that word and put
all instances of that word into that file. I realize that this should
probably create 30,000 files or so. Any help would be REALLY
appreciated. Thanks. Reece
 
Devan L

I think you need to get a database. Anyways, if anything, it should
create no more than 5,000 files, since 5,000 facts shouldn't talk about
30,000 animals. There have been a few discussions about looking at
files in directories though, if you want to look at those.
 
rorley

I will eventually transfer this to a database, but is there any way for
now you could help me make the text files? Thank you so much. Reece
 
Devan L

Oh, I seem to have missed the part saying 'or other word'. Are you
doing this for every single word in the file?
 
Steven D'Aprano

OK, so my problem is I have a text file with all of these instances,
for example 5000 facts about animals. I need to go through the file
and put all of the facts (lines) that contain the word lion into a file
called lion.txt. If I come across an animal or other word for which a
file does not yet exist I need to create a file for that word and put
all instances of that word into that file. I realize that this should
probably create 30,000 files or so. Any help would be REALLY
appreciated. Thanks. Reece

Sounds like homework to me...

Start by breaking the big problem down into little problems:

Step 1: read the data from the file

You do that with something like this:

data = open("MyFile.txt", "r").read()

Notice I said *something like* -- that's a hint that you want to change
that to something slightly different.

Step 2: grab each line, one at a time

Somehow you want to read lines (hint! hint!) from the file, so that you
have a list of text lines in data. How do you read lines (hint!) from a
file in Python?

Once you do that, data should look something like this:

["lions are mammals\n", "lions eat meat\n", "sheep eat grass\n"]

So you can work with each line in data with:

for line in data:
    do_something(line)

Step 3: grab each word from the line

I'll make this one easy for you:

words = line.split()

words now looks like: ["lions", "are", "mammals"]

Step 4: for each word, open a file:

This one is also easy:

for word in words:
    fp = open(word, "w")
    fp.write(line)  # stands in for "all the other words"
    fp.close()

Hint: this code won't quite do what you want. You need to change a few
things.

Does this help? Is that enough to get started? See how far you get, and
then come back for more help.
 
Dark Cowherd

As Steven said, this looks too much like homework.
But what the heck, I am also learning Python, so I wrote a small
program. A very small program. I am fairly new to Python, and I am
stunned each time I see how small programs like this can be.

Since I am also learning, can somebody comment if anything here is not
Pythonesque?

dictwords = dict()
for line in open('testfile.txt','r'):
    for word in line.rstrip('\n').split():
        dictwords.setdefault(word,set()).update((line.rstrip('\n'),))
for wordfound in dictwords.items():
    open(wordfound[0],'w').write('\n'.join(wordfound[1]))
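One possible variation on the same idea, offered as a sketch rather than the "right" answer. It assumes Python 3, uses collections.defaultdict in place of setdefault, and closes each file explicitly. The function names build_index and write_index, the out_dir argument, and the ".txt" suffix are illustrative assumptions. It also assumes every word is a legal filename, which will not hold for words containing characters such as "/":

```python
from collections import defaultdict
import os

def build_index(lines):
    """Map each word to the set of lines that contain it."""
    index = defaultdict(set)
    for line in lines:
        text = line.rstrip("\n")
        for word in text.split():
            index[word].add(text)
    return index

def write_index(index, out_dir):
    """Write one <word>.txt file per key into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for word, facts in index.items():
        path = os.path.join(out_dir, word + ".txt")
        # sorted() makes the output order repeatable; sets are unordered.
        with open(path, "w") as out:
            out.write("\n".join(sorted(facts)) + "\n")
```

Usage would be something like write_index(build_index(open('testfile.txt')), 'words'). Keeping the files in their own directory stops a few thousand word files from landing next to the script.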
 
rorley

Not quite homework, but a special project. Thanks for the advice. I'll
let you know if I run into any more stumbling blocks. Reece
 
rorley

Thanks for the hints, I think I've figured it out. I've only been
using Python for 2 days so I really needed the direction. If you were
curious, this is not homework but an attempt to use the ConceptNet data
(it's an MIT AI project) to make a website in a Wiki-like format that
would allow the data to be edited on the fly. I'll ask again if I need
more help. You guys are great. Reece
 
John Machin

I will eventually transfer this to a database, but is there any way for
now you could help me make the text files? Thank you so much. Reece

No. There is utterly no reason why you should create 5000 or 30000 text
files. While you are waiting to get a clue about databases, do it in
Python, in memory. It should only take a very tiny time to suck your
5000-fact file into memory, index the data appropriately, and do some
queries e.g. list all facts about "lion".
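A minimal sketch of that in-memory approach, assuming Python 3. The function names load_index and facts_about are made up for illustration, and words are lowercased so that "Lions" and "lions" share one entry:

```python
def load_index(lines):
    """Build a dict mapping each lowercased word to the facts containing it."""
    index = {}
    for line in lines:
        fact = line.strip()
        # set() so a word repeated within one fact is indexed only once.
        for word in set(fact.lower().split()):
            index.setdefault(word, []).append(fact)
    return index

def facts_about(index, word):
    """Return all facts indexed under `word`, or an empty list."""
    return index.get(word.lower(), [])
```

With the facts in a file, index = load_index(open("facts.txt")) followed by facts_about(index, "lions") gives the matching lines without creating any files (facts.txt is a placeholder name). This matches whole words only, so "lion" will not find facts that say "lions".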
 
Steven D'Aprano

No. There is utterly no reason why you should create 5000 or 30000 text
files.

There is one possible reason: if it is a homework assignment, and creating
all those files is part of the assignment.

(I've seen stupider ideas, but not by much.)
 
Dennis Lee Bieber

Thanks for the hints, I think I've figured it out. I've only been
using Python for 2 days so I really needed the direction. If you were
curious, this is not homework but an attempt to use the ConceptNet data
(it's an MIT AI project) to make a website in a Wiki-like format that
would allow the data to be edited on the fly. I'll ask again if I need
more help. You guys are great. Reece

You may still want to consider filtering the words you use as
keys... Do you REALLY want something like a KWIC file just for words
like: a, in, of, the, etc.?
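A small sketch of that kind of filtering, assuming Python 3. The STOP_WORDS set below is a tiny hand-picked example (only "a", "in", "of" and "the" come from this post), and the function name keywords is hypothetical:

```python
# Illustrative stop-word list; extend it to suit your own data.
STOP_WORDS = {"a", "an", "in", "of", "the", "and", "are", "is", "to"}

def keywords(line, stop_words=STOP_WORDS):
    """Return the words in `line` worth indexing, lowercased, in order."""
    return [w for w in line.lower().split() if w not in stop_words]
```

You would call keywords(line) in place of line.split() when choosing which words become keys, so no entry is ever created for "the" or "of".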
