Problem with list.insert

SUBHABRATA

Dear Group,
I wrote a program that works with a dictionary and an input string.
Every word of the input string is matched against the dictionary.
If the word is found, the program gives the corresponding dictionary entry;
if it is not found, it gives the original word.
After the search, the words are joined back together.
But when I join them, the words that are not in the dictionary are printed
at the end, even if they appear at the start or in the middle of the input.
I want them in their original order.
I am applying a rule of thumb that each word keeps the same position in the
result string as it had in the input string.
So I determine which words are not in the dictionary and their positions in
the input string, and then insert them into the target string. To do this I
split both the input string and the output/result string.
Up to this point it works fine.
The problem is that the insert adds the same words multiple times, and the
program never seems to finish.
What is the error?
Any suggestions are welcome.
The code is given below:
import re
def wordchecker1(n):
    # INPUTTING STRING
    a1=raw_input("PRINT ONE ENGLISH SENTENCE FOR DICTIONARY CHECK:")
    #CONVERTING TO LOWER CASE
    a2=a1.lower()
    #CONVERTING INTO LIST
    a3=a2.split()
    #DICTIONARY
    a4=open("/python25/Changedict3.txt","r")
    a5=a4.read()
    a6=a5.split()
    found=[]
    not_found=[]
    #SEARCHING DICTIONARY
    for x in a3:
        a7="\n"
        a8=a7+x
        if a8 in a5:
            a9=a5.index(a8)
            a10=a5[a9:]
            a11=re.search("\xe0.*?\n",a10)
            a12=a11.group()
            a13=a12[:-1]
            found.append(a13)
        elif a8 not in a5:
            a14=x
            not_found.append(a14)
        else:
            print "Error"
    found.extend(not_found)
    # THE OUTPUT
    print "OUTPUT STRING IS"
    a15=(' '.join(found))
    #THE OUTPUT STRING
    print a15
    # SPLITTING OUTPUT STRING IN WORDS
    a16=a15.split()
    #TAKING OUT THE WORD FROM OUTPUT STRING
    for word in a16:
        #MATCHING WITH GIVEN STRING
        a17=a2.find(word)
        if a17>-1:
            print "The word is found in the Source String"
            a18=a3.index(word)
            a19=a3[a18]
            print a19
            #INSERTING IN THE LIST OF TARGET STRING
            a20=a16.insert(a18,a19)
            print a16
            a21=(" ".join(a16))
            print a21
Best Regards,
Subhabrata.
 
Marc 'BlackJack' Rintsch

a1, a2, a3, …, a20? You must be kidding. Please stop numbering names
and use *meaningful* names instead!

Could you describe the problem better, with sample inputs and expected
outputs? There must be a better way than that unreadable mess above.

Ciao,
Marc 'BlackJack' Rintsch
 
SUBHABRATA

Some people in the group told me I must be kidding, but I learnt Python
from the Python docs, which give examples like these, and I do write
explicit comments.
An excerpt from the Python docs:
>>> # Measure some strings:
... a = ['cat', 'window', 'defenestrate']
>>> for x in a:
...     print x, len(x)
...
cat 3
window 6
defenestrate 12
But if you are suggesting improvements, I'll surely listen.

The outputs are in Hindi. It is a dictionary lookup program and the
matching words are in Hindi, so you may ignore them.
To debug the result string, look at the words that are in English.
The group page does not support italics, so I am putting an
asterisk* after each one.
NO PROBLEM:
INPUT:
he has come
OUTPUT IS
उओह/ उन्होने रहेसाक्ता २.यात्राकरना
PROBLEM:
INPUT:
(i) Lincoln* has come
OUTPUT IS:
रहेसाक्ता २.यात्राकरना lincoln*
lincoln lincoln* रहेसाक्ता २.यात्राकरना lincoln
lincoln lincoln* lincoln* रहेसाक्ता २.यात्राकरना lincoln
…and the number keeps increasing; it seems to be a never-ending process.
MY EXPECTED STRING IS:
lincoln रहेसाक्ता २.यात्राकरना lincoln^
I am editing out the trailing entries marked ^, so don't worry about that.
MY FINAL EXPECTED STRING IS:
lincoln रहेसाक्ता २.यात्राकरना
Best Regards,
Subhabrata.


 
Diez B. Roggisch

SUBHABRATA said:
Some people in the room told I am kidding, but I learnt Python from
Python docs which gives examples like these,
But I write explicit comments,
an excerpt from python docs:
>>> # Measure some strings:
... a = ['cat', 'window', 'defenestrate']
>>> for x in a:
...     print x, len(x)
...
cat 3
window 6
defenestrate 12
But well, if you are suggesting improvement I'll surely listen.

Please! Just because a tiny 3-line example involving just *one* list
doesn't give that list a long & speaking name does not mean
 
Diez B. Roggisch

Discard my last post; I accidentally pressed submit too early.

Numbering variable names is surely *not* found in any Python example.
Short names do occur, since the examples are clear and don't require more
meaningful names. But nowhere will you find two-digit enumerations.

Every book or tutorial about programming will teach you to use meaningful
variable names in your program.

As far as your explanation goes: there is *nothing* to be understood
from a bunch of question marks, and the occasional "lincoln" spread in
between does not really help.

This is most probably not your fault, as somehow the Hindi gets twisted
into question marks. However, I suggest you provide an example where
the Hindi is replaced with English words (translations or placeholders);
otherwise you won't be understood and can't be helped.

Diez
 
bearophileHUGS

Subhabrata, it's very difficult for me to understand what your short
program is supposed to do, or what you are saying. I think that formatting
and code style are important.

So I suggest you give meaningful names to all your variables, remove
unused variables (like n), and add blank lines here and there to separate
logically distinct parts of your program, or even better split it into
functions. You can remove some intermediate variables by coalescing a few
logically related operations into one line, put spaces around operators
like = and after commas, show a usage example in English so people can
understand what the program is supposed to do, avoid joining and then
splitting strings again, and remove useless () around certain things.

This is a possible rewrite of the first part of your code; it's not
exactly equivalent...


def input_words():
    input_message = "Print one English sentence for dictionary check: "
    return raw_input(input_message).lower().split()


def load_dictionary():
    return set(line.rstrip() for line in open("words.txt"))


def dictionary_search(dictionary, words):
    found = []
    not_found = []

    for word in words:
        if word in dictionary:
            found.append(word)
        else:
            not_found.append(word)

    return found + not_found


inwords = input_words()
dictionary = load_dictionary()
print dictionary_search(dictionary, inwords)


It's far from perfect, but you can use it as a starting point for a
rewrite of your whole program.

Bye,
bearophile
 
castironpi

Warning, -spoiler-.

Instead split up your inputs first thing.

trans= { 'a': 'A', 'at': 'AT', 'to': 'TO' }
sample= 'a boy at the park walked to the tree'
expected= 'A boy AT the park walked TO the tree'

sample_list= sample.split( )
for i, x in enumerate( sample_list ):
    if x in trans:
        sample_list[ i ]= trans[ x ]

result= ' '.join( sample_list )
print result
assert result== expected

Then replace them as you visit each one, and join them later.
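
The same idea can be written a little more compactly with dict.get, which returns the word itself when there is no translation. This is only a sketch reusing the trans and sample values above; the function name translate is made up for illustration.

```python
def translate(sentence, trans):
    """Replace each word found in trans, keeping the original word order."""
    words = sentence.split()
    # trans.get(word, word) falls back to the untranslated word
    return ' '.join(trans.get(word, word) for word in words)


trans = {'a': 'A', 'at': 'AT', 'to': 'TO'}
sample = 'a boy at the park walked to the tree'
result = translate(sample, trans)
print(result)
```

Because every word is handled at its own position, untranslated words never need to be re-inserted afterwards.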
 
Terry Reedy

SUBHABRATA, I recommend you study this excellent response carefully.

Instead split up your inputs first thing.

trans= { 'a': 'A', 'at': 'AT', 'to': 'TO' }
sample= 'a boy at the park walked to the tree'
expected= 'A boy AT the park walked TO the tree'

It starts with a concrete test case -- an 'executable problem
statement'. To me, this is clearer and more useful than the 20 lines of
prose you used. A single-line English statement would be "Problem:
Replace selected words in a text using a dictionary." Sometimes, less
(words) really is more (understanding).

If the above is *not* what you meant, then give a similarly concrete
example that does what you *do* mean.

sample_list= sample.split( )
for i, x in enumerate( sample_list ):
    if x in trans:
        sample_list[ i ]= trans[ x ]

Meaningful names make the code easy to understand. Meaningless numbered
'a's require each reader to create meaningful names and associate them
in his/her head. But that is part of the job of the programmer.

result= ' '.join( sample_list )
print result
assert result== expected

It ends with an automated test that is easy to rerun should the code in
between need to be modified. Assert only prints something if there is
an error. With numerous tests, that is what one often wants. But with
only one, you might prefer 'print' instead of 'assert' to get a more
reassuring and satisfying 'True' printed.
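
With several cases, the same assert pattern can be run in a loop over (input, expected) pairs. The following is a sketch only: it wraps the replacement loop in a hypothetical function replace_words, and every test case except the first is made up for illustration.

```python
def replace_words(text, trans):
    """Replace words found in trans, leaving the others in place."""
    words = text.split()
    for i, word in enumerate(words):
        if word in trans:
            words[i] = trans[word]
    return ' '.join(words)


trans = {'a': 'A', 'at': 'AT', 'to': 'TO'}
cases = [
    ('a boy at the park walked to the tree',
     'A boy AT the park walked TO the tree'),
    ('the tree', 'the tree'),   # nothing to replace
    ('to a', 'TO A'),           # everything replaced
    ('', ''),                   # empty input
]
for text, expected in cases:
    result = replace_words(text, trans)
    # the message tuple is shown only when a case fails
    assert result == expected, (text, result, expected)
print('all %d cases passed' % len(cases))
```

The final print gives the reassuring confirmation while the asserts stay silent unless something breaks.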
Then replace them as you visit each one, and join them later.

If you are using Hindi characters, you might want to use Python3 when it
arrives, since it will use Unicode strings as the (default) string type.
But for posting here, stick with the ASCII subset.

Terry Jan Reedy
 
SUBHABRATA

Dear group,
Thanks for your idea to use a dictionary instead of a list. Your code is
more or less OK; there are some problems, which I'll debug. I feel the
insert problem is coming from the Hindi text.
Python 2.5 supports Hindi quite fluently.
I am writing in Python 2.5.1.
Best Regards,
Subhabrata.

 
John Machin

Dear group,
Thanx for your idea to use dictionary instead of a list. Your code is
more or less, OK, some problems are there, I'll debug them. Well, I
feel the insert problem is coming because of the Hindi thing.

It's nothing to do with the Hindi thing. Quite simply, you are
inserting into the list over which you are iterating; this is the
"a16" in the first and last lines in the following snippet from your
code. The result of doing such a thing (in general, mutating a
container that is being iterated over) is not defined and can cause
all sorts of problems. It can be avoided by iterating over a copy of
the container that you want to change. However I suggest that you
seriously look at what you are actually trying to achieve, and rewrite
it.

for word in a16:
    #MATCHING WITH GIVEN STRING
    a17=a2.find(word)
    if a17>-1:
        print "The word is found in the Source String"
        a18=a3.index(word)
        a19=a3[a18]
        print a19
        #INSERTING IN THE LIST OF TARGET STRING
        a20=a16.insert(a18,a19)
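
The runaway behaviour can be reproduced in a few lines. This is a sketch with made-up data, not the original program, and the loop is capped with a hypothetical max_steps counter so that it terminates. Inserting at index 0 pushes the current element one slot to the right, so the iterator meets the same word again on every pass.

```python
words = ['has', 'come', 'lincoln']

# Buggy pattern: inserting into the list that is being iterated over.
buggy = list(words)
steps = 0
max_steps = 10      # safety cap; without it this loop would never end
for word in buggy:
    steps += 1
    if steps > max_steps:
        break
    if word == 'lincoln':
        buggy.insert(0, word)
print(len(buggy))   # far more than the original 3 entries

# Safer pattern: iterate over a copy and mutate the original.
fixed = list(words)
for word in fixed[:]:
    if word == 'lincoln':
        fixed.insert(0, word)
print(fixed)
```

Iterating over the slice copy fixed[:] means the insert no longer shifts the sequence the loop is walking, so each word is visited exactly once.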

This code has several problems:
    if a8 in a5:
        a9=a5.index(a8)
        a10=a5[a9:]
        a11=re.search("\xe0.*?\n",a10)
        a12=a11.group()
        a13=a12[:-1]
        found.append(a13)
    elif a8 not in a5:
        a14=x
        not_found.append(a14)
    else:
        print "Error"
found.extend(not_found)

(1) If you ever execute that print statement, it means that the end of
the universe is nigh -- throw away the else part and replace "elif a8
not in a5" with "else".

(2) The statement "found.extend(not_found)" is emitting a very foul
aroma. Your "found" list ends up with the translated words followed by
the untranslated words -- this is not very useful and you then have to
write some weird code to try to untangle it; just build your desired
output as you step through the words to be translated.

(3) Your "dictionary" is implemented as a string of the whole
dictionary contents -- you are linearly searching a long string for
each input word. You should load your dictionary file into a Python
dictionary, and load it *once* at the start of your program, not once
per input sentence.
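
Points (1) to (3) could be combined roughly as follows. This is a sketch under assumptions: the real layout of Changedict3.txt is not shown in this thread, so load_dictionary assumes a hypothetical format of one "english<TAB>translation" pair per line, and the sample entries are placeholders.

```python
def load_dictionary(lines):
    """Build a dict from lines of the assumed form 'english<TAB>translation'."""
    table = {}
    for line in lines:
        line = line.rstrip('\n')
        if '\t' not in line:
            continue                    # skip blank or malformed lines
        english, translation = line.split('\t', 1)
        table[english.lower()] = translation
    return table


def translate(sentence, table):
    """Translate word by word, building the output in input order."""
    output = []
    for word in sentence.lower().split():
        if word in table:
            output.append(table[word])
        else:                           # plain else, no third branch needed
            output.append(word)
    return ' '.join(output)


# Placeholder lines standing in for the dictionary file; with a real file,
# pass the open file object instead. Load once, then reuse for every sentence.
table = load_dictionary(['has\tHAS\n', 'come\tCOME\n'])
print(translate('Lincoln has come', table))
```

Since untranslated words are appended in place, there is no separate not_found list to merge back in and no need for insert at all.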
And Python2.5 is supporting Hindi quite fluently.

Python supports any 8-bit encoding to the extent that the platform's
console can display the characters correctly. What is the '\xe0'? The
PC-ISCII ATR character?
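
One possible explanation, offered as an assumption rather than a fact about the poster's file: if Changedict3.txt is UTF-8 encoded, every Devanagari character (U+0900 to U+097F) is stored as three bytes whose first byte is 0xE0, so a search for '\xe0' would land on the start of the Hindi text. A quick check, written for Python 3:

```python
# Every code point in the Devanagari block U+0900..U+097F
# encodes to three UTF-8 bytes, the first of which is 0xE0.
first_bytes = set()
lengths = set()
for code_point in range(0x0900, 0x0980):
    encoded = chr(code_point).encode('utf-8')
    first_bytes.add(encoded[0])
    lengths.add(len(encoded))
print(sorted(hex(b) for b in first_bytes))
print(lengths)
```

If the file uses a different encoding, such as an ISCII variant, this reasoning does not apply.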

Cheers,
John
 
