How to read a space-separated file in Python?

ganesh gajre

Hi all,

I want to read a mapping file that is used to map characters from a TTF font
to Unicode.

The map file contains data in the following form:

0 ०
1 १
2 २
3 ३
4 ४
5 ५
6 ६
7 ७
8 ८
9 ९

Like this. Please use any Unicode editor to view the text if it is not
shown properly.

Now I want to read both characters separately, like this:

str[0]=0 and str2[0]=०

How can I do this?

Please give me a solution.

Regards,
Ginovation
 
Steven D'Aprano

ganesh said:
[...]
Please give me a solution.

Well, because you said please...

I assume the encoding of the second column is utf-8. You need something
like this:


# Untested.
column0 = []
column1 = []
for line in open('somefile', 'r'):
    a, b = line.split()
    column0.append(a)
    column1.append(b.decode('utf-8'))
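
On Python 3, text-mode strings are already Unicode and have no decode method, so the same idea might be sketched as below. The helper name read_columns and the temporary sample file are illustrative assumptions, not part of the original post.

```python
import os
import tempfile


def read_columns(path, encoding="utf-8"):
    """Return two lists: the first and second whitespace-separated fields."""
    column0 = []
    column1 = []
    # Opening in text mode with an explicit encoding decodes each line for us.
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            a, b = line.split()
            column0.append(a)
            column1.append(b)
    return column0, column1


# Build a small sample file (hypothetical data mirroring the question).
sample = "0 \u0966\n1 \u0967\n2 \u0968\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write(sample)

column0, column1 = read_columns(tmp.name)
os.remove(tmp.name)
print(column0, column1)
```

After this runs, column0[0] holds the ASCII digit and column1[0] holds the matching Devanagari digit, which is the pairing the question asks for.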
 
Peter Otten

Read the file:
>>> import codecs
>>> pairs = [line.split() for line in codecs.open("ganesh.txt", encoding="utf-8")]
>>> pairs[0]
[u'0', u'\u0966']

Create the conversion dictionary:

Do the translation:
०११०९८७६

You may have to use int(s) instead of ord(s) in your actual conversion code:
०१९
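
A minimal Python 3 sketch of the dictionary and translation steps, assuming pairs holds the two-item lists read from the file. The names trans and trans_by_value and the sample input strings are illustrative assumptions; the pairs are hard-coded here only to keep the sketch self-contained.

```python
# pairs as they would come back from reading the mapping file
# (hard-coded here so the sketch is self-contained).
pairs = [[str(d), chr(0x0966 + d)] for d in range(10)]

# Map the code point of each ASCII digit to its Devanagari digit.
trans = {ord(s): t for s, t in pairs}
translated = "01109876".translate(trans)
print(translated)

# Variant: if the source text stores raw values 0..9 rather than
# the characters '0'..'9', key the dictionary on int(s) instead.
trans_by_value = {int(s): t for s, t in pairs}
translated_raw = "\x00\x01\x09".translate(trans_by_value)
print(translated_raw)
```

str.translate looks characters up by code point, which is why the keys are integers in both variants.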

Peter
 
Joe Strout

a, b = line.split()

Note that in a case like this, you may want to consider using
partition instead of split:

a, sep, b = line.partition(' ')

This way, if there happens to be more than one space (for example,
because the Unicode character you're mapping to happens to be a
space), it'll still work. It also better encodes the intention, which
is to split only on the first space in the line, rather than on every
space.

(It so happens I ran into exactly this issue yesterday, though my
delimiter was a colon.)

Cheers,
- Joe
 
Steve Holden

Joe:

In the special case of the None first argument (the default for the
str.split() method) runs of whitespace *are* treated as single
delimiters. So line.split() is not the same as line.split(' ').
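
A short sketch of that difference, using a made-up line with three spaces between the fields (the variable names are illustrative):

```python
line = "3   x\n"  # three spaces between the fields, plus a newline

# No argument: runs of whitespace collapse and the ends are stripped.
default_fields = line.split()

# Explicit ' ': every single space is a delimiter, so empty strings
# appear between consecutive spaces and the newline stays attached.
space_fields = line.split(" ")

print(default_fields)
print(space_fields)
```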

regards
Steve
 
Joe Strout

In the special case of the None first argument (the default for the
str.split() method) runs of whitespace *are* treated as single
delimiters. So line.split() is not the same as line.split(' ').

Right -- so using split() gives you the wrong answer for two different
reasons. Try these:

ValueError: need more than 1 value to unpack

>>> line = "3 x and here is some extra stuff"
>>> a, b = line.split()
ValueError: too many values to unpack

Partition handles these cases correctly (at least, within the OP's
specification that the value of "b" should be whatever comes after the
first space).
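
The two failure modes can be sketched as follows. The input lines are hypothetical (one where the mapped character is itself a space, one with extra words), and unpack_with_split is an illustrative helper. The exact wording of the error message differs between Python versions, so the sketch only reports it.

```python
def unpack_with_split(line):
    """Try a, b = line.split(); return the pair or the error message."""
    try:
        a, b = line.split()
        return a, b
    except ValueError as exc:
        return "ValueError: %s" % exc


# Hypothetical inputs: the mapped character is itself a space,
# and a line with extra words after the first field.
space_line = "3  "
long_line = "3 x and here is some extra stuff"

print(unpack_with_split(space_line))
print(unpack_with_split(long_line))

# partition splits once, at the first space, and always returns 3 items.
space_parts = space_line.partition(" ")
long_parts = long_line.partition(" ")
print(space_parts)
print(long_parts)
```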

Cheers,
- Joe
 
Gabriel Genellina

split takes an additional argument too:

py> line = "3 x and here is some extra stuff"
py> a, b = line.split(None, 1)
py> a
'3'
py> b
'x and here is some extra stuff'

But it still fails if the line contains no spaces. partition is more
robust in those cases.
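
A small sketch of the no-space case, with a made-up input line and illustrative variable names:

```python
line = "nospaces"

# split with maxsplit=1 yields a single field, so unpacking two names fails.
try:
    a, b = line.split(None, 1)
    split_ok = True
except ValueError:
    split_ok = False

# partition never raises here: a missing separator gives empty strings.
head, sep, tail = line.partition(" ")
print(split_ok, (head, sep, tail))
```

An empty sep is a convenient signal that the line did not contain the delimiter at all.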
 
Steve Holden

Joe Strout wrote:
[...]
Partition handles these cases correctly (at least, within the OP's
specification that the value of "b" should be whatever comes after the
first space).

I believe if you read the OP's post again you will see that he specified
two non-space items per line.

You really *love* being right, don't you? ;-) You say partition "...
better encodes the intention, which is to split only on the first space
in the line, rather than on every space". Your mind-reading abilities
are clearly superior to mine.

Anyway, sorry to have told you something you already knew. It's true
that partition has its place, and is too often overlooked. Particularly
by me.

regards
Steve
 
