unicodedata name for \u000a


Ken Beesley

Newbie question: on unicodedata.name


If I do

import unicodedata
unicodedata.name(u"a")
or
unicodedata.name(u"\u0061")

I get
'LATIN SMALL LETTER A'

as expected; but when I follow that with

unicodedata.name(u"\u000a")

I get

Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: no such name

There is, of course, a Unicode name for \u000a,
which is 'LINE FEED' or perhaps 'LINE FEED (A)'.

Is there a gap in unicodedata? or in my understanding?

Thanks,

Ken
 

Christos TZOTZIOY Georgiou

[snip]
> unicodedata.name(u"\u000a")
>
> I get
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> ValueError: no such name
>
> There is, of course, a Unicode name for \u000a,
> which is 'LINE FEED' or perhaps 'LINE FEED (A)'.
>
> Is there a gap in unicodedata? or in my understanding?

It seems that none of the control characters (u"\u0000" to u"\u001f",
and also u"\u007f" to u"\u009f") have names in unicodedata. I don't know
whether this is an omission (i.e. a bug) or intentional.
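A quick way to see which code points lack a name is to scan a range and collect those for which unicodedata.name() raises ValueError. This is a sketch assuming Python 3 (str and chr() rather than u"" literals); it only scans the first 256 code points, and the helper name nameless_code_points is made up for illustration.

```python
import unicodedata

def nameless_code_points(limit=0x100):
    """Return the code points below `limit` that have no name in unicodedata."""
    missing = []
    for cp in range(limit):
        try:
            unicodedata.name(chr(cp))
        except ValueError:
            missing.append(cp)
    return missing

missing = nameless_code_points()
# Check that every nameless code point found is a control character (category "Cc").
print(all(unicodedata.category(chr(cp)) == "Cc" for cp in missing))  # True
print(hex(missing[0]), hex(missing[-1]), len(missing))                # 0x0 0x9f 65
```

On a current CPython 3 this reports U+0000..U+001F plus U+007F..U+009F, so DEL and the C1 controls are nameless as well, not only the C0 range.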
 

Tor Iver Wilhelmsen

Peter Kleiweg said:

Quoting that document:

Alias names are those for ISO/IEC 6429:1992.
Commonly used alternative aliases are also shown.

000A LF <control>
= LINE FEED (LF)

So the authors of unicodedata.name() could have picked either
'<control>', the ASCII name 'LF' or the alternative 'LINE FEED (LF)'.
Not picking any of them seems strange, and as the OP pointed out,
leads to an error even though the "C0 Controls" part of that page *is*
part of Unicode.
 

Martin v. Löwis

Tor said:
> 000A LF <control>
> = LINE FEED (LF)
>
> So the authors of unicodedata.name() could have picked either
> '<control>', the ASCII name 'LF' or the alternative 'LINE FEED (LF)'.

No. <control> is not a character name. The unicodedata.name function
returns the official character name, so it MUST NOT return an alias
(which rules out your second alternative).

> Not picking any of them seems strange, and as the OP pointed out,
> leads to an error even though the "C0 Controls" part of that page *is*
> part of Unicode.

Yes. However, this strangeness originates from the Unicode
specification. Control characters simply do not have a name.
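Later versions of the Unicode Character Database added formal name aliases for the control characters (in the NameAliases.txt file), and the Python docs note that since Python 3.3 unicodedata.lookup() and "\N{...}" escapes accept name aliases. unicodedata.name() still returns only the Name property, so it keeps raising for U+000A. A small sketch, assuming Python 3.3 or later:

```python
import unicodedata

# Since Python 3.3, lookup() and "\N{...}" also accept the formal name
# aliases from the Unicode NameAliases.txt file, e.g. LINE FEED for U+000A.
print(repr(unicodedata.lookup("LINE FEED")))  # '\n'
print("\N{LINE FEED}" == "\n")                # True

# name() still reports only the Name property, which U+000A lacks.
try:
    unicodedata.name("\n")
except ValueError as exc:
    print("name() raised ValueError:", exc)
```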

If you want to know whether a code point is an unassigned character,
check whether unicodedata.category() returns "Cn" for it.
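To tell a nameless control character apart from an unassigned code point, one can branch on the general category: controls report "Cc", while unassigned code points (and noncharacters such as U+FFFF) report "Cn". This is a Python 3 sketch; describe() is a hypothetical helper, not part of unicodedata. It falls back to name()'s default argument because private-use and surrogate code points have no name either.

```python
import unicodedata

def describe(ch):
    """Label a single character (hypothetical helper, not part of unicodedata)."""
    category = unicodedata.category(ch)
    if category == "Cn":
        return "unassigned"
    if category == "Cc":
        return "control character (no name)"
    # Private-use (Co) and surrogate (Cs) code points have no name either.
    return unicodedata.name(ch, "no name (category %s)" % category)

for ch in ["a", "\n", "\uffff", "\ue000"]:
    print(repr(ch), "->", describe(ch))
```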

Regards,
Martin
 

Tor Iver Wilhelmsen

Martin v. Löwis said:
> No. <control> is not a character name. The unicodedata.name function
> returns the official character name, so it MUST NOT return an alias
> (which rules out your second alternative).

Then why not return None or the empty string instead of raising an
exception?
 

Martin v. Löwis

Tor said:
> Then why not return None or the empty string instead of raising an
> exception?

Why does a dictionary lookup raise a KeyError instead of returning
None or an empty string? It's easy enough to add a function that
does what you want:

import unicodedata

def name(c):
    """Return the Unicode name of c, or None if it has no name."""
    try:
        return unicodedata.name(c)
    except ValueError:
        return None

Python reports failures through exceptions, not through special
return values. It might have been an option initially to return
None. Now, it cannot be changed for backwards compatibility.
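As a side note, unicodedata.name() already accepts an optional second argument (documented as name(chr[, default])) that is returned instead of raising ValueError, which gives the None-or-empty-string behaviour without a wrapper. A minimal sketch, assuming Python 3:

```python
import unicodedata

# The optional second argument is returned when a character has no name,
# instead of raising ValueError.
print(unicodedata.name("a", None))       # LATIN SMALL LETTER A
print(unicodedata.name("\n", None))      # None
print(repr(unicodedata.name("\n", "")))  # ''
```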

Regards,
Martin
 