Where to locate existing standard encodings in python

News123

Hi,

I was googling quite some time before finding the answer to my question:
'what are the names for the encodings supported by python?'

I found the answer at http://python.active-venture.com/lib/node127.html


Now my question:

Can I find the same info in the standard Python docs, or query Python with
a certain command to print out all existing encodings?

thanks in advance for your answer and bye


N
 
John Machin

Hi,

I was googling quite some time before finding the answer to my question:
'what are the names for the encodings supported by python?'

I found the answer at http://python.active-venture.com/lib/node127.html

Now my question:

Can I find the same info in the standard Python docs, or query Python with
a certain command to print out all existing encodings?

codecs module
http://docs.python.org/library/codecs.html#id3
 
Philip Semanchuk

Hi,

I was googling quite some time before finding the answer to my
question:
'what are the names for the encodings supported by python?'

I found the answer at http://python.active-venture.com/lib/node127.html


Now my question:

Can I find the same info in the standard Python docs, or query Python
with a certain command to print out all existing encodings?


Look under the heading "Standard Encodings":
http://docs.python.org/library/codecs.html

Note that both the page you found (which appears to be a copy of the
Python documentation) and the reference I provide say, "Neither the
list of aliases nor the list of languages is meant to be exhaustive".

I guess one reason for this is that different Python implementations
could choose to offer codecs for additional encodings.
 
News123

Hi Philip,

Your answer touches exactly one point, which I was slightly afraid of:
- the list is not exhaustive
- Python versions might have implemented different codecs.

This is why I wondered whether there's any way of querying python for a
list of codecs it supports.

thanks again for your and the other answers


bye


N
 
Philip Semanchuk

Hi Philip,

Your answer touches exactly one point, which I was slightly afraid of:
- the list is not exhaustive
- Python versions might have implemented different codecs.

This is why I wondered whether there's any way of querying python
for a
list of codecs it supports.

Try this:

Python 2.5.1 (r251:54863, Nov 17 2007, 21:19:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import encodings.aliases
>>> encodings.aliases.aliases

"aliases" in the encodings.aliases module is a dict mapping alias
names (the dict keys) to encodings (the dict values). Thus, this will
give you the list of supported encodings:

>>> set(encodings.aliases.aliases.values())

The encodings module isn't in the documentation (?!?); I found it when
looking through the Python source code. For that reason I can't say
more about how it works. You may want to experiment to see if
encodings added via codecs.register() show up in the
encodings.aliases.aliases dict.
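(For what it's worth, that experiment is easy to run; the codec name "myenc" below is made up purely for illustration. On the Pythons I have tried, codecs registered through a search function do *not* show up in the aliases dict:)

```python
import codecs
import encodings.aliases

# Serve a hypothetical codec name by delegating to the built-in UTF-8 codec.
def search(name):
    if name == "myenc":
        return codecs.lookup("utf-8")
    return None

codecs.register(search)

# The codec is now usable...
print(b"hello".decode("myenc"))                            # hello

# ...but it never appears in the static alias table:
print("myenc" in encodings.aliases.aliases)                # False
print("myenc" in set(encodings.aliases.aliases.values()))  # False
```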


Have fun
Philip
 
News123

Hi Philip,

Thanks for your answer:
The fact that a module 'encodings' exists was new to me.


encodings.aliases.aliases has, however, one problem:
it helps to locate all encoding aliases, but it won't find entries for
which no aliases exist.

So I can find koi8_r and its aliases:

>>> [[k, v] for k, v in encodings.aliases.aliases.iteritems()
...     if v.find('koi') > -1]
[['cskoi8r', 'koi8_r']]

However, I wouldn't find the Greek code page 'cp737', as it exists
without an alias:

>>> [[k, v] for k, v in encodings.aliases.aliases.iteritems()
...     if v.find('737') > -1]
[]
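(Side note, easy to verify on a current Python 3: the alias table says nothing about cp737, yet the codec itself loads fine via codecs.lookup, so the alias dict really does undercount:)

```python
import codecs
import encodings.aliases

# No alias entry mentions cp737...
print(any('737' in v for v in encodings.aliases.aliases.values()))  # False

# ...but the codec is available all the same:
print(codecs.lookup('cp737').name)  # cp737
```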


What gives me a list of quite some encodings on my host is the shell command

ls /usr/lib/python2.5/encodings | sed -n 's/\.py$//p' | sort

(some false hits, but this is fine for my purposes)

I don't know if really all encodings are represented by a .py file, and
whether all encodings have to be in this directory, but it's a start.

Using shell commands is not that pythonic.

I could try to rewrite this in Python by
1.) determining from which directory encodings was imported, and then
2.) using the glob module to list all .py files located there.


thanks again and bye


N

 
Philip Semanchuk

Hi Philip,

Thanks for your answer:
The fact that a module 'encodings' exists was new to me.

We both learned something new today. =)

encodings.aliases.aliases has however one problem.
It helps to locate all encoding aliases, but it won't find entries for
which no aliases exist:

Ooops, I hadn't thought about that.

What gives me a list of quite some encodings on my host is the shell
command
ls /usr/lib/python2.5/encodings | sed -n 's/\.py$//p' | sort
(some false hits, but this is fine for my purposes)

I don't know if really all encodings are represented by a .py file, and
whether all encodings have to be in this directory, but it's a start.


Using shell commands is not that pythonic:

I could try to rewrite this in Python by
1.) determining from which directory encodings was imported, and then
2.) using the glob module to list all .py files located there.

Yes, I'd thought about this but I agree with you that it seems
unpythonic and fragile. Unfortunately I can't think of anything better
at this point.

Good luck
Philip

Philip said:
Hi Philip,

Your answer touches exactly one point, which I was slightly afraid of:
- the list is not exhaustive
- Python versions might have implemented different codecs.

This is why I wondered whether there's any way of querying python
for a
list of codecs it supports.

Try this:

Python 2.5.1 (r251:54863, Nov 17 2007, 21:19:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import encodings.aliases
>>> encodings.aliases.aliases

"aliases" in the encodings.aliases module is a dict mapping alias names
(the dict keys) to encodings (the dict values). Thus, this will give you
the list of supported encodings:

>>> set(encodings.aliases.aliases.values())


The encodings module isn't in the documentation (?!?); I found it
when
looking through the Python source code. For that reason I can't say
more
about how it works. You may want to experiment to see if encodings
added
via codecs.register() show up in the encodings.aliases.aliases dict.


Have fun
Philip


Philip Semanchuk wrote:

On Nov 9, 2008, at 7:00 PM, News123 wrote:
...snip...
 
John Machin

Hi,

I was googling quite some time before finding the answer to my question:
'what are the names for the encodings supported by python?'

I found the answer at http://python.active-venture.com/lib/node127.html

Now my question:

Can I find the same info in the standard Python docs, or query Python with
a certain command to print out all existing encodings?

thanks in advance for your answer and bye

N

You haven't explained why you think that you *need* a list of all
encodings that exist at a point in time. What are you going to do with
the list? Surely not use it to validate user input, one would hope.
 
rurpy

....snip...

If it's of any help, in a post on 2007-07-22 by Peter Otten,
(though I can't get a url for it at the moment) he took the
same approach. From a saved copy of that post:

import encodings
import os
import glob

def encodings_from_modulenames():
    ef = os.path.dirname(encodings.__file__)
    for fn in glob.glob(os.path.join(ef, "*.py")):
        fn = os.path.basename(fn)
        yield os.path.splitext(fn)[0]
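For what it's worth, here is the same idea as a self-contained snippet, with the obvious false hits (__init__, aliases) filtered out; the exact name list will of course vary by installation:

```python
import encodings
import glob
import os

def encodings_from_modulenames():
    # Every stdlib codec ships as a module in the encodings package directory.
    ef = os.path.dirname(encodings.__file__)
    for fn in glob.glob(os.path.join(ef, "*.py")):
        fn = os.path.basename(fn)
        yield os.path.splitext(fn)[0]

# Drop the package plumbing that is not itself a codec.
names = sorted(set(encodings_from_modulenames()) - {"__init__", "aliases"})
print(len(names))
```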
 
Tim Chase

You haven't explained why you think that you *need* a list of all
encodings that exist at a point in time. What are you going to do with
the list?

Just because I ran into this recently, the Dilbert.com site
returns a bogus Content-type header with

Content-Type: text/html; charset=utf-8lias

For Python to parse this, I had to use Python's list of known
encodings in order to determine whether I could even parse the
site (for passing it to a string's .encode() method). (Aside:
stupid dilbert.com site developers...what sorta rubbish is
"utf-8lias"?! It's not like it's something that would appear
accidentally. And there were bogus characters in the document to
boot)

-tkc
 
John Machin

Just because I ran into this recently, the Dilbert.com site returns a
bogus Content-type header with

Content-Type: text/html; charset=utf-8lias

For Python to parse this, I had to use Python's list of known encodings
in order to determine whether I could even parse the site (for passing
it to a string's .encode() method).

You haven't said why you think you need a list of known encodings!

I would have thought that just trying it on some dummy data will let you
determine very quickly whether the alleged encoding is supported by the
Python version etc that you are using.

E.g.

| >>> alleged_encoding = "utf-8lias"
| >>> "any old ascii".decode(alleged_encoding)
| Traceback (most recent call last):
| File "<stdin>", line 1, in <module>
| LookupError: unknown encoding: utf-8lias
| >>>
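The same probe in program form (Python 3 spelling, where decoding starts from bytes; the helper name is just one I made up here):

```python
import codecs

def is_known_encoding(name):
    """Return True if this Python has a codec registered under `name`."""
    try:
        codecs.lookup(name)   # raises LookupError for unknown encodings
        return True
    except LookupError:
        return False

print(is_known_encoding("utf-8"))      # True
print(is_known_encoding("utf-8lias"))  # False
```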
 
Tim Chase

Content-Type: text/html; charset=utf-8lias
You haven't said why you think you need a list of known encodings!

I would have thought that just trying it on some dummy data will let you
determine very quickly whether the alleged encoding is supported by the
Python version etc that you are using.

E.g.

| >>> alleged_encoding = "utf-8lias"
| >>> "any old ascii".decode(alleged_encoding)
| Traceback (most recent call last):
| File "<stdin>", line 1, in <module>
| LookupError: unknown encoding: utf-8lias

I then try to remap the bogus encoding to one it seems most like
(in this case, utf-8) and retry. Having a list of encodings
allows me to either eyeball or define a heuristic to say "this is
the closest match...try this one instead". That mapping can then
be used to update a mapping file so I don't have to think about
it the next time I encounter the same bogus encoding.
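(A rough version of such a heuristic can be sketched with difflib against the alias table; best_guess and the 0.6 cutoff below are illustrative choices of mine, not anything from the posts above:)

```python
import difflib
import encodings.aliases

def best_guess(bogus_name):
    """Guess the known encoding name closest to a bogus charset label."""
    aliases = encodings.aliases.aliases
    known = sorted(set(aliases) | set(aliases.values()))
    # Normalize roughly the way codec lookup does: lowercase, '-' -> '_'.
    wanted = bogus_name.lower().replace("-", "_")
    matches = difflib.get_close_matches(wanted, known, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(best_guess("utf-8lias"))
```

For "utf-8lias" this would pick a utf-8-family name, which could then be written into the remapping file.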

-tkc
 
