Changing filenames from Greeklish => Greek (subprocess complain)

  • Thread starter Νικόλαος Κούρας

Mark Lawrence

Thankls Michael,

are these two behave the same in your opinion?

sys.stdout = os.fdopen(1, 'w', encoding='utf-8')

which is what i have now
opposed to this one

import ocdecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())

Which one should i keep and why?

import ocdecs?

Sums up perfectly the amount of effort you put in.

--
"Steve is going for the pink ball - and for those of you who are
watching in black and white, the pink is next to the green." Snooker
commentator 'Whispering' Ted Lowe.

Mark Lawrence
 
Νικόλαος Κούρας

On Monday, 3 June 2013 9:46:46 AM UTC+3, Steven D'Aprano wrote:
If I am right, the solution is to fix the file names to ensure that they
are all valid UTF-8 names. If you view the directory containing these
files in a file browser that supports UTF-8, do you see any file names
containing Mojibake?
Fix those file names, and hopefully the problem will go away.

You are right Steven, I just renamed the file 'Euxi tou Ihsou.mp3' => 'Eυχή του Ιησού.mp3' and here is how the filename appears in the directory listing via Chrome.

http://superhost.gr/data/apps/

It doesn't display the file with proper Greek characters but with *Mojibake* instead.

So, as you correctly said, when files.py needs to actually open that file, it cannot decode its stored byte stream from the hdd to the proper 'utf-8' charset.

So, how will files.py be able to open these files then?!?!
 

rusi

You are right Steven, I just renamed the file 'Euxi tou Ihsou.mp3' => 'Eõ÷Þ ôïõ Éçóïý.mp3' and…

Is that how you renamed your file?
In any case that's what I see!!

[Don't know whether to say: it's Greek to me or it's not Greek to me!!]
 

nagia.retsina

On Monday, 3 June 2013 3:54:30 PM UTC+3, rusi wrote:
Is that how you renamed your file?
In any case that's what I see!
[Don't know whether to say: it's Greek to me or it's not Greek to me!!]

Now that's weird again.
I renamed it using proper Greek letters, but the way it appears to you is also how it appears to me via Chrome.

Also, this is how it looks in a Linux command-line listing:

(e-mail address removed) [~/www/cgi-bin]# ls -l ../data/apps/
total 368548
drwxr-xr-x 2 nikos nikos 4096 Jun 3 12:07 ./
drwxr-xr-x 6 nikos nikos 4096 May 26 21:13 ../
-rwxr-xr-x 1 nikos nikos 13157283 Mar 17 12:57 100\ Mythoi\ tou\ Aiswpou.pdf*
-rwxr-xr-x 1 nikos nikos 29524686 Mar 11 18:17 Anekdotologio.exe*
-rw-r--r-- 1 nikos nikos 42413964 Jun 2 20:29 Battleship.exe
-rwxr-xr-x 1 nikos nikos 66896732 Mar 17 13:13 Kosmas\ o\ Aitwlos\ -\ Profiteies .pdf*
-rw-r--r-- 1 nikos nikos 51819750 Jun 2 20:04 Luxor\ Evolved.exe
-rw-r--r-- 1 nikos nikos 60571648 Jun 2 14:59 Monopoly.exe
-rwxr-xr-x 1 nikos nikos 1788164 Mar 14 11:31 Online\ Movie\ Player.zip*
-rw-r--r-- 1 nikos nikos 5277287 Jun 1 18:35 O\ Nomos\ tou\ Merfy\ v1-2-3..zip
-rwxr-xr-x 1 nikos nikos 16383001 Jun 22 2010 Orthodoxo\ Imerologio.exe*
-rw-r--r-- 1 nikos nikos 6084806 Jun 1 18:22 Pac-Man.exe
-rw-r--r-- 1 nikos nikos 25476584 Jun 2 19:50 Scrabble\ 2013.exe
-rw-r--r-- 1 nikos nikos 236032 Jun 2 19:31 Skepsou\ enan\ arithmo!.exe
-rwxr-xr-x 1 nikos nikos 49141166 Mar 17 12:48 To\ 1o\ mou\ vivlio\ gia\ to\ ska ki.pdf*
-rwxr-xr-x 1 nikos nikos 3298310 Mar 17 12:45 Vivlos\ gia\ Atheofovous.pdf*
-rw-r--r-- 1 nikos nikos 1764864 May 29 21:50 V-Radio\ v2.4.msi
-rw-r--r-- 1 nikos nikos 3511233 Jun 3 12:07 ΞΟ
ΟΞ�\ Ο
ΞÎΟ.mp3
(e-mail address removed) [~/www/cgi-bin]#

Why doesn't the name of the file appear in proper Greek letters, neither from the command line nor in Chrome?
It's no wonder files.py can't decode it as 'utf-8'.

How can I make the filenames appear properly, or at least decode them from a byte stream to utf-8 properly, so they can be opened via the Python script without error?
 

Steven D'Aprano

(Note: this post is sent using UTF-8. If anyone reading this sees
mojibake, please make sure your email or news client is set to use UTF-8.)



Is that how you renamed your file?
In any case that's what I see!!

rusi, whatever program you are using to read these posts is buggy.

Nicholas (please excuse me ASCII-fying his name, but given that we are
discussing encoding problems, it is probably for the best) sent his post
with a header line:

charset=ISO-8859-7

If your client honoured that charset line, you would see:

Eυχή του Ιησού.mp3

It looks like your client is ignoring the charset header, and
interpreting the bytes as Latin-1 when they are actually ISO-8859-7.

py> s = 'Eυχή του Ιησού.mp3'
py> print(s.encode('ISO-8859-7').decode('latin-1'))
Eõ÷Þ ôïõ Éçóïý.mp3

which matches what you see. If you can manually tell your client to use
ISO-8859-7, you should see it correctly.
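
To make the diagnosis above concrete, here is a minimal sketch that reproduces the garbling and then reverses it. It assumes the text really was ISO-8859-7 bytes decoded as Latin-1; `repair_mojibake` is a hypothetical helper name, not something from this thread.

```python
def repair_mojibake(garbled, wrong="latin-1", right="iso-8859-7"):
    """Undo a decode that was done with the wrong codec.

    Re-encoding with the wrong codec recovers the original bytes,
    which can then be decoded with the codec that was intended.
    """
    return garbled.encode(wrong).decode(right)


original = "Eυχή του Ιησού.mp3"

# What a client sees when it ignores the charset header and assumes Latin-1.
garbled = original.encode("iso-8859-7").decode("latin-1")
print(garbled)

# Reversing the mistake gives the Greek text back.
print(repair_mojibake(garbled))
```

This only works because Latin-1 maps every byte to a character, so no information is lost in the wrong decode; with other codec pairs the round trip may fail.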
 

Steven D'Aprano

On Monday, 3 June 2013 9:46:46 AM UTC+3, Steven D'Aprano wrote:



You are right Steven, I just renamed the file 'Euxi tou Ihsou.mp3' =>
'Eυχή του Ιησού.mp3' and here is how the filename appears in the directory
listing via Chrome.

http://superhost.gr/data/apps/

It doesn't display the file with proper Greek characters but with
*Mojibake* instead.


Not so -- it actually shows correctly, provided you use the right
encoding. Tell your browser to view the page as UTF-8, and the file name
is displayed correctly.

By default, my browser Iceweasel views the page as Latin-1, which
displays like this:

ΕÅÇή ÄοÃ… ΙηÃοÃÂ.mp3

so the first thing you need to fix is to find some way to tell Apache to
include a UTF-8 encoding line in its automatically generated pages. Then
at least it will display correctly for visitors.

I now tentatively believe that the file names are correct, using the UTF-8
encoding. But you can help confirm this:

* What operating system are you using? If Linux, what distro and version?

* What is the output of the locale command?
 

Steven D'Aprano

Here is the whole code of files.py in case someone wants to comment on
something about how to properly encode/decode the filenames, which seems
to be the problem.

http://pastebin.com/qXasy5iU


Second line in the file says:

import cgi, re, os, sys, socket, datetime, pymysql, locale


but there is no pymysql module available. Fix that problem, and then we
can look at the next problem.
 

rusi

(Note: this post is sent using UTF-8. If anyone reading this sees
mojibake, please make sure your email or news client is set to use UTF-8.)




rusi, whatever program you are using to read these posts is buggy.

When you go to the python mailing list archive and look at Nikos mail
http://mail.python.org/pipermail/python-list/2013-June/648301.html
I see
<META http-equiv="Content-Type" content="text/html; charset=us-ascii">

[Not claiming to understand all this unicode stuff...]
 

Steven D'Aprano

When you go to the python mailing list archive and look at Nikos mail
http://mail.python.org/pipermail/python-list/2013-June/648301.html I see
<META http-equiv="Content-Type" content="text/html; charset=us-ascii">

You're looking at the encoding of the HTML page which displays only the
body and a few selected headers, copied from Nikos' post. It has no
connection to the encoding of the post itself.

If you're using Thunderbird, or some other mail/news client, there is
usually an option to View Raw Post or View Entire Message or something
similar. Use that on the original post, not the web archive.

[Not claiming to understand all this unicode stuff...]

:)


Start here:

http://www.joelonsoftware.com/articles/Unicode.html

http://nedbatchelder.com/text/unipain.html
 

nagia.retsina

On Tuesday, 4 June 2013 1:46:53 AM UTC+3, Steven D'Aprano wrote:
Not so -- it actually shows correctly, provided you use the right
encoding. Tell your browser to view the page as UTF-8, and the file name
is displayed correctly.

I can't believe Chrome, which by default uses utf8, chose iso-8859-1 to present the filenames.
You were right Steven, when I explicitly told it to present pages in utf8 it then started to show the filenames correctly.
I now tentatively believe that the file names are correct, using the UTF-8
encoding. But you can help confirm this:
* What operating system are you using? If Linux, what distro and version?
* What is the output of the locale command?

First of all thank you very much for being so cooperative, i appreciate it.

Here is some of the system information you wanted to see.


(e-mail address removed) [~]# uname -a
Linux nikos.superhost.gr 2.6.32-042stab075.2 #1 SMP Tue May 14 20:38:14 MSK 2013 x86_64 x86_64 x86_64 GNU/Linux

(e-mail address removed) [~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
(e-mail address removed) [~]#

I'm using CentOS v6.4 because it is the only Linux OS that supports cPanel, which my clients need to administer their websites.

Here is also how the terminal presents my filenames.

(e-mail address removed) [~]# ls -l www/data/apps/
total 368548
drwxr-xr-x 2 nikos nikos 4096 Jun 3 12:07 ./
drwxr-xr-x 6 nikos nikos 4096 May 26 21:13 ../
-rwxr-xr-x 1 nikos nikos 13157283 Mar 17 12:57 100\ Mythoi\ tou\ Aiswpou.pdf*
-rwxr-xr-x 1 nikos nikos 29524686 Mar 11 18:17 Anekdotologio.exe*
-rw-r--r-- 1 nikos nikos 42413964 Jun 2 20:29 Battleship.exe
-rwxr-xr-x 1 nikos nikos 66896732 Mar 17 13:13 Kosmas\ o\ Aitwlos\ -\ Profiteies .pdf*
-rw-r--r-- 1 nikos nikos 51819750 Jun 2 20:04 Luxor\ Evolved.exe
-rw-r--r-- 1 nikos nikos 60571648 Jun 2 14:59 Monopoly.exe
-rwxr-xr-x 1 nikos nikos 1788164 Mar 14 11:31 Online\ Movie\ Player.zip*
-rw-r--r-- 1 nikos nikos 5277287 Jun 1 18:35 O\ Nomos\ tou\ Merfy\ v1-2-3..zip
-rwxr-xr-x 1 nikos nikos 16383001 Jun 22 2010 Orthodoxo\ Imerologio.exe*
-rw-r--r-- 1 nikos nikos 6084806 Jun 1 18:22 Pac-Man.exe
-rw-r--r-- 1 nikos nikos 25476584 Jun 2 19:50 Scrabble\ 2013.exe
-rw-r--r-- 1 nikos nikos 236032 Jun 2 19:31 Skepsou\ enan\ arithmo!.exe
-rwxr-xr-x 1 nikos nikos 49141166 Mar 17 12:48 To\ 1o\ mou\ vivlio\ gia\ to\ ska ki.pdf*
-rwxr-xr-x 1 nikos nikos 3298310 Mar 17 12:45 Vivlos\ gia\ Atheofovous.pdf*
-rw-r--r-- 1 nikos nikos 1764864 May 29 21:50 V-Radio\ v2.4.msi
-rw-r--r-- 1 nikos nikos 3511233 Jun 3 12:07 ΞΟ
ΟΞ�\ Ο
ΞÎΟ.mp3
(e-mail address removed) [~]#

It's weird, because as locale showed above, the terminal is set to 'utf-8', but the Greek filename cannot be viewed properly.
I must say though, that I renamed it on my Windows 8 system and then uploaded it via FileZilla to my remote webhost server. Maybe Windows 8 is causing this?

I'll try renaming it via the terminal too.
If you want to see something else, please ask me to show you, Steven.
 
Νικόλαος Κούρας

On Tuesday, 4 June 2013 1:37:37 AM UTC+3, Steven D'Aprano wrote:
It looks like your client is ignoring the charset header, and
interpreting the bytes as Latin-1 when they are actually ISO-8859-7.
py> s = 'Eυχή του Ιησού.mp3'
py> print(s.encode('ISO-8859-7').decode('latin-1'))
Eõ÷Þ ôïõ Éçóïý.mp3
which matches what you see. If you can manually tell your client to use
ISO-8859-7, you should see it correctly.

I think this is the case too, Steven, but it surprises me that Chrome ignores the charset header.

Actually, when I explicitly told Chrome to display everything as utf-8, it presented the filename properly.

py> print(s.encode('ISO-8859-7').decode('latin-1'))

Why are you encoding the 's' string to greek-iso?
Isn't it already in greek-iso, since it uses Greek letters?
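
One way to look at this question: in Python 3 a str is a sequence of Unicode code points and has no encoding of its own; an encoding only comes into play when the text is turned into bytes. A small sketch, with illustrative variable names that are not from the thread:

```python
# A str holds code points; the same str yields different bytes per codec.
s = "Eυχή"

as_greek_iso = s.encode("iso-8859-7")  # one byte per Greek letter
as_utf8 = s.encode("utf-8")            # two bytes per Greek letter

print(type(s).__name__, len(s))
print(as_greek_iso)  # b'E\xf5\xf7\xde'
print(as_utf8)       # b'E\xcf\x85\xcf\x87\xce\xae'

# Decoding each byte string with its own codec gives back the same text.
print(as_greek_iso.decode("iso-8859-7") == as_utf8.decode("utf-8") == s)
```

So calling `s.encode('ISO-8859-7')` is what produces the ISO-8859-7 bytes in the first place; the Greek letters in the source text say nothing about how they are stored.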
 

Chris Angelico

On Tuesday, 4 June 2013 1:46:53 AM UTC+3, Steven D'Aprano wrote:


I can't believe Chrome, which by default uses utf8, chose iso-8859-1 to present the filenames.

What do you mean, "by default uses UTF-8"? Chrome uses whatever it's
told. In this case, you have no encoding specified in the page, and
your HTTP headers include:

Content-Type:text/html;charset=ISO-8859-1

I wonder what effect that'll have... I wonder.

Quit blaming Chrome for what's not its fault. (There's enough that is,
but that's true of every browser.)

ChrisA
 

Nobody

I can't believe Chrome, which by default uses utf8, chose iso-8859-1 to
present the filenames.

Chrome didn't choose ISO-8859-1, the server did; the HTTP response says:

Content-Type: text/html;charset=ISO-8859-1
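
As a rough illustration of what a client does with that header, the sketch below pulls the charset out of a Content-Type value using the standard library's `email.message.Message`. The helper name `charset_from_content_type` is made up for this example, and real browsers apply further fallback rules that are not modelled here.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


# The header the server sends for the directory listing.
print(charset_from_content_type("text/html;charset=ISO-8859-1"))

# A header without a charset leaves the choice to the client.
print(charset_from_content_type("text/html"))
```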
 
Νικόλαος Κούρας

On Tuesday, 4 June 2013 10:39:08 AM UTC+3, Nobody wrote:
Chrome didn't choose ISO-8859-1, the server did; the HTTP response says:
Content-Type: text/html;charset=ISO-8859-1

Where do you see this? I receive this when trying from the terminal:

(e-mail address removed) [~/www/data/apps]# wget -S -O - http://www.superhost.gr

--2013-06-04 10:58:05-- http://www.superhost.gr/
Resolving www.superhost.gr... 82.211.30.133
Connecting to www.superhost.gr|82.211.30.133|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: ApacheBooster/1.6
Date: Tue, 04 Jun 2013 07:58:05 GMT
Content-Type: text/html
Connection: close
Vary: Accept-Encoding
<!--: spam
X-Cacheable: YES
X-Varnish: 2000176616 2000176615
Via: 1.1 varnish
age: 0
X-Cache: HIT
X-Cache-Hits: 1
Length: unspecified [text/html]
Saving to: ‘STDOUT’
 
Νικόλαος Κούρας

On Tuesday, 4 June 2013 10:35:31 AM UTC+3, Chris Angelico wrote:
What do you mean, "by default uses UTF-8"? Chrome uses whatever it's
told. In this case, you have no encoding specified in the page, and
your HTTP headers include:
Content-Type:text/html;charset=ISO-8859-1

From where do you see this Chris?
I have an encoding specified in every cgi script i use by stating this command:

print( '''Content-type: text/html; charset=utf-8\n''' )

(That is a browser directive to display the Python script's output using the 'utf-8' charset, which is why I wonder where you and Nobody see greek-iso.)


and also i'm using this:


sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())

(I'm not sure what exactly it does, though; if I remove it from my CGI scripts, no Python 3 script runs and they all die prematurely.)
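
For readers wondering what that line does: it replaces the text layer of stdout with one that always encodes to UTF-8, regardless of the locale the CGI process runs under. The sketch below imitates this with `io.BytesIO` objects standing in for the detached stdout buffer (an assumption made so the example can run anywhere), and compares it with an `io.TextIOWrapper`, which is roughly what the `os.fdopen(1, 'w', encoding='utf-8')` variant mentioned earlier in the thread creates.

```python
import codecs
import io

text = "Ευχή\n"

# Variant 1: codecs.getwriter returns a StreamWriter class for the codec;
# instantiating it around a binary stream encodes every write as UTF-8.
raw_a = io.BytesIO()  # stand-in for sys.stdout.detach()
writer = codecs.getwriter("utf-8")(raw_a)
writer.write(text)

# Variant 2: a TextIOWrapper with an explicit encoding.
# newline="\n" disables platform newline translation for a fair comparison.
raw_b = io.BytesIO()  # stand-in for file descriptor 1
wrapper = io.TextIOWrapper(raw_b, encoding="utf-8", newline="\n")
wrapper.write(text)
wrapper.flush()

print(raw_a.getvalue())
print(raw_b.getvalue() == raw_a.getvalue())
```

In this sketch both variants emit the same bytes; they differ mainly in buffering and newline handling, so either one should serve the purpose of forcing UTF-8 output.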
 

Steven D'Aprano

(e-mail address removed) [~]# locale
LANG=en_US.UTF-8
[...]

Okay, this is good. This means that your system is currently using UTF-8.

Here is also how the terminal presents my filenames. [...]
(e-mail address removed) [~]# ls -l www/data/apps/ total 368548
v2.4.msi -rw-r--r-- 1 nikos nikos 3511233 Jun 3 12:07 ΞΟ ΟΞ�\ Ο
ΞÎΟ.mp3

Weirder and weirder.

Please run these commands, and show what result they give:

alias ls

printf %q\\n *.mp3

ls -b *.mp3


I'll try renaming it via the terminal too. If you want to see something else,
please ask me to show you, Steven.


If all else fails, you could just rename the troublesome file and
hopefully the problem will go away:

mv *Ο.mp3 1.mp3
mv 1.mp3 'Eυχή του Ιησού.mp3'
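
If more than one file is affected, the same rename could be scripted. The following is only a sketch under stated assumptions: the bad names are ISO-8859-7 bytes, the filesystem stores names as raw bytes (as typical Linux filesystems do), and `reencode_filenames` is a hypothetical helper, not part of files.py. The demonstration runs in a temporary directory rather than on real data.

```python
import os
import tempfile


def reencode_filenames(directory, source_encoding="iso-8859-7"):
    """Rename files whose names are not valid UTF-8 to UTF-8 names."""
    raw_dir = os.fsencode(directory)
    renamed = []
    # Passing bytes to os.listdir returns the names as raw bytes.
    for raw_name in os.listdir(raw_dir):
        try:
            raw_name.decode("utf-8")
            continue  # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        new_raw = raw_name.decode(source_encoding).encode("utf-8")
        os.rename(os.path.join(raw_dir, raw_name),
                  os.path.join(raw_dir, new_raw))
        renamed.append(new_raw.decode("utf-8"))
    return renamed


# Demonstration on a throwaway directory with one legacy-encoded name.
with tempfile.TemporaryDirectory() as demo_dir:
    legacy = "Ευχή.mp3".encode("iso-8859-7")
    open(os.path.join(os.fsencode(demo_dir), legacy), "wb").close()
    renamed = reencode_filenames(demo_dir)
    remaining = os.listdir(os.fsencode(demo_dir))

print(renamed)
```

Names that happen to be valid UTF-8 by coincidence would be skipped, so it is worth checking the result with `ls -b` afterwards.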
 
Νικόλαος Κούρας

On Tuesday, 4 June 2013 11:47:01 AM UTC+3, Steven D'Aprano wrote:
Please run these commands, and show what result they give:

(e-mail address removed) [~/www/data/apps]# ls -l *.mp3
-rw-r--r-- 1 nikos nikos 3511233 Jun 3 12:07 \305\365\367\336\ \364\357\365\ \311\347\363\357\375\375.mp3
-rw-r--r-- 1 nikos nikos 3511233 Jun 4 11:54 ΞΟ
ΟΞ�\ Ο
ΞÎΟ.mp3

(e-mail address removed) [~/www/data/apps]# alias ls
alias ls='/bin/ls $LS_OPTIONS'

(e-mail address removed) [~/www/data/apps]# printf %q\n\n *.mp3
$'\305\365\367\336 \364\357\365 \311\347\363\357\375\375.mp3'nn$'\316\225\317\205\317\207\316\256 \317\204\316\277\317\205 \316\231\316\267\317\203\316\277\317\215.mp3'(e-mail address removed) [~/www/data/apps]# ls -b *.mp3
\305\365\367\336\ \364\357\365\ \311\347\363\357\375\375.mp3 ΞΟ
ΟΞ�\ Ο
ΞÎΟ.mp3

Please explain what this command does.

I deliberately placed the same .mp3 file twice.

The first was renamed to Greek characters on my Win8 machine and then uploaded via FileZilla to the webhost server.
The latter was renamed from within the remote Linux machine.

It seems that which system was used to rename the file matters.
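
The octal escapes printed by `printf %q` above can be pasted into Python byte strings to see which encoding each name uses. This is a minimal sketch; `decode_name` is a hypothetical helper, and ISO-8859-7 as the fallback is an assumption based on the discussion earlier in the thread.

```python
def decode_name(raw, fallback="iso-8859-7"):
    """Decode a raw file name as UTF-8, falling back to a legacy codec.

    Returns the decoded text and the codec that succeeded.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


# Bytes copied from the printf output: first the file uploaded from Windows,
# then the file renamed on the Linux server.
uploaded = b"\305\365\367\336 \364\357\365 \311\347\363\357\375\375.mp3"
renamed_on_server = (
    b"\316\225\317\205\317\207\316\256 \317\204\316\277\317\205 "
    b"\316\231\316\267\317\203\316\277\317\215.mp3"
)

for raw in (uploaded, renamed_on_server):
    text, codec = decode_name(raw)
    print(codec, len(raw), text)
```

The uploaded name is not valid UTF-8 and uses one byte per Greek letter, while the name created on the server uses two bytes per letter, which is consistent with the first being a legacy single-byte encoding and the second being UTF-8.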

If all else fails, you could just rename the troublesome file and
hopefully the problem will go away:
mv *Ο.mp3 1.mp3
mv 1.mp3 Eυχή του Ιησού.mp3

Yes, but why are you doing it in 2 steps and not as:

mv *Ο.mp3 'Eυχή του Ιησού.mp3'
 

Nobody

On Tuesday, 4 June 2013 10:39:08 AM UTC+3, Nobody wrote:


From where do you see this

$ wget -S -O - http://superhost.gr/data/apps/
--2013-06-04 14:00:10-- http://superhost.gr/data/apps/
Resolving superhost.gr... 82.211.30.133
Connecting to superhost.gr|82.211.30.133|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: ApacheBooster/1.6
Date: Tue, 04 Jun 2013 13:00:19 GMT
Content-Type: text/html;charset=ISO-8859-1
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
X-Cacheable: YES
X-Varnish: 2000177813
Via: 1.1 varnish
age: 0
X-Cache: MISS
 
