Inconsistent results from (dos)glob

  • Thread starter Theo van den Heuvel

Ilya Zakharevich

That's impossible. All the syscalls dealing with filenames treat "\0" as
a terminator. There is no way to create or access a file with a null in
its name[1].
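
A rough illustration of the claim above, sketched in Python rather than with raw syscalls. The helper name try_create is hypothetical. Python refuses the path before any syscall is made, because the underlying C interfaces take NUL-terminated strings and cannot represent such a name at all.

```python
import os

def try_create(path):
    """Try to create `path`; return the name of the exception raised, or None."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    except (ValueError, OSError) as exc:
        return type(exc).__name__
    # Creation succeeded: clean up so the sketch leaves nothing behind.
    os.close(fd)
    os.remove(path)
    return None

# The embedded "\0" is rejected before the name reaches open(2).
print(try_create("foo\0bar"))
```

This says nothing about what can be written to the raw block device, which is the case argued further down.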

It is not a problem to create or access a file with a null in its name
on Unix. (Remember read()/write() syscalls?)

read and write syscalls do not create or access files. Unless you are
talking about opening the block device and reading from/writing to that.

Either that, or doing read()/write() with a directory.
But in that case you aren't "accessing files with a null in its name",
you are just accessing one huge file the size of your (logical) disk.

Yes I do, just as your "filesystem driver" eventually does when you
ask for

mv foo bar

Hope this helps,
Ilya
 

Ilya Zakharevich

Yes. Jan Dubois is trying quite hard to get perl there without breaking
anything. The problem is that naive programs that read a filename from
the console (or the command-line) will then break, because the name will
be in the current 'ANSI' encoding

No it won't. AFAIK, arguments to a program are available in Unicode.

Yours,
Ilya
 

Ilya Zakharevich

You can get Unicode arguments by using wmain, or,
presumably, other more Win32ish entry points, but that doesn't help:
???

existing Perl programs that expect @ARGV to contain bytes in the current
code page will be broken.

Perl strings contain characters, not bytes. Hence

a) such programs are already broken; there is no need to support
such programs in the default configuration.
$ENV{PERL_ARGV_IN_CP} and -Margv_in_cp should be enough to handle this.

b) More often than not, Perl programs which did not work before
would "magically start working". This is in itself an incentive...
As I said, this is a known problem, the
solution is known, and people are working to get perl there without
breaking too much along the way.

Good to hear this,
Ilya
 

Ilya Zakharevich

Not true: ....
you could argue that $bytes contains
'characters that can be interpreted as representing the equivalent
bytes'
Sure.

the fact remains that it is common in perl for
a string to contain bytes representing characters in some encoding other
than the internal one. Any data you read from a :raw filehandle, for
instance.
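
A minimal sketch of the distinction being argued here, in Python for illustration only. io.BytesIO stands in for a filehandle read without any decoding layer, and the two-byte sample is an assumption. Decoding as Latin-1 gives one character per byte, which is roughly the "characters that can be interpreted as representing the equivalent bytes" view; decoding as UTF-8 gives the character the bytes were meant to encode.

```python
import io

# Stand-in for a raw filehandle: two bytes, the UTF-8 encoding of U+00E9.
raw = io.BytesIO(b"\xc3\xa9")
data = raw.read()

# One character per byte; each code point equals the byte value.
as_chars = data.decode("latin-1")

# The single character those bytes represent in UTF-8.
decoded = data.decode("utf-8")

print(len(data), len(as_chars), len(decoded))
print([hex(ord(c)) for c in as_chars])
```

The same bytes give two different strings depending on which interpretation the program asks for, so the program has to know which one it is holding.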

And, in these situations, one should be ready to handle them "specially".
This is why I wrote the following:

Yours,
Ilya
 

Ilya Zakharevich

I disagree. It's not usual for perl to 'decode' data that comes in from
the outside world until you ask it to. (Apart from anything else, it's
not necessarily a lossless operation.)

Sure. So why do you insist on a LOSSY conversion to the current
codepage as the default semantics?
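
To make "lossy" concrete, here is a small sketch in Python. The function name roundtrip, the choice of cp1252 as the "current codepage", and the sample names are all assumptions for illustration. The actual Windows conversion may substitute characters differently, but the information loss is of the same kind.

```python
def roundtrip(arg, codepage="cp1252"):
    """Encode `arg` into a codepage and decode it back, replacing
    characters the codepage cannot represent."""
    encoded = arg.encode(codepage, errors="replace")
    return encoded.decode(codepage)

# A name that fits in cp1252 ("r\u00e9sum\u00e9.txt") survives unchanged.
latin = "r\u00e9sum\u00e9.txt"
print(roundtrip(latin) == latin)

# A Cyrillic name does not: every unrepresentable character becomes "?".
cyrillic = "\u0444\u0430\u0439\u043b.txt"
print(roundtrip(cyrillic))
```

Once the argument has been through the codepage, the original name cannot be recovered from what the program receives.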
If you want your @ARGV decoded from the current codepage,

It is the opposite. What do you think is @ARGV when you type

perl -wde0 ARGS

where ARGS is not representable in the current codepage?
This is exactly the same as the situation on Unix:

I'm afraid I consider this BS. Unix has no defined semantics for the
encoding of program arguments (OS X is a possible exception;
definitely an exception wrt filenames). AFAIK, Windows does.

Hope this helps,
Ilya
 
