PB with euro sign and checkbox in multipart/form-data

Yohan N. Leder

Hi,

Hoping it will match the alt.html group, because already tried in
comp.lang.perl.misc but it seems to be more related to browser and
multipart/form-data posting.

Well, what do you think about the pb explained in this test script
called form2dump.pl :

#!/usr/bin/perl -w
# Script written to solve the bug explained below :
# PB : € sign in any form field corrupt beginning of multipart/form-data
#      in STDIN (1st lines with boundary & 1st field declar truncated)
# CAUSE : checkbox without any value (uncheckd) cause this pb
#   - without <input type='checkbox' name='chk'>, it works
#   - with <input type='checkbox' name='chk'> checked, it works
# NB : strange because an unchecked box shouldn't be sent !
# IDEA : I've tried to provide an hidden field with same name as
#        checkbox which would submit an 'off' value when checkbox is
#        unchecked, but both values are sent when checkbox is checked
# SOL : ?

print "Content-type: text/html; charset=iso-8859-1\n\n";
if ($ENV{'QUERY_STRING'} =~ /add/)
{
    # Echo the raw request body back to the browser
    read STDIN, my $buff, $ENV{'CONTENT_LENGTH'};
    print "<b>Multipart/form-data (ok because no binary data inside)</b><hr>$buff";
}
else
{
    # No "add" in the query string: show the test form
    print <<FORM;
<form action='/cgi-bin/form2dump.pl?add'
method='post' enctype='multipart/form-data' accept-charset='iso-8859-1'>
<input type='text' name='txt1'><br>
<input type='text' name='txt2'><br>
<input type='text' name='txt3'><br>
<input type='text' name='txt4'><br>
<input type='text' name='txt5'><br>
<input type='submit'>
<input type='checkbox' name='chk' value='on'>
</form>
FORM
}
exit 0;
 
Alan J. Flavell

ISO-8859-1 doesn't include a euro sign.

Why would that matter? &euro; works well, across a wide range of
browsers, new and old.
Try ISO-8859-15 instead.

Oh no. There is really NO point in coding HTML in iso-8859-15.
Browsers were already supporting utf-8 fairly well, before support for
8859-15 was introduced. I really could not advise using 8859-15 to
code web pages.

Its use for coding *plain* text is a different matter, for sure.

http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist#NoteWin

regards
 
Jukka K. Korpela

Yohan N. Leder said:
Hoping it will match the alt.html group, because already tried in
comp.lang.perl.misc but it seems to be more related to browser and
multipart/form-data posting.

Why didn't you summarize which answers you got there?

Is there any reason to think that your problem is the least connected with
how the form is _generated_ (e.g., Perl code)? That is, did you even try
what happens if you simply use a static HTML document containing the form
that the script generates? You could then have posted the URL of that
document, so that we would have a simpler manifestation of your problem.
Well, what do you think about the pb explained

y do u use silly abbrs? It saves a few seconds of your time and wastes other
people's time when they try to decipher your private codes. pb = problem
ain't no std abbr.
in this test script called form2dump.pl :

Your script name is irrelevant. What would matter is an absolute URL that
would let us see the problem in action.

Describing your _problem_ in program code comments (in sloppy style) is not
a good approach. You are not helping us to help you.
# Script written to solve the bug explained below :

Huh? How is the script supposed to solve "the bug"? And why the singular,
when you clearly have two problems?
# PB : ? sign in any form field corrupt beginning of
multipart/form-data

Which "? sign". Your Usenet message does not declare its character encoding,
thereby implying ASCII, so you cannot insert the euro sign there, as you
probably tried (guessing from the Subject line).

The real problem is that there is no specification of what happens when the
user types in a character that cannot be represented in the character
encoding used for the form, which is the same as the encoding of the page
(note that browsers ignore accept-charset attributes). When the encoding is
iso-8859-1 and the user types in the euro sign, the browser might (for
example) ignore it or - strangely, but perhaps usefully in some cases -
represent it as an entity reference &euro; or some other way. Anyway, it is
an error condition with no prescribed error processing.

The lesson is that using iso-8859-15 instead, in addition to being a wrong
move in general as Alan explained, would not help against all _other_
characters that people may enter, even if it "worked" in some circumstances.
You cannot prevent people from entering arbitrary data through your form;
you can just process it the best you can.

If you expect "any characters", then the logical move is to use utf-8.
Naturally, your form handler then needs to be able to process utf-8 encoded
data. In practice, you need a suitable library module for the job.
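Where no library is available, one hand-rolled piece such a handler might need is a check that the submitted bytes really are UTF-8 before they are processed. A minimal sketch in plain Perl with no modules; the name is_valid_utf8 is hypothetical and not taken from this thread:

```perl
use strict;

# Hypothetical helper: return 1 if $bytes is well-formed UTF-8, else 0.
# Works on raw bytes only; it needs no Unicode support from Perl itself.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, general
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, no overlongs
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, general
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, up to U+10FFFF
    )*\z/x ? 1 : 0;
}
```

A euro sign sent as UTF-8 arrives as the three bytes E2 82 AC and passes this check; the single byte 0x80 that Windows-1252 uses for the euro sign does not.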
# in STDIN (1st lines with boundary & 1st
field declar truncated) # CAUSE : checkbox without any value
(uncheckd) cause this pb # - without <input type='checkbox'
name='chk'>, it works # - with <input type='checkbox' name='chk'>
checked, it works # NB : strange because an unchecked box shouldn't
be sent ! # IDEA : I've tried to provide an hidden field with same
name as # checkbox which would submit an 'off' value when
checkbox is # unchecked, but both values are sent when checkbox
is checked # SOL : ?

Apparently my newsreader went wild when quoting your program code comments.
I'm not going to fix it.

You're saying that "it works" both ways, whether the checkbox is checked or
not. You are not saying why it is a problem that it works. Neither are you
saying what you really mean by "it works" and how we can decide whether "it
works" or not.

However, from past experience with similar-sounding problems, I suppose you
have just not understood how checkboxes work in HTML form data processing.
When a checkbox is checked upon submission, a name=value pair is generated;
if it is not, no such pair is generated. This is how things were designed to
work; live with it. This means in practice that your form handler needs to
check for the _presence_ of a name=value pair with the name of the checkbox,
and treat its absence as indicating that the checkbox was not checked.
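The presence test described above can be sketched roughly as follows. This is only an illustration for simple text fields (no file uploads), and the names parse_multipart and checkbox_checked are hypothetical; for production use a well-tested module is the better choice:

```perl
use strict;

# Hypothetical helper: collect name => value pairs from a
# multipart/form-data body, given the boundary string taken from the
# Content-Type request header. Simple text fields only.
sub parse_multipart {
    my ($body, $boundary) = @_;
    my %fields;
    for my $part (split /--\Q$boundary\E(?:--)?\r?\n?/, $body) {
        next unless $part =~ /^Content-Disposition:\s*form-data;\s*name="([^"]*)"/mi;
        my $name = $1;
        # The value starts after the first blank line of the part.
        my (undef, $value) = split /\r?\n\r?\n/, $part, 2;
        $value = '' unless defined $value;
        $value =~ s/\r?\n\z//;    # drop the line break before the next boundary
        $fields{$name} = $value;
    }
    return %fields;
}

# A checkbox counts as checked only if its name appears in the data at all.
sub checkbox_checked {
    my ($fields, $name) = @_;
    return exists $fields->{$name} ? 1 : 0;
}
```

With the form from the test script, checkbox_checked(\%fields, 'chk') would be expected to return 1 when the box was ticked and 0 when no "chk" part was sent.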
 
Alan J. Flavell

The lesson is that using iso-8859-15 instead, in addition to being a
wrong move in general as Alan explained, would not help against all
_other_ characters that people may enter, even if it "worked" in
some circumstances.

I was following-up to a posting which hadn't mentioned that this was a
form submission question, so my initial answer could have been a bit
off-beam.

But, now that I know it's a form submission question, my advice to use
utf-8 is much stronger. Pretty much any currently used browser will
support utf-8 form submission nowadays. The last browser of any
widespread use to cause problems was NN4, and (to the best of my
recollection) that browser would not perform any better with
iso-8859-15 anyway. (Windows-1252 perhaps, but I would not recommend
going that way!).

Worse, NN4 claimed in its Accept-charset to be capable of rendering
utf-8, so the obvious stratagem of doing content negotiation on the
browser's Accept-charset is ruled out. In fact, NN4 pretty much
*could* render utf-8 as claimed - but in forms submission it submitted
crap.

Anyhow, the search engines such as google, which in earlier times
supported a wide range of different form submission encodings, have
been using utf-8 as their standard for several years now, so it's
evident that they've concluded this is a viable way to proceed.
I'd be happy to go along with that now.
You cannot prevent people from entering arbitrary data through your
form; you can just process it the best you can.
Absolutely...

If you expect "any characters", then the logical move is to use
utf-8. Naturally, your form handler then needs to be able to process
utf-8 encoded data. In practice, you need a suitable library module
for the job.

Agreed, this is the way to go for all practical purposes - hand
knitted code for this job is sometimes useful for diagnostics, but
for production one should use well-tested libraries/modules.

regards

Oh, http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html
 
Yohan N. Leder

Why didn't you summarize which answers you got there?

Better than a summary, which would be false, by design, here is the url
: said:
y do u use silly abbrs? It saves a few seconds of your time and wastes other
people's time when they try to decipher your private codes. pb = problem
ain't no std abbr.

Sorry, but sometimes, you've not any time and have to try taking some
shortcuts... However, "pb" is a well known abbreviation in French and
sorry again for not having translated what is natural for me; maybe
I couldn't know that it isn't natural for a native English speaker.
Your script name is irrelevant. What would matter is an absolute URL that
would let us see the problem in action.

form2dump means "it's a form submission for which I'm observing what is
received by the server". Also, in a first version of this test script I
dumped the "multipart/form-data" content to a file on the server...
Later, I rewrote this part to show it on screen (i.e. in the client area
of the browser) for convenience, and because this multipart/form-data
doesn't contain any file upload (binary).
Huh? How is the script supposed to solve "the bug"? And why the singular,
when you clearly have two problems?

No, I've only one problem : "euro sign in any form field corrupt
beginning of sent multipart/form-data (in detail : first lines
containing boundary and declaration of the first field are truncated)"
Which "? sign". Your Usenet message does not declare its character encoding,
thereby implying ASCII, so you cannot insert the euro sign there, as you
probably tried (guessing from the Subject line).

Sorry about character encoding, but I'm using the newsreader called
"MicroPlanet Gravity 2.5" and I can't find any option about "character
encoding" in this release. Following your message, I've searched a
little on the web and it seems that the only Gravity-like program which
provides something about character encoding is an unofficial release
called "Super Gravity" :
<http://www.usenet-fr.net/fur/minis-faqs/accents.html>. I'll take a look at it.

However, the sign I was talking about was the "euro sign", which appeared
as a question mark in your newsreader.
The real problem is that there is no specification of what happens when the
user types in a character that cannot be represented in the character
encoding used for the form, which is the same as the encoding of the page
(note that browsers ignore accept-charset attributes).

Nevertheless, when I'm trying to submit a form with
"accept-charset='utf-8'" in an HTML page which has a content-type indicating a
character set as "charset=iso-8859-1", the fields data are well
transmitted in an UTF-8 format.
When the encoding is
iso-8859-1 and the user types in the euro sign, the browser might (for
example) ignore it or - strangely, but perhaps usefully in some cases -
represent it as an entity reference &euro; or some other way. Anyway, it is
an error condition with no prescribed error processing.

Considering the station on which I've done my own test, it's not what
I've seen. I don't know the reason why, but here is my experience : if the
HTML page containing the form has a content-type indicating "iso-8859-1",
and if there's no checkbox in the form, then when I type the euro
sign on an Azerty keyboard using the 'AltGr' key in combination
with the 'e' key, it appears in the form field and is correctly
transmitted to the server (the euro sign is present on arrival,
in STDIN, using my test script).

However, you said you would prefer something online for testing. So,
I've done it and here it is : <>.

Also, I'm rewriting an explanation of the problem for which I'm
searching for a solution : "euro sign in any form field corrupt
beginning of sent multipart/form-data (in detail : first lines
containing boundary and declaration of the first field are truncated)".

And to finish : of course, I could use UTF-8, but there are several reasons
which hold me back (some being about Perl, because I found the problem
I'm talking about while writing a Perl script) :

- Some target servers are using Perl 5.00503 under FreeBSD and there's
nothing about UTF-8 encoding/decoding in the stock modules of this
release.

- On those old servers, only stock Perl modules are authorized, even in
a personal /cgi-bin directory. I'm aware it's a big constraint, but I've
no way to change the decision about that : we have to live with this!

- HTML forms generated by the Perl scripts must be able to handle
everything that is usually typed in English and French, including the euro
sign.

- These Perl scripts contain a configurable part where different persons
(some of them not developers) will be able to change some strings (stored
as constants using the Perl syntax : "use constant NAMEOFCONSTANT =>
"The string people can write, rewrite and manage by themselves as if it
was a configuration feature";"), and we can't ask them to type character
entities rather than special or accented characters when there are
any (e.g. &agrave;, etc). So, if we chose to use UTF-8, we would, at the
same time, have to find a way (without any external module) to encode
these "configurable strings" before displaying them in any browser (i.e.
write our own function).
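Such a function can be quite small, since every ISO-8859-1 byte above 0x7F maps to exactly two UTF-8 bytes. A minimal sketch using only core Perl, so it should not depend on any module; the name latin1_to_utf8 is hypothetical, and the sketch assumes the configurable strings really are ISO-8859-1 bytes:

```perl
use strict;

# Hypothetical helper: convert a string of ISO-8859-1 bytes to UTF-8 bytes.
# ASCII bytes pass through unchanged.
sub latin1_to_utf8 {
    my ($s) = @_;
    # Each byte 0x80-0xFF becomes two bytes: 110000xx 10xxxxxx.
    $s =~ s/([\x80-\xFF])/chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/ge;
    return $s;
}
```

One caveat: ISO-8859-1 has no euro sign at all. If the strings are in fact saved as Windows-1252, where the euro sign is byte 0x80, that byte would need its own mapping to the UTF-8 sequence E2 82 AC; the sketch above does not handle that case.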

Hoping to have been more accurate this time ;-)
 
Jukka K. Korpela

Yohan N. Leder said:
Better than a summary, which would be false, by design, here is the
url

Why would a summary be false? If _you_ did not understand the answers well
enough to summarize them for us, is each of us expected to read through
them?
Sorry, but sometimes, you've not any time and have to try taking some
shortcuts... However, "pb" is a well known abbreviation in French

You were already informed about the unsuitability of such jargon in the
discussion you refer to, and _yet_ you kept using it. A less mild-mannered
man than I am would lose patience here.
form2dump means "it's a form submission for which I'm observing what
is received by the server".

No, it's just the name you gave.
No, I've only one problem : "euro sign in any form field corrupt
beginning of sent multipart/form-data (in detail : first lines
containing boundary and declaration of the first field are truncated)"

You managed to give the impression of two distinct problems. Whether the
euro sign and the checkbox are related remains to be seen.

Next time, please start from a simple prose description of what you wanted
to achieve, exactly how it failed, and what's the URL that lets others see
it.
Nevertheless, when I'm trying to submit a form with
"accept-charset='utf-8'" in an HTML page which has a content-type indicating a
character set as "charset=iso-8859-1", the fields data are well
transmitted in an UTF-8 format.

This was new to me. Apparently IE 6 and IE 7 beta (at least) seem to honor
the accept-charset attribute to some extent, though not to the extent of
actually declaring the encoding in the form data set.

I don't think this changes the big picture, though. If you ask for
iso-8859-1 data transmission, as you explicitly do, you cannot really blame
anyone else when characters outside the iso-8859-1 repertoire cause some
trouble.
Considering the station on which I've done my own test, it's not what
I've seen.

What you have seen is one particular error processing. It does not disprove
the statement that you have created an error condition.
Also, I'm rewriting an explanation of the problem for which I'm
searching for a solution : "euro sign in any form field corrupt
beginning of sent multipart/form-data (in detail : first lines
containing boundary and declaration of the first field are truncated)".

Again, you are complaining about error processing in a situation where no
particular error processing is required by the specifications.

Besides, I was unable to observe the problem you describe. Your code for
dumping raw data doesn't produce particularly readable output (I don't see
line breaks).
And to finish : of course, I could use UTF-8,

Well, that would be the solution, apparently. How you would implement it
depends on your authoring environment.
 
Alan J. Flavell

A few years back, there were some reports of bizarre things happening
in IE when a euro character was pasted into an iso-8859-1 form. Now
that I've had time to look at this thread, I'm starting to think that
this might be something similar.

It's mentioned (as of dates in 2002 and 2004) in my writeup at
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html#iefurther

But I'm afraid although my page is in English, most or all of that
cited discussion will be in German, and I don't know whether the
original poster can read that.

In any case, if we conclude - as we've all said before - that it's a
better approach to use utf-8 for forms submission, then the problem
goes away by itself, and there's no need to understand which versions
of IE are defective or just what they are getting wrong in this
regard.

Hope this helps a bit.
 
Yohan N. Leder

Besides, I was unable to observe the problem you describe. Your code for
dumping raw data doesn't produce particularly readable output (I don't see
line breaks).

As you said yourself, it's raw data and line breaks are not HTML line
breaks (ie. <br>). However, I could change every line break to <br>
before displaying, but it doesn't change anything about the problem.

Apparently you didn't see anything wrong on your side... Then, it means
your particular platform (browser, os) isn't affected by this issue while
others are. What's your browser and operating system ?

Also, you said : "Again, you are complaining about error processing in a
situation where no particular error processing is required by the
specifications."... Right, because indeed a euro sign is not
supposed to be processed in iso-8859-1. But wrong too, because in my
example, the transmitted data, when the checkbox is checked, includes the
euro sign : strange... As stated by Alan J. F. elsewhere in the thread :
it seems to be an old, well-known bug.

And, to finish, it's always interesting to test everything as final user
will do : entering everything, even what was not foreseen by the
programmer... And the fact is that, even if I've chosen an iso-8859-1
charset, a French user using an Azerty keyboard is able (and encouraged,
because the sign is printed on the key) to enter the euro sign...

So, in this case and, again, because the content-type charset was
iso-8859-1, I expected that the euro sign was stripped out and the rest of
the data well transmitted : but it's apparently not the case from every
client : it's a problem we can call a bug !
 
Neredbojias

To further the education of mankind, "Jukka K. Korpela"
You were already informed about the unsuitability of such jargon in
the discussion you refer to, and _yet_ you kept using it. A less
mild-mannered man than I am would lose patience here.

And they said you had no sense of humor...
 
Jukka K. Korpela

Yohan N. Leder said:
As you said yourself, it's raw data and line breaks are not HTML line
breaks (ie. <br>).

Rendering raw data without showing line breaks isn't logical.
However, I could change every line break to <br>
before displaying, but it doesn't change anything about the problem.

It would make the problem easier to see.
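One way to do that, sketched here as an assumption rather than as what either poster actually used, is to escape the markup characters and wrap the dump in a pre element, so that line breaks and any tags inside the data show up literally. The name dump_as_html is hypothetical:

```perl
use strict;

# Hypothetical helper: make a raw request body safe and readable in HTML.
# The ampersand must be escaped first, or the other entities get mangled.
sub dump_as_html {
    my ($raw) = @_;
    $raw =~ s/&/&amp;/g;
    $raw =~ s/</&lt;/g;
    $raw =~ s/>/&gt;/g;
    return "<pre>$raw</pre>";
}
```

In the test script, printing dump_as_html($buff) in place of the bare $buff should be enough to show the boundary lines and field headers one per line.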
And, to finish, it's always interesting to test everything as final
user will do : entering everything, even what was not foreseen by the
programmer...

Yes, but we already know that error conditions will arise then, so why not
concentrate on preventing such conditions or handling them properly, rather
than asking why some particular browser handles it some particular way?
So, in this case and, again, because the content-type charset was
iso-8859-1, I expected that the euro sign was stripped out and the rest of
the data well transmitted :

There was no ground for such expectations. Luckily it didn't happen, since
you might think "it works" and fail to know that it doesn't work on other
browsers.
but it's apparently not the case from every
client : it's a problem we can call a bug !

No, it is error handling upon which no requirements have been made. I thought
this had already become clear.
 
