PB with euro sign and checkbox in multipart/form-data

Discussion in 'HTML' started by Yohan N. Leder, May 18, 2006.

  1. Hi,

    Hoping it will match the alt.html group, because already tried in
    comp.lang.perl.misc but it seems to be more related to browser and
    multipart/form-data posting.

    Well, what do you think about the pb explained in this test script
    called form2dump.pl :

    #!/usr/bin/perl -w
    # Script written to solve the bug explained below :
    # PB : € sign in any form field corrupt beginning of multipart/form-data
    # in STDIN (1st lines with boundary & 1st field declar truncated)
    # CAUSE : checkbox without any value (uncheckd) cause this pb
    # - without <input type='checkbox' name='chk'>, it works
    # - with <input type='checkbox' name='chk'> checked, it works
    # NB : strange because an unchecked box shouldn't be sent !
    # IDEA : I've tried to provide an hidden field with same name as
    # checkbox which would submit an 'off' value when checkbox is
    # unchecked, but both values are sent when checkbox is checked
    # SOL : ?

    print "Content-type: text/html; charset=iso-8859-1\n\n";
    if ($ENV{'QUERY_STRING'} =~ /add/)
    {
    read STDIN, my $buff, $ENV{'CONTENT_LENGTH'};
    print "<b>Multipart/form-data (ok because no binary data inside)</b><hr>$buff";
    }
    else
    {
    print <<FORM;
    <form action='/cgi-bin/form2dump.pl?add'
    method='post' enctype='multipart/form-data' accept-charset='iso-8859-1'>
    <input type='text' name='txt1'><br>
    <input type='text' name='txt2'><br>
    <input type='text' name='txt3'><br>
    <input type='text' name='txt4'><br>
    <input type='text' name='txt5'><br>
    <input type='submit'>
    <input type='checkbox' name='chk' value='on'>
    </form>
    FORM
    }
    exit 0;
     
    Yohan N. Leder, May 18, 2006
    #1

  2. Yohan N. Leder wrote:

    > print "Content-type: text/html; charset=iso-8859-1\n\n";


    ISO-8859-1 doesn't include a euro sign. Try ISO-8859-15 instead.

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me ~ http://tobyinkster.co.uk/contact
     
    Toby Inkster, May 18, 2006
    #2

  3. On Thu, 18 May 2006, Toby Inkster wrote, quoting a page that I'm not
    seeing here:

    > Yohan N. Leder wrote:
    >
    > > print "Content-type: text/html; charset=iso-8859-1\n\n";

    >
    > ISO-8859-1 doesn't include a euro sign.


    Why would that matter? &euro; works well, across a wide range of
    browsers, new and old.

    > Try ISO-8859-15 instead.


    Oh no. There is really NO point in coding HTML in iso-8859-15.
    Browsers were already supporting utf-8 fairly well, before support for
    8859-15 was introduced. I really could not advise using 8859-15 to
    code web pages.

    Its use for coding *plain* text is a different matter, for sure.

    http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist#NoteWin

    regards
     
    Alan J. Flavell, May 19, 2006
    #3
  4. Yohan N. Leder <> scripsit:

    > Hoping it will match the alt.html group, because already tried in
    > comp.lang.perl.misc but it seems to be more related to browser and
    > multipart/form-data posting.


    Why didn't you summarize which answers you got there?

    Is there any reason to think that your problem is the least connected with
    how the form is _generated_ (e.g., Perl code)? That is, did you even try
    what happens if you simply use a static HTML document containing the form
    that the script generates? You could then have posted the URL of that
    document, so that we would have a simpler manifestation of your problem.

    > Well, what do you think about the pb explained


    y do u use silly abbrs? It saves a few seconds of your time and wastes other
    people's time when they try to decipher your private codes. pb = problem
    ain't no std abbr.

    > in this test script called form2dump.pl :


    Your script name is irrelevant. What would matter is an absolute URL that
    would let us see the problem in action.

    Describing your _problem_ in program code comments (in sloppy style) is not
    a good approach. You are not helping us to help you.

    > # Script written to solve the bug explained below :


    Huh? How is the script supposed to solve "the bug"? And why the singular,
    when you clearly have two problems?

    > # PB : ? sign in any form field corrupt beginning of
    > multipart/form-data


    Which "? sign"? Your Usenet message does not declare its character encoding,
    thereby implying ASCII, so you cannot insert the euro sign there, as you
    probably tried (guessing from the Subject line).

    The real problem is that there is no specification of what happens when the
    user types in a character that cannot be represented in the character
    encoding used for the form, which is the same as the encoding of the page
    (note that browsers ignore accept-charset attributes). When the encoding is
    iso-8859-1 and the user types in the euro sign, the browser might (for
    example) ignore it or - strangely, but perhaps usefully in some cases -
    represent it as an entity reference &euro; or some other way. Anyway, it is
    an error condition with no prescribed error processing.
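
    A minimal sketch of why this is an error condition, assuming Perl 5.8
    or later (where the Encode module is part of the core distribution).
    The helper name euro_bytes is made up for illustration; it asks Encode
    which bytes, if any, represent the euro sign U+20AC in a few encodings.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# Return the bytes of U+20AC EURO SIGN in $charset as a hex string,
# or undef when the encoding has no representation for it.
# (euro_bytes is a hypothetical helper, not something from the thread.)
sub euro_bytes {
    my ($charset) = @_;
    my $euro  = "\x{20AC}";   # fresh copy: encode() with a CHECK flag may modify its argument
    my $bytes = eval { encode($charset, $euro, FB_CROAK) };
    return undef unless defined $bytes;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

for my $charset (qw(iso-8859-1 iso-8859-15 cp1252 UTF-8)) {
    my $hex = euro_bytes($charset);
    printf "%-12s %s\n", $charset, defined $hex ? $hex : 'not representable';
}
```

    ISO-8859-1 has no byte for the euro sign at all, while ISO-8859-15
    (A4), Windows-1252 (80) and UTF-8 (E2 82 AC) each use a different
    one, so a browser that is asked for iso-8859-1 has to improvise.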

    The lesson is that using iso-8859-15 instead, in addition to being a wrong
    move in general as Alan explained, would not help against all _other_
    characters that people may enter, even if it "worked" in some circumstances.
    You cannot prevent people from entering arbitrary data through your form;
    you can just process it the best you can.

    If you expect "any characters", then the logical move is to use utf-8.
    Naturally, your form handler then needs to be able to process utf-8 encoded
    data. In practice, you need a suitable library module for the job.
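
    As a rough illustration of what "processing utf-8 encoded data" can
    look like in Perl, here is a sketch that assumes the core Encode module
    (Perl 5.8 or later) is available. The helper name decode_field is
    hypothetical, and the sample bytes stand for what a browser would send
    for the text "5 €" from a page served as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn the raw bytes of one form field into a Perl character string.
# FB_CROAK makes invalid UTF-8 die instead of being silently patched up;
# LEAVE_SRC keeps decode() from modifying the caller's buffer.
# (decode_field is a hypothetical helper name.)
sub decode_field {
    my ($raw) = @_;
    return decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# "5 €" as raw UTF-8 bytes: the euro sign takes three bytes.
my $raw  = "5 \xE2\x82\xAC";
my $text = decode_field($raw);
printf "%d bytes, %d characters\n", length $raw, length $text;
```

    Wrapping the call in eval lets the handler reject a submission that
    is not valid UTF-8 rather than store garbage.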

    > # in STDIN (1st lines with boundary & 1st
    > field declar truncated) # CAUSE : checkbox without any value
    > (uncheckd) cause this pb # - without <input type='checkbox'
    > name='chk'>, it works # - with <input type='checkbox' name='chk'>
    > checked, it works # NB : strange because an unchecked box shouldn't
    > be sent ! # IDEA : I've tried to provide an hidden field with same
    > name as # checkbox which would submit an 'off' value when
    > checkbox is # unchecked, but both values are sent when checkbox
    > is checked # SOL : ?


    Apparently my newsreader went wild when quoting your program code comments.
    I'm not going to fix it.

    You're saying that "it works" both ways, whether the checkbox is checked or
    not. You are not saying why it is a problem that it works. Neither are you
    saying what you really mean by "it works" and how we can decide whether "it
    works" or not.

    However, from past experience with similar-sounding problems, I suppose you
    have just not understood how checkboxes work in HTML form data processing.
    When a checkbox is checked upon submission, a name=value pair is generated;
    if it is not, no such pair is generated. This is how things were designed to
    work; live with it. This means in practice that your form handler needs to
    check for the _presence_ of a name=value pair with the name of the checkbox,
    and treat its absence as indicating that the checkbox was not checked.
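
    The presence check can be sketched as follows. The two hashes are
    hypothetical stand-ins for already parsed form data, and
    checkbox_is_checked is an illustrative name, not an existing function.

```perl
use strict;
use warnings;

# Hypothetical parsed submissions of a form with a text field 'txt1' and a
# checkbox 'chk': an unchecked box contributes no pair at all.
my %unchecked = (txt1 => 'hello');
my %checked   = (txt1 => 'hello', chk => 'on');

# A checkbox counts as checked exactly when its name is present in the data.
sub checkbox_is_checked {
    my ($fields, $name) = @_;
    return exists $fields->{$name} ? 1 : 0;
}

print "unchecked form: ", checkbox_is_checked(\%unchecked, 'chk'), "\n";
print "checked form:   ", checkbox_is_checked(\%checked,   'chk'), "\n";
```

    Testing with exists rather than comparing the value means no hidden
    "off" field is needed.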
     
    Jukka K. Korpela, May 19, 2006
    #4
  5. On Fri, 19 May 2006, Jukka K. Korpela wrote:

    > The lesson is that using iso-8859-15 instead, in addition to being a
    > wrong move in general as Alan explained, would not help against all
    > _other_ characters that people may enter, even if it "worked" in
    > some circumstances.


    I was following-up to a posting which hadn't mentioned that this was a
    form submission question, so my initial answer could have been a bit
    off-beam.

    But, now that I know it's a form submission question, my advice to use
    utf-8 is much stronger. Pretty much any currently used browser will
    support utf-8 form submission nowadays. The last browser of any
    widespread use to cause problems was NN4, and (to the best of my
    recollection) that browser would not perform any better with
    iso-8859-15 anyway. (Windows-1252 perhaps, but I would not recommend
    going that way!).

    Worse, NN4 claimed in its Accept-charset to be capable of rendering
    utf-8, so the obvious stratagem of doing content negotiation on the
    browser's Accept-charset is ruled out. In fact, NN4 pretty much
    *could* render utf-8 as claimed - but in forms submission it submitted
    crap.

    Anyhow, the search engines such as google, which in earlier times
    supported a wide range of different form submission encodings, have
    been using utf-8 as their standard for several years now, so it's
    evident that they've concluded this is a viable way to proceed.
    I'd be happy to go along with that now.

    > You cannot prevent people from entering arbitrary data through your
    > form; you can just process it the best you can.


    Absolutely...

    > If you expect "any characters", then the logical move is to use
    > utf-8. Naturally, your form handler then needs to be able to process
    > utf-8 encoded data. In practice, you need a suitable library module
    > for the job.


    Agreed, this is the way to go for all practical purposes - hand
    knitted code for this job is sometimes useful for diagnostics, but
    for production one should use well-tested libraries/modules.
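
    For the diagnostic case, a hand-knitted parser might look like the
    sketch below. It is an illustration only: parse_multipart is a
    hypothetical helper, it ignores file uploads, repeated names and header
    folding, and the sample body uses a made-up boundary "XX".

```perl
use strict;
use warnings;

# Diagnostic sketch: split a multipart/form-data body into name => value
# pairs. (parse_multipart is a hypothetical helper, not a library function.)
sub parse_multipart {
    my ($content_type, $body) = @_;

    # The boundary is announced in the request's Content-Type header.
    my ($boundary) = $content_type =~ /boundary="?([^";]+)"?/i
        or return;

    # A well-formed body starts with "--" followed by the boundary; a
    # truncated beginning, as described in this thread, would fail here.
    warn "body does not start with the boundary\n"
        unless index($body, "--$boundary") == 0;

    my %fields;
    for my $part (split /(?:\r\n)?--\Q$boundary\E(?:--)?(?:\r\n)?/, $body) {
        # Each part is "headers CRLF CRLF value".
        my ($head, $value) = $part =~ /\A(.*?)\r\n\r\n(.*)\z/s or next;
        my ($name) = $head =~ /\bname="([^"]*)"/ or next;
        $fields{$name} = $value;
    }
    return \%fields;
}

# Made-up sample body with one text field and one checked checkbox.
my $body = join "\r\n",
    '--XX', 'Content-Disposition: form-data; name="txt1"', '', 'hello',
    '--XX', 'Content-Disposition: form-data; name="chk"',  '', 'on',
    '--XX--', '';
my $fields = parse_multipart('multipart/form-data; boundary=XX', $body);
print "$_ = $fields->{$_}\n" for sort keys %$fields;
```

    In a CGI script the two arguments would come from
    $ENV{'CONTENT_TYPE'} and the buffer read from STDIN.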

    regards

    Oh, http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html
     
    Alan J. Flavell, May 19, 2006
    #5
  6. In article <eMcbg.1906$>,
    says...
    > Yohan N. Leder <> scripsit:
    >
    > > Hoping it will match the alt.html group, because already tried in
    > > comp.lang.perl.misc but it seems to be more related to browser and
    > > multipart/form-data posting.

    >
    > Why didn't you summarize which answers you got there?


    Better than a summary, which would be false, by design, here is the URL:
    <http://minilien.fr/a0juc6>

    > y do u use silly abbrs? It saves a few seconds of your time and wastes other
    > people's time when they try to decipher your private codes. pb = problem
    > ain't no std abbr.


    Sorry, but sometimes you don't have much time and try to take some
    shortcuts... However, "pb" is a well-known abbreviation in French, and
    sorry again for not having translated what is natural for me; I could
    not have known that it isn't for a native English speaker.

    > > in this test script called form2dump.pl :

    >
    > Your script name is irrelevant. What would matter is an absolute URL that
    > would let us see the problem in action.


    form2dump means "it's a form submission for which I'm observing what is
    received by the server". Also, in a first version of this test script I
    dumped the "multipart/form-data" content to a file on the server...
    Later, I rewrote this part to show it on screen (i.e. in the client area
    of the browser) for convenience, and because this multipart/form-data
    doesn't contain any file upload (binary).

    > > # Script written to solve the bug explained below :

    >
    > Huh? How is the script supposed to solve "the bug"? And why the singular,
    > when you clearly have two problems?


    No, I've only one problem : "euro sign in any form field corrupt
    beginning of sent multipart/form-data (in detail : first lines
    containing boundary and declaration of the first field are truncated)"

    > > # PB : ? sign in any form field corrupt beginning of
    > > multipart/form-data

    >
    > Which "? sign". Your Usenet message does not declare its character encoding,
    > thereby implying ASCII, so you cannot insert the euro sign there, as you
    > probably tried (guessing from the Subject line).
    >


    Sorry about the character encoding, but I'm using the newsreader called
    "MicroPlanet Gravity 2.5" and I can't find any option for "character
    encoding" in this release. Following your message, I've searched a
    little on the web and it seems that the only Gravity-like program which
    provides something for character encoding is an unofficial release
    called "Super Gravity" :
    <http://www.usenet-fr.net/fur/minis-faqs/accents.html>. I'll take a
    look at it.

    However, the sign I was talking about was the "euro sign", which appeared
    as a question mark in your newsreader.

    > The real problem is that there is no specification of what happens when the
    > user types in a character that cannot be represented in the character
    > encoding used for the form, which is the same as the encoding of the page
    > (note that browsers ignore accept-charset attributes).


    Nevertheless, when I submit a form with "accept-charset='utf-8'" in an
    HTML page whose content-type indicates "charset=iso-8859-1", the field
    data is transmitted correctly in UTF-8.

    > When the encoding is
    > iso-8859-1 and the user types in the euro sign, the browser might (for
    > example) ignore it or - strangely, but perhaps usefully in some cases -
    > represent it as an entity reference &euro; or some other way. Anyway, it is
    > an error condition with no prescribed error processing.


    On the machine where I ran my own test, that's not what I've seen. I
    don't know the reason why, but here is my experience : if the HTML page
    containing the form has a content-type indicating "iso-8859-1", and if
    there's no checkbox in the form, then when I type the euro sign on an
    Azerty keyboard using the 'AltGr' key in combination with the 'e' key,
    it appears correctly in the form field and is transmitted correctly to
    the server (the euro sign is present on arrival, in STDIN, using my test
    script).

    However, you said you would prefer something inline for testing. So,
    I've done it and here it is : <>.

    Also, I'm restating the problem for which I'm searching for a solution :
    "euro sign in any form field corrupt beginning of sent
    multipart/form-data (in detail : first lines containing boundary and
    declaration of the first field are truncated)".

    And to finish : of course, I could use UTF-8, but there are several
    reasons which hold me back (some being about Perl, because I found the
    problem I'm talking about while writing a Perl script) :

    - Some target servers are using Perl 5.00503 under FreeBSD and there's
    nothing about UTF-8 encoding/decoding in the stock modules of this
    release.

    - On those old servers, only stock Perl modules are authorized, even in
    a personal /cgi-bin directory. I'm aware it's a big constraint, but I
    have no way to change that decision : we have to live with it!

    - HTML forms generated by the Perl scripts must be able to handle
    everything which may usually be typed in English and French, including
    the euro sign.

    - These Perl scripts contain a configurable part where different people
    (some of them not developers) will be able to change some strings (stored
    as constants using the Perl syntax : "use constant NAMEOFCONSTANT =>
    "The string people can write, rewrite and manage by themselves as if it
    was a configuration feature";"), and we can't ask them to type character
    entities rather than special or accented characters when there are
    some (e.g. &agrave;, etc). So, if we chose to use UTF-8, we would at
    the same time have to find a way (without an external module) to encode
    these "configurable strings" before displaying them in any browser (i.e.
    write our own function).
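
    One possible shape for such a home-made function, as a sketch only: it
    uses no modules, assumes the configurable strings are stored as
    ISO-8859-1 (or Windows-1252) bytes, and treats byte 0x80 as the
    Windows-1252 euro sign. The name latin1_to_utf8 is made up, and the code
    has not been tried on Perl 5.00503.

```perl
use strict;

# Re-encode a string of ISO-8859-1 bytes as UTF-8 using core syntax only.
# Every byte 0x80..0xFF becomes a two-byte UTF-8 sequence, except 0x80,
# which is ASSUMED to be the Windows-1252 euro sign (U+20AC, three bytes).
# (latin1_to_utf8 is a hypothetical helper name.)
sub latin1_to_utf8 {
    my ($bytes) = @_;
    $bytes =~ s{([\x80-\xFF])}{
        $1 eq "\x80"
            ? "\xE2\x82\xAC"
            : chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))
    }ge;
    return $bytes;
}

# "café à 5 €" given as Latin-1 / Windows-1252 bytes.
print latin1_to_utf8("caf\xE9 \xE0 5 \x80"), "\n";
```

    The substitution runs in a single pass, so the bytes produced for one
    character are never re-encoded by a later match.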

    Hoping to have been more accurate this time ;-)
     
    Yohan N. Leder, May 19, 2006
    #6
  7. In article <>,
    says...
    > However, you said you would prefer something inline for testing. So,
    > I've done it and here it is : <>.
    >


    Oops, I forgot to provide the URL I mentioned. Here it is :

    <http://yohannl.tripod.com/cgi-bin/form2dump.pl>
     
    Yohan N. Leder, May 19, 2006
    #7
  8. Yohan N. Leder <> scripsit:

    >> Why didn't you summarize which answers you got there?

    >
    > Better than a summary, which would be false, by design, here is the
    > url : <http://minilien.fr/a0juc6>


    Why would a summary be false? If _you_ did not understand the answers well
    enough to summarize them for us, is each of us expected to read through
    them?

    > Sorry, but sometimes, you've not any time and have to try taking some
    > shortcuts... However, "pb" is a well known abbreviation in French


    You were already informed about the unsuitability of such jargon in the
    discussion you refer to, and _yet_ you kept using it. A less mild-mannered
    man than I am would lose patience here.

    >> Your script name is irrelevant. What would matter is an absolute URL
    >> that would let us see the problem in action.

    >
    > form2dump means "it's a form submission for which I'm observing what
    > is received by the server".


    No, it's just the name you gave.

    >>> # Script written to solve the bug explained below :

    >>
    >> Huh? How is the script supposed to solve "the bug"? And why the
    >> singular, when you clearly have two problems?

    >
    > No, I've only one problem : "euro sign in any form field corrupt
    > beginning of sent multipart/form-data (in detail : first lines
    > containing boundary and declaration of the first field are truncated)"


    You managed to give the impression of two distinct problems. Whether the
    euro sign and the checkbox are related remains to be seen.

    Next time, please start from a simple prose description of what you wanted
    to achieve, exactly how it failed, and what's the URL that lets others see
    it.

    >> The real problem is that there is no specification of what happens
    >> when the user types in a character that cannot be represented in the
    >> character encoding used for the form, which is the same as the
    >> encoding of the page (note that browsers ignore accept-charset
    >> attributes).

    >
    > Nevertheless, when I'm trying to submit a form with "accept-
    > charset='utf-8'" in an HTML page which has a content-type indicating a
    > character set as "charset=iso-8859-1", the fields data are well
    > transmitted in an UTF-8 format.


    This was new to me. Apparently IE 6 and IE 7 beta (at least) seem to honor
    the accept-charset attribute to some extent, though not to the extent of
    actually declaring the encoding in the form data set.

    I don't think this changes the big picture, though. If you ask for
    iso-8859-1 data transmission, as you explicitly do, you cannot really blame
    anyone else when characters outside the iso-8859-1 repertoire cause some
    trouble.

    >> Anyway, it is an error condition with no prescribed error
    >> processing.

    >
    > Considering the station on which I've done my own test, it's not what
    > I've seen.


    What you have seen is one particular error processing. It does not disprove
    the statement that you have created an error condition.

    > Also, I'm rewriting an explanation of the problem for which I'm
    > searching for a solution : "euro sign in any form field corrupt
    > beginning of sent multipart/form-data (in detail : first lines
    > containing boundary and declaration of the first field are truncated)".


    Again, you are complaining about error processing in a situation where no
    particular error processing is required by the specifications.

    Besides, I was unable to observe the problem you describe. Your code for
    dumping raw data doesn't produce particularly readable output (I don't see
    line breaks).

    > And to finish : of course, I could use UTF-8,


    Well, that would be the solution, apparently. How you would implement it
    depends on your authoring environment.
     
    Jukka K. Korpela, May 19, 2006
    #8
  9. On Fri, 19 May 2006, Alan J. Flavell wrote:

    > On Fri, 19 May 2006, Jukka K. Korpela wrote:
    >
    > > The lesson is that using iso-8859-15 instead, in addition to being a
    > > wrong move in general as Alan explained, would not help against all
    > > _other_ characters that people may enter, even if it "worked" in
    > > some circumstances.


    A few years back, there were some reports of bizarre things happening
    in IE when a euro character was pasted into an iso-8859-1 form. Now
    that I've had time to look at this thread, I'm starting to think that
    this might be something similar.

    It's mentioned (as of dates in 2002 and 2004) in my writeup at
    http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html#iefurther

    But I'm afraid although my page is in English, most or all of that
    cited discussion will be in German, and I don't know whether the
    original poster can read that.

    In any case, if we conclude - as we've all said before - that it's a
    better approach to use utf-8 for forms submission, then the problem
    goes away by itself, and there's no need to understand which versions
    of IE are defective or just what they are getting wrong in this
    regard.

    Hope this helps a bit.
     
    Alan J. Flavell, May 19, 2006
    #9
  10. In article <gsnbg.2271$>,
    says...
    > Besides, I was unable to observe the problem you describe. Your code for
    > dumping raw data doesn't produce particularly readable output (I don't see
    > line breaks).
    >


    As you said yourself, it's raw data and the line breaks are not HTML
    line breaks (i.e. <br>). I could change every line break to <br>
    before displaying, but it wouldn't change anything about the problem.

    Apparently you didn't see anything wrong on your side... Then it means
    your particular platform (browser, OS) isn't affected by this issue while
    others are. What are your browser and operating system ?

    Also, you said : "Again, you are complaining about error processing in a
    situation where no particular error processing is required by the
    specifications."... Right, because, indeed, a euro sign is not
    supposed to be processed in iso-8859-1. But wrong too, because in my
    example, the transmitted data, when the checkbox is checked, includes the
    euro sign : strange... As stated by Alan J. F. elsewhere in the thread,
    it seems to be an old, well-known bug.

    And, to finish, it's always interesting to test everything as the final
    user will do : entering everything, even what was not foreseen by the
    programmer... And the fact is that, even if I've chosen an iso-8859-1
    charset, a French user using an Azerty keyboard is able (and encouraged,
    because the sign is printed on the key) to enter the euro sign...

    So, in this case and, again, because the content-type charset was
    iso-8859-1, I expected that the euro sign would be stripped out and the
    rest of the data transmitted correctly : but that's apparently not the
    case for every client : it's a problem we can call a bug !
     
    Yohan N. Leder, May 19, 2006
    #10
  11. To further the education of mankind, "Jukka K. Korpela" vouchsafed:

    >> Sorry, but sometimes, you've not any time and have to try taking some
    >> shortcuts... However, "pb" is a well known abbreviation in French


    > You were already informed about the unsuitability of such jargon in
    > the discussion you refer to, and _yet_ you kept using it. A less
    > mild-mannered man than I am would lose patience here.


    And they said you had no sense of humor...

    --
    Neredbojias
    Infinity has its limits.
     
    Neredbojias, May 19, 2006
    #11
  12. Yohan N. Leder <> scripsit:

    > As you said yourself, it's raw data and line break are not HTML line
    > breaks (ie. <br>).


    Rendering raw data without showing line breaks isn't logical.

    > However, I could change every line break to <br>
    > before displaying, but it doesn't change anything about the problem.


    It would make the problem easier to see.
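
    One way to do that, sketched under the assumption that the dump is only
    meant for human inspection: escape the HTML-special characters and wrap
    the body in <pre>, so that the line breaks survive without inserting
    <br>. The name dump_as_html is hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML, normalise CRLF to LF, and
# wrap the result in <pre> so the browser shows the line breaks as sent.
# (dump_as_html is a hypothetical helper name.)
sub dump_as_html {
    my ($raw) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    (my $escaped = $raw) =~ s/([&<>"])/$entity{$1}/g;
    $escaped =~ s/\r\n/\n/g;
    return "<pre>$escaped</pre>";
}

print dump_as_html(qq{--XX\r\nContent-Disposition: form-data; name="txt1"\r\n\r\na<b\r\n}), "\n";
```

    Escaping also keeps a field value containing markup from being
    interpreted by the browser while the dump is displayed.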

    > And, to finish, it's always interesting to test everything as final
    > user will do : entering everything, even what was not foreseen by the
    > programmer...


    Yes, but we already know that error conditions will arise then, so why not
    concentrate on preventing such conditions or handling them properly, rather
    than asking why some particular browser handles it some particular way?

    > So, in this case and, again, because the content-type charset was
    > iso-8859-1, I expected that the euro sign was stripped out and the rest
    > of the data well transmitted :


    There was no ground for such expectations. Luckily it didn't happen, since
    you might think "it works" and fail to know that it doesn't work on other
    browsers.

    > but it's apparently not the case from every
    > client : it's a problem we can call a bug !


    No, it is error handling upon which no requirements have been made. I thought
    this had already become clear.
     
    Jukka K. Korpela, May 20, 2006
    #12