CGI.pm and lost carriage returns

Discussion in 'Perl Misc' started by Joseph Czapski, Jul 20, 2006.

  1. Hi, Perl practitioners. I'm having a problem with CGI.pm. If I have an
    HTML form with a textarea input box, I would like my Perl program to see the
    carriage returns that the user typed in so I can format his text
    appropriately.

    Using

    $value = $q->param($name);

    gives me the text with all the carriage returns deleted. Some words are
    just stuck together where they were separated by only one or more carriage
    returns.

    I like to use CGI.pm for the neatness and the file uploading capability.

    Thanks for your help!

    Joe Czapski
    Boston, Mass.
     
    Joseph Czapski, Jul 20, 2006
    #1

  2. Joseph Czapski

    xhoster Guest

    Most likely, either your web browser isn't sending what you think it is
    sending, or the data you are seeing is not what you think you are seeing.

    Can you provide an example script that produces the form and evaluates the
    response in a way to demonstrate what you are saying?

    Xho
     
    xhoster, Jul 20, 2006
    #2

  3. Joseph Czapski

    David Squire Guest

    How and where are you displaying $value to make this judgment? In a web
    browser? If so, note that HTML does not recognize carriage returns - it
    uses <BR> (or <BR/> for XHTML :) ) to indicate line breaks.

    Still, it should at least treat them as white space...

    Can you give us some more details, e.g. as the posting guidelines for
    this group say, a small but complete script demonstrating your problem
    (including data)?


    DS
     
    David Squire, Jul 20, 2006
    #3
  4. Joseph Czapski

    usenet Guest

    I think CGI is a great module, but the one fault that I would find is
    the sloppy and incomplete perldocs. I cannot think of a Perl builtin
    module that has worse documentation.

    IMHO, if someone wants to do serious CGI programming, s/he really needs
    to get a book that fills in the gaping holes in the perldocs.
    Unfortunately, the selection is neither wide nor particularly good. I
    have the "Official Guide to Programming with CGI.pm" by Lincoln Stein
    (the author of the module), which is kinda like an annotated version of
    the perldocs. But, at least, it has fairly complete information.

    Page 261-262 of the "Official Guide" describes the behavior of the
    textarea's wrapping properties, which is controlled by a "-wrap"
    argument (which is not mentioned in any way in the perldocs).

    <quote>
    -wrap: Sets the WRAP attribute. It can be one of "off," "physical," or
    "virtual." If "off," word wrapping only occurs in the field when the
    user presses the Enter key. The contents of the field are transmitted
    to your script with line breaks inserted exactly as they were displayed
    to the user. If "physical," word wrapping occurs automatically when the
    text exceeds the width of the field, and the text is transmitted to
    your script as if the user had actually transmitted it that way. If
    "virtual," word wrapping occurs automatically when the text exceeds the
    width of the field, but the contents of the field are transmitted to
    your script as a single unbroken line of text (unless the user inserts
    a blank line manually).
    </quote>

    Of course, someone who was rather familiar with HTML itself could
    probably guess how WRAP (which is an HTML property) is implemented in
    CGI.pm. Personally, though, I use CGI precisely because I DON'T want
    to fool with the oddities of HTML.
     
    usenet, Jul 20, 2006
    #4
  5. David Filmer wrote:
    ....
    ....

    Holy smoke, I think the WRAP attribute may be the issue. I have it set to
    'virtual' on all forms. I'm going to test that and then reply back.

    Thank you very much!

    Joe Czapski
    Boston, Mass.
     
    Joseph Czapski, Jul 20, 2006
    #5
  6. Joseph Czapski

    Todd Guest

    Did you try printing the result out as:

    print "<hr /><pre>$value</pre><hr />";

    Todd
     
    Todd, Jul 20, 2006
    #6
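
    One way to check what actually arrives, independent of how a browser
    renders it, is to make the line-ending characters visible before
    printing. A minimal sketch: the helper name show_line_endings and the
    sample string are illustrative only, with the string standing in for
    whatever $q->param($name) returns.

```perl
use strict;
use warnings;

# Hypothetical helper: replace CR and LF with the visible
# two-character sequences \r and \n so they can be inspected.
sub show_line_endings {
    my ($text) = @_;
    $text =~ s/\x0D/\\r/g;
    $text =~ s/\x0A/\\n/g;
    return $text;
}

# Stand-in for the submitted textarea value (assumed CRLF here).
my $value = "first line\x0D\x0Asecond line";
print show_line_endings($value), "\n";   # prints: first line\r\nsecond line
```

    If the output shows \r\n pairs, the line breaks survived the trip and
    the problem lies in how the text is displayed.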
  7. Joseph Czapski

    xhoster Guest

    I don't think it is a module's documentation's job to document stuff
    outside of the module. It is great if it happens to point out some outside
    gotchas, but that is not what it is primarily there for.

    Yes, especially the gaping holes that have nothing to do with Perl.
    There are plenty of resources on the web for this.

    And, in fact, not mentioned in any relevant way in the CGI.pm source code
    either. -wrap is merely passed on to the html directly without any
    specific interpretation on the part of CGI.pm.

    That description seems to be quite inaccurate. The behavior of the wrap
    attribute depends on what browser you are using, but as far as I can tell
    no modern browser behaves the way that description says. I think that is
    an excellent argument for not including it in the perldoc. Also, it does
    not cover "hard" or "soft".

    Unfortunately, the oddities of HTML cannot be entirely abstracted away, no
    matter how much we wish they could be.

    Xho
     
    xhoster, Jul 20, 2006
    #7
  8. Yup. Setting WRAP to 'physical' solved the problem. I had to add some
    additional code, too:

    $value =~ s/(\S)\s*?\x0A\s*\x0A\s*?(\S)/$1<br><br>$2/g;
    $value =~ s/(\S)\s*?\x0D\s*\x0D\s*?(\S)/$1<br><br>$2/g;
    $value =~ s/\s*\x0A\s*/ /g;
    $value =~ s/\s*\x0D\s*/ /g;

    The 'physical' wrap isn't so great, either. It sends a line break at every
    wrapping point. But that's OK, because I can get the formatting I desire by
    preserving just the double (or greater) line breaks as <br><br>, and
    replacing the single line breaks with a space. I tried to do this in a
    platform independent way in the above code. It tests out well.

    Thanks again to all who replied!

    Joe Czapski
    Boston, Mass.
     
    Joseph Czapski, Jul 20, 2006
    #8
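
    The approach described in this post (keep blank-line paragraph breaks
    as <br><br>, fold single line breaks into a space) can also be sketched
    by normalizing the line endings first, so that CRLF, bare CR and bare LF
    are all handled by one set of rules. The function name
    reflow_paragraphs is hypothetical, and the sketch deliberately leaves
    out any escaping of HTML in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: paragraphs (separated by blank lines) are joined
# with <br><br>; single line breaks inside a paragraph become a space.
sub reflow_paragraphs {
    my ($text) = @_;
    $text =~ s/\x0D\x0A?/\x0A/g;   # CRLF or bare CR -> LF
    $text =~ s/\A\s+//;            # trim leading whitespace
    $text =~ s/\s+\z//;            # trim trailing whitespace

    # Two or more line breaks (with optional whitespace) end a paragraph.
    my @paragraphs = split /\s*\x0A\s*\x0A\s*/, $text;

    # Inside a paragraph, fold each remaining line break into one space.
    s/\s*\x0A\s*/ /g for @paragraphs;

    return join '<br><br>', @paragraphs;
}

print reflow_paragraphs("one\x0D\x0Atwo\x0D\x0A\x0D\x0Athree"), "\n";
# prints: one two<br><br>three
```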
  9. Xho wrote:
    ....
    ....

    You're right! Further testing shows me that the 'physical' wrap of the
    textarea box does NOT behave as described in the HTML spec. when using
    Internet Explorer. The form does not return newlines at each wrap point,
    but returns only newlines typed by the user. I was taking the spec. as
    truth.

    This is good news actually. Now I can know exactly where the user typed
    newlines, and format appropriately.

    Joe Czapski
    Boston, Mass.
     
    Joseph Czapski, Jul 20, 2006
    #9
  10. Really? The 'wrap' attribute is not mentioned in the HTML 4.01
    Specification, not even as deprecated, and various browsers may (and do)
    ignore some of the wrap variants.

    I think you should consider not letting the textarea width determine the
    text formatting, but instead use a module, e.g. Text::Format, for the
    purpose.
     
    Gunnar Hjalmarsson, Jul 20, 2006
    #10
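
    As a rough illustration of re-wrapping on the server rather than
    relying on the textarea width: the sketch below uses the core module
    Text::Wrap instead of the Text::Format module named above, purely so
    that it runs without extra installation. The function name rewrap and
    the width of 10 are assumptions made for the example.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: discard the submitted line breaks and re-wrap
# the text so that no line is longer than $width characters.
sub rewrap {
    my ($text, $width) = @_;
    $text =~ s/\s+/ /g;                       # collapse all whitespace runs
    # Text::Wrap produces lines of at most $columns - 1 characters.
    local $Text::Wrap::columns = $width + 1;
    return wrap('', '', $text);
}

print rewrap("the quick brown fox jumps over the lazy dog", 10), "\n";
```

    Whether wrapping belongs on the server at all depends on where the
    text ends up; for HTML output the browser normally reflows text itself.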
  11. Joseph Czapski

    Justin C Guest

    I don't like to get pedantic but I like even less incorrect information
    being passed on.

    XHTML is lower case only, at least from 1.0 onwards. So that'd be <br/>.


    Justin.
     
    Justin C, Jul 20, 2006
    #11
  12. Joseph Czapski

    David Squire Guest

    Yikes. I had no idea. Thanks for that. No browser that I know of yet cares.

    I had (wrongly) assumed that it continued the (perhaps de facto) case
    insensitivity of HTML.

    I must admit that I can't see any advantage to case-sensitivity for
    XHTML tokens, particularly given the history of HTML.


    DS
     
    David Squire, Jul 20, 2006
    #12
  13. Joseph Czapski

    Justin C Guest

    It's got something to do with XML being case sensitive.

    http://www.w3.org/TR/xhtml1/#h-4.2

    Actually, the above link doesn't say any more than I have above... but
    it's from the horse's mouth. I'm sure, if you want the gory details,
    they're around on Google.


    Justin.
     
    Justin C, Jul 21, 2006
    #13
  14. Joseph Czapski

    Ben Morrow Guest

    I'd be very surprised if Mozilla-based browsers didn't object *if* you
    serve the XHTML as XHTML (i.e. with an XML content-type). If you serve
    it as HTML (which is wrong anyway, AppC of the XHTML spec
    notwithstanding) then the browser will parse it by HTML's rules. In this
    case you'd be much better off using HTML instead.

    All XML element names are case-sensitive.

    Ben
     
    Ben Morrow, Jul 21, 2006
    #14
  15. Apart from being wrong in detail, as already pointed out, this seems
    to me to be bizarrely wrong at the level of principles too. Even
    though they are, strictly speaking, off-topic for this group, I feel
    bound to make a comment.

    The format of a submitted textarea is reasonably well specified in the
    real HTML specification,
    http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.7
    (in conjunction with
    http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.3 ).

    (This is not confuddled by proprietary "wrap=" attributes, which are
    implemented in diverse and confusing ways. Obviously I've noted the
    subsequent discussion about browsers inserting newlines for local
    display purposes only, and not sending them as part of the submitted
    data).

    Anyhow, my point is that the submitted data (once the form submission
    encoding layer has been unwrapped at the server side) is in principle
    *plain text*.

    Sure, that plain text *could* be HTML "source", or equally it could be
    C++ source or a Perl script or... just plain *plain text*.

    The idea of simply stuffing-in <br> tags wherever a newline is seen in
    the source is quite bizarre to me. If you want to produce proper HTML
    from what was meant to be plain text then you need a properly defined
    procedure for doing so (you see such functionality in the
    editing features of various Wikis, for example).

    On the other hand if your users are expecting to be inputting HTML
    "source code", you sure don't want to go inserting unsolicited tags.
    You might very well want to analyze the input for potentially
    compromising markup, though (scripting attacks and such).

    What's "it" meant to be in this sentence? Have we even understood
    what it is that the O.P is intending to achieve? Whatever it is, I'm
    highly sceptical of the server-side processing merely sprinkling the
    input with <br> tags instead of newlines, and nothing more: it does
    not seem to be a solution to any variant of this problem that I can
    think of. BICBW, of course.

    regards
     
    Alan J. Flavell, Jul 21, 2006
    #15
  16. Joseph Czapski

    David Squire Guest

    Hmmm. I see it so often that I would almost call it a FAQ. People ask
    "where did my linebreaks go?" when displaying text in a browser. This is
    due to not realizing that HTML does not use CR, LF etc. for this purpose.

    A common situation where this might arise is a simple comment field
    where the comment typed is to be displayed on an HTML page, and the
    designer wants user newlines to be retained in formatting. Often <BR>
    tags are all that is needed to get the desired effect... and indeed the
    OP has already indicated that doing just that solved his problem.

    DS
     
    David Squire, Jul 21, 2006
    #16
  17. But the input was *NOT* meant to be HTML in the first place, so
    attempting to display it as such is completely illogical. If it's
    plain text, then send it as text/plain. Even MSIE has finally caught
    up with that concept.

    Yeah, and then the mischievous user inserts some naughty javascript,
    or includes a link to some dangerous web page, and soon the damage is
    done.

    *Absolutely not*. Have you *no* sense of network security?

    The "desired effect" is not half of what you're liable to get, if you
    allow arbitrary web users to type their choice of HTML and you calmly
    insert it into your web page.

    It might have "solved" what the O.P perceived to be the problem. After
    all, the (in)famous Matt would have had no idea when he launched his
    Script Archive just what kinds of network abuse he would be
    responsible for.

    --
     
    Alan J. Flavell, Jul 21, 2006
    #17
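
    The security point can be made concrete: if submitted plain text is to
    be shown inside an HTML page, the HTML metacharacters need escaping
    *before* any <br> tags are added, so that user-typed markup is
    displayed rather than executed. A minimal sketch follows. Both
    function names are hypothetical, and the hand-rolled escape_html is
    for illustration only; in real code a maintained routine such as
    escapeHTML from CGI.pm or encode_entities from HTML::Entities would
    be the safer choice.

```perl
use strict;
use warnings;

# Illustrative escaper for the characters that are special in HTML.
# The ampersand must be handled first, or the entities produced by
# the later substitutions would themselves be escaped.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Escape first, then turn each line break (CRLF, CR or LF) into <br>.
sub text_to_html {
    my ($text) = @_;
    my $html = escape_html($text);
    $html =~ s/\x0D\x0A?|\x0A/<br>\n/g;
    return $html;
}

print text_to_html("x & y\x0D\x0A<script>"), "\n";
```

    Doing the steps in the opposite order would escape the <br> tags as
    well, which is why the order matters here.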
  18. Joseph Czapski

    David Squire Guest

    I don't agree with this. You could see it as a terribly simple Wiki
    code: only newlines are significant as extra mark-up. There are all
    sorts of Wikis around now that take non-HTML mark-up entered as plain
    text in forms and convert it to HTML.

    Fair enough. Point taken. There would have to be other sanity checks too.


    DS
     
    David Squire, Jul 21, 2006
    #18
  19. ....

    Sorry for the further confusion. Now I think that *eliminating* the WRAP
    attribute entirely is the best thing to do. And my code snippet after
    getting the $value back from CGI.pm is:

    $value =~ s/(\S)\s*?\x0A\s*\x0A\s*?(\S)/$1<br><br>$2/g;
    $value =~ s/(\S)\s*?\x0D\s*\x0D\s*?(\S)/$1<br><br>$2/g;
    $value =~ s/\s*\x0A\s*/<br>/g;
    $value =~ s/\s*\x0D\s*/<br>/g;


    Joe Czapski
    Boston, Mass.
     
    Joseph Czapski, Jul 21, 2006
    #19
  20. Joseph Czapski

    Dr.Ruud Guest

    Joseph Czapski schreef:
    $value =~ s/(\S)\s*?(\x0A|\x0D)\s*\2\s*?(\S)/$1<br><br>$3/g ;

    or maybe

    $value =~ s/\s*(?:\x0A|\x0D)\s*/<br>/g ;

    I would also do a s/(<br>)(.)/$1\n$2/g ;
     
    Dr.Ruud, Jul 21, 2006
    #20
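
    On Perl 5.10 or later, the alternation of \x0A and \x0D can be
    replaced by the generic line-break escape \R, which treats a CRLF pair
    as a single break. The sketch below combines that with the suggestion
    of putting a real newline after each <br> to keep the generated source
    readable. The function name newlines_to_br is hypothetical, and note
    that \R also matches a few other vertical-whitespace characters, which
    may or may not be wanted.

```perl
use strict;
use warnings;
use 5.010;   # \R needs Perl 5.10 or later

# Hypothetical helper: every line break, whatever the platform
# convention, becomes "<br>" followed by a newline.
sub newlines_to_br {
    my ($text) = @_;
    $text =~ s/\R/<br>\n/g;
    return $text;
}

print newlines_to_br("a\x0D\x0Ab\x0Ac\x0Dd"), "\n";
```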
