Help with cleaning input text - removing control characters

Discussion in 'Javascript' started by Peter O'Reilly, Aug 5, 2004.

  1. I have an HTML form with a textarea input box. When the user conducts a
    post request (e.g. clicks the submit button), an HTML preview page is
    presented to them with the information they have filled out in the prior
    page's form elements.

    Naturally, some users like to copy and paste text into the textarea box,
    presumably from, say, a word processor. Some Macintosh-based
    users I know of experience problems with foreign-looking characters
    appearing in the HTML output, i.e., tiny square boxes. The server processing
    their requests is PC/Microsoft Windows (2000) based.

    To fix the problem, I know this is a matter of removing certain control
    characters. I would like to write some client side Javascript validation
    code to handle this.

    The problem for me is two-fold. I do not have a Mac/PowerPC to use for
    testing. I am not all that familiar with Macs, nor do I know which control
    characters to screen for. (About the only thing I know is that Mac and Windows
    use different control characters for line breaks: line feeds, carriage
    returns, or both.)

    Can someone shed some light on this for me? For example, which characters
    to look for in parsing strings, i.e. \n, \t, etc. Thanks.

    --
    Peter O'Reilly
     
    Peter O'Reilly, Aug 5, 2004
    #1

  2. Peter O'Reilly

    Evertjan. Guest

    Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:

    > To fix the problem, I know this is a matter of removing certain control
    > characters. I would like to write some client side Javascript validation
    > code to handle this.
    >


    <input
    onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')"
    >


    removes anything that is not alphanumeric or space after loss of focus

    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Aug 5, 2004
    #2

  3. Peter O'Reilly

    Mick White Guest

    Evertjan. wrote:

    > Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:
    >
    >
    >>To fix the problem, I know this is a matter of removing certain control
    >>characters. I would like to write some client side Javascript validation
    >>code to handle this.
    >>

    >
    >
    > <input
    > onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')"


    onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"

    Replace any character that is not a-z or a number with a space.
    Better, no?
    Mick
    >
    >
    > removes anything that is not alphanumeric or space after loss of focus
    >
     
    Mick White, Aug 5, 2004
    #3
  4. Peter O'Reilly

    Mick White Guest

    Mick White wrote:

    > Evertjan. wrote:
    >
    >> Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:
    >>
    >>
    >>> To fix the problem, I know this is a matter of removing certain control
    >>> characters. I would like to write some client side Javascript
    >>> validation
    >>> code to handle this.
    >>>

    >>
    >>
    >> <input
    >> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')"

    >
    >
    > onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"
    >
    > Replace any character that is not a-z or a number with a space.
    > Better, no?
    > Mick


    Oops, you're right, I didn't notice the space in your "not" character set.
    Mick
    >
    >>
    >>
    >> removes anything that is not alphanumeric or space after loss of focus
    >>
     
    Mick White, Aug 5, 2004
    #4
  5. Peter O'Reilly

    Evertjan. Guest

    Mick White wrote on 05 aug 2004 in comp.lang.javascript:
    >> <input
    >> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')"

    >
    > onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"
    >
    > Replace any character that is not a-z or a number with a space.
    > Better, no?
    >


    Better, yes.
    But not quite complete:

    ==============

    Replace any group of characters that are
    not a-z
    or A-Z
    or a number
    or a space
    with a space:

    onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"

    [this will leave multiple spaces as they are,
    but replace each run of disallowed characters with one space]

    ==============

    Replace any character that is
    not a-z
    or A-Z
    or a number
    or a space
    with a space:

    onchange="this.value=this.value.replace(/[^a-z\d ]/ig,' ')"

    [this will leave multiple spaces as they are,
    and replace each disallowed character with its own space]


    ==============

    Replace any group of characters that are
    not a-z
    or A-Z
    or a number
    with a space:

    onchange="this.value=this.value.replace(/[^a-z\d]+/ig,' ')"

    [this will replace multiple white space with one space,
    and also replace each run of other disallowed characters with one space]

    ===============

    not tested, beware of any silly mistake.
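
    A quick way to check the three variants above is to run them against one
    string. This is only a sketch: the sample text is made up, and the names
    `sample`, `v1`, `v2` and `v3` are illustrative, not from the thread.

```javascript
// Hypothetical sample input: a comma, a run of two spaces, "!!" and a tab.
const sample = "Hello,  world!!\tOK";

// Variant 1: a run of disallowed characters becomes one space;
// the space itself is allowed, so existing spaces are kept as they are.
const v1 = sample.replace(/[^a-z\d ]+/ig, ' ');

// Variant 2: no "+", so every disallowed character becomes its own space.
const v2 = sample.replace(/[^a-z\d ]/ig, ' ');

// Variant 3: the space is no longer allowed, so any run of
// non-alphanumerics (spaces included) collapses to a single space.
const v3 = sample.replace(/[^a-z\d]+/ig, ' ');

// JSON.stringify makes the number of spaces visible in the console.
console.log(JSON.stringify(v1)); // "Hello   world OK"   (3 spaces, then 1)
console.log(JSON.stringify(v2)); // "Hello   world   OK" (3 spaces, then 3)
console.log(JSON.stringify(v3)); // "Hello world OK"
```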

    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Aug 5, 2004
    #5
  6. Evertjan & Mick,

    Thank you both for the very helpful replies and code samples.

    To be honest though, I am a little bit uncomfortable with the "what to
    allow" approach. Don't get me wrong, your regular expressions are great, but
    I'm afraid it may be a bit too aggressive in replacing text. For example,
    consideration must be given to characters like !, @, #, $, ~, etc. Of
    course, those characters can always be added to the regular expression. I'm
    afraid I will not think of all possible allowable characters.

    Instead, a "what not to allow" approach would be ideal,
    e.g. specifically targeting those few characters to screen out. What those
    characters are is a mystery to me.
    Perhaps String.charCodeAt() approach is needed?
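
    A charCodeAt() pass could at least reveal what the mystery characters are.
    The sketch below assumes a problem sample can be obtained from an affected
    user; `listSuspectChars` is a hypothetical helper name, and treating tab,
    line feed and carriage return as "expected" is an assumption.

```javascript
// Hypothetical helper: report every character outside printable ASCII.
// Tab (9), line feed (10) and carriage return (13) are treated as expected.
function listSuspectChars(text) {
  const found = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const expected = code === 9 || code === 10 || code === 13 ||
                     (code >= 32 && code <= 126);
    if (!expected) {
      // Record the position and the code in decimal and hex.
      found.push({ index: i, code: code, hex: code.toString(16) });
    }
  }
  return found;
}

// Example: a vertical tab (0x0b) and a curly apostrophe (U+2019).
console.log(listSuspectChars("a\x0bb\u2019"));
```

    Once the codes are known, they can be listed explicitly in a character
    class instead of guessing.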

    Thanks again/dank u wel.

    --
    Peter O'Reilly
     
    Peter O'Reilly, Aug 5, 2004
    #6
  7. Peter O'Reilly

    Evertjan. Guest

    Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:

    > Instead, a "what not to allow" approach would be most ideal,
    > e.g. specifically targeting those few characters to screen out. What
    > those characters are is a mystery to me.


    If you do not know the character or its ASCII value or its Unicode value,
    it will be very difficult to specify a positive exclusion, Peter.

    onchange="this.value=this.value.replace(/[@\\\n\x08\x1b\u00A9]+/ig,' ')"

    This will exclude:
    The @
    the \ itself (\\)
    the linefeed char (\n)
    the backspace (\x08 = hex 8)
    the escape (\x1b = hex 1b = decimal 27)
    the unicode copyright symbol (\u00A9 = ©)
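
    If the offending characters turn out to be ASCII control characters, the
    same exclusion technique can target them as a range. This is a sketch
    under assumptions the thread does not confirm: that tabs and line breaks
    should be kept, and that CRLF (Windows) and lone CR (classic Mac OS) line
    endings should be normalized to LF. `cleanControlChars` is a hypothetical
    name.

```javascript
// Hypothetical helper: normalize line endings, then drop control characters.
function cleanControlChars(text) {
  return text
    // CRLF or a lone CR becomes a single LF.
    .replace(/\r\n?/g, '\n')
    // Remove the remaining C0 controls and DEL, keeping tab (\x09) and LF (\x0A).
    .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
}

// Example: mixed line endings, a backspace and a tab.
console.log(JSON.stringify(cleanControlChars("a\r\nb\rc\x08d\te")));
// "a\nb\ncd\te"
```

    This only handles control characters. Square boxes caused by a character
    encoding mismatch (for example, curly quotes pasted from a word processor)
    would not be removed by it.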


    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Aug 5, 2004
    #7
  8. > If you do not know the character or its ASCII value or its Unicode value,
    > it will be very difficult to specify a positive exclusion, Peter.


    Evertjan, it's good to see that you are finally catching on. If someone
    could shed some more light on the original query, that would be great.
    I'm sure someone else here must have experienced such a problem and found a
    solution for it.

    In particular, information on the character encoding(s) used
    by English-language Macintosh users
    (versus the IBM-PC/OEM ASCII character set I am accustomed to) would be
    helpful.


    --
    Peter "UTF-8" O'Reilly
     
    Peter O'Reilly, Aug 6, 2004
    #8
  9. On Thu, 05 Aug 2004 16:33:49 +0000, Evertjan. wrote:

    > <input
    > onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')"


    Don't forget to perform this validation on the server side, too, for those
    with JavaScript disabled in their browser.

    La'ie Techie
     
    Lāʻie Techie, Aug 7, 2004
    #9