Sanitizing user inputs in multiple languages

Discussion in 'Perl Misc' started by Sam, Apr 5, 2007.

  1. Sam

    Sam Guest

    An application I am developing needs to accept input in any of the 15
    languages I've opted for, from a single, common HTML form.

    I generally sanitize user inputs from an HTML form by specifying a
    list of allowed characters. How can I do a similar sanitization for
    inputs that can be in any of the 15 languages if the encoding is
    UTF-8? Is there a better method?
     
    Sam, Apr 5, 2007
    #1

  2. Sam wrote:
    > An application I am developing needs to accept input in any of the 15
    > languages I've opted for, from a single, common HTML form.
    >
    > I generally sanitize user inputs from an HTML form by specifying a
    > list of allowed characters. How can I do a similar sanitization for
    > inputs that can be in any of the 15 languages if the encoding is
    > UTF-8?


    Quite simple, actually. Just take the union of the white lists of
    characters for each language.
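
    A minimal sketch of that idea, assuming the input has already been
    decoded into a Perl character string. The %allowed hash, its four
    example languages and the is_allowed helper are hypothetical and only
    illustrate how per-language lists can be merged into one character class:

```perl
use strict;
use warnings;

# Hypothetical per-language white lists, written as character-class
# fragments. \x{..} escapes and \p{..} properties refer to code points.
my %allowed = (
    english => 'A-Za-z',
    german  => 'A-Za-z\x{C4}\x{D6}\x{DC}\x{E4}\x{F6}\x{FC}\x{DF}',
    greek   => '\p{Greek}',
    arabic  => '\p{Arabic}',
);

# The union of all lists is every fragment placed in one character class.
# Digits and the space character are added for all languages.
my $class = join '', map { $allowed{$_} } sort keys %allowed;
my $valid = qr/\A[${class}0-9 ]+\z/;

# Returns 1 if every character of $text is in the combined white list.
sub is_allowed {
    my ($text) = @_;    # expects a decoded character string, not raw bytes
    return $text =~ $valid ? 1 : 0;
}
```

    Adding a sixteenth language would then mean adding one more entry to
    the hash rather than editing the regex itself.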

    > Is there a better method?


    Depends on your definition of "better".
    More secure? Probably not; white lists are much more secure than black lists.
    Easier? Well, depends on your definition of "sanitize". If you want to e.g.
    eliminate x-site scripting, then you can simply remove those few characters
    that are known to cause x-site scripting. There are modules to do that.
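
    As a rough illustration, assuming the text will end up in HTML element
    content or a quoted attribute value, the few characters in question
    can either be deleted or replaced by entities. The function names below
    are hypothetical; the HTML::Entities module on CPAN offers
    encode_entities for the escaping variant.

```perl
use strict;
use warnings;

# Characters with special meaning in HTML and their entity replacements.
my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Option 1: remove the characters outright.
sub strip_html_chars {
    my ($text) = @_;
    $text =~ tr/&<>"'//d;
    return $text;
}

# Option 2: keep the text but neutralize it for HTML output.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}
```

    Escaping keeps names such as O'Brien intact, whereas stripping
    silently changes what the user typed.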

    jue
     
    Jürgen Exner, Apr 5, 2007
    #2

  3. Sam

    Sam Guest

    ( Discussion thread in Google Group -
    http://groups.google.com/group/perl...8e68b039cd2/95f19485ac944239#95f19485ac944239
    )

    Sam wrote:
    > How can I do a similar sanitization for
    > inputs that can be in any of the 15 languages if the encoding is
    > UTF-8?


    Jürgen Exner wrote:
    > Quite simple, actually. Just take the union of the white lists of
    > characters for each language.


    Yes, I had something similar in mind too, but I am stuck on how to
    actually implement it. Here's what I've picked up so far:

    1. First, make sure that the input is UTF-8 encoded (
    http://www.w3.org/International/questions/qa-forms-utf-8 ).
    2. Select the allowed characters in each language (all letters and
    digits) and use them in a regex.
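
    The two steps above can be sketched as follows, assuming the form
    field arrives as raw bytes. clean_field is a hypothetical helper, and
    the \p{L}, \p{M} and \p{N} properties (letters, combining marks,
    numbers) stand in for hand-picked per-language lists:

```perl
use strict;
use warnings;
use Encode ();    # core module since Perl 5.8

# Returns the decoded string if $bytes is well-formed UTF-8 consisting
# only of letters, combining marks, numbers and spaces.
# Returns nothing (undef in scalar context) otherwise.
sub clean_field {
    my ($bytes) = @_;

    # Step 1: strict decoding; FB_CROAK makes malformed input die.
    my $text = eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK) };
    return unless defined $text;

    # Step 2: white list by Unicode general category.
    return unless $text =~ /\A[\p{L}\p{M}\p{N} ]+\z/;

    return $text;
}
```

    Matching on the decoded string means the regex sees characters, so
    a multi-byte letter counts as one item in the character class.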

    For implementing the second step, I found a useful UTF-8 encoding
    table which lists the Unicode code point and the UTF-8 hex bytes for
    the characters in each language ( http://www.utf8-chartable.de/ ).

    Here's the problem: how do I identify the important characters of
    a language I don't know? For example, I know some of the letters of
    the Arabic alphabet, but don't really know if characters like (for
    example) the ARABIC POETIC VERSE SIGN are necessary. Second, how do I
    use the UTF-8 hex codes for the characters in a regex?
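
    On the regex half of the question, a small sketch, assuming the
    input is decoded first so that the pattern can use code points with
    \x{...} rather than the UTF-8 bytes. The variable names and the
    U+0621 to U+064A range are illustrative assumptions only; the core
    charnames module is used to look up a character's Unicode name.

```perl
use strict;
use warnings;
use charnames ();    # core module: maps code points to Unicode names

# ARABIC LETTER ALEF is code point U+0627 (bytes D8 A7 in UTF-8).
# On a decoded string, the regex refers to the code point.
my $alef = "\x{0627}";
my $is_alef      = $alef =~ /\A\x{0627}\z/            ? 1 : 0;
my $in_range     = $alef =~ /\A[\x{0621}-\x{064A}]\z/ ? 1 : 0;

# U+060E can be checked against a Unicode property instead of
# guessing whether it belongs in the list of letters.
my $sign           = "\x{060E}";
my $sign_name      = charnames::viacode(0x060E);
my $sign_is_letter = $sign =~ /\A\p{L}\z/ ? 1 : 0;
```

    A property such as \p{L} lets Unicode's own classification decide
    what counts as a letter, which helps with scripts one cannot read.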

    Somebody must have a better solution ... I do get the feeling that
    this approach isn't great.

    Jürgen Exner wrote:
    > Well, depends on your definition of "sanitize". If you want to e.g.
    > eliminate x-site scripting, ...


    Yes, the final intention is to prevent x-site scripting, but I am not
    aware of any widely used, popular modules for this.
     
    Sam, Apr 5, 2007
    #3