Detect non-ASCII substrings in a file

Discussion in 'Ruby' started by killy971, Jun 19, 2008.

  1. killy971

    killy971 Guest

    I have files encoded in Shift_JIS that mainly contain JSP source
    code (ASCII), but sometimes also contain non-ASCII strings
    (Japanese words).

    So I would like to know whether there is a way in Ruby to:
    - detect files containing anything other than ASCII,
    - extract the non-ASCII strings that were found.

    Thank you!
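    With a Ruby that has the String encoding API (1.9 or later, so newer than this 2008 thread), one way to do both tasks is to label the raw bytes as Shift_JIS, transcode to UTF-8, then use String#ascii_only? for detection and a regular expression for extraction. This is a minimal sketch, assuming the input is valid Shift_JIS; the method names are hypothetical. If the files use Microsoft's vendor extensions, Encoding::Windows_31J may be the better source label.

```ruby
# Minimal sketch, assuming Ruby 2.0+ and valid Shift_JIS input.
# Method names are hypothetical, not part of any library.

# Reads the raw bytes, labels them Shift_JIS, and transcodes to UTF-8 so
# that regular expressions match whole characters. Raises
# Encoding::InvalidByteSequenceError if the bytes are not valid Shift_JIS.
def read_sjis(path)
  File.binread(path).force_encoding(Encoding::Shift_JIS).encode(Encoding::UTF_8)
end

# True when the file contains only ASCII characters.
def ascii_only_file?(path)
  read_sjis(path).ascii_only?
end

# Returns each maximal run of non-ASCII characters, in file order.
def non_ascii_runs(path)
  read_sjis(path).scan(/[^\x00-\x7F]+/)
end

# Command-line use: print every non-ASCII file and the strings it contains.
if __FILE__ == $PROGRAM_NAME
  ARGV.each do |path|
    next if ascii_only_file?(path)
    puts path
    non_ascii_runs(path).each { |run| puts "  #{run}" }
  end
end
```

    Run as `ruby find_non_ascii.rb *.jsp` (hypothetical script name); the extracted strings come back as UTF-8, ready to print or write elsewhere.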
     
    killy971, Jun 19, 2008
    #1

  2. Ron Fox

    Ron Fox Guest

    Any character that has the top bit clear (0x00-0x7F) is potentially
    valid ASCII, although the non-printing control characters in that
    range form an additional exclusion set.
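    The non-printing exclusion set can be checked on its own. This is a minimal sketch, assuming tab, LF and CR are the only control characters that belong in the files (ALLOWED_CONTROLS and the method name are hypothetical choices). It reports every other byte in 0x00-0x1F or 0x7F. Scanning raw Shift_JIS this way is safe because the second byte of a double-byte character is always in 0x40-0x7E or 0x80-0xFC, so it is never mistaken for a control character.

```ruby
# Control characters assumed to be legitimate in source text: tab, LF, CR.
ALLOWED_CONTROLS = [0x09, 0x0A, 0x0D].freeze

# Returns [offset, byte] pairs for ASCII control bytes (0x00-0x1F, 0x7F)
# that are not in ALLOWED_CONTROLS. Works on raw bytes, so no decoding
# is needed and the input's encoding label does not matter.
def unexpected_control_bytes(data)
  found = []
  data.b.each_byte.with_index do |byte, offset|
    control = byte < 0x20 || byte == 0x7F
    found << [offset, byte] if control && !ALLOWED_CONTROLS.include?(byte)
  end
  found
end
```

    For a file, pass the raw contents, e.g. `unexpected_control_bytes(File.binread(path))`.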
    According to http://en.wikipedia.org/wiki/Shift-JIS, testing for
    character codes with the top bit set should indicate either katakana
    or double-byte characters. See the chart there for which ranges are
    double-byte, which are single-byte, and which are not legal.
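    The byte-level test can be sketched in Ruby as a walk over the raw bytes. This is a minimal sketch, assuming the usual Shift_JIS layout (lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC, half-width katakana 0xA1-0xDF); check the ranges against the chart for your files. It also shows the one subtlety of a pure top-bit test: a trail byte can fall in 0x40-0x7E and look like ASCII (本 is 0x96 0x7B, where 0x7B is "{"), so it must be consumed together with its lead byte. The constant and method names are hypothetical.

```ruby
# Sketch for raw Shift_JIS data (Ruby 2.0+). 0xF0-0xFC are user-defined or
# vendor-extension lead bytes; narrow SJIS_LEAD if your files never use them.
SJIS_LEAD  = [0x81..0x9F, 0xE0..0xFC].freeze # first byte of a double-byte char
SJIS_TRAIL = [0x40..0x7E, 0x80..0xFC].freeze # valid second bytes
SJIS_KANA  = 0xA1..0xDF                      # single-byte half-width katakana

# Returns the non-ASCII runs as binary strings of raw Shift_JIS bytes.
# Raises ArgumentError on an illegal byte or a bad or missing trail byte.
def sjis_non_ascii_runs(data)
  bytes = data.b.bytes
  runs = []
  current = []
  i = 0
  while i < bytes.size
    b = bytes[i]
    if b < 0x80
      # Top bit clear: a single-byte ASCII character ends the current run.
      runs << current.pack("C*") unless current.empty?
      current = []
      i += 1
    elsif SJIS_KANA.cover?(b)
      current << b
      i += 1
    elsif SJIS_LEAD.any? { |r| r.cover?(b) }
      trail = bytes[i + 1]
      unless trail && SJIS_TRAIL.any? { |r| r.cover?(trail) }
        raise ArgumentError, format("bad trail byte after offset %d", i)
      end
      # Keep the trail byte with its lead byte even if it is in 0x40-0x7E.
      current << b << trail
      i += 2
    else
      raise ArgumentError, format("illegal byte 0x%02X at offset %d", b, i)
    end
  end
  runs << current.pack("C*") unless current.empty?
  runs
end
```

    Each run can be turned into readable text with `run.force_encoding(Encoding::Shift_JIS).encode(Encoding::UTF_8)`.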

    RF

    --
    Ron Fox
    NSCL
    Michigan State University
    East Lansing, MI 48824-1321
     
    Ron Fox, Jun 19, 2008
    #2

Similar Threads
  1. TOXiC
    Replies:
    5
    Views:
    1,325
    TOXiC
    Jan 31, 2007
  2. Vlastimil Brom
    Replies:
    1
    Views:
    938
    John Nagle
    Aug 22, 2010
  3. Michel Claveau - MVP
    Replies:
    3
    Views:
    459
    John Machin
    Aug 22, 2010
  4. bruce
    Replies:
    38
    Views:
    322
    Mark Lawrence
    Nov 1, 2013
  5. MRAB
    Replies:
    0
    Views:
    116