Preventing binary input

Discussion in 'C++' started by Avinash, Apr 12, 2007.

  1. Avinash

    Avinash Guest

    I am writing an application that takes a file as an input.
    I want to avoid binary files that have been specified by the user.
    Is there any way to detect that a file contains binary data?

    Thanks,
    Avinash.
     
    Avinash, Apr 12, 2007
    #1

  2. Avinash wrote:
    > I am writing an application that takes a file as an input.
    > I want to avoid binary files that have been specified by the user.
    > Is there any way to detect that a file contains binary data?


    No. The distinction between "binary" and "text" is purely
    speculative and exists only in our heads. The same data
    repository [on any external storage] can be opened as binary
    or text depending on our intentions.

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Apr 12, 2007
    #2

  3. Avinash wrote:
    > I am writing an application that takes a file as an input.
    > I want to avoid binary files that have been specified by the user.
    > Is there any way to detect that a file contains binary data?
    >


    You could parse the file for as long as you don't encounter an
    illegal codepoint, and terminate as soon as you do. What counts as
    an 'illegal codepoint' is up to you. Pre-parse the file to validate it.

    --
    OU
     
    Obnoxious User, Apr 12, 2007
    #3
  4. James Kanze

    James Kanze Guest

    On Apr 12, 7:50 pm, "Avinash" wrote:
    > I am writing an application that takes a file as an input.
    > I want to avoid binary files that have been specified by the user.
    > Is there any way to detect that a file contains binary data?


    It depends on the system. On some systems, you can't open a
    binary file in text mode, so the open would fail. This isn't
    the case for Unix or Windows, however. Other than that, you can
    look at the first n bytes (for some appropriate n), and use a
    heuristic to guess: historically, the presence of nul bytes, for
    example, or for that matter, on most systems, the presence of
    any bytes in the ranges 0x00-0x06 or 0x0E-0x1F generally means
    binary; so may (often) a byte in the range 0x80-0x9F. This
    depends on the text encoding used, however: the sequence 0xC3,
    0x89 is a capital E with an acute accent in UTF-8, and nul bytes
    are likely to be pretty frequent if UTF-16 or UTF-32 is used.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Apr 13, 2007
    #4
