find a pattern in binary file

Discussion in 'C++' started by vizzz, Jun 20, 2008.

  1. vizzz

    vizzz Guest

    Hi there,
    I need to find a hex pattern like 0x650A1010 in a binary file.
    I could write a small algorithm that scans the whole file for a match,
    but the file is huge, and I'm worried about performance.
    Is there an STL method for a fast search?
    Andrea
     
    vizzz, Jun 20, 2008
    #1

  2. vizzz

    Kai-Uwe Bux Guest

    vizzz wrote:

    > Hi there,
    > i need to find an hex pattern like 0x650A1010 in a binary file.
    > i can make a small algorithm that fetch all the file for the match,
    > but this file is huge, and i'm scared about performances.
    > Is there any stl method for a fast search?


    You could try std::search() with istreambuf_iterator< unsigned char >.

    However:

    (a) It is not clear that you will get good performance. Some implementations
    are not really all that good with stream iterators.

    (b) I am not sure whether search() is allowed to use backtracking
    internally, in which case you cannot use it with stream iterators. You
    should check.

    (c) Even if search finds an occurrence, it reports the result as an
    iterator. I do not know of a convenient way to convert that into an offset.


    Maybe, rolling your own is not all that bad. You could read the file in
    chunks (keeping the last three characters from the previous block) and use
    std::search() on the blocks. With the right blocksize, this could be really
    fast.


    If your OS allows memory mapping of the file, you could do that and use
    std::search() with unsigned char * on the whole thing. That could be the
    fastest way, but it leaves the realm of standard C++.


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Jun 20, 2008
    #2

  3. vizzz

    Ivan Guest

    On Jun 20, 1:11 pm, vizzz <> wrote:
    > Hi there,
    > i need to find an hex pattern like 0x650A1010 in a binary file.
    > i can make a small algorithm that fetch all the file for the match,
    > but this file is huge, and i'm scared about performances.
    > Is there any stl method for a fast search?
    > Andrea


    Hmmm... I had a look at this and ran across a simple problem: how do
    you read a binary file and just echo the hex for each byte to the screen?
    The issue is that the C++ read function doesn't return the number of bytes
    read, so on the last read into a buffer, how do you know how many
    characters to print?

    Thanks,
    Ivan Novick
    http://www.mycppquiz.com
     
    Ivan, Jun 21, 2008
    #3
  4. vizzz

    Kai-Uwe Bux Guest

    Ivan wrote:

    > On Jun 20, 1:11 pm, vizzz <> wrote:
    >> Hi there,
    >> i need to find an hex pattern like 0x650A1010 in a binary file.
    >> i can make a small algorithm that fetch all the file for the match,
    >> but this file is huge, and i'm scared about performances.
    >> Is there any stl method for a fast search?
    >> Andrea

    >
    > Hmmm... I had a look at this and ran across a simple problem. How do
    > you read a binary file and just echo the HEX for byte to the screen.


    #include <iostream>
    #include <ostream>
    #include <fstream>
    #include <iterator>
    #include <iomanip>
    #include <algorithm>
    #include <cassert>

    class print_hex {

        std::ostream * ostr_ptr;
        unsigned int line_length;
        unsigned int index;

    public:

        print_hex ( std::ostream & str_ref, unsigned int length )
            : ostr_ptr( &str_ref )
            , line_length ( length )
            , index ( 0 )
        {}

        // Print one byte as two hex digits; end the line after line_length bytes.
        void operator() ( unsigned char ch ) {
            ++index;
            if ( index >= line_length ) {
                (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
                            << (unsigned int)(ch) << '\n';
                index = 0;
            } else {
                (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
                            << (unsigned int)(ch) << ' ';
            }
        }

    };

    int main ( int argn, char ** args ) {
        assert( argn == 2 );
        // Binary mode, so that no newline translation is applied to the data.
        std::ifstream in ( args[1], std::ios::binary );
        std::for_each( std::istreambuf_iterator< char >( in ),
                       std::istreambuf_iterator< char >(),
                       print_hex( std::cout, 25 ) );
        std::cout << '\n';
    }


    > The issue is the c++ read function doesn't return number of bytes
    > read... so on the last read into a buffer how do you know how many
    > characters to print?


    Have a look at readsome().



    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Jun 21, 2008
    #4
  5. vizzz

    Eric Pruneau Guest

    "vizzz" <> wrote in message news:
    ...
    > Hi there,
    > i need to find an hex pattern like 0x650A1010 in a binary file.
    > i can make a small algorithm that fetch all the file for the match,
    > but this file is huge, and i'm scared about performances.
    > Is there any stl method for a fast search?
    > Andrea


    Check out boost::regex

    http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html
     
    Eric Pruneau, Jun 21, 2008
    #5
  6. vizzz

    James Kanze Guest

    On Jun 20, 10:43 pm, Kai-Uwe Bux <> wrote:
    > vizzz wrote:


    > > i need to find an hex pattern like 0x650A1010 in a binary
    > > file. i can make a small algorithm that fetch all the file
    > > for the match, but this file is huge, and i'm scared about
    > > performances. Is there any stl method for a fast search?


    > You could try std::search() with istreambuf_iterator< unsigned char >.


    That's very problematic. istreambuf_iterator< unsigned char >
    will expect a basic_streambuf< unsigned char >, which isn't
    defined by the standard (and you're not allowed to define it).
    A number of implementations do provide a generic version of
    basic_streambuf, but since the standard doesn't say what the
    generic version should do, they tend to differ. (I remember
    sometime back someone posting in fr.comp.lang.c++ that he had
    problems because g++ and VC++ provide incompatible generic
    versions.)

    It would, I suppose, be possible to use istream_iterator<
    unsigned char >, provided the file was opened in binary mode,
    and you reset skipws. I have my doubts about the performance of
    this solution, but it's probably worth a try---if the
    performance turns out to be acceptable, you won't get much
    simpler.

    Except, of course, that search requires forward iterators, and
    won't (necessarily) work with input iterators.

    [...]
    > Maybe, rolling your own is not all that bad. You could read
    > the file in chunks (keeping the last three characters from the
    > previous block) and use std::search() on the blocks. With the
    > right blocksize, this could be really fast.


    A lot depends on other possible constraints. He didn't say, but
    his example was to look for 0x650A1010, not the sequence 0x65,
    0x0A, 0x10, 0x10. If what he is really looking for is a four
    byte word, correctly aligned, then as long as the block size is
    a multiple of 4, he could use search() with an
    iterator::value_type of uint32_t. For arbitrary positions and
    sequences, on the other hand, some special handling might be
    necessary for cases where the sequence spans a block boundary.

    When I had to do something similar, I reserved a guard zone in
    front of my buffer, and used a BM search in the buffer. When
    the BM search would have taken me beyond the end of the buffer,
    I copied the last N bytes of the buffer into the end of the
    guard zone before reading the next block, and started my next
    search from them. This would probably make keeping track of the
    offset a bit tricky (I didn't need the offset), and for the best
    performance on the system I was using then, I had to respect
    alignment of the buffer as well, which also added some extra
    complexity. (But I got the speed we needed:).)

    > If your OS allows memory mapping of the file, you could do
    > that and use std::search() with unsigned char * on the whole
    > thing. That could be the fastest way, but will leave the realm
    > of standard C++.


    If the entire file will fit into memory, perhaps just reading it
    all into memory, and then using std::search, would be an
    appropriate solution. Or perhaps not: it's often faster to use
    a somewhat smaller buffer, and manage the "paging" yourself.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 21, 2008
    #6
  7. vizzz

    James Kanze Guest

    On Jun 21, 2:13 am, Kai-Uwe Bux <> wrote:
    > Ivan wrote:
    > > On Jun 20, 1:11 pm, vizzz <> wrote:


    > > Hmmm... I had a look at this and ran across a simple
    > > problem. How do you read a binary file and just echo the
    > > HEX for byte to the screen.


    > #include <iostream>
    > #include <ostream>
    > #include <fstream>
    > #include <iterator>
    > #include <iomanip>
    > #include <algorithm>
    > #include <cassert>


    > class print_hex {


    > std::ostream * ostr_ptr;
    > unsigned int line_length;
    > unsigned int index;


    > public:


    > print_hex ( std::ostream & str_ref, unsigned int length )
    > : ostr_ptr( &str_ref )
    > , line_length ( length )
    > , index ( 0 )
    > {}


    > void operator() ( unsigned char ch ) {
    > ++index;
    > if ( index >= line_length ) {
    > (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
    > << (unsigned int)(ch) << '\n';
    > index = 0;
    > } else {
    > (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
    > << (unsigned int)(ch) << ' ';


    Wouldn't it be preferable to set the formatting flags in the
    constructor? I'd also provide an "indent" argument; if index
    were 0, I'd output indent spaces, otherwise a single space---or
    perhaps the best solution would be to provide a start of line
    and a separator string to the constructor, then:

    (*ostr_ptr)
        << (inLineCount == 0 ? startString : separString)
        << std::setw( 2 ) << (unsigned int)( ch ) ;
    ++ inLineCount ;
    if ( inLineCount == lineLength ) {
        (*ostr_ptr) << endString ;
        inLineCount = 0 ;
    }

    (This supposes that hex and fill were set in the constructor.)
    Given the copying that's going on, I'd also simulate move
    semantics, so that the final destructor could do something like:

    if ( inLineCount != 0 ) {
        (*ostr_ptr) << endString ;
    }

    > }
    > }
    > };



    > int main ( int argn, char ** args ) {
    > assert( argn == 2 );
    > std::ifstream in ( args[1] );
    > std::for_each( std::istreambuf_iterator< char >( in ),
    > std::istreambuf_iterator< char >(),
    > print_hex( std::cout, 25 ) );


    Unless you're doing something relatively generic, with support
    for different separators, etc., this really looks like a case of
    for_each abuse.

    > std::cout << '\n';


    Which results in one new line too many if the number of elements
    just happened to be an exact multiple of the line length.

    About the only real use for this sort of output I've found is
    debugging or experimenting, but there, I use it often enough
    that I've a generic Dump<T> class (and a generic function which
    returns it, for automatic type deduction), so that I can write
    things like:

    std::cout << dump( someObject ) << std::endl ;

    The code that ends up getting called in the << operator is:

    IOSave saver( dest ) ;
    dest.fill( '0' ) ;
    dest.setf( std::ios::hex, std::ios::basefield ) ;
    char const* baseStr = "" ;
    if ( (dest.flags() & std::ios::showbase) != 0 ) {
        baseStr = "0x" ;
        dest.unsetf( std::ios::showbase ) ;
    }
    unsigned char const* const
        end = myObj + sizeof( T ) ;
    for ( unsigned char const* p = myObj ; p != end ; ++ p ) {
        if ( p != myObj ) {
            dest << ' ' ;
        }
        dest << baseStr << std::setw( 2 ) << (unsigned int)( *p ) ;
    }

    (Note that there's extra code there to support my personal
    preference: a "0x" with a small x, even if std::ios::uppercase
    is specified.)
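
The fragment above relies on a `Dump<T>` class and an `IOSave` class that are not shown in the thread. A self-contained sketch of what such a class might look like follows. It is a guess at the shape, not the actual code, and it saves and restores the stream state by hand instead of using `IOSave`.

```cpp
#include <cstddef>
#include <iomanip>
#include <ios>
#include <ostream>

// Hypothetical reconstruction: holds a pointer to the bytes of an object
// and prints them as space-separated two-digit hex values.
template <typename T>
class Dump {
    unsigned char const* myObj;

public:
    explicit Dump(T const& obj)
        : myObj(reinterpret_cast<unsigned char const*>(&obj)) {}

    friend std::ostream& operator<<(std::ostream& dest, Dump const& d)
    {
        // Manual stand-in for IOSave: remember flags and fill, restore at the end.
        std::ios_base::fmtflags const savedFlags = dest.flags();
        char const savedFill = dest.fill('0');

        dest.setf(std::ios_base::hex, std::ios_base::basefield);
        char const* baseStr = "";
        if (dest.flags() & std::ios_base::showbase) {
            baseStr = "0x";  // always a small x, even with uppercase set
            dest.unsetf(std::ios_base::showbase);
        }
        for (std::size_t i = 0; i != sizeof(T); ++i) {
            if (i != 0) {
                dest << ' ';
            }
            dest << baseStr << std::setw(2)
                 << static_cast<unsigned int>(d.myObj[i]);
        }

        dest.flags(savedFlags);
        dest.fill(savedFill);
        return dest;
    }
};

// Helper for automatic type deduction, so callers can write dump( x ).
template <typename T>
Dump<T> dump(T const& obj)
{
    return Dump<T>(obj);
}
```

The `Dump` object only stores a pointer, so it must not outlive the object it refers to; using it inside a single output expression, as in the `std::cout << dump( someObject )` line above, is safe.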

    > }
    > > The issue is the c++ read function doesn't return number of
    > > bytes read... so on the last read into a buffer how do you
    > > know how many characters to print?


    > Have a look at readsome().


    Yes, have a look at it. Read its specification very carefully.
    Because if you do, you'll realize that it is absolutely
    worthless here.

    The function he's looking for is istream::gcount(), which
    returns the number of bytes read by the last unformatted read.
    His basic loop would be:

    while ( input.read( &buffer[ 0 ], buffer.size() ) ) {
        process( buffer.begin(), buffer.end() ) ;
    }
    process( buffer.begin(), buffer.begin() + input.gcount() ) ;

    (But IMHO, istream really isn't appropriate for binary; if I'm
    really working with a binary file, I'll drop down to the system
    API.)
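
Put together, a complete read loop built on `read()` and `gcount()` might look like the following sketch, which answers the original hex-echo question. `hex_dump` and `per_line` are illustrative names, and the 4096-byte buffer size is an arbitrary choice. It folds the final partial read into the loop by checking `gcount()` after every `read()`.

```cpp
#include <cstddef>
#include <iomanip>
#include <ios>
#include <istream>
#include <ostream>
#include <vector>

// Writes every byte of `in` to `out` as two hex digits, `per_line` bytes
// per line, separated by single spaces.
void hex_dump(std::istream& in, std::ostream& out, std::size_t per_line = 16)
{
    std::vector<char> buffer(4096);
    std::size_t column = 0;

    // Remember the caller's formatting state and restore it afterwards.
    const std::ios_base::fmtflags saved_flags = out.flags();
    const char saved_fill = out.fill('0');
    out << std::hex;

    for (;;) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got <= 0) {
            break;
        }
        for (std::streamsize i = 0; i < got; ++i) {
            const unsigned char byte =
                static_cast<unsigned char>(buffer[static_cast<std::size_t>(i)]);
            out << (column == 0 ? "" : " ") << std::setw(2)
                << static_cast<unsigned int>(byte);
            if (++column == per_line) {
                out << '\n';
                column = 0;
            }
        }
    }
    if (column != 0) {
        out << '\n';  // finish a partial last line, but add no extra newline
    }

    out.flags(saved_flags);
    out.fill(saved_fill);
}
```

As with any binary input, the stream passed in should have been opened with `std::ios::binary`.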

     
    James Kanze, Jun 21, 2008
    #7
  8. vizzz

    James Kanze Guest

    On Jun 21, 3:59 am, "Eric Pruneau" <> wrote:
    > "vizzz" <> wrote in message news:
    > ...


    > > i need to find an hex pattern like 0x650A1010 in a binary file.
    > > i can make a small algorithm that fetch all the file for the match,
    > > but this file is huge, and i'm scared about performances.
    > > Is there any stl method for a fast search?
    > > Andrea


    > Check out boost::regex


    Which requires a forward iterator, and so can't be used on data
    in a file (for which he'll have at best an input iterator).

    Also, if he's only looking for a fixed string, it's likely to be
    significantly slower than some other algorithms.

    > http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html
     
    James Kanze, Jun 21, 2008
    #8
  9. vizzz

    vizzz Guest

    On 21 Jun, 12:26, James Kanze <> wrote:
    > On Jun 21, 3:59 am, "Eric Pruneau" <> wrote:
    >
    > > "vizzz" <> wrote in message news:
    > > ...
    > > > i need to find an hex pattern like 0x650A1010 in a binary file.
    > > > i can make a small algorithm that fetch all the file for the match,
    > > > but this file is huge, and i'm scared about performances.
    > > > Is there any stl method for a fast search?
    > > > Andrea

    > > Check out  boost::regex

    >
    > Which requires a forward iterator, and so can't be used on data
    > in a file (for which he'll have at best an input iterator).
    >
    > Also, if he's only looking for a fixed string, it's likely to be
    > significantly slower than some other algorithms.


    Maybe explaining my goal will be useful.
    In JPEG 2000 files (jp2) there are several boxes, each made of a 4-byte
    length, a 4-byte type, and then data.
    I must check whether a box exists by searching the file (boxes
    can be anywhere in the whole file) for the box type (e.g. 0x650A1010).
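
Given that layout, one alternative to a raw byte search is to hop from box header to box header. The sketch below is only an illustration under several assumptions: the length and type fields are big-endian, the length counts the 8 header bytes as well as the data, and the boxes sit back to back in a buffer already in memory. It stops at any length below 8, which covers the special length values that a real JP2 reader has to treat separately, and it does not descend into boxes nested inside other boxes. `BoxHeader`, `load_be32` and `list_boxes` are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One box header as described above: where it starts, its length, its type.
struct BoxHeader {
    std::size_t offset;
    std::uint32_t length;
    std::uint32_t type;
};

// Assemble four bytes into a 32-bit value, most significant byte first.
inline std::uint32_t load_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Walks consecutive boxes in [data, data + size) and returns their headers.
// Stops at the first header that is truncated, too short, or overruns the data.
std::vector<BoxHeader> list_boxes(const unsigned char* data, std::size_t size)
{
    std::vector<BoxHeader> boxes;
    std::size_t off = 0;
    while (size - off >= 8) {
        const BoxHeader h = {off, load_be32(data + off), load_be32(data + off + 4)};
        if (h.length < 8 || h.length > size - off) {
            break;  // special or damaged length: not handled in this sketch
        }
        boxes.push_back(h);
        off += h.length;
    }
    return boxes;
}
```

Checking for a box then becomes a scan of the returned headers for `type == 0x650A1010`, which cannot be fooled by the same four bytes appearing inside image data.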
     
    vizzz, Jun 21, 2008
    #9
  10. vizzz

    Kai-Uwe Bux Guest

    James Kanze wrote:

    > On Jun 21, 2:13 am, Kai-Uwe Bux <> wrote:
    >> Ivan wrote:
    >> > On Jun 20, 1:11 pm, vizzz <> wrote:

    >
    >> > Hmmm... I had a look at this and ran across a simple
    >> > problem. How do you read a binary file and just echo the
    >> > HEX for byte to the screen.

    [snip]
    >> > The issue is the c++ read function doesn't return number of
    >> > bytes read... so on the last read into a buffer how do you
    >> > know how many characters to print?

    >
    >> Have a look at readsome().

    >
    > Yes, have a look at it. Read its specification very carefully.
    > Because if you do, you'll realize that it is absolutely
    > worthless here.


    I reread it again. I fail to see why it's worthless. Obviously, I am missing
    something.

    > The function he's looking for is istream::gcount(), which
    > returns the number of bytes read by the last unformatted read.
    > His basic loop would be:
    >
    > while ( input.read( &buffer[ 0 ], buffer.size() ) ) {
    > process( buffer.begin(), buffer.end() ) ;
    > }
    > process( buffer.begin(), buffer.begin() + input.gcount() ) ;


    On the other hand, that looks very clean.


    Best

    Kai-Uwe
     
    Kai-Uwe Bux, Jun 21, 2008
    #10
  11. vizzz

    Mirco Wahab Guest

    vizzz wrote:
    > Maybe explaining my goal can be useful.
    > in jpeg2000 files (jp2) there are several boxes made of 4byte length,
    > 4byte type and then data.
    > i must check if box exist by searching somewhere in the file (boxes
    > can be anywhere in the whole file) for the box type (ex 0x650A1010).


    What is the largest file size and on which system
    do you want this to happen?

    The C-memchr is, on modern compilers, very very
    fast (it does 8 byte alignment on the pointer,
    scans 32 or 64 bit at a time by bit ops and so on.)

    You can't simply beat that one. Read the file
    as a block (fread after stat(), ftell/SEEK_END)
    or in chunks and find the first byte (and compare
    the rest).

    Otherwise, you could give memcmp() a shot
    http://www.cplusplus.com/reference/clibrary/cstring/memcmp.html
    Maybe it's optimized as hard as memchr() is.
    I haven't looked into this, but I know that memchr()
    gets about double the speed of the
    naive implementation: if(*p == *q) ...

    But if you can't slurp the whole file into
    memory at once, you of course have to
    deal with the possibility of a pattern
    split across a read block boundary.
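
The memchr()-then-compare idea could be sketched like this for a block that is already in memory. `find_bytes` is a hypothetical name; only `std::memchr` and `std::memcmp` are real library calls. Whether this beats `std::search` on a given platform is something to measure, not assume.

```cpp
#include <cstddef>
#include <cstring>

// Returns a pointer to the first occurrence of the pattern inside
// [data, data + size), or nullptr if it does not occur.
const unsigned char* find_bytes(const unsigned char* data, std::size_t size,
                                const unsigned char* pattern,
                                std::size_t pattern_size)
{
    if (pattern_size == 0 || size < pattern_size) {
        return nullptr;
    }
    // Last position at which a full pattern still fits.
    const unsigned char* const last_start = data + (size - pattern_size);
    const unsigned char* p = data;
    while (p <= last_start) {
        // Let memchr find the next candidate for the first pattern byte.
        p = static_cast<const unsigned char*>(
            std::memchr(p, pattern[0],
                        static_cast<std::size_t>(last_start - p) + 1));
        if (p == nullptr) {
            return nullptr;
        }
        // Compare the whole pattern at the candidate position.
        if (std::memcmp(p, pattern, pattern_size) == 0) {
            return p;
        }
        ++p;
    }
    return nullptr;
}
```

When the file is read in chunks, the block-boundary issue mentioned above still has to be handled by the caller, for example by keeping the last `pattern_size - 1` bytes of each chunk in front of the next one.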

    Regards

    M.
     
    Mirco Wahab, Jun 21, 2008
    #11
  12. vizzz

    Kai-Uwe Bux Guest

    James Kanze wrote:

    > On Jun 21, 2:13 am, Kai-Uwe Bux <> wrote:
    >> Ivan wrote:
    >> > On Jun 20, 1:11 pm, vizzz <> wrote:

    >
    >> > Hmmm... I had a look at this and ran across a simple
    >> > problem. How do you read a binary file and just echo the
    >> > HEX for byte to the screen.

    >
    >> #include <iostream>
    >> #include <ostream>
    >> #include <fstream>
    >> #include <iterator>
    >> #include <iomanip>
    >> #include <algorithm>
    >> #include <cassert>

    >
    >> class print_hex {

    >
    >> std::ostream * ostr_ptr;
    >> unsigned int line_length;
    >> unsigned int index;

    >
    >> public:

    >
    >> print_hex ( std::ostream & str_ref, unsigned int length )
    >> : ostr_ptr( &str_ref )
    >> , line_length ( length )
    >> , index ( 0 )
    >> {}

    >
    >> void operator() ( unsigned char ch ) {
    >> ++index;
    >> if ( index >= line_length ) {
    >> (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
    >> << (unsigned int)(ch) << '\n';
    >> index = 0;
    >> } else {
    >> (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
    >> << (unsigned int)(ch) << ' ';

    >
    > Wouldn't it be preferable to set the formatting flags in the
    > constructor?


    Yup.

    > I'd also provide an "indent" argument; if index
    > were 0, I'd output indent spaces, otherwise a single space---or
    > perhaps the best solution would be to provide a start of line
    > and a separator string to the constructor, then:


    Good idea.


    > (*ostr_ptr)
    > << (inLineCount == 0 ? startString : separString)
    > << std::setw( 2 ) << (unsigned int)( ch ) ;
    > ++ inLineCount ;
    > if ( inLineCount == lineLength ) {
    > (*ostr_ptr) << endString ;
    > inLineCount = 0 ;
    > }
    >
    > (This supposes that hex and fill were set in the constructor.)
    > Given the copying that's going on, I'd also simulate move
    > semantics, so that the final destructor could do something like:
    >
    > if ( inLineCount != 0 ) {
    > (*ostr_ptr) << endString ;
    > }
    >
    >> }
    >> }
    >> };

    >
    >
    >> int main ( int argn, char ** args ) {
    >> assert( argn == 2 );
    >> std::ifstream in ( args[1] );
    >> std::for_each( std::istreambuf_iterator< char >( in ),
    >> std::istreambuf_iterator< char >(),
    >> print_hex( std::cout, 25 ) );

    >
    > Unless you're doing something relatively generic, with support
    > for different separators, etc., this really looks like a case of
    > for_each abuse.


    Actually, with regard to for_each, I am growing more and more comfortable
    using it. Of all algorithms, for_each seems the most silly; on the other
    hand it is also the one that has the largest potential for specialized
    versions that take advantage of internal knowledge about the underlying
    sequence. E.g., I can easily imagine a special version for iterators into a
    deque (where for_each would iterate over pages and within each page would
    use a very fast loop using T* where it can skip the test for reaching a
    page end). Similar optimizations should be possible for stream iterators.


    >> std::cout << '\n';

    >
    > Which results in one new line too many if the number of elements
    > just happened to be an exact multiple of the line length.


    You are making up specs :)

    But seriously: you are right, of course.


    > About the only real use for this sort of output I've found is
    > debugging or experimenting, but there, I use it often enough
    > that I've a generic Dump<T> class (and a generic function which
    > returns it, for automatic type deduction), so that I can write
    > things like:
    >
    > std::cout << dump( someObject ) << std::endl ;

    [snip]

    Hm, I never had a use for hex dumping objects. But, maybe I should try that
    out.


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Jun 21, 2008
    #12
  13. vizzz

    James Kanze Guest

    On Jun 21, 8:35 pm, Kai-Uwe Bux <> wrote:
    > James Kanze wrote:
    > > On Jun 21, 2:13 am, Kai-Uwe Bux <> wrote:
    > >> Ivan wrote:
    > >> > On Jun 20, 1:11 pm, vizzz <> wrote:


    > >> > Hmmm... I had a look at this and ran across a simple
    > >> > problem. How do you read a binary file and just echo the
    > >> > HEX for byte to the screen.

    > [snip]
    > >> > The issue is the c++ read function doesn't return number of
    > >> > bytes read... so on the last read into a buffer how do you
    > >> > know how many characters to print?


    > >> Have a look at readsome().


    > > Yes, have a look at it. Read its specification very carefully.
    > > Because if you do, you'll realize that it is absolutely
    > > worthless here.


    > I reread it again. I fail to see why it's worthless.
    > Obviously, I am missing something.


    It will read a maximum of streambuf::in_avail characters. If
    there are no characters in the buffer, streambuf::in_avail calls
    showmanyc. And by default, all showmanyc does is return 0. An
    implementation of filebuf may do more, if the system supports
    some means of finding out exactly how many characters are in the
    file, but it's not required to. Which means that basically,
    readsome() may stop (returning 0 characters read) as soon as
    there are no more characters in the buffer.

     
    James Kanze, Jun 22, 2008
    #13
  14. vizzz

    James Kanze Guest

    On Jun 22, 12:49 am, Kai-Uwe Bux <> wrote:
    > James Kanze wrote:


    [...]
    > > Unless you're doing something relatively generic, with
    > > support for different separators, etc., this really looks
    > > like a case of for_each abuse.


    > Actually, with regard to for_each, I am growing more and more
    > comfortable using it.


    I'm actually pretty comfortable using it too. Regretfully, we
    seem to be a minority, and the programmers having to maintain my
    code find it "unnatural", and that it hurts readability, to move
    the contents of a loop out into a separate class. Unless that
    class is in some way "reusable", i.e. it represents some more
    general application.

    [...]
    > >> std::cout << '\n';


    > > Which results in one new line too many if the number of
    > > elements just happened to be an exact multiple of the line
    > > length.


    > You are making up specs :)


    You started it:). You decided that he needed newlines in the
    sequence to begin with. (OK: somebody did say something about
    megabytes somewhere. But maybe he has a very, very wide
    screen.)

    > But seriously: you are right, of course.


    > > About the only real use for this sort of output I've found is
    > > debugging or experimenting, but there, I use it often enough
    > > that I've a generic Dump<T> class (and a generic function which
    > > returns it, for automatic type deduction), so that I can write
    > > things like:


    > > std::cout << dump( someObject ) << std::endl ;


    > [snip]


    > Hm, I never had a use for hex dumping objects. But, maybe I
    > should try that out.


    I didn't really, for the longest time (which is why it isn't at
    my site---I only added it to the library very recently). Even
    now, most of its use is for "experimenting": for trying to guess
    the representation of some type in an undocumented format, for
    example.

    On the other hand, if I ever find time to write up an article on
    how to correctly use iostream, I'll probably include it, because
    it is a good example of how to handle arbitrary formatting for
    any possible type.

     
    James Kanze, Jun 22, 2008
    #14
  15. vizzz

    James Kanze Guest

    On Jun 21, 8:57 pm, Mirco Wahab <> wrote:
    > vizzz wrote:
    > > Maybe explaining my goal can be useful.
    > > in jpeg2000 files (jp2) there are several boxes made of 4byte length,
    > > 4byte type and then data.
    > > i must check if box exist by searching somewhere in the file
    > > (boxes can be anywhere in the whole file) for the box type
    > > (ex 0x650A1010).


    > What is the largest file size and on which system do you want
    > this to happen?


    > The C-memchr is, on modern compilers, very very fast (it does
    > 8 byte alignment on the pointer, scans 32 or 64 bit at a time
    > by bit ops and so on.)


    Maybe. I'm not familiar with the jpeg format, but somehow, I'd
    be a bit surprised if the 4 byte value isn't required to be
    aligned. And if it's aligned, treating the buffer as an array
    of uint32_t, and using std::find, will almost certainly be
    significantly faster than memchr.

    > You can't simply beat that one.


    Actually, you almost always can.

     
    James Kanze, Jun 22, 2008
    #15
  16. vizzz

    vizzz Guest

    On 21 Jun, 20:57, Mirco Wahab <> wrote:
    > vizzz wrote:
    > > Maybe explaining my goal can be useful.
    > > in jpeg2000 files (jp2) there are several boxes made of 4byte length,
    > > 4byte type and then data.
    > > i must check if box exist by searching somewhere in the file (boxes
    > > can be anywhere in the whole file) for the box type (ex 0x650A1010).

    >
    > What is the largest file size and on which system
    > do you want this to happen?


    About 800-900MB on win32 (i'm using VS2008)
     
    vizzz, Jun 22, 2008
    #16
  17. vizzz

    Ivan Guest

    On Jun 21, 3:10 am, James Kanze <> wrote:
    > (But IMHO, istream really isn't appropriate for binary; if I'm
    > really working with a binary file, I'll drop down to the system
    > API.)


    That's exactly what I was thinking, but I wasn't sure if it was just
    my lack of C++ knowledge that made it a pain to read binary data with
    istream.

    Thanks,
    Ivan Novick
    http://www.mycppquiz.com/
     
    Ivan, Jun 23, 2008
    #17
