searching sequence in file

I

Igor R.

Hello,

I want to find the latest occurance of some sequence in a binary file.
Intuitively, it seems that the shortest way is:

ifstream file("myfile", std::ios::binary|std::ios::in);
std::string delim("\r\n\r\n"); // sequence to search
typedef std::istreambuf_iterator<char> iterator;
iterator begin(file), end;
iterator pos = std::find_end(begin, end, delim.begin(), delim.end());

However, it doesn't work, because std::find_end requires
ForwardIterator, while istreambuf_iterator is an InputIterator. On
MSVC 9.0 So the above code just crashes!

So, I've got 2 questions:

1) Why std::find_end doesn't enforce its requirements at compile time?
What does standard say about such a behavior?
2) How to make InputIterator out of istreambuf_iterator?

Thanks!
 
L

Leandro Melo

Hello,

I want to find the latest occurance of some sequence in a binary file.
Intuitively, it seems that the shortest way is:

ifstream file("myfile", std::ios::binary|std::ios::in);
std::string delim("\r\n\r\n"); // sequence to search
typedef std::istreambuf_iterator<char> iterator;
iterator begin(file), end;
iterator pos = std::find_end(begin, end, delim.begin(), delim.end());

However, it doesn't work, because std::find_end requires
ForwardIterator, while istreambuf_iterator is an InputIterator. On
MSVC 9.0 So the above code just crashes!

So, I've got 2 questions:

1) Why std::find_end doesn't enforce its requirements at compile time?
What does standard say about such a behavior?

Because currently there's no language feature in C++ to accomplish
that. What you want is going to be possible in the next standard (C+
+0x) throug the use of *concepts*. For now, the best one can do is use
an auxilary utility/library such as Boost Concept Check.
2) How to make InputIterator out of istreambuf_iterator?

I can't think of a way right now. In fact, don't know if this is
possible directly. IIRC, the default IOStreams buffers consume the
content once they pass over it. That's why they're input iterators.
Perhaps, you could write your own buffer iterator that stores/caches
values in order to allow multi-passes.

A naive strategy would be to read the file backwards using traditional
mechanisms (tellg, seekg, etc).
 
Z

ZikO

Igor said:
Hello,

I want to find the latest occurance of some sequence in a binary file.
Intuitively, it seems that the shortest way is:
If you dont bother interators and are looking for answer I would do that
this way:

// code
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main(int argc, char *argv[])
{
// text.txt contains:
//
111001001010001111010111010111110001011011000101000111010111001010011110
ifstream in("test.txt");
string s;
getline(in, s);
streampos sp;
sp = s.find_last_of("10100");
cout << sp << endl;
return 0;
}
// end of code
 
I

Igor R.

Because currently there's no language feature in C++ to accomplish that.

One can do this by putting compile-time assert on
std::iterator_traits<T>::iterator_category. Or even less restricting:
probably, it's possible to put the above assert only for the types
that iterator_traits template is specialized for them (using some TMP
techniques).
Anyway, it's very disappointing when an algorithm that supposed to be
generic suddenly crashes without any obvious reason. It took me some
time to realize what happened. The algorithm may work less efficiently
on less capable iterator, but it shouldn't crash! Either it works or
it doesn't compile. I thought such a type-safety is one of the key-
points of stl generic algorithms (and of the generic programming as
whole).
I can't think of a way right now. In fact, don't know if this is possible directly.

BTW, I found such a wrapper in Boost.Spirit:
http://www.boost.org/doc/libs/1_38_0/libs/spirit/classic/doc/multi_pass.html


Thank you,

Igor'.
 
I

Igor R.

If you dont bother interators and are looking for answer I would do that
this way:


Well, the file I working with is a binary file which size can be few
Gb :)
 
Z

ZikO

Igor said:
Well, the file I working with is a binary file which size can be few
Gb :)
Right :p

Silly me, I should have noticed you are opening file in binary mode.
Apologies.
 
J

James Kanze

I want to find the latest occurance of some sequence in a binary file.
Intuitively, it seems that the shortest way is:
ifstream file("myfile", std::ios::binary|std::ios::in);
std::string delim("\r\n\r\n"); // sequence to search
typedef std::istreambuf_iterator<char> iterator;
iterator begin(file), end;
iterator pos = std::find_end(begin, end, delim.begin(), delim.end());
However, it doesn't work, because std::find_end requires
ForwardIterator, while istreambuf_iterator is an InputIterator. On
MSVC 9.0 So the above code just crashes!
So, I've got 2 questions:
1) Why std::find_end doesn't enforce its requirements at
compile time? What does standard say about such a behavior?

It's undefined behavior. G++ (4.1.0) and Sun CC with the
STLport generate an error at compile time, VC++ and Sun CC with
the default library don't.

The next version of the standard will require the error.
2) How to make InputIterator out of istreambuf_iterator?

It can't be done. In general, you can't change the type of an
iterator that's already been defined.

You could write your own streambuf iterator which was a forward
iterator, but it would be horribly slow---basically, you'd have
to do a tellg after each access, and a seekg before each access.

In practice, I'm not sure what you're trying to do. You're
looking for the *last* occurance, which means that you'll have
to read to end of file, regardless of the algorithm (otherwise,
there might be a later occurance that you haven't seen). Which
means that you'll have lost the position in the file where you
found the match, unless you explicitly read the position. Maybe
KMP searching (a simple finite automat), saving the position
each time you find a match. Then reset the error once you've
found end of file, and seek to the last position saved. (Note
that you'll probably need some sort of modifications to KMP, since
in cases like "\r\n\r\n\r\n", you have to match the end of the
sequence, even though you've found a match two characters
ahead.)
 
J

James Kanze

One can do this by putting compile-time assert on
std::iterator_traits<T>::iterator_category. Or even less
restricting: probably, it's possible to put the above assert
only for the types that iterator_traits template is
specialized for them (using some TMP techniques).
Anyway, it's very disappointing when an algorithm that
supposed to be generic suddenly crashes without any obvious
reason. It took me some time to realize what happened. The
algorithm may work less efficiently on less capable iterator,
but it shouldn't crash! Either it works or it doesn't compile.
I thought such a type-safety is one of the key- points of stl
generic algorithms (and of the generic programming as whole).

It's not a true forward iterator, and I'm not at all sure that
it will work with std::find_end. Boost.Spirit uses it when
parsing, detecting (from the grammar) when backtracking will be
necessary, and storing anything it reads in a look-aside buffer
in such cases. In your case, backtracking will almost always be
necessary, and depending on exactly how the Boost.Spirit
iterator works, it may not work in your case, or it may end up
storing all of the file in memory.
 
L

Leandro Melo

One can do this by putting compile-time assert on
std::iterator_traits<T>::iterator_category. Or even less restricting:
probably, it's possible to put the above assert only for the types
that iterator_traits template is specialized for them (using some TMP
techniques).

Yes. In the case of iterators, the existence of the iterator_category
makes it easier. Could be coded in a manner similar to Boost Static
Assert. Still, concepts provide a more *natural* way to do that using
direct language support.

Anyway, it's very disappointing when an algorithm that supposed to be
generic suddenly crashes without any obvious reason. It took me some
time to realize what happened. The algorithm may work less efficiently
on less capable iterator, but it shouldn't crash! Either it works or
it doesn't compile. I thought such a type-safety is one of the key-
points of stl generic algorithms (and of the generic programming as
whole).

Your case is really disappointing. My guess is that you run into a
very particular problem regarding Input Iterators and Forward
Iterators.

Probably, most of the times you try to use a type that doesnt' meet
the requirements in the STL you'll get a compile error. For instance,
during the second pass of two-phase lookup. However, Input Iterators
and Forward Iterators mostly differ by semantics/behavior and not by
syntax (check their valid expressions). So for the above algorithm the
compiler isn't smart enought to detect the problem and an extra
utility code needs to be used. Apparently, some STL implementations
care for that; Others don't.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,045
Latest member
DRCM

Latest Threads

Top