Search for a string backwards in a file.

S

SK

Hey folks,

I am searching for a string (say "ABC") backwards in a file.
First I seek to the end.
Then I try to make a check like -

do {
file.clear ();
file.get(c);
file.seekg(-2, std::ios::cur);
} while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
// readback, hypotheticl func, problem here)
Is there some function/trick like peek that can instead read backwards
one character at a time?
Or is there a better way to solve this problem.

Thank you.
 
J

John Harrison

SK said:
Hey folks,

I am searching for a string (say "ABC") backwards in a file.
First I seek to the end.
Then I try to make a check like -

do {
file.clear ();
file.get(c);
file.seekg(-2, std::ios::cur);
} while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
// readback, hypotheticl func, problem here)
Is there some function/trick like peek that can instead read backwards
one character at a time?
Or is there a better way to solve this problem.

Thank you.

Is the file small enough to read the whole file into memory? If so then read
the whole file into a string and search in the string not in the file.
Anything else rapidly gets very complicated and also not terribly efficient.
Files aren't designed to be read backwards, I would prefer to redesign your
file format so that you don't need to read backwards than to actually
attempt this.

And no there is no quick trick to do this.

john
 
K

Karl Heinz Buchegger

SK said:
Hey folks,

I am searching for a string (say "ABC") backwards in a file.
First I seek to the end.
Then I try to make a check like -

do {
file.clear ();
file.get(c);
file.seekg(-2, std::ios::cur);
} while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
// readback, hypotheticl func, problem here)
Is there some function/trick like peek that can instead read backwards
one character at a time?
No.

Or is there a better way to solve this problem.

Seek to the end.

Seek back a number of characters
Read a number of characters into a buffer.
Search that buffer from the end.

If you find the pattern -> fine
If you don't find the pattern: seek back a number
of characters into the buffer, check the buffer
and repeat.
 
J

John Harrison

Karl Heinz Buchegger said:
Seek to the end.

Seek back a number of characters
Read a number of characters into a buffer.
Search that buffer from the end.

If you find the pattern -> fine
If you don't find the pattern: seek back a number
of characters into the buffer, check the buffer
and repeat.

And don't forget to deal with the case where the string you are searching
for straddles one of your seek positions. I.e. if you aren't careful, one
half of the string ends up in one buffer and the other half in another
buffer, so you never find it.

john
 
K

Karl Heinz Buchegger

John said:
And don't forget to deal with the case where the string you are searching
for straddles one of your seek positions. I.e. if you aren't careful, one
half of the string ends up in one buffer and the other half in another
buffer, so you never find it.

:)
It's a tricky thing which could be solved with letting the reads overlap:

+---------------------------+
+---------------------------+

| |
Overlapping area, large enough that the pattern will fit
into it.

But there should be left something to think about for the OP.

The best thing the OP could do is: Avoid that topic at all by redesigning
the file format.
 
S

SK

John Harrison said:
Is the file small enough to read the whole file into memory?

No, it goes in several mbs.
If so then read
the whole file into a string and search in the string not in the file.
Anything else rapidly gets very complicated and also not terribly efficient.
Files aren't designed to be read backwards, I would prefer to redesign your
file format so that you don't need to read backwards than to actually
attempt this.

Wish I could redesign the file format, but can't actually :-(
But what I am parsing are timestamps in a file. I need to know the
first and the last timestamp in a given file.
And no there is no quick trick to do this.

Thanks.
 
J

John Harrison

SK said:
"John Harrison" <[email protected]> wrote in message

No, it goes in several mbs.
If so then read

Wish I could redesign the file format, but can't actually :-(
But what I am parsing are timestamps in a file. I need to know the
first and the last timestamp in a given file.

Then Karl's suggestion is the right one, read blocks of data from the end of
a file, and search backwards though each block, repeat if you don't find
anything, and watch out for the timestamp falling half way between two
blocks. In practise this means that the blocks have to overlap sufficiently
so that the timestamp will always be contained entirely within one block.

john
 
S

Siemel Naran

SK said:
I am searching for a string (say "ABC") backwards in a file.
First I seek to the end.
Then I try to make a check like -

do {
file.clear ();
file.get(c);
file.seekg(-2, std::ios::cur);
} while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
// readback, hypotheticl func, problem here)
Is there some function/trick like peek that can instead read backwards
one character at a time?
Or is there a better way to solve this problem.

This approach seems fine to me. Others suggested reading the file into
blocks of strings. But I'm not sure if there's a need to do this as the
default fstreams already read the file into blocks of strings (though the
standard requires them to, but they do for file streams). As for how they
normally handle reading from the end of the file, I don't know for sure, but
I think it's like this: when the user reads the last character, then read
the last N chars into memory where N is the size of a block, position the
streambuf's get pointer to the last character, return the character at the
get pointer.
file.get(c);
file.seekg(-2, std::ios::cur);

The above might be more clearly expressed with

c = file.peek();
file.seekg(-1, std::ios::cur);

Also, since you're looking for a string of length 3, you could save the last
3 chars read into variables like c1, c2, c3. Or even an array, vector, etc.
Then even if you encounter an 'A' you could ignore it if the next characters
are not 'BC'.

Finally, to get improved performance you could work directly with the
underlying streambuf. Call file.rdbuf() to get a pointer to the streambuf.
It will really be a filebuf, but pretend you don't know this. To position
the stream to the beginning or end use pubseekpos. To position the stream
relative to the current position use pubseekoff. To get the current
character as in peek use sgetc.

If this is not fast enough, you could even write your own class derived from
filebuf that implements a sbumpcminus() function that gets the current
character then decrements the get pointer. Use setg to set the get pointers
and range of the get area. But this is getting really advanced.

peek at the current character
 
J

Jorge Rivera

SK said:
Hey folks,

I am searching for a string (say "ABC") backwards in a file.
First I seek to the end.
Then I try to make a check like -

do {
file.clear ();
file.get(c);
file.seekg(-2, std::ios::cur);
} while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
// readback, hypotheticl func, problem here)
Is there some function/trick like peek that can instead read backwards
one character at a time?
Or is there a better way to solve this problem.

Thank you.

You can do something like this.

std::string toFind = "ABC";
std::string reversed;
std::copy(toFind.rbegin(), toFind.rend(),std::front_inserter(reversed));

std::ifstream ifs("MyFile.txt", std::ios::in);
if(ifs)
{
std::string fileContent = std::string(
std::istreambuf_iterator<char>(ifs),
std::istreambuf_iterator<char>());

std::string::size_type found_position = fileContent .rfind(reversed);
if(std::string::npos != found_position)
std::cout<<"Requested string found at position:
"<<found_positions<<std::endl;
}

(Forgive any typos...)

JLR
 
O

Old Wolf

it goes in several mbs.

Wish I could redesign the file format, but can't actually :-(
But what I am parsing are timestamps in a file. I need to know the
first and the last timestamp in a given file.

If the file is in ascii format (ie. it is made up of lines), then
you can read forwards a line at a time (it won't take long even to
parse several megs that way).
If you really must start at the end, then start, say, 50k from the
end and find the first newline, and then read forwards from there
(assuming your file does not contain lines longer than 50k).
 
J

Julie

Karl Heinz Buchegger wrote:
The best thing the OP could do is: Avoid that topic at all by redesigning
the file format.

This comment really has no value to the discussion because it assumes that: the
OP has the liberty to arbitrarily redesign the file format, that the
requirements for the existing file format are secondary to the (code)
implementation details, and that reading from the end of the file isn't a
suitable solution to the problem as expressed by the OP.
 
K

Karl Heinz Buchegger

Julie said:
Karl Heinz Buchegger wrote:


This comment really has no value to the discussion because it assumes that: the
OP has the liberty to arbitrarily redesign the file format, that the
requirements for the existing file format are secondary to the (code)
implementation details, and that reading from the end of the file isn't a
suitable solution to the problem as expressed by the OP.

On the other hand it happens quite frequently, that posters post
questions about how to do something and it turns out later that
a simple change in the assignment (if possible) is the far better
solution.
 
J

Julie

Karl said:
On the other hand it happens quite frequently, that posters post
questions about how to do something and it turns out later that
a simple change in the assignment (if possible) is the far better
solution.

Perhaps you should phrase your original comment as a question, rather than an
assertion.
 
K

Karl Heinz Buchegger

Hardy said:
why just reverse the string and search it in the normal way?:)

Because it isn't very efficient to start searching a 10 MB file
at the beginning, when all you are interested in, is the *last*
occourence of the search string and you know that this
search string is located somewhere in the last 100 Bytes or so.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,009
Latest member
GidgetGamb

Latest Threads

Top