Search for a string backwards in a file.

Discussion in 'C++' started by SK, Jun 15, 2004.

  1. SK

    SK Guest

    Hey folks,

    I am searching for a string (say "ABC") backwards in a file.
    First I seek to the end.
    Then I try to make a check like -

    do {
    file.clear ();
    file.get(c);
    file.seekg(-2, std::ios::cur);
    } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    // readback, hypotheticl func, problem here)
    Is there some function/trick like peek that can instead read backwards
    one character at a time?
    Or is there a better way to solve this problem.

    Thank you.
    SK, Jun 15, 2004
    #1
    1. Advertising

  2. "SK" <> wrote in message
    news:...
    > Hey folks,
    >
    > I am searching for a string (say "ABC") backwards in a file.
    > First I seek to the end.
    > Then I try to make a check like -
    >
    > do {
    > file.clear ();
    > file.get(c);
    > file.seekg(-2, std::ios::cur);
    > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > // readback, hypotheticl func, problem here)
    > Is there some function/trick like peek that can instead read backwards
    > one character at a time?
    > Or is there a better way to solve this problem.
    >
    > Thank you.


    Is the file small enough to read the whole file into memory? If so then read
    the whole file into a string and search in the string not in the file.
    Anything else rapidly gets very complicated and also not terribly efficient.
    Files aren't designed to be read backwards, I would prefer to redesign your
    file format so that you don't need to read backwards than to actually
    attempt this.

    And no there is no quick trick to do this.

    john
    John Harrison, Jun 15, 2004
    #2
    1. Advertising

  3. SK wrote:
    >
    > Hey folks,
    >
    > I am searching for a string (say "ABC") backwards in a file.
    > First I seek to the end.
    > Then I try to make a check like -
    >
    > do {
    > file.clear ();
    > file.get(c);
    > file.seekg(-2, std::ios::cur);
    > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > // readback, hypotheticl func, problem here)
    > Is there some function/trick like peek that can instead read backwards
    > one character at a time?


    No.

    > Or is there a better way to solve this problem.


    Seek to the end.

    Seek back a number of characters
    Read a number of characters into a buffer.
    Search that buffer from the end.

    If you find the pattern -> fine
    If you don't find the pattern: seek back a number
    of characters into the buffer, check the buffer
    and repeat.


    --
    Karl Heinz Buchegger
    Karl Heinz Buchegger, Jun 15, 2004
    #3
  4. "Karl Heinz Buchegger" <> wrote in message
    news:...
    > SK wrote:
    > >
    > > Hey folks,
    > >
    > > I am searching for a string (say "ABC") backwards in a file.
    > > First I seek to the end.
    > > Then I try to make a check like -
    > >
    > > do {
    > > file.clear ();
    > > file.get(c);
    > > file.seekg(-2, std::ios::cur);
    > > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > > // readback, hypotheticl func, problem here)
    > > Is there some function/trick like peek that can instead read backwards
    > > one character at a time?

    >
    > No.
    >
    > > Or is there a better way to solve this problem.

    >
    > Seek to the end.
    >
    > Seek back a number of characters
    > Read a number of characters into a buffer.
    > Search that buffer from the end.
    >
    > If you find the pattern -> fine
    > If you don't find the pattern: seek back a number
    > of characters into the buffer, check the buffer
    > and repeat.
    >


    And don't forget to deal with the case where the string you are searching
    for straddles one of your seek positions. I.e. if you aren't careful, one
    half of the string ends up in one buffer and the other half in another
    buffer, so you never find it.

    john
    John Harrison, Jun 15, 2004
    #4
  5. John Harrison wrote:
    >
    > "Karl Heinz Buchegger" <> wrote in message
    > news:...
    > > SK wrote:
    > > >
    > > > Hey folks,
    > > >
    > > > I am searching for a string (say "ABC") backwards in a file.
    > > > First I seek to the end.
    > > > Then I try to make a check like -
    > > >
    > > > do {
    > > > file.clear ();
    > > > file.get(c);
    > > > file.seekg(-2, std::ios::cur);
    > > > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > > > // readback, hypotheticl func, problem here)
    > > > Is there some function/trick like peek that can instead read backwards
    > > > one character at a time?

    > >
    > > No.
    > >
    > > > Or is there a better way to solve this problem.

    > >
    > > Seek to the end.
    > >
    > > Seek back a number of characters
    > > Read a number of characters into a buffer.
    > > Search that buffer from the end.
    > >
    > > If you find the pattern -> fine
    > > If you don't find the pattern: seek back a number
    > > of characters into the buffer, check the buffer
    > > and repeat.
    > >

    >
    > And don't forget to deal with the case where the string you are searching
    > for straddles one of your seek positions. I.e. if you aren't careful, one
    > half of the string ends up in one buffer and the other half in another
    > buffer, so you never find it.


    :)
    It's a tricky thing which could be solved with letting the reads overlap:

    +---------------------------+
    +---------------------------+

    | |
    Overlapping area, large enough that the pattern will fit
    into it.

    But there should be left something to think about for the OP.

    The best thing the OP could do is: Avoid that topic at all by redesigning
    the file format.

    --
    Karl Heinz Buchegger
    Karl Heinz Buchegger, Jun 15, 2004
    #5
  6. How about reversing "ABC" and doing a plain search?

    Henrik Vallgren

    "SK" <> skrev i meddelandet
    news:...
    > Hey folks,
    >
    > I am searching for a string (say "ABC") backwards in a file.
    > First I seek to the end.
    > Then I try to make a check like -
    >
    > do {
    > file.clear ();
    > file.get(c);
    > file.seekg(-2, std::ios::cur);
    > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > // readback, hypotheticl func, problem here)
    > Is there some function/trick like peek that can instead read backwards
    > one character at a time?
    > Or is there a better way to solve this problem.
    >
    > Thank you.
    Henrik Vallgren, Jun 15, 2004
    #6
  7. SK

    SK Guest

    "John Harrison" <> wrote in message news:<>...
    > "SK" <> wrote in message
    > news:...
    > > Hey folks,
    > >
    > > I am searching for a string (say "ABC") backwards in a file.
    > > First I seek to the end.
    > > Then I try to make a check like -
    > >
    > > do {
    > > file.clear ();
    > > file.get(c);
    > > file.seekg(-2, std::ios::cur);
    > > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > > // readback, hypotheticl func, problem here)
    > > Is there some function/trick like peek that can instead read backwards
    > > one character at a time?
    > > Or is there a better way to solve this problem.
    > >
    > > Thank you.

    >
    > Is the file small enough to read the whole file into memory?


    No, it goes in several mbs.
    If so then read
    > the whole file into a string and search in the string not in the file.
    > Anything else rapidly gets very complicated and also not terribly efficient.
    > Files aren't designed to be read backwards, I would prefer to redesign your
    > file format so that you don't need to read backwards than to actually
    > attempt this.
    >


    Wish I could redesign the file format, but can't actually :-(
    But what I am parsing are timestamps in a file. I need to know the
    first and the last timestamp in a given file.

    > And no there is no quick trick to do this.


    Thanks.
    SK, Jun 15, 2004
    #7
  8. "SK" <> wrote in message
    news:...
    > "John Harrison" <> wrote in message

    news:<>...
    > > "SK" <> wrote in message
    > > news:...
    > > > Hey folks,
    > > >
    > > > I am searching for a string (say "ABC") backwards in a file.
    > > > First I seek to the end.
    > > > Then I try to make a check like -
    > > >
    > > > do {
    > > > file.clear ();
    > > > file.get(c);
    > > > file.seekg(-2, std::ios::cur);
    > > > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > > > // readback, hypotheticl func, problem here)
    > > > Is there some function/trick like peek that can instead read backwards
    > > > one character at a time?
    > > > Or is there a better way to solve this problem.
    > > >
    > > > Thank you.

    > >
    > > Is the file small enough to read the whole file into memory?

    >
    > No, it goes in several mbs.
    > If so then read
    > > the whole file into a string and search in the string not in the file.
    > > Anything else rapidly gets very complicated and also not terribly

    efficient.
    > > Files aren't designed to be read backwards, I would prefer to redesign

    your
    > > file format so that you don't need to read backwards than to actually
    > > attempt this.
    > >

    >
    > Wish I could redesign the file format, but can't actually :-(
    > But what I am parsing are timestamps in a file. I need to know the
    > first and the last timestamp in a given file.
    >


    Then Karl's suggestion is the right one, read blocks of data from the end of
    a file, and search backwards though each block, repeat if you don't find
    anything, and watch out for the timestamp falling half way between two
    blocks. In practise this means that the blocks have to overlap sufficiently
    so that the timestamp will always be contained entirely within one block.

    john
    John Harrison, Jun 15, 2004
    #8
  9. SK

    Siemel Naran Guest

    "SK" <> wrote in message

    > I am searching for a string (say "ABC") backwards in a file.
    > First I seek to the end.
    > Then I try to make a check like -
    >
    > do {
    > file.clear ();
    > file.get(c);
    > file.seekg(-2, std::ios::cur);
    > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > // readback, hypotheticl func, problem here)
    > Is there some function/trick like peek that can instead read backwards
    > one character at a time?
    > Or is there a better way to solve this problem.


    This approach seems fine to me. Others suggested reading the file into
    blocks of strings. But I'm not sure if there's a need to do this as the
    default fstreams already read the file into blocks of strings (though the
    standard requires them to, but they do for file streams). As for how they
    normally handle reading from the end of the file, I don't know for sure, but
    I think it's like this: when the user reads the last character, then read
    the last N chars into memory where N is the size of a block, position the
    streambuf's get pointer to the last character, return the character at the
    get pointer.

    > file.get(c);
    > file.seekg(-2, std::ios::cur);


    The above might be more clearly expressed with

    c = file.peek();
    file.seekg(-1, std::ios::cur);

    Also, since you're looking for a string of length 3, you could save the last
    3 chars read into variables like c1, c2, c3. Or even an array, vector, etc.
    Then even if you encounter an 'A' you could ignore it if the next characters
    are not 'BC'.

    Finally, to get improved performance you could work directly with the
    underlying streambuf. Call file.rdbuf() to get a pointer to the streambuf.
    It will really be a filebuf, but pretend you don't know this. To position
    the stream to the beginning or end use pubseekpos. To position the stream
    relative to the current position use pubseekoff. To get the current
    character as in peek use sgetc.

    If this is not fast enough, you could even write your own class derived from
    filebuf that implements a sbumpcminus() function that gets the current
    character then decrements the get pointer. Use setg to set the get pointers
    and range of the get area. But this is getting really advanced.

    peek at the current character
    Siemel Naran, Jun 15, 2004
    #9
  10. SK

    Jorge Rivera Guest

    SK wrote:
    > Hey folks,
    >
    > I am searching for a string (say "ABC") backwards in a file.
    > First I seek to the end.
    > Then I try to make a check like -
    >
    > do {
    > file.clear ();
    > file.get(c);
    > file.seekg(-2, std::ios::cur);
    > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > // readback, hypotheticl func, problem here)
    > Is there some function/trick like peek that can instead read backwards
    > one character at a time?
    > Or is there a better way to solve this problem.
    >
    > Thank you.


    You can do something like this.

    std::string toFind = "ABC";
    std::string reversed;
    std::copy(toFind.rbegin(), toFind.rend(),std::front_inserter(reversed));

    std::ifstream ifs("MyFile.txt", std::ios::in);
    if(ifs)
    {
    std::string fileContent = std::string(
    std::istreambuf_iterator<char>(ifs),
    std::istreambuf_iterator<char>());

    std::string::size_type found_position = fileContent .rfind(reversed);
    if(std::string::npos != found_position)
    std::cout<<"Requested string found at position:
    "<<found_positions<<std::endl;
    }

    (Forgive any typos...)

    JLR
    Jorge Rivera, Jun 15, 2004
    #10
  11. SK

    Old Wolf Guest

    (SK) wrote:
    > > > I am searching for a string (say "ABC") backwards in a file.
    > > > First I seek to the end.
    > > > Then I try to make a check like -
    > > > Is there some function/trick like peek that can instead read backwards
    > > > one character at a time?
    > > > Or is there a better way to solve this problem.

    >
    > it goes in several mbs.
    >
    > Wish I could redesign the file format, but can't actually :-(
    > But what I am parsing are timestamps in a file. I need to know the
    > first and the last timestamp in a given file.


    If the file is in ascii format (ie. it is made up of lines), then
    you can read forwards a line at a time (it won't take long even to
    parse several megs that way).
    If you really must start at the end, then start, say, 50k from the
    end and find the first newline, and then read forwards from there
    (assuming your file does not contain lines longer than 50k).
    Old Wolf, Jun 15, 2004
    #11
  12. SK

    Julie Guest

    Karl Heinz Buchegger wrote:
    <snip>
    > The best thing the OP could do is: Avoid that topic at all by redesigning
    > the file format.


    This comment really has no value to the discussion because it assumes that: the
    OP has the liberty to arbitrarily redesign the file format, that the
    requirements for the existing file format are secondary to the (code)
    implementation details, and that reading from the end of the file isn't a
    suitable solution to the problem as expressed by the OP.
    Julie, Jun 15, 2004
    #12
  13. Julie wrote:
    >
    > Karl Heinz Buchegger wrote:
    > <snip>
    > > The best thing the OP could do is: Avoid that topic at all by redesigning
    > > the file format.

    >
    > This comment really has no value to the discussion because it assumes that: the
    > OP has the liberty to arbitrarily redesign the file format, that the
    > requirements for the existing file format are secondary to the (code)
    > implementation details, and that reading from the end of the file isn't a
    > suitable solution to the problem as expressed by the OP.


    On the other hand it happens quite frequently, that posters post
    questions about how to do something and it turns out later that
    a simple change in the assignment (if possible) is the far better
    solution.

    --
    Karl Heinz Buchegger
    Karl Heinz Buchegger, Jun 16, 2004
    #13
  14. SK

    Julie Guest

    Karl Heinz Buchegger wrote:
    >
    > Julie wrote:
    > >
    > > Karl Heinz Buchegger wrote:
    > > <snip>
    > > > The best thing the OP could do is: Avoid that topic at all by redesigning
    > > > the file format.

    > >
    > > This comment really has no value to the discussion because it assumes that: the
    > > OP has the liberty to arbitrarily redesign the file format, that the
    > > requirements for the existing file format are secondary to the (code)
    > > implementation details, and that reading from the end of the file isn't a
    > > suitable solution to the problem as expressed by the OP.

    >
    > On the other hand it happens quite frequently, that posters post
    > questions about how to do something and it turns out later that
    > a simple change in the assignment (if possible) is the far better
    > solution.


    Perhaps you should phrase your original comment as a question, rather than an
    assertion.
    Julie, Jun 16, 2004
    #14
  15. SK

    Hardy Guest

    why just reverse the string and search it in the normal way?:)

    "John Harrison" <> дÈëÓʼþ
    news:...
    >
    > "SK" <> wrote in message
    > news:...
    > > Hey folks,
    > >
    > > I am searching for a string (say "ABC") backwards in a file.
    > > First I seek to the end.
    > > Then I try to make a check like -
    > >
    > > do {
    > > file.clear ();
    > > file.get(c);
    > > file.seekg(-2, std::ios::cur);
    > > } while (c != 'A' && c.readback() != 'B' && c.readback != 'C')
    > > // readback, hypotheticl func, problem here)
    > > Is there some function/trick like peek that can instead read backwards
    > > one character at a time?
    > > Or is there a better way to solve this problem.
    > >
    > > Thank you.

    >
    > Is the file small enough to read the whole file into memory? If so then

    read
    > the whole file into a string and search in the string not in the file.
    > Anything else rapidly gets very complicated and also not terribly

    efficient.
    > Files aren't designed to be read backwards, I would prefer to redesign

    your
    > file format so that you don't need to read backwards than to actually
    > attempt this.
    >
    > And no there is no quick trick to do this.
    >
    > john
    >
    >
    Hardy, Jun 17, 2004
    #15
  16. Hardy wrote:
    >
    > why just reverse the string and search it in the normal way?:)
    >


    Because it isn't very efficient to start searching a 10 MB file
    at the beginning, when all you are interested in, is the *last*
    occourence of the search string and you know that this
    search string is located somewhere in the last 100 Bytes or so.


    --
    Karl Heinz Buchegger
    Karl Heinz Buchegger, Jun 17, 2004
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Sharp

    Reading a string backwards

    Sharp, Mar 15, 2005, in forum: Java
    Replies:
    2
    Views:
    4,786
    Yamin
    Mar 15, 2005
  2. Peter Merz
    Replies:
    1
    Views:
    572
  3. Matt DeFoor

    reading file backwards and parsing

    Matt DeFoor, Apr 12, 2004, in forum: C Programming
    Replies:
    11
    Views:
    4,020
    Barry Schwarz
    Apr 14, 2004
  4. Toby DiPasquale

    String search backwards?

    Toby DiPasquale, Jan 24, 2006, in forum: Ruby
    Replies:
    1
    Views:
    114
    Matthew Moss
    Jan 24, 2006
  5. mud_saisem

    Search a Large files backwards

    mud_saisem, Mar 2, 2010, in forum: Perl Misc
    Replies:
    7
    Views:
    134
    Uri Guttman
    Mar 2, 2010
Loading...

Share This Page