Read file bottom up

Samant.Trupti

I need to read a huge file. The content that I want to read is
somewhere near the bottom. Can anyone please let me know a way to read
the file from the bottom up?
Thanks
Trupti
 
Jeff Schwab

I need to read a huge file. The content that I want to read is
somewhere near the bottom. Can anyone please let me know a way to read
the file from the bottom up?

Take a look at the istream::seekg method. It will let you position the
file-pointer at an arbitrary place in the input stream.
 
sean_in_raleigh

I need to read a huge file. The content that I want to read is
somewhere near the bottom. Can anyone please let me know a way to read
the file from the bottom up?
Thanks
Trupti

$ tail my_huge_file.txt

If that's not suitable for your environment, you
should provide some details on that.

Sean
 
peter koch

Take a look at the istream::seekg method.  It will let you position the
file-pointer at an arbitrary place in the input stream.

No it will not. seek will only (portably) position you somewhere where
you have been before.

/Peter
 
peter koch

 $ tail my_huge_file.txt

If that's not suitable for your environment, you
should provide some details on that.

Sean

tail is not appropriate as it is not portable, not C++ and not generic
(tail is for text-files only).

/Peter
 
Jeff Schwab

peter said:
No it will not. seek will only (portably) position you somewhere where
you have been before.

According to what? I'm looking at the draft standard for 0x, and don't
see anything like that. The seekg method can fail, but I don't see
anything magical about "where you have been before."
 
Sebastian \psy\ Messerschmidt

peter said:
No it will not. seek will only (portably) position you somewhere where
you have been before.

Where do you get this information from?
What about:

ifstream is;
is.open ("test.txt", ios::binary );

// get length of file:
is.seekg (0, ios::end);
int length = is.tellg();
is.seekg (0, ios::beg);


The only thing that might support your answer is that it might be left
to the implementation to step through the whole file in order to get to
the end. On UNIX, however, filesystems do store the file's length, so
retrieving the end can almost certainly be done in O(1) ...

So do you mean by "not portably" that some implementations might step
through the file to find the end (what you might refer to with "position
you somewhere where you have been before.")


cheers psy
 
James Kanze

According to what? I'm looking at the draft standard for 0x,
and don't see anything like that.

You're not looking very well, then.
The seekg method can fail, but I don't see anything magical
about "where you have been before."

All seekg does is call filebuf::seekpos or filebuf::seekoff.
(The restriction doesn't apply to stringbuf, and of course, any
other streambuf class can make any restrictions it pleases.) The
documentation for filebuf::seekpos says directly: "If sp has not
been obtained by a previous successful call to one of the
positioning functions (seekoff or seekpos) on the same file the
effects are undefined." (It's hard to be clearer.) The
documentation of filebuf::seekoff says "[...]seek to the new
position: if width > 0, call std::fseek(file, width * off,
whence), otherwise call std::fseek(file, 0, whence)." So we're
sent to the C standard, and of course, C has never allowed
arbitrary positioning in a file opened in text mode. (FWIW:
the width in the preceding text is derived from the imbued
locale; if you open a file in binary mode AND imbue the "C"
locale, which will always result in a width equal to 1, you can
seek to an arbitrary position. At least in theory.)
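
A minimal sketch of the "binary mode AND the "C" locale" combination described above. The helper name open_for_seeking is made up for illustration, and the sketch assumes that imbuing before opening is enough for the file buffer to pick up the classic locale:

```cpp
#include <fstream>
#include <locale>
#include <string>

// Open `path` for byte-oriented positioning: classic ("C") locale imbued
// first, then the file opened in binary mode.
// `open_for_seeking` is a hypothetical helper, not a library function.
bool open_for_seeking(std::ifstream& in, const std::string& path)
{
    in.imbue(std::locale::classic());   // no locale-dependent code conversion
    in.open(path.c_str(), std::ios_base::in | std::ios_base::binary);
    return in.is_open();
}
```

After a successful call, something like in.seekg(3, std::ios_base::beg) should leave the stream at the fourth byte of the file.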
 
James Kanze

peter koch wrote:
Where do you get this information from?

ISO 14882. At least, that's where I got it from. (There are
some exceptions, if you open the file in binary and imbue the
"C" locale.)
What about:
ifstream is;
is.open ("test.txt", ios::binary );
// get length of file:
is.seekg (0, ios::end);
int length = is.tellg();
is.seekg (0, ios::beg);

What about it? It might work on some systems, but it certainly
isn't portable. It won't work on the systems I usually use,
Linux and Solaris, but only because the length of a file won't
fit in an int. More generally:

-- istream::tellg() returns a streampos, which isn't
guaranteed to be convertible to an integral type,

-- even if streampos is convertible to an integral type,
there's no guarantee that the numeric value of this type has
any real significance---an implementation *could* break it
up into fields, with a "sector number" in the low order
bits, and the offset in the high order bits, for example,
and

-- even if it represents some numerical offset within the
system, it probably isn't guaranteed to fit in an int.

And of course, unless you've imbued the "C" locale, all bets are
off anyway.

Finally, this isn't really relevant anyway, since the question
was to seek to an arbitrary location, which isn't necessarily
the end or the beginning.
The only thing that might support your answer is that it might be left
to the implementation to step through the whole file in order to get to
the end.

The only thing that supports his statement is that it is what
the language standard says.
On UNIX, however, filesystems do store the file's length, so
retrieving the end can almost certainly be done in O(1) ...
So do you mean by "not portably" that some implementations
might step through the file to find the end (what you might
refer to with "position you somewhere where you have been
before.")

He means that the standard doesn't guarantee it.

In practice, for what the original poster was asking: if he's
willing to restrict portability to Unix and Windows, he can
probably read the last 500-odd bytes by using

is.seekg( -512, ios::end ) ;
// Start reading...

He should be aware, however, that the seek will not guarantee
that he is placed 512 bytes from the end---under Unix, he will
be, but under Windows, he will be placed somewhere between 256
bytes and 512 bytes from the end, depending on the data (unless
he opens the file in binary mode).
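
As a rough sketch of that approach, here is a helper that returns up to a given number of bytes from the end of a file, opened in binary mode so that the offset is counted in bytes. The name read_tail is hypothetical, and the fallback assumes (as on the usual Unix and Windows libraries) that seeking before the beginning of the file makes the stream fail:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Return up to `count` bytes (count >= 0) from the end of `path`.
// `read_tail` is a hypothetical helper; it returns an empty string
// if the file cannot be opened.
std::string read_tail(const std::string& path, std::streamoff count)
{
    std::ifstream in(path.c_str(), std::ios_base::in | std::ios_base::binary);
    if (!in) return std::string();

    // If the file is shorter than `count` bytes the seek is expected to
    // fail; in that case clear the error and read from the beginning.
    if (!in.seekg(-count, std::ios_base::end)) {
        in.clear();
        in.seekg(0, std::ios_base::beg);
    }

    std::string tail;
    char buffer[4096];
    // Keep the bytes of the final, partial read as well.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        tail.append(buffer, static_cast<std::size_t>(in.gcount()));
    }
    return tail;
}
```

A call such as read_tail("my_huge_file.txt", 512) would then stand in for the seekg( -512, ios::end ) line above.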
 
Jeff Schwab

James said:
You're not looking very well, then.

Have I done something to make you dislike me personally? I'm getting a
pretty strong vibe that you hate my guts, from this and other posts.
Did I do something bad to you?

All seekg does is call filebuf::seekpos or filebuf::seekoff.
(The restriction doesn't apply to stringbuf, and of course, any
other streambuf class can make any restrictions it pleases.) The
documentation for filebuf::seekpos says directly: "If sp has not
been obtained by a previous successful call to one of the
positioning functions (seekoff or seekpos) on the same file the
effects are undefined."

Thank you, I see that now (in 27.8.1.6)
(It's hard to be clearer.) The
documentation of filebuf::seekoff says "[...]seek to the new
position: if width > 0, call std::fseek(file, width * off,
whence), otherwise call std::fseek(file, 0, whence)." So we're
sent to the C standard, and of course, C has never allowed
arbitrary positioning in a file opened in text mode. (FWIW:
the width in the preceding text is derived from the imbued
locale; if you open a file in binary mode AND imbue the "C"
locale, which will always result in a width equal to 1, you can
seek to an arbitrary position. At least in theory.)
 
Sebastian \psy\ Messerschmidt

James said:
ISO 14882. At least, that's where I got it from. (There are
some exceptions, if you open the file in binary and imbue the
"C" locale.)




What about it? It might work on some systems, but it certainly
isn't portable. It won't work on the systems I usually use,
Linux and Solaris, but only because the length of a file won't
fit in an int. More generally:

-- istream::tellg() returns a streampos, which isn't
guaranteed to be convertible to an integral type,

Ok, I wasn't aware of this.
-- even if streampos is convertible to an integral type,
there's no guarantee that the numeric value of this type has
any real significance---an implementation *could* break it
up into fields, with a "sector number" in the low order
bits, and the offset in the high order bits, for example,
and

Mmh, ok. I think I've got the idea ... still I wonder why a file buffer
shouldn't be accessible at random addresses
-- even if it represents some numerical offset within the
system, it probably isn't guaranteed to fit in an int.

Sorry, the int was a bad idea. I was just a bit too lazy to look up the
"size_type" which I guess would be the right type.
And of course, unless you've imbued the "C" locale, all bets are
off anyway.

Finally, this isn't really relevant anyway, since the question
was to seek to an arbitrary location, which isn't necessarily
the end or the beginning.

Yes, but wasn't the OP looking for a way to read from the file's end?
Anyway, I've seen many, many implementations that retrieve the file length (in
binary mode, I must admit) the way I described above. Just wondering what
the correct way would be. I totally agree that in text mode the value
retrieved won't match the file size in bytes.

I've really looked around a lot and almost every hit said something
similar to this:
http://www.cplusplus.com/reference/iostream/istream/tellg.html

They state that the return value is integral ("An integral value of
type streampos with the number of characters between the beginning of
the input sequence and the current position") and they also state that
the position is absolute ("Returns the absolute position of the get
pointer.")

Same goes for seekp.
Also it is said that construction from int has to be supported by
streampos ...

http://www.cplusplus.com/reference/iostream/streampos.html

Do they simply ignore the facts or did I get your answers wrong?

cheers
psy
 
guinness.tony

Anyway, I've seen many, many implementations that retrieve the file length (in
binary mode, I must admit) the way I described above. Just wondering what
the correct way would be. I totally agree that in text mode the value
retrieved won't match the file size in bytes.

I've really looked around a lot and almost every hit said something
similar to this:
http://www.cplusplus.com/reference/iostream/istream/tellg.html

They state that the return value is integral ("An integral value of
type streampos with the number of characters between the beginning of
the input sequence and the current position")

The Standard disagrees.

template <class state> class fpos;
typedef fpos<char_traits<char>::state_type> streampos;

istream::tellg() returns istream::pos_type (where pos_type comes from the
stream class's traits::pos_type and is usually streampos.)

Table 88 in 27.4.3.2 indicates that a distance (streamoff) can be obtained by
subtracting one streampos from another. Although I can't find it explicitly
stated, usage (and one of the footnotes) imply to me that streamoff is
integral.

So the portable way to determine file length should be:

std::ifstream file( "myfile.dat", std::ios_base::binary );

file.seekg( 0, std::ios_base::end );
std::streampos const endpos = file.tellg();
file.seekg( 0, std::ios_base::beg );
std::streamoff const file_length = file.tellg() - endpos;
 
guinness.tony

    std::ifstream file( "myfile.dat", std::ios_base::binary );

    file.seekg( 0, std::ios_base::end );
    std::streampos const endpos = file.tellg();
    file.seekg( 0, std::ios_base::beg );
    std::streamoff const file_length = file.tellg() - endpos;
// Aaagh!
std::streamoff const file_length = endpos - file.tellg();
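
Put together as a complete function, the corrected version might look like the sketch below. The name stream_length and the -1 error value are made up for illustration, and the result is only the byte count on implementations where a binary-mode stream position maps directly to a byte offset:

```cpp
#include <fstream>
#include <string>

// Distance between the end and the beginning of `path`, or -1 if the
// file cannot be opened or positioned.
// `stream_length` is a hypothetical helper name.
std::streamoff stream_length(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios_base::in | std::ios_base::binary);
    if (!file) return -1;

    file.seekg(0, std::ios_base::end);
    std::streampos const endpos = file.tellg();
    file.seekg(0, std::ios_base::beg);
    std::streampos const begpos = file.tellg();
    if (!file) return -1;   // one of the positioning calls failed

    // End minus beginning, so the result is not negative.
    return endpos - begpos;
}
```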
 
James Kanze

Ok, I wasn't aware of this.
Mmh, ok. I think I've got the idea ... still I wonder why a
file buffer shouldn't be accessible at random addresses

Mainly because that operation isn't supported by all systems.
In the past, most systems used various file types, and text
files often imposed a fixed length line. So text output padded
with white space, and text input stripped the padding. And of
course, an integral value couldn't be made to mean anything
sensible.

The intent always was that you could seek to an arbitrary point
in a file opened in binary mode, although there may be problems
there as well when multi-byte encodings are involved. The
history here is quite complicated, since it involves trying to
make a concept designed for a simple environment (the Unix file
system and ASCII) work in more complicated environments.
Basically, the idea is that fpos (which is inspired by C's
fpos_t) will save both the OS's view of the position and the
encoding state. But of course, in an extreme case, you can't
know either unless you've actually been there. The C standard
takes the point of view that 1) for a file opened in text mode,
this is actually the case, and that except for seeking to either
end, you can only seek to where you've been, and 2) if the file
was opened in binary mode, you can at least specify an offset in
bytes relative to either end or the current position, and that
you know more about the content than the library does, and will
take whatever precautions necessary for encoding state.

In addition, about the time C was being standardized, the
question of seeking in files longer than what a long could
contain became pertinent---fpos_t could use a struct or an
internal type longer than long to specify the OS position even
if it didn't fit into a long.

And finally, C++ throws in locale dependent code translation,
even in char based streams. (I think C uses locale dependent
code translation to convert bytes to wchar_t when reading and
writing wchar_t. With the added twist that the locale used is a
global variable, which can be changed in ways unknown to you by
any function you happen to call.)

In C++, streampos corresponds more or less to C's fpos_t (but
with some poorly specified twists), and streamoff corresponds to
the relative positioning, which is done with a long in C.

For the most part, in modern OS's, like Windows and Unix (but
not the OS's of mainframes), the only file structure is an array
of bytes. And most modern encodings don't depend on position
dependent state---if you're using UTF-8, for example, each
character may contain more than one byte, but how many bytes,
or the meaning of the character doesn't depend on some byte
you've read an unspecified distance previously. And of course,
with 64 bit long long's, we're set for the foreseeable future
with regards to file size. In such circumstances, it would make
sense for both streampos and streamoff to be typedef's to long
long---although the standard doesn't allow it in the case of
streampos. You still have a slight problem in Windows, in that
for text files, the system's representation of the position
doesn't correspond exactly to the number of bytes you'd read to
get there (and in some cases, you can successfully position well
beyond the end of the file, and successfully read from that
position---but since doing so is undefined behavior, that's your
tough luck). But globally, it's not too big of a problem if one
is aware of it. Thus, while not required by the standard, from
a quality of implementation point of view, I would be very
disappointed in a library implementation for Windows or Unix in
which streampos was not a typedef to an integral type (long
long, if long is only 32 bits).
Sorry, the int was a bad idea. I was just a bit too lazy to look up
the "size_type" which I guess would be the right type.

Maybe:). I'd use long long, and cross my fingers. I've seen
library implementations in which streampos was based on a long
for the positional element, even though long wasn't large enough
to represent the position in a file. And while I'd like to just
say that such implementations are broken, there are historical
reasons which constrain implementors somewhat. In practice, I
don't think that there's any way to handle files longer than
about 2 GB that is both portable and safe.
Yes, but wasn't the OP looking for a way to read from the
file's end?

Yes, but by using the relative positioning functions, he
doesn't need to know the length of the file to do so. In a
binary file, he *can* seek to the end - some number of bytes.
And in a text file, under Unix or Windows, he can do it as well,
with the restriction that the actual position might be off
somewhat under Windows (but that's not necessarily a fatal
problem).
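
For the original poster's case, a rough sketch of reading the last lines of a text file this way follows. The names last_lines and window are hypothetical. It assumes Unix or Windows, where the relative seek works in practice, and it throws away the first line after the seek because the seek most likely landed in the middle of a line:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Return the complete lines found in roughly the last `window` bytes
// (window >= 0) of the text file `path`.
// `last_lines` is a hypothetical helper name.
std::vector<std::string> last_lines(const std::string& path, std::streamoff window)
{
    std::vector<std::string> lines;
    std::ifstream in(path.c_str());   // text mode
    if (!in) return lines;

    bool whole_file = false;
    if (!in.seekg(-window, std::ios_base::end)) {
        // File shorter than the window: rewind and keep every line.
        in.clear();
        in.seekg(0, std::ios_base::beg);
        whole_file = true;
    }

    std::string line;
    if (!whole_file) {
        std::getline(in, line);       // probably a partial line: drop it
    }
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

If the seek happens to land exactly on the start of a line, that whole line is dropped too, so the window should be chosen somewhat larger than the data actually wanted.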
Anyway, I've seen many, many implementations that retrieve the
file length (in binary mode, I must admit) the way I described
above.

I've seen it a lot, too. That doesn't mean that the standard
says it's right.
Just wondering what the correct way would be.

The simple answer is that you can't find the file length,
portably, in standard C++, except by actually reading the file.
At least if by "file length" you mean the number of bytes you
can successfully read.
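
A minimal sketch of that "actually reading the file" approach, for illustration only. The name count_readable_bytes is hypothetical, and long long assumes C++11 or a compiler that offers it as an extension:

```cpp
#include <fstream>
#include <string>

// Count the bytes that can actually be read from `path` by reading
// all of them. Returns -1 if the file cannot be opened.
// `count_readable_bytes` is a hypothetical helper name.
long long count_readable_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios_base::in | std::ios_base::binary);
    if (!in) return -1;

    long long total = 0;
    char buffer[4096];
    // The last read is usually short; gcount() reports how much it got.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        total += in.gcount();
    }
    return total;
}
```

This costs a full pass over the file, which is the price of not relying on the numeric value of a stream position.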
I totally agree that in text mode the value retrieved won't
match the file size in bytes.
I've really looked around a lot and almost every hit said
something similar to this:
http://www.cplusplus.com/reference/iostream/istream/tellg.html
They state that the return value is integral ("An integral
value of type streampos with the number of characters between
the beginning of the input sequence and the current position")
and they also state that the position is absolute ("Returns
the absolute position of the get pointer.")
Same goes for seekp.

Interesting. The standard doesn't even allow streampos to be
"an integral value". At most, it will convert to a streamoff
(with loss of information in the case of a state dependent
encoding). There is also a requirement that you can convert a
streamoff to an integral type. If streamoff is not an integral
type (which should only happen on very exotic machines),
however, using streampos as an integral value involves two user
defined conversions, which can't happen implicitly.

And of course, the standard says absolutely nothing about the
semantics of this integral value, except that if you use it to
seek in the file (within the restrictions of the standard), it
will get you where you want. Although there is no guarantee
that the mappings aren't arbitrary, I can't see an
implementation doing anything arbitrary in the case of binary
files, since something like filebuf::seekoff( n, ios::beg ) is
guaranteed to put you at the place you would have been at if
you'd read n bytes---using an arbitrary mapping would require a
lot of juggling in some of the conversions between integral
types and streampos. (For a text file, of course, that function
call is undefined behavior.)
Also it is said that construction from int has to be supported by
streampos ...

Construction from an int, yes (although the standard fails to
indicate what the semantic should be). Explicit
construction---I don't see anything which would prevent the
constructor from being explicit.
Do they simply ignore the facts or did I get your answers wrong?

The full requirements of fpos (and streampos is required to be
an instantiation of fpos) and streamoff are given in table 116;
there is certainly no requirement that fpos be directly
convertible to an integral type.

(I'm just wondering. I wonder if it would be worth writing up a
proposal to require 1) that streamoff be a typedef to an
integral type, and 2) that if the file is opened in binary mode,
its integral value correspond to the number of bytes from the
beginning of the file. I suspect that that would cause no
problems, and would in fact make the standard conform to the
practical reality for most programmers.)
 
James Kanze

The Standard disagrees.

template <class state> class fpos;
typedef fpos<char_traits<char>::state_type> streampos;
istream::tellg() returns istream::pos_type (where pos_type
comes from the stream class's traits::pos_type and is usually
streampos.)
Table 88 in 27.4.3.2 indicates that a distance (streamoff) can
be obtained by subtracting one streampos from another.
Although I can't find it explicitly stated, usage (and one of
the footnotes) imply to me that streamoff is integral.

Which footnote?

I'm curious. Historically, streamoff is used pretty much in the
same way long was in fseek(), so there is some justification in
that expectation, but I've never seen anything which would
require it. In fact, I rather thought that the reason for using
streamoff instead of long was to allow an implementation to
define a type with more range than a long, and use it. Today,
of course, that could be long long, but in 1990, when the
standard was written, that option wasn't available, and it would
have required a class type (or an extension, e.g. __int64).
So the portable way to determine file length should be:
std::ifstream file( "myfile.dat", std::ios_base::binary );
file.seekg( 0, std::ios_base::end );
std::streampos const endpos = file.tellg();
file.seekg( 0, std::ios_base::beg );
std::streamoff const file_length = file.tellg() - endpos;

You're still assuming that the numeric value is something which
has a meaning beyond being usable as a streamoff argument.
While it might be reasonable to make such a requirement for a
binary file, it definitely doesn't hold for text files, and I
don't think that the standard requires it otherwise.
 
guinness.tony

Which footnote?

I think I must have been referring to footnote 174)
(in 17.4.4.7): "... types described as synonyms for
basic integral types, such as ... streamoff."
 
James Kanze

I think I must have been referring to footnote 174)
(in 17.4.4.7): "... types described as synonyms for
basic integral types, such as ... streamoff."

Interesting. There's nowhere else that I can find where the
standard says anything at all about streamoff, really, except
that it must be a typedef. (But streampos must also be a
typedef.) Everywhere it's referred to, there's a reference to
§27.4.3.2 (the specifications of fpos) for its requirements, and
there's certainly nothing there which forbids a class type.
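
Whatever the wording of the standard, the question can be checked for a particular implementation at compile time. A small sketch, assuming a C++11 compiler (static_assert and <type_traits> are not in the standard under discussion here); the helper name offset_as_integer is made up for illustration:

```cpp
#include <ios>
#include <type_traits>

// Fail the build if this implementation's streamoff is not a signed
// integral type. Requires C++11 or later.
static_assert(std::is_integral<std::streamoff>::value,
              "std::streamoff is not an integral type on this implementation");
static_assert(std::is_signed<std::streamoff>::value,
              "std::streamoff is not signed on this implementation");

// Convert an offset to a plain integer once the checks above have passed.
// `offset_as_integer` is a hypothetical helper name.
inline long long offset_as_integer(std::streamoff off)
{
    return static_cast<long long>(off);
}
```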
 
