Copy the last n bytes of a big file into another file


tirzan

Hi all,
I would like to copy the last part of a (quite big) file into
another one (to remove the header of a few big images, basically
removing the first 1024 bytes).
I'm going crazy; can you please help me?

This is one of my attempts (but it just copies the whole file; how can
I tell it to start copying from a certain position?):

std::fstream f(inputFileName, std::fstream::in | std::fstream::binary);
std::istream_iterator<unsigned char> begin(f.seekg(1024, std::ios::beg));
f << std::noskipws;
std::istream_iterator<unsigned char> end;
std::fstream f2(outputFileName,
               std::fstream::out | std::fstream::trunc | std::fstream::binary);
std::ostream_iterator<char> begin2(f2);
std::copy(begin, end, begin2);

thanks very much!!!
T.
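For comparison, here is a minimal corrected sketch of the same std::copy approach. It assumes C++11, so the file streams accept std::string names, and the function name copyTail and its signature are hypothetical. It makes three changes. It uses std::ifstream/std::ofstream instead of std::fstream. It checks the result of the seek. It sets std::noskipws before the istream_iterator is constructed, because the constructor already reads the first element.

```cpp
#include <algorithm>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Hypothetical helper: copies everything after the first `skip` bytes of
// `inputFileName` into `outputFileName` using istream_iterator + std::copy.
// Returns false if opening, seeking, or writing fails.
bool copyTail(const std::string& inputFileName,
              const std::string& outputFileName,
              std::streamoff skip)
{
    std::ifstream in(inputFileName, std::ios::in | std::ios::binary);
    if (!in)
        return false;

    // Position the stream first, then check that the seek succeeded.
    if (!in.seekg(skip, std::ios::beg))
        return false;

    // Set noskipws *before* constructing the iterator: the istream_iterator
    // constructor already extracts the first element with operator>>.
    in >> std::noskipws;
    std::istream_iterator<char> first(in);
    std::istream_iterator<char> last;  // end-of-stream iterator

    std::ofstream out(outputFileName,
                      std::ios::out | std::ios::trunc | std::ios::binary);
    if (!out)
        return false;

    std::copy(first, last, std::ostream_iterator<char>(out));
    return static_cast<bool>(out);
}
```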
 

James Kanze

Will try...
Try to avoid one-letter variables. Hard to understand what
they are for when looking at them later in the code.

They can be hard to read as well.
'fstream' is of type 'basic_fstream<char>'. Do you really want to
change the type to 'unsigned char' here? You'd be better off using the
same template argument, no? How does it compile, anyway?

There's no real reason for the template arguments to be
identical here. They represent entirely different things. (The
second template argument of the istream_iterator should be the
same as that of the stream, but it defaults to char, so it is.)
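A small sketch of the template-argument point, assuming C++11 (sumBytes is a hypothetical helper). The first argument of std::istream_iterator is the type extracted with operator>>. The second is the stream's character type, which defaults to char. An istream_iterator<unsigned char> therefore reads from an ordinary std::istream.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>

// std::istream_iterator<T, CharT = char, ...>: T is the type extracted with
// operator>>, CharT is the character type of the stream it reads from.
// With the default CharT, istream_iterator<unsigned char> reads a std::istream.
static_assert(std::is_same<std::istream_iterator<unsigned char>::istream_type,
                           std::istream>::value,
              "istream_iterator<unsigned char> reads from a std::istream");

// Hypothetical helper: adds up the bytes of `s`, each read as an
// unsigned char through std::istream_iterator<unsigned char>.
unsigned long sumBytes(const std::string& s)
{
    std::istringstream in(s);
    in >> std::noskipws;  // do not drop whitespace bytes
    unsigned long sum = 0;
    for (std::istream_iterator<unsigned char> it(in), end; it != end; ++it)
        sum += *it;
    return sum;
}
```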
And, 'seekg' returns the stream. You're constructing an
iterator from the stream. Are you sure your iterator isn't
going to reset the stream?

That's not part of the behavior specified by the standard for
the constructor of the iterator.
Why don't you just do
    std::istream_iterator<char> begin(f);
    std::advance(begin, 1024); // skip the first 1024 bytes

Because it could be significantly slower. (For 1024 characters,
I doubt it, but for larger differences, advancing an
istream_iterator means reading that many elements.)
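This sketch contrasts the two ways of skipping. It uses an in-memory stream in place of the file and assumes C++11; both helper names are hypothetical. Both return the same tail. std::advance gets there by extracting every skipped element, while seekg only repositions the stream buffer.

```cpp
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Skip by seeking: the stream buffer is repositioned, nothing is read.
// Assumes n <= data.size().
std::string tailBySeek(const std::string& data, std::streamoff n)
{
    std::istringstream in(data);
    in.seekg(n, std::ios::beg);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Skip by std::advance: each of the n increments extracts one element
// with operator>>, so skipping n bytes costs n reads.
// Assumes n <= data.size() (advancing past end-of-stream is undefined).
std::string tailByAdvance(const std::string& data, long n)
{
    std::istringstream in(data);
    in >> std::noskipws;  // otherwise whitespace would not be counted
    std::istream_iterator<char> it(in), end;
    std::advance(it, n);
    return std::string(it, end);
}
```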
Now, what does that do for a *binary* input file?

The same thing as it does for a text input file. It means that
the initial skip of white space in the >> operator won't take
place.
Again, use <char>.
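The effect is easy to see on an in-memory stream. This is a sketch, and copyThroughIterator is a hypothetical name. istream_iterator<char> extracts with operator>>, which skips leading whitespace unless noskipws is set. Opening a file in binary mode does not change that; binary mode only affects how the underlying file is read (for example, newline translation on some systems).

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Runs every character of `data` through std::istream_iterator<char>.
// Without noskipws, operator>> skips whitespace before each character.
std::string copyThroughIterator(const std::string& data, bool noSkip)
{
    std::istringstream in(data);
    if (noSkip)
        in >> std::noskipws;
    std::istream_iterator<char> first(in), last;
    return std::string(first, last);
}
```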

Not if he's reading unsigned char.
So, to summarize: be simpler and more explicit.

He can't be much simpler. I'd rather he be a little more
explicit, however. Say by checking the status of the stream
after the seek, before copying the rest.
And post the final (hopefully short) program. We could try it
with a file longer than 1024 characters.

At first view, his code should work. I'd still like to see the
status of the input file after the seek, however.
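The following is a minimal sketch of the check James asks for, assuming C++11 (seekAndReport is a hypothetical helper). It seeks, tests the stream state, and reads back tellg(). A filebuf may not report a seek past the end of the file as an error, so comparing the result against the expected file size can also be worthwhile.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: opens `fileName` in binary mode, seeks to `offset`
// and returns the position actually reached.
// Returns -1 if the file could not be opened or the seek failed.
std::streamoff seekAndReport(const std::string& fileName, std::streamoff offset)
{
    std::ifstream in(fileName, std::ios::binary);
    if (!in)
        return -1;
    in.seekg(offset, std::ios::beg);
    if (!in)  // a failed seek sets failbit
        return -1;
    return static_cast<std::streamoff>(in.tellg());
}
```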

After opening the two files and the seek, I'd probably just
write:

f2 << f.rdbuf();

rather than bother with std::copy.
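Here is a minimal sketch of the rdbuf() variant suggested above, assuming C++11 (copyTailRdbuf is a hypothetical name). One caveat: if operator<< on a streambuf inserts no characters at all, it sets failbit on the output stream. With this return convention, an empty tail is therefore reported as a failure.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: skips the first `skip` bytes of the input file and
// copies the rest with operator<< on the input's stream buffer.
bool copyTailRdbuf(const std::string& inputFileName,
                   const std::string& outputFileName,
                   std::streamoff skip)
{
    std::ifstream in(inputFileName, std::ios::binary);
    std::ofstream out(outputFileName, std::ios::binary | std::ios::trunc);
    if (!in || !out || !in.seekg(skip, std::ios::beg))
        return false;
    out << in.rdbuf();  // copies raw characters; no whitespace skipping
    return static_cast<bool>(out);
}
```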
 

tirzan

Will try...
thanks!

    std::istream_iterator<char> begin(f);
    std::advance(begin, 1024); // skip the first 1024 bytes

Tried it; it doesn't work :-(
Now, what does that do for a *binary* input file?

It ensures all the characters are copied; if I leave it out, the sizes
of the input and output files don't match any longer ;-)
Again, use <char>.

I'll do that.
So, to summarize: be simpler and more explicit.  And post the final
(hopefully short) program.  We could try it with a file longer than 1024
characters.

OK, here is a simplified version of the source code, with all the
suggestions that both you and James kindly made: major.altervista.org/cut.cxx

Unfortunately it is still not working (it doesn't skip the first 1024
bytes of the file, it just copies the whole file).

The program should work with any "big" (up to 1 GB) file, so it doesn't
really matter which one I post. However, if you are curious, here it
is: http://bio3d.colorado.edu/imod/files/tutorialData.tar.gz
The file is called BBa.st (only 32 MB or so).

cheers,
T.
 

tirzan

OK, here is a simplified version of the source code, with all the
suggestions that both you and James kindly made: major.altervista.org/cut.cxx

It is working now! ;-)
I used:
    std::advance(beginInput, 1024);
    std::copy(beginInput, endInput, beginOutput);

thanks guys!!

cheers,
T.
 

Volker Lukas

Victor Bazarov wrote:
[...]
Third, don't use my 'advance' recommendation, it's bogus, the input
iterators aren't incrementable using op++() (and that's what 'advance'
uses). They are only incrementable using op++(int).
Can you clarify this? I see Table 72 on page 517 in the standard which
specifies requirements on Input Iterators. This table lists both pre-
and postincrement.
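Consistent with that reading of the input iterator requirements, both ++it and it++ work on std::istream_iterator, so std::advance can use it. The following is a small sketch assuming C++11; the helper names are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>

// std::istream_iterator is tagged as an input iterator.
static_assert(std::is_same<
                  std::iterator_traits<std::istream_iterator<char>>::iterator_category,
                  std::input_iterator_tag>::value,
              "istream_iterator is an input iterator");

// Counts the non-whitespace characters of `data` using pre-increment.
std::size_t countWithPreIncrement(const std::string& data)
{
    std::istringstream in(data);
    std::size_t n = 0;
    for (std::istream_iterator<char> it(in), end; it != end; ++it)
        ++n;
    return n;
}

// Same count, using post-increment.
std::size_t countWithPostIncrement(const std::string& data)
{
    std::istringstream in(data);
    std::size_t n = 0;
    for (std::istream_iterator<char> it(in), end; it != end; it++)
        ++n;
    return n;
}
```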
Fourth, on my
system outputting rdbuf() left out one character for some reason.
I think the reason is described in 24.5.1, paragraph 1 of the
standard. It says of istream_iterator that an element is read from
the stream after construction and on every increment. So construction
already advances one position, so to speak.
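The effect Volker describes can be reproduced with in-memory streams. This is a sketch, and the helper name is hypothetical. Once an istream_iterator has been constructed on a stream, its first element has already been extracted, so anything that reads the stream buffer directly afterwards starts one element later.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: constructs an istream_iterator on a stream and then
// copies whatever the stream buffer still holds.
std::string restAfterConstructingIterator(const std::string& data)
{
    std::istringstream in(data);
    std::istream_iterator<char> it(in);  // the constructor already reads data[0]
    std::ostringstream out;
    out << in.rdbuf();                   // copies only what is still unread
    return out.str();
}
```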
 

James Kanze

[...]
As for using
<<, I am somewhat (what's the word?) cautious when using formatted I/O
operators on what is intended as a straight copy.

I agree, but << on a streambuf is rather a special case. It
really behaves more like an unformatted operation than a
formatted one.
Do you think that 'copy' would be slower than using op<<?
Both would have a loop, both read the buffer char by char,
yes?

No. A simple implementation would use sgetn, which should be
faster than character by character with an efficient
implementation of filebuf. A more complex implementation could
go further. In the past, this was the standard idiom for
copying files, and it's possible that some implementations still
optimize it because of this.
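Here is a rough sketch of the kind of block-wise copy James describes, written by hand with std::streambuf::sgetn and sputn and assuming C++11. The function names and the 4096-byte buffer size are arbitrary choices. This is an illustration, not how any particular library implements operator<<.

```cpp
#include <fstream>
#include <ios>
#include <streambuf>
#include <string>

// Copies everything remaining in `src` to `dst` in fixed-size blocks.
// Returns the number of bytes written.
std::streamsize copyBlocks(std::streambuf& src, std::streambuf& dst)
{
    char buffer[4096];
    std::streamsize total = 0;
    for (;;) {
        const std::streamsize got =
            src.sgetn(buffer, static_cast<std::streamsize>(sizeof buffer));
        if (got <= 0)
            break;                    // nothing left to read
        const std::streamsize put = dst.sputn(buffer, got);
        total += put;
        if (put != got)
            break;                    // destination could not take it all
    }
    return total;
}

// Hypothetical helper: skips the first `skip` bytes of the input file and
// copies the rest. Returns the number of bytes copied, or -1 if opening
// either file or seeking fails.
std::streamsize copyTailBlocks(const std::string& inputFileName,
                               const std::string& outputFileName,
                               std::streamoff skip)
{
    std::ifstream in(inputFileName, std::ios::binary);
    std::ofstream out(outputFileName, std::ios::binary | std::ios::trunc);
    if (!in || !out || !in.seekg(skip, std::ios::beg))
        return -1;
    return copyBlocks(*in.rdbuf(), *out.rdbuf());
}
```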