Reading text file contents to a character buffer

Nick Keighley · Aug 4, 2010

Thanks. That helped. Do you need to NULL terminate the string when you
read from file? What about if it is read using "fread"?

Well, you need some way of indicating the end of the buffer. You could
use a count or nul-terminate the string.

BTW, NULL is the null pointer constant; it is NOT used to terminate a
string. The terminator is the null character (ASCII NUL), written '\0'.
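To make the fread case concrete: fread only copies bytes and never appends a terminator, so the caller has to add the '\0' itself. A minimal sketch, assuming a caller-supplied buffer; the helper name read_as_string is made up for illustration and does not come from the thread.

```c
#include <stdio.h>

/* Hypothetical helper: read up to cap-1 bytes from fp into buf and
   nul-terminate the result.  Returns the number of bytes read, not
   counting the terminator. */
size_t read_as_string(FILE *fp, char *buf, size_t cap)
{
    size_t n;

    if (cap == 0)
        return 0;                    /* no room even for the terminator */
    n = fread(buf, 1, cap - 1, fp);  /* fread does NOT add a '\0' */
    buf[n] = '\0';                   /* the null character, not NULL */
    return n;
}
```

The returned count is still worth keeping: if the file itself contains a '\0' byte, string functions will stop there, and only the count says how much data was really read.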

Nick Keighley · Aug 4, 2010

[use a scale factor for memory allocation size]

Well, If I use this, I think I will have more allocations. Consider a
10KB file. I am allocating 4KB (I have PAGE_SIZE set as 4096 bytes)
first. If I use my code, I will allocate 8KB for the second iteration
and 12KB for the last iteration. Now on your code, second allocation
will be made for 6KB, third allocation for 9 and the last for 13. So
my initial code finished in just 3 allocation while the new code takes
4.

I am not understanding how this is a better strategy for allocations?

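but growing by a factor of 1.5 each time (geometric, 1.5^n) will overtake
growing by a fixed 4KB each time (linear) eventually. Buffer size in KB
after each allocation:

linear (+4KB)    geometric (x1.5)
 4                4
 8                6
12                9
16               13.5
20               20.25
24               30.375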

Ben Bacarisse · Aug 4, 2010

Nick Keighley said:
Ben Bacarisse said:

Navaneeth writes:

[...] code which works in both C and C++ compilers.

and, most of the time, it is for really trivial crap, like, casting the
result of malloc/realloc calls, or using "()" for no-argument functions
rather than "(void)", ...

just curious, what's wrong with using (void) for an empty argument list
in C++? I always thought this was just a style thing

Yes, I am sure it's fine (bar style issues). The problem comes when one
goes the other way: using idiomatic C++ (e.g. void f()) in C, because
you then lose the checking on the function call. I presume that BGB
uses the less idiomatic C++ version (void f(void)) in order to get the
benefit of switching between compilers, but the text suggests otherwise.

Malcolm McLean · Aug 4, 2010

I am not understanding how this is a better strategy for allocations?

File sizes tend to be distributed logarithmically. That is to say, you
have lots of tiny 1K or so files, rather fewer 2K ones, noticeably
fewer 5K ones, and then at the extreme right of the distribution you
have a few outliers - one 1GB, another 1.2GB, the biggest 1.9GB.

We want to balance the number of allocation requests against the amount
of memory requested. Allocating 2GB for every file would exhaust all our
memory. Allocating a buffer and growing it by one byte at a time would be
very slow. The question is, what increment to use?

By growing the buffer exponentially, we "capture" roughly the same
number of files in each allocation. A constant of 1.5 means that we
never ask for more than 50% more memory than the file actually needs,
which is reasonable - it's unlikely that the function will fail
because of allocation failure if there is actually enough memory in
the machine to hold the data. (You might want a patch to handle the
special case of a file that takes all available memory in a segment).
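The growth strategy described here can be sketched as follows. This is only an illustration under stated assumptions: the name read_all, the INITIAL_SIZE of 4096 (taken from the 4KB figure quoted earlier) and the choice to nul-terminate the result are not from the thread, and overflow of the size calculation is ignored for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_SIZE 4096   /* assumed starting size (the 4KB quoted above) */

/* Hypothetical helper: read all of fp into a malloc'd, nul-terminated
   buffer, growing it by about 50% whenever it fills up.  Returns NULL
   on failure; on success stores the byte count in *len.  The caller
   frees the result. */
char *read_all(FILE *fp, size_t *len)
{
    size_t cap = INITIAL_SIZE, used = 0, n;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    /* Always keep one byte spare for the terminating '\0'. */
    while ((n = fread(buf + used, 1, cap - used - 1, fp)) > 0) {
        used += n;
        if (used == cap - 1) {              /* buffer full: grow by 1.5x */
            size_t newcap = cap + cap / 2;  /* overflow check omitted */
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

With this scheme the buffer is at most about 50% larger than the data, which is the trade-off argued for above.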

Bartc · Aug 4, 2010

Well, If I use this, I think I will have more allocations. Consider a
10KB file. I am allocating 4KB (I have PAGE_SIZE set as 4096 bytes)
first. If I use my code, I will allocate 8KB for the second iteration
and 12KB for the last iteration. Now on your code, second allocation
will be made for 6KB, third allocation for 9 and the last for 13. So
my initial code finished in just 3 allocation while the new code takes
4.

I am not understanding how this is a better strategy for allocations?

Your method is just to increase the buffer by 4KB each time?

That's OK for small files; for big files it will grow the buffer too slowly:
a 1MB file would need to call realloc() 255 times with that method.

Using a 50% increase in buffer size each time, only 14 reallocs are needed.

(With my own scheme, which doubles each time, there are only 8 reallocs.
The doubling stops at around 8MB, though; after that it grows linearly
by 8MB at a time. That's not so bad because there are only so many
reallocs it can do from that point before memory is exhausted anyway.)

But for reading entire files, I tend to bypass the problem completely by
obtaining the file size first; then I need 0 reallocs.
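The realloc counts quoted above can be checked with a few lines of C. This is a sketch only: the function names are made up for illustration, and it assumes a 4096-byte initial allocation and a 1MB (1024*1024 byte) target, with the 1.5x factor computed in integer arithmetic as cap*3/2.

```c
#include <stddef.h>

/* Hypothetical helper: number of realloc calls needed to reach at least
   `target` bytes when starting at `start` and adding `step` bytes each
   time.  Assumes step > 0. */
unsigned count_linear(size_t start, size_t step, size_t target)
{
    unsigned calls = 0;
    size_t cap = start;

    while (cap < target) {
        cap += step;
        calls++;
    }
    return calls;
}

/* Hypothetical helper: same count when each realloc multiplies the size
   by num/den (3/2 for a 50% increase, 2/1 for doubling).  Assumes
   start > 0 and num > den so that the size really grows. */
unsigned count_geometric(size_t start, size_t num, size_t den, size_t target)
{
    unsigned calls = 0;
    size_t cap = start;

    while (cap < target) {
        cap = cap * num / den;
        calls++;
    }
    return calls;
}
```

For a 1MB target starting from 4096 bytes, count_linear(4096, 4096, 1048576) gives 255, count_geometric(4096, 3, 2, 1048576) gives 14, and count_geometric(4096, 2, 1, 1048576) gives 8. For the 10KB file discussed earlier the linear scheme needs 2 reallocs and the 1.5x scheme needs 3, so the linear scheme only wins for small files.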

BGB / cr88192 · Aug 5, 2010

Ben Bacarisse said:
Nick Keighley said:

Navaneeth writes:

[...] code which works in both C and C++ compilers.

and, most of the time, it is for really trivial crap, like, casting the
result of malloc/realloc calls, or using "()" for no-argument functions
rather than "(void)", ...

just curious, what's wrong with using (void) for an empty argument list
in C++? I always thought this was just a style thing

Yes, I am sure it's fine (bar style issues). The problem comes when one
goes the other way: using idiomatic C++ (e.g. void f()) in C, because
you then lose the checking on the function call. I presume that BGB
uses the less idiomatic C++ version (void f(void)) in order to get the
benefit of switching between compilers, but the text suggests otherwise.

I had thought "(void)" was disallowed in C++, hmm...

well, anyways, I normally use "()" in either case, as personally I find it
looks nicer, and also looks more like most other languages (like JS, C#, and
Java, which I also use sometimes), and also like the notation used when
calling said function (arguments checking or not...).

one could argue though: how often is it that anyone will actually try to
pass arguments to a no-argument function, or more so, what would it sensibly
do with said arguments anyways, besides ignore them?...

admittedly though, it is fairly rare that I don't pass arguments to
functions, since the usual role of a function is to accept inputs and
produce outputs, it generally is fairly rare that one will have a function
with only outputs (or, a no-argument, no-return function, that is much
different from being a no-op...).

granted, this may be that, as a general rule, I prefer to avoid using
globals either (almost everything is either passed as arguments, or stored
in a "context" which is usually also passed as an argument, meaning that
no-argument functions are fairly rare in practice...).

partial reason for not liking globals:
they mix poorly with multi-threading, as one then often needs mutexes or
similar to keep threads from stepping on each other. it is typically cleaner
IMO to use context structs, and assume that threads will not share these
structs (at least when the contents of the struct are stateful; it is much
safer to share stateless/immutable data).

or such...

Ben Bacarisse · Aug 5, 2010

BGB / cr88192 said:
Ben Bacarisse said:

Nick Keighley said:

<snip>

[...] code which works in both C and C++ compilers.

<snip>

and, most of the time, it is for really trivial crap, like, casting the
result of malloc/realloc calls, or using "()" for no-argument functions
rather than "(void)", ...

just curious, what's wrong with using (void) for an empty argument list
in C++? I always thought this was just a style thing

Yes, I am sure it's fine (bar style issues). The problem comes when one
goes the other way: using idiomatic C++ (e.g. void f()) in C, because
you then lose the checking on the function call. I presume that BGB
uses the less idiomatic C++ version (void f(void)) in order to get the
benefit of switching between compilers, but the text suggests otherwise.

I had thought "(void)" was disallowed in C++, hmm...

well, anyways, I normally use "()" in either case, as personally I find it
looks nicer, and also looks more like most other languages (like JS, C#, and
Java, which I also use sometimes), and also like the notation used when
calling said function (arguments checking or not...).

one could argue though: how often is it that anyone will actually try to
pass arguments to a no-argument function, or more so, what would it sensibly
do with said arguments anyways, besides ignore them?...

It's worse than that. If the function does take an argument, and you
make a mistake in the declaration (say in a header file) you will write
void f(); rather than void f(void). Anything could happen now unless
the call is exactly right. A change in architecture where the promoted
type of the actual argument no longer matches the one expected will
cause problems. An argument of the wrong type, or simply forgetting one
(or more) of them will cause problems. These can be subtle problems.

The biggest improvement in C was the introduction of prototypes and I
don't want to work without them (even partially).

The old C was a nightmare in this respect, especially when moving code
between very different architectures. Calling int i; f(&i); when f
does something like memset(p, 0, sizeof(int)); with its parameter would go
horribly wrong on some architectures. If f has a prototype in scope,
all is well because the pointer will be converted to the right type.
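A small sketch of the prototyped case may help. The names zero_int and zero_via_prototype are invented for illustration; the point is only that the declaration spells out the parameter type, so the compiler converts the int * argument to void * at the call and can reject calls with the wrong number or type of arguments.

```c
#include <string.h>

/* Prototype: the parameter type is part of the declaration.  With the
   old-style declaration "void zero_int();" the compiler of that era
   would pass whatever it was given, unconverted and unchecked. */
void zero_int(void *p);

void zero_int(void *p)
{
    memset(p, 0, sizeof(int));   /* memset takes (dest, value, count) */
}

/* Hypothetical caller: returns the value of i after the call. */
int zero_via_prototype(void)
{
    int i = 42;

    zero_int(&i);   /* int * is converted to void * via the prototype */
    /* zero_int();     would be diagnosed: too few arguments */
    return i;
}
```

Here zero_via_prototype() returns 0 on any conforming implementation, whatever the representations of int * and void * happen to be.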

<snip>

Richard Bos · Aug 5, 2010

Bartc said:
If you're going to worry about all these possibilities, even for reading a
6-line config file, then you're never going to get anywhere.

If you're trying to find the file size in bytes to read a 6-line config
file, you're going the wrong way about it. That's the second most
important thing to realise about portably getting a file size: you
almost never need to.
(The third most important thing is that, if you're going to jump through
the hoops many people suggest to get a "practically" portable solution, you
might as well bite the bullet and use a clean, system-specific solution
that you _know_ works.)
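As one illustration of a system-specific route, here is a sketch that assumes a POSIX system, where fileno and fstat are available. The function name file_size_posix is invented for this example, and the code is deliberately not portable C.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX declarations (fileno) */
#endif
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper, POSIX only: size in bytes of the regular file
   open on fp, or -1 if it cannot be determined (for example when fp is
   a pipe or a terminal). */
long long file_size_posix(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    return (long long)st.st_size;
}
```

Even then the size is only a hint: the file may change between the fstat call and the read, so the reading code should still check what fread actually returns.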

Richard

Richard Bos · Aug 14, 2010

BGB / cr88192 said:
I go both routes...

usually copy/paste happens if one is doing mix and match of code from
different places, or codebases, in order to make a combined result...

Meaning that both the C programmers _and_ the C++ ones hate your very
guts. Well done.

Might I point out, again, the existence of 'extern "C"'?

Richard

Jorgen Grahn · Aug 28, 2010

"hate your very guts"... really???

Isn't that a bit extreme?

It reminds me of the level of overkill in today's American politics.

I have an *Idea*. Why don't all of the C zealots paint themselves
half white and half black, and all of the C++ zealots paint themselves
half black and half white. Then they can battle each other to the
death while the rest of us stand around wondering what the hell the
big difference is between them. :O)

I think what will happen is that the C zealots and the C++ zealots
will join forces and beat up you indifferent bastards first ;->

But seriously ... I see no C-vs-C++ zealotry in the thread above. You
may want to read it again.

(Nice Star Trek reference, though!)

/Jorgen
