Maximum char array size?

  • Thread starter Walter Dnes (delete the 'z' to get my real address
  • Start date
W

Walter Dnes (delete the 'z' to get my real address

I've noticed a few threads (full of sound and fury, signifying
nothing) here recently about allocation of large memory blocks. I'm
about to start on a personal pet project where I'll be using memchr(),
memcmp(), memmove() a lot. Is there an ANSI C maximium size for
character arrays which are guaranteed to succeed, assuming the machine
has sufficient memory? And will the memxxx() functions work with that
size? I'm looking at hopefully 65792 bytes ( == 64K + 256, long story,
don't ask) in a memory block. I could get by with less, but the program
code would be clunkier. I could put up with some disk-thrashing, but I
obviously don't want the program dumping core.

My home platform is linux+gcc. I need to be compatable with the
Windows 32-bit world, but not necessarily ancient real-mode DOS. If the
max size is implementation-specific, is there a standard variable that I
can look up at run/compile time? Also, are there any advantages to
malloc(), versus declaring an array at compile time (other than the need
to macro-ize the array dimension in a regular declaration)?
 
M

Mark McIntyre

memcmp(), memmove() a lot. Is there an ANSI C maximium size for
character arrays which are guaranteed to succeed, assuming the machine
has sufficient memory?

The standard (5.2.4.1) guarantees an implementation must support
- 4095 characters in a character string literal or wide string literal
(after concatenation)
- 65535 bytes in an object (in a hosted environment only)

And will the memxxx() functions work with that size?

There's no guarantee that either function will succeed, memory sufficient
or not.
If the
max size is implementation-specific, is there a standard variable that I
can look up at run/compile time?

It is implementation-specific and thre's no way I'm aware of to know it.
Also, are there any advantages to
malloc(), versus declaring an array at compile time (other than the need
to macro-ize the array dimension in a regular declaration)?

build size, lower footprint in automatic memory (eg stack vs heap),
possibly slower....
 
T

Thomas Matthews

Walter said:
I've noticed a few threads (full of sound and fury, signifying
nothing) here recently about allocation of large memory blocks. I'm
about to start on a personal pet project where I'll be using memchr(),
memcmp(), memmove() a lot. Is there an ANSI C maximium size for
character arrays which are guaranteed to succeed, assuming the machine
has sufficient memory? And will the memxxx() functions work with that
size? I'm looking at hopefully 65792 bytes ( == 64K + 256, long story,
don't ask) in a memory block. I could get by with less, but the program
code would be clunkier. I could put up with some disk-thrashing, but I
obviously don't want the program dumping core.

Since C has pointers, why do you need to use memmove() a lot?
Moving large blocks of memory is a waste of computer resources.

At work, I was thinking about writing an optimized memcpy or memmove
function, when I asked myself, "We should never be needed to move
big blocks of memory, just use a pointer." But alas, there are times
when one must move data "out of the way" in order to be analyzed.


--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library
 
T

those who know me have no need of my name

in comp.lang.c i read:
Is there an ANSI C maximium size for character arrays which are guaranteed
to succeed, assuming the machine has sufficient memory?

in a hosted environment the standard requires that at least one object
65535 bytes in size can be created, the older standard demanded only 32767
bytes. seem low? the standard doesn't try to place too much burden on all
implementations, so the translation limits are far below what some systems
are capable of providing, e.g., a 64 bit server might easily accommodate
objects which are gigabytes in size and many of those (all depending on the
amount of money spent on storage and time allowed to make use of `slow'
stuff), but some microcomputers cannot have more than 64k in use for all
objects (yes, in total).
And will the memxxx() functions work with that size?

the mem* functions will work with any object which can be successfully
created, whether at compile or run time.
I'm looking at hopefully 65792 bytes ( == 64K + 256, long story,
don't ask) in a memory block.
My home platform is linux+gcc. I need to be compatable with the
Windows 32-bit world, but not necessarily ancient real-mode DOS.

you haven't mentioned how many of those blocks so it's hard to say, but in
general you should be fine with that size on those machines totalling up to
the amount of memory that is customarily free on the system if malloc or
realloc are used.
If the max size is implementation-specific, is there a standard variable
that I can look up at run/compile time?

it is. there is nothing standard you can consult -- compile and run the
program. (there are things you can consult on both the platforms you've
mentioned, but they are different from one another and outside of the c
standard.)
Also, are there any advantages to
malloc(), versus declaring an array at compile time

some people prefer objects of any real size to be of allocated storage, and
indeed some systems expect it so only provide a limited amount of space for
automatic objects. the third option is using static duration storage,
which can be just as limited as automatic objects, or not.
(other than the need to macro-ize the array dimension in a regular
declaration)?

do what? if you mean avoiding the use of `magic constants' in your program
(by using macros instead), then that applies equally to array dimensions as
to malloc arguments.

#define BUCKET_SIZE 65792
#define NUM_BUCKETS 1

char bucketarray [NUM_BUCKETS] [BUCKET_SIZE];
char * bucketmem = malloc(NUM_BUCKETS * BUCKET_SIZE); assert(0!=bucketmem);

with the current standard any of the *BUCKET* identifiers can be variables
instead of macros, though the array and the malloc, but not the pointer,
cannot be at file scope (would have to be in a block scope).
 
T

those who know me have no need of my name

in comp.lang.c i read:
At work, I was thinking about writing an optimized memcpy or memmove
function,

almost the only good reason that the mem* family exist in the standard
library is so that the implementation can supply an optimized version,
primarily not written in c. true it's a qoi issue, but one i think you'll
find is not often ignored.
 
R

Roman Ziak

My home platform is linux+gcc. I need to be compatable with the
Windows 32-bit world, but not necessarily ancient real-mode DOS. If the
max size is implementation-specific, is there a standard variable that I
can look up at run/compile time? Also, are there any advantages to
malloc(), versus declaring an array at compile time (other than the need
to macro-ize the array dimension in a regular declaration)?

I believe there would be the difference in compiled executable size
(depending on how GNU linker handles uninitialized sections, not sure).

And the other difference is that the malloc()-ed multidimensional arrays
are little painful to work with. In the example at the end of this message,
I did not get anywhere before I declared the new type ARRAY2.

The third difference is that Array1 (from example below) will be allocated
most
likely on stack and Array2 in the heap.

I did couple benchmarks with the example below using BCC, VC++, LCC, GCC
and DMC (Digital Mars Compiler) on WinXp. The DMC result is not here since
downloadable free version does not produce assembly directly, but I verified
by
disassembler that access to dynamic array is 2 instructions and static only
1.

I was able to achieve performance optimisations for static and dynamic array
with
GCC and LCC compilers only. Good work Jacob :)

Roman

**************** BCC

The following was compiled with "bcc32 -S test.c" (Borland C++ 5.5)
Seems to have a performance hit in the pointer version. Optimizations
using "-Ox" did not seem to help the access Array2 and the compiler
kept realoding registers with *[ebp-4] value (address of allocated Array2)
although it could reuse the same register ecx.

; Array1[0][0] = 100;
;
mov dword ptr [ebp-108],100

...

; Array2[0][0] = 100;
;
mov ecx,dword ptr [ebp-4]
mov dword ptr [ecx],100
;

**************** VC++

The following was compiled with "cl /Fa test.c" (MS VC++ 13.10.2179)
The compiler behaved same as Borland, i.e. Array2 took 2 instructions
rather than 1 for Array1. Optimizations using "/Ox" did not help and
compiler kept reloading the register with address of Array2.

mov DWORD PTR _Array1$[ebp], 100 ; 00000064H

....

mov ecx, DWORD PTR _Array2$[ebp]
mov DWORD PTR [ecx], 100 ; 00000064H

**************** LCC

The following was compiled using "lcc -S -O test.c" (Jacob Navia compiler
based
on Chris Fraser and David Hanson research compiler)
The non-optimized version did look similar to Borland and MS, but
optimization
made access to both arrays same:

before "-O"

; 39 Array1[1][0] = 200;
movl $200,-76(%ebp)
....
; 44 Array2[1][0] = 200;
movl -4(%ebp),%edi
movl $200,24(%edi)

after "-O"

; 39 Array1[1][0] = 200;
movl $200,-76(%ebp)
....
; 44 Array2[1][0] = 200;
movl $200,24(%eax)

**************** GCC

The following was compiled with "gcc -S -O test.c" (GCC 3.2 mingw special
20020817-1)

movl $100, -120(%ebp)
movl $200, -96(%ebp)
....
movl -124(%ebp), %eax
movl $100, (%eax)
movl -124(%ebp), %eax
movl $200, 24(%eax)

The following was compiled with "gcc -S -fexpensive-optimizations -O3
test.c" ...

movl $100, -120(%ebp)
movl $200, -96(%ebp)
movl $300, -72(%ebp)
movl $400, -36(%ebp)
movl $100, (%eax)
movl $200, 24(%eax)
movl $300, 48(%eax)
movl $400, 84(%eax)

**************** TEST.C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NROW 4
#define NCOL 6
#define CELLS (NROW*NCOL)

void PrintArray(char *name, int *array, int ncol, int nrow)
{
int i,j;

printf("\n%s = {\n", name);

for(i=0; i<nrow; i++)
{
for(j=0; j<ncol; j++)
printf("%4d,",array[i*ncol+j]);
printf("\n");
}

printf("};\n");
}

typedef int ARRAY2[NCOL];

int main(int argc, char *argv[])
{
int Array1[NROW][NCOL];
ARRAY2 *Array2;
int i, j;

Array2 = (ARRAY2*)malloc(CELLS*sizeof(int));
printf("Allocated @ 0x%08x, &i=0x%08x, &j=0x%08x, &Array1=0x%08x,
&Array2=0x%08x\n", Array2, &i, &j, &Array1, &Array2);
memset(Array1,0,CELLS*sizeof(int));
memset(Array2,0,CELLS*sizeof(int));

Array1[0][0] = 100;
Array1[1][0] = 200;
Array1[2][0] = 300;
Array1[3][3] = 400;

Array2[0][0] = 100;
Array2[1][0] = 200;
Array2[2][0] = 300;
Array2[3][3] = 400;

PrintArray("static Array1", (int*)Array1,NCOL,NROW);
PrintArray("dynamic Array2", (int*)Array2,NCOL,NROW);

return 0;
}
 
E

Eric Sosman

Mark said:
The standard (5.2.4.1) guarantees an implementation must support
- 4095 characters in a character string literal or wide string literal
(after concatenation)
- 65535 bytes in an object (in a hosted environment only)

This much is correct ...
There's no guarantee that either function will succeed, memory sufficient
or not.

... but this is nonsense. Neither memcmp() nor memmove()
has any "failure" or "success" mode. One can, of course, get
them to misbehave by feeding them garbage arguments, but barring
that sort of nonsense the functions will behave as advertised.
 
M

Mark McIntyre

... but this is nonsense.

I disagree.
Neither memcmp() nor memmove() has any "failure" or "success" mode.

So what?
One can, of course, get
them to misbehave by feeding them garbage arguments, but barring
that sort of nonsense the functions will behave as advertised.

And you can guarantee that, can you? The standard doesn't, AFAICS. The
standard tells you how the function *ought* to behave. Thats not the same.
 
W

Walter Dnes (delete the 'z' to get my real address

Since C has pointers, why do you need to use memmove() a lot?
Moving large blocks of memory is a waste of computer resources.

Maybe I've chosen the wrong algorithm. I need to search for
byte-arrays 255 bytes or less in a binary file. I am using the term
"byte-arrays", *NOT STRINGS*, because they can contain '\0' as a valid
'character'. I was thinking something along the lines of...



1) given a byte-array-to-search-for
2) read in first 256 bytes of file into buffer

Beginning of outer loop
3) read in next 64 kbytes of file into buffer, starting at byte 256

Beginning of inner loop
4) use memchr() to find address of byte in buffer that matches
first byte of byte-array-to-search-for
5) use memcmp() to check if entire byte-array-to-search-for is
matched at that location
6) start search after the match, to see if any more matches,
repeating until search hits end of buffer
End of of inner loop

7) move last 256 bytes of of buffer to beginning of buffer
End of outer loop



Step 7 (outer loop) is the memory moving part. Until such time as
disk-threshing happens, the bigger the buffer, the better. If there's a
better algorithm, please do tell, and point me to it. Text editors have
probably invented that wheel already, but do they handle '\0' as a valid
'character'?
 
R

Roman Ziak

Walter,
1) given a byte-array-to-search-for
2) read in first 256 bytes of file into buffer

Why is first 256 bytes of the file any different from the rest ?
Beginning of outer loop
3) read in next 64 kbytes of file into buffer, starting at byte 256

Beginning of inner loop
4) use memchr() to find address of byte in buffer that matches
first byte of byte-array-to-search-for
5) use memcmp() to check if entire byte-array-to-search-for is
matched at that location
6) start search after the match, to see if any more matches,
repeating until search hits end of buffer
End of of inner loop

7) move last 256 bytes of of buffer to beginning of buffer
End of outer loop

In my opinion your algorithm is little overcomplicated at the beginning:
Let's take an arbitrary BufferSize (64k) and arbitrary searched byte array
size SearchSize (256) plus add special cases handling (which are necessary
anyway) :


1) Remaining = 0
Open file

Beginning of outer loop

2) move Remaining bytes from end of buffer to beginning. Since very
first time Remaining==0, no move happens.

3) read in next BufferSize-Remaining bytes of file into buffer from
position Remaining. If you are close to the EOF, you will be able to read
only so many characters, so you will need to keep the size of valid data in
the buffer (e.g. ValidSize). Should the ValidSize be less than SearchSize,
algorithm ends with so-far found matches.

Position = 0

Beginning of inner loop

4) use memchr() to find address of byte in buffer that matches first
byte of byte-array-to-search-for ... up to ValidSize-SearchSize. Please note
that this point is combination of original point 4) and 6), so we always
need to search from Position, which will be 0 in the first loop and
incrementing with matches found.

5) use memcmp() to check if entire byte-array-to-search-for is
matched at that location ... if found, move Position aftre the
end of found sequence

End of of inner loop ... (while Position <= ValidSize-SearchSize)

End of outer loop ... (while not EOF)

Close File
Step 7 (outer loop) is the memory moving part. Until such time as
disk-threshing happens, the bigger the buffer, the better. If there's a

In modified algorithm, this would be step 2. Why at the beginning ? Because
when you get to the end of file less SearchSize-1, you do not care about
remaining data since it is not possible for it to contain searched sequence.
better algorithm, please do tell, and point me to it. Text editors have
probably invented that wheel already, but do they handle '\0' as a valid
'character'?

I think that most text editors do handle 0 as valid character. It will show
as invalid character (e.g. as a square), but you can still see characters
after 0.

memxx functions you chose for the job do not treat 0 any different than
other byte values.

Roman
 
R

Roman Ziak

I apologise that I confused previous article by mixing terms "character" and
"byte" ... I always meant "byte".

Roman
 
C

CBFalconer

Walter Dnes (delete the 'z' to get my real address) said:
Maybe I've chosen the wrong algorithm. I need to search for
byte-arrays 255 bytes or less in a binary file. I am using the
term "byte-arrays", *NOT STRINGS*, because they can contain '\0'
as a valid 'character'. I was thinking something along the
lines of...

1) given a byte-array-to-search-for
2) read in first 256 bytes of file into buffer

Beginning of outer loop
3) read in next 64 kbytes of file into buffer, starting at
byte 256

Beginning of inner loop
4) use memchr() to find address of byte in buffer that
matches first byte of byte-array-to-search-for
5) use memcmp() to check if entire byte-array-to-search-for
is matched at that location
6) start search after the match, to see if any more
matches, repeating until search hits end of buffer
End of of inner loop

7) move last 256 bytes of of buffer to beginning of buffer
End of outer loop

Step 7 (outer loop) is the memory moving part. Until such time
as disk-threshing happens, the bigger the buffer, the better.
If there's a better algorithm, please do tell, and point me to
it. Text editors have probably invented that wheel already, but
do they handle '\0' as a valid 'character'?

There is definitely a better algorithm, requiring no buffer
whatsoever. A modification of the following will do your job, and
you don't have to dump the following string. It won't input
strings including '\0', but you can arrange to alter that.

/*
Leor said:
I think so. Here's a version I just threw together:
*/

/* And heres another throw -- binfsrch.c by CBF */
/* Released to public domain. Attribution appreciated */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.

This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;

assert(lgh > 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id != id[j])) j = next[j];
i++; j++;
next = j;
}
#if (0)
for (i = 0; i < lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id, i, next);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */

assert(lgh > 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
chrcount++;
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */

lgh = strlen(marker);
if (!(next = malloc(lgh * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILURE);
}
else {
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker, lgh, next, f)) {
/* found, take appropriate action */
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getc(f))) {
chrcount++;
putchar(ch);
}
puts("\"");
if (EOF == ch) break;
else chrcount++;
}
free(next);
return items;
}
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
FILE *f;

f = stdin;
if (3 == argc) {
if (!(f = fopen(argv[2], "rb"))) {
printf("Can't open %s\n", argv[2]);
exit(EXIT_FAILURE);
}
argc--;
}
if (2 != argc) {
puts("Usage: binfsrch name [binaryfile]");
puts(" (file defaults to stdin text mode)");
}
else if (binfsrch(argv[1], f)) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
printf("%lu chars\n", (unsigned long)chrcount);
return 0;
} /* main binfsrch */
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,133
Latest member
MDACVReview
Top