Maximum char array size?

Discussion in 'C Programming' started by Walter Dnes (delete the 'z' to get my real address, Jun 13, 2004.

  1. I've noticed a few threads (full of sound and fury, signifying
    nothing) here recently about allocation of large memory blocks. I'm
    about to start on a personal pet project where I'll be using memchr(),
    memcmp(), memmove() a lot. Is there an ANSI C maximium size for
    character arrays which are guaranteed to succeed, assuming the machine
    has sufficient memory? And will the memxxx() functions work with that
    size? I'm looking at hopefully 65792 bytes ( == 64K + 256, long story,
    don't ask) in a memory block. I could get by with less, but the program
    code would be clunkier. I could put up with some disk-thrashing, but I
    obviously don't want the program dumping core.

    My home platform is linux+gcc. I need to be compatable with the
    Windows 32-bit world, but not necessarily ancient real-mode DOS. If the
    max size is implementation-specific, is there a standard variable that I
    can look up at run/compile time? Also, are there any advantages to
    malloc(), versus declaring an array at compile time (other than the need
    to macro-ize the array dimension in a regular declaration)?
     
    Walter Dnes (delete the 'z' to get my real address, Jun 13, 2004
    #1
    1. Advertisements

  2. The standard (5.2.4.1) guarantees an implementation must support
    - 4095 characters in a character string literal or wide string literal
    (after concatenation)
    - 65535 bytes in an object (in a hosted environment only)

    There's no guarantee that either function will succeed, memory sufficient
    or not.
    It is implementation-specific and thre's no way I'm aware of to know it.
    build size, lower footprint in automatic memory (eg stack vs heap),
    possibly slower....
     
    Mark McIntyre, Jun 14, 2004
    #2
    1. Advertisements

  3. Since C has pointers, why do you need to use memmove() a lot?
    Moving large blocks of memory is a waste of computer resources.

    At work, I was thinking about writing an optimized memcpy or memmove
    function, when I asked myself, "We should never be needed to move
    big blocks of memory, just use a pointer." But alas, there are times
    when one must move data "out of the way" in order to be analyzed.


    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
    http://www.sgi.com/tech/stl -- Standard Template Library
     
    Thomas Matthews, Jun 14, 2004
    #3
  4. in comp.lang.c i read:
    in a hosted environment the standard requires that at least one object
    65535 bytes in size can be created, the older standard demanded only 32767
    bytes. seem low? the standard doesn't try to place too much burden on all
    implementations, so the translation limits are far below what some systems
    are capable of providing, e.g., a 64 bit server might easily accommodate
    objects which are gigabytes in size and many of those (all depending on the
    amount of money spent on storage and time allowed to make use of `slow'
    stuff), but some microcomputers cannot have more than 64k in use for all
    objects (yes, in total).
    the mem* functions will work with any object which can be successfully
    created, whether at compile or run time.
    you haven't mentioned how many of those blocks so it's hard to say, but in
    general you should be fine with that size on those machines totalling up to
    the amount of memory that is customarily free on the system if malloc or
    realloc are used.
    it is. there is nothing standard you can consult -- compile and run the
    program. (there are things you can consult on both the platforms you've
    mentioned, but they are different from one another and outside of the c
    standard.)
    some people prefer objects of any real size to be of allocated storage, and
    indeed some systems expect it so only provide a limited amount of space for
    automatic objects. the third option is using static duration storage,
    which can be just as limited as automatic objects, or not.
    do what? if you mean avoiding the use of `magic constants' in your program
    (by using macros instead), then that applies equally to array dimensions as
    to malloc arguments.

    #define BUCKET_SIZE 65792
    #define NUM_BUCKETS 1

    char bucketarray [NUM_BUCKETS] [BUCKET_SIZE];
    char * bucketmem = malloc(NUM_BUCKETS * BUCKET_SIZE); assert(0!=bucketmem);

    with the current standard any of the *BUCKET* identifiers can be variables
    instead of macros, though the array and the malloc, but not the pointer,
    cannot be at file scope (would have to be in a block scope).
     
    those who know me have no need of my name, Jun 14, 2004
    #4
  5. in comp.lang.c i read:
    almost the only good reason that the mem* family exist in the standard
    library is so that the implementation can supply an optimized version,
    primarily not written in c. true it's a qoi issue, but one i think you'll
    find is not often ignored.
     
    those who know me have no need of my name, Jun 14, 2004
    #5
  6. Walter Dnes (delete the 'z' to get my real address

    Roman Ziak Guest

    My home platform is linux+gcc. I need to be compatable with the
    I believe there would be the difference in compiled executable size
    (depending on how GNU linker handles uninitialized sections, not sure).

    And the other difference is that the malloc()-ed multidimensional arrays
    are little painful to work with. In the example at the end of this message,
    I did not get anywhere before I declared the new type ARRAY2.

    The third difference is that Array1 (from example below) will be allocated
    most
    likely on stack and Array2 in the heap.

    I did couple benchmarks with the example below using BCC, VC++, LCC, GCC
    and DMC (Digital Mars Compiler) on WinXp. The DMC result is not here since
    downloadable free version does not produce assembly directly, but I verified
    by
    disassembler that access to dynamic array is 2 instructions and static only
    1.

    I was able to achieve performance optimisations for static and dynamic array
    with
    GCC and LCC compilers only. Good work Jacob :)

    Roman

    **************** BCC

    The following was compiled with "bcc32 -S test.c" (Borland C++ 5.5)
    Seems to have a performance hit in the pointer version. Optimizations
    using "-Ox" did not seem to help the access Array2 and the compiler
    kept realoding registers with *[ebp-4] value (address of allocated Array2)
    although it could reuse the same register ecx.

    ; Array1[0][0] = 100;
    ;
    mov dword ptr [ebp-108],100

    ...

    ; Array2[0][0] = 100;
    ;
    mov ecx,dword ptr [ebp-4]
    mov dword ptr [ecx],100
    ;

    **************** VC++

    The following was compiled with "cl /Fa test.c" (MS VC++ 13.10.2179)
    The compiler behaved same as Borland, i.e. Array2 took 2 instructions
    rather than 1 for Array1. Optimizations using "/Ox" did not help and
    compiler kept reloading the register with address of Array2.

    mov DWORD PTR _Array1$[ebp], 100 ; 00000064H

    ....

    mov ecx, DWORD PTR _Array2$[ebp]
    mov DWORD PTR [ecx], 100 ; 00000064H

    **************** LCC

    The following was compiled using "lcc -S -O test.c" (Jacob Navia compiler
    based
    on Chris Fraser and David Hanson research compiler)
    The non-optimized version did look similar to Borland and MS, but
    optimization
    made access to both arrays same:

    before "-O"

    ; 39 Array1[1][0] = 200;
    movl $200,-76(%ebp)
    ....
    ; 44 Array2[1][0] = 200;
    movl -4(%ebp),%edi
    movl $200,24(%edi)

    after "-O"

    ; 39 Array1[1][0] = 200;
    movl $200,-76(%ebp)
    ....
    ; 44 Array2[1][0] = 200;
    movl $200,24(%eax)

    **************** GCC

    The following was compiled with "gcc -S -O test.c" (GCC 3.2 mingw special
    20020817-1)

    movl $100, -120(%ebp)
    movl $200, -96(%ebp)
    ....
    movl -124(%ebp), %eax
    movl $100, (%eax)
    movl -124(%ebp), %eax
    movl $200, 24(%eax)

    The following was compiled with "gcc -S -fexpensive-optimizations -O3
    test.c" ...

    movl $100, -120(%ebp)
    movl $200, -96(%ebp)
    movl $300, -72(%ebp)
    movl $400, -36(%ebp)
    movl $100, (%eax)
    movl $200, 24(%eax)
    movl $300, 48(%eax)
    movl $400, 84(%eax)

    **************** TEST.C

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NROW 4
    #define NCOL 6
    #define CELLS (NROW*NCOL)

    void PrintArray(char *name, int *array, int ncol, int nrow)
    {
    int i,j;

    printf("\n%s = {\n", name);

    for(i=0; i<nrow; i++)
    {
    for(j=0; j<ncol; j++)
    printf("%4d,",array[i*ncol+j]);
    printf("\n");
    }

    printf("};\n");
    }

    typedef int ARRAY2[NCOL];

    int main(int argc, char *argv[])
    {
    int Array1[NROW][NCOL];
    ARRAY2 *Array2;
    int i, j;

    Array2 = (ARRAY2*)malloc(CELLS*sizeof(int));
    printf("Allocated @ 0x%08x, &i=0x%08x, &j=0x%08x, &Array1=0x%08x,
    &Array2=0x%08x\n", Array2, &i, &j, &Array1, &Array2);
    memset(Array1,0,CELLS*sizeof(int));
    memset(Array2,0,CELLS*sizeof(int));

    Array1[0][0] = 100;
    Array1[1][0] = 200;
    Array1[2][0] = 300;
    Array1[3][3] = 400;

    Array2[0][0] = 100;
    Array2[1][0] = 200;
    Array2[2][0] = 300;
    Array2[3][3] = 400;

    PrintArray("static Array1", (int*)Array1,NCOL,NROW);
    PrintArray("dynamic Array2", (int*)Array2,NCOL,NROW);

    return 0;
    }
     
    Roman Ziak, Jun 14, 2004
    #6
  7. Walter Dnes (delete the 'z' to get my real address

    Eric Sosman Guest

    This much is correct ...
    ... but this is nonsense. Neither memcmp() nor memmove()
    has any "failure" or "success" mode. One can, of course, get
    them to misbehave by feeding them garbage arguments, but barring
    that sort of nonsense the functions will behave as advertised.
     
    Eric Sosman, Jun 14, 2004
    #7
  8. I disagree.
    So what?
    And you can guarantee that, can you? The standard doesn't, AFAICS. The
    standard tells you how the function *ought* to behave. Thats not the same.
     
    Mark McIntyre, Jun 14, 2004
    #8
  9. Maybe I've chosen the wrong algorithm. I need to search for
    byte-arrays 255 bytes or less in a binary file. I am using the term
    "byte-arrays", *NOT STRINGS*, because they can contain '\0' as a valid
    'character'. I was thinking something along the lines of...



    1) given a byte-array-to-search-for
    2) read in first 256 bytes of file into buffer

    Beginning of outer loop
    3) read in next 64 kbytes of file into buffer, starting at byte 256

    Beginning of inner loop
    4) use memchr() to find address of byte in buffer that matches
    first byte of byte-array-to-search-for
    5) use memcmp() to check if entire byte-array-to-search-for is
    matched at that location
    6) start search after the match, to see if any more matches,
    repeating until search hits end of buffer
    End of of inner loop

    7) move last 256 bytes of of buffer to beginning of buffer
    End of outer loop



    Step 7 (outer loop) is the memory moving part. Until such time as
    disk-threshing happens, the bigger the buffer, the better. If there's a
    better algorithm, please do tell, and point me to it. Text editors have
    probably invented that wheel already, but do they handle '\0' as a valid
    'character'?
     
    Walter Dnes (delete the 'z' to get my real address, Jun 15, 2004
    #9
  10. Walter Dnes (delete the 'z' to get my real address

    Roman Ziak Guest

    Walter,
    Why is first 256 bytes of the file any different from the rest ?
    In my opinion your algorithm is little overcomplicated at the beginning:
    Let's take an arbitrary BufferSize (64k) and arbitrary searched byte array
    size SearchSize (256) plus add special cases handling (which are necessary
    anyway) :


    1) Remaining = 0
    Open file

    Beginning of outer loop

    2) move Remaining bytes from end of buffer to beginning. Since very
    first time Remaining==0, no move happens.

    3) read in next BufferSize-Remaining bytes of file into buffer from
    position Remaining. If you are close to the EOF, you will be able to read
    only so many characters, so you will need to keep the size of valid data in
    the buffer (e.g. ValidSize). Should the ValidSize be less than SearchSize,
    algorithm ends with so-far found matches.

    Position = 0

    Beginning of inner loop

    4) use memchr() to find address of byte in buffer that matches first
    byte of byte-array-to-search-for ... up to ValidSize-SearchSize. Please note
    that this point is combination of original point 4) and 6), so we always
    need to search from Position, which will be 0 in the first loop and
    incrementing with matches found.

    5) use memcmp() to check if entire byte-array-to-search-for is
    matched at that location ... if found, move Position aftre the
    end of found sequence

    End of of inner loop ... (while Position <= ValidSize-SearchSize)

    End of outer loop ... (while not EOF)

    Close File
    In modified algorithm, this would be step 2. Why at the beginning ? Because
    when you get to the end of file less SearchSize-1, you do not care about
    remaining data since it is not possible for it to contain searched sequence.
    I think that most text editors do handle 0 as valid character. It will show
    as invalid character (e.g. as a square), but you can still see characters
    after 0.

    memxx functions you chose for the job do not treat 0 any different than
    other byte values.

    Roman
     
    Roman Ziak, Jun 15, 2004
    #10
  11. Walter Dnes (delete the 'z' to get my real address

    Roman Ziak Guest

    I apologise that I confused previous article by mixing terms "character" and
    "byte" ... I always meant "byte".

    Roman
     
    Roman Ziak, Jun 15, 2004
    #11
  12. Walter Dnes (delete the 'z' to get my real address

    CBFalconer Guest

    There is definitely a better algorithm, requiring no buffer
    whatsoever. A modification of the following will do your job, and
    you don't have to dump the following string. It won't input
    strings including '\0', but you can arrange to alter that.

    /*
    */

    /* And heres another throw -- binfsrch.c by CBF */
    /* Released to public domain. Attribution appreciated */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <assert.h>

    /* The difference between a binary and a text file, on read,
    is the conversion of end-of-line delimiters. What those
    delimiters are does not affect the action. In some cases
    the presence of 0x1a EOF markers (MsDos) does.

    This is a version of Knuth-Morris-Pratt algorithm. The
    point of using this is to avoid any backtracking in file
    reading, and thus avoiding any use of buffer arrays.
    */

    size_t chrcount; /* debuggery, count of input chars, zeroed */

    /* --------------------- */

    /* Almost straight out of Sedgewick */
    /* The next array indicates what index in id should next be
    compared to the current char. Once the (lgh - 1)th char
    has been successfully compared, the id has been found.
    The array is formed by comparing id to itself. */
    void initnext(int *next, const char *id, int lgh)
    {
    int i, j;

    assert(lgh > 0);
    next[0] = -1; i = 0; j = -1;
    while (i < lgh) {
    while ((j >= 0) && (id != id[j])) j = next[j];
    i++; j++;
    next = j;
    }
    #if (0)
    for (i = 0; i < lgh; i++)
    printf("id[%d] = '%c' next[%d] = %d\n",
    i, id, i, next);
    #endif
    } /* initnext */

    /* --------------------- */

    /* reads f without rewinding until either EOF or *marker
    has been found. Returns EOF if not found. At exit the
    last matching char has been read, and no further. */
    int kmpffind(const char *marker, int lgh, int *next, FILE *f)
    {
    int j; /* char position in marker to check */
    int ch; /* current char */

    assert(lgh > 0);
    j = 0;
    while ((j < lgh) && (EOF != (ch = getc(f)))) {
    chrcount++;
    while ((j >= 0) && (ch != marker[j])) j = next[j];
    j++;
    }
    return ch;
    } /* kmpffind */

    /* --------------------- */

    /* Find marker in f, display following printing chars
    up to some non printing character or EOF */
    int binfsrch(const char *marker, FILE *f)
    {
    int *next;
    int lgh;
    int ch;
    int items; /* count of markers found */

    lgh = strlen(marker);
    if (!(next = malloc(lgh * sizeof *next))) {
    puts("No memory");
    exit(EXIT_FAILURE);
    }
    else {
    initnext(next, marker, lgh);
    items = 0;
    while (EOF != kmpffind(marker, lgh, next, f)) {
    /* found, take appropriate action */
    items++;
    printf("%d %s : \"", items, marker);
    while (isprint(ch = getc(f))) {
    chrcount++;
    putchar(ch);
    }
    puts("\"");
    if (EOF == ch) break;
    else chrcount++;
    }
    free(next);
    return items;
    }
    } /* binfsrch */

    /* --------------------- */

    int main(int argc, char **argv)
    {
    FILE *f;

    f = stdin;
    if (3 == argc) {
    if (!(f = fopen(argv[2], "rb"))) {
    printf("Can't open %s\n", argv[2]);
    exit(EXIT_FAILURE);
    }
    argc--;
    }
    if (2 != argc) {
    puts("Usage: binfsrch name [binaryfile]");
    puts(" (file defaults to stdin text mode)");
    }
    else if (binfsrch(argv[1], f)) {
    printf("\"%s\" : found\n", argv[1]);
    }
    else printf("\"%s\" : not found\n", argv[1]);
    printf("%lu chars\n", (unsigned long)chrcount);
    return 0;
    } /* main binfsrch */
     
    CBFalconer, Jun 15, 2004
    #12
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.