Substring assignment in Fortran, C, etc.

Discussion in 'C Programming' started by LC's No-Spam Newsreading account, May 19, 2009.

  1. I am trying to collect equivalent statements in various languages.

    I am now dealing with substring assignment.

    Let us assume that string="abcd", value="AB", i=2, j=3 (I won't
    scandalize if in your favourite language it is i=1 j=2 :=). Nor if one
    has to use n=j-i+1 (length("AB")).

    I want an assignment which returns string="aABd"

    I started from Fortran string(i:j)=value

    I came out with the examples below, but cannot find a satisfactory one
    for C using standard library functions.

    IDL: strput,string,value,i

    Java: string=string.substring(0,i)+value+string.substring(j+1) ;
    or using string buffers
    StringBuffer sb = new StringBuffer(string) ;
    sb.replace(i,j+1,value) ;

    awk: string=substr(string,1,i-1) value substr(string,j+1)

    mysql: set @string:=insert(@string,@i,@n,@value)

    Postscript: string i value putinterval /string exch def

    I even have csh

    @ i=$i-1
    @ j=$j+1
    set string=`echo $string | cut -c-$i`$value`echo $string | cut -c$j-`

    But what about C ?
    I can find a hardcoded solution for a character array

    char a[5]="abcd" ;
    a[1]='A';
    a[2]='B';

    which means I can possibly write some for loop

    But what if I want a (or string) to be a "standard" string i.e. a
    char *a ?


    Please note followup to clf

    --
    ----------------------------------------------------------------------
    is a newsreading account used by more persons to
    avoid unwanted spam. Any mail returning to this address will be rejected.
    Users can disclose their e-mail address in the article if they wish so.
     
    LC's No-Spam Newsreading account, May 19, 2009
    #1

  2. LC's No-Spam Newsreading account

    jameskuyper Guest

    Why was your message posted to comp.lang.c, but with followups
    redirected only to comp.lang.fortran? Anyone who doesn't notice that
    and fix it will be posting his response solely to c.l.f. Any such
    message that contains a technical error will not be seen by any of the
    C experts on c.l.c, the people most likely to be able to notice and
    correct such an error.

    LC's No-Spam Newsreading account wrote:
    > I am trying to collect equivalent statements in various languages.
    >
    > I am now dealing with substring assignment.
    >
    > Let us assume that string="abcd", value="AB", i=2, j=3 (I won't
    > scandalize if in your favourite language it is i=1 j=2 :=). Nor if one
    > has to use n=j-i+1 (length("AB")).
    >
    > I want an assignment which returns string="aABd"
    >
    > I started from Fortran string(i:j)=value


    The closest C equivalent involves a function call, rather than an
    assignment. This is inherently the case, since a string is a data
    format in C, and it is a format that is stored in an array. An array
    cannot be assigned in C. You could create a C struct type that either
    contains a fixed-size array, or points at an array whose size need not
    be fixed. Such a struct value could be assigned, but no such
    struct type is part of standard C.
    Here is the corresponding function call. It would use the i=1, j=2
    option you mentioned above:

    memcpy(string+i, value, j+1-i)

    If string is the name of a pointer, rather than the name of an array,
    you can make this an assignment if you want to:

    string = memcpy(string+i, value, j+1-i);

    However, there's no good reason to do so: memcpy() returns its first
    argument, so unless i is 0 the assignment leaves string pointing at
    string+i rather than at the start of the string. If you modified the
    context, such as

    newstring = memcpy(string+i, value, j+1-i);

    then the assignment would no longer be pointless; but it would
    probably indicate a design error.
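
    A minimal sketch of the memcpy() approach described above, assuming
    0-based positions (the i=1, j=2 convention), writable storage, and a
    value at least j+1-i characters long. The helper name substr_assign is
    hypothetical; it is not part of the thread or of the standard library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: overwrite s[i..j] (0-based, inclusive) with the
   first j+1-i characters of value, in place.
   Assumes s points at writable storage, j < strlen(s), and value holds
   at least j+1-i characters. */
char *substr_assign(char *s, size_t i, size_t j, const char *value)
{
    size_t n = j + 1 - i;
    assert(s && value && i <= j);
    assert(j < strlen(s) && strlen(value) >= n);
    memcpy(s + i, value, n);   /* memcpy() itself returns s + i */
    return s;                  /* hand back the start of the string */
}
```

    With char s[] = "abcd", the call substr_assign(s, 1, 2, "AB") leaves
    "aABd" in s. The destination is declared as an array on purpose: a
    pointer initialized from a string literal is not safely writable.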

    ....
    > But what if I want a (or string) to be a "standard" string i.e. a
    > char *a ?


    In C, a string is a data format, not a type. That format consists of a
    series of char objects, ending with a nul character '\0'. The type
    char *a isn't a string type; it's a pointer to a character. That
    pointer might or might not point to a character in a string. If it
    does, that string might or might not take up the entire size of an
    array of char.

    If you're worried about the void* data type returned by memcpy(), you
    can convert it:

    (char*)memcpy(string+i, value, j+1-i)
     
    jameskuyper, May 19, 2009
    #2

  3. LC's No-Spam Newsreading account <> wrote:

    > I came out with the examples below, but cannot find a satisfactory one
    > for C using standard library functions.

    ....
    > Please note followup to clf


    Why? It is a question about how to do something in C. The only Fortran
    in sight is just used as an example of what effect you want to achieve.
    I don't even see why it was posted to comp.lang.fortran at all, much
    less with followups directed solely there. Do you really expect
    comp.lang.fortran to be the most appropriate place to discuss how to do
    something in C? That makes it sound to me just like an invitation for
    language flaming about how painful the Fortran folk might find the C way
    of doing something.

    --
    Richard Maine | Good judgment comes from experience;
    email: last name at domain . net | experience comes from bad judgment.
    domain: summertriangle | -- Mark Twain
     
    Richard Maine, May 19, 2009
    #3
  4. In comp.lang.fortran jameskuyper <> wrote:
    < Why was your message posted to comp.lang.c, but with followups
    < redirected only to comp.lang.fortran? Anyone who doesn't notice that
    < and fix it will be posting his response solely to c.l.f. Any such
    < message that contains a technical error will not be seen by any of the
    < C experts on c.l.c, the people most likely to be able to notice and
    < correct such an error.

    < LC's No-Spam Newsreading account wrote:
    <> I am trying to collect equivalent statements in various languages.

    <> I am now dealing with substring assignment.
    (snip)

    <> I want an assignment which returns string="aABd"

    <> I started from Fortran string(i:j)=value

    < The closest C equivalent involves a function call, rather than an
    < assignment.

    It involves function call syntax, though not necessarily (as I
    understand the C standard) an actual function call.

    < This is inherently the case, since a string is a data
    < format in C, and it is a format that is stored in an array. An array
    < cannot be assigned in C.

    There has been discussion in comp.lang.fortran on the advantages
    and disadvantages of function syntax vs. substring syntax.

    There is a disadvantage to Fortran substring syntax, especially
    in the lvalue case, in that it doesn't generalize. You can't,
    for example, directly substring a substring:

    string(i:j)(k:l) isn't legal.

    (snip)

    < Here is the corresponding function call. It would use the
    < i=1, j=2 option you mentioned above:

    < memcpy(string+i, value, j+1-i)

    My choice would be strncpy(string+i, value, j+1-i);
    but I agree that memcpy will also work. Keeping to the str...
    functions for string work seems more consistent.

    < If string is the name of a pointer, rather than the name of an
    < array, you can make this an assignment if you want to:

    (snip discussion about strings and pointers in C, not so
    relevant to the OP question.)

    As I mentioned in a post that did not go to comp.lang.c, the
    PL/I form is:

    substr(a,2,2)='AB'; or more generally

    substr(string,i,j+1-i)=value;

    Again, note function syntax but not a function.
    (PL/I calls them pseudo-variables.) This is consistent
    with the function call syntax in expressions.

    -- glen
     
    glen herrmannsfeldt, May 19, 2009
    #4
  5. LC's No-Spam Newsreading account

    jameskuyper Guest

    glen herrmannsfeldt wrote:
    > In comp.lang.fortran jameskuyper <> wrote:

    ....
    > <> I want an assignment which returns string="aABd"
    >
    > <> I started from Fortran string(i:j)=value
    >
    > < The closest C equivalent involves a function call, rather than an
    > < assignment.
    >
    > It involves function call syntax, though not necessarily (as I
    > understand the C standard) an actual function call.


    True; but it's a distinction of negligible importance; it's possible
    for a C compiler to inline some or all of any call to a function if
    the definition of that function is known to the compiler; highly
    optimizing C compilers inline more code than you might expect. In
    practice, it's simpler to just refer to them as function calls, and
    not worry about the details of what the compiler actually does with
    them.

    ....
    > < Here is the corresponding function call. It would use the
    > < i=1, j=2 option you mentioned above:
    >
    > < memcpy(string+i, value, j+1-i)
    >
    > My choice would be strncpy(string+i, value, j+1-i);
    > but I agree that memcpy will also work. Keeping to the str...
    > functions for string work seems more consistent.


    The desired functionality is underdefined; the difference between your
    version and mine matters only if "value" is shorter than the substring
    it is replacing. My version has possible undefined behavior in that
    case; yours avoids that, at the cost of being very marginally slower
    for large substrings. What does the Fortran code given above do in that
    case?
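
    To make the difference concrete, here is a small sketch of the
    strncpy() variant, assuming 0-based positions and writable storage.
    The helper name is made up for illustration. When value is shorter
    than j+1-i, strncpy() fills the remaining positions with null
    characters, so the string ends early, whereas memcpy() with the same
    count would read past the end of value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replace s[i..j] (0-based, inclusive) using
   strncpy(). If value has fewer than j+1-i characters, the rest of the
   range is filled with '\0', which shortens the string held in s. */
char *substr_assign_strncpy(char *s, size_t i, size_t j, const char *value)
{
    assert(s && value && i <= j && j < strlen(s));
    strncpy(s + i, value, j + 1 - i);
    return s;
}
```

    For example, starting from char s[] = "abcd", the call
    substr_assign_strncpy(s, 1, 2, "A") leaves the string "aA" in s: the
    'c' has been overwritten by a null character, and the 'd' is still in
    the array but no longer part of the string that starts at s.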
     
    jameskuyper, May 19, 2009
    #5
  6. In comp.lang.fortran jameskuyper <> wrote:
    < glen herrmannsfeldt wrote:
    <> In comp.lang.fortran jameskuyper <> wrote:
    < ...

    <> < The closest C equivalent involves a function call,
    <> < rather than an assignment.

    <> It involves function call syntax, though not necessarily (as I
    <> understand the C standard) an actual function call.

    < True; but it's a distinction of negligible importance; it's possible
    < for a C compiler to inline some or all of any call to a function if
    < the definition of that function is known to the compiler;

    I would agree, except that the OP question is pretty much
    a question of syntax, not of the underlying implementation.

    The non-function-call syntax of the other languages mentioned
    may actually be implemented as a function call.

    < highly
    < optimizing C compilers inline more code than you might expect. In
    < practice, it's simpler to just refer to them as function calls, and
    < not worry about the details of what the compiler actually does with
    < them.

    (snip on memcpy vs. strncpy)

    < The desired functionality is underdefined; the difference between your
    < version and mine matters only if "value" is shorter than the substring
    < it is replacing. My version has possible undefined behavior in that
    < case; yours avoids that, at the cost of being very marginally slower
    < for large sub strings. What does fortran code given above do in that
    < case?

    The general rule is that Fortran pads with blanks. When CHARACTER
    was added to Fortran in Fortran 77 the lengths were always known
    at compile time. CHARACTER variables had a fixed length and were
    padded with blanks when a shorter value was stored. I believe that
    is true for the OP examples of substring assignment, but I am not
    so sure in all possible cases.

    -- glen
     
    glen herrmannsfeldt, May 19, 2009
    #6
  7. LC's No-Spam Newsreading account

    jameskuyper Guest

    glen herrmannsfeldt wrote:
    ....
    > The general rule is that Fortran pads with blanks. When CHARACTER
    > was added to Fortran in Fortran 77 the lengths were always known
    > at compile time. CHARACTER variables had a fixed length and were
    > padded with blanks when a shorter value was stored. I believe that
    > is true for the OP examples of substring assignment, but I am not
    > so sure in all possible cases.


    The function call

    sprintf(string+i, "%-*.*s%s", j+1-i, j+1-i, value, string+j+1);

    would handle padding with blanks as you describe if value were too
    short, if it weren't for the fact that it has undefined behavior
    (because of the overlap between the output string and the input
    string). If newstring were a separate char array containing enough
    space to store the result, or a pointer to the first element of such
    an array, then

    sprintf(newstring, "%*.*s%-*.*s%s", i, i, string, j+1-i, j+1-i,
    value, string+j+1);

    would do the job with well-defined behavior.
    You can write C code to do just about anything you want with a string;
    but C's built-in string-oriented capabilities are not in the same
    league as those of, say, perl.
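
    A sketch of the separate-buffer approach, assuming 0-based positions
    and taking the tail from the character after position j. The helper
    name and its parameters are hypothetical, and snprintf() is used
    instead of sprintf() so that the output size is bounded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write into out (of size outsize) a copy of s in
   which s[i..j] is replaced by value, blank-padded or truncated to
   exactly j+1-i characters. out must not overlap s or value.
   Returns what snprintf() returns. */
int substr_replace_padded(char *out, size_t outsize, const char *s,
                          size_t i, size_t j, const char *value)
{
    int head = (int)i;
    int width = (int)(j + 1 - i);
    assert(out && s && value && i <= j && j < strlen(s));
    /* %.*s   : the first i characters of s
       %-*.*s : value, left-justified, padded or cut to width characters
       %s     : the rest of s, starting after position j */
    return snprintf(out, outsize, "%.*s%-*.*s%s",
                    head, s, width, width, value, s + j + 1);
}
```

    With s = "abcd", i = 1 and j = 2, a value of "AB" gives "aABd", a
    value of "A" gives "aA d" (blank padded), and a value of "ABC" is cut
    to two characters, again giving "aABd".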
     
    jameskuyper, May 19, 2009
    #7
  8. "jameskuyper" <> wrote in message
    news:...

    > LC's No-Spam Newsreading account wrote:
    >> I am trying to collect equivalent statements in various languages.


    >> I am now dealing with substring assignment.


    >> Let us assume that string="abcd", value="AB", i=2, j=3 (I won't
    >> scandalize if in your favourite language it is i=1 j=2 :=). Nor if one
    >> has to use n=j-i+1 (length("AB")).


    >> I want an assignment which returns string="aABd"


    But why didn't you more properly torture c.l.c. with an example
    like:

    C:\gfortran\clf\Cstr>type Cstr.f90
    program Cstr
    implicit none
    character(*), parameter :: abcd = 'abcd'
    character(len(abcd)) string(8)
    character(*), parameter :: AB = 'AB'
    integer pos

    string = abcd
    pos = 2
    string(:)(pos:pos+len(AB)-1) = AB
    write(*,'(a)') string
    end program Cstr

    C:\gfortran\clf\Cstr>gfortran Cstr.f90 -oCstr

    C:\gfortran\clf\Cstr>Cstr
    aABd
    aABd
    aABd
    aABd
    aABd
    aABd
    aABd
    aABd

    ?

    --
    write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
    6.0134700243160014d-154/),(/'x'/)); end
     
    James Van Buskirk, May 20, 2009
    #8
  9. LC's No-Spam Newsreading account

    BartC Guest

    "James Van Buskirk" <> wrote in message
    news:guvg2o$393$-september.org...
    > "jameskuyper" <> wrote in message
    > news:...
    >
    >> LC's No-Spam Newsreading account wrote:
    >>> I am trying to collect equivalent statements in various languages.

    >
    >>> I am now dealing with substring assignment.

    >
    >>> Let us assume that string="abcd", value="AB", i=2, j=3 (I won't
    >>> scandalize if in your favourite language it is i=1 j=2 :=). Nor if one
    >>> has to use n=j-i+1 (length("AB")).

    >
    >>> I want an assignment which returns string="aABd"

    >
    > But why didn't you more properly torture c.l.c. with an example
    > like:


    > character(*), parameter :: AB = 'AB'
    > string(:)(pos:pos+len(AB)-1) = AB
    > write(*,'(a)') string

    ....

    I don't get this. Didn't Fortran use to be even more crude and basic than C?

    So how did the Fortran committees manage to bring it into the 21st century
    (and still call it Fortran), while C is still languishing in the 1970s it
    seems?

    --
    Bart
     
    BartC, May 20, 2009
    #9
  10. On Tue, 19 May 2009, Richard Maine wrote:
    > LC's No-Spam Newsreading account wrote:
    >
    >> I came out with the examples below, but cannot find a satisfactory one
    >> for C using standard library functions.

    > ...
    >> Please note followup to clf

    >
    > Why? It is a question about how to do something in C.


    Apologies if this seemed inappropriate. My reasons were twofold. One is
    that I read sort of regularly clf and keep it "caught up" while I do not
    read regularly clc. The other one is that I wanted to get an answer from
    somebody who was fluent in both languages (my question was not only "how
    to do something in C" but "how to do in C something similar to the way
    one does in Fortran"), say "the intersection of knowledgeable users on
    clf and clc".

    > Do you really expect comp.lang.fortran to be the most appropriate
    > place to discuss how to do something in C? That makes it sound to me
    > just like an invitation for language flaming


    I wanted to avoid flames, and apparently have been successful, all the
    replies are rather technical and up to the point ! Thanks !

     
    LC's No-Spam Newsreading account, May 20, 2009
    #10
  11. LC's No-Spam Newsreading account

    Richard Bos Guest

    jameskuyper <> wrote:

    > If you're worried about the void* data type returned by memcpy(), you
    > can convert it:
    >
    > (char*)memcpy(string+i, value, j+1-i)


    But don't do that, because adding unnecessary casts is a very bad idea,
    which will trip you up on those few occasions when you _do_ need one.

    Richard
     
    Richard Bos, May 20, 2009
    #11
  12. LC's No-Spam Newsreading account

    Guest

    In article <GvPQl.31248$>,
    BartC <> wrote:
    >
    >"James Van Buskirk" <> wrote in message
    >news:guvg2o$393$-september.org...
    >
    >I don't get this. Didn't Fortran use to be even more crude and basic than C?


    That's a fair claim - though it's also disputable.

    >So how did the Fortran committees manage to bring it into the 21st century
    >(and still call it Fortran), while C is still languishing in the 1970s it
    >seems?


    By vendor innovation, using traditional ISO standards' methodology,
    being prepared to accept lessons from other languages, and hard work.


    Regards,
    Nick Maclaren.
     
    , May 20, 2009
    #12
  13. LC's No-Spam Newsreading account

    James Kuyper Guest

    LC's No-Spam Newsreading account wrote:
    > On Tue, 19 May 2009, Richard Maine wrote:
    >> LC's No-Spam Newsreading account wrote:
    >>
    >>> I came out with the examples below, but cannot find a satisfactory one
    >>> for C using standard library functions.

    >> ...
    >>> Please note followup to clf

    >>
    >> Why? It is a question about how to do something in C.

    >
    > Apologies if this seemed inappropriate. My reasons were twofold. One is
    > that I read sort of regularly clf and keep it "caught up" while I do not
    > read regularly clc.


    If you had not set followup to c.l.f, you would still have seen all
    replies on either group; there would be no need for you to read c.l.c,
    but those of us who read c.l.c and not c.l.f could still see the responses.

    > ... The other one is that I wanted to get an answer from
    > somebody who was fluent in both languages (my question was not only "how
    > to do something in C" but "how to do in C something similar to the way
    > one does in Fortran"), say "the intersection of knowledgeable users on
    > clf and clc".


    If that's what you were looking for, you could have single-posted to
    either group. Those in that intersection would have seen it, no matter
    which of the two groups you posted it to. Since you follow c.l.f, that
    would have been the appropriate one. I think you would have gotten fewer
    answers that way, less quickly, and possibly less accurate ones.

    >> Do you really expect comp.lang.fortran to be the most appropriate
    >> place to discuss how to do something in C? That makes it sound to me
    >> just like an invitation for language flaming

    >
    > I wanted to avoid flames, and apparently have been successful, all the
    > replies are rather technical and up to the point ! Thanks !


    I think your question requires more C expertise than Fortran expertise,
    so cross-posting it (and NOT re-directing followups) would have been
    more appropriate.
     
    James Kuyper, May 20, 2009
    #13
  14. LC's No-Spam Newsreading account

    Guest

    On 20 May, 10:40, wrote:
    > In article <GvPQl.31248$>,
    > BartC <> wrote:


    <snip>

    > >I don't get this. Didn't Fortran use to be even more crude and basic than C?


    > That's a fair claim - though it's also disputable.


    Hollerith strings

    <snip>
     
    , May 20, 2009
    #14
  15. "LC's No-Spam Newsreading account" <> wrote in message
    news:...
    >I am trying to collect equivalent statements in various languages.
    >
    > I am now dealing with substring assignment.
    >
    > Let us assume that string="abcd", value="AB", i=2, j=3 (I won't scandalize
    > if in your favourite language it is i=1 j=2 :=). Nor if one has to use
    > n=j-i+1 (length("AB")).
    >
    > I want an assignment which returns string="aABd"
    > [...]
    > But what about C ?
    > [...]



    You can try this out this very crude little program:
    _____________________________________________________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <assert.h>


    /* Overwrite the first occurrence of sstr in str with rstr, copying at
       most strlen(sstr) characters. Returns str, or NULL if nothing was
       replaced. */
    char*
    static_substr_assign_ver1(
        char* str,
        char const* sstr,
        char const* rstr
    ) {
        char* pos;
        assert(str && sstr && rstr);   /* check before calling strstr() */
        pos = strstr(str, sstr);
        if (pos) {
            size_t slen = strlen(sstr);
            if (slen) {
                size_t rlen = strlen(rstr);
                if (rlen) {
                    if (rlen > slen) {
                        rlen = slen;
                    }
                    memcpy(pos, rstr, rlen);
                    return str;
                }
            }
        }
        return NULL;
    }


    /* Overwrite str[si..ei] (0-based, inclusive) with at most ei-si+1
       characters of rstr. Returns str, or NULL if nothing was replaced. */
    char*
    static_substr_assign_ver2(
        char* str,
        char const* rstr,
        size_t si,
        size_t ei
    ) {
        size_t len;
        assert(str && rstr);           /* check before calling strlen() */
        len = strlen(str);
        /* si < len rather than si <= len - 1: no wraparound when len is 0 */
        assert(si < len && ei < len && ei >= si);
        if (ei < len && ei >= si) {
            size_t rlen = strlen(rstr);
            if (rlen) {
                if (rlen > ei - si + 1) {
                    rlen = ei - si + 1;
                }
                memcpy(str + si, rstr, rlen);
                return str;
            }
        }
        return NULL;
    }


    int main(void) {
        char str[] = "xyzm";

        puts(static_substr_assign_ver1(str, "xyz", "Abc"));
        puts(static_substr_assign_ver2(str, "D", 3, 3));
        puts(static_substr_assign_ver2(str, "a", 0, 0));
        puts(static_substr_assign_ver1(str, "D", "d"));
        puts(static_substr_assign_ver2(str, "AB", 1, 2));

        return 0;
    }
    _____________________________________________________________________




    I attempted to put some quick and dirty validity/sanity checks in the
    functions in order to prevent buffer overruns. I quickly typed this in, and
    probably missed something!

    ;^o
     
    Chris M. Thomasson, May 20, 2009
    #15
  16. Given that the reason for my attempt
    >> > to collect equivalent statements in various languages.

    was aimed to
    > sort of the phrase books a traveller in a foreign country uses. It can
    > be useful to somebody (like me) who is fluent in 2-3 languages and
    > uses other languages only sometimes,


    .... so to write down some "standard idioms" which I can look up in order
    to avoid silly mistakes when using one of the unfamiliar languages.

    .... let me see if I can summarize all the helpful hints I've received
    (and archived).

    I wanted to collect statements equivalent to Fortran assignment to a
    substring : string(i:j)=value

    where all items (string, value, i, j) are variables of the appropriate
    type. I want string to be modified in place ( "abcd" -> "aABd" ), and
    I'm looking for the most compact or most legible form (ideally both)
    which uses native features. I.e. I do not care whether it is an
    assignment or a function call, or even a few statements, but I would
    like NOT to consider an user-written function (unless that's The Only
    Way). Also I do not care at this stage about error handling when strings
    are too long or too short, null padding or blank padding, etc.

    Now both the proposed C equivalents listed below match my requirement

    memcpy(string+i, value, j+1-i);
    strncpy(string+i, value, j+1-i);

    BUT there is a caveat which I'd like to be confirmed about the way
    "strings" shall be defined in C.

    I've even seen people using "typedef char * string;"

    I can use declarations like :

    1) char *a
    2) char a[]
    3) char a[somenumber]

    I am not at all scandalized by the third form (I'm used since ever to
    the Fortran CHARACTER*somenumber A), but I thought that C could have
    "variable-and-dynamic-length" null terminated strings, contrary to the
    more rigid fixed-length strings of Fortran,

    Now the point comes to whether I add initialization (DATA statement in
    my Fortran parlance) and assignment.

    I could initialize string a adding e.g. ="123456" to the declaration of
    form (1) and (2), but not (3). Also I am OBLIGED to do the
    initialization with the undefined length array notation a[] ( I *must*
    use (2D), while (2) gives compiler error 'array size missing' )

    1D) char *a="123456" ;
    2D) char a[]="123456" ;

    If I do not initialize (forms (1) and (3)) I can later assign a value.

    But form (1) requires the assignment as a="123456", while form (3)
    requires instead strcpy(a,"123456").

    What is more important, the first argument (destination) of strcpy
    cannot be a dynamic length string (1) i.e. char *a ! If it is one gets a
    segmentation fault. It must be a character array (2) or (3).

    Otherwise said I cannot declare a string of undefined length as char a[]
    unless I also initialize it (like CHARACTER*(*) valid only for a
    PARAMETER constant in a main).

    Is all this correct ?


    So the shortest main program which demonstrates my case (where all items
    are variables assigned explicitly a value, not just initialized) is

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        int i,j ;
        char a[5] ; /* must use a maximum size, with room for the '\0' */
        char *b ; /* no size implied */
        strcpy(a,"abcd") ; /* value assigned later THUS */
        b = "AB" ; /* value assigned later THUS */
        printf("a = %s\n", a);
        i=1 ; j=2 ;
        strncpy(a+i,b,j+1-i ) ; /* or memcpy */
        printf("a = %s\n", a);
        return 0 ;
    }


     
    LC's No-Spam Newsreading account, May 21, 2009
    #16
  17. LC's No-Spam Newsreading account

    jameskuyper Guest

    LC's No-Spam Newsreading account wrote:
    ....
    > I've even seen people using "typedef char * string;"


    Such typedefs reflect and reinforce the misconception that C has a
    string data type. It does not. It does have a string data format, but
    you can use many different C constructs to store data in that format.
    A char* can be used to point at the first element of a string, but it
    is not itself a string.

    > I can use declarations like :
    >
    > 1) char *a
    > 2) char a[]
    > 3) char a[somenumber]
    >
    > I am not at all scandalized by the third form (I'm used since ever to
    > the Fortran CHARACTER*somenumber A), but I thought that C could have
    > "variable-and-dynamic-length" null terminated strings, contrary to the
    > more rigid fixed-length strings of Fortran,


    The difference between 2) and 3) is entirely in how the fixed length
    of the array is determined. They both have a fixed length. They both
    can contain strings of any length up to but not including the length
    of the array. They are both capable of containing multiple strings,
    which is an example of the fact that, in C, "string" is a data format,
    not a data type.
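
    A short sketch of these points, using illustrative names that are not
    from the thread: both arrays have a fixed length, however it was
    determined, and a single array can hold more than one string.

```c
#include <assert.h>
#include <string.h>

/* Illustrative declarations; the names are assumptions. */
char implicit_len[] = "abc";    /* length 4, taken from the initializer */
char explicit_len[8] = "abc";   /* length 8, remaining elements are '\0' */

/* Store two strings, "ab" and "cd", back to back in buf, which must
   have room for at least 6 chars. Returns a pointer to the second one. */
char *two_strings(char *buf)
{
    assert(buf);
    memcpy(buf, "ab\0cd", 6);   /* 'a' 'b' '\0' 'c' 'd' '\0' */
    return buf + 3;
}
```

    Here sizeof implicit_len is 4 and sizeof explicit_len is 8, yet both
    currently hold the same string "abc". After two_strings(buf), buf
    points at the string "ab" and the returned pointer at the string "cd",
    both stored in the same array.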

    > Now the point comes to whether I add initialization (DATA statement in
    > my Fortran parliance) and assignment.
    >
    > I could initialize string a adding e.g. ="123456" to the declaration of
    > form (1) and (2), but not (3).


    You're incorrect about (3). The definition

    char a[5] = "123456";

    would be a constraint violation. However, the definitions

    char b[6] = "123456";
    char c[7] = "123456";
    char d[8] = "123456";

    are all perfectly fine. Note: b does not contain a string, since it
    has no terminating null character.
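
    One way to check this, sketched with a hypothetical helper name: an
    array holds a string only if a null character occurs somewhere within
    its bounds, which memchr() can test without reading past the end.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the n chars starting at p include
   a '\0' (so the array holds at least one string), and 0 otherwise. */
int holds_string(const char *p, size_t n)
{
    assert(p);
    return memchr(p, '\0', n) != NULL;
}
```

    Given the definitions above, holds_string(b, sizeof b) is 0, while
    holds_string(c, sizeof c) and holds_string(d, sizeof d) are both 1.
    Passing b to strlen() or printing it with %s would therefore read
    beyond the array.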

    > ... Also I am OBLIGED to do the
    > initialization with the undefined length array notation a[] ( I *must*
    > use (2D), while (2) gives compiler error 'array size missing' )


    That's because it is not an "undefined length array", it's an
    implicitly defined length, and without the initializer there's nothing
    to implicitly define the length.

    > 1D) char *a="123456" ;
    > 2D) char a[]="123456" ;
    >
    > If I do not initialize (forms (1) and (3)) I can later assign a value.


    No, only the pointer can be assigned to. You can assign to the
    elements of the arrays in case (2) and (3), and by doing so you can
    create one or more strings in them. However, this is true whether or
    not you initialize them.

    > But form (1) requires the assignment as a="123456", while form (3)
    > requires instead strcpy(a,"123456").


    No, there are many different ways to assign a value to the pointer.
    The key point to keep in mind is that declaring a pointer doesn't
    initialize any memory for a character string. That has to be done
    separately; for instance, using the string literal "123456" causes an
    unnamed array to be created to contain the corresponding string, and
    using that string literal to initialize a char* variable causes that
    variable to be set to point at the first element of the array. But
    that pointer could be set to point at any other char in that array, or
    in any other char array, for that matter.

    strcpy() is one way to copy a string from one array to another, but
    there are many others. It works just as well for (2) as for (3).

    > What is more important, the first argument (destination) of strcpy
    > cannot be a dynamic length string (1)


    Incorrect. If the pointer were set to point at writable memory (which
    it currently is not - the arrays created by using string literals are
    not safely writeable), strcpy() could also be used to copy the string
    into whichever location in memory it is currently pointing at.

    > ... i.e. char *a ! If it is one gets a
    > segmentation fault. ...


    That is true only if it points at a memory segment that you don't
    currently have permission to write to. Whether or not this is the case
    for the arrays created to store string literals is up to the
    implementation, which is why it's not safe to assume that you can write
    to them.
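
    As a sketch of the writable case, here is one way to make a char *
    point at memory that strcpy() may safely write to. The helper name is
    hypothetical, and heap allocation is only one option; pointing the
    pointer at an ordinary char array works just as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a writable, heap-allocated copy of src,
   or NULL if the allocation fails. The caller must free() the result. */
char *writable_copy(const char *src)
{
    char *p;
    assert(src);
    p = malloc(strlen(src) + 1);   /* +1 for the terminating '\0' */
    if (p != NULL)
        strcpy(p, src);            /* fine: p points at writable memory */
    return p;
}
```

    After char *a = writable_copy("abcd"), and provided a is not NULL, the
    call memcpy(a+1, "AB", 2) is well defined and leaves "aABd" in the
    allocated block, which should eventually be released with free(a).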

    > ... It must be a character array (2) or (3).
    >
    > Otherwise said I cannot declare a string of undefined length as char a[]
    > unless I also initialize it (like CHARACTER*(*) valid only for a
    > PARAMETER constant in a main).
    >
    > Is all this correct ?


    Not really. You've confused the issue by using the same name for all
    three cases. Let me distinguish them as follows:

    char *pc = "123456";
    char imp_length[] = "123456";
    char exp_length[7] = "123456";

    Any use of the string literal "123456" anywhere in your program causes
    at least one unnamed array of char to be created, initialized with the
    values '1', '2', '3', '4', '5', '6', '\0', in that order. It's
    entirely up to the implementation whether or not all uses of "123456"
    refer to the same array, or whether each such use refers to a
    different array. In addition, it's entirely up to the implementation
    whether or not the array created for "123456" occupies the same
    location in memory as the last seven elements of the array created for
    "0123456". The behavior of any program that attempts to write anything
    into one of those blocks of memory is undefined.

    The variable named pc is a pointer that is initialized to point at the
    first character in one of those blocks of memory. It could, at any
    later time, be re-set to point at some other piece of memory. The
    following statement:

    pc = &imp_length[3] ;

    causes pc to point at the char within imp_length which has the value
    '4'. Here's where the difference between a data type and a data format
    comes into play: &imp_length[n] is itself a pointer to the first
    character of a string with a length of 6-n, for any value of n from 0
    to 6. All of those strings share the same terminating null character.
    Five of them share the same '5' character, etc. Until you understand
    that statement, you really don't understand what C strings are.
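
    A small sketch of the same idea, assuming a hypothetical helper named
    suffix_length: every position inside a string is itself the start of a
    string, and all of those strings end at the one shared null character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the string that starts n characters into s.
   The caller must ensure n <= strlen(s). */
size_t suffix_length(const char *s, size_t n)
{
    assert(s != NULL && n <= strlen(s));
    return strlen(&s[n]);     /* &s[n] is the same pointer as s + n */
}
```

    For char imp_length[] = "123456"; the call suffix_length(imp_length, n)
    gives 6 - n for each n from 0 to 6; the last case is the empty string
    that consists of the terminator alone.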

    imp_length is an array of 7 characters; the length is determined
    implicitly by counting the characters in the string literal "123456",
    and adding 1 for the terminating null character. That array is filled
    in by copying from the array used to store the string literal. In this
    case, there's no way for your program to even determine whether the
    string literal's array actually exists; which means that in some cases
    it won't actually exist; the only copy of those characters could be in
    imp_length itself. Having been initialized with "123456", you're free
    to change the contents of that array; in particular, the statement

    imp_length[3] = '\0';

    means that it no longer contains a string of length 6. It now starts
    with a string of length 3; and contains another string of length 2
    starting at &imp_length[4]. It also contains 5 other strings, but
    they're just subsets of those two strings.
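
    As a sketch, assuming a hypothetical helper named split_at: writing a
    null character into the middle of the array cuts one string into two
    without moving any other character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite s[pos] with a null character and return a pointer to the
   string that follows it. The caller must ensure pos < strlen(s) and
   that s is writable (an array, not a string literal). */
char *split_at(char *s, size_t pos)
{
    assert(s != NULL && pos < strlen(s));
    s[pos] = '\0';        /* the character at pos is lost, not shifted */
    return s + pos + 1;   /* first character after the new terminator */
}
```

    With char imp_length[] = "123456"; the call split_at(imp_length, 3)
    leaves "123" at the start of the array and returns a pointer to "56".
    The array itself still has 7 elements.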

    exp_length is an array of 7 characters, just like imp_length. They
    have different names and different locations, but once defined, they
    have the same type and can be used in the same way. The only
    difference between them is how the length of the array is determined,
    and how it is initialized. If exp_length were initialized with
    "12345", there would be two '\0' characters at the end, rather than
    one. If the initializer were "1234567", the '7' would be copied into
    the last element of the array, and the array would not contain a
    string, because it would lack the terminating null character
    required for strings. If the initializer were "12345678", it would be
    a constraint violation.
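
    One way to check whether a char array currently holds a string is to
    look for a null character within its bounds. A minimal sketch, assuming
    a hypothetical helper named holds_string:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the size-byte array a has a null character somewhere
   inside it, i.e. if it contains a string; returns 0 otherwise. */
int holds_string(const char *a, size_t size)
{
    assert(a != NULL);
    return memchr(a, '\0', size) != NULL;
}
```

    For char a[7] = "12345"; both a[5] and a[6] are '\0' and holds_string
    returns 1. For char b[7] = "1234567"; (legal in C, though some compilers
    warn about it) holds_string(b, sizeof b) returns 0.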

    > So the shortest main program which demonstrates my case (where all items
    > are variables assigned explicitly a value, not just initialized) is
    >
    > int i,j ;
    > char a[4] ; /* must use a maximum size */
    > char *b ; /* no size implied */


    Also, no memory has been allocated for a string, and no value has been assigned
    to the pointer. It is therefore NOT safe to use 'b' in any way until
    it has been initialized.

    > strcpy(a,"abcd") ; /* value assigned later THUS */


    This copies the first four characters from the array created for the
    string literal "abcd" into the array you've defined named 'a'. It then
    tries to copy the terminating null character, but finds that there is
    no room for it. The behavior of your program is therefore undefined.
    In practice, that null character might get written somewhere where it
    can cause a great deal of trouble, or it might get written somewhere
    completely innocuous. It's also possible that it will not get written,
    an event that might or might not cause your program to abort.
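
    If the size of the destination is known, the overflow can be avoided
    rather than hoped away. A minimal sketch using snprintf() from C99; the
    helper name copy_bounded is an assumption for illustration, not anything
    proposed in the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst, never writing more than
   dst_size bytes. Returns 1 if the whole string fit, 0 if it was
   truncated. The result is always null-terminated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int needed;
    assert(dst != NULL && src != NULL && dst_size > 0);
    needed = snprintf(dst, dst_size, "%s", src);  /* length src would need */
    return needed >= 0 && (size_t)needed < dst_size;
}
```

    With char a[4]; the call copy_bounded(a, sizeof a, "abcd") stores "abc"
    and returns 0, which tells the caller that the array was too small.
    Declaring char a[5]; makes the same call return 1.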

    > b = "AB" ; /* value assigned later THUS */


    This sets b to point at the 'A' character in the array set aside for
    the string literal "AB".

    > printf("a = %s\n", a);


    Because of the way typical compilers work, if you reach this point in
    the code, there's a pretty good chance that this will accidentally
    work as you expected it to, despite the erroneous strcpy() call, but
    you shouldn't count on it.
     
    jameskuyper, May 21, 2009
    #17
  18. LC's No-Spam Newsreading account <> writes:
    <snip>
    > ... let me see if I can summarize all the helpful hints I've received
    > (and archived).
    >
    > I wanted to collect statements equivalent to Fortran assignment to a
    > substring : string(i:j)=value
    >
    > where all items (string, value, i, j) are variables of the appropriate
    > type. I want string to be modified in place ( "abcd" -> "aABd" ), and
    > I'm looking for the most compact or most legible form (ideally both)
    > which uses native features. I.e. I do not care whether it is an
    > assignment or a function call, or even a few statements, but I would
    > like NOT to consider an user-written function (unless that's The Only
    > Way). Also I do not care at this stage about error handling when
    > strings are too long or too short, null padding or blank padding, etc.
    >
    > Now both the proposed C equivalents listed below match my requirement
    >
    > memcpy(string+i, value, j+1-i);
    > strncpy(string+i, value, j+1-i);
    >
    > BUT there is a caveat which I'd like to be confirmed about the way
    > "strings" shall be defined in C.


    It's a big one. There is no string type in C. A string is a data format
    in an array.

    > I've even seen people using "typedef char * string;"
    >
    > I can use declarations like :
    >
    > 1) char *a
    > 2) char a[]
    > 3) char a[somenumber]


    None of these are strings, of course.

    > I am not at all scandalized by the third form (I'm used since ever to
    > the Fortran CHARACTER*somenumber A), but I thought that C could have
    > "variable-and-dynamic-length" null terminated strings, contrary to the
    > more rigid fixed-length strings of Fortran,
    >
    > Now the point comes to whether I add initialization (DATA statement in
    > my Fortran parliance) and assignment.
    >
    > I could initialize string a adding e.g. ="123456" to the declaration
    > of form (1) and (2), but not (3).


    You can do that for (3) as well. If somenumber is > 6 (not <= 6) then
    you get a string in the array -- it will be null terminated. If there
    is no room for the null you get a character array filled with
    characters rather than a string.

    > Also I am OBLIGED to do the
    > initialization with the undefined length array notation a[] ( I *must*
    > use (2D), while (2) gives compiler error 'array size missing' )
    >
    > 1D) char *a="123456" ;
    > 2D) char a[]="123456" ;
    >
    > If I do not initialize (forms (1) and (3)) I can later assign a
    > value.


    Hmm... Not in case (3). Assignment means using = and arrays can't be
    assigned. Some people talk of strcpy(a, "123") as assigning a string
    to the array, but that is loose talk at best. It is just the wrong
    word.

    > But form (1) requires the assignment as a="123456", while form (3)
    > requires instead strcpy(a,"123456").


    Oh, you're one of them! Sorry. No, don't call it that. You are
    copying a string from one place to another.

    > What is more important, the first argument (destination) of strcpy
    > cannot be a dynamic length string (1) i.e. char *a ! If it is one gets
    > a segmentation fault. It must be a character array (2) or (3).


    No. char *a; just declares the variable 'a' to be a pointer to some
    space. Until you assign it (yes, real assignment) it does not point
    to any valid address (technically the pointer is indeterminate). To
    use dynamic strings, you have to do all the allocation yourself. For
    example, to copy b to a:

    a = malloc(strlen(b) + 1);
    if (a)
        strcpy(a, b);
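
    The same do-it-yourself allocation applies whenever a result is longer
    than anything you already hold. A sketch, assuming a hypothetical helper
    named concat_strings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding a followed
   by b, or NULL if allocation fails. The caller must free() the result. */
char *concat_strings(const char *a, const char *b)
{
    size_t la, lb;
    char *r;
    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    r = malloc(la + lb + 1);         /* + 1 for the terminating null */
    if (r) {
        memcpy(r, a, la);
        memcpy(r + la, b, lb + 1);   /* copies b's null as well */
    }
    return r;
}
```

    Nothing resizes or releases the memory for you: the pointer returned by
    concat_strings("ab", "cd") has to be passed to free() once it is no
    longer needed.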

    > Otherwise said I cannot declare a string of undefined length as char
    > a[] unless I also initialize it (like CHARACTER*(*) valid only for a
    > PARAMETER constant in a main).


    The trouble is, as I said, there is no string type. C operates at a
    lower level than this. You can declare character arrays and, if you
    are careful, you can ensure they always contain valid strings; but it
    is your job to do all the work.

    > Is all this correct ?


    Not as bad as it seems from my comments.

    > So the shortest main program which demonstrates my case (where all
    > items are variables assigned explicitly a value, not just initialized)
    > is
    >
    > int i,j ;
    > char a[4] ; /* must use a maximum size */
    > char *b ; /* no size implied */
    > strcpy(a,"abcd") ; /* value assigned later THUS */


    BANG! This copies 5 bytes from the literal string to a. You have
    space for 4. I'd have written:

    char a[] = "abcd";

    but you wanted to illustrate strcpy, I know.

    > b = "AB" ; /* value assigned later THUS */
    > printf("a = %s\n", a);
    > i=1 ; j=2 ;
    > strncpy(a+i,b,j+1-i ) ; /* or memcpy */
    > printf("a = %s\n", a);

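    With the array enlarged to five elements, the quoted program does what
    was asked. A minimal sketch, assuming a hypothetical wrapper named
    assign_substring (the thread itself only gives the bare memcpy/strncpy
    calls) and zero-based i and j:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring Fortran's string(i:j) = value, with
   zero-based i and j. Assumes i <= j, j < strlen(string), and that value
   holds at least j + 1 - i characters; only the asserts check this. */
void assign_substring(char *string, size_t i, size_t j, const char *value)
{
    size_t n = j + 1 - i;
    assert(string != NULL && value != NULL);
    assert(i <= j && j < strlen(string) && strlen(value) >= n);
    memcpy(string + i, value, n);   /* overwrites in place; adds no null */
}
```

    After char a[5]; strcpy(a, "abcd"); the call
    assign_substring(a, 1, 2, "AB") changes a to "aABd".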

    --
    Ben.
     
    Ben Bacarisse, May 21, 2009
    #18
  19. In comp.lang.fortran Ben Bacarisse <> wrote:

    <> Also I am OBLIGED to do the
    <> initialization with the undefined length array notation a[] ( I *must*
    <> use (2D), while (2) gives compiler error 'array size missing' )

    <> 1D) char *a="123456" ;
    <> 2D) char a[]="123456" ;

    In case comp.lang.fortran readers are not familiar with this, the empty
    brackets dimension a to exactly the size needed to hold the initializer,
    where "123456" is short for {'1','2','3','4','5','6',0}. The same notation
    can be used for initialized arrays of any type. As I have heard, computers
    are better at counting than people, and requiring one to get the
    right dimension for the appropriate number of initialization values
    does not help the programmer. Does Fortran have a way to dimension
    an array of the appropriate length for its initial value?

    <> If I do not initialize (forms (1) and (3)) I can later assign a
    <> value.

    < Hmm... Not in case (3). Assignment means using = and arrays can't be
    < assigned. Some people talk of strcpy(a, "123") as assigning a string
    < to the array, but that is loose talk at best. It is just the wrong
    < word.

    <> But form (1) requires the assignment as a="123456", while form (3)
    <> requires instead strcpy(a,"123456").

    < Oh, you're one of them! Sorry. No, don't call it that. You are
    < copying a string form one place to another.

    Are you disqualifying it because of function notation, or because
    it is a function call? Compilers may implement it inline, and
    Fortran CHARACTER assignment may be implemented internally
    as a function call. Yes it is different, but not that different.

    Does a="123456" in Fortran "copy" a string?

    <> What is more important, the first argument (destination) of strcpy
    <> cannot be a dynamic length string (1) i.e. char *a ! If it is one gets
    <> a segmentation fault. It must be a character array (2) or (3).

    < No. char *a; just declared the variable 'a' to be a pointer to some
    < space. Until you assign it (yes, real assignment) it does not point
    < to any valid address (technically the pointer is indeterminate). To
    < use dynamic strings, you have to do all the allocation yourself. For
    < example, to copy b to a:

    < a = malloc(strlen(b) + 1);
    < if (a)
    < strcpy(a, b);

    <> Otherwise said I cannot declare a string of undefined length as char
    <> a[] unless I also initialize it (like CHARACTER*(*) valid only for a
    <> PARAMETER constant in a main).

    As has been said, this does not generate an array of undefined
    length, it has the appropriate length for its initial value.

    C doesn't have SIZE, but you can use sizeof to determine the number of
    elements, which is a compile-time constant: sizeof(a)/sizeof(*a).
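
    A short sketch of that idiom; the macro name ARRAY_LEN and the function
    digits_array_len are assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element count of an array object. Only meaningful where a is a true
   array, not a pointer (so not, for example, a function parameter). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof(*(a)))

/* For an array dimensioned by a string-literal initializer, the element
   count is the literal's length plus one for the terminating null. */
size_t digits_array_len(void)
{
    char a[] = "123456";
    assert(ARRAY_LEN(a) == strlen(a) + 1);
    return ARRAY_LEN(a);
}
```

    The same macro works for arrays of any element type, for instance
    int v[] = {1, 2, 3}; gives ARRAY_LEN(v) == 3.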
     
    glen herrmannsfeldt, May 21, 2009
    #19
  20. LC's No-Spam Newsreading account

    Dan Nagle Guest

    On 2009-05-21 16:30:59 -0400, glen herrmannsfeldt <> said:

    > Does Fortran have a way to dimension
    > an array of the appropriate length for its initial value?


    character( len= *), parameter :: name = 'value'

    As of f08, array constants may be set to the correct size
    for their initial value.

    Allocatable arrays are set to the correct size automatically
    on assignment.

    --
    Cheers!

    Dan Nagle
     
    Dan Nagle, May 21, 2009
    #20