An interview question by MS

Discussion in 'C Programming' started by Lambda, Nov 6, 2007.

  1. Lambda

    Lambda Guest

    In an interview, I was asked:

    Define a function to split a string with some token.

    I said let the function return a string array.

    He asked me how do i know the array size.
    Yes, I have to traverse the string to find how large the array is for
    the first time.
    And traverse the string when actually doing the split for the second
    time.

    I think this is maybe not the correct answer.

    How should i handle such situation?

    I find there is a 'strtok(s, ct)' function in the standard library.
    The first cll in a sequence has a non-NULL s.
    Each subsequent call, indicated by a NULL value of s.
    strtok returns NULL when no further token is found.

    Why indeed this function defined this way?
    Why not return a string array directly?
    Maybe this is the correct answer??
     
    Lambda, Nov 6, 2007
    #1
    1. Advertising

  2. Lambda said:

    > In an interview, I was asked:
    >
    > Define a function to split a string with some token.
    >
    > I said let the function return a string array.


    Strictly speaking, that's impossible. What you *can* do is return a pointer
    to the first element in an array (which is what many people actually mean
    when they say "return an array"). But I'm not convinced that this is the
    best solution.

    Note that "a string array" is in itself rather ambiguous. I would read it
    as "an array of strings", i.e. either an array of arrays containing
    strings or an array of pointers to char, each of which points at (the
    first character of) a string. But you might have intended to return one
    string per call, the same way that strtok does (in which case why not just
    use strtok or a variant thereof?). I'll assume, however, that you meant
    "an array of strings".

    > He asked me how do i know the array size.
    > Yes, I have to traverse the string to find how large the array is for
    > the first time.
    > And traverse the string when actually doing the split for the second
    > time.


    No, you don't need to do this in two passes. You can build the array
    dynamically, extending it as and when required.

    How to tell the *caller* the array size bears thinking about. One way is to
    wrap the whole thing in a struct, and either have your function accept a
    pointer to such a struct or, perhaps better, allocate the memory for it
    yourself and return a pointer to it.

    The way I would do this is to start by encapsulating the tokenisation in
    such a struct, and writing a function to create and initialise it. I would
    then write a function that accepts a pointer to such a struct, together
    with the string to be tokenised and the token delimiter list. It would
    then do the tokenisation, creating strings as required, and loading them
    into the array.

    At every stage, I would think about the most appropriate way to encapsulate
    each lower-level concept as I come across it (or, more likely, I'd re-use
    something from my personal library that already encapsulates it). I would
    anticipate the high-level tokenisation function doing little more than
    calling existing custom routines in the right order and for the right
    number of times.

    <snip>

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Nov 6, 2007
    #2
    1. Advertising

  3. In article <>,
    Lambda <> wrote:

    >He asked me how do i know the array size.
    >Yes, I have to traverse the string to find how large the array is for
    >the first time.
    >And traverse the string when actually doing the split for the second
    >time.


    That's a plausible way to do it. But since you are going to have to
    dynamically allocate the array, you could allocate a small size
    initially and call realloc() to grow it when needed. In fact, you
    could allocate 1 entry initially and call realloc() every time,
    relying on the library to have a reasonable realloc() implementation.

    >I find there is a 'strtok(s, ct)' function in the standard library.
    >The first cll in a sequence has a non-NULL s.
    >Each subsequent call, indicated by a NULL value of s.
    >strtok returns NULL when no further token is found.
    >
    >Why indeed this function defined this way?
    >Why not return a string array directly?
    >Maybe this is the correct answer??


    It has the advantage of not allocating memory, which may be useful.
    It has the substantial disadvantage that it keeps hidden state - the
    current position in the string - so you can't interleave calls to
    strtok() with different strings. Some implementations have a version
    strtok_r() with an extra argument to avoid this.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
     
    Richard Tobin, Nov 6, 2007
    #3
  4. Lambda

    santosh Guest

    On Tuesday 06 Nov 2007 1:50 pm Lambda <> wrote in
    article <>:

    > In an interview, I was asked:
    >
    > Define a function to split a string with some token.


    The Standard library strtok does this, but if you're going to reinvent
    the functionality you can do better.

    > I said let the function return a string array.


    This is one possibility.

    > He asked me how do i know the array size.
    > Yes, I have to traverse the string to find how large the array is for
    > the first time.
    > And traverse the string when actually doing the split for the second
    > time.
    >
    > I think this is maybe not the correct answer.


    There is no "the correct" answer. Barring obviously wrong methods, each
    design may be appropriate to a certain situation. strtok() design is
    one possibility, though as I noted, you can improve upon it if you are
    writing from scratch.

    In general, other than modifying the original string like strtok(),
    there are two methods. One is for the splitting function to allocate
    memory itself and return the token. In this case the caller is usually
    left with the responsibility of freeing the memory, but it allows for
    efficient use of memory(?). The other method is to have the caller
    provide a buffer to store the token into. This allows for memory to be
    reused easily. Both methods can also be combined.

    There are also other less favourable options like using static and
    global arrays etc., usually bad design, particularly for a library
    function.

    > How should i handle such situation?


    Did they ask you to write the function then and there during the
    interview?

    Usually the best general answer is to discuss the pros and cons of
    various methods. It shows that you are aware of the subject which is
    actually a problem not just confined to strtok().

    > I find there is a 'strtok(s, ct)' function in the standard library.
    > The first cll in a sequence has a non-NULL s.
    > Each subsequent call, indicated by a NULL value of s.
    > strtok returns NULL when no further token is found.
    >
    > Why indeed this function defined this way?


    You should ask the original developers whoever they were.

    > Why not return a string array directly?
    > Maybe this is the correct answer??


    As I said there is no one correct answer, though there are many wrong
    answers.
     
    santosh, Nov 6, 2007
    #4
  5. On Tue, 06 Nov 2007 08:20:51 +0000, Lambda wrote:

    > In an interview, I was asked:
    >
    > Define a function to split a string with some token.
    >

    <snip>
    > I find there is a 'strtok(s, ct)' function in the standard library.
    > The first cll in a sequence has a non-NULL s.
    > Each subsequent call, indicated by a NULL value of s.
    > strtok returns NULL when no further token is found.
    >
    > Why indeed this function defined this way?
    > Why not return a string array directly?
    > Maybe this is the correct answer??

    Note that the separators can vary between calls to
    strtok on the same base string. They could for example depend
    on previous tokens (for example some strings might contain
    numerical fields separated by spaces, others (eg times) might
    contain numerical fields separated by ':'). This would be
    awkward to arrange with a splitter that processed the whole
    string in a oner.
     
    Duncan Muirhead, Nov 6, 2007
    #5
  6. Lambda

    Guest

    On Nov 6, 2:20 am, Lambda <> wrote:
    > In an interview, I was asked:
    >
    > Define a function to split a string with some token.
    >
    > I said let the function return a string array.
    >
    > He asked me how do i know the array size.
    > Yes, I have to traverse the string to find how large the array is for
    > the first time.
    > And traverse the string when actually doing the split for the second
    > time.
    >
    > I think this is maybe not the correct answer.
    >
    > How should i handle such situation?
    >
    > I find there is a 'strtok(s, ct)' function in the standard library.
    > The first cll in a sequence has a non-NULL s.
    > Each subsequent call, indicated by a NULL value of s.
    > strtok returns NULL when no further token is found.
    >
    > Why indeed this function defined this way?
    > Why not return a string array directly?
    > Maybe this is the correct answer??



    I find it highly improbably that an MS interview question was intended
    to elicit a C response. Surely the desired response was to create and
    return a C++ container object in some fashion.
     
    , Nov 6, 2007
    #6
  7. Lambda

    santosh Guest

    On Tuesday 06 Nov 2007 3:32 pm
    <> wrote in article
    <>:

    > On Nov 6, 2:20 am, Lambda <> wrote:
    >> In an interview, I was asked:
    >>
    >> Define a function to split a string with some token.
    >>
    >> I said let the function return a string array.
    >>
    >> He asked me how do i know the array size.
    >> Yes, I have to traverse the string to find how large the array is for
    >> the first time.
    >> And traverse the string when actually doing the split for the second
    >> time.
    >>
    >> I think this is maybe not the correct answer.
    >>
    >> How should i handle such situation?
    >>
    >> I find there is a 'strtok(s, ct)' function in the standard library.
    >> The first cll in a sequence has a non-NULL s.
    >> Each subsequent call, indicated by a NULL value of s.
    >> strtok returns NULL when no further token is found.
    >>
    >> Why indeed this function defined this way?
    >> Why not return a string array directly?
    >> Maybe this is the correct answer??

    >
    >
    > I find it highly improbably that an MS interview question was intended
    > to elicit a C response. Surely the desired response was to create and
    > return a C++ container object in some fashion.


    MS is over C++. They would probably want a C#/CLI/.NET managed class
    implemented in it's own Sandbox. :)
     
    santosh, Nov 6, 2007
    #7
  8. Lambda

    Lambda Guest

    On Nov 6, 6:02 pm, "" <>
    wrote:
    > On Nov 6, 2:20 am, Lambda <> wrote:
    >
    >
    >
    > > In an interview, I was asked:

    >
    > > Define a function to split a string with some token.

    >
    > > I said let the function return a string array.

    >
    > > He asked me how do i know the array size.
    > > Yes, I have to traverse the string to find how large the array is for
    > > the first time.
    > > And traverse the string when actually doing the split for the second
    > > time.

    >
    > > I think this is maybe not the correct answer.

    >
    > > How should i handle such situation?

    >
    > > I find there is a 'strtok(s, ct)' function in the standard library.
    > > The first cll in a sequence has a non-NULL s.
    > > Each subsequent call, indicated by a NULL value of s.
    > > strtok returns NULL when no further token is found.

    >
    > > Why indeed this function defined this way?
    > > Why not return a string array directly?
    > > Maybe this is the correct answer??

    >
    > I find it highly improbably that an MS interview question was intended
    > to elicit a C response. Surely the desired response was to create and
    > return a C++ container object in some fashion.



    This question is very simple with Java or C++, and I believe it's
    simple with C#.
    I can use List in Java or vector<string> in C++.

    When I asked him may i use that languages, he said that languages hide
    the details
    And insist on implementing with C :(

    I'm interviewed by MS only once, maybe this is not a typical question.
    >From my experience, MS like to ask detail questions.

    I must write the code in paper in a short time, after i discuss with
    him on
    what the function interface will be, there is little time for me to
    complete the program. :(
     
    Lambda, Nov 6, 2007
    #8
  9. Lambda wrote:
    > In an interview, I was asked:
    >
    > Define a function to split a string with some token.
    >
    > I said let the function return a string array.
    >
    > He asked me how do i know the array size.


    Noone seems to have comment on this so far, so here goes:

    As Richard Heathfield said elsethread, it is impossible to return a
    genuine array, though you can return a pointer pointing to the first
    element of an array (for further information, see chapter 6 of the
    comp.lang.c FAQ.)

    Such a pointer to the first element of an array of strings (each of
    which is really a pointer to the first element of an array of chars)
    would have type char **.

    When you return that pointer, it has lost all information about how big
    the array was. There are two main methods for communicating the size of
    an array pointed to by a pointer: 1) have some 'sentinel' value within
    the array which represents the last element, and 2) communicate the size
    of the array through some other channel, perhaps by returning a struct
    containing the pointer and a size_t size element.

    Examples of method 1) include:
    * Null-terminated strings. The '\0' character is the sentinel.
    * The GLib function g_strdupv():
    gchar **g_strdupv (gchar **str_array);
    gchar is some character type; g_strdupv copies a NULL-terminated array
    of strings of gchar. The NULL-pointer is the sentinel.

    Examples of method 2) include:
    * The qsort() standard library function.
    * The printf() standard library function. Here the length of the
    variable-length argument list is communicated through the format string.
    (An argument list isn't an array, but this is the same idea.)

    Each method has advantages and disadvantages.

    --
    Philip Potter pgp <at> doc.ic.ac.uk
     
    Philip Potter, Nov 6, 2007
    #9
  10. [comp.lang.c] Richard Heathfield <> wrote:

    > No, you don't need to do this in two passes. You can build the array
    > dynamically, extending it as and when required.


    I would think, though, that the work of traversing the string twice
    would be less than the probable extra calls to realloc() - would you
    agree?

    > How to tell the *caller* the array size bears thinking about. One way is to
    > wrap the whole thing in a struct, and either have your function accept a
    > pointer to such a struct or, perhaps better, allocate the memory for it
    > yourself and return a pointer to it.


    I would say that the best solution would be to write both functions
    (with the second delegating to the first) and allow the caller to
    invoke whichever function is more convenient.

    --
    C. Benson Manica | I appreciate all corrections, polite or otherwise.
    cbmanica(at)gmail.com |
    ----------------------| I do not currently read any posts posted through
    sdf.lonestar.org | Google groups, due to rampant unchecked spam.
     
    Christopher Benson-Manica, Nov 6, 2007
    #10
  11. Lambda

    Ben Pfaff Guest

    Lambda <> writes:

    > I find there is a 'strtok(s, ct)' function in the standard library.


    strtok() has at least these problems:

    * It merges adjacent delimiters. If you use a comma as your
    delimiter, then "a,,b,c" will be divided into three tokens,
    not four. This is often the wrong thing to do. In fact, it
    is only the right thing to do, in my experience, when the
    delimiter set contains white space (for dividing a string
    into "words") or it is known in advance that there will be
    no adjacent delimiters.

    * The identity of the delimiter is lost, because it is
    changed to a null terminator.

    * It modifies the string that it tokenizes. This is bad
    because it forces you to make a copy of the string if
    you want to use it later. It also means that you can't
    tokenize a string literal with it; this is not
    necessarily something you'd want to do all the time but
    it is surprising.

    * It can only be used once at a time. If a sequence of
    strtok() calls is ongoing and another one is started,
    the state of the first one is lost. This isn't a
    problem for small programs but it is easy to lose track
    of such things in hierarchies of nested functions in
    large programs. In other words, strtok() breaks
    encapsulation.

    --
    int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
    \n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
    );while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p\
    );}return 0;}
     
    Ben Pfaff, Nov 6, 2007
    #11
  12. Christopher Benson-Manica said:

    > [comp.lang.c] Richard Heathfield <> wrote:
    >
    >> No, you don't need to do this in two passes. You can build the array
    >> dynamically, extending it as and when required.

    >
    > I would think, though, that the work of traversing the string twice
    > would be less than the probable extra calls to realloc() - would you
    > agree?


    A two-pass algorithm lacks elegance (although of course we might do it in
    two passes anyway, for any of various reasons). I would certainly be
    swayed in the direction of single-pass if I had reason to think that the
    inputs were likely to be colossal (or volatile).

    <snip>

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Nov 6, 2007
    #12
  13. Lambda

    santosh Guest

    On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
    <> wrote in article
    <>:

    > Christopher Benson-Manica said:
    >
    >> [comp.lang.c] Richard Heathfield <> wrote:
    >>
    >>> No, you don't need to do this in two passes. You can build the array
    >>> dynamically, extending it as and when required.

    >>
    >> I would think, though, that the work of traversing the string twice
    >> would be less than the probable extra calls to realloc() - would you
    >> agree?

    >
    > A two-pass algorithm lacks elegance (although of course we might do it
    > in two passes anyway, for any of various reasons). I would certainly
    > be swayed in the direction of single-pass if I had reason to think
    > that the inputs were likely to be colossal (or volatile).


    If the input is volatile, (ignoring for the moment the likelyhood of
    this), I think even a one-pass algorithm won't be guaranteed to produce
    meaningful results.
     
    santosh, Nov 6, 2007
    #13
  14. santosh said:

    > On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
    > <> wrote in article


    <snip>

    >> I would certainly
    >> be swayed in the direction of single-pass if I had reason to think
    >> that the inputs were likely to be colossal (or volatile).

    >
    > If the input is volatile, (ignoring for the moment the likelyhood of
    > this), I think even a one-pass algorithm won't be guaranteed to produce
    > meaningful results.


    Well, evidently 'volatile' was a poor word choice, given the existence of
    the C keyword of that name! To clarify: I'm thinking, for example, of data
    coming in over stdin or a network connection, where you can't simply "play
    back" the data whenever you like. (Also, it was a pretty lousy
    parenthetical comment for another reason: it widens the discussion away
    from mere strings, without saying so.)

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Nov 6, 2007
    #14
  15. Lambda

    santosh Guest

    On Wednesday 07 Nov 2007 12:28 am Richard Heathfield
    <> wrote in article
    <>:

    > santosh said:
    >
    >> On Wednesday 07 Nov 2007 12:10 am Richard Heathfield
    >> <> wrote in article

    >
    > <snip>
    >
    >>> I would certainly
    >>> be swayed in the direction of single-pass if I had reason to think
    >>> that the inputs were likely to be colossal (or volatile).

    >>
    >> If the input is volatile, (ignoring for the moment the likelyhood of
    >> this), I think even a one-pass algorithm won't be guaranteed to
    >> produce meaningful results.

    >
    > Well, evidently 'volatile' was a poor word choice, given the existence
    > of the C keyword of that name! To clarify: I'm thinking, for example,
    > of data coming in over stdin or a network connection, where you can't
    > simply "play back" the data whenever you like. (Also, it was a pretty
    > lousy parenthetical comment for another reason: it widens the
    > discussion away from mere strings, without saying so.)


    OK. That makes sense. Yes, a two pass algorithm is not applicable unless
    you manage to buffer the data.
     
    santosh, Nov 6, 2007
    #15
  16. In article <fgq1s2$hof$>,
    Christopher Benson-Manica <> wrote:

    >> No, you don't need to do this in two passes. You can build the array
    >> dynamically, extending it as and when required.


    >I would think, though, that the work of traversing the string twice
    >would be less than the probable extra calls to realloc() - would you
    >agree?


    Not necessarily. You only call realloc() at most once per token,
    rather than once per character. And most implementations don't
    allocate the exact size requested, so most calls to realloc() are
    likely to be no-ops with a bit of overhead, even if you call realloc()
    for each token.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
     
    Richard Tobin, Nov 6, 2007
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Gopal Krish

    Interview question

    Gopal Krish, Oct 22, 2004, in forum: ASP .Net
    Replies:
    10
    Views:
    682
    John Timney \(Microsoft MVP\)
    Oct 23, 2004
  2. Digital Puer
    Replies:
    17
    Views:
    4,116
    Andrew Thompson
    Dec 27, 2003
  3. Jerry

    An interview question

    Jerry, May 27, 2005, in forum: Java
    Replies:
    22
    Views:
    945
    Brooks Hagenow
    Jun 12, 2005
  4. Replies:
    9
    Views:
    462
    Andrey Tarasevich
    Jan 22, 2005
  5. reema
    Replies:
    0
    Views:
    293
    reema
    Aug 26, 2008
Loading...

Share This Page