Analogue of ReadLine() in C

Discussion in 'C Programming' started by МакÑим Фомин, Oct 19, 2011.

  1. I need function similar to ReadLine() in any other language.
    The program below fails[1] when data more then LAG_SIZE is passed

    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>
    #include <stdlib.h>

    char * read_line(void)
    {
    bool read_more = true;
    enum { LAG_SIZE = 32 };
    char *return_buf = NULL;
    char *new_line_pos = NULL;
    char tmp_buf[LAG_SIZE] = {'\0'};
    size_t size = 0;
    while (read_more && fgets(tmp_buf, LAG_SIZE, stdin)){
    char *tmp_realloc;
    size += LAG_SIZE;
    tmp_realloc = realloc(return_buf, size);
    if (!tmp_realloc){
    free(return_buf);
    return_buf = NULL;
    break;
    }
    return_buf = tmp_realloc;
    memset(return_buf + size - LAG_SIZE, '\0', LAG_SIZE);
    new_line_pos = strchr(tmp_buf, '\n');
    if (new_line_pos){
    *new_line_pos = '\0';
    read_more = false;
    }
    strcat(return_buf, tmp_buf);
    memset(tmp_buf, '\0', LAG_SIZE);
    }
    return return_buf;
    }

    void loop(void)
    {
    bool repeat = true;
    while (repeat){
    char *command;
    printf("Enter> ");
    command = read_line();
    if (command && !strcmp(command, "exit"))
    repeat = false;
    free(command);
    }
    }

    int main(void)
    {
    loop();
    return 0;
    }

    [1]: *** glibc detected *** ./a.out: realloc(): invalid next size:
    0x0804b008 ***
    МакÑим Фомин, Oct 19, 2011
    #1
    1. Advertising

  2. If one has problems with dynamical allocations of pointers in the main memory, please use static buffer first to dump buffer contents and define the DEBUG modes for all pointers!


    ======
    It is bad in C/C++ to use macros with nested operations in any argument!

    Macros should not affect arguments' variables values unless desired in the macros!
    88888 Dihedral, Oct 19, 2011
    #2
    1. Advertising

  3. МакÑим Фомин

    Kaz Kylheku Guest

    On 2011-10-19, 88888 Dihedral <> wrote:
    > If one has problems with dynamical allocations of pointers in the main memory, please use static buffer first to dump buffer contents and define the DEBUG modes for all pointers!
    >
    >
    >======
    > It is bad in C/C++ to use macros with nested operations in any argument!


    ITYM, "side effects in any argument".

    I do not agree. The obvious counterexample are properly designed macros which
    are carefully designed to evaluate their arguments exactly once, and have a
    proper sequence point after the evaluation of arguments.

    Removing this requirement sometimes make it easier to design a macro, and then
    the users of the macro have to be given a big warning.

    > Macros should not affect arguments' variables values unless desired in the macros!


    When it is acceptable for a computer operation to have an effect on data,
    such that the effect is not desired?
    Kaz Kylheku, Oct 19, 2011
    #3
  4. МакÑим Фомин

    Stefan Ram Guest

    China Blue Corn Chips <> writes:
    >char *p = 0; int m = 0, n = 0, c;
    >while (c=fgetc(file), c!='\n' && c!=EOF) {
    > if (n>=m) {m = 2*n+1; p = realloc(p, m);}
    > p[n++] = c;


    What if !realloc(p,m)?
    Stefan Ram, Oct 19, 2011
    #4
  5. МакÑим Фомин

    Ike Naar Guest

    On 2011-10-19, ?????? ????? <> wrote:
    > I need function similar to ReadLine() in any other language.
    > The program below fails[1] when data more then LAG_SIZE is passed
    >
    > #include <stdio.h>
    > #include <stdbool.h>
    > #include <string.h>
    > #include <stdlib.h>
    >
    > char * read_line(void)
    > {
    > bool read_more = true;
    > enum { LAG_SIZE = 32 };
    > char *return_buf = NULL;
    > char *new_line_pos = NULL;
    > char tmp_buf[LAG_SIZE] = {'\0'};
    > size_t size = 0;
    > while (read_more && fgets(tmp_buf, LAG_SIZE, stdin)){
    > char *tmp_realloc;
    > size += LAG_SIZE;
    > tmp_realloc = realloc(return_buf, size);
    > if (!tmp_realloc){
    > free(return_buf);
    > return_buf = NULL;
    > break;
    > }
    > return_buf = tmp_realloc;
    > memset(return_buf + size - LAG_SIZE, '\0', LAG_SIZE);
    > new_line_pos = strchr(tmp_buf, '\n');
    > if (new_line_pos){
    > *new_line_pos = '\0';
    > read_more = false;
    > }
    > strcat(return_buf, tmp_buf);
    > memset(tmp_buf, '\0', LAG_SIZE);
    > }
    > return return_buf;
    > }
    >
    > void loop(void)
    > {
    > bool repeat = true;
    > while (repeat){
    > char *command;
    > printf("Enter> ");
    > command = read_line();
    > if (command && !strcmp(command, "exit"))


    Is this really what you meant? This means that the program will
    loop forever when no more input can be read (e.g. when EOF is
    reached). I'd guess you want:

    if (command == NULL || !strcmp(command, "exit"))

    instead.

    > repeat = false;
    > free(command);
    > }
    > }
    >
    > int main(void)
    > {
    > loop();
    > return 0;
    > }
    >
    > [1]: *** glibc detected *** ./a.out: realloc(): invalid next size:
    > 0x0804b008 ***



    --

    SDF Public Access UNIX System - http://sdf.lonestar.org
    Ike Naar, Oct 19, 2011
    #5
  6. On Oct 19, 8:26 pm, МакÑим Фомин <> wrote:

    > I need function similar to ReadLine() in any other language.


    *what* other language? It certinly isn't present in all other
    langauges.

    > The program below fails[1] when data more then LAG_SIZE is passed


    so fix it

    <snip>
    Nick Keighley, Oct 20, 2011
    #6
  7. МакÑим Фомин

    Seebs Guest

    On 2011-10-20, China Blue Corn Chips <> wrote:
    > In article <-berlin.de>,
    > -berlin.de (Stefan Ram) wrote:
    >> China Blue Corn Chips <> writes:
    >> >char *p = 0; int m = 0, n = 0, c;
    >> >while (c=fgetc(file), c!='\n' && c!=EOF) {
    >> > if (n>=m) {m = 2*n+1; p = realloc(p, m);}
    >> > p[n++] = c;


    >> What if !realloc(p,m)?


    > Buy a bigger swap disk.


    This is crappy advice, because it's completely unrelated. There are
    tons of systems which, for sane and well-considered reasons, will cap
    allocations far below the size of physical memory or swap. 2*n+1 goes
    pathological pretty fast. Also, it's... ridiculous, frankly, to start
    at 0. That guarantees at least one realloc, and probably several for
    even a tiny string.

    Imagine that you're on a system which, quite reasonably, limits your
    32-bit code to 3GB of memory. You can have all 3GB, but you can't have
    more.

    Now imagine that you want to process a string which contains 1GB of
    data. Your algorithm will do ~30 reallocs, most of which will almost
    certainly be copying all that data, and then fail. Think it through;
    what happens? At any given time, your current length is a value of
    the form 2^n - 1. So what's it going to do for 2^n bytes of data,
    say, 1GB exactly?

    1. It reallocs until it has exactly one byte less than 1GB of data.
    2. It then tries to allocate exactly one byte less than 2GB of data,
    to copy that first GB into.

    But that's two bytes less than 3GB of data it needs to be able to handle,
    and I doubt your code and support libraries fit into 2 bytes.

    Depending, I'd say either start with a "likely" length or start with
    something around a page size, and reallocate up a little more conservatively.

    -s
    --
    Copyright 2011, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    I am not speaking for my employer, although they do rent some of my opinions.
    Seebs, Oct 20, 2011
    #7
  8. On Oct 20, 11:11 am, Nick Keighley <>
    wrote:
    > On Oct 19, 8:26 pm, МакÑим Фомин <> wrote:
    >
    > > I need function similar to ReadLine() in any other language.

    >
    > *what* other language? It certinly isn't present in all other
    > langauges.


    Yes, not all languages have. But many decent have.
    However, it is not an issue.

    >
    > > The program below fails[1] when data more then LAG_SIZE is passed

    >
    > so fix it
    >
    > <snip>


    I haven't not found any error in red_line() function (I consider calls
    to malloc, realloc and free legitimate), in fact I doubt that there is
    one.
    That's why I asked the question.
    МакÑим Фомин, Oct 20, 2011
    #8
  9. МакÑим Фомин

    James Kuyper Guest

    On 10/19/2011 03:26 PM, МакÑим Фомин wrote:
    > I need function similar to ReadLine() in any other language.
    > The program below fails[1] when data more then LAG_SIZE is passed

    ....
    > [1]: *** glibc detected *** ./a.out: realloc(): invalid next size:
    > 0x0804b008 ***


    I could not duplicate that symptom. Could you provide the exact command
    line you used to compile the program, and exact input line that triggers
    the failure?
    --
    James Kuyper
    James Kuyper, Oct 20, 2011
    #9
  10. In article <-september.org>,
    China Blue Corn Chips <> plaintively whind:
    ....
    >I'm not going to do it unless you pay me.


    That's what your girlfriends always tell you, isn't it?

    --
    Windows 95 n. (Win-doze): A 32 bit extension to a 16 bit user interface for
    an 8 bit operating system based on a 4 bit architecture from a 2 bit company
    that can't stand 1 bit of competition.

    Modern day upgrade --> Windows XP Professional x64: Windows is now a 64 bit
    tweak of a 32 bit extension to a 16 bit user interface for an 8 bit
    operating system based on a 4 bit architecture from a 2 bit company that
    can't stand 1 bit of competition.
    Kenny McCormack, Oct 20, 2011
    #10
  11. МакÑим Фомин

    BartC Guest

    "China Blue Corn Chips" <> wrote in message
    news:-september.org...
    > In article <>,
    > Seebs <> wrote:
    >
    >> On 2011-10-20, China Blue Corn Chips <> wrote:


    >> > Buy a bigger swap disk.

    >>
    >> This is crappy advice, because it's completely unrelated. There are

    >
    > The maximum line size is the file size. If the VM that can map in the
    > entire
    > file, it can allocate space for one line.


    Someone doing 'ReadLine' is probably interested in line-oriented files.

    A file containing with a single multi-GB line is likely not line-oriented!
    Or is not using the expected newline sequence.

    It would be acceptable then to put a cap on the longest length of one line.
    The user then needs to use a function such as 'Readfile' instead.

    --
    Bartc
    BartC, Oct 20, 2011
    #11
  12. On Oct 20, 3:31 pm, "BartC" <> wrote:
    >
    > Someone doing 'ReadLine' is probably interested in line-oriented files.
    >
    > A file containing with a single multi-GB line is likely not line-oriented!
    > Or is not using the expected newline sequence.
    >

    The problem is that it is impossible to define a reasonable line
    length.

    For instance a program I've just written generates csv files. Each
    line represents a "case', each column a "factor". The factors are DNA
    sequences which may or may not be present in the genes under
    consideration, the cases are the genes. If I look at 5-letter DNA
    sequences, that's 4^5 or 1024 factors. Each one has a real value
    associated with it, so about five characters for three decimal places
    of precision. That means that lines are weighing in at about 6000
    characters.


    --
    Fuzzy logic trees (build a tree from factors and cases, as described).
    http://www.malcolmmclean.site11.com/www
    Malcolm McLean, Oct 20, 2011
    #12
  13. In article <>,
    Malcolm McLean <> wrote:
    ....
    >sequences, that's 4^5 or 1024 factors. Each one has a real value
    >associated with it, so about five characters for three decimal places
    >of precision. That means that lines are weighing in at about 6000
    >characters.


    6000 is nothing. 10K is nothing. Even 1M is really nothing on today's
    hardware.

    This is one of those issues where in order to make it sound like a serious
    problem, people have to drag it out to an extreme (talking about 1G).

    --

    First of all, I do not appreciate your playing stupid here at all.

    - Thomas 'PointedEars' Lahn -
    Kenny McCormack, Oct 20, 2011
    #13
  14. Malcolm McLean <> writes:

    > On Oct 20, 3:31 pm, "BartC" <> wrote:
    >>
    >> Someone doing 'ReadLine' is probably interested in line-oriented files.
    >>
    >> A file containing with a single multi-GB line is likely not line-oriented!
    >> Or is not using the expected newline sequence.
    >>

    > The problem is that it is impossible to define a reasonable line
    > length.
    >
    > For instance a program I've just written generates csv files. Each
    > line represents a "case', each column a "factor". The factors are DNA
    > sequences which may or may not be present in the genes under
    > consideration, the cases are the genes. If I look at 5-letter DNA
    > sequences, that's 4^5 or 1024 factors. Each one has a real value
    > associated with it, so about five characters for three decimal places
    > of precision. That means that lines are weighing in at about 6000
    > characters.


    You say it's impossible, yet you've managed to do it! Maybe you meant
    that there can't be a limit imposed by the function. I agree that this
    is more of a problem, but I don't see why there is any need to do that.
    I'd want a general-purpose read_line function to have a maximum size
    parameter, and it should probably allow the buffer to be re-used since
    that is such a common pattern in line-oriented programs.

    --
    Ben.
    Ben Bacarisse, Oct 20, 2011
    #14
  15. On 19.10.2011 21:26, МакÑим Фомин wrote:
    > I need function similar to ReadLine() in any other language.
    > The program below fails[1] when data more then LAG_SIZE is passed
    >


    Reading linewise through a file is as simple as using fgets(). If the
    last character in the returned string is newline character, you got a
    whole line.

    This makes it possible to adapt to use cases. In your case, I would
    simply do something like:

    #include <stdio.h>
    #include <string.h>
    static char buffer[256];

    static void loop()
    {
    do {
    if (fgets(buffer, sizeof buffer, stdin) < 0) {
    /* read error. Might be eof or real error. */
    if (ferror(stdin)) perror("read error");
    return;
    }
    } while (strncmp(buffer, "exit", sizeof "exit" - 1));
    }

    int main() { loop(); }

    No dynamic allocation where it isn't necessary. It is not really
    necessary in I/O, because there is never a buffer big enough for that,
    so don't program as though there were!

    HTH,
    Markus
    Markus Wichmann, Oct 20, 2011
    #15
  16. Markus Wichmann <> writes:

    > On 19.10.2011 21:26, МакÑим Фомин wrote:
    >> I need function similar to ReadLine() in any other language.
    >> The program below fails[1] when data more then LAG_SIZE is passed
    >>

    >
    > Reading linewise through a file is as simple as using fgets(). If the
    > last character in the returned string is newline character, you got a
    > whole line.
    >
    > This makes it possible to adapt to use cases. In your case, I would
    > simply do something like:
    >
    > #include <stdio.h>
    > #include <string.h>
    > static char buffer[256];
    >
    > static void loop()
    > {
    > do {
    > if (fgets(buffer, sizeof buffer, stdin) < 0) {


    I think you intended to write == 0 here.

    > /* read error. Might be eof or real error. */
    > if (ferror(stdin)) perror("read error");
    > return;
    > }
    > } while (strncmp(buffer, "exit", sizeof "exit" - 1));
    > }
    >
    > int main() { loop(); }


    I am not sure how you decided that this suits the OP's use case. I
    think the original posted code was just a cut down example, so it's not
    obvious that silently splitting lines that are longer than 255
    characters suits the OP.

    > No dynamic allocation where it isn't necessary. It is not really
    > necessary in I/O, because there is never a buffer big enough for that,
    > so don't program as though there were!


    I think you'd be annoyed if, say, awk behaved like this. Given a big
    enough buffer, your shell might get away with it, but I expect it
    doesn't try.

    --
    Ben.
    Ben Bacarisse, Oct 20, 2011
    #16
  17. МакÑим Фомин

    Seebs Guest

    On 2011-10-20, China Blue Corn Chips <> wrote:
    > In article <>,
    > Seebs <> wrote:

    [all in reference to:]
    >> >> >char *p = 0; int m = 0, n = 0, c;
    >> >> >while (c=fgetc(file), c!='\n' && c!=EOF) {
    >> >> > if (n>=m) {m = 2*n+1; p = realloc(p, m);}
    >> >> > p[n++] = c;


    >> >> What if !realloc(p,m)?


    >> > Buy a bigger swap disk.


    >> This is crappy advice, because it's completely unrelated. There are


    > The maximum line size is the file size. If the VM that can map in the entire
    > file, it can allocate space for one line.


    Except, as I pointed out, this isn't true.

    > If you want to run on smaller
    > memories, adapt the code accordingly. I'm not interested in doing so.


    That's fine. But telling someone *else* to "buy a bigger swap disk" as
    a solution to a problem that cannot be fixed by buying a bigger swap disk
    remains crappy advice.

    > The total time and space is O(n). Changing the multiplier leaves you with O(n).
    > Starting with size>0 reduces the time and space by O(k), however
    > O(n)-O(k) = O(n).


    Except that:
    1. The thing it optimizes is actually O(log(n)).
    2. k is, in this case, quite likely to be the entirety of the actual data.

    > You're worrying about optimising at the wrong level which is a
    > waste of time unless you've done measurement showing this is a hot loop.


    There's such a thing as premature optimization, but that doesn't mean we
    should go out of our way to do things in obviously stupid ways. On a real
    system, with real data, your solution will take many times more real
    execution time than one which changes nothing except starting with n=64.

    And while it's a bad thing to go around making things fancier than you need
    to for no measured benefit, it's not a bad thing to make reasonable choices
    for starting values.

    > Also
    > realloc(0,n) is the same as malloc(n) so one initial malloc is no different than
    > a realloc on the first loop; but it requires more code.


    Right. I didn't suggest a fancier algorithm; I suggested starting n at some
    other value.

    > I used to write code that had to fit in 65K bytes. I don't any more. You can if
    > you want.


    This is a ridiculous bit of hyperbole. I'm not talking about crazy or
    unreasonable requirements; I'm talking about something which, for data sizes
    most people deal with on a daily basis, would blow up on most consumer
    machines today, when an even slightly cleverer algorithm wouldn't.

    >> Depending, I'd say either start with a "likely" length or start with
    >> something around a page size, and reallocate up a little more conservatively.


    > If the resize doesn't increase geometrically, the run time switches to O(n^2).
    > Otherwise it is O(n) + whatever code complexity you want to pretend you can beat
    > O(n).


    O(n) notation only gets you so far. There is more to writing good code than
    that.

    I guess what offends me about this isn't that it's lazy, but that it's not
    even lazy. It wouldn't be any more effort to do this in a way that would
    work dramatically better. If it were *any* more effort, I could agree that
    maybe it wasn't worth putting in that effort, but when it's *no effort at
    all*, that's ridiculous.

    -s
    --
    Copyright 2011, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    I am not speaking for my employer, although they do rent some of my opinions.
    Seebs, Oct 20, 2011
    #17
  18. МакÑим Фомин

    Ian Collins Guest

    On 10/21/11 02:31 AM, BartC wrote:
    > "China Blue Corn Chips"<> wrote in message
    > news:-september.org...
    >> In article<>,
    >> Seebs<> wrote:
    >>
    >>> On 2011-10-20, China Blue Corn Chips<> wrote:

    >
    >>>> Buy a bigger swap disk.
    >>>
    >>> This is crappy advice, because it's completely unrelated. There are

    >>
    >> The maximum line size is the file size. If the VM that can map in the
    >> entire
    >> file, it can allocate space for one line.

    >
    > Someone doing 'ReadLine' is probably interested in line-oriented files.
    >
    > A file containing with a single multi-GB line is likely not line-oriented!
    > Or is not using the expected newline sequence.
    >
    > It would be acceptable then to put a cap on the longest length of one line.
    > The user then needs to use a function such as 'Readfile' instead.


    Or consume the "line" bit by bit.

    --
    Ian Collins
    Ian Collins, Oct 20, 2011
    #18
  19. You will see some petty squabling.

    The reason that you code fails is that you can't generally realloc what youhaven't [m|c|0]alloc'd.

    I agree with others that you should first allocate something like the longest common line (whatever you deem that to be). I might pick 256 or 512 or 409 or 86 or 133. If wastage for multiple consecutive lines is a big problem, you will want to do some sort of your own memory management for your "Lines".

    You could allocate something extremely large and then copy it into a bufferthat is just barely big enough.

    Most libraries do not allocate exactly what you tell them to. If you know about the memory management method, you can allocate something reasonable initially and avoid doing the copy when you know (or believe) that it won't really give back any of the memory.

    So, if your libraries always allocate in multiples of 16, you could make your initial allocation be something like 48 or 80 or 128.

    It would, of course, be nice to have your function learn how big your lineswere likely to be by keeping internal statistics or by an argument as a hint.

    Ironically, any function that returns an allocated buffer will likely be least efficient in space use with small lines and least efficient in computational speed with large ones. If you don't care about running out of memory at 3% of your nominal capacity when you are processing a file with a bunch of short lines nor in taking 4 times as long as nominal when you have a very few very large lines the one-size-fits-all approach for a library would work fine.

    If you can't make your function smart, you can create two or three differntvariations so as to optimize whatever metric you'd like. Having one function that reads chat room squawks (which are typically less than 32 characters), another that reads C code (which have an average around 40 or 50 characters and a max that is defined), and another that reads paragraphs from a novel (where dialogue paragraphs might be short, but descriptions [especially in Russian] can go on for several pages) with different optimization criteria would seem like a good idea.

    For the ReadNovelLine function, it would seem reasonable to have an I/O argument that indicated what kind of a thing was expected and what you actually got. If you wanted to get cute, you could even have a secret circular array of the last so many line lengths. Do you want to do it or to do it inteligently?
    Michael Angelo Ravera, Oct 20, 2011
    #19
  20. МакÑим Фомин

    James Kuyper Guest

    On 10/20/2011 04:32 PM, Michael Angelo Ravera wrote:
    ....
    > The reason that you code fails is that you can't generally realloc
    > what you haven't [m|c|0]alloc'd.


    What's the 0 for? 0alloc'd?

    You are also allowed to realloc() a null pointer, or a pointer returned
    by a previous successful call to realloc() that has not yet been free()d
    or successfully realloc()d again. As far as I can see, those are the
    only two things this code ever realloc()s. Have I missed something?
    James Kuyper, Oct 20, 2011
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    13
    Views:
    562
    Andrey Ryabov
    Nov 18, 2007
  2. gavino
    Replies:
    4
    Views:
    528
    gavino
    Sep 20, 2010
  3. Sahil Tandon
    Replies:
    4
    Views:
    221
    Dan Sommers
    Sep 2, 2011
  4. Jean-Michel
    Replies:
    0
    Views:
    349
    Jean-Michel
    Dec 22, 2007
  5. DS.Highwind

    Moz/FF containsNode analogue for IE?

    DS.Highwind, Aug 7, 2007, in forum: Javascript
    Replies:
    2
    Views:
    133
    DS.Highwind
    Aug 8, 2007
Loading...

Share This Page