Analogue of ReadLine() in C

  • Thread starter МакÑим Фомин
  • Start date
Ð

МакÑим Фомин

I need function similar to ReadLine() in any other language.
The program below fails[1] when data more then LAG_SIZE is passed

#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <stdlib.h>

char * read_line(void)
{
bool read_more = true;
enum { LAG_SIZE = 32 };
char *return_buf = NULL;
char *new_line_pos = NULL;
char tmp_buf[LAG_SIZE] = {'\0'};
size_t size = 0;
while (read_more && fgets(tmp_buf, LAG_SIZE, stdin)){
char *tmp_realloc;
size += LAG_SIZE;
tmp_realloc = realloc(return_buf, size);
if (!tmp_realloc){
free(return_buf);
return_buf = NULL;
break;
}
return_buf = tmp_realloc;
memset(return_buf + size - LAG_SIZE, '\0', LAG_SIZE);
new_line_pos = strchr(tmp_buf, '\n');
if (new_line_pos){
*new_line_pos = '\0';
read_more = false;
}
strcat(return_buf, tmp_buf);
memset(tmp_buf, '\0', LAG_SIZE);
}
return return_buf;
}

void loop(void)
{
bool repeat = true;
while (repeat){
char *command;
printf("Enter> ");
command = read_line();
if (command && !strcmp(command, "exit"))
repeat = false;
free(command);
}
}

int main(void)
{
loop();
return 0;
}

[1]: *** glibc detected *** ./a.out: realloc(): invalid next size:
0x0804b008 ***
 
8

88888 Dihedral

If one has problems with dynamical allocations of pointers in the main memory, please use static buffer first to dump buffer contents and define the DEBUG modes for all pointers!


======
It is bad in C/C++ to use macros with nested operations in any argument!

Macros should not affect arguments' variables values unless desired in the macros!
 
K

Kaz Kylheku

If one has problems with dynamical allocations of pointers in the main memory, please use static buffer first to dump buffer contents and define the DEBUG modes for all pointers!


======
It is bad in C/C++ to use macros with nested operations in any argument!

ITYM, "side effects in any argument".

I do not agree. The obvious counterexample are properly designed macros which
are carefully designed to evaluate their arguments exactly once, and have a
proper sequence point after the evaluation of arguments.

Removing this requirement sometimes make it easier to design a macro, and then
the users of the macro have to be given a big warning.
Macros should not affect arguments' variables values unless desired in the macros!

When it is acceptable for a computer operation to have an effect on data,
such that the effect is not desired?
 
S

Stefan Ram

China Blue Corn Chips said:
char *p = 0; int m = 0, n = 0, c;
while (c=fgetc(file), c!='\n' && c!=EOF) {
if (n>=m) {m = 2*n+1; p = realloc(p, m);}
p[n++] = c;

What if !realloc(p,m)?
 
I

Ike Naar

I need function similar to ReadLine() in any other language.
The program below fails[1] when data more then LAG_SIZE is passed

#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <stdlib.h>

char * read_line(void)
{
bool read_more = true;
enum { LAG_SIZE = 32 };
char *return_buf = NULL;
char *new_line_pos = NULL;
char tmp_buf[LAG_SIZE] = {'\0'};
size_t size = 0;
while (read_more && fgets(tmp_buf, LAG_SIZE, stdin)){
char *tmp_realloc;
size += LAG_SIZE;
tmp_realloc = realloc(return_buf, size);
if (!tmp_realloc){
free(return_buf);
return_buf = NULL;
break;
}
return_buf = tmp_realloc;
memset(return_buf + size - LAG_SIZE, '\0', LAG_SIZE);
new_line_pos = strchr(tmp_buf, '\n');
if (new_line_pos){
*new_line_pos = '\0';
read_more = false;
}
strcat(return_buf, tmp_buf);
memset(tmp_buf, '\0', LAG_SIZE);
}
return return_buf;
}

void loop(void)
{
bool repeat = true;
while (repeat){
char *command;
printf("Enter> ");
command = read_line();
if (command && !strcmp(command, "exit"))

Is this really what you meant? This means that the program will
loop forever when no more input can be read (e.g. when EOF is
reached). I'd guess you want:

if (command == NULL || !strcmp(command, "exit"))

instead.
repeat = false;
free(command);
}
}

int main(void)
{
loop();
return 0;
}

[1]: *** glibc detected *** ./a.out: realloc(): invalid next size:
0x0804b008 ***
 
S

Seebs

China Blue Corn Chips said:
char *p = 0; int m = 0, n = 0, c;
while (c=fgetc(file), c!='\n' && c!=EOF) {
if (n>=m) {m = 2*n+1; p = realloc(p, m);}
p[n++] = c;
What if !realloc(p,m)?
Buy a bigger swap disk.

This is crappy advice, because it's completely unrelated. There are
tons of systems which, for sane and well-considered reasons, will cap
allocations far below the size of physical memory or swap. 2*n+1 goes
pathological pretty fast. Also, it's... ridiculous, frankly, to start
at 0. That guarantees at least one realloc, and probably several for
even a tiny string.

Imagine that you're on a system which, quite reasonably, limits your
32-bit code to 3GB of memory. You can have all 3GB, but you can't have
more.

Now imagine that you want to process a string which contains 1GB of
data. Your algorithm will do ~30 reallocs, most of which will almost
certainly be copying all that data, and then fail. Think it through;
what happens? At any given time, your current length is a value of
the form 2^n - 1. So what's it going to do for 2^n bytes of data,
say, 1GB exactly?

1. It reallocs until it has exactly one byte less than 1GB of data.
2. It then tries to allocate exactly one byte less than 2GB of data,
to copy that first GB into.

But that's two bytes less than 3GB of data it needs to be able to handle,
and I doubt your code and support libraries fit into 2 bytes.

Depending, I'd say either start with a "likely" length or start with
something around a page size, and reallocate up a little more conservatively.

-s
 
Ð

МакÑим Фомин

*what* other language? It certinly isn't present in all other
langauges.

Yes, not all languages have. But many decent have.
However, it is not an issue.
The program below fails[1] when data more then LAG_SIZE is passed

so fix it

<snip>

I haven't not found any error in red_line() function (I consider calls
to malloc, realloc and free legitimate), in fact I doubt that there is
one.
That's why I asked the question.
 
J

James Kuyper

I need function similar to ReadLine() in any other language.
The program below fails[1] when data more then LAG_SIZE is passed ....
[1]: *** glibc detected *** ./a.out: realloc(): invalid next size:
0x0804b008 ***

I could not duplicate that symptom. Could you provide the exact command
line you used to compile the program, and exact input line that triggers
the failure?
 
K

Kenny McCormack

China Blue Corn Chips said:
I'm not going to do it unless you pay me.

That's what your girlfriends always tell you, isn't it?

--
Windows 95 n. (Win-doze): A 32 bit extension to a 16 bit user interface for
an 8 bit operating system based on a 4 bit architecture from a 2 bit company
that can't stand 1 bit of competition.

Modern day upgrade --> Windows XP Professional x64: Windows is now a 64 bit
tweak of a 32 bit extension to a 16 bit user interface for an 8 bit
operating system based on a 4 bit architecture from a 2 bit company that
can't stand 1 bit of competition.
 
B

BartC

China Blue Corn Chips said:
The maximum line size is the file size. If the VM that can map in the
entire
file, it can allocate space for one line.

Someone doing 'ReadLine' is probably interested in line-oriented files.

A file containing with a single multi-GB line is likely not line-oriented!
Or is not using the expected newline sequence.

It would be acceptable then to put a cap on the longest length of one line.
The user then needs to use a function such as 'Readfile' instead.
 
M

Malcolm McLean

Someone doing 'ReadLine' is probably interested in line-oriented files.

A file containing with a single multi-GB line is likely not line-oriented!
Or is not using the expected newline sequence.
The problem is that it is impossible to define a reasonable line
length.

For instance a program I've just written generates csv files. Each
line represents a "case', each column a "factor". The factors are DNA
sequences which may or may not be present in the genes under
consideration, the cases are the genes. If I look at 5-letter DNA
sequences, that's 4^5 or 1024 factors. Each one has a real value
associated with it, so about five characters for three decimal places
of precision. That means that lines are weighing in at about 6000
characters.
 
K

Kenny McCormack

Malcolm McLean said:
sequences, that's 4^5 or 1024 factors. Each one has a real value
associated with it, so about five characters for three decimal places
of precision. That means that lines are weighing in at about 6000
characters.

6000 is nothing. 10K is nothing. Even 1M is really nothing on today's
hardware.

This is one of those issues where in order to make it sound like a serious
problem, people have to drag it out to an extreme (talking about 1G).
 
B

Ben Bacarisse

Malcolm McLean said:
The problem is that it is impossible to define a reasonable line
length.

For instance a program I've just written generates csv files. Each
line represents a "case', each column a "factor". The factors are DNA
sequences which may or may not be present in the genes under
consideration, the cases are the genes. If I look at 5-letter DNA
sequences, that's 4^5 or 1024 factors. Each one has a real value
associated with it, so about five characters for three decimal places
of precision. That means that lines are weighing in at about 6000
characters.

You say it's impossible, yet you've managed to do it! Maybe you meant
that there can't be a limit imposed by the function. I agree that this
is more of a problem, but I don't see why there is any need to do that.
I'd want a general-purpose read_line function to have a maximum size
parameter, and it should probably allow the buffer to be re-used since
that is such a common pattern in line-oriented programs.
 
M

Markus Wichmann

I need function similar to ReadLine() in any other language.
The program below fails[1] when data more then LAG_SIZE is passed

Reading linewise through a file is as simple as using fgets(). If the
last character in the returned string is newline character, you got a
whole line.

This makes it possible to adapt to use cases. In your case, I would
simply do something like:

#include <stdio.h>
#include <string.h>
static char buffer[256];

static void loop()
{
do {
if (fgets(buffer, sizeof buffer, stdin) < 0) {
/* read error. Might be eof or real error. */
if (ferror(stdin)) perror("read error");
return;
}
} while (strncmp(buffer, "exit", sizeof "exit" - 1));
}

int main() { loop(); }

No dynamic allocation where it isn't necessary. It is not really
necessary in I/O, because there is never a buffer big enough for that,
so don't program as though there were!

HTH,
Markus
 
B

Ben Bacarisse

Markus Wichmann said:
I need function similar to ReadLine() in any other language.
The program below fails[1] when data more then LAG_SIZE is passed

Reading linewise through a file is as simple as using fgets(). If the
last character in the returned string is newline character, you got a
whole line.

This makes it possible to adapt to use cases. In your case, I would
simply do something like:

#include <stdio.h>
#include <string.h>
static char buffer[256];

static void loop()
{
do {
if (fgets(buffer, sizeof buffer, stdin) < 0) {

I think you intended to write == 0 here.
/* read error. Might be eof or real error. */
if (ferror(stdin)) perror("read error");
return;
}
} while (strncmp(buffer, "exit", sizeof "exit" - 1));
}

int main() { loop(); }

I am not sure how you decided that this suits the OP's use case. I
think the original posted code was just a cut down example, so it's not
obvious that silently splitting lines that are longer than 255
characters suits the OP.
No dynamic allocation where it isn't necessary. It is not really
necessary in I/O, because there is never a buffer big enough for that,
so don't program as though there were!

I think you'd be annoyed if, say, awk behaved like this. Given a big
enough buffer, your shell might get away with it, but I expect it
doesn't try.
 
S

Seebs

char *p = 0; int m = 0, n = 0, c;
while (c=fgetc(file), c!='\n' && c!=EOF) {
if (n>=m) {m = 2*n+1; p = realloc(p, m);}
p[n++] = c;
What if !realloc(p,m)?
Buy a bigger swap disk.
This is crappy advice, because it's completely unrelated. There are
The maximum line size is the file size. If the VM that can map in the entire
file, it can allocate space for one line.

Except, as I pointed out, this isn't true.
If you want to run on smaller
memories, adapt the code accordingly. I'm not interested in doing so.

That's fine. But telling someone *else* to "buy a bigger swap disk" as
a solution to a problem that cannot be fixed by buying a bigger swap disk
remains crappy advice.
The total time and space is O(n). Changing the multiplier leaves you with O(n).
Starting with size>0 reduces the time and space by O(k), however
O(n)-O(k) = O(n).

Except that:
1. The thing it optimizes is actually O(log(n)).
2. k is, in this case, quite likely to be the entirety of the actual data.
You're worrying about optimising at the wrong level which is a
waste of time unless you've done measurement showing this is a hot loop.

There's such a thing as premature optimization, but that doesn't mean we
should go out of our way to do things in obviously stupid ways. On a real
system, with real data, your solution will take many times more real
execution time than one which changes nothing except starting with n=64.

And while it's a bad thing to go around making things fancier than you need
to for no measured benefit, it's not a bad thing to make reasonable choices
for starting values.
Also
realloc(0,n) is the same as malloc(n) so one initial malloc is no different than
a realloc on the first loop; but it requires more code.

Right. I didn't suggest a fancier algorithm; I suggested starting n at some
other value.
I used to write code that had to fit in 65K bytes. I don't any more. You can if
you want.

This is a ridiculous bit of hyperbole. I'm not talking about crazy or
unreasonable requirements; I'm talking about something which, for data sizes
most people deal with on a daily basis, would blow up on most consumer
machines today, when an even slightly cleverer algorithm wouldn't.
If the resize doesn't increase geometrically, the run time switches to O(n^2).
Otherwise it is O(n) + whatever code complexity you want to pretend you can beat
O(n).

O(n) notation only gets you so far. There is more to writing good code than
that.

I guess what offends me about this isn't that it's lazy, but that it's not
even lazy. It wouldn't be any more effort to do this in a way that would
work dramatically better. If it were *any* more effort, I could agree that
maybe it wasn't worth putting in that effort, but when it's *no effort at
all*, that's ridiculous.

-s
 
I

Ian Collins

Someone doing 'ReadLine' is probably interested in line-oriented files.

A file containing with a single multi-GB line is likely not line-oriented!
Or is not using the expected newline sequence.

It would be acceptable then to put a cap on the longest length of one line.
The user then needs to use a function such as 'Readfile' instead.

Or consume the "line" bit by bit.
 
M

Michael Angelo Ravera

You will see some petty squabling.

The reason that you code fails is that you can't generally realloc what youhaven't [m|c|0]alloc'd.

I agree with others that you should first allocate something like the longest common line (whatever you deem that to be). I might pick 256 or 512 or 409 or 86 or 133. If wastage for multiple consecutive lines is a big problem, you will want to do some sort of your own memory management for your "Lines".

You could allocate something extremely large and then copy it into a bufferthat is just barely big enough.

Most libraries do not allocate exactly what you tell them to. If you know about the memory management method, you can allocate something reasonable initially and avoid doing the copy when you know (or believe) that it won't really give back any of the memory.

So, if your libraries always allocate in multiples of 16, you could make your initial allocation be something like 48 or 80 or 128.

It would, of course, be nice to have your function learn how big your lineswere likely to be by keeping internal statistics or by an argument as a hint.

Ironically, any function that returns an allocated buffer will likely be least efficient in space use with small lines and least efficient in computational speed with large ones. If you don't care about running out of memory at 3% of your nominal capacity when you are processing a file with a bunch of short lines nor in taking 4 times as long as nominal when you have a very few very large lines the one-size-fits-all approach for a library would work fine.

If you can't make your function smart, you can create two or three differntvariations so as to optimize whatever metric you'd like. Having one function that reads chat room squawks (which are typically less than 32 characters), another that reads C code (which have an average around 40 or 50 characters and a max that is defined), and another that reads paragraphs from a novel (where dialogue paragraphs might be short, but descriptions [especially in Russian] can go on for several pages) with different optimization criteria would seem like a good idea.

For the ReadNovelLine function, it would seem reasonable to have an I/O argument that indicated what kind of a thing was expected and what you actually got. If you wanted to get cute, you could even have a secret circular array of the last so many line lengths. Do you want to do it or to do it inteligently?
 
J

James Kuyper

On 10/20/2011 04:32 PM, Michael Angelo Ravera wrote:
....
The reason that you code fails is that you can't generally realloc
what you haven't [m|c|0]alloc'd.

What's the 0 for? 0alloc'd?

You are also allowed to realloc() a null pointer, or a pointer returned
by a previous successful call to realloc() that has not yet been free()d
or successfully realloc()d again. As far as I can see, those are the
only two things this code ever realloc()s. Have I missed something?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top