String not printing data on next line despite \n in the string

SRK

Hi folks,
I am trying to read some data from a config file and want that data to
be printed in a formatted way. For example, if I have this string in the
config file - ABC PQR XYZ - I want to display it like

ABC
PQR
XYZ

For that, I have put the above-mentioned string in the file as ABC \nPQR \nXYZ.

But instead of displaying each string on the next line, when I print
the data it comes out literally as ABC \nPQR \nXYZ.

Any help would be highly appreciated.

Thanks
SRK

Christof Warlich

SRK said:
Hi folks,
I am trying to read some data from a config file and want that data to
be printed in a formatted way. For example, if I have this string in the
config file - ABC PQR XYZ - I want to display it like

ABC
PQR
XYZ
How about this:

Input file (tst.txt):

ABC PQR XYZ

Example code (tst):

#include <iostream>
#include <string>

int main() {
    std::string word;
    while (std::cin >> word) {
        std::cout << word << std::endl;
    }
}

Call as:

$ ./tst < tst.txt

SRK

Hi folks,
I am trying to read some data from a config file and want that data to
be printed in a formatted way. For example, if I have this string in the
config file - ABC PQR XYZ - I want to display it like

ABC
PQR
XYZ

For that, I have put the above-mentioned string in the file as ABC \nPQR \nXYZ.

But instead of displaying each string on the next line, when I print
the data it comes out literally as ABC \nPQR \nXYZ.

Any help would be highly appreciated.

Thanks
SRK

Let me mention that I am using a FILE pointer for reading from the file,
and fgets for reading entire lines. I don't have to print the
strings one by one; I want to make it something like a menu and
send it over a socket.

thanks
SRK

BGB / cr88192

Paavo Helde said:
You need to convert the two characters (backslash and 'n') to a single
character (line-feed '\n').

If this is not very performance-critical, you can do it in-place:

std::string s = "ABC\\nPQR\\nXYZ\\n"; // contains backslash and n

std::string::size_type k = 0;
while ((k = s.find("\\n", k)) != s.npos) {
    s.replace(k, 2, "\n");
}

// s now contains linefeed characters instead.

or, maybe the good old char-pointers strategy:
char tb[256]; // or whatever is the maximum sane length
char *s, *t;

s = input; t = tb;
while (*s)
{
    if ((*s == '\\') && (*(s+1) == 'n'))
        { *t++ = '\n'; s += 2; continue; }
    *t++ = *s++;
}
*t++ = 0;


granted, one can debate whether or not this is good style in C++, ... but,
should work ok.

BGB / cr88192

Paavo Helde said:
Paavo Helde said:
(e-mail address removed):

Hi folks,
I am trying to read some data from a config file and want that data
to be printed in a formatted way. For example, if I have this string in
the config file - ABC PQR XYZ - I want to display it like

ABC
PQR
XYZ

For that, I have put the above-mentioned string in the file as ABC \nPQR \nXYZ.

But instead of displaying each string on the next line, when I print
the data it comes out literally as ABC \nPQR \nXYZ.

You need to convert two symbols (backslash and n) to a single symbol
(line-feed '\n').

If this is not very performance-critical, you can do it in-place:

std::string s = "ABC\\nPQR\\nXYZ\\n"; // contains backslash and n

std::string::size_type k = 0;
while ((k = s.find("\\n", k)) != s.npos) {
    s.replace(k, 2, "\n");
}

// s now contains linefeed characters instead.

or, maybe the good old char-pointers strategy:
char tb[256]; // or whatever is the maximum sane length

This is a buffer overrun error waiting to happen (or being exploited). At
least one should check the length, or allocate a buffer long enough. In
this case this should be easy.

but, it is worth noting that new/malloc and delete/free are not free
either...
so, the possibility of a buffer overflow may sometimes be justifiable in the
name of performance...

there is also a reason for a value like 256 rather than, say, 82.
we could declare, "well, no valid text file has > 80 characters per line",
and use 82 (allowing for a newline and a nul), but 256 adds a little more
padding.

granted, a 256 char line will overflow this buffer...

granted:

char *buf;
buf = (char *)malloc(strlen(input) + 1);

is easy enough...


in practice, I usually use an alternative strategy I call a "rotating
allocator", where usually the potential cost of a buffer overflow is fairly
low, and a rotating allocator is not readily exploitable (there is little to
say where the string will be in memory), ...

This is a C solution, and assumes C strings (zero-terminated, no embedded
zeroes). Probably this is a harmless assumption, but nevertheless it is
slightly different, and makes the coding a bit more convenient (no extra
care needed when checking *(s+1)).

this is partly why I said "good old char pointers"...

The same algorithm can be made to work with std::string as well of
course, by extracting the C-style string pointer via the c_str() member
function:


std::string input = ...
std::string output(input.length(), '\0');

if (!input.empty()) {
    const char* s = input.c_str();
    char* t = &output[0];

    // C-style algorithm here...

    output.resize(t - &output[0]);
}

granted, but I guess the question then is whether or not someone is using
std::string...


admitted, yes, I am more of a C coder than a C++ one (I use C++ sometimes,
but a majority of my code is C), and I tend to prefer strategies which work
fairly well in both cases...

(not wanting to make debate here, but there are reasons to choose one or
another in different contexts, many not particularly relevant to the
language as seen/written by humans, and many not related to the "majority"
of projects).

Branimir Maksimovic

SRK said:
Let me mention that I am using FILE pointer for reading from the file
and using fgets for reading the entire line. I dont have to print the
string one by one, but I want to make it something like a menu and
send it on a socket.

thanks
SRK
Hm,
ifstream ifs("file.conf");

string tmp;
while (getline(ifs, tmp))
{
    istringstream iss(tmp);
    while (iss >> tmp) cout << tmp << '\n';
}

This code does not compile as-is (the usual headers and a main are missing);
it is just an example of one of the ways you can do it...

Greets

Branimir Maksimovic

Paavo said:
I cannot see any reason to knowingly leave a potential UB bug in the
program. God knows there are many of them already left unknowingly, no
reason to add one!

Hm, Microsoft had a practice of allowing writes to deallocated memory
in order for some important applications to keep working on Windows.
I don't see how this is a problem. Besides, it is an excellent
idea to provide a COM interface to download binary code into such an
environment from the internet...

Greets

Branimir Maksimovic

Branimir said:
Hm, Microsoft had a practice of allowing writes to deallocated memory
in order for some important applications to keep working on Windows.
I don't see how this is a problem. Besides, it is an excellent
idea to provide a COM interface to download binary code into such an
environment from the internet...
Also, on 32-bit Windows you can't touch the last bit in any pointer,
because some DirectX drivers use it to determine whether it's a handle
or a pointer. So on 32-bit Windows, touching anything above 2 GB is a no-no...

Greets

BGB / cr88192

Paavo Helde said:
I hope you are joking!

Now, seriously, the unexpected things like the size of input data come from
the outside of the program, by definition. Input/output is typically slow
enough that a check for the input data size would cost next to nothing. I
see *no* justification for skipping that! Note that I do not advocate
dynamic allocation, but just a simple check and error return.

there are many cases where strings-based processing may need to be done
purely in memory, and in a performance-critical location.

consider, for example, a program that drives many parts of its logic via
in-memory command strings. in such cases, allocating or freeing memory, or
sometimes even basic sanity checking (such as checking that a passed-in
pointer is not NULL, or that a string is not too long and does not contain
invalid characters, ...), can risk notably slowing down the app.

for example, I have an x86 interpreter (an interpreter for 32-bit x86
machine code) where, of all things, the main opcode decoder, is based
primarily on strings-based logic code (although it is optimized some via
dynamically-built dispatch-tables and hashing).

another case of strings based logic is in many auto-codegen functions (which
use ASCII command-strings to generate machine code to further drive the app
logic, or build parts of the app's logic-code at runtime).

similarly, this kind of thing may allow many aspects of the apps' logic to
be "human readable" (or, at least as much as a big mass of ASCII characters
can be...), which is much nicer for debugging than having to sort through
binary data (for example, in the form of hexdumps or base64 dumps, ...).


similarly, both my object system and XML DOM code use lots of strings
handling code, and could also risk slowing things down (one may even end up
going so far as to pre-compute hash keys after noting that a notable amount
of time was going into simply recalculating the hash value during lookups).


granted, in the OP's case, performance is probably not all that important...

I cannot see any reason to knowingly leave a potential UB bug in the
program. God knows there are many of them already left unknowingly, no
reason to add one!

these can be "boundary conditions", and are normally weeded out elsewhere.

but, alas, there may be a lot of consideration as to whether it is better to
leave a possible bug, or fix it so that it doesn't risk crashing or posing a
possible security hole.

one can then use the debugger and test cases to determine how generally
reliable the code is (AKA: how many bits of bad data can escape through the
proper "safety nets", as well as how well everything actually works), and
profilers to determine where optimization is needed.

James Kanze

Hm, microsoft had practice to allow write in deallocated
memory in order for some important applications to work on
windows.

In the earliest versions of C (pre-standard), the rule was that
the pointer to realloc had to be the last pointer that was
freed. In those days, it was considered acceptable to use freed
memory up until the next call to malloc.

In those days, of course, there was no multithreading, and
programs weren't connected to the internet.
I don't see how this is a problem.

Using a dangling pointer is a serious security hole.

BGB / cr88192

James Kanze said:
In the earliest versions of C (pre-standard), the rule was that
the pointer to realloc had to be the last pointer that was
freed. In those days, it was considered acceptable to use freed
memory up until the next call to malloc.

In those days, of course, there was no multithreading, and
programs weren't connected to the internet.


Using a dangling pointer is a serious security hole.

as well as a serious crash hazard, IMO...


granted, I am a little less concerned over buffer overflows; they may be a
bit more of a worry if the app actually matters as far as security goes
(connected to the internet, getting input from "untrusted" sources, ...).

even then, it is often not "as bad" in practice. for example, for calls like
'fgets()' one supplies the maximum string length anyways (typically a few
chars less than the buffer size), so this much is self-limiting. one can
know the call will not return an oversize string, since it will be cut off
and the rest returned as the next line.

in many other cases, one knows the code that both produces and accepts the
strings, and so can know that code further up the chain will not exceed the
limit.

as well, 256 is an "accepted" maximum string length (a tradition since long
past that something is seriously wrong if a string is longer than this).

much like how something is wrong if a line in a text file is longer than 80
chars, and it is usually best to limit output to 76 chars just to be safe...
(except in certain file formats, where longer lines tend to pop up a
lot...).


this does allow "some" safety with fixed-size char arrays, which is good
since these are one of the fastest ways I know of to implement certain
string operations.


BGB / cr88192

Paavo Helde said:
Why "few chars less"? Because you are not sure in the documentation?
Or in yourself?

had to go check the documentation...

actually, I had thought the N was the max number of chars to read, excluding
the '\n' and the 0; apparently fgets() accounts for the terminator itself
(it reads at most N-1 characters)...

oh well...

doesn't matter too much if there is an occasional fgets around with an N of
254...

Accepted by who? I'm serving 10MB HTTP packets through std::string so I'm
sorry I have never heard of this convention. (There was a 256-char string
limitation in Turbo Pascal 3.3, but fortunately this is about 15 years in
history ;-)

"accepted" by traditional practice.

typically, constants like PATH_MAX, ... are 256.
it doesn't take long (for example, if one digs around in system headers),
before a string-length limit of 256 becomes a recurring pattern (even if
there are variations, for example, UNIX_MAX is 108, but I think this is
because of a general rule that (sizeof(sockaddr_storage)==128) or so, ...).

there are many other examples of this particular limit being in use.


it is an accepted limit, much like how i, j, and k, are accepted names for
integer variables, ...

granted, sometimes one wants a bigger limit though, and sometimes a bigger
limit is used (as there is no real technical reason for this particular
limit apart from convention), ...


I once wrote an HTTP server though, and requests with longer strings kept
coming from nowhere (mostly a string of repeating characters with some
garbage at the end), so in that case I made the limit 1024 and also put in a
limit check. (it can be noted that I think a lot of them were like 256 A's
followed by the garbage...).

luckily though, any buffer overflow exploits intended for one server are
likely to do little more than crash another...

It seems you are confusing the human interface with the program
interface.

either way, this limit is established, as a sort of rule of convention for
most well-formed text files.
it is much like how, by convention, a programmer should not write code with
lines longer than this limit.

Using fixed-size arrays does not mean you may skip the check if the data
fits in there. Actually, if the input is not verified and comes from
outside of the program, then it is ridiculous to not check its size. The
cost of doing that is zero, as compared to the time of getting the data
from outside into the program.

granted, external disk IO is usually measurable at around 20 MB/s IME.

however, there is a lot which often happens "within" the program, say, when
ones' app is divided up into lots of DLL's which do lots of their internal
communication via data serialized as strings, ...

one component will produce streams of text as its output, and another
component will parse them and follow embedded commands. many tasks may
involve many stages of processing of this sort (in addition to the use of
binary API's, ...).


never mind that, in many of these cases, ANY unsafe input would be a
security risk, even if it does fit nicely into the buffers. the reason here
being that many of these facilities actually have access to features which
are either Turing-complete in their own right (yeah, this property tends to
pop up a lot...), or have access to code-generation machinery.

consider for example one has a text-stream "eval" mechanism. outside access
to eval is dangerous even if the text itself is well-formed, since eval will
generally allow whatever code hits it to muck around with the app (unless of
course the eval is sandboxed, but I am assuming here it is not...).


similar goes if several components are connected via a stream in a
PostScript like format, and, say, some input goes over which fouls up the
command-interpreter, creates an infinite loop, or worse.

trivial example: "/foo {foo} def foo"

granted, this trivial case could be handled by detecting a stack overflow,
but in the general case, it would be difficult to secure even with input
validation...

I guess many viruses are in debt to guys like you when the "internally
safe" code somehow gets re-used and exploited in the wild.

or it could be just like expecting to check that pointers always point to
valid addressable memory (say, if one is using a garbage collector with the
ability to validate that a pointer is a heap pointer).

often, it would be too expensive, and too much of a hassle, to check these
things as a general matter of practice.

so, a tradeoff is made:
we assume that the caller is passing valid data, and typically check either
in code which is not likely to be a bottleneck, or where the "safety" of the
other end is not ensured.

typically, validity checking will be done when: performing file IO, dealing
with a network connection, or implementing or dealing with a public API.


if none of these is being done (for example, all this is stuff going on
purely internal to the app, which could happen easily enough) then there may
not be a need to validate.

stan

You're young and your history seems to stem from the micro world.
Limits date from a time when really long punch cards were hard to deal
with and they didn't fit the reader. :)
"accepted" by traditional practice.

typically, constants like PATH_MAX, ... are 256.

POSIX and most linux disagree, check linux/limits.h.
it doesn't take long (for example, if one digs around in system headers),
before a string-length limit of 256 becomes a recurring pattern (even if
there are variations, for example, UNIX_MAX is 108, but I think this is
because of a general rule that (sizeof(sockaddr_storage)==128) or so, ...).

there are many other examples of this particular limit being in use.

Calling this "accepted" or "common" is a stretch. Basic, Pascal, and
environments that represented strings with embedded size were limited
but even in research unix the constant was often larger even with real
core memory constraints. It's hard to find or imagine a modern
unix/linux/Gnu app with a 256 limit.

Windows has some real ideas about accepted string limits scattered
around randomly but I can't agree that either the concept of limits or
even a common 256 default exists in c and it's nonsense for c++.
it is an accepted limit, much like how i, j, and k, are accepted names for
integer variables, ...

Mostly habit from Fortran, where it was part of the language and not
just a convention, that was passed on through generations.
granted, sometimes one wants a bigger limit though, and sometimes a bigger
limit is used (as there is no real technical reason for this particular
limit apart from convention), ...

You've mentioned games many times, so maybe in that domain this is a
convention. In many other fields this doesn't wash, and it seems you
may be crossing up c and c++ to boot.

BGB / cr88192

stan said:
You're young and your history seems to stem from the micro world.
Limits date from a time when really long punch cards were hard to deal
with and they didn't fit the reader. :)

possibly, but not that young anymore...

but, yeah, admittedly AFAIK punch cards went out of style decades before I
was born (which was, I guess, during the high point of 5.25" floppies and
the IBM PC, themselves a rapidly dying technology by the time I was really
old enough to do much, in the days of dying DOS and the rise of Windows...).

but, now, much time has passed, and age is setting in...

POSIX and most linux disagree, check linux/limits.h.

odd, I had seen PATH_MAX as 256, but then again, I am in Windows-land...

Calling this "accepted" or "common" is a stretch. Basic, Pascal, and
environments that represented strings with embedded size were limited
but even in research unix the constant was often larger even with real
core memory constraints. It's hard to find or imagine a modern
unix/linux/Gnu app with a 256 limit.

ok.


Windows has some real ideas about accepted string limits scattered
around randomly but I can't agree that either the concept of limits or
even a common 256 default exists in c and it's nonsense for c++.

granted.

it depends on usage I guess, since it is worth noting that a longer limit
usually means using up more space on the stack, and stack space is not
exactly free...

likewise, heap isn't exactly free either, and allocating/freeing memory can
hurt performance if done poorly (such as in a function which is called in a
loop).

as others have noted, it may not be a big issue if one is processing input
which comes from disk, but I guess it is a question of what and how much
comes from disk, and how much is being shoved around intra-app (say, for
inter-component communication, ...).

Mostly habit from Fortran, where it was part of the language and not
just a convention, that was passed on through generations.

yep.
granted, it is not good to defy traditions though, since usually things are
some particular way for a good reason...

You've mentioned games many times, so maybe in that domain this is a
convention. In many other fields this doesn't wash, and it seems you
may be crossing up c and c++ to boot.

I use both C and C++, though generally more C than C++.

note that, for example, in Quake 2, most string limits are shorter than
this, for example, 16 and 64 character string limits are common (for
example, QPATH_MAX is defined as 64, ...).


I also deal a lot with VM type stuff:
interpreters, JIT compilers, ... where basically the compiler may be working
(say, compiling code fragments, ...) at the same time as other parts of the
application are doing other tasks, ...

interpreting code, and wasting time in an interpreter, can easily kill
performance. an interpreter, for example, often has to cut a lot of corners
in an attempt to keep speed up (and, even then, interpreters still tend to
be rather slow, hence the usage of JIT in many cases, but then one needs to
have relatively fast compiler machinery, ...).


I typically use larger limits, but usually the 256-char limit is for a
single token (in parsing).

for buffers which may deal with globs of text, I usually use either larger
limits, expanding buffers, or size limit checks.

I usually use a limit of around 1024 or so for name-mangled tokens (say,
when the function name and signature are mangled together for linker-related
purposes, ...).

I think by convention though, PE/COFF informally has a limit of 256 here
(for valid function names), and the C standard has a limit around 32 (for a
minimum allowed implementation limit), where a longer name is not required
to be necessarily valid in a conforming compiler (the usual idea being that
the identifier would be truncated).

granted, this should not be a problem so long as one is not using the
MyFunctionNameIsDamnNearAWholeParagraph naming scheme...

granted, it is worth noting though that some bulk to names is usually added
as a result of using a naming convention which tends to add library and
subsystem prefixes to exported names (except public API functions, which
tend to have a much shorter prefix).

this is common in C and mixed C & C++ codebases, given the non-availability
of namespaces, ...


it is also common practice (in C) not to wrap strings in any sort of
container, since this makes things typically more awkward, and generally
hurts performance (say, due to added pointer indirections, function calls,
...).

or such...