std::getline() behaves differently between platforms?

JML

Hi,

I have some code which parses a text file and creates objects based on
what is in the text file. The code works just fine on Windows, but when
I compile it using Xcode on OS X the parsing goes all wrong. Are there
any known differences with file handling on OS X?

My code is quite long, but one of the defective parts looks like this
(sorry about the indentation - I'm new to posting code on a newsgroup):

//Begin adding NPCs, exits and collision boxes
while ( std::getline(filestream, str) ) {

    //Look for regular NPC
    if (str == "[NPC]") {
        std::cout << "Found NPC. \n";
        std::string name;
        int x, y, w, h, co_x, co_y, co_w, co_h;
        filestream >> name >> x >> y >> w >> h >> co_x >> co_y >> co_w >> co_h;
        m_NPCList.push_back( CActor( this, name, x, y, w, h, co_x, co_y, co_w,
                                     co_h ) );

        //Look for a path
        bool foundPath = false;
        std::getline(filestream, str); //Finish previous line
        std::getline(filestream, str);

        if ( str == "[PATH:LOOP]" ) {
            std::cout << "Looped path!\n";
            foundPath = true;
            m_NPCList[m_NPCList.size()-1].SetFollowMethod( 0 );
        }

        if ( foundPath ) {
            std::getline(filestream, str);
            while ( str != "[END]" ) {
                int x, y;
                std::string::size_type loc = str.find( " ", 0 );
                std::istringstream x_string(str.substr(0, loc));
                std::istringstream y_string(str.substr(loc+1, str.length()-1));
                x_string >> x;
                y_string >> y;
                m_NPCList[m_NPCList.size()-1].AddPathNode( CPoint( x, y ) );
                std::getline(filestream, str);
            }
        }
    }
}

On Windows the code parses a file with this content just fine:
[NPC]
Batman 100 100 32 32 8 16 16 8
[PATH:LOOP]
100 100
200 100
200 200
100 200
[END]

But on OS X it goes wrong at around here:
std::getline(filestream, str); //Finish previous line
std::getline(filestream, str);
if ( str == "[PATH:LOOP]" ) {
 
hsmit.home

I didn't read your entire message (busy at work right now), but
generally these types of problems occur due to line terminator
differences between platforms: \r\n for Windows, and I'm not sure what
it is for Mac OS X (\n\r???, or simply \n). If you have a hex editor,
have a look at how the line terminators are set in the file.

You may also try opening the file in text mode and see what happens,
and then try opening it in binary mode and see what happens.
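
If no hex editor is at hand, a few lines of C++ can report which terminators a file contains. This is only a rough sketch: the names `EolCounts`, `count_line_endings` and `count_line_endings_in_file` are made up for illustration, and the file is read in binary mode so that the runtime does not translate anything before the bytes are counted.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper type: tallies of each line-ending style in a buffer.
struct EolCounts {
    std::size_t crlf = 0;  // "\r\n" pairs (Windows)
    std::size_t lf = 0;    // lone "\n" (Unix, OS X)
    std::size_t cr = 0;    // lone "\r" (classic Mac OS)
};

// Counts terminators in raw bytes; a "\r\n" pair is counted once, as CRLF.
EolCounts count_line_endings(const std::string& bytes) {
    EolCounts c;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r') {
            if (i + 1 < bytes.size() && bytes[i + 1] == '\n') {
                ++c.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++c.cr;
            }
        } else if (bytes[i] == '\n') {
            ++c.lf;
        }
    }
    return c;
}

// Reads the whole file in binary mode, so no newline translation happens.
// A file that cannot be opened simply yields all-zero counts.
EolCounts count_line_endings_in_file(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return count_line_endings(bytes);
}
```

A non-zero `crlf` count for a file that is being read on OS X would suggest that every line handed back by std::getline still ends in '\r'.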

Now back to work...
 
Ole Nielsby

> [...]
> generally these types of problems occur due to line terminator
> differences between platforms. \r\n for Windows and I'm not
> sure what it is for Mac OS X (\n\r???, or simply \n).

AFAIK it's \r\n for Windows, \n for Unix, and \r for Mac.

I can think of 3 options for making it work:

1. Demand that files be converted to the line-ending convention of the platform,
and open the files in text mode. (Drawback: files have to be converted
when moved between platforms)

2. Settle on one of the conventions and open the files in binary mode.
(Drawback: files must be authored in an editor that supports that
convention, or converted).

3. (what I'd prefer) Write a line-getter that copes with all 3 conventions.
(Drawback: slightly messy and bloated)
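
Option 3 could look roughly like the sketch below. It is not a tested library routine: `portable_getline` and `split_lines` are hypothetical names, and the code assumes the stream was opened in binary mode so that the raw CR and LF bytes reach the program. It treats "\n", "\r\n" and a lone "\r" as line terminators and otherwise tries to mimic how std::getline reports end of input.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stand-in for std::getline that accepts "\n", "\r\n" and a
// lone "\r" as line terminators. The terminator is consumed, not stored.
std::istream& portable_getline(std::istream& in, std::string& line) {
    line.clear();
    // The sentry checks the stream state; 'true' means leading whitespace
    // is not skipped.
    std::istream::sentry ok(in, true);
    if (!ok) return in;

    std::streambuf* buf = in.rdbuf();
    bool extracted = false;
    for (;;) {
        const int c = buf->sbumpc();
        if (c == std::char_traits<char>::eof()) {
            // Like std::getline: fail only if nothing at all was read.
            in.setstate(extracted ? std::ios::eofbit
                                  : (std::ios::eofbit | std::ios::failbit));
            return in;
        }
        extracted = true;
        if (c == '\n') return in;
        if (c == '\r') {
            // Swallow the LF of a CRLF pair; a lone CR ends the line too.
            if (buf->sgetc() == '\n') buf->sbumpc();
            return in;
        }
        line += static_cast<char>(c);
    }
}

// Small usage helper: splits a text buffer into lines with the function above.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (portable_getline(in, line)) lines.push_back(line);
    return lines;
}
```

In the parser from the original post, each `std::getline(filestream, str)` call could then be swapped for `portable_getline(filestream, str)`, provided `filestream` is opened with `std::ios::binary`.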
 
James Kanze

> > [...]
> > generally these types of problems occur due to line terminator
> > differences between platforms. \r\n for Windows and I'm not
> > sure what it is for Mac OS X (\n\r???, or simply \n).
> AFAIK it's \r\n for Windows, \n for Unix, and \r for Mac.

More correctly, it's CRLF for Windows, LF for Unix, and (was, at
least) CR for Mac. At the disk level. Within a program, the
line terminator is always '\n'. (Note that to add to the fun,
some Windows programs use CRLF as a line *separator*, not a line
*terminator*.)

> I can think of 3 options for making it work:
> 1. Demand that files be converted to the line-ending convention of the platform,
> and open the files in text mode. (Drawback: files have to be converted
> when moved between platforms)
> 2. Settle on one of the conventions and open the files in binary mode.
> (Drawback: files must be authored in an editor that supports that
> convention, or converted).
> 3. (what I'd prefer) Write a line-getter that copes with all 3 conventions.
> (Drawback: slightly messy and bloated)

In general, I would say that any serious program today that
handles text input generated by an editor should use 3 (except
that you probably don't need to worry about a special case for
Mac). From experience, if I'm editing files, I'll use 2, simply
because Windows programs seem to be (on the average) more
tolerant about this than Unix programs. But if you're asking
your users to edit the files, then it's probably not an option:
while every serious developer I've ever met uses either emacs or
vim, neither is really very popular among everyday users (to
put it mildly).

Note that using 3 isn't anywhere near as messy and bloated as it
sounds. CR, as you mentioned, maps to '\r', and isspace('\r')
should return true. And since you'll normally want to trim
trailing whitespace from the lines anyway...
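
A minimal sketch of that trimming step, assuming the lines come from std::getline on a system that passes the CR through as '\r'. The name `rtrim` is a hypothetical helper, not something from the standard library. Note that it only deals with a CR left at the end of a line; a file that uses a lone CR as its terminator would still arrive as one long line.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s without trailing whitespace.
// std::isspace('\r') is true, so a CR left over from a CRLF pair goes too.
std::string rtrim(std::string s) {
    std::string::size_type end = s.size();
    // The cast avoids undefined behaviour for negative char values.
    while (end > 0 && std::isspace(static_cast<unsigned char>(s[end - 1]))) {
        --end;
    }
    s.erase(end);
    return s;
}
```

With this in place, a comparison such as `rtrim(str) == "[PATH:LOOP]"` no longer depends on whether the file was saved with CRLF or LF endings.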

The real problem with 3 is that you can't use it reading from
cin.

(Note that 1 is fine *if* you're actually moving the files, as
individual files. FTP also supports two modes. It becomes
more of a problem if you're transferring them in a compressed
archive, and downright impossible if you're reading them over a
remote mounted file system.)
 
JML

James said:
> More correctly, it's CRLF for Windows, LF for Unix, and (was, at
> least) CR for Mac. At the disk level. Within a program, the
> line terminator is always '\n'. (Note that to add to the fun,
> some Windows programs use CRLF as a line *separator*, not a line
> *terminator*.)

I tried writing a new .txt file with an editor in OS X, and then the
file parses just fine on the OS X build. Yay for that, but it is not an
optimal solution of course. I think the solution must be to write a new
line-getter, that handles the different line terminators.

> In general, I would say that any serious program today that
> handles text input generated by an editor should use 3 (except
> that you probably don't need to worry about a special case for
> Mac).
> Note that using 3 isn't anywhere near as messy and bloated as it
> sounds. CR, as you mentioned, maps to '\r', and isspace('\r')
> should return true. And since you'll normally want to trim
> trailing whitespace from the lines anyway...

As I'm not the most experienced guy in writing text parsers in C++,
could you guys give me some more concrete advice on how to create a
better line-getter? :)
 
Bo Persson

JML wrote:
:: James Kanze wrote:
::: More correctly, it's CRLF for Windows, LF for Unix, and (was, at
::: least) CR for Mac. At the disk level. Within a program, the
::: line terminator is always '\n'. (Note that to add to the fun,
::: some Windows programs use CRLF as a line *separator*, not a line
::: *terminator*.)
::
:: I tried writing a new .txt file with an editor in OS X, and then
:: the file parses just fine on the OS X build. Yay for that, but it
:: is not an optimal solution of course. I think the solution must be
:: to write a new line-getter, that handles the different line
:: terminators.

Why?

Isn't the solution to have proper text files on each platform?

Note that, on some systems, the physical file doesn't store any
terminator at all. It uses a hidden length value instead. It might
also store the text in EBCDIC.

Even so, std::getline works with a '\n' line terminator.


Bo Persson
 
James Kanze

> JML wrote:
> :: James Kanze wrote:
> ::: More correctly, it's CRLF for Windows, LF for Unix, and (was, at
> ::: least) CR for Mac. At the disk level. Within a program, the
> ::: line terminator is always '\n'. (Note that to add to the fun,
> ::: some Windows programs use CRLF as a line *separator*, not a line
> ::: *terminator*.)
> :: I tried writing a new .txt file with an editor in OS X, and then
> :: the file parses just fine on the OS X build. Yay for that, but it
> :: is not an optimal solution of course. I think the solution must be
> :: to write a new line-getter, that handles the different line
> :: terminators.
>
> Isn't the solution to have proper text files on each platform?

And what do you do with file systems that are mounted on two
different platforms? Most of the time I'm working under
Windows, the files are actually on a Unix machine somewhere,
being served up by Samba. And they're being read by Unix
machines at the same time.

> Note that, on some systems, the physical file doesn't store any
> terminator at all. It uses a hidden length value instead. It might
> also store the text in EBCDIC.

True. You don't remote mount file systems with those, and the
file transfer program does (or should) take care of any mapping;
that's why FTP also has two modes.

> Even so, std::getline works with a '\n' line terminator.

I'll admit that I don't see too much of a problem either. In
practice, there are two situations. If you've copied the files,
you should have remapped during the copy, so you'll have the
native separator. And remote mounting, in practice, means just
Unix and Windows: the Windows implementations I know of have no
problem with Unix line endings, and the Unix implementations
simply pass the CR up as a '\r', which is white space, and
should just get ignored. In practice, it's just not a problem.
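
To make the "white space, should just get ignored" point concrete, here is a small sketch under the assumption that a CRLF file is read on a Unix-like system, so each line from std::getline ends in '\r'. `ParsedPoint`, `parse_point` and `is_tag_exact` are hypothetical names used only for illustration. Formatted extraction with >> skips the CR like any other whitespace, while an exact string comparison of the kind used in the original post does not.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for parsing one "x y" path node line.
struct ParsedPoint {
    bool ok;
    int x;
    int y;
};

// Formatted extraction skips whitespace before each number and stops at the
// first character that cannot belong to it, so a trailing '\r' is harmless.
ParsedPoint parse_point(const std::string& line) {
    ParsedPoint p = {false, 0, 0};
    std::istringstream in(line);
    if (in >> p.x >> p.y) p.ok = true;
    return p;
}

// Exact comparison, as in the original post: a trailing '\r' makes it fail.
bool is_tag_exact(const std::string& line, const std::string& tag) {
    return line == tag;
}
```

So `parse_point("200 100\r")` still yields 200 and 100, whereas `is_tag_exact("[PATH:LOOP]\r", "[PATH:LOOP]")` is false, which is why the tag lines are the ones that need trimming before they are compared.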
 
