hidden characters in source code causing compiler grief

Gary

I believe some hidden characters have somehow gotten into my source file
and are causing it to not compile. I think they came from word
processor formatting codes, or perhaps got in when I received the code
from a friend via email. Is there an easy way to see if they are in
there, like a "show hidden characters" option or something?
 
jfbode1029

I believe some hidden characters have somehow gotten into my source file
and are causing it to not compile. I think they came from word
processor formatting codes, or perhaps got in when I received the code
from a friend via email. Is there an easy way to see if they are in
there, like a "show hidden characters" option or something?

Without knowing what editor you are using, it's pretty difficult to
tell you how to get it to show you non-printing characters.

Can you tell us exactly what errors you are getting from your
compiler?
 
Flash Gordon

Gary said:
I believe some hidden characters have somehow gotten into my source file
and are causing it to not compile. I think they came from word
processor formatting codes, or perhaps got in when I received the code
from a friend via email. Is there an easy way to see if they are in
there, like a "show hidden characters" option or something?

This is really a topic about the tools available on your system, and I
have no idea what those are. Many moons ago I used edt, and I'm sure
that would do the job, but I doubt you have it. vim or emacs could
probably do it as well. For the future, I would advise you to use word
processors for word processing and text editors for editing text!

Also, I would advise you to ask in a group or mailing list dedicated to
whatever editor you like using or, alternatively, a group dedicated to
programming on your OS about what useful tools are available.

Or you could write a C program to scan the file, for which the isprint()
function might well be useful.
 
vlsidesign

Without knowing what editor you are using, it's pretty difficult to
tell you how to get it to show you non-printing characters.

Can you tell us exactly what errors you are getting from your
compiler?

Thanks. I am trying to help a neighbor's kid. He and I are both
on Ubuntu, so I was trying to help him a little bit with his homework.
I think he may have used Open Office Word, or when I emailed him back,
he may have cut and pasted it from email or something.

I used both the Text Editor in Ubuntu and vi (vim) from the shell. I
can't see any characters, but when I delete some lines, and re-type
some of the code in as I see it, some of the errors go away.

Here is a snapshot of the errors before my mods:

prime2.c:3:9: error: #include expects "FILENAME" or <FILENAME>
prime2.c:4:9: error: #include expects "FILENAME" or <FILENAME>
prime2.c:5: error: stray ‘\302’ in program
prime2.c:5: error: stray ‘\240’ in program
prime2.c: In function ‘main’:
prime2.c:7: error: stray ‘\302’ in program
prime2.c:7: error: stray ‘\240’ in program
prime2.c:7: error: stray ‘\302’ in program
<snip>

I then deleted the top two include lines and retyped them, which seems
to resolve some of the errors. Now I get:

prime.c:3: error: stray ‘\302’ in program
prime.c:3: error: stray ‘\240’ in program
prime.c: In function ‘main’:
prime.c:5: error: stray ‘\302’ in program
prime.c:5: error: stray ‘\240’ in program
prime.c:5: error: stray ‘\302’ in program
<snip>

Here is the top part of the code:
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
  int    number;
  int    d;
  _Bool  isPrime; // set to 0 if number is non­prime
  int maxNum;
  printf("Enter max number: ");
<snip>
 
jameskuyper

vlsidesign said:
....
Thanks. I am trying to help a neighbor's kid. He and I are both
on Ubuntu.

On most Unix-like systems, you can use the command "od -c filename" to
get a dump of the file where non-ASCII characters are displayed as
escape sequences.
 
vlsidesign

On most Unix-like systems, you can use the command "od -c filename" to
get a dump of the file where non-ASCII characters are displayed as
escape sequences.

whoa... cool.

gford@lws02:~/Documents/scratch$ od -c prime.c
0000000   #   i   n   c   l   u   d   e       <   s   t   d   i   o   .
0000020   h   >  \n   #   i   n   c   l   u   d   e       <   s   t   d
0000040   b   o   o   l   .   h   >  \n   i   n   t 302 240   m   a   i
0000060   n   (   v   o   i   d   )  \n   {  \n 302 240 302 240   i   n
0000100   t 302 240 302 240 302 240 302 240   n   u   m   b   e   r   ;
0000120  \n 302 240 302 240   i   n   t 302 240 302 240 302 240 302 240
0000140   d   ;  \n 302 240 302 240   _   B   o   o   l 302 240 302 240
0000160   i   s   P   r   i   m   e   ; 302 240   /   / 302 240   s   e
0000200   t 302 240   t   o 302 240   0 302 240   i   f 302 240   n   u
0000220   m   b   e   r 302 240   i   s 302 240   n   o   n 302 255   p
0000240   r   i   m   e  \n 302 240 302 240   i   n   t 302 240   m   a
0000260   x   N   u   m   ;  \n 302 240 302 240   p   r   i   n   t   f
0000300   (   "   E   n   t   e   r 302 240   m   a   x 302 240   n   u
 
Rich Webb

Thanks. I am trying to help a neighbor's kid. He and I are both
on Ubuntu, so I was trying to help him a little bit with his homework.
I think he may have used Open Office Word, or when I emailed him back,
he may have cut and pasted it from email or something.

I used both the Text Editor in Ubuntu and vi (vim) from the shell. I
can't see any characters, but when I delete some lines, and re-type
some of the code in as I see it, some of the errors go away.

Here is a snapshot of the errors before my mods:

prime2.c:3:9: error: #include expects "FILENAME" or <FILENAME>
prime2.c:4:9: error: #include expects "FILENAME" or <FILENAME>
prime2.c:5: error: stray ‘\302’ in program
prime2.c:5: error: stray ‘\240’ in program
prime2.c: In function ‘main’:
prime2.c:7: error: stray ‘\302’ in program
prime2.c:7: error: stray ‘\240’ in program
prime2.c:7: error: stray ‘\302’ in program
[snip...snip...]

Well, to on-topic this for c.l.c, why not write a small filter app that
ingests the suspect files and then reports on (and optionally deletes)
any characters that aren't in the standard character set? Perhaps a
combination of isprint() and iscntrl()?
 
luserXtrog

Thanks. I am trying to help a neighbor's kid. He and I are both
on Ubuntu, so I was trying to help him a little bit with his homework.
I think he may have used Open Office Word, or when I emailed him back,
he may have cut and pasted it from email or something.

I used both the Text Editor in Ubuntu and vi (vim) from the shell. I
can't see any characters, but when I delete some lines, and re-type
some of the code in as I see it, some of the errors go away.

Here is a snapshot of the errors before my mods:

prime2.c:3:9: error: #include expects "FILENAME" or <FILENAME>
prime2.c:4:9: error: #include expects "FILENAME" or <FILENAME>
prime2.c:5: error: stray ‘\302’ in program
prime2.c:5: error: stray ‘\240’ in program
prime2.c: In function ‘main’:
prime2.c:7: error: stray ‘\302’ in program
prime2.c:7: error: stray ‘\240’ in program
prime2.c:7: error: stray ‘\302’ in program
<snip>

I then deleted the top two include lines and retyped them, which seems
to resolve some of the errors. Now I get:

prime.c:3: error: stray ‘\302’ in program
prime.c:3: error: stray ‘\240’ in program
prime.c: In function ‘main’:
prime.c:5: error: stray ‘\302’ in program
prime.c:5: error: stray ‘\240’ in program
prime.c:5: error: stray ‘\302’ in program
<snip>

Here is the top part of the code:
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
  int    number;
  int    d;
  _Bool  isPrime; // set to 0 if number is non­prime
  int maxNum;
  printf("Enter max number: ");
<snip>

man iso_8859-1 identifies \240 as
NO-BREAK SPACE
and \302 as
LATIN CAPITAL LETTER A WITH CIRCUMFLEX

It seems likely that the file passed through the grubby hands of a
word processor. If one must be used, it may be found useful to select
a very primitive "terminal" font. This may help by making non-ASCII
characters stick out like a sore thumb.

For that matter, they should stick out in vi. For me, \302 looks like
a lowercase y with 2 dots, and \240 looks like a Russian cursive d.

I don't see either of these on the corresponding snippet. Was it
pasted or retyped?
 
Ben Bacarisse

vlsidesign said:
whoa... cool.

gford@lws02:~/Documents/scratch$ od -c prime.c
0000000   #   i   n   c   l   u   d   e       <   s   t   d   i   o   .
0000020   h   >  \n   #   i   n   c   l   u   d   e       <   s   t   d
0000040   b   o   o   l   .   h   >  \n   i   n   t 302 240   m   a   i

In these days of UTF-8, od -c falls a little short. The two bytes with
octal values 302 and 240 are, together, the UTF-8 encoding of U+00A0,
otherwise known as "NO-BREAK SPACE".
 
vlsidesign

man iso_8859-1 identifies \240 as
NO-BREAK SPACE
and \302 as
LATIN CAPITAL LETTER A WITH CIRCUMFLEX

It seems likely that the file passed through the grubby hands of a
word processor. If one must be used, it may be found useful to select
a very primitive "terminal" font. This may help by making non-ASCII
characters stick out like a sore thumb.

For that matter, they should stick out in vi. For me, \302 looks like
a lowercase y with 2 dots, and \240 looks like a Russian cursive d.

I don't see either of these on the corresponding snippet. Was it
pasted or retyped?

shell$ more prime.c   // output copied and pasted below

#include <stdio.h>
#include <stdbool.h>
int main(void)
{
  int    number;
  int    d;
  _Bool  isPrime; // set to 0 if number is non­prime
  int maxNum;
  printf("Enter max number: ");
<snip>
 
Ben Bacarisse

Many systems will show a space since the combination is the UTF-8
encoding of a no-break space.
shell$ more prime.c   // output copied and pasted below

#include <stdio.h>
#include <stdbool.h>
int main(void)
{
  int    number;
  int    d;
  _Bool  isPrime; // set to 0 if number is non­prime
  int maxNum;
  printf("Enter max number: ");

My news reader happens to show me all the no-break spaces in that
snippet because I have configured it to, but Ubuntu, being a UTF-8
based system, will probably keep showing you a space.

There are lots of ways to fix this code as a once-off but the key is
to avoid using any tools that can put unusual characters into your
code to start off with. I heard your neighbour used a word processor
to write the program. This is not a good idea. You know what a food
processor does to food, right?
 
luserXtrog

shell$ more prime.c   // output copied and pasted below

Right on! I take it back. But do they show in vi? You should
be able to substitute them away :%s/^V302//g :%s/^V240//g
(where ^V is ctrl-V, of course:).
 
CBFalconer

Ben said:
.... snip ...


In these days of UTF-8, od -c falls a little short. The two bytes
with octal values 302 and 240 are, together, the UTF-8 encoding of
U+00A0, otherwise known as "NO-BREAK SPACE".

However, the point is that those characters have no business in C
source. The OP's source can probably get away with editing with
Notepad.
 
vlsidesign

vlsidesign said:
Many systems will show a space since the combination is the UTF-8
encoding of a no-break space.





My news reader happens to show me all the no-break spaces in that
snippet because I have configured it to, but Ubuntu, being a UTF-8
based system, will probably keep showing you a space.

There are lots of ways to fix this code as a once-off but the key is
to avoid using any tools that can put unusual characters into your
code to start off with.  I heard your neighbour used a word processor
to write the program.  This is not a good idea.  You know what a food
processor does to food, right?

Good point. I typed some words in a text editor and saved the file. Then I
typed the same words in a word processor and saved that. I then
showed him the size difference between the two and explained that the
word processor stores hidden formatting codes and that sort of
stuff, even though both files look the same.
 
Keith Thompson

vlsidesign said:
For me, when I am using vi in Ubuntu, I can't see them.

<OT>
env LANG=C vi filename.c

vi's behavior is affected by the current locale settings. Ubuntu's
default is to use UTF-8. Setting the environment variable "LANG" to
"C" changes this behavior, so the two-byte sequence that is the UTF-8
representation of NO-BREAK SPACE is no longer interpreted that way.

See also the "cat -A" or "cat -v" command ("man cat" for details).
</OT>
 
vlsidesign

Right on! I take it back. But do they show in vi? You should
be able to substitute them away :%s/^V302//g  :%s/^V240//g
(where ^V is ctrl-V, of course:).

I used this command from poster (thanks)
env LANG=C vi filename.c

and now it looks like this:

#include <stdio.h>
#include <stdbool.h>
int| main(void)
{
| | int| | | | number;
| | int| | | | d;
| | _Bool| | isPrime;| //| set| to| 0| if| number| is| non­prime
| | int| maxNum;
<snip>

I tried both of your substitutions but they didn't work. I also tried the
pipe character and that didn't work either. I tried in plain vi, and in
env LANG=C vi as well.
 
luserXtrog

I used this command from poster (thanks)
env LANG=C vi filename.c

and now it looks like this:

#include <stdio.h>
#include <stdbool.h>
int| main(void)
{
| | int| | | | number;
| | int| | | | d;
| | _Bool| | isPrime;| //| set| to| 0| if| number| is| non­prime
| | int| maxNum;
<snip>

I tried both of your substitutions but they didn't work. I also tried the
pipe character and that didn't work either. I tried in plain vi, and in
env LANG=C vi as well.

I am at a loss to fathom what is going on here.
I cannot imagine whence the pipe (| vertical bar character) entered
either the file in question or the conversation as I have followed it.

Have you tried ed? Surely it's still limited to ASCII by default?!

I was able to remove the bars from the snippet by pasting into vi (in
insert mode) and typing :%s/|//g

I do not intend the following question to be offensive, merely
desperate: did you happen to type the bar between the second and third
slash (/ forward slanting bar character)?

Violently scratching scalp...
 
Ben Bacarisse

vlsidesign said:
I used this command from poster (thanks)
env LANG=C vi filename.c

and now it looks like this:

#include <stdio.h>
#include <stdbool.h>
int| main(void)
{
| | int| | | | number;
| | int| | | | d;
| | _Bool| | isPrime;| //| set| to| 0| if| number| is| non­prime
| | int| maxNum;
<snip>

I tried both of your substitutions but they didn't work. I also tried the
pipe character and that didn't work either. I tried in plain vi, and in
env LANG=C vi as well.

tr -d '\302' <filename.c | tr '\240' ' ' >clean.c

will do it, but you'd have had that answer much faster in any unix
group. (My utf-8-dump utility can also do this but that is even more
off topic.)
 
