variable-length strings

Alan Curry

fclose(fp);
return 0;
}

// cc -Wall -Wextra cmp5.c -o cmp
$

Right here, where the C code ended... would have been a good time to start a
new post on another group.
I'm trying to get my head around what Alan Curry wrote about this:

"Octal 15 (ASCII CR) inserted before octal 12 (ASCII LF). Classical
text-mode FTP corruption."

Can someone say a few more words about what ftp may have done here?

FTP was invented when dinosaurs roamed the Internet. The dinosaurs growled,
roared, and screeched ASCII, which was presented to the users on hardcopy
terminals - typewriters, basically.

[If you're not old enough to have ever seen a typewriter, google up an image
of one now, or preferably a video of one in use.]

When text is written on a typewriter, there are 2 actions required at the end
of a line before starting the next line: move the carriage (that's the
printing thingy) to the left margin, and scroll the paper up.

[You may now ponder how the connection is formed between the ancient and
modern meanings of the word "scroll".]

ASCII was partially designed as a typewriter control language, and 2 of its
control codes are used to instruct the typewriter to perform its end-of-line
actions:

Character number 13 is "Carriage Return" (CR) - move the printing thingy to
the left.

Character number 10 is "Linefeed" (LF) - feed one more line of paper into the
machine.

The dinosaurs were a rowdy bunch, and they fought over how to properly store
text on non-typewriter-like media (like tapes and disks), but they were all
able to understand the typewriter-like ASCII-with-CRLF format. It was the
universal text representation. And it was incorporated into FTP as the
standard format for transmission of text files.

A bit later, UNIX was created. It was like a small mammal, hiding
underground, barely noticed and definitely not respected by the dinosaurs.
And the UNIX guys observed that text files on disk were becoming more
numerous than text printouts. And they said that soon typewriters as a user
interface would be obsolete, replaced by CRTs, and that eventually people
would send text to each other without even using paper.

In such a future it would make no sense to keep using that old pair of
typewriter commands to separate lines of text. It would be much better to
take "end this line, begin next line" as a single logical operation and allow
it to be represented by a single control code. They called this new control
code "NEWLINE". It was a revolutionary idea.

And the dinosaurs ignored it, and kept making up Internet protocols with
mandatory typewriter-style line endings.

When later generations of UNIX got big enough to climb out of their
underground tunnels and talk to the Internet, they had a problem: all of
their private text files and utilities were designed around the NEWLINE idea,
but all of the Internet protocols, including FTP, required CRLF. FTP clients
and servers on UNIX have to perform translation. When sending a file, they
transform it from NEWLINE format to CRLF format. When receiving it, they do
the opposite.

When sending a text file, UNIX FTP changes the file into the dinosaur
typewriter format by inserting a CR (13) before every NEWLINE (10). When
receiving a text file, it changes each CRLF pair into NEWLINE. Note that it
doesn't have to actually look for CRLF pairs. It can just delete all the CRs,
since LF and NEWLINE have the same numerical value (10) and a CR that is not
part of a CRLF pair is not allowed.
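A minimal in-memory sketch of the two translations just described, assuming the whole file fits in a buffer. The names `ascii_send` and `ascii_receive` are hypothetical, made up for illustration; they are not taken from any real FTP client or server.

```c
#include <stdlib.h>

/* Sending side (UNIX -> wire): insert a CR (13) before every NEWLINE/LF (10).
   Returns a malloc'd buffer and stores its length in *out_len,
   or returns NULL if memory runs out. */
unsigned char *ascii_send(const unsigned char *in, size_t len, size_t *out_len)
{
    size_t i, n = 0, lf = 0;
    unsigned char *out;

    for (i = 0; i < len; i++)        /* count LFs to size the output */
        if (in[i] == 10)
            lf++;
    out = malloc(len + lf + 1);      /* +1 so we never ask malloc for 0 bytes */
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (in[i] == 10)
            out[n++] = 13;           /* the inserted CR */
        out[n++] = in[i];
    }
    *out_len = n;
    return out;
}

/* Receiving side (wire -> UNIX): delete every CR, in place.
   Fine for genuine text, where a CR outside a CRLF pair is not allowed;
   on binary data it silently drops bytes. Returns the new length. */
size_t ascii_receive(unsigned char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        if (buf[i] != 13)
            buf[n++] = buf[i];
    return n;
}
```

Feeding the binary bytes {0x01, 0x0D, 0x02, 0x0A, 0x03} through `ascii_send()` and then `ascii_receive()` gives back 4 bytes instead of 5: the inserted CR is removed again, but so is the original lone CR. If the receiving side never deletes the CRs, the inserted 0x0D in front of 0x0A survives instead, which is the "octal 15 before octal 12" pattern quoted earlier.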

When those transformations are applied to things that aren't actually text
files, the result can be a disaster, because bytes with value 13 can
disappear, or can be added before bytes with value 10. The transformation
isn't reversible when the input contains unpaired CR or LF bytes.

So your file got corrupted because you used ASCII mode FTP and you got caught
speaking ungrammatical dinosaur typewriter Esperanto.

Use binary mode. (Technically, in the language of FTP, it is "TYPE IMAGE", but
most clients call it "binary mode".)

And yeah, you could have just googled "FTP binary" but I felt like writing a
story about dinosaurs anyway so here you go.
 
Uno

unsigned chars *are* what you should use to deal with binary data.

Your problem was that you were assigning the result of fgetc()
to an unsigned char object. Using a signed char or a plain char
would also have been wrong. fgetc() returns an int. Read your
documentation to see how that int value is generated.

I'm dropping comp.unix.shell from the cross-post.
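A minimal sketch of the advice quoted above: keep fgetc()'s result in an int, compare it with EOF, and only then treat it as an unsigned char. The task (counting occurrences of one byte value) and the name `count_byte` are hypothetical, chosen only to give the loop something to do.

```c
#include <stdio.h>

/* Count how many times the byte value `target` occurs in the stream.
   fgetc() returns either a byte (an unsigned char converted to int, so
   0..UCHAR_MAX) or the negative value EOF, so it must be held in an int. */
long count_byte(FILE *fp, unsigned char target)
{
    long count = 0;
    int c;                                      /* int, not char */

    while ((c = fgetc(fp)) != EOF) {
        unsigned char byte = (unsigned char)c;  /* safe: c is 0..UCHAR_MAX here */
        if (byte == target)
            count++;
    }
    return count;
}
```

Because fgetc() returns EOF both at end of file and on a read error, a caller that cares can check `ferror(fp)` after the loop to tell the two apart.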

Ok. So we have
int c;
unsigned char s;

My first guess on how to get this transferred faithfully would be

for (;;){
s = c;

My second guess is

for (;;){
s = (unsigned char)c;

But that looks rather heavy-handed for something that wasn't broken
when we started.

I don't have an appropriate reference for fgetc and dinkumware has been
down.

What would I need to install to be able to man fgetc?
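On the question of `s = c;` versus `s = (unsigned char)c;`: once c has been compared with EOF it holds a value from 0 to UCHAR_MAX, so both forms store the same value, and the cast only documents the narrowing. What goes wrong is converting before the EOF test. The hypothetical helpers below illustrate both points, assuming an ordinary hosted platform where int is wider than unsigned char.

```c
#include <limits.h>
#include <stdio.h>

/* For c in 0..UCHAR_MAX (i.e. after the EOF check), the implicit and the
   explicit conversion store exactly the same value. */
int same_conversion(int c)
{
    unsigned char s1 = c;
    unsigned char s2 = (unsigned char)c;
    return s1 == s2;
}

/* The broken pattern: converting fgetc()'s result to unsigned char before
   testing it. EOF is negative, so it becomes a non-negative value
   (UCHAR_MAX when EOF is -1); in the comparison s is promoted back to int
   and can never equal EOF. A loop written as
       while ((s = fgetc(fp)) != EOF)
   with an unsigned char s therefore never ends. */
int unsigned_char_sees_eof(void)
{
    int c = EOF;
    unsigned char s = c;
    return s == EOF;
}
```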
 
Uno


Thx, Morris.

If the end-of-file indicator for the input stream pointed to by stream
is not set and a next byte is present, the fgetc() function shall obtain
the next byte as an unsigned char converted to an int, from the input
stream pointed to by stream, and advance the associated file position
indicator for the stream (if defined). Since fgetc() operates on bytes,
reading a character consisting of multiple bytes (or "a multi-byte
character") may require multiple calls to fgetc().
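To make the last sentence of that quote concrete: in UTF-8, "é" is stored as the two bytes 0xC3 0xA9, so reading it takes two fgetc() calls, each returning one byte as a non-negative int. The helper `read_bytes` is hypothetical, and the sketch assumes the file holds UTF-8 text.

```c
#include <stdio.h>

/* Reads bytes from fp with fgetc() into out[] (at most max of them).
   Returns how many calls returned a byte rather than EOF. */
size_t read_bytes(FILE *fp, int out[], size_t max)
{
    size_t n = 0;
    int c;

    while (n < max && (c = fgetc(fp)) != EOF)
        out[n++] = c;   /* each stored value is in 0..UCHAR_MAX */
    return n;
}
```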


What is the relationship between unsigned chars and ints? Both unsigned
and signed chars are subsets of ints, but why are unsigned chars the
representation of choice with binary data?
 
Angel

What is the relationship between unsigned chars and ints? Both unsigned
and signed chars are subsets of ints, but why are unsigned chars the
representation of choice with binary data?

Because with signed char, there may be trap representations (binary
numbers that are not a valid signed char) in arbitrary binary data.
 
Keith Thompson

Uno said:
What is the relationship between unsigned chars and ints? Both unsigned
and signed chars are subsets of ints, but why are unsigned chars the
representation of choice with binary data?

Because if you have a byte whose content is all-bits-one, it generally
makes more sense to think of it as 255 or 0xff than as -1.

If your binary data actually consists of signed bytes, then by all means
treat it as signed bytes, but that's fairly unusual.
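A small sketch of that point, assuming CHAR_BIT is 8: read through unsigned char, a byte with all bits set is UCHAR_MAX (255), the same value fgetc() returns for it, so it prints as "ff" in a hex dump. Read through a plain char that happens to be signed, its value is implementation-defined (typically -1), and a hex dump built on it tends to show "ffffffff" instead. `to_hex` is a hypothetical helper.

```c
#include <limits.h>
#include <stdio.h>

/* Writes n bytes as two-digit lowercase hex into out, which must have room
   for 2 * n + 1 chars. Assumes CHAR_BIT == 8, so every byte is 0..255
   (UCHAR_MAX). Reading the bytes as unsigned char means 0xFF is the value
   255 and prints as "ff", never as a negative number. */
void to_hex(const unsigned char *p, size_t n, char *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", (unsigned)p[i]);
    out[2 * n] = '\0';
}
```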
 
Shao Miller

Because with signed char, there may be trap representations (binary
numbers that are not a valid signed char) in arbitrary binary data.

Is that consistent with 6.2.6.1p5? "Character type," below.

"Certain object representations need not represent a value of the
object type. If the stored value of an object has such a representation
and is read by an lvalue expression that does not have character type,
the behavior is undefined. If such a representation is produced by a
side effect that modifies all or any part of the object by an lvalue
expression that does not have character type, the behavior is undefined.
Such a representation is called a /trap representation/."
 
Shao Miller

What is the relationship between unsigned chars and ints? Both unsigned
and signed chars are subsets of ints, but why are unsigned chars the
representation of choice with binary data?

'unsigned char' has 'CHAR_BIT' bits. 'sizeof (unsigned char) == 1'.
'unsigned char' has no padding bits and no sign bit. Unsigned integer
types do not overflow from arithmetic operations. Because of these
guarantees, you can manipulate (unless 'const'-qualified) or inspect
_all_ the bits of _any_ object if you access the bytes of that object
using an lvalue with type 'unsigned char'.
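A sketch of the idiom just described: any object's representation can be examined by viewing it as sizeof(object) bytes of unsigned char. Which byte holds which part of the value depends on the platform's byte order; the helper names `object_bytes` and `low_byte_first` are hypothetical.

```c
#include <stddef.h>

/* Copies the object representation of the size bytes at obj into out.
   Accessing any object through an unsigned char lvalue is allowed, and
   unsigned char has no padding bits, so every bit is visible. */
void object_bytes(const void *obj, size_t size, unsigned char *out)
{
    const unsigned char *p = obj;
    size_t i;

    for (i = 0; i < size; i++)
        out[i] = p[i];
}

/* Example use: report whether the low-order byte of an unsigned int is
   stored first (little-endian). Byte order is platform-specific, which is
   exactly the kind of detail this technique exposes. */
int low_byte_first(void)
{
    unsigned int x = 1;
    unsigned char b[sizeof x];

    object_bytes(&x, sizeof x, b);
    return b[0] == 1;
}
```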

One aspect of the relationship between 'unsigned char' and 'int' or
'unsigned int' is that the rank of 'int' and 'unsigned int' is greater
than the rank of 'unsigned char'. Another is that the range of values
for 'unsigned char' is a subset of the range of values for 'unsigned int'.
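A short illustration of what the lower rank means in practice: in arithmetic, unsigned char operands are first promoted to int (on platforms where int can represent every unsigned char value, which is the usual case), so the sum below is computed in int and only wraps when it is stored back into an unsigned char. The function names are hypothetical.

```c
#include <limits.h>

/* a + b is evaluated as int after the integer promotions, so nothing
   wraps here: 200 + 100 yields 300. */
int promoted_sum(unsigned char a, unsigned char b)
{
    return a + b;
}

/* Storing the result in an unsigned char reduces it modulo UCHAR_MAX + 1:
   with CHAR_BIT == 8, 300 becomes 44. */
unsigned char narrowed_sum(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```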
 
blmblm

[ snip ]
FTP was invented when dinosaurs roamed the Internet. The dinosaurs growled,
roared, and screeched ASCII, which was presented to the users on hardcopy
terminals - typewriters, basically.

That may be true of the dinosaurs that wanted to use this newfangled
file transfer protocol, but some of them (I'm thinking of the ones
made by IBM) mostly used another format, EBCDIC, for representing
character data. Just sayin' (after a rather long delay, for which
I apologize).
[If you're not old enough to have ever seen a typewriter, google up an image
of one now, or preferably a video of one in use.]

Good heavens -- are there really people who have never encountered a
typewriter at all? Cue chorus of "I feel old", maybe ....

[ snip ]
The dinosaurs were a rowdy bunch, and they fought over how to properly store
text on non-typewriter-like media (like tapes and disks), but they were all
able to understand the typewriter-like ASCII-with-CRLF format. It was the
universal text representation. And it was incorporated into FTP as the
standard format for transmission of text files.

I just made a quick attempt to confirm my recollections about
whether the IBM 360/370 architecture included support for ASCII,
and -- it's kind of interesting that the Wikipedia articles say
that the 360 architecture included some features intended to
support ASCII (which had not been finalized), but that these were
not used, or not used widely, and the 370 architecture dropped at
least some of them. I wonder whether there was some consternation
when it became desirable to support FTP, it being ASCII-based.

(I'll leave the rest of this in because it's kind of charming.)
 
Stephen Sprunk

[If you're not old enough to have ever seen a typewriter, google up
an image of one now, or preferably a video of one in use.]

Good heavens -- are there really people who have never encountered a
typewriter at all? Cue chorus of "I feel old", maybe ....

I'm in my early 30s, and the only _mechanical_ typewriter I've seen was
an antique originally owned by my grandfather, which my parents used in
college (long before I was born) and kept in storage for sentimental
reasons. I saw a few electronic typewriters in school, but they had all
disappeared by the time I graduated, and I can't recall having seen any
since then; kids just a few years younger than I likely wouldn't
remember them at all.

S
 
blmblm

S/360 included a mode bit that altered the behavior of a handful of
instructions that were sensitive to the character set. This was mostly
the decimal instructions which encoded the sign in a nibble of the
least significant byte. For example, in unpacked ("display") form,
the decimal digits were 0xf0-0xf9 in EBCDIC. Thus +123 could be
encoded as 0xf1, 0xf2, 0xf3. A -123 would be encoded 0xf1, 0xf2, 0xd3.
(Several other values were valid for the sign as well.) So obviously
the instructions that converted between unpacked and packed format were
sensitive to the character set.
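A sketch of the unpacked ("zoned") decimal layout in the +123/-123 example above, assuming zone 0xF on every digit and a sign nibble of 0xF (positive) or 0xD (negative) in the last byte. As noted, the real hardware accepted several other sign codes; these hypothetical helpers neither produce nor accept them, and the decoder assumes the value fits in a long.

```c
#include <stddef.h>

/* Encode v as exactly `digits` zoned-decimal bytes, most significant first
   (leading zeros become 0xF0). Every byte gets zone 0xF, except that the
   last byte's zone holds the sign: 0xF positive, 0xD negative.
   Returns 0 if v does not fit in `digits` digits. */
int zoned_encode(long v, unsigned char *out, size_t digits)
{
    unsigned long mag = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;
    size_t i;

    if (digits == 0)
        return 0;
    for (i = digits; i-- > 0; ) {
        out[i] = (unsigned char)(0xF0 | (mag % 10));
        mag /= 10;
    }
    if (mag != 0)
        return 0;                              /* too many digits */
    if (v < 0)
        out[digits - 1] = (unsigned char)(0xD0 | (out[digits - 1] & 0x0F));
    return 1;
}

/* Decode the layout produced above; returns 0 on any unexpected nibble. */
int zoned_decode(const unsigned char *in, size_t digits, long *v)
{
    long mag = 0;
    size_t i;
    unsigned sign;

    if (digits == 0)
        return 0;
    for (i = 0; i < digits; i++) {
        unsigned zone = in[i] >> 4, digit = in[i] & 0x0F;
        if (digit > 9 || (i + 1 < digits && zone != 0xF))
            return 0;
        mag = mag * 10 + (long)digit;
    }
    sign = in[digits - 1] >> 4;
    if (sign == 0xF)
        *v = mag;
    else if (sign == 0xD)
        *v = -mag;
    else
        return 0;                              /* other sign codes not handled */
    return 1;
}
```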

In any event, it was a modest behavior change to a few instructions.

And ASCII wasn't really cooked by the time IBM started development of
the OS's, so they ended up sticking with EBCDIC.

That mode bit was reused to enable "Extended Control" mode on the
S/370, which instead changed the format of the PSW and a few other
things (like some of the fixed locations in low core); Extended Control
mode was a prerequisite to enabling virtual memory. So ASCII mode went
away entirely with the S/370.

In the last ~15 years the ISA has grown a number of instructions
designed to ease the processing of ASCII data. Unicode too, as well
as instructions for converting to/from little-endian, and things like
C-style strings.

Thanks for filling in some of the details! That all sounds pretty
consistent with what the Wikipedia articles say, for what that's
worth. Interesting stuff -- though possibly more so to those of us who
actually remember those days.
 
blmblm

[If you're not old enough to have ever seen a typewriter, google up
an image of one now, or preferably a video of one in use.]

Good heavens -- are there really people who have never encountered a
typewriter at all? Cue chorus of "I feel old", maybe ....

I'm in my early 30s, and the only _mechanical_ typewriter I've seen was
an antique originally owned by my grandfather, which my parents used in
college (long before I was born) and kept in storage for sentimental
reasons. I saw a few electronic typewriters in school, but they had all
disappeared by the time I graduated, and I can't recall having seen any
since then; kids just a few years younger than I likely wouldn't
remember them at all.

Probably typical .... At my CPOE the support staff say they still
want one typewriter, since it's the best tool for one or a few
form-filling-out jobs. I'd have thought maybe this would be true
at many offices, but maybe we're atypical!
 
Ian Collins

Probably typical .... At my CPOE the support staff say they still
want one typewriter, since it's the best tool for one or a few
form-filling-out jobs. I'd have thought maybe this would be true
at many offices, but maybe we're atypical!

These days people prefer to mess around for hours getting all the text
correctly lined up in their word processor. It's called progress :)
 
Eric Sosman

[...]
I'm in my early 30s, and the only _mechanical_ typewriter I've seen was
an antique originally owned by my grandfather, which my parents used in
college (long before I was born) and kept in storage for sentimental
reasons. I saw a few electronic typewriters in school, but they had all
disappeared by the time I graduated, and I can't recall having seen any
since then; kids just a few years younger than I likely wouldn't
remember them at all.

Kids a few years younger still wouldn't even notice the
misuse of the nominative case.
 
Ben Bacarisse

Azazel said:
[...]
I'm in my early 30s, and the only _mechanical_ typewriter I've seen
was an antique originally owned by my grandfather, which my parents
used in college (long before I was born) and kept in storage for
sentimental reasons. I saw a few electronic typewriters in school,
but they had all disappeared by the time I graduated, and I can't
recall having seen any since then; kids just a few years younger
than I likely wouldn't remember them at all.

Kids a few years younger still wouldn't even notice the
misuse of the nominative case.

OK, I'll bite. :) What misuse? "Kids just a few years younger than I
[am]" is correct.

<OT>
If the verb were there, then "I" would be unquestionably correct.
Without it, opinion is divided though not, I'd venture to say, 50/50.

The use of "I" forces "than" to be seen as a conjunction rather than a
preposition, and for some people that's fine -- the sentence is just an
elliptical form of the version with the verb ("am") present. For
others, it sounds stilted and old-fashioned. I'd say that calling it
"misuse" is stretching the point but, whichever way you see it, I'd bet
that the younger the kids the _more_ likely they would be to spot it.
The trend is definitely away from anything that sounds so formal.

Without a following clause that needs a subject (i.e. when the verb "am"
is missing) the use of "me" forces "than" to be seen as a preposition.
This is nothing new (it's been happening for centuries) and I don't
think there can be any serious objection to it these days. In that
sense, "me" is better here and I'd bet that most people would prefer it
over "I".

It's interesting to note that this conjunction/preposition distinction
sometimes makes a lot of difference. In Henry VI, Part III the king
says:

Then why should they love Edward more than me?

Had he said "than I" it would have meant something else entirely.
</OT>
 
Stephen Sprunk

Azazel said:
On 6/21/2011 8:49 AM, Stephen Sprunk wrote:
kids just a few years younger than I likely wouldn't ...

Kids a few years younger still wouldn't even notice the
misuse of the nominative case.

OK, I'll bite. :) What misuse? "Kids just a few years younger than I
[am]" is correct.

<OT>
If the verb were there, then "I" would be unquestionably correct.
Without it, opinion is divided though not, I'd venture to say, 50/50.

The use of "I" forces "than" to be seen as a conjunction rather than a
preposition, and for some people that's fine -- the sentence is just an
elliptical form of the version with the verb ("am") present.

That is what was taught in my schools: the verb is implied and therefore
the nominative (aka subjective) case should be used.
For others, it sounds stilted and old-fashioned. I'd say that calling
it "misuse" is stretching the point but, whichever way you see it,
I'd bet that the younger the kids the _more_ likely they would be to
spot it. The trend is definitely away from anything that sounds so
formal.

Using the accusative (aka objective) case is definitely more common, to
the consternation of English teachers across the land. I even use it
when the situation demands more informal language--but I do so with full
knowledge that it's "wrong". OTOH, most of my generation has no clue
how to speak or write formally when required.
Without a following clause that needs a subject (i.e. when the verb "am"
is missing) the use of "me" forces "than" to be seen as a preposition.
This is nothing new (it's been happening for centuries) and I don't
think there can be any serious objection to it these days. In that
sense, "me" is better here and I'd bet that most people would prefer it
over "I".

Cases in English have been declining (pun intended) for centuries, and
this last vestige will probably disappear in time as well.
It's interesting to note that this conjunction/preposition distinction
sometimes makes a lot of difference. In Henry VI, Part III the king
says:

Then why should they love Edward more than me?

Had he said "than I" it would have meant something else entirely.
</OT>

Yes; that's one of the few decent examples of why cases are important,
but since we've lost so much already, one could argue it's not worth
keeping them _only_ for pronouns. English's lack of consistency is one
of its biggest drawbacks.

S
 
blmblm

[ snip ]
These days people prefer to mess around for hours getting all the text
correctly lined up in their word processor. It's called progress :)

:) indeed -- but I *think* the jobs for which our support staff want
that typewriter involve actual paper forms for which a word processor
would not be a useful tool.
 
blmblm

(e-mail address removed) wrote:

[ snip ]
Suppose you had to fill in several dozen forms, or worse, several
hundred (this happened recently at my workplace).

They give you a bunch of paper forms, and then you have to fill them in
with data that is stored digitally somewhere, or that you first have to
store for later use (maybe more forms are on the way).

See?

*Oh* .... All is clear now; thanks.

I *think* -- though I could be wrong -- that the typical use case
for our support staff involves, hm, I'm not sure what the term is,
but the physical form is not a single sheet of paper but a stack
of sheets, such that writing/typing on the top sheet results in
the writing/typing being copied onto the other sheets as well.
If one needed to fill out many of these, then .... It would
certainly be tempting to put a bit of work into doing something
that would allow you to get the information directly from some
digital source; it's just not clear to me how you could print onto
the multicopy forms, and if you couldn't, whether a substitute
would be acceptable.

In the long term -- and probably not so long, really -- the people
who are still using those multicopy forms will probably replace
them with something more compatible with the digital world. In the
short term, one does what one can, I guess.
 
