Is this String class properly implemented?

Richard Herring

Tony said:
Richard said:
Tony said:
Richard Herring wrote:
In message <[email protected]>, Tony
Jerry Coffin wrote: [...]
The English alphabet has 26 characters. No more, no less.

Unfortunately statements like this weaken your point. By any
reasonable measure, the English alphabet contains at least 26
characters (upper and lower case).

Fine, upper and lower case then. But no umlauts or accent marks!

How naïve. My _English_ dictionary includes déjà vu, gâteau and many
other words with diacritics.

And how many variable names do you create with those foreign glyphs?
Hmm?

Who cares? I'm merely providing a counterexample to your sweeping
claim that the English alphabet has exactly 26 characters. Or even 52.

I meant letters, not characters.

That doesn't help you, since you need more than just those 26 or 52
letters to represent English words.
It should be obvious from the CONTEXT ("eye
on the ball" people!) that was what I meant.

It's irrelevant, since the real CONTEXT is not how many there are, but
whether you can write English with them.
Perhaps you are trying opportunistically(?) to imply something different.

No, I'm not making a pedantic point about the difference between
"letter" and "character". Surely it should be obvious that I'm simply
(re-)stating the fact that ASCII's repertoire is insufficient to
represent even English.
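
For what it's worth, the repertoire claim is easy to check mechanically. Below is a minimal sketch in Python; the helper name `encodable` and the list of sample words are illustrative assumptions (the words are ones that come up in this thread), not anything a poster wrote. It only asks whether each word survives encoding to ASCII, ISO 8859-1 and UTF-8.

```python
def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample words, written with escapes so the source file itself stays ASCII:
# naive, naive with a diaeresis, manoeuvre with the oe ligature,
# Encyclopaedia with the ae ligature.
WORDS = ["naive", "na\u00efve", "man\u0153uvre", "Encyclop\u00e6dia"]

for word in WORDS:
    support = {enc: encodable(word, enc) for enc in ("ascii", "latin-1", "utf-8")}
    print(word, support)
```

In this sketch only the unaccented spelling fits in ASCII. The ï and æ spellings fit in ISO 8859-1, while the œ ligature needs a larger repertoire such as Unicode, because œ is not part of ISO 8859-1.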
 
Richard Herring

Tony said:
I used "Pure English" as that by which is
made up of only the 26 letters of the English alphabet.

"Pure English" is a language only spoken by True Scotsmen [tm].
 
Tony

Richard Herring said:
Tony said:
Richard said:
In message <[email protected]>, Tony
Richard Herring wrote:
In message <[email protected]>, Tony
Jerry Coffin wrote: [...]
The English alphabet has 26 characters. No more, no less.

Unfortunately statements like this weaken your point. By any
reasonable measure, the English alphabet contains at least 26
characters (upper and lower case).

Fine, upper and lower case then. But no umlauts or accent marks!

How naïve. My _English_ dictionary includes déjà vu, gâteau and many
other words with diacritics.

And how many variable names do you create with those foreign glyphs?
Hmm?

Who cares? I'm merely providing a counterexample to your sweeping
claim that the English alphabet has exactly 26 characters. Or even 52.

I meant letters, not characters.

That doesn't help you, since you need more than just those 26 or 52
letters to represent English words.

That's a strawman that conveniently avoids any context; The English alphabet
has exactly 26 letters.
It's irrelevant, since the real CONTEXT is not how many there are, but
whether you can write English with them.

No, you are wrong: the context is the context, not some contrived generality
you expect some dummy to believe.
No, I'm not making a pedantic point about the difference between "letter"
and "character".

Well you failed miserably then because you didn't say anything to that
effect: I did.
Surely it should be obvious that I'm simply (re-)stating the fact that
ASCII's repertoire is insufficient to represent even English.

ASCII is largely adequate: the English alphabet has 26 letters. I'm not
worried about the few unnaturalized foreign words that make it into
Webster's dictionary that have diacritics.
 
Tony

James said:
And I'm simply pointing out that that is false.

I don't believe you. Cuz I've been here off-and-on for years reading posts
and rarely do I find a non-ASCII character in a post (save for those
obsolete taglines).
Even in this
group, I sometimes have problems with postings, because the
installed fonts on my machines at work only support ISO 8859-1.

And I have to use OE-QuoteFix to respond to YOUR posts. But 26 letters are
still just 26 letters. (Of course 10 digits are understood also).
(At home, I use UTF-8, and everything works.) Which doesn't
have things like opening and closing quotes.

I agree: you foreigners are messing things up. ;)
But that's never the case for mine.

You mean your tagline? I think I may be noticing a trend toward being nice
and dropping those: even I hardly sign my posts anymore (cuz it's stupid:
the newsreader will tell you who the post is from if you wanna know).
And I see quite a few
others as well where it's not the case. Even in English
language groups like this one.

You're talking about standard encoding designations and I'm simply talking
about the best language to program in and to program to.
I'm not sure what you mean by "it's not English".

It's not English because English has only 26 letters, without diacritics.
"Naïve" is a
perfectly good English word.

The naturalized word 'naive' has been accepted into the English language but
the way you encoded it is still a foreign word.
And English uses quotes and dashes
(which aren't available even in ISO 8859-1)

You mean like dash as a separate character from minus?
and other various
symbols like § not available in ASCII in its punctuation.

Symbols are not word elements. The code page concept is symbols.
Not
to mention that a lot of groups handle mathematical topics, and
mathematics uses a lot of special symbols.

Separate code pages.
And of course, not all groups use (only) English.

That of course is ignoring the context: it is a strawman argument (at best,
but surely it is just propaganda).
It has nothing to do with unnaturalized words (and I don't see
where "naïve" is unnaturalized). It has to do with recognizing
reality.

Reality is that 'naive' is a naturalized English word and your encoding is a
foreign word: it has everything to do with naturalization.
And what does the number of letters have to do with it?

Everything: I program in a spoken language and a programming language. I
chose my targets or at least know them: that is the context of the software
development.
French
also has only 26 letters.

That's misleading: French has diacritics, English does not.
You still put accents on some of
them, and you still use punctuation.

Strawman. You're trying to make a case for hieroglyphics as relevant. And to
me, if you want: I'm intuitive and like abstractions, but in a programming
paradigm, I don't want it wasting my life.
[...]
'naive' has been naturalized into the English language and
does not have/does not require (unless one feels romantic?) an
accent. You were taught French, not English.

Merriam-Webster disagrees with you.

Ah! I mentioned Webster long ago in this thread and discounted any
relevance: but you grasp onto that because that is all you have:
cutting-edge colloquialism as definition of the English language. And you're
wrong big-time for all perspectives including the most important one in this
NG: engineering practicality.
My point is that software should be usable.

I don't believe that that is your point at all: you have agenda, IMI (In My
Intuition).
[---]
Don't get into politics, cuz you suck at it. Life is too short
to get bogged down in Unicode just because a trivial few feel
that English should be bastardized with unnaturalized ideas
like 'naive' with a diacritic.

Or quotes. Or dashes.

Separate issue. Degree.
Or any number of other things.

Well why don't you list and number them (for progeny).
And that
"trivial few" includes the authors of all of the major
dictionaries I have access to.

Dictionaries are of course political things. Your dictionary defense is
quite bizarre. It's akin to offering hieroglyphics as an argument: lame.
If you don't know English well, that's your problem.

You mean if I don't want to accept bastardization/perversion it's my
problem.
[...]
Simplify your life: use English (for SW dev at least)!

If you've ever tried to understand English written by a
non-native speaker, you'll realize that it's much simpler to let
them use French (or German, when I worked there).

Exceptional case.
Communication
is an important part of software engineering, and communication
is vastly improved if people can use their native language.

Strawman/propaganda.
 
Richard Herring

Tony said:
Richard Herring said:
Tony said:
Richard Herring wrote:
In message <[email protected]>, Tony
Richard Herring wrote:
In message <[email protected]>, Tony
Jerry Coffin wrote: [...]
The English alphabet has 26 characters. No more, no less.

Unfortunately statements like this weaken your point. By any
reasonable measure, the English alphabet contains at least 26
characters (upper and lower case).

Fine, upper and lower case then. But no umlauts or accent marks!

How naïve. My _English_ dictionary includes déjà vu, gâteau and many
other words with diacritics.

And how many variable names do you create with those foreign glyphs?
Hmm?

Who cares? I'm merely providing a counterexample to your sweeping
claim that the English alphabet has exactly 26 characters. Or even 52.

I meant letters, not characters.

That doesn't help you, since you need more than just those 26 or 52
letters to represent English words.

That's a strawman

I think you need to check the definition of "strawman".
that conveniently avoids any context; The English alphabet
has exactly 26 letters.

(And the Welsh alphabet has 28, despite lacking J, K, Q, V, X, Z). So
what? 26 letters alone are not sufficient for writing English.
No, you are wrong: the context is the context, not some contrived generality
you expect some dummy to believe.

Nor is it what you want to redefine it to be, as any "dummy" can
discover by simply reading the thread. "7-bit ASCII is your friend".
Well you failed miserably then because you didn't say anything to that
effect: I did.

Eh? I failed miserably to say something I wasn't trying to say?
ASCII is largely adequate:

Largely. Thank you for that concession.
the English alphabet has 26 letters.

So you keep telling us.
I'm not
worried about the few unnaturalized foreign words that make it into
Webster's dictionary that have diacritics.
Fine; that's your choice. And if the customers for your software are
equally not worried that it can't cope with such words, that's even more
fine.

But _you_ don't get to define what's "unnaturalized", "foreign" or
"Pure English".

http://en.wikipedia.org/wiki/No_true_Scotsman
 
James Kanze

I don't believe you.

It's easy enough to verify. I often have problems with postings
because they contain characters which aren't present in ISO
8859-1 (which are the only encodings for which fonts are
installed on my machines at work).

[...]
I agree: you foreigners are messing things up. ;)

Opening and closing quotes are part of English. At least, part
of the English used by people who've gotten beyond kindergarten.
You mean your tagline?

I don't have a "tagline". In fact, I don't know what you mean
by a "tagline". My .sig uses accented characters, because it
contains my address. I'll also occasionally use characters
outside of the 96 basic characters in the body of my postings:
things like a section reference (§) when quoting the standard,
for example, or a non-breaking space.

If I had UTF-8 everywhere, I'd also quote correctly.

[...]
It's not English because English has only 26 letters, without
diacritics.

So the Merriam Webster Dictionary is not English (since it
contains diacritics on some words, and uses opening and closing
quotes, and a lot of other characters other than the 26
letters).
The naturalized word 'naive' has been accepted into the
English language but the way you encoded it is still a foreign
word.

Not according to Merriam Webster. But of course, you know more
about English than the standard dictionaries.
You mean like dash as a separate character from minus?

A minus sign, a hyphen, an n-dash and an m-dash are four
separate characters. Because I don't have the dashes in ISO
8859-1, I simulate them with -- and ---, but it's really a hack.
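
To make the "separate characters" point concrete, here is a small Python sketch. The table `DASHES` and the helpers `in_ascii` and `in_latin1` are hypothetical names introduced for illustration; the code points and names come from Python's standard `unicodedata` module. It lists the dash-like characters and whether each falls inside ASCII or ISO 8859-1.

```python
import unicodedata

# Dash-like characters, including the ASCII hyphen-minus that
# usually ends up standing in for all the others.
DASHES = {
    "hyphen-minus": "\u002d",
    "hyphen": "\u2010",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "minus sign": "\u2212",
}


def in_ascii(ch):
    """True if the character is in the 7-bit ASCII range."""
    return ord(ch) < 0x80


def in_latin1(ch):
    """True if the character is in ISO 8859-1 (code points below 0x100)."""
    return ord(ch) < 0x100


for label, ch in DASHES.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<12} "
          f"ascii={in_ascii(ch)} latin1={in_latin1(ch)}")
```

Only the hyphen-minus is available in ASCII or ISO 8859-1, which is why the `--` and `---` substitutes are needed there.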
Symbols are not word elements. The code page concept is
symbols.

Nor are blanks. Are you saying that the encoding shouldn't
support blanks either?
Separate code pages.

What the hell is a "code page"?
Reality is that 'naive' is a naturalized English word and your
encoding is a foreign word:

Not according to any of the dictionaries I've consulted. All
give "naïve" as a perfectly correct, native American English
spelling.
Everything: I program in a spoken language and a programming
language. I chose my targets or at least know them: that is
the context of the software development.

The context of software development is that each programming
language defines a set of characters it accepts. Fortran used
the least, I believe---it was designed so that you could get six
6 bit characters in a word. C and C++ require close to a
million.
That's misleading: French has diacritics, English does not.

Your talk about letters is what is misleading. I'm just
pointing out that it's irrelevant.
[...]
'naive' has been naturalized into the English language and
does not have/does not require (unless one feels romantic?)
an accent. You were taught French, not English.
Merriam-Webster disagrees with you.
Ah! I mentioned Webster long ago in this thread and discounted
any relevance:

Merriam-Webster is irrelevant to what is correct American
English use?
[---]
If you don't know English well, that's your problem.
You mean if I don't want to accept bastardization/perversion
it's my problem.

I mean that if you don't want to accept generally accepted,
standard usage, it's your problem. A serious one, at that,
symptomatic of a serious social maladjustment.
[...]
I have to, because my comments where I work now have to be in
French, and French without accents is incomprehensible. The
need is less frequent in English, but it does occur.
Simplify your life: use English (for SW dev at least)!
If you've ever tried to understand English written by a
non-native speaker, you'll realize that it's much simpler to let
them use French (or German, when I worked there).
Exceptional case.

Native English speakers represent less than 5% of the world's
population, which means that being a native English speaker is
the exceptional case.
 
Richard Herring

In message
Not according to Merriam Webster. But of course, you know more
about English than the standard dictionaries.

And FWIW the standard British-English dictionaries agree with M-W on
this.
 
Alf P. Steinbach

* James Kanze:
So the Merriam Webster Dictionary is not English (since it
contains diacritics on some words, and uses opening and closing
quotes, and a lot of other characters other than the 26
letters).

I always found it a bit amusing that the English alphabet officially has only A
through Z, but that the language contains words like "mæneuver". :) And no, not
making that up. I last encountered that last week, reading Jack London's "White
Fang", I think it was (if it wasn't the other dog book).


Cheers,

- Alf
 
Richard Herring

Alf P. Steinbach said:
* James Kanze:

I always found it a bit amusing that the English alphabet officially
has only A through Z, but that the language contains words like
"mæneuver".

ITYM "manœuvre". HTH.
 
Alf P. Steinbach

* Richard Herring:
ITYM "manœuvre". HTH.

Thanks, possibly. But as I recall the speling in Jack London's novel started
with "mæ". Someone borrowed that book though, and I'm too lazy to check out
Oxford's or Merriam Webster (as I recall it's not in all dictionaries).

Cheers,

- Alf
 
Richard Herring

Alf P. Steinbach said:
* Richard Herring:

Thanks, possibly. But as I recall the speling in Jack London's novel
started with "mæ". Someone borrowed that book though, and I'm too lazy
to check out Oxford's or Merriam Webster (as I recall it's not in all
dictionaries).

I'd be surprised if it's in _any_.

"White Fang" at http://www.gutenberg.org/files/910/910.txt and "The
Call of the Wild" at http://www.gutenberg.org/files/215/215.txt each
contain one instance of "manoeuvre" and one "manoeuvred". Neither has
any words beginning with "mae".
 
Richard Herring

Alf P. Steinbach said:
* Richard Herring:

My edition is a paperback, I think Penguin. It has "æ".

You think. But by your own admission you lent it out and can't check...

Google's counter, for what that's worth, estimates ~400 "results" for
"maeneuver", many obviously from the same source. Compare that with ~4
million for "manœuvre" (in both cases, it doesn't seem to care whether
you type the ligature or separate letters).

I'd say that was entirely compatible with people remembering that the
word has a ligature, but not remembering which pairs of letters should
be joined.
 
Alf P. Steinbach

* Richard Herring:
You think. But by your own admission you lent it out and can't check...

What's the point of an insinuation like that?

I have not expressed any doubt about whether the book uses the 'æ' spelling, and
indicating otherwise is just dishonest (i.e., you are, above): I was not making
a touchy-feely think-that-perhaps-it-was-like-that /argument/, as you insinuate;
I was just reporting a *fact*.

I think the book edition I have is published by Penguin.

That printed book uses 'æ', while the online text you've found apparently
doesn't, presumably because it's ASCII text (note: ASCII doesn't have 'æ').

The word 'mæneuver', with 'æ', modulo speling, is in at least one main English
dictionary.

Google's counter, for what that's worth, estimates ~400 "results" for
"maeneuver", many obviously from the same source. Compare that with ~4
million for "manœuvre" (in both cases, it doesn't seem to care whether
you type the ligature or separate letters).

There's also probably a difference between British English and US English.

I'd say that was entirely compatible with people remembering that the
word has a ligature, but not remembering which pairs of letters should
be joined.

It's my impression that the old (original?) spelling used 'æ', but anyways, I
can't recall ever seeing the word spelled with 'œ'.


Cheers,

- Alf
 
Alf P. Steinbach

* Alf P. Steinbach:
* Richard Herring:

What's the point of an insinuation like that?

Wait a minute, sorry.

I was in wrong frame of mind because I very recently yet again had a certain guy
attempting to stick needles in my back so to speak. I don't understand that he
never learns but he doesn't, and I get sort of upset by having to punch him down
again and again. And then for a while, after such debacle, I feel very
suspicious about anything that might look like needles being waved behind me...

It may be that you're right, that as a Norwegian (we have æøå but no œ) I've
consistently misread an œ as a Norwegian æ.

Could be! :)

But anyways, the point was that the official alphabet of English, A through Z,
isn't sufficient to express all valid spellings of all English words...


Cheers & hth.,

- Alf
 
James Kanze

* Richard Herring:
Thanks, possibly. But as I recall the speling in Jack London's
novel started with "mæ". Someone borrowed that book though,
and I'm too lazy to check out Oxford's or Merriam Webster (as
I recall it's not in all dictionaries).

The word maneuver should be in any American English dictionary.
The American Heritage Dictionary also lists manoeuvre as a
"chiefly British variant". The word in French from which the
English derives is "manœuvre"; in this case, this spelling is
not (I believe) acceptable in the US, but it wouldn't surprise
me if it were acceptable, or even the preferred spelling, in
Great Britain.

If you saw mæneuver, it was a typo. (Such things do occur---I
remember one case where the typesetter went to extreme pains to
put a cedilla on the c in the French city of Mâcon. Which, of
course, has no cedilla on the c, but does have an accent
circumflex on the a, which the typesetter missed.)

But it's still the Encyclopædia Britannica (which despite the
name, is published in Chicago).

All of which begs the question: what is a letter? (TeX provides
character encodings for various other ligatures, like fl or fi,
for example.) The French "standards" also speaks of 26 letters,
but not only has accents, but an obligatory œ (in many everyday
words, like cœur), which is in opposition to "oe" (in other
words, like coefficient). Both German and French collate
accented characters as if they were unaccented (up to a certain
point, at least), but Swedish (and maybe Norwegian) consider
them completely different letters, appended to the end of the
alphabet (and Spanish treats ll as if it were a different letter
than l, collating it after la-lz).
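
The collation difference can be sketched with two toy sort keys in Python. This is an illustration under stated assumptions, not real locale support: `accent_insensitive_key`, `swedish_like_key`, `SWEDISH_ALPHABET` and the sample words are all hypothetical, and the rules are heavily simplified. Production code would normally rely on locale data or an implementation of the Unicode Collation Algorithm.

```python
import unicodedata


def strip_accents(s):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def accent_insensitive_key(word):
    """German/French-like rule: compare as if unaccented; the original
    spelling only breaks ties."""
    return (strip_accents(word.lower()), word)


# Swedish-like rule: a-ring, a-umlaut and o-umlaut are separate letters
# placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"


def swedish_like_key(word):
    """Rank each letter by its position in the alphabet above.
    Raises ValueError for characters outside that alphabet."""
    return [SWEDISH_ALPHABET.index(c) for c in word.lower()]


# Sample strings: zebra, a-umlaut + rm, arm, o-umlaut + l, ost.
words = ["zebra", "\u00e4rm", "arm", "\u00f6l", "ost"]
print(sorted(words, key=accent_insensitive_key))
print(sorted(words, key=swedish_like_key))
```

With the first key the accented words sort next to their unaccented neighbours; with the second they move behind "zebra", to the end of the list.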
 
Richard Herring

Alf P. Steinbach said:
* Alf P. Steinbach:

Wait a minute, sorry.

OK. No insinuation intended, just clarifying the difference between
recollection and hard fact.
I was in wrong frame of mind because I very recently yet again had a
certain guy attempting to stick needles in my back so to speak. I don't
understand that he never learns but he doesn't, and I get sort of upset
by having to punch him down again and again. And then for a while,
after such debacle, I feel very suspicious about anything that might
look like needles being waved behind me...

It may be that you're right, that as a Norwegian (we have æøå but no
œ) I've consistently misread an œ as a Norwegian æ.

Could be! :)

But anyways, the point was that the official alphabet of English, A
through Z, isn't sufficient to express all valid spellings of all
English words...

Indeed. Another example is the use in some English words of a diaeresis
to mark a syllable break, e.g. "coöperate" instead of "co-operate".
 
Richard Herring

In message
All of which begs the question: what is a letter? (TeX provides
character encodings for various other ligatures, like fl or fi,
for example.) The French "standards" also speaks of 26 letters,
but not only has accents, but an obligatory œ (in many everyday
words, like cœur), which is in opposition to "oe" (in other
words, like coefficient). Both German and French collate
accented characters as if they were unaccented (up to a certain
point, at least), but Swedish (and maybe Norwegian) consider
them completely different letters, appended to the end of the
alphabet (and Spanish treats ll as if it were a different letter
than l, collating it after la-lz).
And any Welshman will tell you that his alphabet has 28 letters, despite
omitting J, K, Q, V, X, Z, because he collates ch, dd, ff, ll, ng, rh and
th as separate letters. And Hungarian has 40, which include gy, ly, ty
but not y itself (or 44 if you include the QWXY needed for foreign
words)...
 
