Remove Characters from basic_string

Mike Copeland

I have data I need to normalize - it's "name" data. For example, I
have the following: "Watts, J.C." I wish to parse the "first name"
("J.C.") and adjust it to "JC". Essentially, I want to remove the
punctuation characters from the "first name" substring.
I've looked at the basic_string in C++, but I can't find the
functions that will _remove_ character(s) from a value. There are
operators that are helpful in finding the position of the character I
wish to remove, but I can't see how to actually do it. Please advise.
TIA
 
utab

Not perfect but can give you a start ;)

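#include <iostream>
#include <string>

using namespace std;

int main()
{
    string str("Watts, J.C.");
    string punctuation(".;:"); // can add more

    // Take everything after the last comma; with no comma, keep the whole string.
    string::size_type index = str.find_last_of(',');
    string name(index == string::npos ? str : str.substr(index + 1));

    // Drop the blanks that follow the comma.
    name.erase(0, name.find_first_not_of(' '));
    cout << name << endl; // prints "J.C."

    // Erase punctuation characters one at a time until none are left.
    string::size_type index_punctuation;
    while ((index_punctuation = name.find_first_of(punctuation)) != string::npos)
        name.erase(index_punctuation, 1);
    cout << name << endl; // prints "JC"
    return 0;
}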

Rgds,
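
For what it's worth, the loop above rescans the string from the start after every erase. A common alternative is the erase-remove idiom, which compacts the string in a single pass. The following is only a minimal sketch, assuming C++11 and assuming that "punctuation" means whatever std::ispunct reports in the default "C" locale. The name strip_punctuation is hypothetical and does not come from the code posted in this thread.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): returns a copy of s with
// every character that std::ispunct classifies as punctuation removed.
std::string strip_punctuation(std::string s)
{
    // remove_if shifts the kept characters to the front and returns the
    // new logical end; erase then trims the leftover tail.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::ispunct(c) != 0; }),
            s.end());
    return s;
}
```

With this, strip_punctuation("J.C.") gives "JC". Note that std::ispunct also matches hyphens and apostrophes, so "Tse-tung" would become "Tsetung"; if those must survive, test against an explicit character set instead.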
 
James Kanze

Aren't you missing the last name here? Or the [list of] titles?

More importantly, in this respect: the input isn't in the
required format. It should be "Tse-tung, M.", and "Windsor,
C.P.A.G". (I think. The original poster didn't really define
it precisely. But I don't think he allows titles.)
Missing the last name again?

Historical persons could be a general problem, since many of
them don't have last names (Clovis, Charlemagne, Alexander the
Great).
"Wrong" in what way?

Probably in many different ways:).
I am not arguing with your point that any software should be
tested. I am just making the point that the data the OP had
may have been required to be in a particular form, like
Christ, J.
Windsor, H.R.H.C.P.A.G
Sicilias, J.C.A.V.M.d.B.y.B.D.
You know how they write the names of rock groups, "Beatles,
The". There is no need to overcomplicate the OP's task.

Actually, I think that Andy has raised several important points:

The input format really does need to be specified more
precisely. While his examples obviously don't conform to
it, there are a lot of cases where it's not really clear
what should be accepted. Before writing a single line of
code, it's important to define what it should do.

The test suite must contain all of the "special" cases.

And probably the most important: think large. I have no real
problem with my name, but I consistently run into problems with
software requiring a telephone number in the format 3-3-4, or a
state or province in the address (including on forms with
pull-down menus for the country which include France, Germany,
etc.).
Speaking of getting it wrong, how many folks in the US actually
think that "Santa" is the first name of that Christmas gift
delivery dude, and "Claus" is his last name?...

In the United States, his first name is Santa, and his last name
Claus. Of course, if they still wrote it "Saint Nicolas", the
situation would be different.
 
Mike Copeland

This has been an interesting exchange, but most of it involves
pathological situations. However, I _do_ have to deal with a variety of
titles ("DR", "MD", "PHD", "III", etc.) - but there is no standard for
how they're presented. For example, I might encounter "SMITH MD, JOHN",
"SMITH, JOHN MD", "SMITH, DR. JOHN", etc.
Therefore, since I can't control the _input_ (which comes in via
simple event entry forms), I must (hopefully) make the best of a bad
situation. <sigh...>
Bottom line here, I'm looking to parse out the "JOHN" token, but the
normalization code is having to deal with _many_ exceptions. My intent
is to produce a list of "valid" first names and genders, so that the
data entry application can do some sort of validation editing on these
names. This is especially true because my application system is managing
entrants to endurance events (marathons, 10K races, etc.) - which must
distinguish between genders (and ages) in its scoring and other
processing. 8<{{
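
One way to approach the "JOHN" token for the three forms listed above ("SMITH MD, JOHN", "SMITH, JOHN MD", "SMITH, DR. JOHN") is to look only at the text after the comma, split it on whitespace, and skip anything that matches a list of known titles. This is just a sketch under stated assumptions: it assumes C++11, the name extract_first_name is hypothetical, and the title list contains only the four titles mentioned in the post, so real data would need a much longer list.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not from the thread): returns the first token after
// the comma that is not a known title, upper-cased and without punctuation.
// Returns an empty string if no such token exists.
std::string extract_first_name(const std::string& entry)
{
    // Assumed title list: only the examples given in the post.
    static const std::set<std::string> titles = {"DR", "MD", "PHD", "III"};

    // Everything after the first comma; the whole entry if there is none.
    const std::string::size_type comma = entry.find(',');
    std::istringstream rest(comma == std::string::npos ? entry
                                                       : entry.substr(comma + 1));

    std::string token;
    while (rest >> token) {
        // Drop punctuation, e.g. "DR." -> "DR" and "J.C." -> "JC".
        token.erase(std::remove_if(token.begin(), token.end(),
                                   [](unsigned char c) { return std::ispunct(c) != 0; }),
                    token.end());
        // Upper-case so the comparison with the title list ignores case.
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        if (!token.empty() && titles.count(token) == 0)
            return token;
    }
    return std::string();
}
```

All three forms above yield "JOHN", and "Watts, J.C." yields "JC". The sketch makes no attempt at the last-name side of the comma, and it would flatten a hyphenated first name such as "JEAN-PAUL" to "JEANPAUL", so it is a starting point rather than a complete answer.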
 
James Kanze

Mike Copeland wrote:
I have data I need to normalize - it's "name" data. For example, I
have the following: "Watts, J.C." I wish to (1) parse the "first
name" ("J.C.") and adjust it to "JC". Essentially, I want to remove
the punctuation characters from the "first name" substring.
I've looked at the basic_string in C++, but I can't find the
functions that will _remove_ character(s) from a value. There are
operators that are helpful in finding the position of the character I
wish to remove, but I can't see how to actually do it. Please advise.
TIA
Others have supplied you with C++ code, but can I ask you to feed it
some test cases, all rather well known people:
Mao Tse-tung
H.R.H Charles Philip Arthur George
Aren't you missing the last name here? Or the [list of]
titles?
More importantly, in this respect: the input isn't in the
required format. It should be "Tse-tung, M.", and "Windsor,
C.P.A.G". (I think. The original poster didn't really define
it precisely. But I don't think he allows titles.)
This has been an interesting exchange, but most of it involves
pathological situations.

What's pathological about the fact that my address doesn't
contain a "state or province"? The examples are perhaps a bit
extreme, but the basic problem is a fundamental one.
However, I _do_ have to deal with a variety of titles ("DR",
"MD", "PHD", "III", etc.) - but there is no standard for how
they're presented. For example, I might encounter "SMITH MD,
JOHN", "SMITH, JOHN MD", "SMITH, DR. JOHN", etc.
Therefore, since I can't control the _input_ (which comes in
via simple event entry forms), I must (hopefully) make the
best of a bad situation. <sigh...>

The forms have to be read somehow into the machine. Or else
they are already on line. In both cases, the input can be
controlled (but maybe not by you).
 
