Extracting a substring

cpp_weenie

Given a std::string of the form "default(N)" where N is an integer number of
any length (e.g. the literal string might be "default(243)"), what is the
quickest way to extract the characters representing the integer into another
std::string? In the example above, I'd want to end up with a std::string
whose value is "243".

The substrings "default(" and ")" are invariant - they're always present in
the string I have to work with. Furthermore, it is guaranteed that there
will be at least one numeric character between the open and close
parentheses, and it is also guaranteed that there will be nothing but
numeric characters between the open and close parentheses.

I'm looking for the shortest possible standard-compliant and safe way of
making this substring extraction. (Yes, "shortest" and "quickest" as I've
used them here are subjective measures, but I'm sure people will intuitively
get the idea of what I'm trying to accomplish.)
 
cpp_weenie

One thing missing from the original problem statement is that there can be
any number of characters after the closing parenthesis. These characters can
be anything (including digits). Sorry.
 
cpp_weenie

Well, to try and get the discussion started, here's the best I've come up
with:

Assume the original string is in str.

string new_str(str.begin()+8, str.begin()+str.find(')'));

Can anybody suggest an improvement on this?
 
Mike Wahler

#include <algorithm>
#include <iostream>
#include <string>

// between():
//
// Returns string found between first occurrence
// of 'ldelim' and the first occurrence of 'rdelim'
// that follows it
//
// If either delimiter not found, returns empty string
std::string between(const std::string& s,
                    char ldelim, char rdelim)
{
    // These must not be static: static locals are initialized only on
    // the first call, so later calls would reuse iterators into an
    // earlier (possibly destroyed) string.
    const std::string::const_iterator b(s.begin());
    const std::string::const_iterator e(s.end());

    std::string result;

    std::string::const_iterator lp(std::find(b, e, ldelim));
    if(lp != e)
    {
        ++lp;   // step past the left delimiter
        const std::string::const_iterator rp(std::find(lp, e, rdelim));
        if(rp != e)
            result = std::string(lp, rp);
    }

    return result;
}

int main()
{
    const char LD('(');
    const char RD(')');

    std::string s("default(123)abc");
    std::string b(between(s, LD, RD));
    std::cout << b << '\n';
    return 0;
}

Output:

123


Not thoroughly tested.

I'm not clear whether the presence of the string
"default" is required for validity of the found '('
character.

E.g. should

"abc(123)xyz" cause return of "123" or not?

The code above does.

HTH,
-Mike
 
cpp_weenie

Thanks Mike, a good general solution! To answer your question, the leading
prefix is indeed always precisely "default(".
 
Mike Wahler

cpp_weenie said:
Thanks Mike, a good general solution! To answer your question, the leading
prefix is indeed always precisely "default(".

Well, what I was asking is if there could be any strings
(e.g. due to invalid input) with an integer between parentheses
like that, that are *not* preceded by "default", and if so,
should they be rejected, or return the integer?

If you feel you've got what you need, never mind, I'm
just trying to point out that many times we don't take
all possible 'bad' inputs into account. :)
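
To make that concrete, here is a minimal sketch of a stricter version that rejects input unless it starts with exactly "default(" and has only digits before the ')'. The name extract_default and the bool-plus-out-parameter interface are my own assumptions for illustration, not something from the original requirements:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: extracts the digits from a string that must begin
// with "default(" and contain a closing ')'.
// Returns true and fills 'out' on success; returns false (leaving 'out'
// untouched) on malformed input.
bool extract_default(const std::string& s, std::string& out)
{
    const std::string prefix("default(");

    // Reject anything that does not begin with the exact prefix.
    if (s.compare(0, prefix.size(), prefix) != 0)
        return false;

    // Look for the closing parenthesis after the prefix.
    const std::string::size_type close = s.find(')', prefix.size());
    if (close == std::string::npos || close == prefix.size())
        return false;  // no ')' at all, or nothing between the parentheses

    // Every character between the parentheses must be a digit.
    for (std::string::size_type i = prefix.size(); i < close; ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i])))
            return false;

    out.assign(s, prefix.size(), close - prefix.size());
    return true;
}
```

With this, "default(123)abc" yields "123", while "abc(123)xyz", "default()" and "default(12a)" are all rejected. Whether rejecting is the right policy depends on what you want to happen with bad input.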

-Mike
 
Jerry Coffin

The easiest way would probably be using more or less anachronistic code:

char output[256];
sscanf(input.c_str(), "default(%255[0123456789])", output);
std::string real_output(output);

I don't see any way to get anything that neat and clean using code that
most people would think of as being "real" C++, but I guess something
like this should work:

#include <string>
#include <iostream>

using std::string;

string get_default(string input, string open, string close) {
    std::string::size_type start = input.find(open)+open.length();
    std::string::size_type end = input.find(close, start);

    return std::string(input, start, end-start);
}

used something like this:

std::string test_input("(bad output) default(1234567)a2w");

std::string default_value = get_default(test_input, "default(", ")");

If you're sure "default(" and ")" are truly constant and you'll never
want to use other delimiters, you could eliminate them as parameters:

std::string get_default(std::string input) {
    std::string open("default(");
    std::string close(")");

    std::string::size_type start = input.find(open)+open.length();
    std::string::size_type end = input.find(close, start);

    return std::string(input, start, end-start);
}

in which case you'd use it like:

std::string default_value = get_default(test_input);

You could certainly get shorter than these, but I doubt you could do so
to any substantial degree (then again, they're already short enough that
there really IS no substantial degree of "shorter").

Speed depends: if you expect to scan across a LOT of text looking for
"default(", (e.g. there might be 100K of garbage before "default(" )
then there are various advanced string searching algorithms that would
probably help performance (theoretically, std::string::find could
already use them, but I've yet to see that done). I doubt that applies
here though, so I won't try to go into more detail, at least for now.
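
For anyone who does end up in the long-input case, here is a rough sketch of one such algorithm, assuming a C++17 compiler: the standard library there provides std::boyer_moore_horspool_searcher, which can be handed to std::search. The function name get_default_bmh is made up for this example, and whether it actually beats std::string::find on your data is something you'd have to measure:

```cpp
#include <algorithm>
#include <functional>
#include <string>

// Hypothetical helper: like get_default(), but locates the opening marker
// with a Boyer-Moore-Horspool search (requires C++17).
// Returns an empty string if either delimiter is missing.
std::string get_default_bmh(const std::string& input)
{
    const std::string open("default(");

    // The searcher preprocesses 'open' once; std::search then uses it to
    // skip ahead in 'input' instead of testing every position.
    const auto hit = std::search(
        input.begin(), input.end(),
        std::boyer_moore_horspool_searcher(open.begin(), open.end()));
    if (hit == input.end())
        return std::string();

    // The digits start right after the marker and run up to the next ')'.
    const auto start = hit + open.size();
    const auto close = std::find(start, input.end(), ')');
    if (close == input.end())
        return std::string();

    return std::string(start, close);
}
```

Unlike the versions above, this one returns an empty string when "default(" or ")" is missing rather than relying on the delimiters being present.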
 
Peestrus

cpp_weenie said:
Well, to try and get the discussion started, here's the best I've come up
with:

Assume the original string is in str.

string new_str(str.begin()+8, str.begin()+str.find(')'));

Can anybody suggest an improvement on this?

Yes:

string x ("default(123)xyz");
string digits (x, 8, x.find (')', 9) - 8);

This saves 9 comparisons in the find() compared to your method :)

Also I have a feeling that iterators and c_str() are potentially less
efficient than a direct constructor call (if say, the underlying
string buffer is not contiguous).

Davlet.
 
