regex_replace()


Friedel Jantzen

Hi!
Using MSVS 2008, STL TR1 <regex>.

Is there a way to get to know the number of replacements done by
regex_replace(), or at least, whether there was a replacement at all?
OK, I can compare if(output != input) after regex_replace(), but that wastes
performance if there is a better way.

If a regex_error is thrown, how can I get the position of the error in the
regular expression string?

I found that icase (ignore case) only works with A-Za-z, but not with, e.g.,
German umlauts (Ää etc.), even though I set my German user locale using
regex::imbue(). Am I missing something?

TIA,
Friedel
 

Michael Doubez

Hi!
Using MSVS 2008, STL TR1 <regex>.

Is there a way to get to know the number of replacements done by
regex_replace(), or at least, whether there was a replacement at all?
OK, I can compare if(output != input) after regex_replace(), but that wastes
performance if there is a better way.

You can roll your own: regex_replace basically instantiates a
regex_iterator from its parameters and performs the replacement. It
shouldn't be too hard.
If a regex_error is thrown, how can I get the position of the error in the
regular expression string?

AFAICS you cannot, and POSIX regcomp doesn't give more information
either.
You will need a regex format validator.
I found that icase (ignore case) only works with A-Za-z, but not with, e.g.,
German umlauts (Ää etc.), even though I set my German user locale using
regex::imbue(). Am I missing something?

This may not be implemented or handled correctly by the compiler.
 

Juha Nieminen

Michael Doubez said:
You will need a regex format validator.

You can use a regexp to validate a regexp string. There would be a
marvelous conceptual recursion there... :)
 

Michael Doubez

  You can use a regexp to validate a regexp string. There would be a
marvelous conceptual recursion there... :)

I hesitated to make the joke, but IMO regex grammar is not powerful
enough to validate a regex.

I actually tried to find one available on the net in C or C++ but,
strangely, none seem readily available (I didn't look too hard, just
googled a bit).
 

Alain Ketterlin

Michael Doubez said:
I hesitated to make the joke, but IMO regex grammar is not powerful
enough to validate a regex.

You're right, regular expression languages are not regular.

-- Alain.
 

Friedel Jantzen

You can use a regexp to validate a regexp string. There would be a
marvelous conceptual recursion there... :)

:)
A homespun validator might not report the same error position as the
engine would.

Friedel
 

Friedel Jantzen

Thank you for your reply!
....
You can roll your own: regex_replace basically instantiates a
regex_iterator from its parameters and performs the replacement. It
shouldn't be too hard.

I wrote test code to do this, but as STL regex is new to me, I thought I
might have missed something and be reinventing the wheel.
....
AFAICS you cannot, and POSIX regcomp doesn't give more information
either.
You will need a regex format validator.
:)


This may not be implemented or handled correctly by the compiler.

Yes, it looks somewhat "premature" to me.

Thank you,
Friedel
 

Ralf Goertz

Michael Doubez wrote:

You could try toupper/tolower with your locale and see if it works on the
umlaut (and the eszett :) ).

I was about to tell you that there is no uppercase "ß". But then I
noticed the smiley which made me think that you knew. So I will refrain
from telling you.
 

Lasse Reichstein Nielsen

Juha Nieminen said:
You can use a regexp to validate a regexp string. There would be a
marvelous conceptual recursion there... :)

Nope, at least not by itself. The language of regexps is not itself regular.
I don't know the exact details of TR1 regexps, but I doubt they can
check for matched parentheses.

/L
 

Friedel Jantzen

On Wed, 11 May 2011 02:09:34 -0700 (PDT), Michael Doubez wrote:
....
You could try toupper/tolower with your locale and see if it works on the
umlaut (and the eszett :) ).

Thank you for this hint.

cout << "User locale: " << locale("").name() << endl;//German_Germany.1252
setlocale(LC_ALL, "");

toupper() result:
toupper('ö', locale("")) == 'Ö'
(but toupper('ö') != 'Ö')

Replacing:
regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
string rxStr = "ö";
string replStr = "oe";
string input("Schönes Österreich");
regex rx;
rx.imbue(locale(""));
rx.assign(rxStr, rxFlags);
string output = regex_replace(input, rx, replStr);
// output == "Schoenes Österreich" --> capital Ö NOT replaced

I wonder whether it works on, e.g., a French system with something like é and É?

Regards,
Friedel
 

Ralf Goertz

Friedel said:
On Wed, 11 May 2011 02:09:34 -0700 (PDT), Michael Doubez wrote:
...

// output == "Schoenes Österreich" --> capital Ö NOT replaced

I wonder whether it works on, e.g., a French system with something like é and É?

If you use wstrings it should work (except for the toupper without a
locale argument). Here I used Boost under Linux:


#include <cctype>
#include <clocale>
#include <iostream>
#include <locale>
#include <string>
#include <boost/regex.hpp>

using namespace std;
using namespace boost;

int main() {
    ios::sync_with_stdio(false);
    cout << "User locale: " << locale("").name() << endl;
    setlocale(LC_ALL, "");
    wcout.imbue(locale(""));

    // The unescaped "" inside this label splits it into two concatenated
    // literals, which is why the output below shows "locale()".
    wcout<<L"toupper('ö', locale("")) == 'Ö': "<<boolalpha<<(toupper(L'ö',
        locale(""))==L'Ö')<<endl;
    wcout<<L"toupper('ö')==Ö: "<<boolalpha<<(toupper(L'ö')==L'Ö')<<endl;
    regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
    wstring rxStr = L"ö";
    wstring replStr = L"oe";
    wstring input(L"Schönes Österreich");
    wregex rx;
    rx.imbue(locale(""));
    rx.assign(rxStr, rxFlags);
    wstring output = regex_replace(input, rx, replStr);
    wcout<<input<<L" -> "<<output<<endl;
}

output:

User locale: de_DE.UTF-8
toupper('ö', locale()) == 'Ö': true
toupper('ö')==Ö: false
Schönes Österreich -> Schoenes oesterreich
 

Michael Doubez

On Wed, 11 May 2011 02:09:34 -0700 (PDT), Michael Doubez wrote:
...


toupper() result:
toupper('ö', locale("")) == 'Ö'
(but toupper('ö') != 'Ö')

// output == "Schoenes Österreich" --> capital Ö NOT replaced

I wonder whether it works on, e.g., a French system with something like é and É?

It works well enough on gcc version 4.3.3:

std::locale loc("");
std::cout<<"User locale: " << loc.name() << std::endl;
char const str[] = "àäâéèêëïîöôüû";
std::cout<<str<<std::endl;
for( char const * it = str; *it ; ++it )
{
    std::cout<<toupper(*it, loc);
}
std::cout<<std::endl;

Output:
User locale: fr_FR
àäâéèêëïîöôüû
ÀÄÂÉÈÊËÏÎÖÔÜÛ

The German locale is not installed on my system, so I couldn't try it.
 

Michael Doubez

It works well enough on gcc version 4.3.3:
[snip]

Oops, you were talking about regex. Well, I don't have a recent
compiler on this machine (and no admin rights), so I cannot test it
right now.
 

Friedel Jantzen

Thank you for testing.

On Thu, 12 May 2011 10:12:20 +0200, Ralf Goertz wrote:
...
If you use wstrings it should work (except for the toupper without
locale specification). Here I used boost under linux:
...

output:

User locale: de_DE.UTF-8
toupper('ö', locale()) == 'Ö': true
toupper('ö')==Ö: false
Schönes Österreich -> Schoenes oesterreich

Compiled with MS VS2008, on Windows Vista, the output is:

User locale: German_Germany.1252
toupper('ö', locale("")) == 'Ö': true
toupper('ö')==Ö: true
Schönes Österreich -> Schoenes Österreich

It looks like with this regex implementation (AFAIK MS licensed it from
Dinkumware), icase does not work with wstring either.
Interestingly, toupper('ö')==Ö is true here.

Regards,
Friedel
 

Jorgen Grahn

AFAICS you cannot, and POSIX regcomp doesn't give more information
either.
You will need a regex format validator.

POSIX gives you *something* using regerror(3); I assume it's more than
"your regexp is broken" but less than "the problem is the backslash in
position 42".

Regexps are best used hard-coded anyway, rather than generated on the
fly or (worse) generated from user input. So this is usually not a
major problem.

/Jorgen
 

James Kanze

On Wed, 11 May 2011 02:09:34 -0700 (PDT), Michael Doubez wrote:
...
// output == "Schoenes Österreich" --> capital Ö NOT replaced

This one's tricky. It's why Unicode introduced title case: if
you ever really wanted to do this, what you'd want to get would
be: "Schoenes Oesterreich". Not sure what that might mean in
the context of regular expressions, however; you'd probably want
a flag stating whether substitution should use a) the case of
the original, b) title case if the original was upper case, or
c) context-sensitive title case.
 
