Remove unwanted text with regular expressions?

Jimi Hullegård

Hi

What is the easiest way to strip everything unwanted from a piece of text?
Let's say I want

[abc {options}]

where the starting "[", the "abc" and the ending "]" are required, and the
{options} are optional, and can be any combination (and any order) of the
following:

height=n (that is, any number)
width=n{%} (that is, either a regular number, or a percentage)
color=#nnnnnn | color-name (that is, either a hexadecimal color code
or one of a few predefined color names)

Also, each option value may be enclosed in double quotes ("). All other text
inside these tags should be removed.

examples:

[abc bla bla] should become [abc]
 
Jimi Hullegård

I accidentally pressed send before I was finished... sorry...

examples:

[abc bla bla] should become [abc]
[abc color=red bla bla] should become [abc color="red"]
[abc bla size=50 bla] should become [abc size="50"]
[abc bla="test" width="30%" blabla height=30] should become
[abc width="30%" height="30"]
[abc bla="test" color="#ffff00" width="30%" blabla height=30] should become
[abc color="#ffff00" width="30%" height="30"]

Can someone help me do this with Java and regular expressions?
Is it possible with a single String.replaceAll() call?

Regards
/Jimi
 
Andrew Thompson

<wild ass guess>
I do not feel this problem can be solved with either
simple string replacement or regexes. I suspect you
need a DOM parser for this kind of processing.
</wild ass guess>
 
Roedy Green

[abc bla bla] should become [abc]

See http://mindprod.com/jgloss/regex.html

Use regexes to find the next [...] chunk, but not to do any detailed
analysis of the contents.

Then use the matching features to extract the good stuff from the
candidate chunk.

Then append everything before the [ to a StringBuilder,
then append the good stuff you extracted, suitably decorated, and
repeat.
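
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was actually used in this thread: the class name TagCleaner, the method names, and the list of color names (red, green, blue) are made up for the example, and "size" is accepted only because it appears in the examples in the question even though it is not in the option list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagCleaner {
    // Step 1: find the next [abc ...] chunk without analysing its contents.
    // Group 1 is the raw text between "abc" and "]" (empty for a bare [abc]).
    private static final Pattern TAG =
            Pattern.compile("\\[abc((?:\\s[^\\]]*)?)\\]");

    // Step 2: inside one chunk, find name=value or name="value" pieces.
    // Group 1 = name, group 2 = optional opening quote, group 3 = value.
    private static final Pattern OPTION =
            Pattern.compile("(?<!\\S)(\\w+)=(\"?)([^\\s\"]+)\\2(?!\\S)");

    // Assumption: placeholder list, the real predefined names are not given.
    private static final List<String> COLOR_NAMES =
            Arrays.asList("red", "green", "blue");

    // Sub-analyse one candidate option on its own.
    static boolean isValid(String name, String value) {
        switch (name) {
            case "height":
            case "size": // assumption: taken from the examples, not the spec
                return value.matches("\\d+");
            case "width":
                return value.matches("\\d+%?");
            case "color":
                return value.matches("#[0-9a-fA-F]{6}")
                        || COLOR_NAMES.contains(value);
            default:
                return false;
        }
    }

    static String clean(String text) {
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(text);
        int last = 0;
        while (tag.find()) {
            // Copy everything before the "[" unchanged.
            out.append(text, last, tag.start());
            out.append("[abc");
            // Extract the good stuff and re-emit it with quotes.
            Matcher opt = OPTION.matcher(tag.group(1));
            while (opt.find()) {
                String name = opt.group(1);
                String value = opt.group(3);
                if (isValid(name, value)) {
                    out.append(' ').append(name)
                       .append("=\"").append(value).append('"');
                }
            }
            out.append(']');
            last = tag.end();
        }
        // Copy whatever follows the last tag.
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("[abc bla=\"test\" width=\"30%\" blabla height=30]"));
    }
}
```

With these assumptions, clean("[abc color=red bla bla]") gives [abc color="red"], and text outside the tags passes through untouched. Keeping the outer pattern loose and validating each option separately keeps both regexes short.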

The key is to make sure you understand the difference between finding
and matching.
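
In java.util.regex terms, this distinction presumably corresponds to Matcher.find() versus Matcher.matches(): find() looks for the pattern anywhere in the input, while matches() succeeds only if the whole input fits the pattern. A small sketch (the class and method names are invented for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatch {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // matches(): true only if the ENTIRE input is a run of digits.
    static boolean isWholeNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    // find(): returns the first run of digits anywhere in the input,
    // or null if there is none.
    static String firstNumber(String s) {
        Matcher m = NUMBER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isWholeNumber("30"));
        System.out.println(isWholeNumber("height=30"));
        System.out.println(firstNumber("height=30"));
    }
}
```

So finding suits locating the next candidate chunk in a long text, and matching suits checking that an already isolated piece, such as one option value, is valid in its entirety.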

When regexes cause your brain to explode, split the text up into
pieces and subanalyse them separately.

If you send me your real email address, I will send you some code that does
this sort of thing. Ask for "seesort".
 
Jimi Hullegård

Thanks for your help! I have now solved the problem the way you
suggested, and it works great :)

/Jimi
 
Roedy Green

Glad to hear it. It warms the cockles of my heart when I can help
someone with general information rather than by solving the particular
problem for them and spoon-feeding it to them.
 
