[...]
one might express *very sad* or *weeping* by writing the string ":-((".
This smiley code, however, starts with ":-(", the smiley code for *sad*
or *unhappy*...
An OK solution here would be to sort the smileys tested against by
their code string length, so that ":-((" is checked first, then ":-("...
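
A minimal sketch of that longest-first idea, assuming the codes are held in a plain list. The class and method names (LongestFirst, longestFirst, matchAt) are made up for illustration and are not part of any existing code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LongestFirst {
    // Returns a copy of the codes ordered longest first,
    // so that ":-((" is tried before ":-(".
    static List<String> longestFirst(List<String> codes) {
        List<String> sorted = new ArrayList<String>(codes);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return Integer.compare(b.length(), a.length());
            }
        });
        return sorted;
    }

    // Returns the first code in sortedCodes that starts at position pos
    // in text, or null if none does.
    static String matchAt(String text, int pos, List<String> sortedCodes) {
        for (String code : sortedCodes) {
            if (text.startsWith(code, pos)) {
                return code;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> codes = longestFirst(Arrays.asList(":-(", ":-((", ":-)"));
        // The smiley starts at index 7 of the sample text.
        System.out.println(matchAt("so sad :-(( today", 7, codes)); // prints :-((
    }
}
```
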
But I need a general parsing approach. Is a regex really necessary?
A pattern? That looks very complicated to me... I doubt I can use a
pattern for codes that range from "" over "}:->" over "" to (CIG)...
While I'm not an expert in using regular expressions, I do know that
they can address the scenarios you're describing. Regex is a fairly
powerful language in its own right. You may well need some help from an
actual expert, or should be prepared to spend a lot of time (days at
least, longer depending on your own skills and specific needs) learning
it well enough to meet your needs. But it can be done.
Inasmuch as the problem itself is complicated, so too will the solution
be. I'm not convinced there's any way around that.
Is regex "necessary"? No, not at all. But at first blush, your problem
seems to be exactly what regular expressions were designed to do: find
specific patterns in strings and, optionally, replace those patterns with
new patterns (text). In that sense, doesn't it make sense to explore that
as a possible solution?
Pete
Yep, it's worth a try. Hmm, OK, I started out by analyzing the
structure of the smiley codes.
They consist of:
  1       1         1         1       1-2       1   <- number of characters
[hair] - eyes - [subeyes] - [nose] - mouth - [beard]
OK. Hair, subeyes, nose and beard are OPTIONAL. It seems only mouth can
have more than 1 char, but max 2. So this gives 6 positions/properties
and max. 7 chars.
Here are the possible strings applying to each position:
hair = {"o", "O", ">", "}", "]", ")"} <-- hair optional!
eyes = {":", ";", "8"}
subeyes = {"'", ","} <-- subeyes optional!
nose = {"-"} <-- nose optional!
mouth = {")", "(", "s", "S", "d", "D",
"p", "P", "c", "C", "o", "O",
"#", "@", "*", "$", "|",
"))", "(("}
beard = {"="} <-- beard optional!
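
The position sets above could be turned into a pattern roughly as follows. This is only a sketch: the helper anyOf and the class name SmileyPattern are invented for illustration. Pattern.quote is used so that characters like "(", "$" or "|" are taken literally, and the two-character mouths are listed before the one-character ones so that "((" is tried before "(":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmileyPattern {
    // Joins literal alternatives into a non-capturing group. Each alternative
    // is quoted, so regex special characters are matched literally.
    static String anyOf(String... alternatives) {
        StringBuilder sb = new StringBuilder("(?:");
        for (int i = 0; i < alternatives.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(Pattern.quote(alternatives[i]));
        }
        return sb.append(')').toString();
    }

    // A trailing "?" makes a position optional.
    static final Pattern SMILEY = Pattern.compile(
          anyOf("o", "O", ">", "}", "]", ")") + "?"       // [hair]
        + anyOf(":", ";", "8")                             // eyes
        + anyOf("'", ",") + "?"                            // [subeyes]
        + anyOf("-") + "?"                                 // [nose]
        + anyOf("))", "((",                                // mouth, longest first
                ")", "(", "s", "S", "d", "D", "p", "P",
                "c", "C", "o", "O", "#", "@", "*", "$", "|")
        + anyOf("=") + "?");                               // [beard]

    public static void main(String[] args) {
        Matcher m = SMILEY.matcher("I am :-(( today, but ;) tomorrow");
        while (m.find()) {
            System.out.println(m.group() + " at index " + m.start());
        }
    }
}
```

Note that such a pattern only finds candidates. A text like "item 8)" also fits the eyes-mouth structure, so a lookup against the smileys that actually exist is still needed.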
Letter case should basically be ignored, so both UPPER and lower case
letters are valid. As you can see, almost every regex special character
is involved, so the resulting pattern will look awkward (at least to
me).
Other than that, I could design the codes to contain exactly 1 char per
position, if that would simplify things or make it possible at all. It
would not be a problem to introduce an optional [subnose]:
....     1          1         1       1
.... - [nose] - [subnose] - mouth - beard
So the minimum smiley code length is 2, max is 7.
A fictitious (length 7) example pattern to be recognized could be:
"};'O))="
This would be a bearded ("=") winking (";") weeping ("'") devil ("}")
very happy ("))") with an (UPPERCASE) pig's nose ("O") (-> makes sense,
right? ;-) ). OK, now this would qualify as a potential pattern match.
If that matched the parsed string, I would check the actual map
TreeMap<String,ImageIcon> of the smiley images actually available. If
an icon was found, I would know it's time to replace the string with
that image. Sounds easy...
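
The match-then-lookup step described above might be sketched like this. To keep the example free of Swing, the map holds icon names (String) instead of ImageIcon, and a bracketed name stands in for the image. Both are assumptions for illustration, as are the names SmileyReplacer and markKnownSmileys. The TreeMap uses String.CASE_INSENSITIVE_ORDER so that ":-d" and ":-D" find the same entry:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmileyReplacer {
    // Compact form of the structure [hair] eyes [subeyes] [nose] mouth [beard].
    // Two-character mouths come first so "((" is preferred over "(".
    static final Pattern CANDIDATE = Pattern.compile(
        "[oO>}\\])]?[:;8][',]?-?(?:\\)\\)|\\(\\(|[()sSdDpPcCoO#@*$|])=?");

    // Replaces every candidate that is a key in iconNames with "[name]".
    // Candidates without an entry are left untouched.
    static String markKnownSmileys(String text, Map<String, String> iconNames) {
        Matcher m = CANDIDATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = iconNames.get(m.group());
            String replacement = (name != null) ? "[" + name + "]" : m.group();
            // quoteReplacement keeps "$" and "\" in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> iconNames =
            new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
        iconNames.put(":-(", "sad");
        iconNames.put(":-((", "weeping");
        // ":-P" has no entry, so it stays as text.
        System.out.println(markKnownSmileys("a :-( b :-(( c :-P", iconNames));
        // prints: a [sad] b [weeping] c :-P
    }
}
```
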
However, I have no idea how to construct the regex for this. I probably
don't have that much time to learn it from scratch, and I believe
"Perl's" pattern language can do a lot of very complicated things that
might not even come close to what I need.
Maybe some "expert" here might be able to construct the pattern or at
least can direct me to the right paragraphs at
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
Can anyone assist me please?
Help very much appreciated!
Karsten