mxyzplk
OK, so every way I've thought of doing this is really ugly. I'm using
Perl 5.8.4 and only have access to the stock libraries, mostly.
What I need to do is parse through a text file and perform some
transformations on embedded link structures for a wiki content
conversion. A "link" is defined as anything wrapped in double
brackets - [[<string>]]. A link can appear anywhere in a line of text,
and a single line can contain multiple links.
1) If the link has a colon (":") in it, I need to strip out all
special characters and spaces (everything except [a-zA-Z0-9]) from the
portion before the colon but leave the part after the colon intact.
Examples:
[[Operation Intranet 2.0!:EvalHome|Eval Home]] -->
[[OperationIntranet20:EvalHome|Eval Home]]
[[UP Platform:Home|UP Platform]] --> [[UPPlatform:Home|UP Platform]]
2) If the link does not have a ":" in it, I need to insert the string
General: before the name of the page.
Examples:
[[Technical FAQs|Technical FAQs]] --> [[General:Technical FAQs|Technical FAQs]]
[[Embedded - Top 5 content|Top 5 content]] --> [[General:Embedded - Top 5 content|Top 5 content]]
3) Special case - don't change if it is an image link or if it is an
external link (only single [] enclosure).
Examples:
[[Image:BIhouse.jpg]] --> [[Image:BIhouse.jpg]]
[http://spss.wikicities.com/wiki/SPSS_Wiki SPSS Wiki] --> [http://spss.wikicities.com/wiki/SPSS_Wiki SPSS Wiki]
I expect this is similar to some HTML parsing requirements, but I've
been hunting through my O'Reilly Perl books and Googling and I'm
having trouble finding my way. Normal regexp replace appears not to
be the way to go and I'm having greediness issues. Ideas?
Thanks,
Ernest
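One possible way to express the three rules above, as a rough sketch only: match each double-bracket link with a negated character class rather than .* (which sidesteps the greediness problem), and hand the captured text to a small helper through s///e. The names fix_link and convert_links are made up for illustration. The sketch assumes that image links always start with a literal "Image:", that the colon test applies only to the part before any "|", and that links never contain nested brackets. It uses nothing beyond core Perl, so it should be fine on 5.8.4.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite the text found between [[ and ]].
sub fix_link {
    my ($inner) = @_;

    # Rule 3: image links pass through unchanged
    # (assumes the prefix is exactly "Image:").
    return $inner if $inner =~ /^Image:/;

    # Separate the page target from the optional "|label" part, so a
    # colon inside the label is ignored (an assumption about the intent).
    my ($target, $label) = split /\|/, $inner, 2;

    if (index($target, ':') >= 0) {
        # Rule 1: keep only [a-zA-Z0-9] before the first colon and
        # leave everything after it intact.
        my ($prefix, $page) = split /:/, $target, 2;
        $prefix =~ s/[^a-zA-Z0-9]//g;
        $target = "$prefix:$page";
    }
    else {
        # Rule 2: no colon, so put the page in the General: namespace.
        $target = "General:$target";
    }

    return defined $label ? "$target|$label" : $target;
}

# Hypothetical driver: rewrite every [[...]] link in a string.
# [^\[\]]+ cannot run past a closing bracket, so several links on one
# line are matched separately. Single-bracket external links never
# match, because the pattern requires two opening brackets.
sub convert_links {
    my ($text) = @_;
    $text =~ s{\[\[([^\[\]]+)\]\]}{'[[' . fix_link("$1") . ']]'}ge;
    return $text;
}
```

To filter a file, something like print convert_links($_) while <>; should do. That works line by line, so a link that is wrapped across two lines in the input would be missed unless the whole file is read into one string first.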