MoinMoin WikiName and python regexes

Ara.T.Howard

hi-

i know nada about python so please forgive me if this is way off base. i'm
trying to fix a bug in MoinMoin whereby

WordsWithTwoCapsInARowLike
                  ^^

do not become WikiNames. this is because the wikiname pattern is
basically

/([A-Z][a-z]+){2,}/

but should be (IMHO)

/([A-Z]+[a-z]+){2,}/

however, the way the patterns are constructed like

word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
'u': config.chars_upper,
'l': config.chars_lower,
'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}


and i'm not that familiar with python syntax. to me this looks like a map
used to bind variables into the regex - or is it binding into a string then
compiling that string into a regex - regexes don't seem to be literal objects
in python AFAIK... i'm thinking i need something like

word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
                                                                    ^
'u': config.chars_upper,
'l': config.chars_lower,
'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}

and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
obviously the u is the char range (unicode?)... but what's the 's'?

i'm looking at

http://docs.python.org/lib/re-syntax.html
http://www.amk.ca/python/howto/regex/

and coming up dry. sorry i don't have more time to rtfm - just want to
implement this simple fix and get on to fcgi configuration! ;-)

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
Don

Ara.T.Howard said:
hi-

i know nada about python so please forgive me if this is way off base.
i'm trying to fix a bug in MoinMoin whereby

WordsWithTwoCapsInARowLike
                  ^^

do not become WikiNames. this is because the wikiname pattern is
basically
[snip]

PHPWiki has the same "feature", BTW. (Sorry, couldn't get MoinMoin to work
on Sourceforge, had to use PHPWiki).

-Don
 
deelan

Ara.T.Howard wrote:
(...)
and i'm not that familiar with python syntax. to me this looks like a map
used to bind variables into the regex - or is it binding into a string then
compiling that string into a regex - regexes don't seem to be literal objects
in python AFAIK... i'm thinking i need something like

word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
                                                                    ^
'u': config.chars_upper,
'l': config.chars_lower,
'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}

and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
obviously the u is the char range (unicode?)... but what's the 's'?

an example may help here:

>>> '%s' % 123
'123'

that "s" tells python to convert the number to a string. the form %(key)s
tells python to look up "key" in a dictionary and format the found value
into a string:

>>> '%(key)s' % {'key': 123}
'123'

so in your code there are some keys named 'u', 'l', 'subpages', etc. and
their values are substituted into that big RE, replacing the
corresponding %(key)s placeholders.

HTH.
 
Terry Reedy

Ara.T.Howard said:
i'm trying to fix a bug in MoinMoin whereby

A 'bug' is a discrepancy between promise (specification) and performance
(implementation). Have you really found such -- does MoinMoin not follow
the Wiki standard -- or are you just trying to customize MoinMoin to your
different specification?
WordsWithTwoCapsInARowLike
^^
do not become WikiNames.

Would your proposed change to make the above into a WikiName also make
all-cap sequences like NATO, FTP, and API into WikiNames, and do you really
want that? If WikiNum, appearing one place, were also mistyped as WikiNUm
(from holding down the shift key too long, which I do occasionally), should
the latter become a separate WikiName? I can certainly understand why the
Wiki designers might have answered both questions 'No.'

Terry J. Reedy
 
Ara.T.Howard

A 'bug' is a discrepancy between promise (specification) and performance
(implementation). Have you really found such -- does MoinMoin not follow
the Wiki standard -- or are you just trying to customize MoinMoin to your
different specification?

well, according to the specification at

http://moinmoin.wikiwikiweb.de/WikiName?highlight=(wikiname)

ThisIsAWikiName

there seems to be general agreement here

http://wikka.jsnx.com/WikiName
http://twiki.org/cgi-bin/view/TWiki/WikiWord

though not all wikis agree.

in moinmoin others have noted the inconsistency and filed a bug as noted in

http://moinmoin.wikiwikiweb.de/MoinMoinBugs/AllCapsInWikiName?highlight=(wikiname)

the problem being that the specification is simply vague here and does not
specifically prohibit AWikiName.
Would your proposed change to make the above into a WikiName also make
all-cap sequences like NATO, FTP, and API into WikiNames

it wouldn't since

NATO !~ /^([A-Z]+[a-z]+){2,}$/
FTP !~ /^([A-Z]+[a-z]+){2,}$/
API !~ /^([A-Z]+[a-z]+){2,}$/

the pattern is

word = one, or more, upper case letters followed by one, or more, lower case
letters

wikiword = at least two words together

so

FOobar is not a link

but

AFooBar is
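
for anyone who wants to check those claims, here is a minimal sketch in python. it uses only the two simplified patterns quoted in this thread, not MoinMoin's full word_rule, and the names ORIGINAL_RULE, PROPOSED_RULE and is_wikiname are made up for the illustration.

```python
import re

# Simplified patterns from this thread (not MoinMoin's full word_rule).
ORIGINAL_RULE = re.compile(r"(?:[A-Z][a-z]+){2,}")
PROPOSED_RULE = re.compile(r"(?:[A-Z]+[a-z]+){2,}")

SAMPLES = ["WikiName", "AFooBar", "WordsWithTwoCapsInARowLike",
           "WikiNUm", "FOobar", "NATO", "FTP", "API"]


def is_wikiname(pattern, word):
    """Return True if the whole word matches the given pattern."""
    # fullmatch plays the role of the ^...$ anchors used above.
    return pattern.fullmatch(word) is not None


for word in SAMPLES:
    print("%-28s original=%-5s proposed=%s" % (
        word, is_wikiname(ORIGINAL_RULE, word), is_wikiname(PROPOSED_RULE, word)))
```

under the proposed rule the all-caps words and FOobar still fail, while AFooBar and WordsWithTwoCapsInARowLike pass. note that a slip like WikiNUm passes too, so that part of the objection stands.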
If WikiNum, appearing one place, were also mistyped as WikiNUm (from holding
down the shift key too long, which I do occasionally), should the latter
become a separate WikiName? I can certainly understand why the Wiki
designers might have answered both questions 'No.'

perhaps - it's just inconsistent the way it is now.

cheers.


-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 
Paul Bredbury

Ara.T.Howard said:
i know nada about python so please forgive me if this is way off base. i'm
trying to fix a bug in MoinMoin whereby

WordsWithTwoCapsInARowLike

I don't think there is such a thing as the perfect "hyperlink vs
just-text" convention. In MoinMoin, you can force a custom link using e.g.:

[wiki:WebsiteSecurity this is the link text to WebsiteSecurity so call
it whatever you want such as WebsiteSecurities]

This custom linking, whilst obviously not ideal, solves the problems
mentioned at http://www.c2.com/cgi/wiki?WikiName

This seems better than producing endless confusing variations on the
"standard" (be it formal, actual, or simply obviously desired).

I'm not convinced of the usefulness of MoinMoin's "subpages" idea, while
we're on the (related) subject - they seem to create more problems than
they solve:
http://moinmoin.wikiwikiweb.de/HelpOnEditing/SubPages
 
Bengt Richter

hi-

i know nada about python so please forgive me if this is way off base. i'm
trying to fix a bug in MoinMoin whereby

WordsWithTwoCapsInARowLike
                  ^^

do not become WikiNames. this is because the wikiname pattern is
basically

/([A-Z][a-z]+){2,}/

but should be (IMHO)

/([A-Z]+[a-z]+){2,}/
That would take care of the example above, but does it change an official spec?
however, the way the patterns are constructed like

word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
'u': config.chars_upper,
'l': config.chars_lower,
'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}


and i'm not that familiar with python syntax. to me this looks like a map
used to bind variables into the regex - or is it binding into a string then
compiling that string into a regex - regexes don't seem to be literal objects
in python AFAIK... i'm thinking i need something like

word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
                                                                    ^
'u': config.chars_upper,
'l': config.chars_lower,
'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}

and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
obviously the u is the char range (unicode?)... but what's the 's'?
'u' doesn't stand for unicode here. It is the key to look up config.chars_upper from the dict. That could
be unicode, and probably is. The 's' is the final part of a formatting spec which says how to convert the
data looked up, and 's' is for string, which doesn't change string data (unless, and UIAM, a conversion to unicode is required).

All of the above is making use of the % operator of strings, as in the expression
fmt % data
where fmt is a string containing ordinary characters and formatting specs in the form
of substrings escaped by a leading character '%'. The formatting specs take two basic
alternative forms: %<spec> or %(name)<spec>. If any '%' is followed by a parenthesized name,
as in '%(u)s' it means that the data to be formatted is retrieved from data['u'] for the latter example.
If there is no parenthesized name, the data is retrieved from data[i] where data must be a tuple and
i is the positional count of format specs in fmt. In some cases where there is no ambiguity,
and there is only one datum, data[0] may be written as the non-tuple value expression, e.g.,
instead of (123,) that data could be written as (123,)[0] or plain 123.
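
As a quick illustration of those forms, here is a small sketch. The sample strings and variable names are invented for the example and have nothing to do with MoinMoin itself.

```python
# Positional form: each spec consumes the next item of the tuple.
positional = "%s has %d caps" % ("AFooBar", 3)

# Named form: each %(name)s spec looks its value up in the dict by key.
named = "[%(u)s][%(l)s]+" % {"u": "A-Z", "l": "a-z"}

# With a single spec, a bare value and a one-item tuple format the same way.
bare = "%s" % 123
wrapped = "%s" % (123,)

print(positional)
print(named)
print(bare, wrapped)
```

The named form is the one word_rule relies on, which is why the right-hand side there is a dict rather than a tuple.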

In the word_rule above, %(u)s uses 'u' as a key to get data from the dictionary { 'u': config.chars_upper, ...}
to substitute in the [%(u)s] as a string (that's what the 's' specifies), so config.chars_upper will
presumably have had a string value such as u'ABC..Z' and that will then be inserted in place of the %(u)s to
get u'...[ABC..Z]...' (if fmt is unicode, the resulting string will be unicode, UIAM)
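
To make the two steps (format the string, then compile it) concrete, here is a rough sketch, not MoinMoin's actual code. It assumes 'A-Z' and 'a-z' as stand-ins for config.chars_upper and config.chars_lower, assumes subpages are disabled so 'subpages' and 'parent' are both empty, and adds a hypothetical %(q)s key so the same template can produce both the original and the patched rule. It is written for Python 3, so the ur'' prefix becomes plain r''.

```python
import re

# Assumed stand-ins for config.chars_upper / config.chars_lower.
CHARS_UPPER = "A-Z"
CHARS_LOWER = "a-z"

# Same shape as word_rule, plus a hypothetical %(q)s slot after [%(u)s].
TEMPLATE = (r"(?:(?<![%(l)s])|^)%(parent)s"
            r"(?:%(subpages)s(?:[%(u)s]%(q)s[%(l)s]+){2,})+"
            r"(?![%(u)s%(l)s]+)")


def build_word_rule(upper_quantifier=""):
    """Fill the template with % formatting, then compile the result."""
    pattern = TEMPLATE % {
        "u": CHARS_UPPER,
        "l": CHARS_LOWER,
        "q": upper_quantifier,   # "" = original rule, "+" = proposed fix
        "subpages": "",          # assumption: allow_subpages is off
        "parent": "",            # assumption: allow_subpages is off
    }
    return re.compile(pattern)


original = build_word_rule()
patched = build_word_rule("+")

print(patched.pattern)
print(original.fullmatch("WordsWithTwoCapsInARowLike"))
print(patched.fullmatch("WordsWithTwoCapsInARowLike"))
```

So the dict is only bound into a string; nothing regex-specific happens until re.compile (or another re function) is handed the finished text.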
See also
http://www.python.org/doc/current/lib/typesseq-strings.html
(which IMO should be easier to find, but if you click on the index square
at the top right of any library reference page, you can see a "%formatting" link)
and coming up dry. sorry i don't have more time to rtfm - just want to
implement this simple fix and get on to fcgi configuration! ;-)

cheers.


Regards,
Bengt Richter
 
