MoinMoin WikiName and python regexes

Discussion in 'Python' started by Ara.T.Howard, Jun 8, 2005.

  1. Ara.T.Howard

    Ara.T.Howard Guest

    hi-

    i know nada about python so please forgive me if this is way off base. i'm
    trying to fix a bug in MoinMoin whereby

    WordsWithTwoCapsInARowLike
                      ^^

    do not become WikiNames. this is because the wikiname pattern is
    basically

    /([A-Z][a-z]+){2,}/

    but should be (IMHO)

    /([A-Z]+[a-z]+){2,}/

    however, the way the patterns are constructed like

    word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
        'u': config.chars_upper,
        'l': config.chars_lower,
        'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
        'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
    }


    and i'm not that familiar with python syntax. to me this looks like a map
    used to bind variables into the regex - or is it binding into a string then
    compiling that string into a regex - regexes don't seem to be literal objects
    in python AFAIK... i'm thinking i need something like

    word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
        'u': config.chars_upper,
        'l': config.chars_lower,
        'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
        'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
    }

    (the only change is the '+' added after the first [%(u)s])

    and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
    obviously the u is the char range (unicode?)... but what's the 's'?

    i'm looking at

    http://docs.python.org/lib/re-syntax.html
    http://www.amk.ca/python/howto/regex/

    and coming up dry. sorry i don't have more time to rtfm - just want to
    implement this simple fix and get on to fcgi configuration! ;-)

    cheers.

    -a
    --
    ===============================================================================
    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | My religion is very simple. My religion is kindness.
    | --Tenzin Gyatso
    ===============================================================================
     
    Ara.T.Howard, Jun 8, 2005
    #1

  2. Don

    Don Guest

    Ara.T.Howard wrote:

    >
    > hi-
    >
    > i know nada about python so please forgive me if this is way off base.
    > i'm trying to fix a bug in MoinMoin whereby
    >
    > WordsWithTwoCapsInARowLike
    >                   ^^
    >
    > do not become WikiNames. this is because the wikiname pattern is
    > basically
    >

    [snip]

    PHPWiki has the same "feature", BTW. (Sorry, couldn't get MoinMoin to work
    on Sourceforge, had to use PHPWiki).

    -Don
     
    Don, Jun 8, 2005
    #2

  3. deelan

    deelan Guest

    Ara.T.Howard wrote:
    (...)
    > and i'm not that familiar with python syntax. to me this looks like a map
    > used to bind variables into the regex - or is it binding into a string then
    > compiling that string into a regex - regexes don't seem to be literal
    > objects in python AFAIK... i'm thinking i need something like
    >
    > word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
    >     'u': config.chars_upper,
    >     'l': config.chars_lower,
    >     'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
    >     'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
    > }
    >
    > and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
    > obviously the u is the char range (unicode?)... but what's the 's'?


    an example may help here:

    >>> a = 123
    >>> '%04d' % a
    '0123'
    >>> '%f' % a
    '123.000000'
    >>> '%s' % a
    '123'

    that "s" tells python to convert the number to a string. the form %(key)s
    tells python to look up "key" in a dictionary and format the value it
    finds as a string:

    >>> d = {'key': 123}
    >>> '%(key)s' % d
    '123'

    so in your code there are some keys named 'u', 'l', 'subpages', etc. and
    their values are substituted into that big RE, replacing the
    corresponding %(key)s placeholders.
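
    To make the "string first, regex second" order concrete, here is a minimal
    Python 3 sketch. The dictionary values are hypothetical ASCII stand-ins for
    the MoinMoin config values, and the pattern is only the inner part of the
    rule, not the full word_rule:

```python
import re

# Hypothetical stand-ins for config.chars_upper / config.chars_lower
# (ASCII only; the real MoinMoin values may differ).
values = {
    'u': 'A-Z',
    'l': 'a-z',
}

# Step 1: plain string formatting. %(u)s and %(l)s are replaced by the
# dictionary values; the result is still an ordinary string.
pattern_text = r'(?:[%(u)s]+[%(l)s]+){2,}' % values

# Step 2: only now is the finished string compiled into a regex object.
pattern = re.compile(pattern_text)

match = pattern.search('see WordsWithTwoCapsInARowLike here')
print(pattern_text)
print(match.group(0))
```

    The regex engine never sees the %(...)s placeholders; by the time
    re.compile runs they have already been replaced.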

    HTH.

    --
    deelan <http://www.deelan.com/>
     
    deelan, Jun 8, 2005
    #3
  4. Terry Reedy

    Terry Reedy Guest

    "Ara.T.Howard" <> wrote in message
    news:p...
    > i'm trying to fix a bug in MoinMoin whereby


    A 'bug' is a discrepancy between promise (specification) and performance
    (implementation). Have you really found such -- does MoinMoin not follow
    the Wiki standard -- or are you just trying to customize MoinMoin to your
    different specification?

    > WordsWithTwoCapsInARowLike
    >                   ^^
    > do not become WikiNames.


    Would your proposed change to make the above into a WikiName also make
    all-cap sequences like NATO, FTP, and API into WikiNames, and do you really
    want that? If WikiNum, appearing one place, were also mistyped as WikeNUm
    (from holding down the shift key too long, which I do occasionally), should
    the latter become a separate WikiName? I can certainly understand why the
    Wiki designers might have answered both questions 'No.'

    Terry J. Reedy
     
    Terry Reedy, Jun 8, 2005
    #4
  5. Ara.T.Howard

    Ara.T.Howard Guest

    On Wed, 8 Jun 2005, Terry Reedy wrote:

    >
    > "Ara.T.Howard" <> wrote in message
    > news:p...
    >> i'm trying to fix a bug in MoinMoin whereby

    >
    > A 'bug' is a discrepancy between promise (specification) and performance
    > (implementation). Have you really found such -- does MoinMoin not follow
    > the Wiki standard -- or are you just trying to customize MoinMoin to your
    > different specification?


    well, according to the specification at

    http://moinmoin.wikiwikiweb.de/WikiName?highlight=(wikiname)

    ThisIsAWikiName is a WikiName.

    there seems to be general agreement here

    http://wikka.jsnx.com/WikiName
    http://twiki.org/cgi-bin/view/TWiki/WikiWord

    though not all wikis agree.

    in moinmoin others have noted the inconsistency and filed a bug as noted in

    http://moinmoin.wikiwikiweb.de/MoinMoinBugs/AllCapsInWikiName?highlight=(wikiname)

    the problem being that the specification is simply vague here and does not
    specifically prohibit AWikiName.

    >
    >> WordsWithTwoCapsInARowLike
    >> ^^
    >> do not become WikiNames.

    >
    > Would your proposed change to make the above into a WikiName also make
    > all-cap sequences like NATO, FTP, and API into WikiNames


    it wouldn't since

    NATO !~ /^([A-Z]+[a-z]+){2,}$/
    FTP !~ /^([A-Z]+[a-z]+){2,}$/
    API !~ /^([A-Z]+[a-z]+){2,}$/

    the pattern is

    word = one, or more, upper case letters followed by one, or more, lower case
    letters

    wikiword = at least two words together

    so

    FOobar is not a link

    but

    AFooBar is
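
    These claims can be checked from Python. This is a sketch only: the two
    anchored patterns are the simplified ASCII versions quoted in this thread,
    not MoinMoin's actual word_rule, and the sample list is made up from words
    mentioned in the discussion.

```python
import re

# Anchored versions of the two patterns discussed in the thread.
current = re.compile(r'^([A-Z][a-z]+){2,}$')    # exactly one capital per word
proposed = re.compile(r'^([A-Z]+[a-z]+){2,}$')  # one or more capitals per word

samples = ['WikiName', 'AFooBar', 'WordsWithTwoCapsInARowLike',
           'FOobar', 'NATO', 'FTP', 'API', 'WikeNUm']

# Map each sample to (matches current pattern, matches proposed pattern).
results = {}
for word in samples:
    results[word] = (bool(current.match(word)), bool(proposed.match(word)))
    print('%-28s current=%-5s proposed=%s' % ((word,) + results[word]))
```

    With these simplified patterns, NATO, FTP, API and FOobar are rejected by
    both, WikiName is accepted by both, and AFooBar,
    WordsWithTwoCapsInARowLike and WikeNUm are accepted only by the proposed
    pattern.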

    > If WikiNum, appearing one place, were also mistyped as WikeNUm (from holding
    > down the shift key too long, which I do occasionally), should the latter
    > become a separate WikiName? I can certainly understand why the Wiki
    > designers might have answered both questions 'No.'


    perhaps - it's just inconsistent the way it is now.

    cheers.


    -a
    --
    ===============================================================================
    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | My religion is very simple. My religion is kindness.
    | --Tenzin Gyatso
    ===============================================================================
     
    Ara.T.Howard, Jun 8, 2005
    #5
  6. Ara.T.Howard wrote:
    > i know nada about python so please forgive me if this is way off base. i'm
    > trying to fix a bug in MoinMoin whereby
    >
    > WordsWithTwoCapsInARowLike


    I don't think there is such a thing as the perfect "hyperlink vs
    just-text" convention. In MoinMoin, you can force a custom link using e.g.:

    [wiki:WebsiteSecurity this is the link text to WebsiteSecurity so call
    it whatever you want such as WebsiteSecurities]

    This custom linking, whilst obviously not ideal, solves the problems
    mentioned at http://www.c2.com/cgi/wiki?WikiName

    This seems better than producing endless confusing variations on the
    "standard" (be it formal, actual, or simply obviously desired).

    I'm not convinced of the usefulness of MoinMoin's "subpages" idea, while
    we're on the (related) subject - they seem to create more problems than
    they solve:
    http://moinmoin.wikiwikiweb.de/HelpOnEditing/SubPages
     
    Paul Bredbury, Jun 9, 2005
    #6
  7. On Wed, 8 Jun 2005 09:49:51 -0600, "Ara.T.Howard" <> wrote:

    >
    >hi-
    >
    >i know nada about python so please forgive me if this is way off base. i'm
    >trying to fix a bug in MoinMoin whereby
    >
    > WordsWithTwoCapsInARowLike
    >                   ^^
    >
    >do not become WikiNames. this is because the wikiname pattern is
    >basically
    >
    > /([A-Z][a-z]+){2,}/
    >
    >but should be (IMHO)
    >
    > /([A-Z]+[a-z]+){2,}/

    That would take care of the example above, but does it change an official spec?

    >
    >however, the way the patterns are constructed like
    >
    > word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
    > 'u': config.chars_upper,
    > 'l': config.chars_lower,
    > 'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
    > 'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
    > }
    >
    >
    >and i'm not that familiar with python syntax. to me this looks like a map
    >used to bind variables into the regex - or is it binding into a string then
    >compiling that string into a regex - regexes don't seem to be literal objects
    >in python AFAIK... i'm thinking i need something like
    >
    > word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
    >     'u': config.chars_upper,
    >     'l': config.chars_lower,
    >     'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
    >     'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
    > }
    >
    >and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
    >obviously the u is the char range (unicode?)... but what's the 's'?

    'u' doesn't stand for unicode here. It is the key to look up config.chars_upper from the dict. That could
    be unicode, and probably is. The 's' is the final part of a formatting spec which says how to convert the
    data looked up, and 's' is for string, which doesn't change string data (unless, and UIAM, a conversion to unicode is required).

    All of the above is making use of the % operator of strings, as in the expression
        fmt % data
    where fmt is a string containing ordinary characters and formatting specs in the form
    of substrings escaped by a leading '%' character. The formatting specs take two basic
    alternative forms: %<spec> or %(name)<spec>. If a '%' is followed by a parenthesized name,
    as in '%(u)s', the data to be formatted is retrieved from data['u'], so data must be a mapping.
    If there is no parenthesized name, the data is retrieved from data[i], where data must be a tuple and
    i is the positional count of format specs in fmt. Where there is no ambiguity
    and there is only one datum, data may be written as the plain non-tuple value, e.g.,
    instead of (123,) the data could be written as (123,)[0] or plain 123.
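
    The three cases described above can be tried directly. This is a minimal
    Python 3 sketch with made-up values, not code from MoinMoin:

```python
# Positional form: the right-hand side is a tuple, consumed left to right.
positional = '%s has %d letters' % ('NATO', 4)

# Named form: the right-hand side is a mapping, looked up by key.
named = '[%(u)s][%(l)s]+' % {'u': 'A-Z', 'l': 'a-z'}

# A single non-tuple value may be passed without wrapping it in a tuple.
single = '%04d' % 123
wrapped = '%04d' % (123,)

print(positional)
print(named)
print(single, wrapped)
```

    The named form is the one word_rule uses, which is why the right-hand
    side there is a dictionary rather than a tuple.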

    In the word_rule above, %(u)s uses 'u' as a key to get data from the dictionary { 'u': config.chars_upper, ...}
    to substitute in the [%(u)s] as a string (that's what the 's' specifies), so config.chars_upper will
    presumably have had a string value such as u'ABC..Z' and that will then be inserted in place of the %(u)s to
    get u'...[ABC..Z]...' (if fmt is unicode, the resulting string will be unicode, UIAM)
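
    As a rough illustration of that substitution, the sketch below fills both
    versions of the template with assumed values and compiles the results. The
    ASCII ranges and the empty 'subpages' and 'parent' entries are
    assumptions standing in for whatever MoinMoin's config actually provides,
    and the code is written for Python 3, where the ur'' prefix no longer
    exists:

```python
import re

# Assumed values, for illustration only; the real ones come from MoinMoin's
# config and wikiutil modules and may differ (e.g. include non-ASCII letters).
values = {
    'u': 'A-Z',       # stand-in for config.chars_upper
    'l': 'a-z',       # stand-in for config.chars_lower
    'subpages': '',   # as if config.allow_subpages were false
    'parent': '',     # as if config.allow_subpages were false
}

# Same templates as in the thread, written as Python 3 raw strings.
old_template = r'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)'
new_template = r'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)'

# Formatting happens first; the resulting strings are then compiled.
old_rule = re.compile(old_template % values)
new_rule = re.compile(new_template % values)

text = 'FrontPage links, AFooBar may not, NATO does not'
old_hits = [m.group(0) for m in old_rule.finditer(text)]
new_hits = [m.group(0) for m in new_rule.finditer(text)]
print(old_hits)
print(new_hits)
```

    With these stand-in values the old rule finds FrontPage and only the
    FooBar tail of AFooBar (the lookbehind rejects just a preceding lowercase
    letter), while the new rule finds FrontPage and the whole of AFooBar.
    NATO matches neither.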

    >
    >i'm looking at
    >
    > http://docs.python.org/lib/re-syntax.html
    > http://www.amk.ca/python/howto/regex/
    >

    See also
    http://www.python.org/doc/current/lib/typesseq-strings.html
    (which IMO should be easier to find, but if you click on the index square
    at the top right of any library reference page, you can see a "%formatting" link)

    >and coming up dry. sorry i don't have more time to rtfm - just want to
    >implement this simple fix and get on to fcgi configuration! ;-)
    >
    >cheers.
    >


    Regards,
    Bengt Richter
     
    Bengt Richter, Jun 26, 2005
    #7
