Regular Expression Help please!

Discussion in 'ASP General' started by Giles, Nov 1, 2009.

  1. Giles

    Giles Guest

    My (VB/ASP) site parses pseudocode created by authors. For example, the
    author's HTML might contain
    [start small padded box] This content displays in a box [end small padded
    box]
    The bits in square brackets are then replaced with appropriate HTML to
    create a border around the text.

    PageHTML=replace(PageHTML,"[start small padded box]","<div style='width:200px; padding:4px; border:1px solid #000'>")
    PageHTML=replace(PageHTML,"[end small padded box]","</div>")

    The problem is, the spaces might (or might not) be &nbsp; due to the
    authoring interface
    [start small&nbsp;padded box] This content displays in a box [end&nbsp;small
    padded&nbsp;box]

    Is there a regular expression that can turn &nbsp; (if they exist) into
    spaces? (Prior to applying the pseudocode conversion)

    function deNBSP(s,html)
    ?
    ?
    end function
    PageHTML=deNBSP("[start small padded box]", PageHTML)
    PageHTML=deNBSP("[end small padded box]", PageHTML)

    The pseudocode phrases can be quite long and contain a lot of spaces, so I
    was hoping a RegExp would be quicker than looping through and calling
    replace() for every permutation of space and &nbsp; in a phrase.

    Thanks if you can help, or advise a different strategy.
     
    Giles, Nov 1, 2009
    #1

  2. Giles

    Bob Barrows Guest

    Giles wrote:

    > The problem is, the spaces might (or might not) be &nbsp; due to the
    > authoring interface
    > [start small&nbsp;padded box] This content displays in a box
    > [end&nbsp;small padded&nbsp;box]
    >
    > Is there a regular expression that can turn &nbsp; (if they exist)
    > into spaces? (Prior to applying the pseudocode conversion)
    >

    A simple call to Replace should do this - no need for regex.
    s = Replace(s,"&nbsp;", " ")

    --
    Microsoft MVP - ASP/ASP.NET - 2004-2007
    Please reply to the newsgroup. This email account is my spam trap so I
    don't check it very often. If you must reply off-line, then remove the
    "NO SPAM"
     
    Bob Barrows, Nov 1, 2009
    #2

  3. Giles

    Giles Guest

    > Bob Barrows wrote:
    >
    > A simple call to Replace should do this - no need for regex.
    > s = Replace(s,"&nbsp;", " ")


    Thanks Bob, but that would change all the nbsp's in the PageHTML, not just
    the ones in the pseudocode phrases. The page may contain other necessary
    nbsp's. It's the pseudocode phrase that varies:
    I am trying to find a way around doing -
    PageHTML=replace(PageHTML,"[start&nbsp;small padded box]","[start small padded box]")
    PageHTML=replace(PageHTML,"[start small&nbsp;padded box]","[start small padded box]")
    PageHTML=replace(PageHTML,"[start small padded&nbsp;box]","[start small padded box]")
    PageHTML=replace(PageHTML,"[start&nbsp;small&nbsp;padded box]","[start small padded box]")
    PageHTML=replace(PageHTML,"[start small&nbsp;padded&nbsp;box]","[start small padded box]")
    PageHTML=replace(PageHTML,"[start&nbsp;small&nbsp;padded&nbsp;box]","[start small padded box]")
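One way to avoid listing every permutation, sketched here in JavaScript rather than VBScript: build a pattern from the phrase in which each space may match either a plain space or &nbsp;. The function keeps the deNBSP(s, html) shape from the first post, but the body is only an illustration under that assumption, not tested against the real site.

```javascript
// Hypothetical helper: normalise one pseudocode phrase inside html,
// whether its spaces were typed as " " or as "&nbsp;".
function deNBSP(phrase, html) {
  // Escape regex metacharacters such as [ and ] in the literal phrase.
  var escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Let every space in the phrase match either a space or &nbsp;.
  var pattern = escaped.replace(/ /g, '(?: |&nbsp;)');
  // A replacer function avoids special handling of "$" in the phrase.
  return html.replace(new RegExp(pattern, 'g'), function () {
    return phrase;
  });
}

var page = 'a&nbsp;b [start small&nbsp;padded box] text ' +
           '[end&nbsp;small padded&nbsp;box]';
page = deNBSP('[start small padded box]', page);
page = deNBSP('[end small padded box]', page);
console.log(page);
```

Only the named phrases are rewritten, so an &nbsp; elsewhere in the page (the "a&nbsp;b" above) is left alone. The cost is one call per known phrase.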
     
    Giles, Nov 1, 2009
    #3
  4. Giles

    Bob Barrows Guest

    Giles wrote:
    >
    > Thanks Bob, but that would change all the nbsp's in the PageHTML, not
    > just the ones in the pseudocode phrases. The page may contain other
    > necessary nbsp's. It's the pseudocode phrase that varies:
    > I am trying to find a way around doing -
    > PageHTML=replace(PageHTML,"[start&nbsp;small padded box]","[start
    > small padded box]")
    > PageHTML=replace(PageHTML,"[start small&nbsp;padded box]","[start
    > small padded box]")
    > PageHTML=replace(PageHTML,"[start small padded&nbsp;box]","[start
    > small padded box]")
    > PageHTML=replace(PageHTML,"[start&nbsp;small&nbsp;padded
    > box]","[start small padded box]")
    > PageHTML=replace(PageHTML,"[start
    > small&nbsp;padded&nbsp;box]","[start small padded box]")
    > PageHTML=replace(PageHTML,"[start&nbsp;small&nbsp;padded&nbsp;box]","[start
    > small padded box]")


    Then you will need regex. Unfortunately, I'm not fluent in regular
    expressions so all I can do is suggest you go to the documentation.
    Hopefully someone else will jump in and help you out.

     
    Bob Barrows, Nov 1, 2009
    #4
  5. Giles

    Evertjan. Guest

    Bob Barrows wrote on 01 nov 2009 in
    microsoft.public.inetserver.asp.general:

    >
    > Then you will need regex. Unfortunately, I'm not fluent in regular
    > expressions so all I can do is suggest you go to the documentation.
    > Hopefully someone else will jump in and help you out.


    Perhaps I can help you out with Regex,
    but I do not know what you mean by "pseudocode phrases".

    Let us just define a string called PageHTML [I am not interested in the
    final purpose]. I suppose parts of that string, with well-defined starts and
    ends, need to be purged of a certain substring.

    Please define the start and end of such substrings.


    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Nov 1, 2009
    #5
  6. Giles

    Giles Guest

    "Evertjan." <> wrote in message
    news:Xns9CB6D34ED8CC4eejj99@194.109.133.242...
    >
    > Perhaps I can help you out with Regex,
    > but I do not know what you mean by "pseudocode phrases".
    >
    > Let us just define a string called PageHTML [i am not interested in the
    > final purpose], I suppose pars of that string with well defined start and
    > ends need to be purged of a certain substring.
    >
    > Please define the start and end of such substrings.
    > --
    > Evertjan.


    Thank you, Evertjan.
    Each pseudocode phrase is a substring of PageHTML that starts with an
    open square bracket, [, and ends with a close square bracket, ].
    It can contain any number of words, separated by spaces.
    Some of those "spaces" might be &nbsp; instead.
    Each &nbsp; inside a phrase needs to be replaced by a plain space.

    e.g.
    [word1 word2 word3 word4] - is OK
    [word1&nbsp;word2] - needs converting to [word1 word2]

    Examples are
    [podcast lecture.mp3]

    [movie /flv/demo.flv width=400 height=300]

    <b>Quiz</b><br />
    [mcq start]
    Questions here...
    [mcq end]
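The definition above can be sketched as a pattern: an opening bracket, any run of characters that are not a closing bracket, then the closing bracket. The helper below is hypothetical (findPhrases and needsFix are names made up for illustration). It only lists the phrases on a page and flags the ones that contain &nbsp;, which may be handy for checking what the authoring interface actually produces.

```javascript
// Hypothetical helper: list every [...] phrase in html and flag
// those that still contain &nbsp;.
function findPhrases(html) {
  // \[ opens, [^\]]* takes everything up to the next ], \] closes.
  var found = html.match(/\[[^\]]*\]/g) || [];
  return found.map(function (p) {
    return { phrase: p, needsFix: p.indexOf('&nbsp;') !== -1 };
  });
}

var report = findPhrases(
  '[word1 word2 word3 word4] x&nbsp;y [word1&nbsp;word2]'
);
report.forEach(function (r) {
  console.log(r.phrase + (r.needsFix ? ' needs converting' : ' is OK'));
});
```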
     
    Giles, Nov 1, 2009
    #6
  7. Giles

    Evertjan. Guest

    Giles wrote on 01 nov 2009 in microsoft.public.inetserver.asp.general:

    > Thank you Evertjan
    > Each pseudocode phrase is a sub-string within PageHTML that starts
    > with Open-Square-Bracket [, and ends with Close-Square-Bracket, ].
    > It can contain any number of words, separated by spaces.
    > Some of the "spaces" might be &nbsp;
    > It needs to be purged of &nbsp; each occurrence being replaced by a
    > space.


    Could be done like this,
    I use a Javascript function for simplicity:

    ==============================================
    <% 'vbs

    PageHTML = "[word1 word2 word3 word4]z&nbsp;z" &_
    "[word5&nbsp;word6]z&nbsp;z" &_
    "[word7&nbsp;word8&nbsp;word9]"

    PageHTML = replaceNbsp(PageHTML)

    Response.write PageHTML

    %>

    <script language='javascript' runat='server'>
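    function replaceNbsp(s) {
      // Match each [...] phrase (lazily, up to the first closing bracket),
      // then replace &nbsp; only inside that match.
      return s.replace(/\[.*?\]/g, function (a) {
        return a.replace(/&nbsp;/g, ' ');
      });
    }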
    </script>
    ==============================================

    Use view-source to confirm that the &nbsp; entities
    outside the [...] are not touched.
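One caveat with the pattern above: in JavaScript the dot does not match a line break, so a phrase that wraps onto a second line would be skipped. A possible variant, assuming wrapped phrases can occur in the authored HTML, uses a negated character class instead. The name replaceNbspInBrackets is made up for this sketch.

```javascript
// Hypothetical variant of replaceNbsp: [^\]]* also matches line breaks,
// so a phrase that wraps onto the next line is still caught.
function replaceNbspInBrackets(s) {
  return s.replace(/\[[^\]]*\]/g, function (phrase) {
    return phrase.replace(/&nbsp;/g, ' ');
  });
}

var sample = '[word1 word2 word3 word4]z&nbsp;z' +
             '[word5&nbsp;word6]z&nbsp;z' +
             '[word7&nbsp;word8&nbsp;\nword9]';
console.log(replaceNbspInBrackets(sample));
```

Note that with either pattern a stray [ that has no closing ] of its own will pair with the next ] it can reach, and every &nbsp; in between will be converted.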

     
    Evertjan., Nov 1, 2009
    #7
  8. Giles

    Giles Guest

    Perfect. Your help is very much appreciated, thank you Evertjan
     
    Giles, Nov 2, 2009
    #8
