Convert CDATA expression to Javascript RegExp

Discussion in 'XML' started by Max, Feb 13, 2007.

  1. Max

    Max Guest

    Hello everyone!

    Can anyone help me convert the CDATA expression "CDATA ::= (Char* -
    (Char* ']]>' Char*))" to a JavaScript regular expression?

    Thanks,

    Max
     
    Max, Feb 13, 2007
    #1

  2. Translation to English: A CDATA section's value can contain any legal XML
    characters except the three-character sequence ]]> (which is used to
    terminate the value).

    I don't do Javascript, so you'll have to translate it the rest of the
    way yourself.


    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
     
    Joseph Kesselman, Feb 13, 2007
    #2

  3. Max

    Guest



    Writing a regular expression that matches everything up to a
    multi-character terminator is slightly involved. You need to do
    something like:

    /([^\]]*|][^\]]|]][^>]|]]?$)*/

    Not the easiest thing to read! Maybe the best thing is to break it
    into its component parts, e.g.:

    // Backslashes are doubled because these are string literals.
    var no_bracket = "[^\\]]*";
    var one_bracket = "][^\\]]";
    var two_brackets = "]][^>]";
    var end_bracket = "]]?$";

    var expr = new RegExp("(" + no_bracket + "|" + one_bracket + "|" +
        two_brackets + "|" + end_bracket + ")*");

    I'll admit I haven't tested it, but hopefully it gives you an idea!
    (The $ anchor may not work where it is. Some regex flavours offer \Z
    as an alternative, but JavaScript does not support it.)
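
    A minimal sketch of assembling such a pattern from string fragments
    with the RegExp constructor, assuming the goal is to test whether a
    whole string is legal CDATA content (i.e. contains no ]]> anywhere).
    The fragment names and the isCDataContent helper are hypothetical.
    Note that it swaps the bracket-counting alternatives for a negative
    lookahead: a ] is accepted unless it is immediately followed by ]>.
    This also rejects runs such as ]]]>, which the two-bracket
    alternative can let through because it consumes the bracket that
    starts the real terminator.

```javascript
// Hypothetical fragment names. Backslashes are doubled because these
// are string literals that are later handed to the RegExp constructor.
var notBracket = "[^\\]]";        // any character other than ]
var safeBracket = "\\](?!\\]>)";  // a ] that does not begin the terminator ]]>

// Anchored at both ends so the whole input must be made of allowed pieces.
var cdataContent = new RegExp("^(?:" + notBracket + "|" + safeBracket + ")*$");

// Returns true when text could appear between <![CDATA[ and ]]>.
function isCDataContent(text) {
  return cdataContent.test(text);
}
```

    For example, isCDataContent("a]b]]c") is true, while
    isCDataContent("x]]>y") and isCDataContent("]]]>") are both false.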

    HTH,

    Pete.
    --
    =============================================
    Pete Cordell
    Tech-Know-Ware Ltd
    for XML to C++ data binding visit
    http://www.tech-know-ware.com/lmx
    http://www.codalogic.com/lmx
    (or http://www.xml2cpp.com)
    =============================================
     
    , Feb 13, 2007
    #3
  4. Max

    Guest



    I was thinking more about this overnight. The details of the regular
    expression depend on the input string you want to match against. If
    you could give an idea of the kind of string you want to apply the
    match to (e.g. a whole XML message, or just element text), it might
    be possible to come up with a better pattern.

    Pete.
     
    , Feb 14, 2007
    #4
  5. Max

    Max Guest

    Hello Pete!

    I have written this regular expression:

    <!\\[CDATA\\[(((?:\\u0009|\\u000A|\\u000D|[\\u0020-\\uD7FF]|[\\uE000-\\uFFFD]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF])*?)(]]>(?:\\u0009|\\u000A|\\u000D|[\\u0020-\\uD7FF]|[\\uE000-\\uFFFD]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF])*?)*)]]>

    I broke it into these component parts:

    // \u takes exactly four hex digits, so U+10000 to U+10FFFF is written
    // as a surrogate pair rather than as [\u10000-\u10FFFF].
    XParser.CHAR =
    "(?:\\u0009|\\u000A|\\u000D|[\\u0020-\\uD7FF]|[\\uE000-\\uFFFD]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF])";
    XParser.CDSTART = "<!\\[CDATA\\[";
    XParser.CDATA = "((" + XParser.CHAR + "*?)(]]>" + XParser.CHAR + "*?)*)";
    XParser.CDEND = "]]>";
    XParser.CDSECT = XParser.CDSTART + XParser.CDATA + XParser.CDEND;

    XML code example:

    <![CDATA[this child is of <<<>nodeType CDATA]]>

    The problem arose when I expanded the simple regular expression for
    CDATA ('(" + XParser.CHAR + "*?)') so that it could also capture
    additional ']]>' markup.
    But this way it also captures two or more CDSECTs...

    Example:
    1 Tag: <![CDATA[this child is of <<<>nodeType CDATA]]>
    Capture: this child is of <<<>nodeType CDATA

    2 Tag: <![CDATA[this child is of <<<>nodeType CDATA]]><![CDATA[this
    child is of <<<>nodeType CDATA]]>
    Capture: this child is of <<<>nodeType CDATA]]><![CDATA[this child is of
    <<<>nodeType CDATA

    Is it possible to resolve this?

    Thanks in advance,

    Max
     
    Max, Feb 14, 2007
    #5
  6. This sounds like it's really a Javascript programming question rather
    than an XML question, since the question is how to express something in
    that language's reg-exp syntax rather than what to express. So you might
    get better answers by asking in a Javascript newsgroup than here.

     
    Joseph Kesselman, Feb 14, 2007
    #6
  7. (After all, most of us just use an existing XML parser and let *it* deal
    with syntax.)

     
    Joseph Kesselman, Feb 14, 2007
    #7
  8. Max

    Guest



    Hi Max,

    In this case I think you need to rework your XParser.CDATA rule along
    the lines of the following:

    // You could write these using a similar approach to your
    // XParser.CHAR if you prefer
    var no_bracket = "[^\\]]*";
    var one_bracket = "][^\\]]";
    var two_brackets = "]][^>]";

    XParser.CDATA = "(" + no_bracket + "|" + one_bracket + "|" +
    two_brackets + ")*" + "]*";

    The logic is basically:

    if( current char is NOT ] ||
        current char is ] AND next char is NOT ] ||
        current char is ] AND the next char is ] AND the one after is NOT > )
    then OK;

    which is more easily understood as:

    if( current char is NOT ] ) then OK;
    else if( current char is ] AND next char is NOT ] ) then OK;
    else if( current char is ] AND the next char is ] AND the one after
             is NOT > ) then OK;

    The trailing "]*" just allows any number of ] characters at the end
    if necessary.
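
    A sketch of the same idea applied to the two-section example from
    earlier in the thread, assuming a global regex is run over the whole
    document. extractCData is a hypothetical helper name, and the pattern
    uses a negative lookahead in place of the three alternatives, so a
    section ending in extra brackets (e.g. a]]]>) still stops at the
    right terminator.

```javascript
// Hypothetical helper: returns the text of every CDATA section in xml.
function extractCData(xml) {
  // Content is any run of characters in which no ] is followed by ]>.
  var cdsect = /<!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\]>/g;
  var sections = [];
  var match;
  // exec() with the g flag resumes after the previous match each time.
  while ((match = cdsect.exec(xml)) !== null) {
    sections.push(match[1]);
  }
  return sections;
}
```

    With the input
    "<![CDATA[this child is of <<<>nodeType CDATA]]><![CDATA[second]]>"
    this returns two separate captures rather than one that spans both
    sections.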

    HTH,

    Pete.
     
    , Feb 14, 2007
    #8