Regular expression to match an XML tag

Discussion in 'Javascript' started by Karthik, Nov 2, 2007.

  1. Karthik

    Karthik Guest

    Hi All,

    I am trying to match an XML tag using JS regular expressions. The
    pattern I am using is

    pattern="/(<" + tagname + ">)" + "(*)" + "(<." + tagname +
    ">/g)";

    where I want to replace the tagname variable with the name of the tag
    which I want to search for. Unfortunately this doesn't work. If I
    replace the tagname variable with the actual tag's name it works.
    Any idea how to fix this issue?

    If any of you could post a script that could do this, it would be
    great.

    Thanks
    Karthik
     
    Karthik, Nov 2, 2007
    #1
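A likely cause of the first failure: a string passed to `new RegExp()` is used as the pattern body only. Slashes and a trailing `g` written inside that string are matched as literal characters, and flags belong in the constructor's second argument. A small sketch, assuming a modern JavaScript engine (e.g. Node.js or a browser console) and real `<`/`>` characters in place of the `&lt;`/`&gt;` entities used in the thread:

```javascript
// Inside a string, "/" and a trailing "g" are ordinary characters: this
// pattern looks for the literal text "/abc/g", not for "abc".
const literal = new RegExp("/abc/g");
console.log(literal.test("abc"));     // false
console.log(literal.test("x/abc/g")); // true

// The string holds only the pattern body; flags go in the second argument.
// Backslashes must be doubled inside a string literal ("\\d" means \d).
const tagname = "ContentId";
const fromVariable = new RegExp("<" + tagname + ">(\\d*)</" + tagname + ">", "g");
console.log(fromVariable.flags);                                   // "g"
console.log(fromVariable.exec("<ContentId>12345</ContentId>")[1]); // "12345"
```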

  2. Karthik

    Karthik Guest

    Hi All,

    Modified the pattern to
    var patt="(<" + tagname + ">)" + "(*)" + "(<." + tagname +
    ">)";

    without the initial / and the trailing /g; still no go...
     
    Karthik, Nov 2, 2007
    #2
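Dropping the slashes was the right move; what probably still breaks the modified pattern is `(*)`. A `*` quantifier needs something in front of it to repeat, so the engine rejects the pattern with a `SyntaxError` as soon as the `RegExp` is constructed. The sketch below shows the error and a corrected pattern; the helper `compiles` is hypothetical, and `([^<]*)` ("any run of characters other than `<`") is one possible substitute chosen here, not taken from the thread. Literal `<`/`>` are assumed instead of entities:

```javascript
// Hypothetical helper: true if the string is a valid pattern, false if the
// RegExp constructor throws (e.g. "Nothing to repeat" for "(*)").
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
}

const tagname = "ContentId";
console.log(compiles("(<" + tagname + ">)(*)(<." + tagname + ">)"));     // false
console.log(compiles("(<" + tagname + ">)([^<]*)(<." + tagname + ">)")); // true

// Group 2 holds the text between the tags.
const patt = new RegExp("(<" + tagname + ">)([^<]*)(<." + tagname + ">)");
console.log(patt.exec("<ContentId>12345</ContentId>")[2]); // "12345"
```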

  3. Karthik

    Karthik Guest

    Here is the full script.
    Here str is just temporary storage; I will actually be applying the
    pattern to the source of the HTML page of the "current window"
    object.

    <html>
    <body>

    <script type="text/javascript">
    var tagname="ContentId";
    var result="";
    var str = "&lt;ContentId&gt;12345&lt;/ContentId&gt;";
    var patt="(&lt;" + tagname + "&gt;)" + "(*)" + "(&lt;." + tagname +
    "&gt;)";
    //var patt=/(&lt;ContentId&gt;)([\d]*)/g
    document.write(patt + " &nbsp PAttern <BR>");
    document.write(str + "<BR>");
    var patt2=new RegExp(patt);

    result=patt2.exec(str);
    document.write(result + " Result &nbsp <BR>");
    document.write(RegExp.$2);
    </script>

    </body>
    </html>
     
    Karthik, Nov 2, 2007
    #3
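For comparison, here is the same extraction written as a plain function rather than with `document.write`. It is a sketch under stated assumptions: the source contains real `<`/`>` characters rather than entities, the element has no attributes or child elements, and `escapeRegExp` and `getTagText` are hypothetical names introduced here. It escapes regex metacharacters in the tag name, spells out the closing tag as `</name>`, and reads the capture from the array returned by `exec` instead of the legacy `RegExp.$2`:

```javascript
// Hypothetical helper: backslash-escape characters that have special
// meaning in a regular expression, so a tag name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: text content of the first <tagname>...</tagname>
// element whose content contains no "<", or null if there is none.
function getTagText(source, tagname) {
  const name = escapeRegExp(tagname);
  const pattern = new RegExp("<" + name + ">([^<]*)</" + name + ">");
  const match = pattern.exec(source);
  return match ? match[1] : null;
}

const str = "<ContentId>12345</ContentId>";
console.log(getTagText(str, "ContentId")); // "12345"
console.log(getTagText(str, "Missing"));   // null
```

Escaping matters as soon as a tag name contains a `.`, which is legal in XML names: without it, a search for `a.b` would also accept `<aXb>`.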
  4. Karthik

    Karthik Guest

    Got the expression...

    Here it is:
    var regexpr= new RegExp("(&lt;" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(&lt;." + tagname + "&gt;)");
    Apply exec with this pattern to any string, HTML source or XML file and
    it will fetch the values between the tags.
    One word of warning, though: if the tag has got child tags, it will
    retrieve all the child tags as well :)

    Thanks
    Karthik
     
    Karthik, Nov 2, 2007
    #4
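Read literally, `[A-Z]*[[a-z]*[0-9]*` is three runs in a fixed order: uppercase letters, then `[` or lowercase letters, then digits. A quick check shows which element contents the posted expression lets through; it assumes literal `<`/`>` instead of entities, and `accepts` is a hypothetical helper:

```javascript
const tagname = "ContentId";
const regexpr = new RegExp(
  "(<" + tagname + ">)([A-Z]*[[a-z]*[0-9]*)(<." + tagname + ">)"
);

// Hypothetical helper: does the expression match one element with this content?
function accepts(content) {
  return regexpr.test("<" + tagname + ">" + content + "</" + tagname + ">");
}

console.log(accepts("12345"));     // true
console.log(accepts("Abc123"));    // true
console.log(accepts("[x"));        // true: "[" sits in the second class
console.log(accepts("123abc"));    // false: digits must come last
console.log(accepts("two words")); // false: no class contains a space
```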
  5. Jeremy

    Jeremy Guest

    Karthik wrote:
    > Got the expression...
    >
    > here it is...
    > var regexpr= new RegExp("(&lt;" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(&lt;." + tagname + "&gt;)");
    > apply a exec of this pattern on any string/html source/xml file, it
    > will fetch you the values between the tags..
    > one word of warning though if the tag has got child tags, it will
    > retrieve all the child tags also :)


    Using regular expressions alone will never really get you a robust
    parser. For example, "<foo>bar<afoo>" would match your current
    expression, even though <afoo> doesn't close <foo>.
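Jeremy's example can be checked directly: the `.` in `"(<." + tagname` matches any single character, so `<afoo>` passes for `</foo>`. A sketch with literal `<`/`>`, comparing the posted expression with a variant that spells out `</` (a fix suggested here, not in the thread); it closes this particular hole but does not turn the expression into a parser:

```javascript
const tagname = "foo";
// The posted expression: "." accepts any character where "/" was intended.
const loose = new RegExp("(<" + tagname + ">)([A-Z]*[[a-z]*[0-9]*)(<." + tagname + ">)");
// Variant that requires a real end tag.
const strict = new RegExp("(<" + tagname + ">)([A-Z]*[[a-z]*[0-9]*)(</" + tagname + ">)");

console.log(loose.test("<foo>bar<afoo>"));  // true: "." matched "a"
console.log(strict.test("<foo>bar<afoo>")); // false
console.log(strict.test("<foo>bar</foo>")); // true
```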

    You want to search through the current document for a certain tag?
    Wouldn't it be easier to use the DOM for this purpose?

    Jeremy
     
    Jeremy, Nov 2, 2007
    #5
  6. Bart Van der Donck

    Karthik wrote:

    > var regexpr= new RegExp("(&lt;" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(&lt;." + tagname + "&gt;)");
    > apply a exec of this pattern on any string/html source/xml file, it
    > will fetch you the values between the tags..
    > one word of warning though if the tag has got child tags, it will
    > retrieve all the child tags also :)


    And that's only the very beginning :)

    Take a look at

    http://groups.google.com/group/comp.lang.perl.misc/browse_frm/thread/795b006db41efc7b/

    to get an idea of the complexity of real XML string parsing.

    Do yourself a favour and load it into an XML parser.
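A few inputs that are well-formed XML yet defeat a pattern of the kind built in this thread. As a representative naive pattern, this sketch assumes `<ContentId>([^<]*)</ContentId>` with literal angle brackets; `extract` is a hypothetical helper:

```javascript
const naive = /<ContentId>([^<]*)<\/ContentId>/;

// Hypothetical helper: the captured text, or null when nothing matched.
function extract(xml) {
  const m = naive.exec(xml);
  return m ? m[1] : null;
}

// Well-formed, but the start tag carries an attribute: no match at all.
console.log(extract('<ContentId type="int">12345</ContentId>')); // null

// The element is commented out, yet the pattern still "finds" it.
console.log(extract("<!-- <ContentId>999</ContentId> --><ContentId>12345</ContentId>")); // "999"

// A CDATA section may legally contain "<", which [^<]* refuses to cross.
console.log(extract("<ContentId><![CDATA[1 < 2]]></ContentId>")); // null
```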

    --
    Bart
     
    Bart Van der Donck, Nov 3, 2007
    #6
  7. Thomas 'PointedEars' Lahn

    Karthik wrote:
    > Got the expression...


    Not at all, you don't.

    > here it is...
    > var regexpr= new RegExp("(&lt;" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(&lt;." + tagname + "&gt;)");
    > apply a exec of this pattern on any string/html source/xml file, it
    > will fetch you the values between the tags..


    Only if the content is ASCII-alphanumeric. XML, however, is UTF-8-safe.
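For instance, `Müller` and `12 345` are ordinary XML character data, but the ASCII character classes reject both. A sketch comparing the posted classes with `[^<]*` (a substitution made here, not from the thread), assuming a hypothetical `<name>` element and literal `<`/`>`:

```javascript
const asciiOnly = /<name>([A-Z]*[[a-z]*[0-9]*)<\/name>/;
const anyText = /<name>([^<]*)<\/name>/;

for (const content of ["Smith", "Müller", "12 345"]) {
  const xml = "<name>" + content + "</name>";
  console.log(content, asciiOnly.test(xml), anyText.test(xml));
}
// Smith true true
// Müller false true
// 12 345 false true
```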

    > one word of warning though if the tag has got child tags, it will
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
    > retrieve all the child tags also :)
                       ^^^^^^^^^^
    http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.1 (esp. the last,
    green-colored paragraph)

    It will _not_ match any child _elements_, as you have explicitly excluded
    their start tags from the content of the `tagname' element, assuming that
    the double `[' was but a typo (if it was not, the expression would match `['
    in the content as well). Why you escape `<' and `>' remains a mystery;
    further assuming that you use it within an XHTML `script' element (where
    declaring it as CDATA would have sufficed to avoid the character entity
    references), the possible match would be

    <foo>abc<bar>def</bar>ghi</foo>
    ^^^^^^^^^^

    However, that match is discarded because `ar' does not match `fo'.
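Running the expression on that markup (with literal `<`/`>` and `tagname` set to `foo`) confirms there is no match at all. For contrast, a class that does admit `<`, here `[\s\S]*?` (chosen for illustration), really would pull the child element into the result:

```javascript
const markup = "<foo>abc<bar>def</bar>ghi</foo>";

// The posted character classes cannot cross "<bar>", and "<.foo>" fails there.
const posted = /(<foo>)([A-Z]*[[a-z]*[0-9]*)(<.foo>)/;
console.log(posted.exec(markup)); // null

// A lazy "anything" class stops at the first "</foo>", child tags included.
const permissive = /(<foo>)([\s\S]*?)(<\/foo>)/;
console.log(permissive.exec(markup)[2]); // "abc<bar>def</bar>ghi"
```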

    The Chomsky hierarchy, taught in computer science classes, tells us that
    it is not possible in general to use (only) a regular grammar, such as the
    one regular expressions are based on, to parse a context-free language,
    such as SGML-based markup. This is because every regular language is
    context-free, but not every context-free language is regular.
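A small illustration: deciding which `</a>` closes which `<a>` requires tracking nesting depth, which a finite automaton cannot do for arbitrary depth. With a single fixed pattern (the element name `a` is arbitrary), the lazy version pairs nested elements wrongly and the greedy one merges siblings; neither handles both inputs:

```javascript
const lazy = /<a>([\s\S]*?)<\/a>/;
const greedy = /<a>([\s\S]*)<\/a>/;

const nested = "<a><a>x</a></a>";    // outer content should be "<a>x</a>"
const siblings = "<a>1</a><a>2</a>"; // first content should be "1"

console.log(lazy.exec(nested)[1]);     // "<a>x"      (wrong)
console.log(greedy.exec(nested)[1]);   // "<a>x</a>"  (right)
console.log(lazy.exec(siblings)[1]);   // "1"         (right)
console.log(greedy.exec(siblings)[1]); // "1</a><a>2" (wrong)
```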

    Therefore, if you really need to parse the markup as such, rather than
    access the corresponding DOM objects, you are looking for a
    non-deterministic pushdown automaton (which can parse those languages),
    implemented as an XML parser (such as DOMParser in Gecko-based UAs). If you
    don't want to use such an external API, you can combine the efficiency of
    regular expression matching with the reliability of an NPDA in your own
    code.
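A minimal sketch of that combination, under simplifying assumptions (no comments, CDATA sections, processing instructions, or `>` inside attribute values) and with a hypothetical function name `elementContents`: a global regular expression finds the tags, and an explicit stack, the pushdown part, pairs each end tag with its start tag. It is a deterministic stack matcher, not a full XML parser:

```javascript
// Matches a start tag, an end tag, or an empty-element tag.
// Groups: 1 = "/" for end tags, 2 = tag name, 3 = "/" for empty-element tags.
const TAG = /<(\/?)([A-Za-z_][\w.:-]*)(?:\s[^>]*?)?(\/?)>/g;

// Returns the raw content of every <name> element, ordered by where their
// end tags occur, or throws if the tags are not properly nested.
function elementContents(source, name) {
  const stack = []; // open elements: { name, contentStart }
  const results = [];
  TAG.lastIndex = 0;
  let m;
  while ((m = TAG.exec(source)) !== null) {
    const [tag, slash, tagName, selfClose] = m;
    if (selfClose) {
      if (tagName === name) results.push("");
    } else if (!slash) {
      stack.push({ name: tagName, contentStart: m.index + tag.length });
    } else {
      const open = stack.pop();
      if (!open || open.name !== tagName) {
        throw new Error("Mismatched end tag </" + tagName + "> at " + m.index);
      }
      if (tagName === name) results.push(source.slice(open.contentStart, m.index));
    }
  }
  if (stack.length > 0) {
    throw new Error("Unclosed <" + stack[stack.length - 1].name + ">");
  }
  return results;
}

console.log(elementContents("<a><a>x</a></a>", "a"));  // [ 'x', '<a>x</a>' ]
console.log(elementContents("<a>1</a><a>2</a>", "a")); // [ '1', '2' ]
```

Because the stack records every open element, input such as `<foo>bar<afoo>` is reported as unclosed instead of being mistaken for a match.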

    http://en.wikipedia.org/wiki/Chomsky_hierarchy


    PointedEars
    --
    Anyone who slaps a 'this page is best viewed with Browser X' label on
    a Web page appears to be yearning for the bad old days, before the Web,
    when you had very little chance of reading a document written on another
    computer, another word processor, or another network. -- Tim Berners-Lee
     
    Thomas 'PointedEars' Lahn, Nov 8, 2007
    #7

Similar Threads
  1. championsleeper
    Replies:
    6
    Views:
    1,041
    championsleeper
    Apr 6, 2004
  2. Liang
    Replies:
    2
    Views:
    1,740
  3. VSK
    Replies:
    2
    Views:
    2,356
  4. Replies:
    4
    Views:
    746
  5. shruds
    Replies:
    1
    Views:
    896
    John C. Bollinger
    Jan 27, 2006