JAVA Regular Expression

Discussion in 'Java' started by Remko Silvis, Mar 3, 2006.

  1. Remko Silvis

    Remko Silvis Guest

    Hi, can anyone help me understand the following?
    I have a regular expression that has to match, for example:

    <format zipped="false" encrypted="false" />

    My expression:
    Pattern singleTag = Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)*/>");
    This yields only three groups (counting group 0):

    group(0) = <format zipped="false" encrypted="false" />
    group(1) = format
    group(2) = encrypted="false"

    It skips the 'zipped="false"' attribute.
    When I change the regexp to fit this string exactly,
    like this:
    Pattern singleTag = Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)(\\w+\\=\"\\w+\" *)/>");
    I get the four groups I expect:

    group(0) = <format zipped="false" encrypted="false" />
    group(1) = format
    group(2) = zipped="false"
    group(3) = encrypted="false"

    It seems that repeating a group '()*' does not create a new group, but
    overwrites the previous one.

    Can anyone confirm this?

    Thanx,
    Remko

    Remko Silvis, Mar 3, 2006
    #1

  2. Remko Silvis wrote:
    > Hi can anyone help me understand the following
    > I have a regular expression that has to match for example:
    >
    > <format zipped="false" encrypted="false" />
    >
    > My expression:
    > Pattern singleTag = Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)*/>");
    > matches only 3 Groups
    >
    > group(0) = <format zipped="false" encrypted="false" />
    > group(1) = format
    > group(2) = encrypted="false"


    Your regexp contains only two pairs of capturing parentheses, so you only
    get two groups (plus group 0 for the whole match).
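A minimal sketch to illustrate the point above, using the pattern exactly as posted. The class name `RepeatedGroupDemo` is made up for this example. Note that group 2 also captures the trailing space matched by the ` *` inside the parentheses, so the output is wrapped in brackets to make that visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The pattern from the original post, unchanged.
    static final Pattern SINGLE_TAG =
            Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)*/>");

    public static void main(String[] args) {
        String input = "<format zipped=\"false\" encrypted=\"false\" />";
        Matcher m = SINGLE_TAG.matcher(input);
        if (m.matches()) {
            // Two pairs of capturing parentheses, so groupCount() is 2
            // no matter how often the second group repeats.
            System.out.println("groupCount = " + m.groupCount());
            System.out.println("group(1) = [" + m.group(1) + "]");
            // Only the last iteration of the repeated group is kept.
            System.out.println("group(2) = [" + m.group(2) + "]");
        }
    }
}
```

Run against the example tag, this should report a group count of 2, with group 2 holding only the `encrypted` attribute.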

    > It skips the 'zipped="false"' attribute.
    > When I change the regexp to fit this string exactly,
    > like this:
    > Pattern singleTag = Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)(\\w+\\=\"\\w+\" *)/>");
    > I get the four groups I expect:
    >
    > group(0) = <format zipped="false" encrypted="false" />
    > group(1) = format
    > group(2) = zipped="false"
    > group(3) = encrypted="false"
    >
    > It seems that repeating a group '()*' does not create a new group, but
    > overwrites the previous one.
    >
    > Can anyone confirm this?


    This is standard regexp behavior: a repeated capturing group keeps only
    the text matched by its last iteration.

    Two solutions:

    1. If you know there are only two attributes you can change your regexp to
    match two attributes and directly access them.

    2. If there are 0 to n attributes (most likely there are) the easiest is
    to create a regexp that will match the whole <...> expression and have a
    group extract all attributes. Then create a second regexp which
    recognizes a single attribute and use that to iterate over the group that
    recognized all attributes.
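The second solution could be sketched roughly as follows. This is only one way to do it: the class name `TwoPassAttributes`, the method `parseAttributes`, and the slightly adjusted patterns are assumptions made for this example, not code from the original post. The first pattern wraps the repeated part in an outer capturing group (the inner one is made non-capturing with `(?:...)`), and the second pattern is applied to that captured text with `find()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPassAttributes {
    // Pass 1: group 1 = tag name, group 2 = the whole attribute list.
    static final Pattern TAG =
            Pattern.compile("< *(\\w+) *((?:\\w+=\"\\w+\" *)*)/>");
    // Pass 2: a single attribute; group 1 = name, group 2 = value.
    static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)=\"(\\w+)\"");

    // Returns the attributes in document order; empty if the tag does not match.
    static Map<String, String> parseAttributes(String tag) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher tagMatcher = TAG.matcher(tag);
        if (tagMatcher.matches()) {
            // Iterate over the text captured by the "all attributes" group.
            Matcher attr = ATTRIBUTE.matcher(tagMatcher.group(2));
            while (attr.find()) {
                attributes.put(attr.group(1), attr.group(2));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        String input = "<format zipped=\"false\" encrypted=\"false\" />";
        System.out.println(parseAttributes(input));
        // prints {zipped=false, encrypted=false}
    }
}
```

Because the attribute list is captured as one group and then scanned separately, this works for zero to n attributes without changing either pattern.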

    Kind regards

    robert
    Robert Klemme, Mar 3, 2006
    #2

  3. "Remko Silvis" <> wrote in
    news:op.s5tx5zfhicvu55@studentbak.biol.rug.nl:

    > Hi can anyone help me understand the following
    > I have a regular expression that has to match for example:
    >
    > <format zipped="false" encrypted="false" />


    Why are you using regular expressions to parse XML when there is an
    abundance of XML-specific parsers (DOM, SAX, StAX) available for you to
    use, which would very likely be simpler and more reliable?
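For comparison, here is a minimal sketch of reading the same tag with the DOM parser that ships with the JDK. The class name `DomAttributes` and the helper `parseRoot` are made up for this example, and the sketch assumes the input is a complete, well-formed XML document held in a string. The order in which a `NamedNodeMap` returns attributes is not guaranteed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomAttributes {
    // Parses an XML string and returns its root element.
    // Parser failures are rethrown unchecked to keep the sketch short.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<format zipped=\"false\" encrypted=\"false\" />");
        System.out.println("tag = " + root.getTagName());

        // Every attribute is available, however many there are.
        NamedNodeMap attributes = root.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            System.out.println(attribute.getNodeName() + " = " + attribute.getNodeValue());
        }
    }
}
```

A single attribute can also be read directly with `root.getAttribute("zipped")`.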

    Cheers
    GRB
    Greg R. Broderick, Mar 3, 2006
    #3
