xsd regular expression query

Discussion in 'XML' started by rorymo@gmail.com, Mar 30, 2007.

  1. Guest

    I have a regular expression that allows only certain characters to be
    valid in an XML document, as follows:

    <xs:pattern value="^[ dgHhMmstyf,\-\./:;\\]*" />

    I also want to allow any Unicode characters enclosed in single quotes,
    no matter where they appear. I tried the following:

    <xs:pattern value="^[ dgHhMmstyf,\-\./:;\\]*('*)*" />

    But this only works if the characters in quotes appear after the other
    text.

    Any help would be much appreciated.

    Regards
    Rory
    , Mar 30, 2007
    #1

  2. * wrote in comp.text.xml:
    >I have a regular expression that allows only certain characters to be
    >valid in an xml doc as follows:
    >
    ><xs:pattern value="^[ dgHhMmstyf,\-\./:;\\]*" />
    >
    >What I want to do is also allow any unicode character that is enclosed
    >in single quotes to also be valid, no matter where they appear. I
    >tried the following:
    >
    ><xs:pattern value="^[ dgHhMmstyf,\-\./:;\\]*('*)*" />
    >
    >But this only works if the characters in quotes appear after the other
    >text.


    You are looking for "(a character from a certain set, or the character '
    followed by zero or more characters except the character ' followed by
    the character ') zero or more times", i.e. something like

    ([a-z]|'[^']*')*

    if the first set of characters was a-z.
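A minimal sketch of that description, using Python's re module as a stand-in for a schema validator. That is an assumption: the two regex dialects differ in general, although this pattern uses only syntax common to both. The helper name is_valid is hypothetical, and fullmatch is used because a schema pattern has to match the whole value.

```python
import re

# The illustrative pattern from the post: a-z stands in for the real character set.
PATTERN = re.compile(r"([a-z]|'[^']*')*")

def is_valid(text: str) -> bool:
    # fullmatch requires the pattern to cover the entire string.
    return PATTERN.fullmatch(text) is not None

print(is_valid("abc"))           # True: only characters from the set
print(is_valid("ab'X Y 9'cd"))   # True: quoted run in the middle
print(is_valid("'123'abc'#'"))   # True: quoted runs at both ends
print(is_valid("abc'unclosed"))  # False: opening quote is never closed
print(is_valid("ABC"))           # False: unquoted characters outside the set
```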
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
    Bjoern Hoehrmann, Mar 30, 2007
    #2

  3. Guest

    Not quite - I want to allow the character(s) in quotes to be able to
    appear with the character(s) from the specified set but not in any
    particular place i.e. not necessarily at the beginning or end. For
    example, I want the following to be valid:

    dddd, d. MMMM yyyy
    dddd, d. MMMM MyYear
    MyDay, d. MMMM yyyy
    dddd, d. MyMonth yyyy

    so MyDay, MyYear and MyMonth appear in single quotes in the xml file
    and are valid.
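For illustration, these examples can be checked against the suggested template with the a-z placeholder swapped for the original character set. This is a sketch in Python rather than in a schema validator, and the names FORMAT_PATTERN and accepts are made up for the example.

```python
import re

# Suggested template with the original character set substituted for a-z.
FORMAT_PATTERN = re.compile(r"([ dgHhMmstyf,\-\./:;\\]|'[^']*')*")

def accepts(value: str) -> bool:
    # The whole value must match, as it would for an xs:pattern facet.
    return FORMAT_PATTERN.fullmatch(value) is not None

# The four examples, with the literal words in single quotes.
for value in [
    "dddd, d. MMMM yyyy",
    "dddd, d. MMMM 'MyYear'",
    "'MyDay', d. MMMM yyyy",
    "dddd, d. 'MyMonth' yyyy",
]:
    print(value, accepts(value))  # each prints True

# Without the quotes, letters such as Y, e, a and r fall outside the set.
print(accepts("dddd, d. MMMM MyYear"))  # False
```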
    , Mar 30, 2007
    #3
  4. Guest

    It looks to me as though Björn's pattern achieves this. Filling in
    your character set, it would be:

    (^[ dgHhMmstyf,\-\./:;\\]|'[^']*')*

    Can you do a test program and show some examples where it does not
    work?

    Cheers,

    Pete.
    --
    =============================================
    Pete Cordell
    Tech-Know-Ware Ltd
    for XML to C++ data binding visit
    http://www.tech-know-ware.com/lmx/
    http://www.codalogic.com/lmx/
    =============================================
    , Apr 3, 2007
    #4
  5. Guest

    The pipe (|) in your regular expression means 'or', right? If so, then
    you can have only the characters dgHhMmstyf,\-\./:;\\ OR characters in
    quotes, but not both in the same string, so you can have:

    dddd, d. MMMM yyyy
    OR
    'ab$olutN0n$en$e'
    But not

    dddd, d. MMMM yyyy 'ab$olutN0n$en$e'

    Does this make sense?
    , Apr 4, 2007
    #5
  6. Guest

    On 4 Apr, 17:08, wrote:
    > On Apr 3, 12:46 pm, wrote:
    >
    > ...
    > > It looks to me that Björn's pattern achieves this. (Which filling in
    > > the dots would be:
    > >
    > > (^[ dgHhMmstyf,\-\./:;\\]|'[^']*')*
    > >
    > > Can you do a test program and show some examples where it does not
    > > work?
    >
    > The pipe(|) in you regular expression means 'or' right? If so then you
    > can only have the characters dgHhMmstyf,\-\./:;\\ OR characters in
    > quotes but not both in the same string so you can have:
    >
    > dddd, d. MMMM yyyy
    > OR
    > 'ab$olutN0n$en$e'
    > But not
    >
    > dddd, d. MMMM yyyy 'ab$olutN0n$en$e'
    >
    > Does this make sense?


    But that's then wrapped in a set of brackets and a *. So you can have
    any number of occurrences of:

    dddd, d. MMMM yyyy
    OR
    'ab$olutN0n$en$e'

    which means you can have:

    dddd, d. MMMM yyyy 'ab$olutN0n$en$e'

    Note that the pattern mentioned earlier is wrong. To match Björn's
    suggestion it should not have had the first ^, so it becomes:

    ([ dgHhMmstyf,\-\./:;\\]|'[^']*')*
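One way to see the effect of the outer brackets and * is to walk through a string and list the piece consumed by each repetition. The sketch below does this in Python with a hypothetical helper called pieces. It relies on the two branches never starting with the same character (the quote is not in the set), so a simple left-to-right scan accepts the same strings as the starred pattern.

```python
import re

# One repetition of the group: a single allowed character OR a quoted run.
PIECE = re.compile(r"[ dgHhMmstyf,\-\./:;\\]|'[^']*'")

def pieces(text):
    """Return the pieces consumed by each repetition, or None if the scan gets stuck."""
    out, pos = [], 0
    while pos < len(text):
        m = PIECE.match(text, pos)
        if m is None:
            return None  # neither branch matches here, so the string is invalid
        out.append(m.group())
        pos = m.end()
    return out

print(pieces("d. 'ab$olutN0n$en$e'"))
# ['d', '.', ' ', "'ab$olutN0n$en$e'"]

print(pieces("d. ab$olut"))  # None: 'a' is neither in the set nor inside quotes
```

Each unquoted character is one repetition of the first branch and each quoted run is one repetition of the second, which is why the two kinds of content can be mixed freely.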

    I recommend loading the pattern into whatever is the most convenient
    regular expression parser for you, such as Perl, PHP, Java, C# or
    JavaScript, and developing a number of test cases. Then check that
    each one passes or fails as appropriate. Then, where there are
    errors, list them here.

    If you don't have a convenient regular expression engine locally you
    might be able to download a pure regular expression tester, or do
    something on the web such as: http://www.nvcc.edu/home/drodgers/ceu/resources/test_regexp.asp

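    BTW - when testing a regular expression destined for a schema on a
    normal regular expression engine, remember to put ^ and $ anchors
    around it. A schema pattern always has to match the whole value,
    whereas most other engines will also accept a match on just part of
    the string.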
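The difference the anchors make can be sketched in Python, which is assumed here only as a convenient general-purpose engine. The variable names are illustrative.

```python
import re

# Corrected pattern from this post, as Python's re module would read it.
BODY = r"([ dgHhMmstyf,\-\./:;\\]|'[^']*')*"

unanchored = re.compile(BODY)
anchored = re.compile("^" + BODY + "$")

bad = "dddd, d. MMMM MyYear"  # MyYear is not quoted, so this should be rejected

# Without anchors, search() is satisfied by the valid prefix "dddd, d. MMMM My".
print(unanchored.search(bad) is not None)  # True (a false positive)

# With ^ and $ the whole string has to match, as it would in a schema.
print(anchored.search(bad) is not None)    # False

# Python-specific caveat: $ also matches just before a trailing newline,
# so fullmatch() is the stricter check.
print(anchored.search("d\n") is not None)       # True
print(unanchored.fullmatch("d\n") is not None)  # False
```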

    HTH,

    Pete.
    --
    =============================================
    Pete Cordell
    Tech-Know-Ware Ltd
    for XML to C++ data binding visit
    http://www.tech-know-ware.com/lmx/
    http://www.codalogic.com/lmx/
    =============================================
    , Apr 4, 2007
    #6
  7. Guest

    I have it working now. Thanks a lot to all who replied.

    -Rory.
    , Apr 5, 2007
    #7