Excluding values in the xsd

Discussion in 'XML' started by dick.deneer@donkeydevelopment.com, Mar 21, 2007.

  1. Guest

    I have an XML document that describes a Cobol copybook member. The XML is
    checked against an XSD.
    One of the XML attributes is the Cobol field name. The XSD constrains
    the length of this attribute's value to be greater than zero and less than 31.
    Now I want to add another check: the value must not be one of the
    Cobol reserved words. I have a list of reserved words (such as SUM,
    ACCEPT, COMPUTE, etc.).
    How can I specify these excluded values in the XSD so that XML
    validation returns an error if one of the reserved words is used as
    the attribute value?
    DickD
    , Mar 21, 2007
    #1

  2. wrote:
    > I have an XML document that describes a Cobol copybook member. The XML is
    > checked against an XSD.
    > One of the XML attributes is the Cobol field name. The XSD constrains
    > the length of this attribute's value to be greater than zero and less than 31.
    > Now I want to add another check: the value must not be one of the
    > Cobol reserved words. I have a list of reserved words (such as SUM,
    > ACCEPT, COMPUTE, etc.).
    > How can I specify these excluded values in the XSD so that XML
    > validation returns an error if one of the reserved words is used as
    > the attribute value?


    You can enumerate those reserved words, e.g.
    <xs:simpleType name="reserved-word">
      <xs:restriction base="xs:string">
        <xs:enumeration value="ACCEPT"/>
        <xs:enumeration value="COMPUTE"/>
        <xs:enumeration value="SUM"/>
        <!-- add further values here -->
      </xs:restriction>
    </xs:simpleType>




    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Mar 22, 2007
    #2

  3. Joseph Kesselman, Mar 22, 2007
    #3
  4. Martin Honnen wrote:
    > wrote:
    >> I have an XML document that describes a Cobol copybook member. The XML is
    >> checked against an XSD.
    >> One of the XML attributes is the Cobol field name. The XSD constrains
    >> the length of this attribute's value to be greater than zero and less than 31.
    >> Now I want to add another check: the value must not be one of the
    >> Cobol reserved words. I have a list of reserved words (such as SUM,
    >> ACCEPT, COMPUTE, etc.).
    >> How can I specify these excluded values in the XSD so that XML
    >> validation returns an error if one of the reserved words is used as
    >> the attribute value?

    >
    > You can enumerate those reserved words e.g.


    Sorry, I misread your request. Enumeration helps if you want to allow
    only the reserved words, but not if you want to disallow them.


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Mar 22, 2007
    #4
  5. Guest

    On 22 mrt, 15:42, Joseph Kesselman <> wrote:
    > wrote:
    > > How can I specify these excluded values in the XSD so that XML
    > > validation returns an error if one of the reserved words is used as
    > > the attribute value?

    >
    > Use a regular expression to describe the attribute's acceptable values.
    >
    > http://www.w3.org/TR/xmlschema-2/#rf-pattern
    > http://www.w3.org/TR/xmlschema-2/#regexs
    >
    > --
    > Joe Kesselman / Beware the fury of a patient man. -- John Dryden


    The problem is that any value should be accepted, except those in the
    list of reserved words.
    In a regex it is not easy to negate an expression. There is nothing
    like ^(SUM,COMPUTE,DATA).
    After a long internet search I found an expression that matched my
    needs.
    Here is the Java code:
    String s2 = "perfOrm";
    String regex = "^(?:(?!^(?im:accept|accept-encoding|from|to|perform|sub)$)[\\w-])*$";
    // Prints false for "perfOrm": the (?i) flag makes the excluded list case-insensitive.
    System.out.println(s2 + " matches " + regex + " = " + s2.matches(regex));

    The excluded values in this example are arbitrary.
    But this kind of expression, which relies on negative lookahead, is not
    supported by Xerces or any other parser.
    I found that the XML Schema specification talks about level 1 regex
    support.

    Does anyone have an idea how to solve this?
    Regards
    Dick Deneer
    , Mar 22, 2007
    #5
  6. wrote:
    > So if anyone has a idea to solve this ??


    I think the regular expressions that XML Schema supports can be persuaded
    to do it, though the expression may be painfully ugly.

    If you aren't happy with that, implement the check in the application
    rather than in schema.

    Remember, the schema is only an initial sanity check on syntax and
    overall structure of the document. It is NOT intended to capture and
    check all possible semantic constraints. Some checking will still have
    to be implemented in the application.
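
    A minimal sketch of such an application-side check, assuming Java and
    assuming that reserved words are compared case-insensitively. The class
    name FieldNameCheck, the method isValidFieldName and the three-word list
    are hypothetical placeholders, not part of any library:

```java
import java.util.Locale;
import java.util.Set;

class FieldNameCheck {
    // Illustrative subset only; a real list would hold every Cobol reserved word.
    private static final Set<String> RESERVED = Set.of("ACCEPT", "COMPUTE", "SUM");

    /** Returns true if the name is 1 to 30 characters long and is not a reserved word. */
    static boolean isValidFieldName(String name) {
        if (name == null || name.isEmpty() || name.length() > 30) {
            return false;
        }
        // Assumption: the comparison ignores case, so "compute" is rejected too.
        return !RESERVED.contains(name.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isValidFieldName("CUSTOMER-NAME"));
        System.out.println(isValidFieldName("compute"));
    }
}
```

    The schema can keep the length constraint, and a check like this can run
    on the attribute value after validation succeeds.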

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Mar 22, 2007
    #6
  7. Guest

    > I think the regular expressions that XML Schema supports can be persuaded
    > to do it, though the expression may be painfully ugly.


    I think it is not possible. Please convince me :)

    Regards
    Dick Deneer
    , Mar 23, 2007
    #7
  8. writes:

    > I think it is not possible. Please convince me :)


    Regular languages are closed under complementation. So, you can be
    sure it is possible: there _is_ a regular expression that matches
    everything except a finite set of words. If you want to exclude, e.g.,
    "if" and "else", you can go:

    ([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*

    (I'm not sure about the regexp syntax for schemas). It may be a real
    pain. I don't know if there's an easier way to get the same result.

    -- Alain.
    Alain Ketterlin, Mar 23, 2007
    #8
  9. Guest

    On 23 mrt, 13:42, Alain Ketterlin <-strasbg.fr> wrote:
    > writes:
    > > I think it is not possible. Please convince me :)

    >
    > Regular languages are closed under complementation. So, you can be
    > sure it is possible: there _is_ a regular expression that matches
    > everything except a finite set of words. If you want to exclude, e.g.,
    > "if" and "else", you can go:
    >
    > ([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*
    >
    > (I'm not sure about the regexp syntax for schemas). It may be a real
    > pain. I don't know if there's an easier way to get the same result.
    >
    > -- Alain.


    Alain,

    I tested your expression and it always returns true, whatever I type
    (including if and else).
    Am I missing something?
    , Mar 23, 2007
    #9
  10. writes:

    >> ([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*


    > I tested your expression and it always returns true, whatever
    > (including if and else) I type.
    > Do I miss something?


    I did :) I went too fast. You have to 1) include trailing chars in the
    alternative, 2) group prefixes to exclude, 3) take care of strict
    prefixes. Something like:

    ([^ie].*|i|i[^f].*|if.+|e|e[^l].*|el|el[^s].*|els|els[^e].*|else.+)

    This may get really hairy with lots of keywords. Be careful with common
    prefixes, like "if" and "int" (note that the strict prefix "in" needs
    its own alternative):

    ([^i].*|i|i[^nf].*|if.+|in|in[^t].*|int.+)

    I stop here, in fear of writing nonsense. The basic idea is simple:

    1) draw a trie (lexicographic tree) containing all the words
    2) add one alternative for each path to a non-leaf node (i, e, el, els)
    3) add one alternative for each path out of a node (either
    leaf or non-leaf), i.e., a path that starts "in" the tree and "exits"
    the tree at some point (i[^f].*, if.+, etc.)

    (It basically amounts to inverting the accepting states of a
    deterministic finite automaton.)
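
    The three steps above can be sketched in Java roughly as follows. This is
    an illustration under assumptions, not a tested schema tool: the class
    name ExclusionRegex and the method build are hypothetical, words are
    assumed to contain only letters, digits and hyphens, matching is
    case-sensitive, and the empty string is left unmatched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ExclusionRegex {
    /** One trie node; children are kept sorted so the output is deterministic. */
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    /** Builds a lookahead-free pattern matching every non-empty string except the given words. */
    static String build(List<String> words) {
        Node root = new Node();
        for (String word : words) {
            // Assumption: only letters, digits and hyphens, so prefixes need no escaping.
            if (!word.matches("[A-Za-z0-9-]+")) {
                throw new IllegalArgumentException("unsupported word: " + word);
            }
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isWord = true;
        }
        List<String> alternatives = new ArrayList<>();
        collect(root, "", alternatives);
        return "(" + String.join("|", alternatives) + ")";
    }

    private static void collect(Node node, String prefix, List<String> out) {
        // Step 2: a strict prefix that is not itself a reserved word is allowed.
        if (!prefix.isEmpty() && !node.isWord) {
            out.add(prefix);
        }
        if (node.children.isEmpty()) {
            // Step 3, leaf: the whole word followed by at least one more character.
            out.add(prefix + ".+");
        } else {
            // Step 3, inner node: leave the trie through a character that has no edge.
            StringBuilder excluded = new StringBuilder();
            for (char c : node.children.keySet()) {
                excluded.append(c == '-' ? "\\-" : String.valueOf(c));
            }
            out.add(prefix + "[^" + excluded + "].*");
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                collect(e.getValue(), prefix + e.getKey(), out);
            }
        }
    }

    public static void main(String[] args) {
        String regex = build(List.of("if", "else"));
        System.out.println(regex);
        for (String s : List.of("if", "else", "i", "els", "iff", "elsewhere", "x")) {
            System.out.println(s + " -> " + s.matches(regex));
        }
    }
}
```

    String.matches tests the whole input, which mirrors the way a schema
    pattern is applied to the whole value. The generated expression uses only
    alternation, negated character classes, ".", "*" and "+".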

    -- Alain.

    P/S: BTW, I just discovered grep --colour... Useful in such cases.
    Alain Ketterlin, Mar 24, 2007
    #10
  11. * Alain Ketterlin wrote in comp.text.xml:
    >Regular languages are closed under complementation. So, you can be
    >sure it is possible: there _is_ a regular expression that matches
    >everything except a finite set of words. If you want to exclude, e.g.,
    >"if" and "else", you can go:
    >
    > ([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*
    >
    >(I'm not sure about the regexp syntax for schemas). It may be a real
    >pain. I don't know if there's an easier way to get the same result.


    You created not(if) or not(else), which matches both if and else. You need
    to create not(if) and not(else), i.e. the intersection of two regular
    expressions. I suppose there is a painful way in XML Schema to specify
    multiple regular expressions a string must match, and inverting a group
    is simple (abc -> not(a) .* or a not(b) .* or ab not(c) .* or abc .+). It
    would be better to compute the intersection of the regular expressions.
    There may be finite state automata tools that support that. I am about
    to release a tool that can do it as well.
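
    The intersection idea above can be tried out in Java with one simple
    pattern per reserved word, where a value is accepted only if it matches
    all of them. This is a sketch under assumptions: the names
    PatternIntersection, notWord and isAllowed are hypothetical, words are
    assumed to contain only letters, digits and hyphens, and the strict
    prefixes of each word get their own alternatives. In XML Schema, pattern
    facets from successive restriction steps must all hold, so the same
    per-word patterns could in principle be stacked that way.

```java
import java.util.ArrayList;
import java.util.List;

class PatternIntersection {
    /** Builds a pattern for "any non-empty string except this one word". */
    static String notWord(String word) {
        // Assumption: only letters, digits and hyphens, so prefixes need no escaping.
        if (!word.matches("[A-Za-z0-9-]+")) {
            throw new IllegalArgumentException("unsupported word: " + word);
        }
        List<String> alternatives = new ArrayList<>();
        for (int k = 0; k < word.length(); k++) {
            String prefix = word.substring(0, k);
            char c = word.charAt(k);
            if (k > 0) {
                alternatives.add(prefix); // a strict prefix of the word is allowed
            }
            // Diverge from the word at position k, then allow anything.
            alternatives.add(prefix + "[^" + (c == '-' ? "\\-" : String.valueOf(c)) + "].*");
        }
        alternatives.add(word + ".+"); // the word followed by at least one character
        return String.join("|", alternatives);
    }

    /** Intersection: the value must match the complement pattern of every word. */
    static boolean isAllowed(String value, List<String> reserved) {
        return reserved.stream().allMatch(w -> value.matches(notWord(w)));
    }

    public static void main(String[] args) {
        List<String> reserved = List.of("if", "else");
        for (String s : List.of("if", "else", "el", "iffy")) {
            System.out.println(s + " -> " + isAllowed(s, reserved));
        }
    }
}
```
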
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
    Bjoern Hoehrmann, Mar 24, 2007
    #11
  12. Guest

    On 24 mrt, 08:24, Alain Ketterlin <-strasbg.fr> wrote:
    > writes:
    > >> ([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*

    > > I tested your expression and it always returns true, whatever
    > > (including if and else) I type.
    > > Do I miss something?

    >
    > I did :) I went too fast. You have to 1) include trailing chars in the
    > alternative, 2) group prefixes to exclude, 3) take care of strict
    > prefixes. Something like:
    >
    > ([^ie].*|i|i[^f].*|if.+|e|e[^l].*|el|el[^s].*|els|els[^e].*|else.+)
    >
    > May get really hairy with lots of keywords. Be careful with common
    > prefixes, like "if" and "int":
    >
    > ([^i].*|i|i[^nf].*|if.+|in|in[^t].*|int.+)
    >
    > I stop here, in fear of writing nonsense. The basic idea is simple:
    >
    > 1) draw a trie (lexicographic tree) containing all the words
    > 2) add one alternative for each path to a non-leaf node (i, e, el, els)
    > 3) add one alternative for each path out of a node (either
    > leaf or non-leaf), i.e., a path that starts "in" the tree and "exits"
    > the tree at some point (i[^f].*, if.+, etc.)
    >
    > (It basically amounts to inverting the accepting states of a
    > deterministic finite automaton.)
    >
    > -- Alain.
    >
    > P/S: BTW, I just discovered grep --colour... Useful in such cases.


    Alain (and Bjoern)

    I am convinced.
    It is possible, but indeed very painful if your list of reserved words
    is long, which is the case for me.
    Thanks a lot,
    Dick Deneer
    , Mar 24, 2007
    #12