newbie question on sequencing

Discussion in 'Java' started by Mar Thomas, Aug 26, 2003.

  1. Mar Thomas

    Here's my problem. I have an XML file which looks like this:
    <myfile>
    <parent num="1.00">
      <child num="1.01"/>
      <child num="1.02"/>
      <child num="1.03"/>
    </parent>
    <parent num="1-a.00">
      <child num="1-a.01"/>
      <child num="1-a.02"/>
      <child num="1-a.03"/>
    </parent>
    <parent num="A">
      <child num="a"/>
      <child num="b"/>
      <child num="c"/>
    </parent>
    </myfile>

    You will notice that the numbering structure changes for every element. How
    can I find out:

    1. What the sequence is for each element
    2. If there are any numbers missing in each of the sequences

    Can my XML parser help me get this info? I don't know where to start.

    Thanks
    Mar Thomas, Aug 26, 2003

  2. Roedy Green

    On Tue, 26 Aug 2003 14:31:23 -0400, "Mar Thomas" <>
    wrote or quoted :

    ><parent num="1.00">
    > <child num="1.01"/>
    > <child num="1.02"/>
    > <child num="1.03"/>
    ></parent>
    ><parent num="1-a.00">
    > <child num="1-a.01"/>
    > <child num="1-a.02"/>
    > <child num="1-a.03"/>
    ></parent>
    ><parent num="A">
    > <child num="a"/>
    > <child num="b"/>
    > <child num="c"/>
    ></parent>


    Let's break the problem in two. Problem one: extract the sequence you
    want to analyse from the XML, e.g. "1.01", "1.02", "1.03" or "a", "b",
    "c".

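A minimal sketch of "problem one" using the JDK's built-in DOM parser. It assumes the file is well-formed XML (attribute values in quotes, every `<child>` closed), since a standard parser rejects unquoted attribute values. The class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExtractSequences {

    // Maps each <parent>'s num attribute to the num attributes of its <child>
    // elements, in document order. Assumes well-formed XML shaped like the sample.
    static Map<String, List<String>> extract(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        Map<String, List<String>> sequences = new LinkedHashMap<>();
        NodeList parents = doc.getElementsByTagName("parent");
        for (int i = 0; i < parents.getLength(); i++) {
            Element parent = (Element) parents.item(i);
            List<String> nums = new ArrayList<>();
            NodeList children = parent.getElementsByTagName("child");
            for (int j = 0; j < children.getLength(); j++) {
                nums.add(((Element) children.item(j)).getAttribute("num"));
            }
            sequences.put(parent.getAttribute("num"), nums);
        }
        return sequences;
    }

    public static void main(String[] args) {
        String xml = "<myfile>"
                + "<parent num=\"1.00\"><child num=\"1.01\"/><child num=\"1.02\"/><child num=\"1.03\"/></parent>"
                + "<parent num=\"A\"><child num=\"a\"/><child num=\"b\"/><child num=\"c\"/></parent>"
                + "</myfile>";
        System.out.println(extract(xml)); // {1.00=[1.01, 1.02, 1.03], A=[a, b, c]}
    }
}
```

Each list in the resulting map is one sequence to hand to the analysis step.
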
    Now for the analysis:

    1. Use a regex to see if a sequence follows a known pattern. Apply the
    regex to each value in turn, for each of your patterns. See
    http://mindprod.com/jgloss/regex.html

    2. Once you have identified the pattern, you can create a generator of
    the expected value given the previous value. If the expected and actual
    values don't match, you have a break.
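
The two analysis steps might be sketched as follows. The pattern catalogue (`Kind`) holds just two hypothetical entries that fit the sample data; real data would need its own list. Each entry pairs a regex (step 1) with a generator of the expected next value (step 2), and any mismatch between the generated and actual value is reported as a break.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternSequenceCheck {

    // A small, hypothetical catalogue of known patterns. Each pairs a regex with a
    // generator that produces the expected next value (null = no next value).
    enum Kind {
        // "1.01", "1-a.01": any prefix ending in '.', then a two-digit counter
        PREFIX_AND_TWO_DIGITS("(.+\\.)(\\d{2})") {
            String next(Matcher m) {
                int n = Integer.parseInt(m.group(2)) + 1;
                return n > 99 ? null : m.group(1) + String.format("%02d", n);
            }
        },
        // "a", "b", "c": a single lowercase letter
        SINGLE_LOWERCASE("([a-z])") {
            String next(Matcher m) {
                char c = m.group(1).charAt(0);
                return c == 'z' ? null : String.valueOf((char) (c + 1));
            }
        };

        final Pattern regex;

        Kind(String regex) {
            this.regex = Pattern.compile(regex);
        }

        abstract String next(Matcher m);
    }

    // Step 1: the first pattern that every value matches, or null if none does.
    static Kind identify(List<String> values) {
        for (Kind kind : Kind.values()) {
            boolean allMatch = true;
            for (String v : values) {
                allMatch &= kind.regex.matcher(v).matches();
            }
            if (allMatch) return kind;
        }
        return null;
    }

    // Step 2: compare each value with the value generated from its predecessor.
    static List<String> findBreaks(List<String> values, Kind kind) {
        List<String> breaks = new ArrayList<>();
        for (int i = 0; i + 1 < values.size(); i++) {
            Matcher m = kind.regex.matcher(values.get(i));
            if (!m.matches()) throw new IllegalArgumentException(values.get(i));
            String expected = kind.next(m);
            if (!values.get(i + 1).equals(expected)) {
                breaks.add("after " + values.get(i) + " expected " + expected
                        + ", found " + values.get(i + 1));
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        List<String> seq = Arrays.asList("1-a.01", "1-a.02", "1-a.04");
        Kind kind = identify(seq);
        System.out.println(kind + " " + findBreaks(seq, kind));
        // PREFIX_AND_TWO_DIGITS [after 1-a.02 expected 1-a.03, found 1-a.04]
    }
}
```

Returning `null` from a generator is just one arbitrary way of saying "no successor defined" (for example after `z` or `99`); what should really happen there is a separate decision.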
    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Aug 26, 2003

  3. Brad BARCLAY

    Mar Thomas wrote:

    > You will notice that the numbering structure changes for every element. How
    > can I find out:
    >
    > 1. What the sequence is for each element
    > 2. If there are any numbers missing in each of the sequences
    >
    > Can my XML parser help me get this info? I don't know where to start.


    About all that an XML parser is going to be able to give you is the
    data itself. The parser doesn't necessarily know or care what the
    data actually is, so long as it conforms to the relevant DTD.

    I assume you're trying to determine the sequencing programmatically?

    There are two things you really need to accomplish here -- the first is
    a regular expression that encompasses the "language" of the values, and
    the second is to create a dictionary ordering for the value elements so
    that you can properly increment them.

    For the first, you'll want to start by defining the relevant alphabet
    for the language. To do this, you'll want to inspect the elements and
    identify:

    - The letter elements
    - The numerical elements
    - The symbol elements
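
As a first pass at finding the alphabet, here is a sketch that surveys a sequence and collects the distinct characters in each of the three groups above. It relies on `Character.isLetter` and `Character.isDigit`, so anything that is neither (here `-` and `.`) lands in the symbol group. The class name is illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class AlphabetSurvey {

    // Splits the distinct characters of all values into letters, digits and
    // "symbols" (anything else), each sorted by char value.
    static String survey(List<String> values) {
        TreeSet<Character> letters = new TreeSet<>();
        TreeSet<Character> digits = new TreeSet<>();
        TreeSet<Character> symbols = new TreeSet<>();
        for (String value : values) {
            for (char c : value.toCharArray()) {
                if (Character.isLetter(c)) {
                    letters.add(c);
                } else if (Character.isDigit(c)) {
                    digits.add(c);
                } else {
                    symbols.add(c);
                }
            }
        }
        return "letters=" + letters + " digits=" + digits + " symbols=" + symbols;
    }

    public static void main(String[] args) {
        System.out.println(survey(Arrays.asList("1-a.01", "1-a.02", "1-a.03")));
        // letters=[a] digits=[0, 1, 2, 3] symbols=[-, .]
    }
}
```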

    Before you go much further, try to determine whether or not there are
    going to be _any_ rules for the numbering -- i.e., are non-alphanumerics
    considered static separators that are unchanging, or can they too be
    incremented? If the former, things are a bit easier -- if a
    non-alphanumeric occurs in the numbering, it will be unchanging in its
    "position" throughout all members, making the construction of the
    regular expression defining that numbering easier. If they can be an
    active element of the numbering, things are somewhat more difficult.
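
A sketch of the "easier" case, assuming non-alphanumerics are static separators: derive a regex from one sample value by turning each run of ASCII digits, lowercase letters or uppercase letters into a character class of the same width, and quoting separators literally with `Pattern.quote`. Keeping the sample's run widths is an extra assumption of this sketch, and it bakes in one answer to the overflow question discussed below.

```java
import java.util.regex.Pattern;

public class ShapeRegex {

    private static final String[] CLASS_REGEX = {"[0-9]", "[a-z]", "[A-Z]"};

    // 0 = digit, 1 = lowercase, 2 = uppercase, 3 = separator (static)
    private static int classOf(char c) {
        if (c >= '0' && c <= '9') return 0;
        if (c >= 'a' && c <= 'z') return 1;
        if (c >= 'A' && c <= 'Z') return 2;
        return 3;
    }

    // Builds a regex with the same "shape" as the sample value.
    static Pattern shapeOf(String sample) {
        StringBuilder re = new StringBuilder();
        int i = 0;
        while (i < sample.length()) {
            int cls = classOf(sample.charAt(i));
            int j = i;
            while (j < sample.length() && classOf(sample.charAt(j)) == cls) j++;
            if (cls == 3) {
                re.append(Pattern.quote(sample.substring(i, j))); // separators stay literal
            } else {
                re.append(CLASS_REGEX[cls]).append('{').append(j - i).append('}');
            }
            i = j;
        }
        return Pattern.compile(re.toString());
    }

    public static void main(String[] args) {
        Pattern p = shapeOf("1-a.01");
        System.out.println(p.pattern());                   // [0-9]{1}\Q-\E[a-z]{1}\Q.\E[0-9]{2}
        System.out.println(p.matcher("1-a.02").matches()); // true
        System.out.println(p.matcher("1.02").matches());   // false
    }
}
```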

    As well, you'll have to try to determine what is to happen when a
    letter or number identifier reaches its maximum amount for the given
    number of digits. For example, if you have the following numbering in
    your XML file:

    '1'
    '2'
    '3'

    ...we can probably safely assume that '4' is next. But what comes
    after '9'? Will it be '10' (adding another digit where one didn't exist
    before), 'A' (retaining a single digit, but either switching to
    letters _or_ assuming a hexadecimal representation), or will this not
    be allowed?


    The same goes for letters. What comes after 'z'? 'A'? 'aa'?
    Undefined? Nothing?

    If you're working with numerical values, are you going to assume
    they're decimal? If you only have as input the numbers 1 - 3 as above,
    you could be working with octal values, where there are no '8' or '9'
    digits. If you know for certain that only decimal values will be
    allowed, this makes such issues quite a bit easier.
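
The radix ambiguity is easy to demonstrate with `Integer.parseInt(String, int)` and `Integer.toString(int, int)`: the same three values are valid in several radixes, yet the radixes disagree about which values can appear and which value follows. The helper names are illustrative.

```java
public class RadixAmbiguity {

    // True if every value parses as an integer in the given radix.
    static boolean allValidIn(int radix, String... values) {
        for (String v : values) {
            try {
                Integer.parseInt(v, radix);
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }

    // The successor of a value, read and written in the same radix.
    static String nextInRadix(String value, int radix) {
        return Integer.toString(Integer.parseInt(value, radix) + 1, radix);
    }

    public static void main(String[] args) {
        // The values seen so far are consistent with both octal and decimal...
        System.out.println(allValidIn(8, "1", "2", "3") + " " + allValidIn(10, "1", "2", "3")); // true true
        // ...but the two readings disagree about whether "8" can ever appear
        System.out.println(allValidIn(8, "8") + " " + allValidIn(10, "8")); // false true
        // and about what follows "7"; hexadecimal gives yet another answer after "9"
        System.out.println(nextInRadix("7", 8) + " " + nextInRadix("7", 10) + " " + nextInRadix("9", 16)); // 10 8 a
    }
}
```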

    All of these factors will determine how you construct your regular
    expression which, if you don't have any rules, can be a difficult
    thing to do algorithmically. To correctly achieve the ends you desire,
    it's not enough to create an expression that accepts the values present
    and the values presumed: ".*" will accept your values (and everything
    else while you're at it). What you need is an expression which accepts
    _exactly_ your language, i.e. it will accept all the allowable elements
    of the language, but nothing that isn't part of the language.
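
One way to make "accepts exactly your language" testable is to check a candidate regex against values that must be accepted and values that must be rejected. The non-member list below is invented for illustration; in practice deciding what belongs in it is the hard part.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LanguageCheck {

    // A candidate regex is only useful if it accepts every value that should be
    // in the numbering language AND rejects every value that should not be.
    static boolean definesLanguage(String regex, List<String> members, List<String> nonMembers) {
        Pattern p = Pattern.compile(regex);
        for (String m : members) {
            if (!p.matcher(m).matches()) return false;
        }
        for (String n : nonMembers) {
            if (p.matcher(n).matches()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("1.01", "1.02", "1.03", "1.04");
        List<String> nonMembers = Arrays.asList("banana", "1-a.01", "1.1"); // hypothetical
        System.out.println(definesLanguage(".*", members, nonMembers));         // false
        System.out.println(definesLanguage("1\\.\\d{2}", members, nonMembers)); // true
    }
}
```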

    Once you have those in place, you can use them to ensure that the
    elements are consistent with the language they appear to be part of.

    The next step is to have some dictionary rules in place for
    incrementing and comparison. Assuming the right-hand-digit
    incrementing system that common numerical systems use, doing this will
    be easy -- you can use a straight ASCII increment for all elements
    other than the (static) separators, incrementing and carrying just as
    you would if you were working with decimal numbers. To verify that the
    elements present do indeed form a series, simply read the first value,
    increment it by one, and check whether that equals the next value. If
    it does, it's in sequence. If not, it's not (or you've made an
    incorrect assumption as to the values).
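
A sketch of that right-hand increment with carry, under stated assumptions: digits roll over from 9 to 0 and letters from z to a (or Z to A) within their case, every other character is treated as a static separator that the carry passes over, and a carry off the left end returns `null` instead of committing to one of the overflow policies discussed above.

```java
public class OdometerIncrement {

    // Increments the rightmost digit or letter, carrying leftwards like an odometer.
    // Assumptions: 9 -> 0, z -> a, Z -> A with a carry; other characters are static
    // separators the carry skips over; null means the carry ran off the left end.
    static String increment(String value) {
        char[] cs = value.toCharArray();
        for (int i = cs.length - 1; i >= 0; i--) {
            char c = cs[i];
            if ((c >= '0' && c <= '8') || (c >= 'a' && c <= 'y') || (c >= 'A' && c <= 'Y')) {
                cs[i] = (char) (c + 1); // plain ASCII increment, no carry needed
                return new String(cs);
            } else if (c == '9') {
                cs[i] = '0';            // roll over and carry left
            } else if (c == 'z') {
                cs[i] = 'a';
            } else if (c == 'Z') {
                cs[i] = 'A';
            }
            // any other character is a separator: leave it and keep carrying
        }
        return null;
    }

    // The series check: is `next` exactly one step after `previous`?
    static boolean isNext(String previous, String next) {
        return next.equals(increment(previous));
    }

    public static void main(String[] args) {
        System.out.println(increment("1.09"));      // 1.10
        System.out.println(increment("1-a.99"));    // 1-b.00
        System.out.println(increment("99"));        // null
        System.out.println(isNext("1.02", "1.04")); // false: "1.03" is missing
    }
}
```

Note that `increment("1-a.99")` yields `"1-b.00"`: the carry crosses the separators into the prefix, which may or may not be what a given numbering intends.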

    You've asked a very difficult set of questions -- ones which have no
    specific answers (and no real "optimal" answer). For any "word" in a
    language, there are an infinite number of grammars that can contain that
    "word", most of which will also contain invalid values, and many of
    which will reject valid values in the same language. You're trying to
    divine a whole language based on a few elements. The only way you can
    be precise in this instance is if you assume that those values are the
    _only_ acceptable values in the language, and you construct a regular
    expression that accepts exactly and only those values -- which doesn't
    appear to be what you want.
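
For contrast, the "exactly and only those values" expression is simple to build: quote each value and join them with `|`. The sketch shows why it is rarely what you want, since the obvious next value is rejected. The class name is illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ExactValuesRegex {

    // A regex whose language is exactly the listed values and nothing else.
    static Pattern exactly(List<String> values) {
        String alternation = values.stream()
                .map(Pattern::quote)            // treat '.', '-' etc. literally
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + alternation + ")");
    }

    public static void main(String[] args) {
        Pattern p = exactly(Arrays.asList("1.01", "1.02", "1.03"));
        System.out.println(p.matcher("1.02").matches()); // true
        System.out.println(p.matcher("1.04").matches()); // false: the obvious next value is rejected
    }
}
```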

    The long and the short of it is that unless you have some really
    explicit rules, or create an XML element (or attribute) where the
    developer can define the regular expression used for their numbering
    language, any solution you come up with is going to be imprecise, and
    may be error-prone with certain types of numberings.
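
If the document itself declares the numbering language, the checking becomes straightforward. A sketch, assuming a hypothetical `numPattern` attribute on each `<parent>` whose value is a `java.util.regex` pattern that every child `num` must match; parents without the attribute are skipped. The attribute name is invented for illustration, not an existing convention.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DeclaredPattern {

    // Returns every child num that violates its parent's declared pattern.
    // "numPattern" is a hypothetical attribute name.
    static List<String> violations(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> bad = new ArrayList<>();
        NodeList parents = doc.getElementsByTagName("parent");
        for (int i = 0; i < parents.getLength(); i++) {
            Element parent = (Element) parents.item(i);
            String regex = parent.getAttribute("numPattern"); // "" when absent
            if (regex.isEmpty()) {
                continue; // nothing declared, nothing to check
            }
            Pattern p = Pattern.compile(regex); // PatternSyntaxException if invalid
            NodeList children = parent.getElementsByTagName("child");
            for (int j = 0; j < children.getLength(); j++) {
                String num = ((Element) children.item(j)).getAttribute("num");
                if (!p.matcher(num).matches()) {
                    bad.add(num);
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String xml = "<myfile>"
                + "<parent num=\"A\" numPattern=\"[a-z]\">"
                + "<child num=\"a\"/><child num=\"b\"/><child num=\"7\"/>"
                + "</parent></myfile>";
        System.out.println(violations(xml)); // [7]
    }
}
```

This only checks membership in the declared language; finding gaps still needs an ordering rule such as an increment function.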

    (It should also be noted here that there are a lot of languages which
    regular expressions _cannot_ define. These include anything that
    requires some form of unbounded "memory" between states -- something
    which needs a more powerful grammar (such as a context-free one)
    rather than a finite automaton.)

    HTH!

    Brad BARCLAY

    --
    =-=-=-=-=-=-=-=-=
    From the OS/2 WARP v4.5 Desktop of Brad BARCLAY.
    The jSyncManager Project: http://www.jsyncmanager.org
    
    Brad BARCLAY, Aug 27, 2003
