My SAX Parser, regexp style. Cut & paste version .901

Discussion in 'Perl Misc' started by robic0, Jan 8, 2006.

  robic0

    robic0 Guest

    Since so much was learned from the substitution method, I thought
    this might be a better approach.
    This is just the starting framework. The rest will be filled in.
    Turn off the debug output for full speed.

    If the regexp got line-wrapped in posting, un-wrap it before using.

    print <<EOM;

    # -----------------------
    # XML (Regex) SAX Parser
    # Version .901 - 1/7/06
    # Copyright 2005,2006
    # by
    # -----------------------
    EOM

    use strict;
    use warnings;

    # lexical filehandle and 3-arg open; report the OS error on failure
    open my $fh, '<', 'config.html' or die "can't open config.html: $!";
    my $gabage1 = join ('', <$fh>);
    close $fh;

    my ($cnt, $content, $show_pos, $debug) = (1, '', 1, 1);

    # master
    # 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9

    while ($gabage1 =~
    # 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
        # group 9 is character content: accumulate it until the next markup token
        if (defined $9) { $content .= $9; next; }
        print "-"x20,"\n" if ($debug);
        if (length ($content)) {
            print "9 $content\n" if ($debug);
            $content = '';
        }
        if ($show_pos) {
            my $rr = pos $gabage1;
            print "$rr ";
        }
        print "1 VERSION: $1\n" if ($debug && defined $1);
        print "2 META: $2\n" if ($debug && defined $2);
        print "3 DOCTYPE: $3\n" if ($debug && defined $3);
        print "4 CDATA: $4\n" if ($debug && defined $4);
        print "5 COMMENT: $5\n" if ($debug && defined $5);
        ## <tag> or </tag> or <tag/>
        print "6 TAG: $6\n" if ($debug && defined $6);
        ## <tag attrib/> or <tag attrib>
        print "7,8 TAG: $7 Attr: $8\n" if ($debug && defined $7);
    }
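The master regexp itself is not reproduced in the post above, so here is a minimal sketch of the same idea for anyone following along: one alternation, one capture group per token type, a `while (... /g)` loop, and a `defined` check to see which alternative matched. The pattern and the `tokenize` name are assumptions made up for illustration. It only knows comments, tags and character content, and it is not the original pattern.

```perl
use strict;
use warnings;

# Illustrative pattern only, NOT the original master regexp.
# Three alternatives, one capture group each.
my $token_re = qr{
      <!--(.*?)-->             # 1: comment body
    | <(/?[A-Za-z][\w:.-]*)    # 2: tag name, with optional leading slash
      [^>]*>                   #    rest of the tag, ignored here
    | ([^<]+)                  # 3: character content between markup
}xs;

# Hypothetical helper: returns one string per token found.
sub tokenize {
    my ($xml) = @_;
    my @events;
    # /g in scalar context resumes at pos($xml) on every iteration
    while ($xml =~ /$token_re/g) {
        if    (defined $1) { push @events, "COMMENT: $1" }
        elsif (defined $2) { push @events, "TAG: $2" }
        elsif (defined $3) { push @events, "CONTENT: $3" }
    }
    return @events;
}

print "$_\n" for tokenize('<a><!-- hi --><b>text</b></a>');
```

Only the groups of the alternative that matched are defined after each successful match, which is what makes the `defined $N` dispatch work. The order of the alternatives matters: the comment branch has to come before the tag branch.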

    robic0, Jan 8, 2006
  2. robic0

    robic0 Guest

    On Sat, 07 Jan 2006 17:31:46 -0800, robic0 wrote:

    BIG things on the way...
    robic0, Jan 15, 2006
  3. robic0

    robic0 Guest

    I'm about to finish this thing. It's mostly modeled after Expat.
    It's all Perl, and mine is faster, parsing about 1 meg a second.
    It's also compliant with current XML standards on
    There's so much to it, I don't think I want to post it here.
    I would like to make it into a "free" module on CPAN or ActiveState's
    release version.
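For readers who have not used Expat: "modeled after Expat" usually means the caller registers callbacks for start tags, end tags and character data, and the parser calls them as it scans. The following is only a rough sketch of that calling style. The `parse_with_handlers` name, the `Start`/`End`/`Char` keys and the toy regexp are all assumptions for illustration, not the 600-line parser described in this post.

```perl
use strict;
use warnings;

# Hypothetical Expat-style driver: the caller passes handlers by name.
sub parse_with_handlers {
    my ($xml, %handlers) = @_;
    my $noop  = sub { };
    my $start = $handlers{Start} || $noop;
    my $end   = $handlers{End}   || $noop;
    my $char  = $handlers{Char}  || $noop;

    # Toy pattern (an assumption): 1 = closing slash, 2 = tag name,
    # 3 = self-closing slash, 4 = character content.
    while ($xml =~ m{<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>|([^<]+)}g) {
        if (defined $4) {
            my $text = $4;          # copy before calling out to user code
            $char->($text);
            next;
        }
        my ($closing, $name, $empty) = ($1, $2, $3);
        if ($closing) {
            $end->($name);
        }
        else {
            $start->($name);
            $end->($name) if $empty;    # <tag/> fires start and end
        }
    }
    return;
}

my @log;
parse_with_handlers(
    '<a><b/>hi</a>',
    Start => sub { push @log, "start $_[0]" },
    End   => sub { push @log, "end $_[0]" },
    Char  => sub { push @log, "char $_[0]" },
);
print "$_\n" for @log;
```

The capture variables are copied into lexicals before any handler runs, so a handler that uses its own regexp cannot clobber them.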

    I think it's commercial level. The fact is I can "interject" special
    searches and handling if I want to. It is designed using the specs
    from here:

    It's version 1.1. If I'm using the wrong specs, please let me know.
    It's awesome, tremendously fast.
    I am also going to write a full-featured "schema checker" using this
    base parser. I've never seen anything so easy as schema checking.
    Thinking beyond that, I will move into modification tools. Even style sheet
    mods (I think, it's all too easy now). I will do it all in markup.
    The code is about 600 lines now. I could plop it down here. I have
    all constructs covered in the above 1.1 specs. I'm worried a little
    about encoding and Unicode. By and large, I've never seen anything
    so easy in my life. I fear that my code is approaching a professional
    level and I may "not" want to just plop it down here.

    I may want to contact AS or CPAN to post the module so it's not ripped
    off. However, I know I could do a schema checker in a week. Since it's
    all so easy now, I'm wondering if I can make any money at this or is it
    all just a give-away...

    Oh well, from a homeless man to a middle class man, I know it won't be
    that much. However, I have developed tools that could do conversions.
    Yeah, sure, I want to put my stuff in the public domain, but what I do
    with the internals could make for fast custom conversions.

    What do you think? Say it now; if it ends up in AS or CPAN you won't have
    the option to recommend. It will arrive there, but what's the money behind
    hard-core conversions, style, schema, filters, anything?
    robic0, Jan 22, 2006
