My SAX Parser, regexp style. Cut & paste version .901

Discussion in 'Perl Misc' started by robic0, Jan 8, 2006.

  1. robic0

    robic0 Guest

    Since so much was learned on the substitution method, thought
    this might be a better approach.
    This is just the starting framework. The rest will be filled in.
    Turn off the debug output for full speed.

    Un-wrap the regexp, if it got wrapped, before using.

    print <<EOM;

    # -----------------------
    # XML (Regex) SAX Parser
    # Version .901 - 1/7/06
    # Copyright 2005,2006
    # by robic0-At-yahoo.com
    # -----------------------

    EOM

    use strict;
    use warnings;

    # Lexical filehandle and 3-arg open; report the OS error on failure.
    open my $fh, '<', 'config.html' or die "can't open config.html: $!";
    my $gabage1 = join ('', <$fh>);
    close $fh;

    my ($cnt, $content, $show_pos, $debug) = (1, '', 1, 1);

    # master
    #/(?:<\?(.*?)\?>)|(?:<META(.*?)>)|(?:<!DOCTYPE(.*?)>)|(?:<!\[CDATA\[(.*?)\]\]>)|(?:<!--(.*?)-->)|(?:<(\/*[\:0-9a-zA-Z]+?[\s]*\/*)>)|(?:<([\:0-9a-zA-Z]+?)[\s]+((?:[\:0-9a-zA-Z]+[\s]*=[\s]*["'][^<]*['"])+[\s]*\/*)>)|(.+?)/sg)
    # 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9

    while ($gabage1 =~
    /(?:<\?(.*?)\?>)|(?:<META(.*?)>)|(?:<!DOCTYPE(.*?)>)|(?:<!\[CDATA\[(.*?)\]\]>)|(?:<!--(.*?)-->)|(?:<(\/*[\:0-9a-zA-Z]+?[\s]*\/*)>)|(?:<([\:0-9a-zA-Z]+?)[\s]+((?:[\:0-9a-zA-Z]+[\s]*=[\s]*["'][^<]*['"])+[\s]*\/*)>)|(.+?)/sg)
    # 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
    {
    if (defined $9) { $content .= $9; next; }
    print "-"x20,"\n" if ($debug);
    if (length ($content)) {
    print "9 $content\n" if ($debug);
    $content = '';
    }
    if ($show_pos) {
    my $rr = pos $gabage1;
    print "$rr ";
    }
    print "1 VERSION: $1\n" if ($debug && defined $1);
    print "2 META: $2\n" if ($debug && defined $2);
    print "3 DOCTYPE: $3\n" if ($debug && defined $3);
    print "4 CDATA: $4\n" if ($debug && defined $4);
    print "5 COMMENT: $5\n" if ($debug && defined $5);
    ## <tag> or </tag> or <tag/>
    print "6 TAG: $6\n" if ($debug && defined $6);
    ## <tag attrib/> or <tag attrib>
    print "7,8 TAG: $7 Attr: $8\n" if ($debug && defined $7);
    $cnt++;
    }

    __END__
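
The framework above only prints what it matched. As a rough sketch of where a regex-driven SAX-style parser could go from here, the same alternation idea can call a handler per token. Everything named below (`tokenize`, the handler keys `pi`, `comment`, `cdata`, `start`, `end`, `text`) is made up for illustration and is not from the posted code. The pattern is a simplified variant written with `/x`, so it can span several lines and line wrapping is not a problem; it leaves out DOCTYPE and META handling and assumes ASCII names.

```perl
use strict;
use warnings;

# One alternation per token type. With /x, whitespace and comments are ignored.
my $TOKEN = qr{
      <\?(.*?)\?>                # 1: processing instruction or XML declaration
    | <!--(.*?)-->               # 2: comment
    | <!\[CDATA\[(.*?)\]\]>      # 3: CDATA section
    | </([\w:.-]+)\s*>           # 4: end tag
    | <([\w:.-]+)                # 5: start tag name
      ((?:\s+[\w:.-]+\s*=\s*(?:"[^"<]*"|'[^'<]*'))*)   # 6: raw attribute text
      \s*(/?)>                   # 7: "/" when the element is empty
    | ([^<]+|<)                  # 8: character data, or a stray "<"
}xs;

# Hypothetical helper: walk $xml and call the matching handler in %$h.
sub tokenize {
    my ($xml, $h) = @_;
    my $text = '';
    my $emit = sub {
        my ($type, @args) = @_;
        $h->{$type}->(@args) if $h->{$type};
    };
    while ($xml =~ /$TOKEN/g) {
        # Copy the captures before any handler runs its own matches.
        my ($pi, $comment, $cdata, $end, $name, $attrs, $empty, $chars)
            = ($1, $2, $3, $4, $5, $6, $7, $8);
        if (defined $chars) { $text .= $chars; next; }
        if (length $text) { $emit->('text', $text); $text = ''; }
        if    (defined $pi)      { $emit->('pi', $pi) }
        elsif (defined $comment) { $emit->('comment', $comment) }
        elsif (defined $cdata)   { $emit->('cdata', $cdata) }
        elsif (defined $end)     { $emit->('end', $end) }
        elsif (defined $name) {
            # Split the raw attribute text into a name => value hash.
            my %attr;
            while ($attrs =~ /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
                $attr{$1} = defined $2 ? $2 : $3;
            }
            $emit->('start', $name, \%attr);
            $emit->('end', $name) if length $empty;
        }
    }
    $emit->('text', $text) if length $text;   # flush trailing character data
    return;
}

my @events;
tokenize('<a x="1">hi<b/></a>', {
    start => sub { push @events, "start:$_[0]" },
    end   => sub { push @events, "end:$_[0]" },
    text  => sub { push @events, "text:$_[0]" },
});
print join(' ', @events), "\n";
```

Character data is accumulated and flushed when the next markup token arrives, the same way the posted loop collects `$9` into `$content`. The sketch also flushes once more after the loop, so text that follows the last tag is not lost.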
     
    robic0, Jan 8, 2006
    #1

  2. robic0

    robic0 Guest

    On Sat, 07 Jan 2006 17:31:46 -0800, robic0 wrote:

    BIG things on the way...
     
    robic0, Jan 15, 2006
    #2

  3. robic0

    robic0 Guest

    I'm about to finish this thing. It's mostly modeled after Expat.
    It's all Perl, and mine is faster, parsing about 1 meg a second.
    It's also compliant with the current XML standards on w3.org.
    There's so much to it, I don't think I want to post it here.
    I would like to make it into a "free" module on CPAN or ActiveState's
    release version.

    I think it's commercial level. The fact is I can "interject" special
    searches and handling if I want to. It is designed using the specs
    from here:

    http://www.w3.org/TR/xml11/#NT-AttValue
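
For reference, the AttValue production at that anchor allows, inside matching quotes, any character except `<`, `&`, and the quote itself, plus entity and character references. The attribute part of the version .901 pattern, `["'][^<]*['"]`, is looser: it accepts a bare `&` and mismatched quotes. A minimal sketch of a stricter check follows. The names `$Name`, `$Reference`, `$AttValue`, and `is_att_value` are illustrative, and `$Name` is an ASCII-only simplification of the spec's Name production, so this is an approximation and not a conformance test.

```perl
use strict;
use warnings;

# ASCII-only stand-in for the spec's Name production (an assumption).
my $Name      = qr/[A-Za-z_:][A-Za-z0-9_:.\-]*/;
# Reference ::= EntityRef | CharRef (decimal or hexadecimal)
my $Reference = qr/&$Name;|&#[0-9]+;|&#x[0-9a-fA-F]+;/;
# AttValue: double- or single-quoted, no "<", no bare "&", no unescaped quote
my $AttValue  = qr/"(?:[^<&"]|$Reference)*"|'(?:[^<&']|$Reference)*'/;

# Returns 1 if the whole string is one quoted attribute value, else 0.
sub is_att_value {
    my ($s) = @_;
    return $s =~ /\A(?:$AttValue)\z/ ? 1 : 0;
}

for my $candidate (q{"a &amp; b"}, q{"a & b"}, q{'x<y'}, q{"&#x41;"}) {
    printf "%-12s %s\n", $candidate, is_att_value($candidate) ? 'ok' : 'rejected';
}
```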

    It's version 1.1. If I'm using the wrong specs, please let me know.
    It's awesome, tremendously fast.
    I am also going to write a full-featured "schema checker" using this
    base parser. I've never seen something so easy as schema checking.
    Thinking beyond that, I will move into modification tools. Even style sheet
    mods (I think, it's all too easy now). I will do it all in markup.
    The code is about 600 lines now. I could plop it down here. I have
    all constructs covered in the above 1.1 specs. I'm worried a little
    about encoding and Unicode. By and large, I've never seen anything
    so easy in my life. I fear that my code is approaching a professional
    level and I may "not" want to just plop it down here.

    I may want to contact AS or CPAN to post the module so it's not ripped
    off. However, I know I could do a schema checker in a week. Since it's
    all so easy now, I'm wondering if I can make any money at this or is it
    all just a give-away...

    Oh well, from a homeless man to a middle class man, I know it won't be
    that much. However, I have developed tools that could do conversions.
    Yeah, sure, I want to put my stuff in the public domain, but the internals
    I do with them could do fast custom conversions.

    What do you think? Say it now; if it ends up in AS or CPAN you won't have
    the option to recommend. It will arrive there, but what's the money behind
    hard core conversions, style, schema, filters, anything?
     
    robic0, Jan 22, 2006
    #3
