a little parsing challenge ☺

Discussion in 'Python' started by Xah Lee, Jul 17, 2011.

  1. Xah Lee

    Xah Lee Guest

    2011-07-16

    folks, this one will be interesting one.

    the problem is to write a script that can check a dir of text files
    (and all subdirs) and reports if a file has any mismatched matching
    brackets.

    • The files will be utf-8 encoded (unix style line ending).

    • If a file has mismatched matching-pairs, the script will display the
    file name, and the line number and column number of the first
    instance where a mismatched bracket occures. (or, just the char number
    instead (as in emacs's “pointâ€))

    • the matching pairs are all single unicode chars. They are these and
    nothing else: () {} [] “†‹› «» ã€ã€‘ 〈〉 《》 「ã€ã€Žã€
    Note that ‘single curly quote’ is not consider matching pair here.

    • You script must be standalone. Must not be using some parser tools.
    But can call lib that's part of standard distribution in your lang.

    Here's a example of mismatched bracket: ([)], (“[[â€), ((, 】etc. (and
    yes, the brackets may be nested. There are usually text between these
    chars.)

    I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
    other lang implementations. In particular, perl, python, php, ruby,
    tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
    (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
    Scala, C, C++, or any others, are all welcome, but i won't be able to
    eval it. javascript implementation will be very interesting too, but
    please indicate which and where to install the command line version.

    I hope you'll find this a interesting “challengeâ€. This is a parsing
    problem. I haven't studied parsers except some Wikipedia reading, so
    my solution will probably be naive. I hope to see and learn from your
    solution too.

    i hope you'll participate. Just post solution here. Thanks.

    Xah
     
    Xah Lee, Jul 17, 2011
    #1
    1. Advertising

  2. On Jul 17, 12:47 am, Xah Lee <> wrote:
    > i hope you'll participate. Just post solution here. Thanks.


    http://pastebin.com/7hU20NNL


    Raymond
     
    Raymond Hettinger, Jul 17, 2011
    #2
    1. Advertising

  3. On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <> wrote:
    > the problem is to write a script that can check a dir of text files
    > (and all subdirs) and reports if a file has any mismatched matching
    > brackets.


    I wonder will it be possible to code the whole thing as a single
    regular expression... I'm pretty sure it could be done as a sed or awk
    script, but I'm insufficiently expert in either to do the job.

    ChrisA
     
    Chris Angelico, Jul 17, 2011
    #3
  4. Xah Lee

    rusi Guest

    On Jul 17, 4:34 pm, Chris Angelico <> wrote:
    > On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <> wrote:
    > > the problem is to write a script that can check a dir of text files
    > > (and all subdirs) and reports if a file has any mismatched matching
    > > brackets.

    >
    > I wonder will it be possible to code the whole thing as a single
    > regular expression... I'm pretty sure it could be done as a sed or awk
    > script, but I'm insufficiently expert in either to do the job.
    >
    > ChrisA


    No possible: See http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages#Use_of_lemma

    Informally stated as regexes cant count.
     
    rusi, Jul 17, 2011
    #4
  5. Robert Klemme, Jul 17, 2011
    #5
  6. Xah Lee

    mhenn Guest

    Am 17.07.2011 15:20, schrieb Robert Klemme:
    > On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
    >> On Jul 17, 12:47 am, Xah Lee<> wrote:
    >>> i hope you'll participate. Just post solution here. Thanks.

    >>
    >> http://pastebin.com/7hU20NNL

    >
    > Ruby solution: https://gist.github.com/1087583


    I acutally don't know Ruby that well, but it looks like your program
    recognizes "[(])" as correct although it is not, because you translate
    "[(])" to "(())" (which is indeed correct, but does not resemble the
    input correctly anymore).

    >
    > Kind regards
    >
    > robert
     
    mhenn, Jul 17, 2011
    #6
  7. Chris Angelico wrote:

    > On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <> wrote:
    >> the problem is to write a script that can check a dir of text files
    >> (and all subdirs) and reports if a file has any mismatched matching
    >> brackets.

    >
    > I wonder will it be possible to code the whole thing as a single
    > regular expression... I'm pretty sure it could be done as a sed or awk
    > script, but I'm insufficiently expert in either to do the job.


    Did you notice the excessive crosspost? Please do not feed the troll.

    In the classical sense is not possible, as classical regular expressions
    have no concept of recursion. Indeed, matching brackets are a textbook
    example for a non-regular¹ context-free language L = {a^n b^n; n > 0} that
    can only be recognized by a pushdown automaton (PDA). (Finite automata
    "cannot count".)

    However, it is possible (and done) to use classical regular expressions or
    non-recursive Regular Expressions (note the different character case) to
    match tokens more efficiently with a PDA implementation. This is commonly
    called a parser. (Programming languages tend to be specified in terms of a
    context-free grammar – they tend to be context-free languages –, which is
    why a parser is a part of a compiler or interpreter. See for example
    Python.²)

    It is possible, with Perl-compatible Regular Expressions (PCRE), provided
    that you have enough memory, to use such an extended Regular Expression (not
    to be confused with EREs³)â´:

    \((([^()]*|(?R))*)\)

    However, even Python 3.2 does not support those expressions (although it
    supports some other PCRE patterns, like named subexpressions)âµ, neither do
    standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk
    (EREs, using a DFA or NFA). [That is not to say it would not be possible
    with Python, or sed or awk (both of which are off-topic here), but that more
    than a Regular Expression would be required.]

    On this subject, I recommend reading the preview chapters of the second and
    third editions, respectively, of Jeffrey E. F. Friedl's "Mastering Regular
    Expressions", which are available online for free at O'Reilly.comâ¶.

    HTH.

    ______
    ¹ because it can be proved that the pumping lemma for regular languages
    does not apply to it; see also
    <http://en.wikipedia.org/wiki/Chomsky_hierarchy> pp.
    ² <http://docs.python.org/reference/>
    ³ <http://en.wikipedia.org/wiki/Regular_expression>
    â´ Cf. <http://php.net/manual/en/regexp.reference.recursive.php>
    âµ <http://docs.python.org/library/re.html>
    ⶠ<http://oreilly.com/catalog/regex/chapter/ch04.html>,
    <http://oreilly.com/catalog/9780596528126/preview#preview>
    --
    PointedEars

    Bitte keine Kopien per E-Mail. / Please do not Cc: me.
     
    Thomas 'PointedEars' Lahn, Jul 17, 2011
    #7
  8. Xah Lee

    gene heskett Guest

    On Sunday, July 17, 2011 10:12:27 AM Xah Lee did opine:

    > 2011-07-16
    >
    > folks, this one will be interesting one.
    >
    > the problem is to write a script that can check a dir of text files
    > (and all subdirs) and reports if a file has any mismatched matching
    > brackets.
    >
    > • The files will be utf-8 encoded (unix style line ending).
    >
    > • If a file has mismatched matching-pairs, the script will display the
    > file name, and the line number and column number of the first
    > instance where a mismatched bracket occures. (or, just the char number
    > instead (as in emacs's “pointâ€))
    >
    > • the matching pairs are all single unicode chars. They are these and
    > nothing else: () {} [] “†‹› «» ã€ã€‘ 〈〉 《》 「〠『ã€
    > Note that ‘single curly quote’ is not consider matching pair here.
    >
    > • You script must be standalone. Must not be using some parser tools.
    > But can call lib that's part of standard distribution in your lang.
    >
    > Here's a example of mismatched bracket: ([)], (“[[â€), ((, 】etc. (and
    > yes, the brackets may be nested. There are usually text between these
    > chars.)
    >
    > I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
    > other lang implementations. In particular, perl, python, php, ruby,
    > tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
    > (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
    > Scala, C, C++, or any others, are all welcome, but i won't be able to
    > eval it. javascript implementation will be very interesting too, but
    > please indicate which and where to install the command line version.
    >
    > I hope you'll find this a interesting “challengeâ€. This is a parsing
    > problem. I haven't studied parsers except some Wikipedia reading, so
    > my solution will probably be naive. I hope to see and learn from your
    > solution too.
    >
    > i hope you'll participate. Just post solution here. Thanks.
    >
    > Xah


    This is a very old solution, some of it nearly 30 years old.
    Written in C, the trick is in doing it in python I guess.
    ======================begin cntx.c=======================
    /*^k^s
    ..ds2
    ..hb
    ..fb^k^s^b Cntx.c, page #^k^s^b
    *****************************************************************
    * *
    * CC (C Checker) *
    * *
    * C Source Brackets, Parenthesis, brace, *
    * quote and comment Checker *
    * *
    * T. Jennings -- Sometime in 1983 *
    * Slightly tweaked and renamed MINILINT *
    * KAB Aug 84 *
    * Ported to OS9 and further tweaked *
    * Brian Paquette March 91 *
    * Brackets, single, double quote counters added *
    * & renamed Cntx 04/09/91 *
    * by Gene Heskett *
    * *
    * some additional code for ignoring "syntax" chars inside of *
    * double quoted strings added 3/21/93 by Gene Heskett *
    * same for single quoted stuffs 7/10/93 by Gene Heskett *
    * And long lines handling ability added too. *
    * Adding tab ignorers and a counter to tally how many 11/21/93 *
    ****************************************************************/
    #define OS9 /* used for nested comment handling*/
    /* comment out for non OS9/6809*/

    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    #define FALSE 0
    #define TRUE 1

    #ifdef OS9
    #define NO " No "
    #define YES " Yes "
    char *cmnt;
    #endif

    /* Very crude but very effective C source debugger. Counts the numbers of
    matching braces, parentheses and comments, and displays them at the left
    edge of the screen. The best way to see what it does is to do it; try

    cntx -v cntx.c

    Properly handles parens and braces inside comments; they are ignored.
    Also ignores single quotes if doubles are odd number, so singles
    can be used in a printf string for punctuation. IF you see the doubles
    are odd at line end (the printout tally), all bets are OFF!
    Enter cntx on the command line by itself for a usage note.
    */

    main(argc,argv)
    int argc;
    char *argv[];
    {
    FILE *f;
    char c,secnd_c,lastc;
    int bracket,parens,braces,squote,dquote,comments;
    int perr,bkerr,brerr,sqerr,dqerr;
    int verbose,okay;
    int filearg = 0;
    int line, col, tabc;

    if ((argc < 2)||(argc > 3)) getout(0);
    if (argc == 3)
    {
    verbose = TRUE; /* already tested for -v switch */
    if((argv[1][0] == '-') && (toupper(argv[1][1]) == 'V'))
    filearg = 2; /*file name pointed to by argv[2] */
    if((argv[2][0] == '-') && (toupper(argv[2][1]) == 'V'))
    filearg = 1;
    if(!filearg) getout(192);
    }
    else
    {
    verbose = FALSE;
    filearg = 1;
    }
    if ((f = fopen(argv[filearg],"r")) == NULL)
    {
    fprintf(stderr,"Cntx: can't open '%s'\n",argv[1]);
    getout(216);
    }
    bracket= braces= parens= comments= squote= dquote= 0;
    perr= bkerr= brerr= sqerr= dqerr= 0;
    line= col= tabc= 0;
    secnd_c= lastc= '\0';

    while ((c = getc(f)) != EOF)
    {
    while(c==0x09) /* ignore, but tally the count */
    {
    tabc+=1;
    c=getc(f);
    }

    /* print running tally if in verbose mode and at beginning of line*/
    /* OS9 version prints status of whether or not one is in a comment rather*/
    /* than a count, as the Microware C compiler does not nest comments*/

    if ((col == 0) && verbose )
    {
    #ifdef OS9
    if (comments)
    cmnt = YES;
    else cmnt = NO;
    printf("%d: [%d] {%d} (%d) \'%d\' \"%d\" /*%s*/
    tabcnt=%d\n\n",
    line,bracket,braces,parens,squote,dquote,cmnt,tabc);
    #else
    printf("%d: [%d] {%d} (%d) \'%d\' \"%d\" /*%d*/\n\n",
    line,bracket,braces,parens,squote,dquote,comments);
    #endif
    }

    /* additions to help tally squote & dquote errors at line end,
    squotes and dquotes should match if we don't count those squotes
    present when dquotes are odd number as in inside a printf or
    puts statement. Also if they are part of an escape sequence,
    don't count */

    if (col == 0 && (squote % 2) ) ++sqerr;
    if (col == 0 && (dquote % 2) ) ++dqerr;
    if (col == 0 && bracket ) ++bkerr;

    /* now clears the error to get back in step */
    if (col == 0) squote=dquote=0;

    /* Don't count parens and braces that are inside comments. This of course
    assumes that comments are properly matched; in any case, that will be the
    first thing to look for. */

    if (comments <= 0)
    { /* 3/20/93, 7/10/93 taking sensitivity out of quoted stuffs */
    /* here, do ++dquote if its not a char constant like this
    */
    if ( c == '"' ) ++dquote; /* a little simpler */

    if ( !(dquote & 1) ) /* was the && of those */
    {
    if (c == '{' ) ++braces;
    if (c == '(' ) ++parens;
    if (lastc != '\'' && secnd_c == '[' && c != '\'' ) ++bracket;
    /* here, skip squotes in a "text string's" */
    if ( secnd_c != '\\' && c== '\'' && !(dquote) ) ++squote;
    if ( lastc == '\\' && secnd_c == '\\' && c == '\'' ) ++squote;
    if (c == '}' ) --braces;
    if (c == ')' ) --parens;
    if (lastc != '\'' && secnd_c == ']' && c != '\'' ) --bracket;
    }
    }

    /* Now do comments. This properly handles nested comments;
    whether or not the compiler does is your responsibility */

    #ifdef OS9

    /* The Microware C compiler for OS9 does NOT nest comments. */
    /* The comment-close-mark (asterisk-backslash) will terminate */
    /* (see K & R) a comment no matter how many '/*' come before it*/

    if ((c == '/') && (secnd_c == '*'))
    comments = 0;
    if ((c == '*') && (secnd_c == '/') && (comments == 0))
    ++comments;
    #else
    if ( (c == '/' ) && (secnd_c == '*' ) ) --comments;
    if ( (c == '*' ) && (secnd_c == '/' ) ) ++comments;
    #endif
    ++col;
    if (c == '\n' && secnd_c != '\\' )
    { /* non-escaped newline == New Line */
    col= 0; /* set column 0 */
    ++line;
    }
    if (verbose)
    putchar(c); /* display text */
    lastc= secnd_c; /* update last char */
    secnd_c= c;
    }
    if (verbose)
    {
    #ifdef OS9
    if (comments)
    cmnt = YES;
    else cmnt = NO;
    printf("EOF: [%d] {%d} (%d) \'%d\' \"%d\" /*%s*/\n",
    bracket,braces,parens,squote,dquote,cmnt);
    #else
    printf("EOF: [%d] {%d} (%d) \'%d\' \"%d\" /*%d*/\n",
    bracket,braces,parens,squote,dquote,comments);
    #endif
    }
    okay = TRUE;
    if (bracket||bkerr) puts("Unbalanced brackets\n"), okay = FALSE;
    if (braces) puts("Unbalanced braces\n"),okay = FALSE;
    if (parens) puts("Unbalanced parentheses\n"),okay = FALSE;
    if (sqerr||(squote%2)) puts("Unmatched single quotes\n"),okay=FALSE;
    if (dqerr||(dquote%2)) puts("Unmatched double quotes\n"),okay=FALSE;
    if (comments) puts("Unbalanced comments\n"),okay = FALSE;
    if (okay) puts("No errors found\n");
    }
    getout(errex)
    int errex;
    {
    fprintf(stderr,"Usage: Cntx [-v] <filename> [-v]\n");
    fprintf(stderr," -v = verbose mode \n");
    exit(errex);
    }
    =====================end cntx.c====================
    =================begin cntx.hlp====================
    This "Cntx" is based rather loosely on the previously uploaded
    file called MINILINT, in that if you use the -v option, it will
    show you the file and its report on a line by line basis as
    MINILINT did. Cntx however will also check for use and misuse of
    more of the usual "C" punctuation. Its smart enough to ignore an
    "escaped" character, or those buried in a text string inside a
    printf("[[{{'etc"); statement. The basic organization is from
    "MINILINT", but much expanded in checking scope. It still is NOT
    a "lint" which is why I didn't call it that, but it has turned out
    to be awfully handy. Ported to the Amiga, it found some stuff in
    the code I was feeding DICE that I had totally missed, and which
    was not being properly reported by DICE either, the errors it was
    spitting out made no sense whatsoever. I had somehow lost a
    terminating "}" in one of the PRINTFORM files in the translation
    to a C that required proto statements. Cntx found it, even if
    Dillons Integrated C Environments "dcc" didn't. But it still isn't
    a "lint", not yet.

    Usage: cntx [-v] filename

    Without the -v, it rapidly scans the whole source file and gives
    only a final report of "no errors found" or "mismatched brackets",
    etc.

    Added 3/20/93 MEH: One more conditional test now causes it to skip
    thru any parens, braces or brackets found within a double quoted
    string such as the format string for printf. As the tally needs
    to be reset at the start of a new line to maintain the error
    checking phasing in case there is an error, the total double
    quote count for the whole file is no longer kept. Only the error
    tally now shows at the end of a file scan. So to see the line
    with the error, you must use the -v option, preferably on a
    pauseing screen so that one screen full of data can be seen at a
    time. I liked the totals myself, but this does work better. Now
    Edition 4.

    Added 7/10/93 MEH: Essentially the same as the above paragraph but
    for stuff inside a pair of single quotes, so now *any* character
    can be single quoted without being an error. Now Edition 5.
    ===================end cntx.hlp=====================

    Sometimes this list is hilarious with its re-inventions of the wheel. :)

    The above code isn't the final version, I had it running on linux too, but
    one of fedoras (or W.D.s pissy drives) infamous crashes caused that version
    to come up missing, forgotten when I was doing the salvage operation to a
    new hard drive.

    Cheers, gene
    --
    "There are four boxes to be used in defense of liberty:
    soap, ballot, jury, and ammo. Please use in that order."
    -Ed Howdershelt (Author)
    A holding company is a thing where you hand an accomplice the goods while
    the policeman searches you.
     
    gene heskett, Jul 17, 2011
    #8
  9. On Jul 17, 9:47 am, Xah Lee <> wrote:
    > 2011-07-16
    >
    > folks, this one will be interesting one.
    >
    > the problem is to write a script that can check a dir of text files
    > (and all subdirs) and reports if a file has any mismatched matching
    > brackets.
    >
    > • The files will be utf-8 encoded (unix style line ending).
    >
    > • If a file has mismatched matching-pairs, the script will display the
    > file name, and the  line number and column number of the first
    > instance where a mismatched bracket occures. (or, just the char number
    > instead (as in emacs's “pointâ€))
    >
    > • the matching pairs are all single unicode chars. They are theseand
    > nothing else: () {} [] “†‹› «»ã€ã€‘ 〈〉 《》 「〠『ã€
    > Note that ‘single curly quote’ is not consider matching pair here.
    >
    > • You script must be standalone. Must not be using some parser tools.
    > But can call lib that's part of standard distribution in your lang.
    >
    > Here's a example of mismatched bracket: ([)], (“[[â€), ((,】etc. (and
    > yes, the brackets may be nested. There are usually text between these
    > chars.)
    >
    > I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
    > other lang implementations. In particular, perl, python, php, ruby,
    > tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
    > (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
    > Scala, C, C++, or any others, are all welcome, but i won't be able to
    > eval it. javascript implementation will be very interesting too, but
    > please indicate which and where to install the command line version.
    >
    > I hope you'll find this a interesting “challengeâ€. This is a parsing
    > problem. I haven't studied parsers except some Wikipedia reading, so
    > my solution will probably be naive. I hope to see and learn from your
    > solution too.
    >
    > i hope you'll participate. Just post solution here. Thanks.


    I thought I'd have some fun with multi-processing:

    https://gist.github.com/1087682
     
    Thomas Jollans, Jul 17, 2011
    #9
  10. Xah Lee

    Thomas Boell Guest

    On Sun, 17 Jul 2011 02:48:42 -0700 (PDT)
    Raymond Hettinger <> wrote:

    > On Jul 17, 12:47 am, Xah Lee <> wrote:
    > > i hope you'll participate. Just post solution here. Thanks.

    >
    > http://pastebin.com/7hU20NNL


    I'm new to Python. I think I'd have done it in a similar way (in any
    language). Your use of openers/closers looks nice though. In the
    initialization of openers, I guess you implicitly create a kind of
    hash, right? Then the 'in' operator checks for the keys. That is elegant
    because you have the openers and closers right next to each other, not
    in separate lists.

    But why do you enumerate with start=1? Shouldn't you start with index 0?
     
    Thomas Boell, Jul 17, 2011
    #10
  11. On 07/17/2011 03:55 PM, mhenn wrote:
    > Am 17.07.2011 15:20, schrieb Robert Klemme:
    >> On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
    >>> On Jul 17, 12:47 am, Xah Lee<> wrote:
    >>>> i hope you'll participate. Just post solution here. Thanks.
    >>>
    >>> http://pastebin.com/7hU20NNL

    >>
    >> Ruby solution: https://gist.github.com/1087583

    >
    > I acutally don't know Ruby that well, but it looks like your program
    > recognizes "[(])" as correct although it is not, because you translate
    > "[(])" to "(())" (which is indeed correct, but does not resemble the
    > input correctly anymore).


    Right you are. The optimization breaks the logic. Good catch!

    Kind regards

    robert
     
    Robert Klemme, Jul 17, 2011
    #11
  12. On 07/17/2011 06:01 PM, Robert Klemme wrote:
    > On 07/17/2011 03:55 PM, mhenn wrote:
    >> Am 17.07.2011 15:20, schrieb Robert Klemme:
    >>> On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
    >>>> On Jul 17, 12:47 am, Xah Lee<> wrote:
    >>>>> i hope you'll participate. Just post solution here. Thanks.
    >>>>
    >>>> http://pastebin.com/7hU20NNL
    >>>
    >>> Ruby solution: https://gist.github.com/1087583

    >>
    >> I acutally don't know Ruby that well, but it looks like your program
    >> recognizes "[(])" as correct although it is not, because you translate
    >> "[(])" to "(())" (which is indeed correct, but does not resemble the
    >> input correctly anymore).

    >
    > Right you are. The optimization breaks the logic. Good catch!


    Turns out with a little possessiveness I can fix my original approach
    which has the added benefit of not needing three passes through the file
    (the two #tr's are obsolete now).

    https://gist.github.com/1087583

    Cheers

    robert
     
    Robert Klemme, Jul 17, 2011
    #12
  13. On Jul 17, 8:49 am, Thomas Boell <> wrote:
    > But why do you enumerate with start=1? Shouldn't you start with index 0?


    The problem specification says that the the char number should match
    the emacs goto-char function which is indexed from one, not from
    zero. This is testable by taking the output of the program and
    running it through emacs to see that the cursor gets moved exactly to
    the location of the mismatched delimiter.


    Raymond
     
    Raymond Hettinger, Jul 17, 2011
    #13
  14. On Jul 17, 7:15 am, Thomas 'PointedEars' Lahn <>
    wrote:
    > Did you notice the excessive crosspost?  Please do not feed the troll.


    IMO, this was a legitimate cross post since it is for a multi-language
    programming challenge and everyone can learn from comparing the
    results.


    Raymond
     
    Raymond Hettinger, Jul 17, 2011
    #14
  15. Raymond Hettinger wrote:

    > Thomas 'PointedEars' Lahn wrote:
    >> Did you notice the excessive crosspost? Please do not feed the troll.

    >
    > IMO, this was a legitimate cross post since it is for a multi-language
    > programming challenge and everyone can learn from comparing the
    > results.


    Even if so (which I seriously doubt, see also my sig), you cannot reasonably
    deny that "Xah Lee" is a well-known Usenet troll, and that this "challenge"
    is nothing more than yet another sophisticated attempt at trolling. Please
    do not feed.


    PointedEars
    --
    No article in the world is relevant to more than a small handful of
    groups. If WW III is announced, it will be announced in
    net.announce.important. -- Peter da Silva, bofh.cabal, "Usenet II rules"
     
    Thomas 'PointedEars' Lahn, Jul 17, 2011
    #15
  16. On 07/17/2011 10:16 PM, Thomas 'PointedEars' Lahn wrote:
    > Raymond Hettinger wrote:
    >
    >> Thomas 'PointedEars' Lahn wrote:
    >>> Did you notice the excessive crosspost? Please do not feed the troll.

    >>
    >> IMO, this was a legitimate cross post since it is for a multi-language
    >> programming challenge and everyone can learn from comparing the
    >> results.

    >
    > Even if so (which I seriously doubt, see also my sig), you cannot reasonably
    > deny that "Xah Lee" is a well-known Usenet troll, and that this "challenge"
    > is nothing more than yet another sophisticated attempt at trolling. Please
    > do not feed.


    You know what you're doing? You're feeding the troll.

    Yes, I know Xah Lee. Yes, he is known for trolling. So what? That alone
    does not mean that every single thing he posts has to be bad. I'm all
    with Raymond here.

    There's nothing wrong with this post. This is the one time when it's
    okay to feed the troll: reinforce good behaviour.
     
    Thomas Jollans, Jul 17, 2011
    #16
  17. Thomas 'PointedEars' Lahn wrote:

    > It is possible [to parse the parentheses language], with Perl-compatible
    > Regular Expressions (PCRE), provided that you have enough memory, to use
    > such an extended Regular Expression (not to be confused with EREs³)â´:
    >
    > \((([^()]*|(?R))*)\)
    >
    > However, even Python 3.2 does not support those expressions (although it
    > supports some other PCRE patterns, like named subexpressions)âµ, neither do
    > standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk
    > (EREs, using a DFA or NFA). [That is not to say it would not be possible
    > with Python, or sed or awk (both of which are off-topic here), but that
    > more than a Regular Expression would be required.]


    Supplemental: Further research shows that the Python `re' module is not
    going to implement (PCRE) recursive Regular Expressions. The maintainer's,
    Matthew Barnett's, argument (of 2009-03-24) is that such things are better
    left to modules such as `pyparsing' [1][2].

    However, FWIW, here is the Python port of the start of a language parser
    originally written in (and for) ECMAScript:

    import re

    def matchingBraces(s):
    level = 0

    for match in re.finditer(r'[{}]', s):
    paren = match.group(0)

    if paren == "{":
    level += 1
    else:
    if level == 0: return False
    level -= 1

    return level == 0

    As you can see, the theoretically necessary PDA¹ implementation can be
    simplified to a braces counter with range checks by iterative use of a
    Regular Expression. Extensions to meet the "challenge" are left as an
    exercise to the reader.

    It has also occurred to me that because parentheses (`(', `)') and brackets
    (`[', `]') have special meaning in Regular Expressions (grouping and
    character classes, respectively), you could escape all other special
    characters in a text and use the RE evaluator itself to find out whether
    they are properly nested, having it evaluate one large RE. But I have not
    tested this idea yet. (Obviously it cannot be used to satisfy the
    "challenge"'s condition that braces – `{', `}' – and other parenthesis-like
    characters are to be considered as well, and that the parenthesis-like
    characters are to be printed.)

    ______
    ¹ Pushdown automaton

    References:
    [1] <http://bugs.python.org/issue694374>
    [2] <http://pyparsing.wikispaces.com/>
    --
    PointedEars

    Bitte keine Kopien per E-Mail. / Please do not Cc: me.
     
    Thomas 'PointedEars' Lahn, Jul 17, 2011
    #17
  18. Xah Lee

    rantingrick Guest

    On Jul 17, 2:47 am, Xah Lee <> wrote:
    > 2011-07-16
    >
    > folks, this one will be interesting one.
    >
    > the problem is to write a script that can check a dir of text files
    > (and all subdirs) and reports if a file has any mismatched matching
    > brackets.
    >
    >[...]
    >
    > • You script must be standalone. Must not be using some parser tools.
    > But can call lib that's part of standard distribution in your lang.


    I stopped reading here and did...

    >>> from HyperParser import HyperParser # python2.x


    ....and called it a day. ;-) This module is part of the stdlib (idlelib
    \HyperParser) so as per your statement it is legal (may not be the
    fastest solution).
     
    rantingrick, Jul 18, 2011
    #18
  19. I don't know why, but I just had to try it (even though I don't usually
    use Perl and had to look up a lot of stuff). I came up with this:

    /(?|
    (\()(?&matched)([\}\]â€â€ºÂ»ã€‘〉》ã€ã€]|$) |
    (\{)(?&matched)([\)\]â€â€ºÂ»ã€‘〉》ã€ã€]|$) |
    (\[)(?&matched)([\)\}â€â€ºÂ»ã€‘〉》ã€ã€]|$) |
    (“)(?&matched)([\)\}\]›»】〉》ã€ã€]|$) |
    (‹)(?&matched)([\)\}\]â€Â»ã€‘〉》ã€ã€]|$) |
    («)(?&matched)([\)\}\]â€â€ºã€‘〉》ã€ã€]|$) |
    (ã€)(?&matched)([\)\}\]â€â€ºÂ»ã€‰ã€‹ã€ã€]|$) |
    (〈)(?&matched)([\)\}\]â€â€ºÂ»ã€‘》ã€ã€]|$) |
    (《)(?&matched)([\)\}\]â€â€ºÂ»ã€‘〉ã€ã€]|$) |
    (「)(?&matched)([\)\}\]â€â€ºÂ»ã€‘〉》ã€]|$) |
    (『)(?&matched)([\)\}\]â€â€ºÂ»ã€‘〉》ã€]|$))
    (?(DEFINE)(?<matched>(?:
    \((?&matched)\) |
    \{(?&matched)\} |
    \[(?&matched)\] |
    “(?&matched)†|
    ‹(?&matched)› |
    «(?&matched)» |
    ã€(?&matched)】 |
    〈(?&matched)〉 |
    《(?&matched)》 |
    「(?&matched)〠|
    『(?&matched)〠|
    [^\(\{\[“‹«ã€ã€ˆã€Šã€Œã€Ž\)\}\]â€â€ºÂ»ã€‘〉》ã€ã€]++)*+))
    /sx;

    If the pattern matches, there is a mismatched bracket. $1 is set to the
    mismatched opening bracket. $-[1] is its location. $2 is the mismatched
    closing bracket or '' if the bracket was never closed. $-[2] is set to
    the location of the closing bracket or the end of the string if the
    bracket wasn't closed.


    I didn't write all that manually; it was generated with this:

    my @open = ('\(','\{','\[','“','‹','«','ã€','〈','《','「','『');
    my @close = ('\)','\}','\]','â€','›','»','】','〉','》','ã€','ã€');

    '(?|'.join('|',map
    {'('.$open[$_].')(?&matched)(['.join('',@close[0..($_-1),($_+1)..$#close]).']|$)'}
    (0 .. $#open)).')(?(DEFINE)(?<matched>(?:'.join('|',map
    {$open[$_].'(?&matched)'.$close[$_]} (0 ..
    $#open)).'|[^'.join('',@open,@close).']++)*+))'
     
    Rouslan Korneychuk, Jul 18, 2011
    #19
  20. Rouslan Korneychuk, 18.07.2011 09:09:
    > I don't know why, but I just had to try it (even though I don't usually use
    > Perl and had to look up a lot of stuff). I came up with this:
    >
    > /(?|
    > (\()(?&matched)([\}\]â€â€ºÂ»ã€‘〉》ã€ã€]|$) |
    > (\{)(?&matched)([\)\]â€â€ºÂ»ã€‘〉》ã€ã€]|$) |
    > (\[)(?&matched)([\)\}â€â€ºÂ»ã€‘〉》ã€ã€]|$) |
    > (“)(?&matched)([\)\}\]›»】〉》ã€ã€]|$) |
    > (‹)(?&matched)([\)\}\]â€Â»ã€‘〉》ã€ã€]|$) |
    > («)(?&matched)([\)\}\]â€â€ºã€‘〉》ã€ã€]|$) |
    > (ã€)(?&matched)([\)\}\]â€â€ºÂ»ã€‰ã€‹ã€ã€]|$) |
    > (〈)(?&matched)([\)\}\]â€â€ºÂ»ã€‘》ã€ã€]|$) |
    > (《)(?&matched)([\)\}\]â€â€ºÂ»ã€‘〉ã€ã€]|$) |
    > (「)(?&matched)([\)\}\]â€â€ºÂ»ã€‘〉》ã€]|$) |
    > (『)(?&matched)([\)\}\]â€â€ºÂ»ã€‘〉》ã€]|$))
    > (?(DEFINE)(?<matched>(?:
    > \((?&matched)\) |
    > \{(?&matched)\} |
    > \[(?&matched)\] |
    > “(?&matched)†|
    > ‹(?&matched)› |
    > «(?&matched)» |
    > ã€(?&matched)】 |
    > 〈(?&matched)〉 |
    > 《(?&matched)》 |
    > 「(?&matched)〠|
    > 『(?&matched)〠|
    > [^\(\{\[“‹«ã€ã€ˆã€Šã€Œã€Ž\)\}\]â€â€ºÂ»ã€‘〉》ã€ã€]++)*+))
    > /sx;
    >
    > If the pattern matches, there is a mismatched bracket. $1 is set to the
    > mismatched opening bracket. $-[1] is its location. $2 is the mismatched
    > closing bracket or '' if the bracket was never closed. $-[2] is set to the
    > location of the closing bracket or the end of the string if the bracket
    > wasn't closed.
    >
    >
    > I didn't write all that manually; it was generated with this:
    >
    > my @open = ('\(','\{','\[','“','‹','«','ã€','〈','《','「','『');
    > my @close = ('\)','\}','\]','â€','›','»','】','〉','》','ã€','ã€');
    >
    > '(?|'.join('|',map
    > {'('.$open[$_].')(?&matched)(['.join('',@close[0..($_-1),($_+1)..$#close]).']|$)'}
    > (0 .. $#open)).')(?(DEFINE)(?<matched>(?:'.join('|',map
    > {$open[$_].'(?&matched)'.$close[$_]} (0 ..
    > $#open)).'|[^'.join('',@open,@close).']++)*+))'



    That's solid Perl. Both the code generator and the generated code are
    unreadable. Well done!

    Stefan
     
    Stefan Behnel, Jul 18, 2011
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. ThaDoctor
    Replies:
    3
    Views:
    415
    Alan Woodland
    Sep 28, 2007
  2. henon

    [Q] little regexp challenge

    henon, Oct 16, 2003, in forum: Ruby
    Replies:
    5
    Views:
    137
    Robert Klemme
    Oct 17, 2003
  3. Intransition
    Replies:
    21
    Views:
    641
    Josh Cheek
    Jun 11, 2011
  4. Xah Lee
    Replies:
    21
    Views:
    863
    Robert Klemme
    Jul 22, 2011
  5. Xah Lee

    a little parsing challenge ☺

    Xah Lee, Jul 17, 2011, in forum: Perl Misc
    Replies:
    29
    Views:
    344
    Rouslan Korneychuk
    Jul 21, 2011
Loading...

Share This Page