Is C an unsuitable choice as a string parser?

Discussion in 'C Programming' started by Gallagher Polyn, Dec 13, 2013.

  1. What he mean is that whilst we can use escapes to match an expression of
    the form ( text ), we can't define a regular expression to match

    expression = term + or - term
    term = number or ( expression )
    number = list of digits starting with non-zero
    Malcolm McLean, Dec 16, 2013
    1. Advertisements

  2. Gallagher Polyn

    John Gordon Guest

    I assume he was referring to RE's general inability to grok nesting, not
    that RE's literally cannot match a ']' character.
    John Gordon, Dec 16, 2013
    1. Advertisements

  3. Gallagher Polyn

    Ken Brody Guest

    On 12/16/2013 12:19 PM, Malcolm McLean wrote:
    So zero isn't a number? ;-)

    (Yes, I know it was a quick-and-dirty example, and not a "real life"
    Ken Brody, Dec 16, 2013
  4. Gallagher Polyn

    James Kuyper Guest

    That's a more complicated statement than your original; I wouldn't argue
    with the more complicated version - but the original version overstated
    your case.
    James Kuyper, Dec 16, 2013
  5. I'm a bit puzzled. Are you saying that this (the original statement):

    | And RE's can't even match brackets.
    | Isn't the first non-trivial grammer you learn to parse the well-nested
    | brackets one?

    is significantly different to the latter one? I can't see much in it
    Ben Bacarisse, Dec 16, 2013
  6. Gallagher Polyn

    James Kuyper Guest

    Yes, the original says simply, without qualification, that RE's can't
    match brackets. I'm quite familiar with ways of using RE's to match
    brackets - I use vi REs quite frequently for precisely that purpose
    while editing C source code - so I know that's wrong.

    The more complicated version specifies parsing a language that uses
    well-nested brackets. That's a much more complicated issue than simply
    matching brackets. I'm sure that REs could be used as part of such a
    parsing process: the system I'm most familiar with for parsing strings
    are lex/yacc and flex/bison. I'm not sure of the significance of Phil's
    usage of the phrase "well-nested", but I'm quite sure that either of
    those systems can deal quite easily with nested brackets. lex and flex
    are both built upon REs. However, REs play only a small roll in the
    entire process, and I wouldn't dare to claim they could do the entire job.
    James Kuyper, Dec 16, 2013
  7. Gallagher Polyn

    Siri Cruz Guest

    YACC and Lex are string parsers, written C and generating C code. I have also
    written a GHR parser (an enhancement of CKY) in Algol 60 and C, and that can
    parse with any type 2 grammar. So, yes, you can do it. I don't know offhand of a
    type 1 grammar parser, and I don't know if there would a polynomial bounds
    guarantee; a type 0 grammar parser would be problematic since there's no halting
    With a garbage collector such as the Boehm GC, you can easily create a string
    library to your heart's delight. Boehm GC even supplies one for you. Apple's
    LLVM Objective-C is supposed to automate reference counts for effective garbage
    collection; I don't know if it handles circular lists or can be used for
    straight C.

    Any programming language can implement a Turing Machine has the same power as
    any other. It's not what you can do, but what is the easiest notation.
    Siri Cruz, Dec 16, 2013
  8. Gallagher Polyn

    Siri Cruz Guest

    REs can't recognise Dyck sets.

    Simple enough?
    Siri Cruz, Dec 16, 2013
  9. You didn't take the second sentence as qualifying the first? You
    thought Phil Carmody (with whose posts you can't be entirely unfamiliar)
    was stating that (a) REs can't match the bracket /characters/ and, quite
    independently with no connection or expository purpose, (b) speculated
    on what simple grammars one learns? That seems so very odd that I feel
    I must have misunderstood you.
    Ben Bacarisse, Dec 16, 2013
  10. Gallagher Polyn

    Siri Cruz Guest

    Well, yeah, that's already known. Any context free (type 2) language can be
    described as the appropriate mapping of the union of a regular (type 3) language
    and a Dyck set.
    Siri Cruz, Dec 16, 2013
  11. At the risk of complicating matters, some RE systems and/or libraries
    *can* match languages with well-nested brackets. Famously, recent
    versions of Perl's "regular expressions" can do this, and therefore so
    can systems that include libpcre like PHP. Of course, it's the name
    that's wrong -- such patterns no longer match regular languages -- but
    there is far too much history to change the name.
    Ben Bacarisse, Dec 16, 2013
  12. Gallagher Polyn

    Siri Cruz Guest

    Depending on what's allowed you can get away with a DPDA + one integer counter
    without needing a full stack.
    Siri Cruz, Dec 16, 2013
  13. Gallagher Polyn

    James Kuyper Guest

    No, I took the second sentence as giving, as an example, a context where
    the claim made by the first sentence causes a problem. That claim, if
    true, would certainly cause problems in that context.

    I know of several non-trivial grammars supporting nesting brackets, but
    I have no idea which particular grammar he's thinking of. My formal
    education was in theoretical physics; computer programming was
    originally simply a supporting skill that somehow ended up becoming my
    career. Therefore, I never took the kind of course he was referring to.

    If the second sentence was meant to qualify the previous one, it should
    have contained phrases identifying the way in which it qualified it.
    However, I can't figure out how to modify that pair of sentences by
    inserting such phrases, to mean what he meant by those sentences. It's
    far easier to do a complete re-write:

    "Grammars supporting well-nested brackets (such as the very first
    non-trivial one you learned to parse) cannot be parsed entirely by the
    use of RE's."

    If that's not an accurate re-wording of his statement, that just proves
    I still don't understand it.
    James Kuyper, Dec 16, 2013
  14. JK> Yes, the original says simply, without qualification, that RE's
    JK> can't match brackets. I'm quite familiar with ways of using RE's
    JK> to match brackets - I use vi REs quite frequently for precisely
    JK> that purpose while editing C source code - so I know that's
    JK> wrong.

    vi's "regular expressions" are more powerful than the mathematical
    formalism "regular expressions."

    Charlton Wilbur, Dec 17, 2013
  15. I believe it is, though of course I can't speak for him.

    One possible interpretation is that "can't match brackets" means
    that, given something like "(foo (bar) baz)", a regular expression
    can't *match* the '(' at the beginning of the string to the ')'
    at the end of it.

    In any case, I hope we can all agree that the horse we're beating
    is now well and truly dead.
    Keith Thompson, Dec 17, 2013
  16. OK, to get back to C, why are the arguments to preprocessor macros
    required to be balanced, but not the replacements?

    #define left (
    #define right )
    #define lb {
    #define rb }
    #define semi ;

    #include <stdio.h>

    int main left right lb semi
    printf left "hi there!" right semi rb

    Too much fun with the preprocessor.

    (I didn't try writing macros with unbalanced () in the arguments.)

    -- glen
    glen herrmannsfeldt, Dec 17, 2013
  17. Arguments are required to be balanced only for macros that take
    arguments (i.e., function-like macros). Without that requirement,
    it would be impossible to detect the end of an invocation.

    There are probably ways to work around that limitation, but I don't
    think it would be worth the effort. You could pass a ( or ) token as a
    macro argument, which is currently not possible, but personally I've
    never felt the need to do that.
    Keith Thompson, Dec 17, 2013
  18. Gallagher Polyn

    Phil Carmody Guest

    I sometimes rely on people being able to read my mind, and perhaps filling
    in gaps, or, worse, having to extrapolate. Apologies if there were ambiguities.

    Phil Carmody, Dec 17, 2013
  19. Gallagher Polyn

    James Kuyper Guest

    I now realize that you were probably talking about matching
    corresponding brackets to each other, rather than writing an RE that
    matches a single bracket. "match brackets" could mean either of those
    James Kuyper, Dec 17, 2013
  20. Yes, it was a simple misunderstanding, and obvious to anyone who knows
    that regular expressions aren't the same as formal grammars. Which you
    wouldn't know unless you'd either been told or tried to build an RE
    expression parser. So all it needed was a correction to clear it up.
    I'm amazed at some of the posts.
    Malcolm McLean, Dec 17, 2013
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.