announcing RubyLexer 0.6.0

Discussion in 'Ruby' started by vikkous, Apr 23, 2005.

  1. vikkous

    vikkous Guest

    At this time, I am pleased to announce the release of RubyLexer 0.6.0,
    a standalone lexer of ruby in ruby. RubyLexer attempts to completely
    and correctly tokenize all valid ruby 1.8 source code, and it mostly
    succeeds. In time, RubyLexer will be able to lex all ruby code. For
    now, some newer features are unsupported and there are some extremely
    obscure bugs involving strings, but all real world ruby code should be
    supported. It is my hope to provide a high-quality lexer for all those
    language tools which require one.

    RubyLexer is hosted on RubyForge
    (http://rubyforge.org/projects/rubylexer/).
    Here's where to get the tarball:
    http://rubyforge.org/frs/download.php/4191/rubylexer-0.6.0.tar.bz2
    vikkous, Apr 23, 2005
    #1

  2. vikkous

    Trans Guest

    Hi,

    Could you describe RubyLexer a bit more? I know very little about
    lexers, so excuse me if I ask dumb questions, but... What's the output
    look like? How does it compare to other projects like ParseTree? Do you
    have any plans for its use?

    Thanks,
    T.
    Trans, Apr 23, 2005
    #2

  3. vikkous wrote:

    > At this time, I am pleased to announce the release of RubyLexer 0.6.0,
    > a standalone lexer of ruby in ruby. RubyLexer attempts to completely
    > and correctly tokenize all valid ruby 1.8 source code, and it mostly
    > succeeds.


    How extendable is this? Would you be able to add new rules to it at
    run-time? If it is like that then it could be used for writing Ruby
    source code filters which is something that is useful for exploring new
    syntax.

    I can also contribute a few pieces of code that I think that are hard to
    lex properly if you are interested.
    Florian Groß, Apr 23, 2005
    #3
  4. vikkous

    Peter Suk Guest

    On Apr 23, 2005, at 10:54 AM, vikkous wrote:

    > At this time, I am pleased to announce the release of RubyLexer 0.6.0,


    YeeHaaa!! ThankYouThankYou!

    --
    There's neither heaven nor hell, save what we grant ourselves.
    There's neither fairness nor justice, save what we grant each other.
    Peter Suk, Apr 23, 2005
    #4
  5. vikkous

    vikkous Guest

    A lexer, or tokenizer (they mean the same thing) divides an input
    source language into words. It also removes comments and finds the
    boundaries of strings. Once this is done, it's much easier to correctly
    process the language in a pre-processor or parser. Here's an example.
    Given this ruby code:

    8+(9 *5)

    a correct lexing is something like:

    ["8","+","(","9","*","5",")"]

    (For lexing purposes, punctuation and operators count as strings as
    well.)
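
As a rough illustration of the idea above, here is a toy tokenizer for arithmetic expressions only. It is a sketch under simple assumptions (integers, four operators, parentheses), not RubyLexer's algorithm; the method name `toy_lex` is made up for this example.

```ruby
require 'strscan'

# Toy tokenizer for arithmetic only; NOT how RubyLexer works internally.
def toy_lex(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                          # whitespace is dropped in this simplified version
    elsif (tok = scanner.scan(/\d+/))
      tokens << tok                 # integer literal
    elsif (tok = scanner.scan(/[-+*\/()]/))
      tokens << tok                 # operator or parenthesis
    else
      raise ArgumentError, "unexpected character #{scanner.getch.inspect}"
    end
  end
  tokens
end

p toy_lex("8+(9 *5)")
```

Real Ruby needs far more context than this (strings, here documents, regex literals), which is why a dedicated lexer is useful.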

    The output of RubyLexer is actually more complicated than that... for
    one thing, there are tokens for whitespace as well. For another, the
    individual tokens are not Strings, but Tokens (or subclasses of it, to
    be precise), a class defined in RubyLexer. Tokens do respond to to_s in
    the expected way, however. (Initially, I did want to have RubyLexer
    just return Strings, but it turned out I needed to distinguish
    different token types, and the best way to do that is with the type
    system.)
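
A minimal sketch of what typed tokens could look like. The class names below (`SketchToken` and its subclasses) are hypothetical and chosen for illustration; they are not the classes RubyLexer defines.

```ruby
# Hypothetical token classes; names are illustrative, not RubyLexer's.
class SketchToken
  attr_reader :text, :offset

  def initialize(text, offset)
    @text = text      # the source text of the token
    @offset = offset  # position in the input, counted from the start
  end

  def to_s
    @text
  end
end

class NumberSketchToken < SketchToken; end
class OperatorSketchToken < SketchToken; end
class WhitespaceSketchToken < SketchToken; end

# The tokens for the "9 *5" part of "8+(9 *5)".
def sample_tokens
  [
    NumberSketchToken.new("9", 3),
    WhitespaceSketchToken.new(" ", 4),
    OperatorSketchToken.new("*", 5),
    NumberSketchToken.new("5", 6),
  ]
end

# The type system, rather than string comparison, filters out whitespace.
def significant(tokens)
  tokens.reject { |t| t.is_a?(WhitespaceSketchToken) }
end

puts significant(sample_tokens).map(&:to_s).inspect
puts sample_tokens.map(&:to_s).join
```

Because whitespace is kept as a token, joining the `to_s` of every token gives back the original text.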

    ParseTree is a parser, not a lexer. Parsing is the next step in a
    compiler pipeline; it determines what order to evaluate the operations
    in an expression and solves the difficult problems of precedence and
    associativity. (Another way to think of parsers is as the bit that
    figures out where the implicit parentheses are inserted into the source
    code.) I think that the tool corresponding to RubyLexer is Ripper, but
    I don't really know, so don't blame me if I'm wrong.
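
To make the "implicit parentheses" view of parsing concrete, here is a small precedence-climbing sketch over an array of string tokens. It handles only four left-associative binary operators and is not related to how ParseTree or Ruby's own parser work; `parenthesize` and `parse_atom` are names invented for this example.

```ruby
# Binding power of each binary operator (higher binds tighter).
PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Precedence climbing: returns the expression as a string with every
# implicit pair of parentheses written out. Consumes the token array.
def parenthesize(tokens, min_prec = 1)
  left = parse_atom(tokens)
  while (op = tokens.first) && PRECEDENCE.key?(op) && PRECEDENCE[op] >= min_prec
    tokens.shift
    # Requiring a strictly higher precedence on the right makes the
    # operators left-associative.
    right = parenthesize(tokens, PRECEDENCE[op] + 1)
    left = "(#{left}#{op}#{right})"
  end
  left
end

# An atom is a number or a parenthesized sub-expression.
def parse_atom(tokens)
  tok = tokens.shift
  return tok unless tok == "("
  inner = parenthesize(tokens)
  tokens.shift # consume the closing ")"
  inner
end

puts parenthesize(%w[8 + 9 * 5])
puts parenthesize(%w[8 - 9 - 5])
```

The first call shows precedence at work, the second associativity.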

    I have lots of plans, of course, but being only one little programmer
    with lots of big ideas, who knows if I'll ever get to them...
    vikkous, Apr 23, 2005
    #5
  6. vikkous

    vikkous Guest

    > How extendable is this? Would you be able to add new rules to it
    > at run-time?


    Ummm... if you're really lucky, maybe. I didn't really have
    extensibility in mind. It might be possible to add it, without a lot
    of trouble, depending on what you want to extend. So, what do you want
    to extend?

    > If it is like that then it could be used for writing Ruby
    > source code filters which is something that is useful for exploring
    > new syntax.


    One of the applications I had in mind was to create a lexer family for
    ruby-like languages, but that has sort of fallen by the wayside right
    now. I still like the idea, but other priorities press at the moment.

    > I can also contribute a few pieces of code that I think that are hard
    > to lex properly if you are interested.


    Oh! That would be lovely. Weird syntax, obscure syntax, new syntax,
    twisted, devious, mutant syntax, I want it all for my menagerie.
    vikkous, Apr 23, 2005
    #6
  7. vikkous

    vikkous Guest

    Peter Suk wrote:
    > YeeHaaa!! ThankYouThankYou!


    You're welcome. It's nice to be appreciated.
    vikkous, Apr 23, 2005
    #7
  8. vikkous

    Hal Fulton Guest

    vikkous wrote:
    >>I can also contribute a few pieces of code that I think that are hard
    >>to lex properly if you are interested.

    >
    > Oh! That would be lovely. Weird syntax, obscure syntax, new syntax,
    > twisted, devious, mutant syntax, I want it all for my menagerie.


    Ha... I'll see if I can dig up anything.

    In the meantime, one of my favorites is an expression containing a
    string that contains an interpolated expression that contains a
    string containing another interpolated expression:

    x = "Hi, my name is #{"Slim #{rand(4)>2?"Whitman":"Shady"}"}."


    Hal
    Hal Fulton, Apr 23, 2005
    #8
  9. vikkous wrote:
    > At this time, I am pleased to announce the release of RubyLexer 0.6.0,
    > a standalone lexer of ruby in ruby. RubyLexer attempts to completely
    > and correctly tokenize all valid ruby 1.8 source code, and it mostly
    > succeeds. In time, RubyLexer will be able to lex all ruby code. For
    > now, some newer features are unsupported and there are some extremely
    > obscure bugs involving strings, but all real world ruby code should be
    > supported. It is my hope to provide a high-quality lexer for all those
    > language tools which require one.
    >
    > RubyLexer is hosted on RubyForge
    > (http://rubyforge.org/projects/rubylexer/).
    > Here's where to get the tarball:
    > http://rubyforge.org/frs/download.php/4191/rubylexer-0.6.0.tar.bz2
    >


    first let me say I think this is cool :)
    Anyway, I wonder: isn't something like this included with ruby (irb's
    lexer) ?
    Care to explain the differences a little?
    gabriele renzi, Apr 24, 2005
    #9
  10. vikkous wrote:

    >>How extendable is this? Would you be able to add new rules to it
    >>at run-time?

    >
    > Ummm... if you're really lucky, maybe. I didn't really have
    > extensibility in mind. It might be possible to add it, without a lot
    > of trouble, depending on what you want to extend. So, what do you want
    > to extend?


    One simple example would be adding a ".=" assign-result-of-method-call
    operator as in "foo = 'bar'; foo .= reverse"
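
For illustration only, here is a naive way to expand such an operator with a regular expression, assuming `foo .= reverse` is meant as shorthand for `foo = foo.reverse`. The names `DOT_ASSIGN` and `expand_dot_assign` are invented for this sketch. It also shows why a real lexer is wanted: the regex cannot tell code from the contents of a string.

```ruby
# Naive source filter: rewrites `name .= method` into `name = name.method`.
# Assumption: `.=` means "call the method and assign the result back".
DOT_ASSIGN = /\b([a-z_]\w*)\s*\.=\s*([a-z_]\w*[?!]?)/

def expand_dot_assign(source)
  source.gsub(DOT_ASSIGN) { "#{$1} = #{$1}.#{$2}" }
end

puts expand_dot_assign("foo = 'bar'; foo .= reverse")

# The weakness: text inside a string literal gets rewritten too.
puts expand_dot_assign('puts "a .= b"')
```

A token-based filter would skip string tokens and rewrite only real operators.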

    > Oh! That would be lovely. Weird syntax, obscure syntax, new syntax,
    > twisted, devious, mutant syntax, I want it all for my menagerie.


    See attachment.

    Attachment pre.rb (application/x-ruby, base64-encoded):

    ZXZhbCBcDQpldmFsKFwNCiNwdXRzKFwNCiMhaWYgZGVmaW5lZD8oY2xhc3MgRU5WOjpGUklF
    TkRMWTsgZW5kKQ0KJXskXyAgID0NCiAgJSUlDQojeygNCiAoKg0KKT0qJDwNCiApLm1hcCB7
    fCRffA0KfiVyIF4jISBtZXNzaW5lc3M/IHN1YiggJXIgI3sNCiUuXi4uLiUgKC4rKSB9ezF9
    IG5vbnNlbnNlLCA8PCcgPj4nLg0KICBcMQ0KID4+DQpkZWxldGUoICUlJSA8PCc+PiAgDQop
    OicNCikgKSA6ICggKA0KWyBbICMgXSBdDQogIHN1YiggJXINCl4gI3sgICAlcQ0KKA0KLi4g
    JQkqXFxzKgkNCn0gICQpDQp4ZW5vbikgeyBAXyA9ICQxLmluc3BlY3QNCiUoICRfIDw8ICNA
    XyA8PCAlIyMjI0BfDQogKX0JXQ0KXVsgMCBdKQ0KICApfX0NCiRffSkNCl9fRU5EX18NCiMh
    ZW5kDQolKCUocHV0cyAlKEV4ZWN1dGluZyBSdWJ5IGNvZGUuLi4pKSkpDQo=
    Florian Groß, Apr 24, 2005
    #10
  11. vikkous

    vikkous Guest

    > first let me say I think this is cool :)
    > Anyway, I wonder: isn't something like this included with ruby (irb's
    > lexer) ?
    > Care to explain the differences a little?


    Irb's lexer is not as complete. I can't think of any examples, but when
    developing this, I played around with irb quite a bit, trying different
    syntaxes. Irb would do pretty well most of the time, but every so
    often, I'd come up with something that had to be wrapped in eval %() in
    order to work in irb...
    vikkous, Apr 24, 2005
    #11
  12. vikkous

    vikkous Guest

    > "Hi, my name is #{"Slim #{rand(4)>2?"Whitman":"Shady"}"}."

    Yes, this is the type of thing I'm thinking of! Stretch the language!
    Bend it to the breaking point! <Sound of whip cracking>. But you're not
    being deviant enough; you didn't break my lexer yet (tho you can never
    be too sure with these string interpolations).

    Here's how tricky you have to be to fool it:

    p "#{<<kekerz}#{"foob"
    zimpler
    kekerz
    }"

    Here document header and body in different interpolations... tricky.
    vikkous, Apr 24, 2005
    #12
  13. vikkous

    Peter Suk Guest

    On Apr 23, 2005, at 9:44 PM, vikkous wrote:

    >> first let me say I think this is cool :)
    >> Anyway, I wonder: isn't something like this included with ruby (irb's
    >> lexer) ?
    >> Care to explain the differences a little?

    >
    > Irb's lexer is not as complete. I can't think of any examples, but when
    > developing this, I played around with irb quite a bit, trying different
    > syntaxes. Irb would do pretty well most of the time, but every so
    > often, I'd come up with something that had to be wrapped in eval %() in
    > order to work in irb...


    Examples?

    --
    There's neither heaven nor hell, save what we grant ourselves.
    There's neither fairness nor justice, save what we grant each other.
    Peter Suk, Apr 24, 2005
    #13
  14. vikkous

    vikkous Guest

    Florian Groß wrote:
    > One simple example would be adding a ".=" assign-result-of-method-call
    > operator as in "foo = 'bar'; foo .= reverse"


    At first, I thought, "This guy is dreaming; my code is just too rigid
    to allow extensions of that kind very easily.". But of course, it
    wouldn't be too hard for me to special case this one operator in for
    you if you wanted to... it'd just be a quick hack in RubyLexer#dot...
    in fact, it could be done in a subclass:
    [warning: untested code!]

    class FlorianRubyLexer < RubyLexer
      def dot(ch)
        # this is the routine in RubyLexer that handles tokens beginning
        # with '.'
        if readahead(2)=='.='
          KeywordToken.new(@file.read(2),@file.pos-2)
        else
          super
        end
      end
    end

    Not too bad for extensibility, eh? I think things look quite hopeful
    for your idea, actually....
    You do have to know RubyLexer internals to do this kind of thing, but
    that's true for any library. And you probably want to add operators
    that create no new ambiguities in the language. This one doesn't create
    ambiguity, a sign that you've been thinking about this already. Tell me
    more of the kind of thing you want, and maybe I'll write more of your
    lexer for you.
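
Since the snippet above is untested and depends on RubyLexer internals, here is a self-contained sketch of the same extension pattern using a made-up toy lexer. `MiniLexer` and `DotAssignLexer` are hypothetical classes written for this example; the only thing they share with RubyLexer is the idea of one overridable method for tokens that begin with '.'.

```ruby
require 'strscan'

# Hypothetical stand-in for a lexer with an overridable hook per kind of
# token; the real RubyLexer internals may look quite different.
class MiniLexer
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  def tokens
    result = []
    until @scanner.eos?
      next if @scanner.skip(/\s+/)
      result << (@scanner.check(/\./) ? dot : word)
    end
    result
  end

  # Handles tokens beginning with '.' (".", "..", "...").
  def dot
    @scanner.scan(/\.{1,3}/)
  end

  # Handles identifiers, numbers, and runs of other punctuation.
  def word
    @scanner.scan(/\w+|[^\w\s.]+/)
  end
end

# The extension: recognize ".=" first, otherwise defer to the base class.
class DotAssignLexer < MiniLexer
  def dot
    @scanner.scan(/\.=/) || super
  end
end

p MiniLexer.new("foo .= reverse").tokens
p DotAssignLexer.new("foo .= reverse").tokens
```

The base class splits `.=` into two tokens, while the subclass yields it as one, and ordinary method calls like `foo.bar` still go through `super`.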

    > See attachment.
    > « pre.rb »


    Now that's deviant! Whitespace as a fancy string delimiter... I don't
    even know if that's what breaks RubyLexer, but that's sick, man, really
    sick.

    Ps: what does the code do?
    vikkous, Apr 24, 2005
    #14
  15. vikkous

    vikkous Guest

    > Examples?

    I should have written them down, but I didn't. Next time I come across
    one, I'll let you know. Some no doubt got into testdata/p.rb (in
    rubylexer).
    vikkous, Apr 24, 2005
    #15
  16. vikkous wrote:
    >>first let me say I think this is cool :)
    >>Anyway, I wonder: isn't something like this included with ruby (irb's
    >>lexer) ?
    >>Care to explain the differences a little?

    >
    >
    > Irb's lexer is not as complete. I can't think of any examples, but when
    > developing this, I played around with irb quite a bit, trying different
    > syntaxes. Irb would do pretty well most of the time, but every so
    > often, I'd come up with something that had to be wrapped in eval %() in
    > order to work in irb...
    >


    this is what I expected, I just think you should make it clear to casual
    users :)
    gabriele renzi, Apr 24, 2005
    #16
  17. vikkous wrote:

    >>One simple example would be adding a ".=" assign-result-of-method-call
    >>operator as in "foo = 'bar'; foo .= reverse"

    >
    > At first, I thought, "This guy is dreaming; my code is just too rigid
    > to allow extensions of that kind very easily.". But of course, it
    > wouldn't be too hard for me to special case this one operator in for
    > you if you wanted to... it'd just be a quick hack in RubyLexer#dot...
    > in fact, it could be done in a subclass:
    > [warning: untested code!]
    >
    > class FlorianRubyLexer < RubyLexer
    >   def dot(ch)
    >     # this is the routine in RubyLexer that handles tokens beginning
    >     # with '.'
    >     if readahead(2)=='.='
    >       KeywordToken.new(@file.read(2),@file.pos-2)
    >     else
    >       super
    >     end
    >   end
    > end


    Which is exactly what I thought would be a good way of extending. This
    looks good.

    Another thing that I would be able to make good use of is getting the
    next expression, whatever it might be.

    Let's say I have this code:

    z = if (x + y) * 2 > 2 then
      code here
    end

    It would then be very nice if I could lex until I see the 'if' then say
    'give me an atomic expression' which would parse until the 'then' and
    then say 'give me an atomic expression' again which would parse until
    the 'end'. Basically I don't want to match paired things (parentheses,
    do .. end, class definitions etc.) at the transformation level.

    Yup, that sample does not introduce any new syntax -- I would like to
    transform it to this:

    z = if ((x + y) * 2 > 2).true? then
      code here
    end

    Which is why I would need to find a sub-expression.

    Also note that just grabbing everything until the next 'then' would not
    be good enough:

    # Nonsense code, but still valid
    if x > if x < 5 then 3 else 2 end then
      puts "Good!"
    end

    If it weren't for that point then IRB's lexer would be a more or less
    nifty match already.
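
A sketch of the nesting problem described above, working on a plain array of word tokens. It assumes every nested `if` is the block form that takes an `end`; a modifier `if` would need extra context from the lexer. The method name `matching_then` is invented for this example.

```ruby
# Finds the index of the `then` that closes the condition of the `if`
# at tokens[start]. Nested if ... end pairs are skipped by counting depth.
def matching_then(tokens, start = 0)
  depth = 0
  tokens.each_with_index do |tok, i|
    next if i <= start
    case tok
    when "if"  then depth += 1
    when "end" then depth -= 1
    when "then"
      return i if depth.zero?
    end
  end
  nil
end

tokens = %w[if x > if x < 5 then 3 else 2 end then]
idx = matching_then(tokens)
condition = tokens[1...idx].join(" ")
puts condition
puts "if (#{condition}).true? then"
```

Grabbing everything up to the first `then` would cut the condition off at `x > if x < 5`; counting depth finds the outer `then` instead.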

    Does this sound like something that can be done without too much trouble?

    For doing code transformations it is of course also important that you
    can turn back the stream of tokens into a String easily. I did this with
    IRB's lexer by using the .line_no and .pos methods of tokens, but that
    was not too good a match, actually.

    >>See attachment.
    >>« pre.rb »

    >
    > Now that's deviant! Whitespace as a fancy string delimiter... I don't
    > even know if that's what breaks RubyLexer, but that's sick, man, really
    > sick.


    Oh, that is still relatively simple. There's worse stuff happening under
    the surface.

    > Ps: what does the code do?


    If you invoke it as ruby -rpre file.rb it will pre-process file.rb
    before letting Ruby handle it. It parses simple directives that look
    like this:

    #!if rand > 0.5 then
    }{}{ # Cause a Syntax Error
    #!else
    puts "Hello World"
    #!end

    That file would produce a Syntax Error at parse-time half of the time
    and output Hello World in the other cases.

    > "Hello"
    > 1+5
    > Time.now

    #!gsub!(/^>/, "puts")

    And that would make '>' at the beginning of a line mean 'output this: '.

    It's basically something like the C preprocessor, but in a more Rubyish
    manner written in obscure style. I guess it is pretty useless after all.
    Florian Groß, Apr 24, 2005
    #17
  18. vikkous

    vikkous Guest

    Florian Groß wrote:
    > Which is exactly what I thought would be a good way of
    > extending. This looks good.


    Everything may not be as simple as this one case was. The fact that the
    first example you gave turned out to be pretty easy is encouraging, but
    I think we're likely to run into something really nasty before you are
    happy.

    > It would then be very nice if I could lex until I see the 'if' then
    > say 'give me an atomic expression' which would parse until
    > the 'then' and then say 'give me an atomic expression' again
    > which would parse until the 'end'. Basically I don't want to
    > match paired things (parentheses, do .. end, class definitions
    > etc.) at the transformation level.


    In general, 'get the next expression' is a problem that requires a
    parser, not a lexer. Have you looked at ParseTree? Of course you have.

    In this case however, you are in luck. Delimited expressions, that
    start and end with ( and ), or begin and end, or whatever, are already
    discovered by my lexer. (During the development of RubyLexer, I
    discovered that it had to be half-a-parser as well, in order to
    correctly get all the information that's needed to lex correctly.) The
    information you want is already being gathered by RubyLexer, it's just
    not available in a public interface. We should negotiate such an
    interface since you seem to need it. What you propose, 'get the next
    expression', is not one I want to do. RubyLexer does not deal in
    abstractions larger than tokens... at least, not on a public level. I
    am, however, willing to emit 'advisory' tokens at certain points in the
    token stream, (several such types of tokens are being emitted already)
    which should allow you to do what we want, if we design it carefully.
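
One way to picture advisory tokens, purely as a sketch: they carry a note about context but no source text, so they disappear when the stream is turned back into a string. The classes `Tok` and `AdvisoryTok` and the marker names `:condition_begin` and `:condition_end` are hypothetical, not part of RubyLexer's output.

```ruby
# Hypothetical token classes for illustration only.
class Tok
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def to_s
    text
  end
end

# An advisory token has empty text and carries a context note instead.
class AdvisoryTok < Tok
  attr_reader :note

  def initialize(note)
    super("")
    @note = note
  end
end

def sample_stream
  [
    Tok.new("if"), Tok.new(" "),
    AdvisoryTok.new(:condition_begin),
    Tok.new("x"), Tok.new(" "), Tok.new(">"), Tok.new(" "), Tok.new("2"),
    AdvisoryTok.new(:condition_end),
    Tok.new(" "), Tok.new("then"),
  ]
end

# Collects the source text between a pair of advisory markers.
def condition_text(stream)
  from = stream.index { |t| t.is_a?(AdvisoryTok) && t.note == :condition_begin }
  to   = stream.index { |t| t.is_a?(AdvisoryTok) && t.note == :condition_end }
  stream[(from + 1)...to].map(&:to_s).join
end

puts condition_text(sample_stream)
puts sample_stream.map(&:to_s).join
```

A consumer that ignores the markers still sees a plain token stream, which keeps the interface token-based rather than expression-based.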

    On the other hand.... the reason I chose not to emit advisory tokens
    for this particular case is that the complementary tool to RubyLexer is
    intended to be Reg, which can find nested pairs of braces and the like
    pretty easily. Have you looked at Reg at all? I realize that I only
    released it yesterday, and as of yet it's only half-working because
    critical features are as yet unimplemented, but I think it might be
    just the thing for the types of preprocessors you have in mind.

    Reg might not be able to easily tell 'if' the postfix operator from
    'if' the value in current RubyLexer output. Since one requires an end
    and the other doesn't, that can be troublesome to deal with. 'do' is
    also a pain, now that I think of it. All these cases are handled
    correctly in RubyLexer, we just have to find an appropriate
    (token-based, not expression-based) interface.

    > Also note that just grabbing everything until the next 'then' would
    > not be good enough:
    >
    > # Nonsense code, but still valid
    > if x > if x < 5 then 3 else 2 end then
    > puts "Good!"
    > end


    Don't worry about this type of thing. I have these problems well under
    control, one way or another.

    > Does this sound like something that can be done without
    > too much trouble?


    Definitely!

    > For doing code transformations it is of course also important that
    > you can turn back the stream of tokens into a String easily. I did
    > this with IRB's lexer by using the .line_no and .pos methods of
    > tokens, but that was not too good a match, actually.


    So what would be a good match? I don't see why this should be a
    problem. My implementation of Token implements to_s, which returns the
    ruby code corresponding to the token; usually, this is exactly the
    same as the code that created the token originally. There's also an
    offset method, which returns the position of the token in the input
    stream, relative to the very beginning. Tokens don't have a #line_no,
    but you can get the same information from FileAndLineTokens.

    Turning the token stream back into a big string (or file) is essentially
    what one of my test programs (tokentest) does. The resulting ruby files
    are legal and parse in exactly the same way. I haven't yet shown that
    they are really exactly equivalent (but there's not much room for
    variation); that will be the next RubyLexer release.
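
The same round-trip check can be sketched with Ripper, the lexer and parser library bundled with later Ruby versions. This uses Ripper rather than RubyLexer, so it only illustrates the idea of the test, not the tokentest program itself; `round_trip` is a name invented here.

```ruby
require 'ripper'

# Ripper.lex returns one entry per token; index 2 of each entry is the
# token's source text. Joining them should reproduce the input.
def round_trip(source)
  Ripper.lex(source).map { |token| token[2] }.join
end

source = "x = 8+(9 *5)\nputs x\n"
puts round_trip(source) == source
```

Each Ripper entry also carries a `[line, column]` position as its first element, which plays a role similar to the offset method mentioned above.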

    > If it weren't for that point then IRB's lexer would be a more or
    > less nifty match already.


    > I did this with IRB's lexer by using the .line_no and .pos
    > methods of tokens, but that was not too good a match, actually.


    Wait... so you wrote irb's lexer? One of my wishlist items is to
    integrate RubyLexer with irb among others.... how hard do you think
    this will be?

    > Oh, that is still relatively simple. There's worse stuff happening
    > under the surface.


    Well, it was unexpected for me. Much to my embarrassment; I thought I
    was an expert at this. I must say many elements of this got me very
    confused at first, and obviously I never put all the pieces together.
    Congratulations.

    Ps: I haven't figured out why this breaks RubyLexer yet, but I will.

    Pps: putting tricky stuff in eval strings and the like won't break the
    lexer (yet). To the lexer, it's just a string.

    > It's basically something like the C preprocessor, but in a more
    > Rubyish manner written in obscure style. I guess it is pretty
    > useless after all.


    Not at all. Now that I know what it does, maybe I'll find a use for it,
    someday.
    vikkous, Apr 25, 2005
    #18
  19. vikkous wrote:

    >>Which is exactly what I thought would be a good way of
    >>extending. This looks good.

    >
    > Everything may not be as simple as this one case was. The fact that the
    > first example you gave turned out to be pretty easy is encouraging, but
    > I think we're likely to run into something really nasty before you are
    > happy.


    Hm, that ought to be not too much of a problem. I'm okay with having a
    look at some of the internals for that kind of thing.

    >>It would then be very nice if I could lex until I see the 'if' then
    >>say 'give me an atomic expression' which would parse until
    >>the 'then' and then say 'give me an atomic expression' again
    >>which would parse until the 'end'. Basically I don't want to
    >>match paired things (parentheses, do .. end, class definitions
    >>etc.) at the transformation level.

    >
    > In general, 'get the next expression' is a problem that requires a
    > parser, not a lexer. Have you looked at ParseTree? Of course you have.
    >
    > In this case however, you are in luck. Delimited expressions, that
    > start and end with ( and ), or begin and end, or whatever, are already
    > discovered by my lexer. (During the development of RubyLexer, I
    > discovered that it had to be half-a-parser as well, in order to
    > correctly get all the information that's needed to lex correctly.) The
    > information you want is already being gathered by RubyLexer, it's just
    > not available in a public interface. We should negotiate such an
    > interface since you seem to need it. What you propose, 'get the next
    > expression', is not one I want to do. RubyLexer does not deal in
    > abstractions larger than tokens... at least, not on a public level. I
    > am, however, willing to emit 'advisory' tokens at certain points in the
    > token stream, (several such types of tokens are being emitted already)
    > which should allow you to do what we want, if we design it carefully.


    Hm, I am not sure if that is enough for this case. The condition part of
    an if or something else will after all not always be surrounded by ( and
    ) or begin and end or something similar.

    Advisory tokens (which would tell me that I am now entering the
    condition of if and now leaving it and now entering the action part of
    it and so on) might do this. However, you are right in that this is not
    usually the task of a lexer. In the past I have frequently had trouble
    with the distinction of lexing and parsing in real language parsing --
    most languages require you to keep some context for actually tokenizing
    them. Ruby, for example, requires that your lexer knows about all kinds
    of quoted Strings and where they end and interpolated expressions inside
    them. I'm not sure of where to best draw the line so it's probably
    better to let you decide.

    > On the other hand.... the reason I chose not to emit advisory tokens
    > for this particular case is that the complementary tool to RubyLexer is
    > intended to be Reg, which can find nested pairs of braces and the like
    > pretty easily. Have you looked at Reg at all? I realize that I only
    > released it yesterday, and as of yet it's only half-working because
    > critical features are as yet unimplemented, but I think it might be
    > just the thing for the types of preprocessors you have in mind.


    Heh, I didn't realize that you were also the author of that library so I
    did not draw the connection. I have, however, marked those two threads
    as something I will have to examine. (They are now colored red.)

    I'm watching Reg with growing interest -- I'm not sure if I have already
    told this to you (I remember telling the author of "BNF-like grammar
    specified DIRECTLY in Ruby"), but I have also done something vaguely
    similar -- I have done an object-oriented way of constructing and
    combining Regular Expressions. What you have done is something better.

    I'm especially interested in how the LALR parser, Reg and RubyLexer
    might all work together. Any way of getting some sample code? I'm aware
    of the fact that this is all subject to change as long as you have not
    implemented all the necessary features like look-ahead, but getting a
    quick overview would still be nice.

    > Reg might not be able to easily tell 'if' the postfix operator from
    > 'if' the value in current RubyLexer output. Since one requires an end
    > and the other doesn't, that can be troublesome to deal with. 'do' is
    > also a pain, now that I think of it. All these cases are handled
    > correctly in RubyLexer, we just have to find an appropriate
    > (token-based, not expression-based) interface.


    I would be pretty much okay with the advisory tokens idea -- it sounds
    like meta-tokens that tell me about the context.

    >>For doing code transformations it is of course also important that
    >>you can turn back the stream of tokens into a String easily. I did
    >>this with IRB's lexer by using the .line_no and .pos methods of
    >>tokens, but that was not too good a match, actually.

    >
    > So what would be a good match? I don't see why this should be a
    > problem. My implementation of Token implements to_s, which returns the
    > ruby code corresponding to the token; usually, this is exactly the
    > same as the code that created the token originally. There's also an
    > offset method, which returns the position of the token in the input
    > stream, relative to the very beginning. Tokens don't have a #line_no,
    > but you can get the same information from FileAndLineTokens.


    This does sound good. Having an offset ought to actually be better than
    separate character and line numbers as well.

    >>I did this with IRB's lexer by using the .line_no and .pos
    >>methods of tokens, but that was not too good a match, actually.

    >
    > Wait... so you wrote irb's lexer? One of my wishlist items is to
    > integrate RubyLexer with irb among others.... how hard do you think
    > this will be?


    Nope, not really. I've just used it out of IRB. Integrating it ought to
    be possible, but I'm not sure why that would be necessary.

    > Well, it was unexpected for me. Much to my embarrassment; I thought I
    > was an expert at this. I must say many elements of this got me very
    > confused at first, and obviously I never put all the pieces together.
    > Congratulations.
    >
    > Ps: I haven't figured out why this breaks RubyLexer yet, but I will.


    Good luck. :)

    > Pps: putting tricky stuff in eval strings and the like won't break the
    > lexer (yet). To the lexer, it's just a string.


    Yup, same for IRB.
    Florian Groß, Apr 25, 2005
    #19
  20. vikkous

    Peter Suk Guest

    On Apr 25, 2005, at 8:37 AM, Florian Groß wrote:

    >
    > I'm especially interested in how the LALR parser, Reg and RubyLexer
    > might all work together. Any way of getting some sample code? I'm
    > aware of the fact that this is all subject to change as long as you
    > have not implemented all the necessary features like look-ahead, but
    > getting a quick overview would still be nice.
    >


    I am currently constructing an LALR parser for Ruby using RubyLexer for
    the Alumina-VM project. I suspect that RubyLexer is going to make this
    much cleaner.

    --Peter


    --
    There's neither heaven nor hell, save what we grant ourselves.
    There's neither fairness nor justice, save what we grant each other.
    Peter Suk, Apr 25, 2005
    #20
