perl identifier limits

Discussion in 'Perl Misc' started by Alex Shinn, Feb 6, 2004.

  1. Alex Shinn

    Alex Shinn Guest

    Got quite a surprise today when I encountered an "Identifier too long"
    error message. Nothing in the FAQ, but the BUGS section of "perldoc
    perl" does include:

    While none of the built-in data types have any arbitrary size limits
    (apart from memory size), there are still a few arbitrary limits: a
    given identifier may not be longer than 255 characters

    Not that I'd write such a long identifier, but I've got auto-generated
    code that reaches twice that length. Any ideas apart from applying
    compression algorithms to the id names? Any plans on fixing this?

    --
    Alex
    Alex Shinn, Feb 6, 2004
    #1
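
For readers who want to see the limit first-hand, here is a minimal sketch that compiles declarations with names of increasing length through a string eval. The helper name compile_error_for_length and the lengths tried are illustrative assumptions, not anything from the thread, and the exact cutoff may differ between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to compile a declaration whose variable name has $len characters.
# Returns the compile error text, or an empty string on success.
# (Hypothetical helper, written only for this illustration.)
sub compile_error_for_length {
    my ($len) = @_;
    my $name = 'a' x $len;
    eval "my \$$name = 1; 1";
    return $@ // '';
}

for my $len (100, 300, 600) {
    my $err = compile_error_for_length($len);
    $err =~ s/\s+\z//;    # drop the trailing newline from the error text
    printf "%3d chars: %s\n", $len, $err eq '' ? 'ok' : $err;
}
```

The string eval keeps the failure contained, so the script itself still compiles and can report which lengths were rejected.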

  2. Alex Shinn

    Uri Guttman Guest

    >>>>> "AS" == Alex Shinn <> writes:

    AS> Got quite a surprise today when I encountered an "Identifier too long"
    AS> error message. Nothing in the FAQ, but the BUGS section of "perldoc
    AS> perl" does include:

    AS> While none of the built-in data types have any arbitrary size limits
    AS> (apart from memory size), there are still a few arbitrary limits: a
    AS> given identifier may not be longer than 255 characters

    AS> Not that I'd write such a long identifier, but I've got auto-generated
    AS> code that reaches twice that length. Any ideas apart from applying
    AS> compression algorithms to the id names? Any plans on fixing this?

    fix your code. i can't see any possible reason to generate names that
    long. you would have to come up with some amazing reasons to support
    your claim that you need it.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Feb 6, 2004
    #2

  3. Alex Shinn

    Alex Shinn Guest

    At Fri, 06 Feb 2004 09:36:23 GMT, Uri Guttman wrote:
    >
    > >>>>> "AS" == Alex Shinn <> writes:

    >
    > AS> Not that I'd write such a long identifier, but I've got auto-generated
    > AS> code that reaches twice that length. Any ideas apart from applying
    > AS> compression algorithms to the id names? Any plans on fixing this?
    >
    > fix your code. i can't see any possible reason to generate names that
    > long. you would have to come up with some amazing reasons to support
    > your claim that you need it.


    You obviously don't write Perl with a Lisp mindset. If you
    auto-generate code on the fly it is not always easy to design it such
    that names won't conflict. In my case I'm working with an application
    server which can have a *huge* base of dynamically generated code. A
    potential workaround is to use only hashtables and store anonymous
    subroutines in them, but this is far from an insignificant rewrite and
    loses some flexibility. After googling I find I'm not the only one who
    has had this problem:

    http://www.gossamer-threads.com/arc...fier_Too_Long_Error_with_Long_Pathnames_P480/

    It's also a very silly & trivial bug in Perl, which is acknowledged as a
    known bug. And Python does it right!

    /me ducks and runs

    --
    Alex
    Alex Shinn, Feb 6, 2004
    #3
  4. Alex Shinn <> wrote:

    > At Fri, 06 Feb 2004 09:36:23 GMT, Uri Guttman wrote:
    >>
    >> >>>>> "AS" == Alex Shinn <> writes:

    >>
    >> AS> Not that I'd write such a long identifier, but I've got auto-generated
    >> AS> code that reaches twice that length. Any ideas apart from applying
    >> AS> compression algorithms to the id names? Any plans on fixing this?
    >>
    >> fix your code. i can't see any possible reason to generate names that
    >> long. you would have to come up with some amazing reasons to support
    >> your claim that you need it.

    >
    > You obviously don't write Perl with a Lisp mindset. If you
    > auto-generate code on the fly it is not always easy to design it such
    > that names won't conflict. In my case I'm working with an application
    > server which can have a *huge* base of dynamically generated code. A
    > potential workaround is to use only hashtables and store anonymous
    > subroutines in them,


    Complete shot in the dark: How about using the MD5 or SHA1 hash of the very
    very very very long names you need?

    --
    A. Sinan Unur
    (reverse each component for email address)
    A. Sinan Unur, Feb 6, 2004
    #4
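
A minimal sketch of the digest idea, assuming the core Digest::MD5 module. The helper short_identifier and the MAX_LEN and PREFIX_LEN values are hypothetical choices for illustration. It keeps a readable prefix of the long name and appends the MD5 hex digest of the full name, so the result stays under the limit while remaining partly recognizable. Collisions are very unlikely but not impossible.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Illustrative limits (assumptions): stay well under the 255-character
# ceiling quoted from perldoc perl.
use constant MAX_LEN    => 200;
use constant PREFIX_LEN => 160;

# Return the name unchanged if it is short enough; otherwise keep a
# readable prefix and append the 32-character MD5 hex digest of the
# full name.
sub short_identifier {
    my ($long_name) = @_;
    return $long_name if length($long_name) <= MAX_LEN;
    return substr($long_name, 0, PREFIX_LEN) . '_' . md5_hex($long_name);
}

my $long  = 'handler_for_' . join('_', ('some_generated_path_component') x 20);
my $short = short_identifier($long);
printf "%d chars -> %d chars\n", length($long), length($short);
```

Because the digest is computed from the full name, the same long name always maps to the same short one, so separately generated files agree without sharing any state.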
  5. Alex Shinn

    Rocco Caputo Guest

    On Fri, 06 Feb 2004 18:50:28 +0900, Alex Shinn wrote:
    > At Fri, 06 Feb 2004 09:36:23 GMT, Uri Guttman wrote:
    >>
    >> >>>>> "AS" == Alex Shinn <> writes:

    >>
    >> AS> Not that I'd write such a long identifier, but I've got auto-generated
    >> AS> code that reaches twice that length. Any ideas apart from applying
    >> AS> compression algorithms to the id names? Any plans on fixing this?
    >>
    >> fix your code. i can't see any possible reason to generate names that
    >> long. you would have to come up with some amazing reasons to support
    >> your claim that you need it.

    >
    > You obviously don't write Perl with a Lisp mindset. If you
    > auto-generate code on the fly it is not always easy to design it such
    > that names won't conflict. In my case I'm working with an application
    > server which can have a *huge* base of dynamically generated code. A
    > potential workaround is to use only hashtables and store anonymous
    > subroutines in them, but this is far from an insignificant rewrite and
    > loses some flexibility. After googling I find I'm not the only one who
    > has had this problem:
    >
    > http://www.gossamer-threads.com/arc...fier_Too_Long_Error_with_Long_Pathnames_P480/
    >
    > It's also a very silly & trivial bug in Perl, which is acknowledged as a
    > known bug. And Python does it right!


    But it's a very rare problem to run into. As such, it's not a pressing
    issue for [wild guess] 99% of the people who use Perl. As you feel
    strongly about it, you may want to address the problem yourself and
    submit a patch.

    Or you can do the damsel in distress routine ("OH! HELP! SOMEONE PLEASE
    HELP ME!") until some shining knight patches it for you. For your sake,
    I hope you're cute. :)

    While you're holding your breath, consider rolling your own symbol
    table: A hash of long identifiers mapped to computed short ones. As
    your program writes Perl source, it can translate the too-long symbols
    into the short ones.

    Sure, nobody will understand the generated source code. You probably
    don't want people editing it directly anyway, so the obfuscation acts as
    a deterrent.

    --
    Rocco Caputo - - http://poe.perl.org/
    Rocco Caputo, Feb 6, 2004
    #5
  6. Alex Shinn

    Uri Guttman Guest

    >>>>> "AS" == Alex Shinn <> writes:

    AS> At Fri, 06 Feb 2004 09:36:23 GMT, Uri Guttman wrote:
    >>
    >> >>>>> "AS" == Alex Shinn <> writes:

    >>

    AS> Not that I'd write such a long identifier, but I've got auto-generated
    AS> code that reaches twice that length. Any ideas apart from applying
    AS> compression algorithms to the id names? Any plans on fixing this?
    >>
    >> fix your code. i can't see any possible reason to generate names that
    >> long. you would have to come up with some amazing reasons to support
    >> your claim that you need it.


    AS> You obviously don't write Perl with a Lisp mindset. If you

    hell, i wouldn't do anything with a lisp mindset. i would rather toggle
    in code by binary switches (done it) than have a lisp mindset.

    AS> auto-generate code on the fly it is not always easy to design it such
    AS> that names won't conflict. In my case I'm working with an application
    AS> server which can have a *huge* base of dynamically generated code. A
    AS> potential workaround is to use only hashtables and store anonymous
    AS> subroutines in them, but this is far from an insignificant rewrite and
    AS> looses some flexibility [sic: loses]. After googling I find I'm not the only one who
    AS> has had this problem:

    AS> http://www.gossamer-threads.com/arc...fier_Too_Long_Error_with_Long_Pathnames_P480/

    that seems to be an asp problem as much as a perl one. why a path name
    gets converted to a sub or identifier name is the question.

    but the fact that it is tells me something.

    the symbol table is not meant to be a general purpose hash structure. so
    using it as such (via symrefs) is very dumb. you say you lose
    flexibility by using hashes vs identifiers and that makes even less
    sense than lisp mind. i would have done it with dispatch tables and
    trees of them and had no issues with the names as i stay out of the
    symtable unless i have to. you didn't have to do it but you chose
    (wrongly) to use symbols for that. symbols are usually human written and
    read so a limit of 255 chars is fine. hash keys have no length limit so
    that is better for any auto generated stuff.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Feb 6, 2004
    #6
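
A minimal sketch of the dispatch-table approach described above. The names %dispatch, %vars, $long_a and $long_b are stand-ins invented for this example. The point is only that hash keys can be far longer than an identifier may be.

```perl
use strict;
use warnings;

# Dispatch table: hash keys may be arbitrarily long, unlike identifiers.
my %dispatch;
my %vars;

# Stand-ins for generated names that would be too long as identifiers.
my $long_a = 'module_' . ('a' x 500);
my $long_b = 'module_' . ('b' x 500);

$vars{$long_a} = 20;
$vars{$long_b} = 1;

# Each "sub" is an anonymous code reference stored under its long name.
$dispatch{$long_a} = sub { my ($x) = @_; return $x + $vars{$long_a} };
$dispatch{$long_b} = sub {
    my ($x) = @_;
    return $dispatch{$long_a}->($x + $vars{$long_b}) * 2;
};

print $dispatch{$long_b}->(0), "\n";
```

The trade-off raised in the thread still applies: every call becomes a hash lookup plus a code-reference call, and the generated source is harder to read than plain named subs.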
  7. Alex Shinn

    Alex Shinn Guest

    Rocco Caputo <> wrote in message news:<>...
    >
    > But it's a very rare problem to run into. As such, it's not a pressing
    > issue for [wild guess] 99% of the people who use Perl. As you feel
    > strongly about it, you may want to address the problem yourself and
    > submit a patch.


    I will probably end up doing so. The md5sum is an interesting idea,
    but I don't like even insignificant probabilities of clashes, and at times
    I need to debug the generated code so readable names are a plus.

    > Or you can do the damsel in distress routine ("OH! HELP! SOMEONE PLEASE
    > HELP ME!") until some shining knight patches it for you. For your sake,
    > I hope you're cute. :)


    How about I just say I'm cute and hide behind my gender-neutral
    first name and race-neutral last name? :) Any brave knights out there?

    > While you're holding your breath, consider rolling your own symbol
    > table: A hash of long identifiers mapped to computed short ones. As
    > your program writes Perl source, it can translate the too-long symbols
    > into the short ones.


    The more I think about this the uglier it gets. When you generate code like

    $var1 = expr1;
    $var2 = expr2;

    sub func1 { <some-expr-of-var1> }
    sub func2 { func1(<some-expr-of-var2>) }

    replacing all of those with nested hash-tables gets really convoluted:

    $hash = $globalhash{$modulename};

    $hash{var1} = expr1;
    $hash{var2} = expr2;

    $hash{func1} = sub { <some-expr-of-$hash{var1}> }
    $hash{func2} = sub { &{$hash{func1}}(<some-expr-of-$hash{var2}>) }

    Maybe the above example doesn't look *too* horrible, but the more
    variable references and subroutines you have the more cryptic it is.
    And I do have to debug the generated code sometimes. That plus
    all the places where I have to rewrite the code generators makes
    patching Perl the easiest solution.

    Thanks for your help,
    Alex
    Alex Shinn, Feb 7, 2004
    #7
  8. Alex Shinn

    Alex Shinn Guest

    Uri Guttman <> wrote in message news:<>...
    > >>>>> "AS" == Alex Shinn <> writes:

    >
    > AS> You obviously don't write Perl with a Lisp mindset. If you
    >
    > hell, i wouldn't do anything with a lisp mindset. i would rather toggle
    > in code by binary switches (done it) than have a lisp mindset.


    I wasn't suggesting you do, nor was I suggesting there is anything
    superior about Lisp. It just encourages another style of programming
    called meta-programming. And the nice thing about Perl is TMTOWTDI -
    you can meta-program and write code generators if you want.

    > the symbol table is not meant to be a general purpose hash structure. so
    > using it as such (via symrefs) is very dumb.


    I'm not using it as a hash table, I'm actually writing Perl *code* and so
    the natural solution is to use identifiers. Using hash-tables is a clumsy
    workaround. Regardless, I don't understand your animosity and don't
    appreciate being called dumb. I can only assume you feel threatened by
    something you don't understand and feel the need to put it down.

    --
    Alex
    Alex Shinn, Feb 7, 2004
    #8
  9. Alex Shinn

    Uri Guttman Guest

    >>>>> "AS" == Alex Shinn <> writes:

    AS> The more I think about this the uglier it gets. When you generate
    AS> code like

    AS> $var1 = expr1;
    AS> $var2 = expr2;

    AS> sub func1 { <some-expr-of-var1> }
    AS> sub func2 { func1(<some-expr-of-var2>) }

    AS> replacing all of those with nested hash-tables gets really convoluted:

    AS> $hash = $globalhash{$modulename};

    AS> $hash{var1} = expr1;
    AS> $hash{var2} = expr2;

    AS> $hash{func1} = sub { <some-expr-of-$hash{var1}> }
    AS> $hash{func2} = sub { &{$hash{func1}}(<some-expr-of-$hash{var2}>) }

    this is cleaner code to generate IMO

    $hash{func1} = sub { <some-expr-of-$hash{var1}> }
    $hash{func2} = sub { $hash{func1}->(<some-expr-of-$hash{var2}>) }

    AS> Maybe the above example doesn't look *too* horrible, but the more
    AS> variable references and subroutines you have the more cryptic it
    AS> is. And I do have to debug the generated code sometimes. That
    AS> plus all the places where I have to rewrite the code generators
    AS> makes patching Perl the easiest solution.

    and you could do a global replace on all sub defs and sub calls to use
    the hashes. in fact you could do this as a pass AFTER you generate all
    the code. it would almost be as easy as:

    s/\bsub\s+(\w+)/\$hash{$1} = sub/g ;
    s/(\w+)\(/\$hash{$1}->(/g ;

    the second one will probably need a tighter way to find your sub names
    and not find perl funcs. but i leave that as an exercise to you. (hint:
    use a /e and call a sub. in there check for the existence of the
    generated sub name and only replace if found). (another hint: if all
    your sub names are very long then just look for a minimum size to match)

    see, simple. i will send you a bill. :)

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Feb 7, 2004
    #9
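
One way the /e hint above might be worked out, as a sketch only. It assumes (hypothetically) that the generator emits each named sub on a single line of the form "sub NAME { ... }". The function name rewrite_subs_to_hash and the sample source are invented for this example. Names are recorded while the definitions are rewritten, and only recorded names are rewritten at call sites, so Perl built-ins are left alone.

```perl
use strict;
use warnings;

# Post-pass over generated source. Assumption: one named sub per line.
sub rewrite_subs_to_hash {
    my ($source) = @_;
    my %defined;

    # Pass 1: "sub NAME {...}" becomes "$hash{NAME} = sub {...};" and
    # NAME is remembered as a generated sub.
    $source =~ s/^sub\s+(\w+)\s*(\{.*\})[ \t]*$/$defined{$1} = 1; "\$hash{$1} = sub $2;"/gme;

    # Pass 2: "NAME(" becomes "$hash{NAME}->(" for remembered names only,
    # so built-ins such as length() are not touched.
    $source =~ s/\b(\w+)\(/$defined{$1} ? "\$hash{$1}->(" : "$1("/ge;

    return $source;
}

my $generated = <<'END';
sub func1 { return length($_[0]) + 1 }
sub func2 { return func1($_[0] . "xx") * 2 }
END

print rewrite_subs_to_hash($generated);
```

Multi-line sub bodies would need real brace matching rather than a line-based regex, so this is only a starting point for generators with very regular output.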
  10. Alex Shinn

    Uri Guttman Guest

    >>>>> "AS" == Alex Shinn <> writes:

    AS> Uri Guttman <> wrote in message news:<>...

    AS> You obviously don't write Perl with a Lisp mindset. If you
    >>
    >> hell, i wouldn't do anything with a lisp mindset. i would rather toggle
    >> in code by binary switches (done it) than have a lisp mindset.


    AS> I wasn't suggesting you do, nor was I suggesting there is anything
    AS> superior about Lisp. It just encourages another style of programming
    AS> called meta-programming. And the nice thing about Perl is TMTOWTDI -
    AS> you can meta-program and write code generators if you want.

    well, you brought up lisp mindset. them's fighting words! :)

    and i have generated code in several projects so i understand the
    issues.

    >> the symbol table is not meant to be a general purpose hash structure. so
    >> using it as such (via symrefs) is very dumb.


    AS> I'm not using it as a hash table, I'm actually writing Perl *code*
    AS> and so the natural solution is to use identifiers. Using
    AS> hash-tables is a clumsy workaround. Regardless, I don't
    AS> understand your animosity and don't appreciate being called dumb.
    AS> I can only assume you feel threatened by something you don't
    AS> understand and feel the need to put it down.

    but you are using it as a hash table in that you are creating names in
    it. true they are simple (if long) identifiers but they are just
    entries. the symtable has this max id restriction so you have to convert
    to a regular hash table. my point was that assuming the symtable is a
    normal hash with infinite length keys was wrong. and i have railed
    against symrefs (which you aren't using) plenty of times so it carried
    over here.

    but see my other post just now for a solution that should work and be
    very easy to do.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Feb 7, 2004
    #10
  11. Alex Shinn

    Rocco Caputo Guest

    On 7 Feb 2004 01:39:19 -0800, Alex Shinn wrote:
    > Rocco Caputo <> wrote in message news:<>...
    >> While you're holding your breath, consider rolling your own symbol
    >> table: A hash of long identifiers mapped to computed short ones. As
    >> your program writes Perl source, it can translate the too-long symbols
    >> into the short ones.

    >
    > The more I think about this the uglier it gets. When you generate code like
    >
    > $var1 = expr1;
    > $var2 = expr2;
    >
    > sub func1 { <some-expr-of-var1> }
    > sub func2 { func1(<some-expr-of-var2>) }
    >
    > replacing all of those with nested hash-tables gets really convoluted:
    >
    > $hash = $globalhash{$modulename};
    >
    > $hash{var1} = expr1;
    > $hash{var2} = expr2;
    >
    > $hash{func1} = sub { <some-expr-of-$hash{var1}> }
    > $hash{func2} = sub { &{$hash{func1}}(<some-expr-of-$hash{var2}>) }


    This is not what I intended. The %symbol_table hash would be kept in
    your code generator. You would translate your long names to shorter
    ones at output time.

    #!/usr/bin/perl
    # This is the CODE GENERATOR, not the generated code!

    my %symbol_table;
    my $symbol = "symAAAAAA";

    ... la la la ...;

    $symbol_table{$long_version} = $symbol++;

    ... la la la ...;

    print "\$$symbol_table{$long_version} = $expression;\n";

    ... la la la ...;

    print "sub $symbol_table{$long_version} { $body }\n";

    ... la la la ...;

    So the generated source is full of symAAAAAA, symAAAAAB, or something
    you like better. It's not as meaningful as your interpretation, but it's
    less ugly and certainly faster at runtime.

    > Maybe the above example doesn't look *too* horrible, but the more
    > variable references and subroutines you have the more cryptic it is.
    > And I do have to debug the generated code sometimes. That plus
    > all the places where I have to rewrite the code generators makes
    > patching Perl the easiest solution.


    Debugging generated code sucks. At least generate well commented code
    if you can't avoid it.

    It says something bad about your program if patching Perl is easier than
    maintaining it.

    --
    Rocco Caputo - - http://poe.perl.org/
    Rocco Caputo, Feb 7, 2004
    #11
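
A runnable sketch of the generator-side symbol table outlined above. The helpers short_name and emit_sub, and the sample long names, are assumptions made for illustration. Short names are allocated with Perl's string increment, and each emitted sub carries its original long name in a comment so the generated source stays debuggable.

```perl
#!/usr/bin/perl
# This is a CODE GENERATOR sketch, not the generated code.
use strict;
use warnings;

my %symbol_table;                  # long name => short generated name
my $next_symbol = "symAAAAAA";     # string increment gives symAAAAAB, ...

# Return the short name for a long one, allocating it on first use.
sub short_name {
    my ($long_version) = @_;
    $symbol_table{$long_version} //= $next_symbol++;
    return $symbol_table{$long_version};
}

# Emit a sub definition, with the long name kept in a leading comment.
sub emit_sub {
    my ($long_version, $body) = @_;
    my $short = short_name($long_version);
    return "# $long_version\nsub $short { $body }\n";
}

# Hypothetical names, too long to be used as identifiers directly.
my $long_a = 'compute_' . ('x' x 300);
my $long_b = 'report_'  . ('y' x 300);

my $source = emit_sub($long_a, 'return $_[0] * 2');
$source   .= emit_sub($long_b, 'return ' . short_name($long_a) . '(21)');
print $source;
```

Unlike the digest approach, the mapping here lives only in the generator's memory, so it would need to be saved somewhere if separately generated files must refer to each other's names.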
  12. Alex Shinn

    Alex Shinn Guest

    Rocco Caputo <> wrote in message news:<>...
    > On 7 Feb 2004 01:39:19 -0800, Alex Shinn wrote:
    > > Rocco Caputo <> wrote in message news:<>...

    >
    > So the generated source is full of symAAAAAA, symAAAAAB, or something
    > you like better. It's not as meaningful as your interpretation, but it's
    > less ugly and certainly faster at runtime.


    OK, but I want to try to keep meaningful names if at all possible. You're right
    though, the hash tables would be an unacceptable performance loss.

    > Debugging generated code sucks. At least generate well commented code
    > if you can't avoid it.


    The generated code is already well commented and has special hooks built-in
    when I need to debug it. It's well designed, so that when I need to debug,
    it is quite easy.

    > It says something bad about your program if patching Perl is easier than
    > maintaining it.


    Don't be a jerk. You know nothing about my code. The Perl patch, however,
    should be relatively trivial, since Perl itself has no problems reading/hashing
    arbitrary length strings.

    And it's a BUG!!! Apparently Perl hackers are so insecure about their language
    that you can't ask for help with a workaround for a Perl bug without them
    trying to convince you that YOU are the one in the wrong. Nevermind, I'll
    figure it out on my own and never write to this group again.

    --
    Alex
    Alex Shinn, Feb 8, 2004
    #12
  13. Alex Shinn

    Rocco Caputo Guest

    On 7 Feb 2004 22:31:30 -0800, Alex Shinn wrote:
    > Rocco Caputo <> wrote in message news:<>...
    >> It says something bad about your program if patching Perl is easier than
    >> maintaining it.

    >
    > Don't be a jerk. You know nothing about my code. The Perl patch, however,
    > should be relatively trivial, since Perl itself has no problems reading/hashing
    > arbitrary length strings.


    Mr. Cranky needs a nap before he starts yelling.

    I know nothing about your program's source, but I've looked at Perl's.
    Where I sit, it still says something bad about your program if it's
    harder to maintain than Perl.

    > And it's a BUG!!! Apparently Perl hackers are so insecure about their language
    > that you can't ask for help with a workaround for a Perl bug without them
    > trying to convince you that YOU are the one in the wrong. Nevermind, I'll
    > figure it out on my own and never write to this group again.


    There's no need to yell. Of course it's a bug. From perldoc perldiag:

    Identifier too long
    (F) Perl limits identifiers (names for variables, functions, etc.)
    to about 250 characters for simple names, and somewhat more for
    compound names (like $A::B). You've exceeded Perl's limits.
    Future versions of Perl are likely to eliminate these arbitrary
    limitations.

    So fix it already. The code you're looking for is in toke.c, wherever
    you find the symbol ident_too_long.

    I, for one, look forward to seeing your patch on the perl5-porters
    mailing list.

    --
    Rocco Caputo - - http://poe.perl.org/
    Rocco Caputo, Feb 8, 2004
    #13