perl identifier limits

Alex Shinn

Got quite a surprise today when I encountered an "Identifier too long"
error message. Nothing in the FAQ, but the BUGS section of "perldoc
perl" does include:

While none of the built-in data types have any arbitrary size limits
(apart from memory size), there are still a few arbitrary limits: a
given identifier may not be longer than 255 characters

Not that I'd write such a long identifier, but I've got auto-generated
code that reaches twice that length. Any ideas apart from applying
compression algorithms to the id names? Any plans on fixing this?
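
For readers who want to see the limit for themselves, it can be probed with a string eval. This is only a minimal sketch: the helper name `compiles_ok` and the lengths 200 and 300 are illustrative choices, not from the original post, and the exact cutoff may differ between Perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if a package variable with the
# given name compiles, false otherwise (the error is left in $@).
sub compiles_ok {
    my ($name) = @_;
    return eval "our \$$name = 1; 1" ? 1 : 0;
}

my $short = "a" x 200;   # under the documented limit
my $long  = "a" x 300;   # over the documented limit

print "200 chars: ", (compiles_ok($short) ? "ok" : "failed: $@"), "\n";
print "300 chars: ", (compiles_ok($long)  ? "ok" : "failed: $@"), "\n";
```

The failure is raised at compile time by the tokenizer, which is why it has to be trapped with a string eval rather than a block eval.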
 
Uri Guttman

AS> Got quite a surprise today when I encountered an "Identifier too long"
AS> error message. Nothing in the FAQ, but the BUGS section of "perldoc
AS> perl" does include:

AS> While none of the built-in data types have any arbitrary size limits
AS> (apart from memory size), there are still a few arbitrary limits: a
AS> given identifier may not be longer than 255 characters

AS> Not that I'd write such a long identifier, but I've got auto-generated
AS> code that reaches twice that length. Any ideas apart from applying
AS> compression algorithms to the id names? Any plans on fixing this?

fix your code. i can't see any possible reason to generate names that
long. you would have to come up with some amazing reasons to support
your claim that you need it.

uri
 
Alex Shinn

AS> Not that I'd write such a long identifier, but I've got auto-generated
AS> code that reaches twice that length. Any ideas apart from applying
AS> compression algorithms to the id names? Any plans on fixing this?

Uri Guttman said:
fix your code. i can't see any possible reason to generate names that
long. you would have to come up with some amazing reasons to support
your claim that you need it.

You obviously don't write Perl with a Lisp mindset. If you
auto-generate code on the fly it is not always easy to design it such
that names won't conflict. In my case I'm working with an application
server which can have a *huge* base of dynamically generated code. A
potential workaround is to use only hashtables and store anonymous
subroutines in them, but this is far from an insignificant rewrite and
loses some flexibility. After googling I find I'm not the only one who
has had this problem:

http://www.gossamer-threads.com/arc...fier_Too_Long_Error_with_Long_Pathnames_P480/

It's also a very silly & trivial bug in Perl, which is acknowledged as a
known bug. And Python does it right!

/me ducks and runs
 
A. Sinan Unur

Alex Shinn said:
You obviously don't write Perl with a Lisp mindset. If you
auto-generate code on the fly it is not always easy to design it such
that names won't conflict. In my case I'm working with an application
server which can have a *huge* base of dynamically generated code. A
potential workaround is to use only hashtables and store anonymous
subroutines in them,

Complete shot in the dark: How about using the MD5 or SHA1 hash of the very
very very very long names you need?
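
A rough sketch of the digest idea, using the core Digest::MD5 module. The helper name `short_name` and the `id_` prefix are assumptions made for illustration; the prefix is there because a hex digest can start with a digit, which a Perl identifier cannot.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map an arbitrarily long name to a fixed-length
# identifier. The "id_" prefix keeps the result a valid Perl identifier
# even when the hex digest starts with a digit.
sub short_name {
    my ($long_name) = @_;
    return "id_" . md5_hex($long_name);
}

my $long_name = "some_generated_name_" x 30;   # 600 characters
my $short     = short_name($long_name);
print length($long_name), " -> ", length($short), " chars: $short\n";
```

The result is not human-readable, so a generator using this would probably want to emit the original long name in a comment next to each definition.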
 
Rocco Caputo

Alex Shinn said:
You obviously don't write Perl with a Lisp mindset. If you
auto-generate code on the fly it is not always easy to design it such
that names won't conflict. In my case I'm working with an application
server which can have a *huge* base of dynamically generated code. A
potential workaround is to use only hashtables and store anonymous
subroutines in them, but this is far from an insignificant rewrite and
loses some flexibility. After googling I find I'm not the only one who
has had this problem:

http://www.gossamer-threads.com/arc...fier_Too_Long_Error_with_Long_Pathnames_P480/

It's also a very silly & trivial bug in Perl, which is acknowledged as a
known bug. And Python does it right!

But it's a very rare problem to run into. As such, it's not a pressing
issue for [wild guess] 99% of the people who use Perl. As you feel
strongly about it, you may want to address the problem yourself and
submit a patch.

Or you can do the damsel in distress routine ("OH! HELP! SOMEONE PLEASE
HELP ME!") until some shining knight patches it for you. For your sake,
I hope you're cute. :)

While you're holding your breath, consider rolling your own symbol
table: A hash of long identifiers mapped to computed short ones. As
your program writes Perl source, it can translate the too-long symbols
into the short ones.

Sure, nobody will understand the generated source code. You probably
don't want people editing it directly anyway, so the obfuscation acts as
a deterrent.
 
Uri Guttman

AS> Not that I'd write such a long identifier, but I've got auto-generated
AS> code that reaches twice that length. Any ideas apart from applying
AS> compression algorithms to the id names? Any plans on fixing this?
AS> You obviously don't write Perl with a Lisp mindset. If you

hell, i wouldn't do anything with a lisp mindset. i would rather toggle
in code by binary switches (done it) than have a lisp mindset.

AS> auto-generate code on the fly it is not always easy to design it such
AS> that names won't conflict. In my case I'm working with an application
AS> server which can have a *huge* base of dynamically generated code. A
AS> potential workaround is to use only hashtables and store anonymous
AS> subroutines in them, but this is far from an insignificant rewrite and
AS> loses some flexibility. After googling I find I'm not the only one who
AS> has had this problem:

AS> http://www.gossamer-threads.com/arc...fier_Too_Long_Error_with_Long_Pathnames_P480/

that seems to be an asp problem as much as a perl one. why a path name
gets converted to a sub or identifier name is the question.

but the fact that it is tells me something.

the symbol table is not meant to be a general purpose hash structure. so
using it as such (via symrefs) is very dumb. you say you lose
flexibility by using hashes vs identifiers and that makes even less
sense than lisp mind. i would have done it with dispatch tables and
trees of them and had no issues with the names as i stay out of the
symtable unless i have to. you didn't have to do it but you chose
(wrongly) to use symbols for that. symbols are usually human written and
read so a limit of 255 chars is fine. hash keys have no length limit so
that is better for any auto generated stuff.
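
To make the dispatch-table idea concrete, here is a minimal sketch. The key names, the sub bodies, and the helper `call_generated` are all made up for illustration; the point is only that a hash key can be far longer than an identifier.

```perl
use strict;
use warnings;

# Hash keys have no length limit, so a generated name of any length
# can be used as a key. The names and sub bodies here are illustrative.
my $long_name = "handler_for_" . ("deeply_nested_path_" x 30);

my %dispatch = (
    $long_name => sub { my ($n) = @_; return $n * 2 },
    short      => sub { my ($n) = @_; return $n + 1 },
);

# Hypothetical helper: look up a generated sub by name and call it.
sub call_generated {
    my ($name, @args) = @_;
    my $code = $dispatch{$name}
        or die "no generated sub named '$name'\n";
    return $code->(@args);
}

print call_generated($long_name, 21), "\n";
```

Nesting such tables per module gives the "trees of them" mentioned above, at the cost of a hash lookup on every call.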

uri
 
Alex Shinn

Rocco Caputo said:
But it's a very rare problem to run into. As such, it's not a pressing
issue for [wild guess] 99% of the people who use Perl. As you feel
strongly about it, you may want to address the problem yourself and
submit a patch.

I will probably end up doing so. The md5sum is an interesting idea,
but I don't like even insignificant probabilities of clashes, and at times
I need to debug the generated code so readable names are a plus.

Rocco Caputo said:
Or you can do the damsel in distress routine ("OH! HELP! SOMEONE PLEASE
HELP ME!") until some shining knight patches it for you. For your sake,
I hope you're cute. :)

How about I just say I'm cute and hide behind my gender-neutral
first name and race-neutral last name? :) Any brave knights out there?

Rocco Caputo said:
While you're holding your breath, consider rolling your own symbol
table: A hash of long identifiers mapped to computed short ones. As
your program writes Perl source, it can translate the too-long symbols
into the short ones.

The more I think about this the uglier it gets. When you generate code like

$var1 = expr1;
$var2 = expr2;

sub func1 { <some-expr-of-var1> }
sub func2 { func1(<some-expr-of-var2>) }

replacing all of those with nested hash-tables gets really convoluted:

$hash = $globalhash{$modulename};

$hash{var1} = expr1;
$hash{var2} = expr2;

$hash{func1} = sub { <some-expr-of-$hash{var1}> };
$hash{func2} = sub { &{$hash{func1}}(<some-expr-of-$hash{var2}>) };

Maybe the above example doesn't look *too* horrible, but the more
variable references and subroutines you have the more cryptic it is.
And I do have to debug the generated code sometimes. That plus
all the places where I have to rewrite the code generators makes
patching Perl the easiest solution.

Thanks for your help,
Alex
 
Alex Shinn

Uri Guttman said:
AS> You obviously don't write Perl with a Lisp mindset. If you

hell, i wouldn't do anything with a lisp mindset. i would rather toggle
in code by binary switches (done it) than have a lisp mindset.

I wasn't suggesting you do, nor was I suggesting there is anything
superior about Lisp. It just encourages another style of programming
called meta-programming. And the nice thing about Perl is TMTOWTDI -
you can meta-program and write code generators if you want.

Uri Guttman said:
the symbol table is not meant to be a general purpose hash structure. so
using it as such (via symrefs) is very dumb.

I'm not using it as a hash table, I'm actually writing Perl *code* and so
the natural solution is to use identifiers. Using hash-tables is a clumsy
workaround. Regardless, I don't understand your animosity and don't
appreciate being called dumb. I can only assume you feel threatened by
something you don't understand and feel the need to put it down.
 
Uri Guttman

AS> The more I think about this the uglier it gets. When you generate
AS> code like

AS> $var1 = expr1;
AS> $var2 = expr2;

AS> sub func1 { <some-expr-of-var1> }
AS> sub func2 { func1(<some-expr-of-var2>) }

AS> replacing all of those with nested hash-tables gets really convoluted:

AS> $hash = $globalhash{$modulename};

AS> $hash{var1} = expr1;
AS> $hash{var2} = expr2;

AS> $hash{func1} = sub { <some-expr-of-$hash{var1}> }
AS> $hash{func2} = sub { &{$hash{func1}}(<some-expr-of-$hash{var2}>) }

this is cleaner code to generate IMO

$hash{func1} = sub { <some-expr-of-$hash{var1}> };
$hash{func2} = sub { $hash{func1}->(<some-expr-of-$hash{var2}>) };

AS> Maybe the above example doesn't look *too* horrible, but the more
AS> variable references and subroutines you have the more cryptic it
AS> is. And I do have to debug the generated code sometimes. That
AS> plus all the places where I have to rewrite the code generators
AS> makes patching Perl the easiest solution.

and you could do a global replace on all sub defs and sub calls to use
the hashes. in fact you could do this as a pass AFTER you generate all
the code. it would almost be as easy as:

s/\bsub\s+(\w+)/\$hash{$1} = sub/g ;
s/(\w+)\(/\$hash{$1}->(/g ;

the second one will probably need a tighter way to find your sub names
and not find perl funcs. but i leave that as an exercise to you. (hint:
use a /e and call a sub. in there check for the existence of the
generated sub name and only replace if found). (another hint: if all
your sub names are very long then just look for a minimum size to match)
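
One way the first hint might look in practice is sketched below. The helper name `rewrite_subs`, the `%generated` lookup, and the sample input are assumptions for illustration, not code from the thread, and regex-based rewriting of Perl source is fragile in general.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite named subs and calls to them into
# entries of a %hash dispatch table. Only names defined in the
# generated source itself are touched, so Perl built-ins such as
# length(...) are left alone.
sub rewrite_subs {
    my ($source) = @_;

    # Pass 1: collect the names of all generated subs.
    my %generated = map { $_ => 1 } $source =~ /\bsub\s+(\w+)/g;

    # Pass 2: turn each definition into a hash assignment.
    $source =~ s/\bsub\s+(\w+)/\$hash{$1} = sub/g;

    # Pass 3: rewrite calls, using /e to check each candidate name.
    $source =~ s/(?<![\$\w])(\w+)\(/
        $generated{$1} ? "\$hash{$1}->(" : "$1("
    /gex;

    return $source;
}

my $code = <<'END';
sub func1 { length($_[0]) }
sub func2 { func1($_[0]) . "!" }
END

print rewrite_subs($code);
```

Note that the rewritten definitions still lack the trailing semicolon that an anonymous sub assignment needs, and names inside strings or comments would also be rewritten, so a real pass would need more care than this.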

see, simple. i will send you a bill. :)

uri
 
Uri Guttman

AS> You obviously don't write Perl with a Lisp mindset. If you
AS> I wasn't suggesting you do, nor was I suggesting there is anything
AS> superior about Lisp. It just encourages another style of programming
AS> called meta-programming. And the nice thing about Perl is TMTOWTDI -
AS> you can meta-program and write code generators if you want.

well, you brought up lisp mindset. them's fighting words! :)

and i have generated code in several projects so i understand the
issues.

AS> I'm not using it as a hash table, I'm actually writing Perl *code*
AS> and so the natural solution is to use identifiers. Using
AS> hash-tables is a clumsy workaround. Regardless, I don't
AS> understand your animosity and don't appreciate being called dumb.
AS> I can only assume you feel threatened by something you don't
AS> understand and feel the need to put it down.

but you are using it as a hash table in that you are creating names in
it. true they are simple (if long identifiers) but they are just
entries. the symtable has this max id restriction so you have to convert
to a regular hash table. my point was that assuming the symtable is a
normal hash with infinite length keys was wrong. and i have railed
against symrefs (which you aren't using) plenty of times so it carried
over here.

but see my other post just now for a solution that should work and be
very easy to do.

uri
 
Rocco Caputo

Alex Shinn said:
The more I think about this the uglier it gets. When you generate code like

$var1 = expr1;
$var2 = expr2;

sub func1 { <some-expr-of-var1> }
sub func2 { func1(<some-expr-of-var2>) }

replacing all of those with nested hash-tables gets really convoluted:

$hash = $globalhash{$modulename};

$hash{var1} = expr1;
$hash{var2} = expr2;

$hash{func1} = sub { <some-expr-of-$hash{var1}> }
$hash{func2} = sub { &{$hash{func1}}(<some-expr-of-$hash{var2}>) }

This is not what I intended. The %symbol_table hash would be kept in
your code generator. You would translate your long names to shorter
ones at output time.

#!/usr/bin/perl
# This is the CODE GENERATOR, not the generated code!

my %symbol_table;
my $symbol = "symAAAAAA";

# ... la la la ...

# String increment yields symAAAAAA, symAAAAAB, and so on.
$symbol_table{$long_version} = $symbol++;

# ... la la la ...

# Emit the sigil and the trailing semicolon in the generated code.
print "\$$symbol_table{$long_version} = $expression;\n";

# ... la la la ...

print "sub $symbol_table{$long_version} { $body }\n";

# ... la la la ...

So the generated source is full of symAAAAAA, symAAAAAB, or something
you like better. It's not as meaningful as your interpretation, but it's
less ugly and certainly faster at runtime.

Alex Shinn said:
Maybe the above example doesn't look *too* horrible, but the more
variable references and subroutines you have the more cryptic it is.
And I do have to debug the generated code sometimes. That plus
all the places where I have to rewrite the code generators makes
patching Perl the easiest solution.

Debugging generated code sucks. At least generate well commented code
if you can't avoid it.

It says something bad about your program if patching Perl is easier than
maintaining it.
 
Alex Shinn

Rocco Caputo said:
So the generated source is full of symAAAAAA, symAAAAAB, or something
you like better. It's not as meaningful as your interpretation, but it's
less ugly and certainly faster at runtime.

OK, but I want to try to keep meaningful names if at all possible. You're right
though, the hash tables would be an unacceptable performance loss.

Rocco Caputo said:
Debugging generated code sucks. At least generate well commented code
if you can't avoid it.

The generated code is already well commented and has special hooks built-in
when I need to debug it. It's well designed, so debugging it is quite easy.

Rocco Caputo said:
It says something bad about your program if patching Perl is easier than
maintaining it.

Don't be a jerk. You know nothing about my code. The Perl patch, however,
should be relatively trivial, since Perl itself has no problems reading/hashing
arbitrary length strings.

And it's a BUG!!! Apparently Perl hackers are so insecure about their language
that you can't ask for help with a workaround for a Perl bug without them
trying to convince you that YOU are the one in the wrong. Nevermind, I'll
figure it out on my own and never write to this group again.
 
Rocco Caputo

Alex Shinn said:
Don't be a jerk. You know nothing about my code. The Perl patch, however,
should be relatively trivial, since Perl itself has no problems reading/hashing
arbitrary length strings.

Mr. Cranky needs a nap before he starts yelling.

I know nothing about your program's source, but I've looked at Perl's.
Where I sit, it still says something bad about your program if it's
harder to maintain than Perl.

Alex Shinn said:
And it's a BUG!!! Apparently Perl hackers are so insecure about their language
that you can't ask for help with a workaround for a Perl bug without them
trying to convince you that YOU are the one in the wrong. Nevermind, I'll
figure it out on my own and never write to this group again.

There's no need to yell. Of course it's a bug. From perldoc perldiag:

Identifier too long
(F) Perl limits identifiers (names for variables, functions, etc.)
to about 250 characters for simple names, and somewhat more for
compound names (like $A::B). You've exceeded Perl's limits.
Future versions of Perl are likely to eliminate these arbitrary
limitations.

So fix it already. The code you're looking for is in toke.c, wherever
you find the symbol ident_too_long.

I, for one, look forward to seeing your patch on the perl5-porters
mailing list.
 
