Hi,
I know that PLY lex is able to do line counting. I am wondering if there
is a way to count the number of each keyword (token) in a given file?
For example, how many IF tokens, etc.?
 >>> import tokenize
 >>> from StringIO import StringIO
 >>> src = StringIO("""
 ... if a: foo()
 ... elif b: bar()
 ... if c: baz()
 ... """)
>>> sum([1 for t in tokenize.generate_tokens(src.readline) if t[1]=='if'])
2
That generates an intermediate list with a 1 for each 'if', but it's not a big
price to pay IMO.
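In Python 3 the same count can be done without the intermediate list, using a
generator expression and io.StringIO (the tuples also carry named attributes
like t.string, so no index arithmetic is needed); a minimal sketch:

```python
import io
import tokenize

src = io.StringIO("""\
if a: foo()
elif b: bar()
if c: baz()
""")

# Keywords come through the tokenizer as NAME tokens, so counting
# 'if' is just a matter of matching the token text.
count = sum(1 for t in tokenize.generate_tokens(src.readline)
            if t.string == 'if')
print(count)  # 2
```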
If you have a file in the current working directory, e.g., foo.py, substitute
 src = file('foo.py')
or do it in one line, like (untested):
sum([1 for t in tokenize.generate_tokens(file('foo.py').readline) if t[1]=='if'])
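In Python 3, file() is gone; a self-contained equivalent writes the sample file
first and reads it back with tokenize.open(), which honors any encoding
declaration in the file (foo.py is just the example name from above):

```python
import tokenize

# Create a small sample file so the example runs anywhere.
with open('foo.py', 'w') as f:
    f.write("if a: foo()\nelif b: bar()\nif c: baz()\n")

# tokenize.open() returns a text-mode file opened with the file's
# declared (or default utf-8) source encoding.
with tokenize.open('foo.py') as f:
    n = sum(1 for t in tokenize.generate_tokens(f.readline)
            if t.string == 'if')
print(n)  # 2
```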
generate_tokens returns a generator that yields 5-tuples of
(type, string, start, end, line), e.g. for the source above:

Rewind src:
 >>> src.seek(0)
Get the generator:
 >>> tg = tokenize.generate_tokens(src.readline)
Manually get a couple of examples:
 >>> tg.next()
 (53, '\n', (1, 0), (1, 1), '\n')
 >>> tg.next()
 (1, 'if', (2, 0), (2, 2), 'if a: foo()\n')
Rewind the StringIO object to start again:
 >>> src.seek(0)
Show all the token tuples:
 >>> for t in tokenize.generate_tokens(src.readline): print t
 ...
(53, '\n', (1, 0), (1, 1), '\n')
(1, 'if', (2, 0), (2, 2), 'if a: foo()\n')
(1, 'a', (2, 3), (2, 4), 'if a: foo()\n')
(50, ':', (2, 4), (2, 5), 'if a: foo()\n')
(1, 'foo', (2, 6), (2, 9), 'if a: foo()\n')
(50, '(', (2, 9), (2, 10), 'if a: foo()\n')
(50, ')', (2, 10), (2, 11), 'if a: foo()\n')
(4, '\n', (2, 11), (2, 12), 'if a: foo()\n')
(1, 'elif', (3, 0), (3, 4), 'elif b: bar()\n')
(1, 'b', (3, 5), (3, 6), 'elif b: bar()\n')
(50, ':', (3, 6), (3, 7), 'elif b: bar()\n')
(1, 'bar', (3, 8), (3, 11), 'elif b: bar()\n')
(50, '(', (3, 11), (3, 12), 'elif b: bar()\n')
(50, ')', (3, 12), (3, 13), 'elif b: bar()\n')
(4, '\n', (3, 13), (3, 14), 'elif b: bar()\n')
(1, 'if', (4, 0), (4, 2), 'if c: baz()\n')
(1, 'c', (4, 3), (4, 4), 'if c: baz()\n')
(50, ':', (4, 4), (4, 5), 'if c: baz()\n')
(1, 'baz', (4, 6), (4, 9), 'if c: baz()\n')
(50, '(', (4, 9), (4, 10), 'if c: baz()\n')
(50, ')', (4, 10), (4, 11), 'if c: baz()\n')
(4, '\n', (4, 11), (4, 12), 'if c: baz()\n')
(0, '', (5, 0), (5, 0), '')
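The original question asked for a count of each keyword, not just 'if';
filtering the token text through keyword.iskeyword and feeding it to
collections.Counter tallies them all in one pass. A Python 3 sketch:

```python
import io
import keyword
import tokenize
from collections import Counter

src = io.StringIO("""\
if a: foo()
elif b: bar()
if c: baz()
""")

# One pass over the token stream: keep only the tokens whose text
# is a Python keyword, and tally each one.
counts = Counter(t.string
                 for t in tokenize.generate_tokens(src.readline)
                 if keyword.iskeyword(t.string))
print(counts)  # Counter({'if': 2, 'elif': 1})
```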
HTH
Regards,
Bengt Richter