Is it possible to get string from function?

R

Roy Smith

I realize the subject line is kind of meaningless, so let me explain :)

I've got some unit tests that look like:

class Foo(TestCase):
def test_t1(self):
RECEIPT = "some string"

def test_t2(self):
RECEIPT = "some other string"

def test_t3(self):
RECEIPT = "yet a third string"

and so on. It's important that the strings be mutually unique. In the
example above, it's trivial to look at them and observe that they're all
different, but in real life, the strings are about 2500 characters long,
hex-encoded. It even turns out that a couple of the strings are
identical in the first 1000 or so characters, so it's not trivial to do
by visual inspection.

So, I figured I would write a meta-test, which used introspection to
find all the methods in the class, extract the strings from them (they
are all assigned to a variable named RECEIPT), and check to make sure
they're all different.

Is it possible to do that? It is straight-forward using the inspect
module to discover the methods, but I don't see any way to find what
strings are assigned to a variable with a given name. Of course, that
assignment doesn't even happen until the function is executed, so
perhaps what I want just isn't possible?

It turns out, I solved the problem with more mundane tools:

grep 'RECEIPT = ' test.py | sort | uniq -c

and I could have also solved the problem by putting all the strings in a
dict and having the functions pull them out of there. But, I'm still
interested in exploring if there is any way to do this with
introspection, as an academic exercise.
 
R

Roy Smith

Ben Finney said:
That looks like a poorly defined class.

Are the test cases pretty much identical other than the data in those
strings?

No, each test is quite different. The only thing they have in common is
they all involve a string representation of a transaction receipt. I
elided the actual test code in my example above because it wasn't
relevant to my question.
 
C

Chris Angelico

So, I figured I would write a meta-test, which used introspection to
find all the methods in the class, extract the strings from them (they
are all assigned to a variable named RECEIPT), and check to make sure
they're all different.

In theory, it should be. You can disassemble the function and find the
assignment. Check out Lib/dis.py - or just call it and process its
output. Names of local variables are found in
test_t1.__code__.co_names, the constants themselves are in
test_1.__code__.co_consts, and then it's just a matter of matching up
which constant got assigned to the slot represented by the name
RECEIPT.

But you might be able to shortcut it enormously. You say the strings
are "about 2500 characters long, hex-encoded". What are the chances of
having another constant, somewhere in the test function, that also
happens to be roughly that long and hex-encoded? If the answer is
"practically zero", then skip the code, skip co_names, and just look
through co_consts.



class TestCase:
pass # not running this in the full environment

class Foo(TestCase):
def test_t1(self):
RECEIPT = "some string"

def test_t2(self):
RECEIPT = "some other string"

def test_t3(self):
RECEIPT = "yet a third string"

def test_oops(self):
RECEIPT = "some other string"

unique = {}
for funcname in dir(Foo):
if funcname.startswith("test_"):
for const in getattr(Foo,funcname).__code__.co_consts:
if isinstance(const, str) and const.endswith("string"):
if const in unique:
print("Collision!", unique[const], "and", funcname)
unique[const] = funcname



This depends on your RECEIPT strings ending with the word "string" -
change the .endswith() check to be whatever it takes to distinguish
your critical constants from everything else you might have. Maybe:

CHARSET = set("0123456789ABCDEF") # or use lower-case letters, or
both, according to your hex encoding

if isinstance(const, str) and len(const)>2048 and set(const)<=CHARSET:

Anything over 2KB with no characters outside of that set is highly
likely to be what you want. Of course, this whole theory goes out the
window if your test functions can reference another test's RECEIPT;
though if you can guarantee that this is the *first* such literal (if
RECEIPT="..." is the first thing the function does), then you could
just add a 'break' after the unique[const]=funcname assignment and
it'll check only the first - co_consts is ordered.

An interesting little problem!

ChrisA
 
R

Roy Smith

Chris Angelico said:
So, I figured I would write a meta-test, which used introspection to
find all the methods in the class, extract the strings from them (they
are all assigned to a variable named RECEIPT), and check to make sure
they're all different.
[...]
But you might be able to shortcut it enormously. You say the strings
are "about 2500 characters long, hex-encoded". What are the chances of
having another constant, somewhere in the test function, that also
happens to be roughly that long and hex-encoded?

The chances are exactly zero.
If the answer is "practically zero", then skip the code, skip
co_names, and just look through co_consts.

That sounds like it should work, thanks!
Of course, this whole theory goes out the
window if your test functions can reference another test's RECEIPT;

No, they don't do that.
 
S

Steven D'Aprano

I've got some unit tests that look like:

class Foo(TestCase):
def test_t1(self):
RECEIPT = "some string"

def test_t2(self):
RECEIPT = "some other string"

def test_t3(self):
RECEIPT = "yet a third string"

and so on. It's important that the strings be mutually unique. In the
example above, it's trivial to look at them and observe that they're all
different, but in real life, the strings are about 2500 characters long,
hex-encoded. It even turns out that a couple of the strings are
identical in the first 1000 or so characters, so it's not trivial to do
by visual inspection.

Is the mapping of receipt string to test fixed? That is, is it important
that test_t1 *always* runs with "some string", test_t2 "some other
string", and so forth?

If not, I'd start by pushing all those strings into a global list (or
possible a class attribute. Then:

LIST_OF_GIANT_STRINGS = [blah blah blah] # Read it from a file perhaps?
assert len(LIST_OF_GIANT_STRINGS) == len(set(LIST_OF_GIANT_STRINGS))


Then, change each test case to:

def test_t1(self):
RECEIPT = random.choose(LIST_OF_GIANT_STRINGS)


Even if two tests happen to pick the same string on this run, they are
unlikely to pick the same string on the next run.

If that's not good enough, if the strings *must* be unique, you can use a
helper like this:

def choose_without_replacement(alist):
random.shuffle(alist)
return alist.pop()

class Foo(TestCase):
def test_t1(self):
RECEIPT = choose_without_replacement(LIST_OF_GIANT_STRINGS)


All this assumes that you don't care which giant string matches which
test method. If you do, then:

DICT_OF_GIANT_STRINGS = {
'test_t1': ...,
'test_t2': ...,
} # Again, maybe read them from a file.

assert len(list(DICT_OF_GIANT_STRINGS.values())) == \
len(set(DICT_OF_GIANT_STRINGS.values()))


You can probably build up the dict from the test class by inspection,
e.g.:

DICT_OF_GIANT_STRINGS = {}
for name in Foo.__dict__:
if name.startswith("test_"):
key = name[5:]
if key.startswith("t"):
DICT_OF_GIANT_STRINGS[name] = get_giant_string(key)

I'm sure you get the picture. Then each method just needs to know it's
own name:


class Foo(TestCase):
def test_t1(self):
RECEIPT = DICT_OF_GIANT_STRINGS["test_t1"]


which I must admit is much easier to read than

RECEIPT = "...2500 hex encoded characters..."
 
P

Peter Otten

Roy said:
I realize the subject line is kind of meaningless, so let me explain :)

I've got some unit tests that look like:

class Foo(TestCase):
def test_t1(self):
RECEIPT = "some string"

def test_t2(self):
RECEIPT = "some other string"

def test_t3(self):
RECEIPT = "yet a third string"

and so on. It's important that the strings be mutually unique. In the
example above, it's trivial to look at them and observe that they're all
different, but in real life, the strings are about 2500 characters long,
hex-encoded. It even turns out that a couple of the strings are
identical in the first 1000 or so characters, so it's not trivial to do
by visual inspection.

So, I figured I would write a meta-test, which used introspection to
find all the methods in the class, extract the strings from them (they
are all assigned to a variable named RECEIPT), and check to make sure
they're all different.

Is it possible to do that? It is straight-forward using the inspect
module to discover the methods, but I don't see any way to find what
strings are assigned to a variable with a given name. Of course, that
assignment doesn't even happen until the function is executed, so
perhaps what I want just isn't possible?

It turns out, I solved the problem with more mundane tools:

grep 'RECEIPT = ' test.py | sort | uniq -c

and I could have also solved the problem by putting all the strings in a
dict and having the functions pull them out of there. But, I'm still
interested in exploring if there is any way to do this with
introspection, as an academic exercise.

Instead of using introspection you could make it explicit with a decorator:

$ cat unique_receipt.py
import functools
import sys
import unittest

_receipts = {}
def unique_receipt(receipt):
def deco(f):
if receipt in _receipts:
raise ValueError(
"Duplicate receipt {!r} in \n {} and \n {}".format(
receipt, _receipts[receipt], f))
_receipts[receipt] = f
@functools.wraps(f)
def g(self):
return f(self, receipt)
return g
return deco

class Foo(unittest.TestCase):
@unique_receipt("foo")
def test_t1(self, RECEIPT):
pass

@unique_receipt("bar")
def test_t2(self, RECEIPT):
pass

@unique_receipt("foo")
def test_t3(self, RECEIPT):
pass

if __name__ == "__main__":
unittest.main()
$ python unique_receipt.py
Traceback (most recent call last):
File "unique_receipt.py", line 19, in <module>
class Foo(unittest.TestCase):
File "unique_receipt.py", line 28, in Foo
@unique_receipt("foo")
File "unique_receipt.py", line 11, in deco
receipt, _receipts[receipt], f))
ValueError: Duplicate receipt 'foo' in
<function test_t1 at 0x7fc8714af5f0> and
<function test_t3 at 0x7fc8714af7d0>
 
R

Roy Smith

Steven D'Aprano said:
Is the mapping of receipt string to test fixed? That is, is it important
that test_t1 *always* runs with "some string", test_t2 "some other
string", and so forth?

Yes.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,057
Latest member
KetoBeezACVGummies

Latest Threads

Top