Dice gen and analyser script for RPGs: comments sought

R

Richard Buckle

Hi fellow Pythonistas,

I've been using Python in anger for some time, and I must say, as I
wrote in <http://macprang.sourceforge.net/>:

"It's refreshing beyond words to use a language that so freely combines
such a small, clean syntax with such a powerful synthesis of
procedural, object-oriented and functional techniques."

My (hardcore C++) workplace is now very much converted to Python for
scripting tasks! We have completely abandoned Unix and DOS
shell-script, and Perl. We drive our entire "build-for-QA" process,
including code-signing, with Python and also use it for all our
automated nightly builds and unit tests.

So, inspired by the Python cookbook article 17.13, and because the
subject interests me, I've bashed out a module in my own time that can
do brute force and monte carlo analysis of dice under various RPG
rulesets:

* vanilla rolls of n x-sided dice, the result being multiplied by y
(usual d20 system ndx*y)
* as above, but the lowest m rolls are discarded
(optional d20 char gen rules)
* dice beating a threshold (Storyteller-style system),
with optional roll again
* opposed sorted rolls (Risk-style system).

The 1.0 version is at http://www.sailmaker.co.uk/newfiles/dice.py and
is public domain.

Before you ask, yes I'm fully aware of Newton's binomial theorem and
its generaliziations. The point of this code is to however to
generalise, exercise and accumulate the brute force and Monte Carlo
methods across various evaluatuon rules.

All comments welcomed, cc to email preferred merely because it shows up
sooner for me and is less likely to be missed.

Comments, insights and overall evaluations are especially welcomed re:
* Cleanliness of design
* Pythonicity of design
* Pythonicity of code
* Efficiency of code
* Quality of docstrings
* Conformance with modern docstring standards
* Conformance with coding standards e.g. PEP 8

I look forward to receiving your comments and criticisms.

Regards,
Richard.
 
P

Paul Rubin

Richard Buckle said:
Comments, insights and overall evaluations are especially welcomed re:
* Cleanliness of design
* Pythonicity of design
* Pythonicity of code
* Efficiency of code
* Quality of docstrings
* Conformance with modern docstring standards
* Conformance with coding standards e.g. PEP 8

I look forward to receiving your comments and criticisms.

I looked at this for a minute and found it carefully written but
somewhat hard to understand. I think it's maybe more OOP-y than
Python programs usually are; perhaps it's Java-like. There are some
efficiency issues that would be serious if the code were being used on
large problems, but probably no big deal for "3D6" or the like. In
particular, you call histogram.numSamples() in a bunch of places and
each call results in scanning through the dict. You might want to
cache this value in the histogram instance. Making histogram a
subclass of dict also seems a little bit peculiar.

You might find it more in the functional programming spirit to use
gencomps rather than listcomps in a few places:

return sum([freq * val for val, freq in self.items()])
becomes
return sum((freq * val) for val, freq in self.iteritems())

etc.
 
R

Richard Buckle

(re said:
I looked at this for a minute

Thanks for taking the time to look over it; I greatly appreciate it!
and found it carefully written but somewhat hard to understand.

Obviously, as the author, I have a blind spot about others
understanding it, but I always strive for my code to be self-commenting
and understandable at first sight, so I would genuinely welcome
feedback on which parts are difficult to understand and need more
commenting.

Perhaps it is insufficient commenting of the implementations of the
various subclasses of class Dice? I admit /they/ get a but funky in
places.
I think it's maybe more OOP-y than
Python programs usually are; perhaps it's Java-like.

That's interesting: while my background is certainly more OO than
functional, Java is the language I use least. Certainly I relish the
freedom to mix OO, functional and procedural paradigms that you get
with Python, so if there are ways in which that mixture can improve
dice.py, please enlighten me!

On the other hand, is "being more OOP-y than Python programs usually
are" a bad thing in itself, if the object analysis is cleanly designed?
There are some
efficiency issues that would be serious if the code were being used on
large problems, but probably no big deal for "3D6" or the like. In
particular, you call histogram.numSamples() in a bunch of places and
each call results in scanning through the dict.

A good point which I did consider, however after practical tests I
ruled in favour of code simplicity for the case in hand, while being
careful to leave the Histogram class open to optimisations such as you
describe via subclassing.

An earlier draft of the Histogram class did indeed have each method do
a single pass through the dict, accumulating what is now numSamples(),
sigma() and sigmaSquared() in local variables. But in practice I found
that for my problem domain, even a 'large' problem would have
relatively few histogram buckets, and computing the summary statistics
for the histogram took a /miniscule/ amount of time compared to the
time taken by generating the data.

On that basis I refactored the Histogram class to follow the "once and
only once" coding principle, resulting in a much clearer and more
concise Histogram class, with very simple and natural code that any
high-school statisician can immediately see is correct. I found no
measurable perfomance hit.
You might want to cache this value in the histogram instance.

On the same basis as above, I decided against caching any results in
the Histogram class, preferring simplicity of code to the additional
logic that would be required to manage cache invalidation upon arrival
of new samples.

I've learned the hard way that caching, while a dramatic optimisation
in the right place, introduces a lot of complexity and pitfalls (even
more so in multi-threaded scenarios), so these days I only do it when
performance analysis reveals the need.

Of course, for histograms with huge numbers of buckets, there may well
be mileage in subclassing the Histogram class to implement the above
optimisations, and I think the class is sufficiently cleanly designed
to facilitate that.
Making histogram a subclass of dict also seems a little bit peculiar.

Why? Genuine question. Keep in mind that I might want to re-use the
Histogram class for cases where the buckets may be floating point, so I
can't just use a vector.

I wanted a mapping of bucket->frequency with an easy way to add a new
sample by incrementing the frequency, so dict seemed a natural choice.
I am of course open to better ideas!
You might find it more in the functional programming spirit to use
gencomps rather than listcomps in a few places:

return sum([freq * val for val, freq in self.items()])
becomes
return sum((freq * val) for val, freq in self.iteritems())

Nice: thanks for making me aware of gencomps!

I agree that they're better, but when at home I'm on Mac OS X and so
stuck with Python 2.3 unless I engage in the necessary shenanigans to
slap Python 2.4 onto Mac OS X 10.4.7. Yes, I know how to do it, I've
done it before, but it's a PITA and I don't want to deal with it any
more.

Moreover I want my code to run on vanilla installations of Mac OS X
10.4.x, so I just put up with it and code to Python 2.3.

I'm on Apple's developer program, so if anyone has a Radar bug number I
can cite while requesting they pull their socks up and ship Python 2.4
with the OS, just let me know and I'll cite it to Apple.

Richard.
 
G

George Sakkis

Richard said:
Comments, insights and overall evaluations are especially welcomed re:
* Cleanliness of design
* Pythonicity of design
* Pythonicity of code
* Efficiency of code
* Quality of docstrings
* Conformance with modern docstring standards
* Conformance with coding standards e.g. PEP 8

I look forward to receiving your comments and criticisms.

Here are some random (and less random) comments in no particular order,
with minimal or no justification; if you're not sure why some point is
good advice, simply ask :) I did just a "shallow parsing", meaning I
didn't attempt to understand what it's actually doing, check for
correctness, algorithm optimizations or major refactorings.

* Instead of importing sets, have the following snippet to be able to
use the builtin set (available since 2.4):
try: set
except NameError: from sets import Set as set

* Prefer new-style classes unless you have a good reason not to. The
change is minimal; just replace "class Dice" with "class Dice(object)".

* Replace most occurences of dict.keys, dict.values, dict.items with
dict.iterkeys, dict.itervalues, dict.iteritems respectively (unless you
write code with Py3K in mind ;-)).

* Avoid mutable default function arguments unless necessary. Instead of
a default empty list, use either an empty tuple or None (in which case
you may assign an empty list in the function's body when the argument
is None).

* Raising LookupError when a sanity check of function arguments fails
looks strange. ValueError (or a a specialised subclass of it) would be
more appropriate. 'assert' statements should also not be used for
arguments checking because they may be turned off when running the
program with '-O'.

* reduce() is discouraged; it's almost always more readable to unroll
it in a for loop.

* Are all the methods public, i.e. useful to a client of the classes ?
If not, the convention is to start the non-public methods' name with a
single underscore (or double underscore sometimes, but name mangling
often causes more problems than it solves in my experience, especially
when inheritance is involved).

* Several snippets can be condensed with list comprehensions, though
one-liners are not to everyone's taste. E.g. I'd write
histograms = []
maxDice = 10
for i in xrange(maxDice):
dice = StoryTellerDice(i)
histogram = dice.bruteforce()
histograms.append(histogram)
as histograms = [StoryTellerDice(i).bruteforce() for i in xrange(10)]

* [personal preference]: Don't leave space between *every* operator in
expressions, group them based on precedence. E.g. instead of "(n *
sigmaSq - sigma * sigma) / (n * n)", I read it easier as "(n*sigmaSq -
sigma*sigma) / (n*n).


HTH,
George
 
S

Steven Bethard

George said:
* [personal preference]: Don't leave space between *every* operator in
expressions, group them based on precedence. E.g. instead of "(n *
sigmaSq - sigma * sigma) / (n * n)", I read it easier as "(n*sigmaSq -
sigma*sigma) / (n*n).

The spaced-out version is more `PEP 8`_ compliant. Under "Whitespace in
Expressions and Statements -> Other Recommendations" it says "Use spaces
around arithmetic operators" and gives a few examples of "Yes" usage
that look much like the OP's usage.

That said, I also tend to omit the spaces around multiplicative
operators (though I'm slowly training myself out of that). As you say,
in the end, this is really a matter of personal preference.

... _PEP 8: http://www.python.org/dev/peps/pep-0008/


STeVe
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top