random.sample with large weighted sample-sets?

Tim Chase

I'm not coming up with the right keywords to find what I'm hunting.
I'd like to randomly sample a modestly compact list with weighted
distributions, so I might have

data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
)

and I'd like to random.sample() it as if it were a 100-element list.
However, ideally, this could be done in O(size-of-data) storage
rather than requiring the build-out of the entire set just for
sampling purposes, as the actual data can get a bit large. For this
small toy data-set, I can use

sample_me = sum(([s] * n for s, n in data), [])
random.sample(sample_me, k)

but for large counts, the list returned from sum() grinds my system
because I start swapping. What am I missing? (links to relevant
keywords/searches/algorithms welcome in lieu of actually answering
in-line)

Thanks,

-tkc
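
One possible approach, offered as a minimal sketch rather than a definitive answer: sample k distinct positions from range(total), which random.sample() handles without building the range as a list, then map each position back to its item by bisecting the cumulative counts. Storage stays at O(size-of-data) plus the k results. The name weighted_sample and its rng parameter are hypothetical, introduced only for illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def weighted_sample(data, k, rng=random):
    """Sample k items without replacement, as if every (item, count)
    pair in data had been expanded into count copies of item.

    weighted_sample is a hypothetical helper name, not a stdlib function.
    """
    items = [item for item, _ in data]
    # Running totals, e.g. [20, 70, 100] for the toy data-set.
    cum_counts = list(accumulate(count for _, count in data))
    total = cum_counts[-1] if cum_counts else 0
    # Sampling from a range object does not materialize its elements.
    positions = rng.sample(range(total), k)
    # Position p belongs to the first item whose running total exceeds p.
    return [items[bisect_right(cum_counts, p)] for p in positions]


data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
)
print(weighted_sample(data, 5))
```

On Python 3.9 and later, random.sample() also accepts a keyword-only counts argument, so random.sample(["apple", "orange", "grape"], k, counts=[20, 50, 30]) should give the same effect without a helper.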
 
