Tim Chase
I'm not coming up with the right keywords to find what I'm hunting.
I'd like to randomly sample a modestly compact list with weighted
distributions, so I might have
data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
)
and I'd like to random.sample() it as if it was a 100-element list.
However, ideally, this could be done in O(size-of-data) storage
rather than requiring the build-out of the entire set just for
sampling purposes, as the actual data can get a bit large. For this
small toy data-set, I can use
sample_me = sum(([s] * n for s, n in data), [])
random.sample(sample_me, k)
but for large counts, the list returned from sum() grinds my system
because I start swapping. What am I missing? (links to relevant
keywords/searches/algorithms welcome in lieu of actually answering
in-line)
Thanks,
-tkc
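
One possible direction, offered as a rough sketch rather than a definitive answer: sample positions from a lazy range(total) and map each position back to its item through the cumulative counts, so storage stays proportional to len(data) plus k. The name weighted_sample and its rng parameter are made up for illustration, and the sketch assumes every count is a non-negative integer.

```python
import bisect
import itertools
import random


def weighted_sample(data, k, rng=random):
    """Sample k items without replacement from (item, count) pairs,
    as if each item appeared count times in one long list."""
    items = [item for item, _ in data]
    # Running totals, e.g. [20, 70, 100] for the toy data above.
    cum_counts = list(itertools.accumulate(count for _, count in data))
    total = cum_counts[-1] if cum_counts else 0
    # range() is lazy, so the expanded list is never built in memory.
    positions = rng.sample(range(total), k)
    # Map each virtual position back to the item that owns it.
    return [items[bisect.bisect_right(cum_counts, p)] for p in positions]


data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
)
picked = weighted_sample(data, 10)
```

Useful search terms here are "weighted sampling without replacement" and "cumulative counts with bisect". On Python 3.9 and later, random.sample() also accepts a counts= keyword for repeated elements, e.g. random.sample(["apple", "orange", "grape"], k=10, counts=[20, 50, 30]).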