Tom Anderson
Hi chaps and chapesses,
I'm putting together some little tests to compare various serialisation
formats - Java's own, XML, ASN.1, whatever else i can think of. To do this
properly, i need some data to populate my objects before i write them out.
I started off with a good old Customer/Order/etc model, with randomly
generated values, but i'm not really happy with this - i can't randomly
generate names and addresses with as much entropy as real ones, and that's
going to make compression more effective than it would be in real-world
data.
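
To make that worry concrete, here is a rough sketch of the kind of test described above: build a small Customer/Order graph from random values, write it out with Java's built-in serialisation, and gzip the result to see how far it shrinks. The class, field, and method names are all made up for illustration, and gzip is only an assumed stand-in for whatever compression actually gets used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class SerialisationSizeSketch {

    // Hypothetical stand-ins for the Customer/Order model.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<Order> orders = new ArrayList<>();
        Customer(String name) { this.name = name; }
    }

    static class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        final Customer customer; // back-pointer, so the graph is not a plain tree
        final int quantity;
        Order(Customer customer, int quantity) {
            this.customer = customer;
            this.quantity = quantity;
        }
    }

    // Uniformly random lowercase letters: the "unrealistic" kind of name.
    static String randomName(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    // Builds `count` customers, each with zero to four orders.
    static List<Customer> randomCustomers(long seed, int count) {
        Random rnd = new Random(seed);
        List<Customer> customers = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Customer c = new Customer(randomName(rnd, 12));
            int orderCount = rnd.nextInt(5);
            for (int j = 0; j < orderCount; j++) {
                c.orders.add(new Order(c, rnd.nextInt(100)));
            }
            customers.add(c);
        }
        return customers;
    }

    // Java's own serialisation, captured in memory.
    static byte[] javaSerialise(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Compresses a byte array with gzip (an assumed choice of compressor).
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = javaSerialise(randomCustomers(42L, 1000));
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzipped=%d bytes, ratio=%.2f%n",
                raw.length, packed.length, (double) packed.length / raw.length);
    }
}
```

Swapping randomCustomers for a loader over a real dataset, and javaSerialise for each other format in turn, would give a like-for-like size comparison. The numbers from the random version say little about real data, which is the whole problem.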
What i really need, then, is some real-world data. I don't mind what the
domain or data model is, as long as it's got several different object
types, lots of pointers and not too much of either text or numeric data
(so not particle physics, x-ray crystallography, text corpus, or image
data). I'd like about a hundred megs of it. Does anyone know of a dataset
i could use for this? I've had a bit of a look, but all i can find is,
yes, particle physics, x-ray crystallography, text corpus, or image data.
All suggestions welcome!
tom