Thor Whalen
The first thing I do once I import new data (as a pandas dataframe) is to .head() it, .describe() it, and then kick around a few specific stats according to what I see.
But I'm not satisfied with .describe(). Among other things, non-numerical columns are ignored by default, and the same off-the-shelf stats are computed for every numerical column, whether or not they make sense for it.
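For reference, the behaviour I mean can be reproduced on a toy frame (the column names and values here are made up for illustration). Passing include="all" does bring the non-numerical columns in, but the stats stay generic:

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "city": ["Paris", "Lima", "Paris", "Oslo"],
    "zip_code": [75001, 15001, 75002, 150],
    "temp": [21.5, 18.0, None, 7.2],
})

# Default call: only the numerical columns are summarised,
# and zip_code gets a mean and std even though they mean nothing here.
default_summary = df.describe()
print(default_summary)

# include="all" adds count/unique/top/freq rows for the string column.
full_summary = df.describe(include="all")
print(full_summary)
```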
I've been shopping around for a "data peeping" function that would:
(1) Have a hands-off mode where I simply type
diagnose_this(data)
and the function figures things out on its own, notifying me when in doubt. For example, it would assume that any string column with not too many unique values should be treated as categorical, and compute the appropriate statistics for it.
(2) Perform standard diagnoses and print them out. For example, (a) missing values? (b) heterogeneously formatted data? (c) columns with only one unique value? etc.
(3) Be parametrizable, if I so choose.
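To make the wish list concrete, here is a minimal sketch of what such a function might look like. It is not an existing pandas function: the name diagnose_this comes from my example above, and the max_categories threshold, the verbose flag, and the shape of the returned report are all assumptions chosen for illustration. It also assumes the cells hold plain hashable values (strings, numbers), not lists or dicts.

```python
import pandas as pd


def diagnose_this(df, max_categories=20, verbose=True):
    """Hypothetical sketch: return one dict of findings per column."""
    report = {}
    for name in df.columns:
        col = df[name]
        non_null = col.dropna()
        n_unique = int(non_null.nunique())
        info = {
            "dtype": str(col.dtype),
            "n_missing": int(col.isna().sum()),        # (a) missing values
            "n_unique": n_unique,
            # (b) more than one Python type hints at mixed formatting
            "mixed_types": non_null.map(type).nunique() > 1,
            "constant": n_unique <= 1,                 # (c) one unique value
        }
        if pd.api.types.is_numeric_dtype(col):
            info["kind"] = "numeric"
            info["stats"] = non_null.describe().to_dict()
        elif n_unique <= max_categories:
            # Assumed rule: few distinct values means "categorical".
            info["kind"] = "categorical"
            info["stats"] = non_null.value_counts().to_dict()
        else:
            info["kind"] = "unsure"
        report[name] = info
        if verbose:
            print(f"{name}: {info['kind']}, {info['n_missing']} missing, "
                  f"{n_unique} unique")
            if info["mixed_types"]:
                print("  note: values of more than one type")
            if info["constant"]:
                print("  note: at most one unique value")
            if info["kind"] == "unsure":
                print("  note: too many unique values to call categorical")
    return report


# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "colour": ["red", "blue", "red", None],
    "size": [1.0, 2.5, None, 4.0],
    "source": ["a", "a", "a", "a"],
    "code": ["x1", 7, "x3", "x4"],
})
report = diagnose_this(toy)
```

The hands-off mode is just the call with defaults; passing max_categories or verbose=False would be the parametrized mode from point (3).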
Does anyone know of such a function?