Data munging, the choices available.

irishhacker

What percentage of Perl users use Perl for data munging
(cleaning up data, data transformation, etc.) on a fairly regular
basis?
Perl is particularly good at regular expressions, which are useful for
some types of data munging.
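
As a rough illustration of the kind of regex-driven cleanup meant here, a minimal sketch follows. It is written in Python rather than Perl purely for illustration, and the record layout (name, phone, date), the function name clean_record, and the sample values are all made up for the example.

```python
import re

def clean_record(line):
    """Normalize one raw comma-separated record.

    Assumed (hypothetical) layout: name, phone, date.
    """
    name, phone, date = [field.strip() for field in line.split(",")]
    # Collapse runs of whitespace inside the name.
    name = re.sub(r"\s+", " ", name)
    # Keep only the digits of the phone number.
    phone = re.sub(r"\D", "", phone)
    # Rewrite M/D/YYYY as YYYY-MM-DD; any other format is left untouched.
    date = re.sub(
        r"^(\d{1,2})/(\d{1,2})/(\d{4})$",
        lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}",
        date,
    )
    return [name, phone, date]

print(clean_record("  John   Smith , (555) 123-4567 , 3/7/2007 "))
```

The same three substitutions map almost one-to-one onto Perl's s/// operator, which is why Perl is a natural fit for this sort of work.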

There are three main choices for data munging. Which choice is best
depends on the type of data problem one has, since data munging tasks
vary in both degree of difficulty and flavor.

ALL-PURPOSE PROGRAMMING LANGUAGES
obvious example: Perl

SPECIALIZED PROGRAMMING LANGUAGES
obvious examples: SAS datastep (but extremely expensive), also SPSS
(used the same way, to get data ready for analysis)
PSPP (GPL open source re-implementation of the SPSS programming language,
@ http://directory.fsf.org/math/stats )
DAP (GPL open source re-implementation of the SAS programming language,
@ http://directory.fsf.org/math/stats )
vilno (GPL open source, another data transformation programming
language and engine, @ http://code.google.com/p/vilno )

GRAPHICAL USER INTERFACE
Kettle ( http://kettle.pentaho.org )
KETL ( http://www.ketl.org ), and on and on.
GUI tools are particularly popular for the "T" (transform) part of "ETL".
ETL is always marketed as having a GUI front-end; no one ever mentions
using an ETL programming language.
If the complexity/quality of the data is not that bad, and hence the
required munging is not too complicated, then a GUI product is good.
But if Murphy's law strikes with the databases (if something can go
wrong, it will), programming languages provide more flexibility for bad
situations.
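
To make the flexibility point concrete, here is a minimal sketch (in Python, for illustration only) of what a script can do when the data turns out worse than expected: rather than aborting, it sets malformed rows aside with a reason attached. The three-column layout (id, item, amount), the function name split_good_and_bad, and the sample data are all assumptions made up for this example.

```python
import csv
import io

def split_good_and_bad(text, expected_fields=3):
    """Separate well-formed rows from rows that need manual attention.

    Assumed (hypothetical) layout: id, item, amount.
    """
    good, bad = [], []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        row = [field.strip() for field in row]
        if len(row) != expected_fields:
            bad.append((line_no, "wrong field count", row))
            continue
        try:
            # Convert the amount column; non-numeric text raises ValueError.
            row[2] = float(row[2])
        except ValueError:
            bad.append((line_no, "amount is not numeric", row))
            continue
        good.append(row)
    return good, bad

sample = "1,widget,9.99\n2,gadget\n3,gizmo,n/a\n4,doohickey, 12.50\n"
good, bad = split_good_and_bad(sample)
print(good)
print(bad)
```

Each new kind of breakage only needs another check in the loop, which is the sort of ad hoc rule that is awkward to express in a point-and-click tool.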
 
