R and Perl

Discussion in 'Perl Misc' started by ccc31807, Sep 2, 2011.

  1. ccc31807

    Has anyone used R with Perl for statistical programming?

    Has anyone used R with Perl to output graphical files?

    Is it any more complicated than writing R scripts and calling the R
    interpreter using system() or the like?

    I'm about to embark on a major project with R, and really, really need
    Perl to munge my data files. I would like to automate the entire
    thing, but if I can't I can use Perl to generate the input data for R
    and manually generate the output files.

    Thanks, CC.
    ccc31807, Sep 2, 2011

  2. azrazer

    Hello,
    On 03/09/2011 00:26, ccc31807 wrote:
    > Has anyone used R with Perl for statistical programming?
    > Has anyone used R with Perl to output graphical files?

    I think so, too.
    >
    > Is it any more complicated than writing R scripts and calling the R
    > interpreter using system() or the like?

    In my opinion, using system() raises no issues.
    You can definitely write your R script and call R from the command line.
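    A minimal sketch of the system() approach, assuming R's command-line front end Rscript is on the PATH. The helper name run_program and the file names in the commented usage (make_charts.R, status_counts.csv, status_counts.pdf) are hypothetical. Passing a list to system() bypasses the shell, and the exit status in $? shows whether R succeeded:

```perl
use strict;
use warnings;

# Hypothetical helper: run an external program (e.g. Rscript) without
# going through a shell. Returns the program's exit code; dies if the
# program could not be started or was killed by a signal.
sub run_program {
    my ($program, @args) = @_;
    my $rc = system($program, @args);
    die "could not start $program: $!\n" if $rc == -1;
    die "$program died with signal ", ($? & 127), "\n" if $? & 127;
    return $? >> 8;    # the program's own exit code
}

# Hypothetical usage: make_charts.R reads a CSV and writes a PDF.
# my $status = run_program('Rscript', 'make_charts.R',
#                          'status_counts.csv', 'status_counts.pdf');
# die "R script failed with exit code $status\n" if $status != 0;
```

    Inside the R script, commandArgs(trailingOnly = TRUE) returns the arguments that follow the script name, so the same script can be reused for different input and output files.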

    > I'm about to embark on a major project with R, and really, really need
    > Perl to munge my data files. I would like to automate the entire
    > thing, but if I can't I can use Perl to generate the input data for R
    > and manually generate the output files.

    Could you be a bit more precise about what you want to do.
    AFAIK, from experience, the best thing to do would be to format your
    data using Perl without making any modification on your data (i.e., if
    you have LONG, LARGE tables of numbers, don't do any math on them in
    Perl); just FORMAT them as a well-structured table.

    Then do all the filtering, mathematical operations, etc. on your
    data set in R.

    (I am not saying that Perl is not suitable for such operations, but I
    think it is better to run your Perl script once and then work on the
    data set in R, if that is the software you want to use! Raw data
    usually provides more information than modified data.)

    Could you be more precise about why you cannot use Perl to generate the
    input data for R? And if so, why is calling system() a problem?
    > Thanks, CC.


    cheers.
    azrazer, Sep 6, 2011

  3. ccc31807

    On Sep 6, 3:03 am, azrazer <> wrote:
    > Could you be a bit more precise about what you want to do.


    I have multiple data files that I will retrieve from a database query.
    These will be on the order of 150K rows, and an indeterminate number
    of columns. The columns will include both dates and status codes, and
    I will need to build a data structure containing the cumulative count
    of status codes over several months, day by day. Then, I need to build
    graphical files with line charts.

    This is currently done by hand in Excel, and I have been tasked with
    automating the process.

    Munging the data and getting the cumulative count per status code per
    day is a snap in Perl, and while I've generated charts in Perl using
    GD::Graph, using R is certainly a lot easier, and besides, I am
    motivated to learn R.
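    A minimal sketch of that munging step, assuming the query output has already been parsed into [date, status] pairs (one per row) and that "cumulative count" means a running total of rows per status code. The sub name cumulative_counts is hypothetical. It fills in every date seen in the data, so each status line gets a value on every such day:

```perl
use strict;
use warnings;

# Hypothetical helper: takes one [yyyymmdd, status] pair per row and
# returns %cum{status}{yyyymmdd} => running total of rows with that status
# on or before that date. Every date seen anywhere in the data is filled
# in for every status, so each line in the chart has a point on each day.
sub cumulative_counts {
    my @rows = @_;
    my (%daily, %dates);
    for my $row (@rows) {
        my ($date, $status) = @$row;
        $daily{$status}{$date}++;    # per-day count for this status
        $dates{$date} = 1;           # remember every date that occurs
    }
    my %cum;
    for my $status (keys %daily) {
        my $total = 0;
        for my $date (sort keys %dates) {    # yyyymmdd sorts correctly as text
            $total += $daily{$status}{$date} // 0;
            $cum{$status}{$date} = $total;
        }
    }
    return %cum;
}
```

    Days on which no row appears at all are not filled in; if the chart needs every calendar day, the full date range would have to be generated separately.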

    > AFAIK, from experience, the best thing to do would be to format your
    > data using Perl without making any modification on your data


    The raw data needs to be processed. The 'data' that I will use will be
    contained in hashes, the keys will be status codes, the sub keys will
    be dates, and the values will be integers, sort of like this:

    $hash{S}{20110601} => 10
    $hash{S}{20110602} => 13
    $hash{S}{20110603} => 21
    $hash{S}{20110604} => 19
    $hash{S}{20110605} => 25
    $hash{S}{20110606} => 29
    $hash{S}{20110607} => 28

    So, I can print out the hash in an R compatible data frame and use it
    directly to generate a PDF.
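    A minimal sketch of printing that hash as long-format CSV (one status/date/count row per line), which R's read.csv() can load directly. The sub name write_status_csv and the column names are hypothetical, and it assumes status codes contain no commas or quotes (otherwise a module such as Text::CSV is the safer choice):

```perl
use strict;
use warnings;

# Hypothetical helper: write %hash{status}{yyyymmdd} => count as CSV with
# a "status,date,count" header, sorted so the output is stable between runs.
# Assumes no field contains a comma, quote, or newline.
sub write_status_csv {
    my ($fh, $counts) = @_;
    print {$fh} "status,date,count\n";
    for my $status (sort keys %$counts) {
        for my $date (sort keys %{ $counts->{$status} }) {
            print {$fh} join(',', $status, $date, $counts->{$status}{$date}), "\n";
        }
    }
    return;
}
```

    On the R side, something like d <- read.csv("status_counts.csv") followed by d$date <- as.Date(as.character(d$date), format = "%Y%m%d") would turn the yyyymmdd integers into real dates for plotting.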

    > Could you be more precise about why you cannot use Perl to generate the
    > input data for R? And if so, why is calling system() a problem?


    I will use Perl to munge the data and produce as output an input file
    for R. I want to be able to push a button and have the computer do all
    the work.

    Thanks for your reply, CC.
    ccc31807, Sep 7, 2011
  4. azrazer

    On 08/09/2011 00:26, ccc31807 wrote:
    > On Sep 6, 3:03 am, azrazer<> wrote:
    >> Could you be a bit more precise about what you want to do.

    >
    > I have multiple data files that I will retrieve from a database query.
    > These will be on the order of 150K rows, and an indeterminate number
    > of columns. The columns will include both dates and status codes, and
    > I will need to build a data structure containing the cumulative count
    > of status codes over several months, day by day. Then, I need to build
    > graphical files with line charts.

    Well, yes, this is easily done in R; you just have to aggregate the data
    (right?), e.g. with aggregate() or plyr's ddply().
    >
    > This is currently done by hand in Excel, and I have been tasked with
    > automating the process.
    >
    > Munging the data and getting the cumulative count per status code per
    > day is a snap in Perl, and while I've generated charts in Perl using
    > GD::Graph, using R is certainly a lot easier, and besides, I am
    > motivated to learn R.

    Yes, don't worry; this will be a piece of cake too, once your data is
    well organised.
    >
    >> AFAIK, from experience, the best thing to do would be to format your
    >> data using Perl without making any modification on your data

    >
    > The raw data needs to be processed. The 'data' that I will use will be
    > contained in hashes, the keys will be status codes, the sub keys will
    > be dates, and the values will be integers, sort of like this:
    >
    > $hash{S}{20110601} => 10
    > $hash{S}{20110602} => 13
    > $hash{S}{20110603} => 21
    > $hash{S}{20110604} => 19
    > $hash{S}{20110605} => 25
    > $hash{S}{20110606} => 29
    > $hash{S}{20110607} => 28
    >
    > So, I can print out the hash in an R compatible data frame and use it
    > directly to generate a PDF.

    Yup, just generate a CSV file for R to load and that will be it, don't
    you think?
    >
    >> Could you be more precise about why you cannot use Perl to generate the
    >> input data for R? And if so, why is calling system() a problem?

    >
    > I will use Perl to munge the data and produce as output an input file
    > for R. I want to be able to push a button and have the computer do all
    > the work.

    Looks like a decent way of doing things: let the computer do the work! :)
    Have fun,
    >
    > Thanks for your reply, CC.
    azrazer, Sep 8, 2011
  5. Jon Du Kim

    If you have existing R code that you would like to
    interface with, then some sort of Perl/R bridge makes sense.
    But you do know that Perl has a fantastically awesome
    set of libraries known as the Perl Data Language (PDL)?
    http://pdl.perl.org/
    I have used the PDL::Stats modules and they worked well
    for what I was doing. Check them out too.
    http://pdl-stats.sourceforge.net/
    I'm not sure what you are using R for, but you can keep it
    all in Perl if you want to...

    Jon Du Kim, Sep 9, 2011
  6. Ted Byers

    Actually, while the other responses are correct, there is a simpler
    way still. Well, actually two, but it may be blasphemy to say so in
    this forum. ;-) Understand, as long as your DB is one of the common
    ones (e.g. MS SQL Server, MySQL, PostgreSQL, &c.), there are drivers
    that let your R script connect directly to the DB (equivalent to
    Perl's DBI). There is therefore no need to waste time on making CSV
    files. Given that, you can either do any data manipulation using
    SQL, or you can load the raw data into R and use one of
    its packages to do the sort of manipulation you'd otherwise do using
    SQL. Either of these options will be faster than getting Perl
    involved in some of the data manipulation. Trust me, I have tried it
    in all variations (from having Perl get/manipulate the data, to having
    the DB do the manipulation up to the point where my models can do their
    various analyses, to importing raw data directly from the DB into R
    and having R do it all). In my experience, the last of these turned out
    to be the fastest. Using SQL's data manipulation capability is faster if
    the R script and the DB are on different machines communicating over a
    slow network.

    HTH

    Ted

    This reduces Perl's role to simply invoking the R script (e.g., the only
    way I could make my R programs scheduled tasks was to write a simple
    Perl script that starts them).
    Ted Byers, Sep 14, 2011
  7. ccc31807

    On Sep 14, 2:16 pm, Ted Byers <> wrote:
    > Actually, while the other responses are correct, there is a simpler
    > way still.  Well, actually two; but it may be blasphemy to say so in
    > this forum.  ;-)  Understand, as long as your DB is one of the common
    > ones (e.g. MS SQL Server, MySQL, PostgreSQL, &c.) there are drivers
    > that let your R script connect directly to the DB (equivalent to
    > Perl's DBI).


    My database is a Unidata database from IBM. Aside from the fact that
    there isn't a DBD for Pick, it uses a non-SQL query language,
    UniQuery, and beyond that, you really can't manipulate data with it,
    only select it.

    My challenge lies between the output file of my Perl script (the CSV
    file) and the invocation of R. I haven't worked on this since my post,
    but if the simplest way works, I'll keep it, 'simplest' being defined
    as having to write the least amount of code to get the output that I
    need, which appears to be calling the R executable from system() or
    the like.

    > Trust me, I have tried it
    > in all variations (having perl get/manipulate the data, having the DB
    > do the manipulation up to the point where my models can do their
    > various analyses, to importing raw data directly from the DB into R
    > and having R do it all).  In my experience, the last of these turned out
    > to be the fastest.  Using SQL's data manipulation capability is faster if
    > the R script and the DB are on different machines communicating over a
    > slow network.


    I can see how it would. However, I'm an old web guy, and I think in
    terms of connecting the interface and the database with Perl scripts,
    and I don't really have the motivation to change at this point. Who
    knows, maybe I'll get another job and do my work like this.

    Thanks, CC.
    ccc31807, Sep 15, 2011
