Perl bioinformatics

Discussion in 'Perl Misc' started by ccc31807, Oct 26, 2009.

  1. ccc31807

    ccc31807 Guest

    I'm not changing jobs, but I've been contacted about some contract
    opportunities that (reportedly) are difficult but seem easy enough to
    me, manipulating genome files to produce various kinds of reports,
    graphs, etc. I have zero experience in this, so I'm just wondering ...

    1. What are the career opportunities in bioinformatics using Perl?

    2. Looking for books, I found the following:
    a. Beginning Perl for Bioinformatics by James Tisdall
    b. Mastering Perl for Bioinformatics by James D. Tisdall
    c. Building Bioinformatics Solutions: with Perl, R and MySQL by
    Conrad Bessant**
    d. Perl Programming for Biologists by D. Curtis Jamison
    e. Genomic Perl: From Bioinformatics Basics to Working Code by Rex A.
    Dwyer

    Looking at the tables of contents, reviews, and reader comments, I
    believe that c. is probably the best value, but it's real hard to tell
    without buying and reading the book. Anybody have any experiences
    using any of these books? I'd like to conserve both time and money by
    starting with the 'best' book.

    Thanks, CC.
    ccc31807, Oct 26, 2009
    #1

  2. ccc31807 wrote:
    >I'm not changing jobs, but I've been contacted about some contract
    >opportunities that (reportedly) are difficult but seem easy enough to
    >me, manipulating genome files to produce various kinds of reports,
    >graphs, etc. I have zero experience in this, so I'm just wondering ...


    The usual problem is the huge volume of data that needs processing.
    Therefore typically the standard algorithms don't work any more and you
    need a really strong background in data processing.
    Perl is not necessarily the best choice here. Perl's powerful features
    make it easy to write code that seems to do the job, but it won't scale
    from the small test samples to the huge actual data set where you really
    need special methods and optimizations.

    A little while ago there was someone posting questions here regularly
    about how to deal with genome sequences. I don't know if he is still
    around, but maybe you can check the archives and contact him.

    jue
    Jürgen Exner, Oct 26, 2009
    #2

  3. ccc31807 wrote:
    >
    >Looking at the tables of contents, reviews, and reader comments, I
    >believe that c. is probably the best value, but it's real hard to tell
    >without buying and reading the book. Anybody have any experiences
    >using any of these books? I'd like to conserve both time and money by
    >starting with the 'best' book.
    >


    The 'best' book is the one that engages you. It's hard to
    predict.

    For $22.95 you can get access to *all* the O'Reilly books
    <http://my.safaribooksonline.com/>
    including several on bioinformatics. There's a free trial!

    You might want to check the used book stores for a textbook like
    _The Molecular Biology of the Gene_, so that you can pick up some
    biology.

    --bks
    Bradley K. Sherman, Oct 26, 2009
    #3
  4. Jürgen Exner wrote:
    > ...
    >The usual problem is the huge volume of data that needs processing.
    >Therefore typically the standard algorithms don't work any more and you
    >need a really strong background in data processing.
    >Perl is not necessarily the best choice here. Perl's powerful features
    >make it easy to write code that seems to do the job, but it won't scale
    >from the small test samples to the huge actual data set where you really
    >need special methods and optimizations.
    > ...


    This is not really fair. Most of bioinformatics is data wrangling
    and Perl is exactly the right choice for that.

    See, e.g.
    <http://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0001.html>

    --bks
    Bradley K. Sherman, Oct 26, 2009
    #4
  5. ccc31807

    ccc31807 Guest

    On Oct 26, 10:45 am, Bradley K. Sherman wrote:
    > >The usual problem is the huge volume of data that needs processing.
    > >Therefore typically the standard algorithms don't work any more and you
    > >need a really strong background in data processing.


    >
    > This is not really fair.  Most of bioinformatics is data wrangling
    > and Perl is exactly the right choice for that.


    In my day job, I deal with data files on the order of several hundred
    thousand records. The scripts I write to produce reports from these
    data files sometimes take a second (or several seconds) to run. The
    data file I have for the bioinformatics project is much larger, but is
    a lot simpler (it's a dotplot file).

    Sometimes, data files can be so huge that the script just breaks.
    Sometimes, the script just runs longer than you might expect.
    Obviously, the longer time really isn't a problem ... there's no
    difference between a script that runs in microseconds and one that
    runs in minutes (say, between 60 and 120) ... as long as the script
    runs to completion.
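
    A minimal sketch of the "runs to completion" point, assuming a
    tab-separated file with a numeric score in the third column. The real
    dotplot layout isn't described here, so the format, the summarize()
    name, and the sample values are all hypothetical. Reading one line at
    a time keeps memory use flat no matter how big the file gets, which is
    usually what separates "runs long" from "just breaks":

```perl
use strict;
use warnings;

# Hypothetical helper: count data records and total one numeric column,
# holding only a single line in memory at any moment.
sub summarize {
    my ($fh, $col) = @_;
    my ($count, $total) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        # Skip comment lines and blank lines.
        next if $line =~ /^\s*#/ or $line !~ /\S/;
        my @fields = split /\t/, $line;
        $count++;
        $total += $fields[$col];
    }
    return ($count, $total);
}

# Demo on an in-memory filehandle; a real run would open the data file instead.
my $sample = "# x\ty\tscore\n10\t12\t3\n11\t13\t5\n\n14\t20\t2\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my ($n, $sum) = summarize($fh, 2);
close $fh;
print "$n records, total score $sum\n";   # prints "3 records, total score 10"
```

    Slurping the whole file into an array first would give the same
    answer on small samples but needs memory proportional to the file size.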

    I'm sympathetic to jue's observation about the scaling problem, but
    after having looked at the data, the fact that it's genomic or
    biological is totally irrelevant. It's really the amount of data
    rather than the kind of data that seems to be significant.

    You seem to have a handle on what's going on. Is using Perl for
    bioinformatics totally off the wall, or a reasonable option for data
    mangling?

    CC
    ccc31807, Oct 26, 2009
    #5
  6. Uri Guttman

    Uri Guttman Guest

    >>>>> "JE" == Jürgen Exner writes:

    JE> ccc31807 wrote:
    >> I'm not changing jobs, but I've been contacted about some contract
    >> opportunities that (reportedly) are difficult but seem easy enough to
    >> me, manipulating genome files to produce various kinds of reports,
    >> graphs, etc. I have zero experience in this, so I'm just wondering ...


    JE> The usual problem is the huge volume of data that needs processing.
    JE> Therefore typically the standard algorithms don't work any more and you
    JE> need a really strong background in data processing.
    JE> Perl is not necessarily the best choice here. Perl's powerful features
    JE> make it easy to write code that seems to do the job, but it won't scale
    JE> from the small test samples to the huge actual data set where you really
    JE> need special methods and optimizations.

    JE> A little while ago there was someone posting questions here regularly
    JE> about how to deal with genome sequences. I don't know if he is still
    JE> around, but maybe you can check the archives and contact him.

    i will disagree on this. first off, perl is major in the biotech world
    for several reasons. one it is the best at text processing and most
    large genetic files are just plain text formats. secondly, there is
    a large package called bioperl (with its own mailing list and community)
    that does tons of standard things on those files and more. finally, if
    you look back a bit, there is a great article called 'how perl saved the
    human genome project'. when that project was initially running it was
    distributed over many labs worldwide. and they created many new
    incompatible file formats for the data. the author of cgi.pm (who is
    really an MD and genetic researcher) designed perl modules to convert
    those formats to a common set of core formats so they could easily
    exchange data. so perl has a strong tie to the biotech industry that is
    not likely to be broken for a long while.

    as for jobs, i don't see many leads in that industry but they are
    usually looking for direct experience in it (hard to get from the
    outside) and/or higher degrees in related fields because you would be
    working in such an environment where you need it.

    so if the OP can learn enough from books and practice to get a job in
    the field, i say go for it. there may be other hurdles to jump but i
    can't predict what they will be.

    uri
    perlhunter.com (so i know something about the perl job market)

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Oct 26, 2009
    #6
  7. ccc31807 wrote:
    > ...
    >You seem to have a handle on what's going on. Is using Perl for
    >bioinformatics totally off the wall, or a reasonable option for data
    >mangling?
    >


    I think that Perl is the primary language for bioinformatics.
    I can't back that up with numbers but I have been working in
    bioinformatics since 1992. Some of the younger bioinformaticians
    might want to make a case for Python, but I'm skeptical.

    My philosophy is to use Perl until it becomes necessary to
    write something in C. It rarely becomes necessary.

    Learning databases and statistics is also of great importance.

    --bks
    Bradley K. Sherman, Oct 26, 2009
    #7
  8. On Mon, 26 Oct 2009 17:00:49 +0100, ccc31807 wrote:

    > You seem to have a handle on what's going on. Is using Perl for
    > bioinformatics totally off the wall, or a reasonable option for data
    > mangling?


    I have no idea about bioinformatics, but Perl is easy enough that you
    should be able to get a book, jot down a quick & dirty test script and
    just sic it on your biggest and meanest data set.

    Then you get a quick handle on how long basic stuff takes. If it works
    fast enough, fine; if not, feel free to ask here. And if you find that
    it's just not the right tool, then you won't have lost much.

    IMO, the deal breaker will be if you have to handle data in an O(n^2)
    fashion (or worse), i.e. where one would really use some very special
    index structure, especially if the whole data set does not fit into RAM.
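
    To illustrate the O(n^2) point with a hedged sketch (the shared_ids()
    name and the sample IDs are made up for illustration, not taken from
    any real data set): comparing every record against every other record
    with nested loops is quadratic, while building a hash index over one
    list first turns the same job into roughly one pass over each list.

```perl
use strict;
use warnings;

# Hypothetical helper: return the IDs that occur in both lists, in the
# order they first appear in the second list. The hash lookup is O(1) on
# average, so the total work is about O(n + m) instead of O(n * m).
sub shared_ids {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$left;   # index the first list once
    my %reported;                        # suppress repeats in the output
    return grep { $seen{$_} && !$reported{$_}++ } @$right;
}

my @a = qw(chr1_0001 chr1_0002 chr2_0001);
my @b = qw(chr2_0001 chr3_0007 chr1_0001 chr2_0001);
print join(",", shared_ids(\@a, \@b)), "\n";   # prints "chr2_0001,chr1_0001"
```

    This only helps while the index itself fits in RAM; past that point
    you are into the on-disk index structures mentioned above.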

    Good luck!
    Jochen Lehmeier, Oct 26, 2009
    #8
  9. On Oct 26, 7:17 am, ccc31807 wrote:
    > I'm not changing jobs, but I've been contacted about some contract
    > opportunities that (reportedly) are difficult but seem easy enough to
    > me, manipulating genome files to produce various kinds of reports,
    > graphs, etc. I have zero experience in this, so I'm just wondering ...
    >
    > 1. What are the career opportunities in bioinformatics using Perl?
    >
    > 2. Looking for books, I found the following:
    >  a. Beginning Perl for Bioinformatics by James Tisdall
    >  b. Mastering Perl for Bioinformatics by James D. Tisdall
    >  c. Building Bioinformatics Solutions: with Perl, R and MySQL by
    > Conrad Bessant**
    >  d. Perl Programming for Biologists by D. Curtis Jamison
    >  e. Genomic Perl: From Bioinformatics Basics to Working Code by Rex A.
    > Dwyer
    >
    > Looking at the tables of contents, reviews, and reader comments, I
    > believe that c. is probably the best value, but it's real hard to tell
    > without buying and reading the book. Anybody have any experiences
    > using any of these books? I'd like to conserve both time and money by
    > starting with the 'best' book.
    >
    > Thanks, CC.


    I co-teach a Unix & Perl course at UC Davis that is aimed at teaching
    graduate students the basics of Perl in a biological
    context. We have specifically tried to assume no prior knowledge of
    programming as many people who take our course are new to this.

    We have made our course materials (data & documentation) freely
    available to anyone else who is interested:

    http://korflab.ucdavis.edu/Unix_and_Perl/index.html

    There is a corresponding Google Group for discussion of issues arising
    from the course. We also make regular updates to the documentation.
    Hope this might be of use to you.

    Keith
    Keith Bradnam, Oct 26, 2009
    #9
  10. Jürgen Exner wrote:
    > ccc31807 wrote:
    >> I'm not changing jobs, but I've been contacted about some contract
    >> opportunities that (reportedly) are difficult but seem easy enough to
    >> me, manipulating genome files to produce various kinds of reports,
    >> graphs, etc. I have zero experience in this, so I'm just wondering ...

    >
    > The usual problem is the huge volume of data that needs processing.
    > Therefore typically the standard algorithms don't work any more and you
    > need a really strong background in data processing.


    Isn't that exactly Perl's strength?

    > Perl is not necessarily the best choice here. Perl's powerful features
    > make it easy to write code that seems to do the job, but it won't scale
    > from the small test samples to the huge actual data set where you really
    > need special methods and optimizations.


    If you think about scalability as you write the code, Perl will not
    present any special scalability issues versus other languages. If you
    do not think about scalability, no language choice will protect you.

    I certainly would not implement a heavy duty multiple alignment
    algorithm directly in Perl, but I certainly might implement (and have implemented)
    things like that in Inline::C or just link pre-existing C code in via
    XS, using Perl to handle the book-keeping, memory management, IPC,
    pre-processing and parsing, post-processing, packing, unpacking, etc.

    Based on the description of "produce various kinds of reports", I
    wouldn't think they expect this to cover Smith-Waterman type of things
    anyway, but only the kind of reports that are very similar to what you
    would find in non-bioinformatics type work.

    Xho
    Xho Jingleheimerschmidt, Oct 27, 2009
    #10
  11. Xho Jingleheimerschmidt, Oct 27, 2009
    #11
  12. Dr.Ruud

    Dr.Ruud Guest

    Keith Bradnam wrote:

    > I co-teach a Unix & Perl course at UC Davis that is aimed at teaching
    > graduate students how to learn the basics of Perl in a biological
    > context. We have specifically tried to assume no prior knowledge of
    > programming as many people who take our course are new to this.
    >
    > We have made our course materials (data & documentation) freely
    > available to anyone else who is interested:
    >
    > http://korflab.ucdavis.edu/Unix_and_Perl/index.html
    >
    > There is a corresponding Google Group for discussion of issues arising
    > from the course. We also make regular updates to the documentation.
    > Hope this might be of use to you.


    I Like It.

    --
    Ruud
    Dr.Ruud, Oct 28, 2009
    #12
