# [ANN] "Python for Bioinformatics" available and in stock

Discussion in 'Python' started by Sebastian Bassi, Oct 18, 2009.

1. ### Sebastian Bassi

I announced earlier that Python for Bioinformatics was ready; now I want to
announce that it is available and in stock at most booksellers.
Worldwide, use Amazon.
In Argentina, it is more convenient to buy it from me at MercadoLibre:
Here is my stock:
http://www.flickr.com/photos/sbassi/4018649164/sizes/l/

Book announcement:

"Python for Bioinformatics"
ISBN 1584889292
Amazon: http://www.tinyurl.com/biopython
Publisher: http://www.crcpress.com/product/isbn/9781584889298

This book introduces programming concepts to life science researchers,
bioinformaticians, support staff, students, and everyone who is
interested in applying programming to solve biologically-related
problems. Python is the chosen programming language for this task
because it is both powerful and easy-to-use.

It goes from the basic aspects of the language (such as data types and
control structures) up to essential skills for today's bioinformatics
tasks, such as building web applications and using relational database
management systems, XML and version control. There is a chapter
devoted to Biopython (www.biopython.org) since it can be used for most
of the tasks related to bioinformatics data processing.

There is a section of applications with source code, featuring
sequence manipulation, filtering vector contamination, calculating DNA
melting temperature, parsing a GenBank file, inferring splicing sites,
and more.

There are questions at the end of every chapter, and the odd-numbered
questions are answered in an appendix, making this text suitable for
classroom use.

This book can also be used as reference material, as it includes
Richard Gruet's Python Quick Reference and the Python Style Guide.

DVD: The included DVD features a virtual machine with a special
edition of DNALinux, with all the programs and complementary files
required to run the scripts discussed in the book. All scripts can be
tweaked to fit a particular configuration. By using a pre-configured
environment, the reader works with the same setup as the author, so he
can focus on learning Python. All code is also available at
http://py3.us/##, where ## is the code number, for example: http://py3.us/57

I've been working on this book for more than two years, testing the
examples under different setups and working to make the code
compatible with most versions of Python, Biopython and operating
systems. Where there is code that only works with a particular
dependency, this is clearly noted.

Finally, I want to highlight that non-bioinformaticians out there can
use this book as an introduction to bioinformatics by starting with
the included "Diving into the Gene Pool with BioPython" (by Zachary
Voase, originally published in Python Magazine).

--
Sebastián Bassi. Diplomado en Ciencia y Tecnología.

you agree, on behalf of your employer, to release me from all
obligations and waivers arising from any and all NON-NEGOTIATED
browsewrap, confidentiality, non-disclosure, non-compete and
acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.

Sebastian Bassi, Oct 18, 2009

2. ### Bearophile

Re: "Python for Bioinformatics" available and in stock

Sebastian Bassi, this is a piece from #5:

ProtSeq = raw_input("Protein sequence: ").upper()
ProtDeg = {"A":4,"C":2,"D":2,"E":2,"F":2,"G":4,"H":2,
"I":3,"K":2,"L":6,"M":1,"N":2,"P":4,"Q":2,
"R":6,"S":6,"T":4,"V":4,"W":1,"Y":2}
SegsValues = []
for aa in range(len(ProtSeq)):

More Pythonic code would be:

prot_seq = raw_input("Protein sequence: ").upper()
prot_deg = {...
segs_values = []
for aa in xrange(len(prot_seq)):

Note the use of xrange and names_with_underscores. In Python names are
usually lower case and their parts are separated by underscores.
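
A further step in the same direction, sketched here under assumptions rather than taken from the book: when the index itself is not needed, the loop can iterate over the residues directly. The helper name degeneracy_values and the test sequence are hypothetical; the table is the one quoted from #5 (its values match the number of codons per amino acid in the standard genetic code).

```python
# Table copied from the snippet quoted from #5 above.
prot_deg = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2,
            "I": 3, "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2,
            "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2}


def degeneracy_values(prot_seq):
    """Return the table value for each residue (hypothetical helper)."""
    # Iterating over the string avoids range(len(...)) and indexing.
    return [prot_deg[aa] for aa in prot_seq.upper()]


# Hypothetical four-residue input, not from the book.
values = degeneracy_values("mkwl")
```

This runs unchanged on Python 2 and Python 3, which sidesteps the range/xrange choice for this particular loop.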

From #6:

segsvalues=[]; segsseqs=[]; segment=protseq[:15]; a=0
==>
segs_values = []
segs_seqs = []
segment = prot_seq[:15]
a = 0

If you want to limit the space in the book then you can pack those
lines in a single line, but it's better to keep the underscores.

From #18:
prop = 100.*cp/len(AAseq)
return (charge,prop)
==>
prop = 100.0 * cp / len(aa_seq)
return (charge, prop)

Adding spaces between operators and after a comma, and a zero after
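
The float literal in that line is not only cosmetic: in Python 2, dividing one integer by another performs integer division. A minimal sketch, assuming cp is a count of charged residues; the function name and the residue set "DEHKR" are hypothetical and not taken from the book's listing:

```python
def charged_percentage(aa_seq, charged="DEHKR"):
    """Percentage of residues in aa_seq found in the (assumed) charged set."""
    cp = sum(1 for aa in aa_seq.upper() if aa in charged)
    # 100.0 keeps the whole expression in floating point, also on Python 2.
    return 100.0 * cp / len(aa_seq)


# Hypothetical input: two of the four residues are in the assumed set.
prop = charged_percentage("DKAA")
```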

From #35:
import re
pattern = "[LIVM]{2}.RL[DE].{4}RLE"
....
rgx = re.compile(pattern)
When the pattern gets more complex it's better to show readers how to use
a re.VERBOSE pattern, to split it over several lines, indent those lines as
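
A minimal sketch of what that could look like for the pattern from #35. Only the pattern itself comes from the listing; the comments and the test sequence are illustrative assumptions:

```python
import re

# Same motif as the compact one-line pattern, spelled out with re.VERBOSE.
# In verbose mode, whitespace and "#" comments inside the pattern are ignored.
pattern = r"""
    [LIVM]{2}   # two residues, each one of L, I, V or M
    .           # any single residue
    RL          # Arg-Leu
    [DE]        # Asp or Glu
    .{4}        # any four residues
    RLE         # Arg-Leu-Glu
"""
rgx = re.compile(pattern, re.VERBOSE)

# The compact form, kept for comparison.
compact = re.compile("[LIVM]{2}.RL[DE].{4}RLE")

# Hypothetical sequence, written by hand to contain the motif once.
seq = "MKTLIARLDAAAARLEGG"
match = rgx.search(seq)
```

Both compiled patterns match the same text; the verbose form only changes how the pattern reads in the source.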

The #51 is missing.

I like Python and I think Python is fit for bioinformatics purposes,
but 3/4 of the purposes of a book like this are to teach
bioinformatics first and computer science and Python second. And
sometimes a dynamic language isn't fast enough for bioinformatics
[...] pieces of C/D/Java code too, to show and discuss implementations of
algorithms that require more heavy computations (that are often
already implemented inside biopython, etc, but someone has to write
those libs too).
The purpose here is not to teach how to write industrial-strength C
libraries to perform those heavier computations, but to give the
reader an idea (in a real lower-level language) how those libraries
are actually implemented. Because science abhors black boxes, a
scientist must have an idea of how all subsystems she/he/hir is using
are working inside (that's why using Mathematica can be bad for a
scientist, because such person has to write "and here magic happens"
in the produced paper).

Bye,
bearophile

Bearophile, Oct 19, 2009

3. ### Sebastian Bassi

Re: "Python for Bioinformatics" available and in stock

On Mon, Oct 19, 2009 at 5:43 AM, Bearophile wrote:
> More Pythonic code would be:

....
> Note the use of xrange and names_with_underscores. In Python names are
> usually lower case and their parts are separated by underscores.

Regarding underscores (and code notation in general), I wrote in the
book (page 6):

Some code in the book will not follow accepted coding styles for the
following reasons:

* There are some instances where the most didactic way to show a
particular piece of code conflicts with the style guide. On those few
occasions, I choose to deviate from the style guide in favor of
clarity.
* Due to size limitation in a printed book, some names were shortened
and other minor drifts from the coding styles have been introduced.
* To show there are more than one way to write the same code. Coding
style is a guideline, so some programmers don't follow them. You
should be able to read "bad" code, since sooner or later you will
have to read other people's code.

> From #6:

.....
> If you want to limit the space in the book then you can pack those
> lines in a single line, but it's better to keep the underscores.

#5 to #9 are very introductory programs that are introduced in order
to show standard flow control structures (if-elif-else-for-while). I
think that at this level, packing several lines into one is not the
best option for learning.

> From #18:

> prop = 100.*cp/len(AAseq)
> return (charge,prop)
> ==>
> prop = 100.0 * cp / len(aa_seq)
> return (charge, prop)

> Adding spaces between operators and after a comma, and a zero after

Yes, you are right.

> From #35:

> import re
> pattern = "[LIVM]{2}.RL[DE].{4}RLE"
> ...
> rgx = re.compile(pattern)
> When the pattern gets more complex it's better to show readers how to use
> a re.VERBOSE pattern, to split it over several lines, indent those lines as

This is a very nice suggestion. I will consider it for the next edition, but
the book is about 600 pages now, so I have to consider very carefully

> The #51 is missing.

Thank you, it is corrected now. It was an HTML file instead of a .py
file so the script I use didn't notice the original file.

> I like Python and I think Python is fit for bioinformatics purposes,
> but 3/4 of the purposes of a book like this are to teach
> bioinformatics first and computer science and Python second. And

This book does not teach bioinformatics, let me copy the "Who Should

"This book is for the life science researcher who wants to learn how
to program. He may have previous exposure to computer programming, but
this is not necessary to understand this book (although it surely
helps).

This book is designed to be useful to several separate but related
audiences, students, graduates, postdocs, and staff scientists, since
all of them can benefit from knowing how to program.

Exposing students to programming at early stages in their career helps
to boost their creativity and logical thinking, and both skills can be
applied in research. In order to ease the learning process for
students, all subjects are introduced with the minimal prerequisites.
There are also questions at the end of each chapter. They can be used
for self-assessing how much you've learnt. The answers are available
to teachers in a separate guide.

Graduates and staff scientists having actual programming needs should
find its several real world examples and abundant reference material
extremely valuable.

Since this book is called \emph{Python for Bioinformatics} it has been
written with the following assumptions in mind:

\begin{itemize}
\item The reader should know how to use a computer. No programming
knowledge is assumed, but the reader is required to have minimum
computer proficiency to be able to use a text editor and handle basic
most instructions from this book will apply to the most common
operating systems (Windows, Mac OSX and Linux); when there is a
command or a procedure that applies only to a specific OS, it will be
clearly noted.

\item The reader should be working (or at least planning to work) with
bioinformatics tools. Even low scale hand made jobs, such as using the
NCBI BLAST to ID a sequence, aligning proteins, primer searching, or
estimating a phylogenetic tree will be useful to follow the examples.
The more familiar the reader is with bioinformatics the better he will
be able to apply the concepts learned in this book.
\end{itemize}

> sometimes a dynamic language isn't fast enough for bioinformatics

That depends on which bioinformatics application you are working on. I
think that 3D molecular modeling is a field suited to a low-level
language like C or Fortran, but most bioinformatics applications, like
sequence annotation, primer design, sequence processing and curating
biological databases, are handled fine with a scripting language like
Python or Perl (btw, Perl is still the most used language in
bioinformatics).

If you are looking for an introduction to bioinformatics, I don't
think this is a suitable book. But if you want to learn Python for
use in bioinformatics, you should give it a try.

Best,
SB.

Sebastian Bassi, Oct 21, 2009
4. ### biotic.computer

Great article. Python is great for doing bioinformatics; check this blog:
bioticcomputer.blogspot.com/

It is about DNA simulation with very simple programming syntax and a tutorial.
Have fun.

biotic.computer, Nov 29, 2009