Most helpful customer reviews
34 of 36 people found the following review helpful.
Excellent overview of probabilistic computational biology
By Dr. Lee D. Carlson
This book is a very well written overview of hidden Markov models and context-free grammar methods in computational biology. The authors have written a book that is useful to both biologists and mathematicians. Biologists with a background in probability theory equivalent to a senior-level course should be able to follow along without any trouble. The approach the authors take in the book is very intuitive, and they motivate the concepts with elementary examples before moving on to the more abstract definitions. Exercises also abound in the book; they are straightforward enough to work out, and should be worked if one wants an in-depth understanding of the main text. In addition, there is a software package called HMMER, developed by one of the authors (Eddy), that is in the public domain and can be downloaded from the Internet. The package specifically uses hidden Markov models to perform sequence analysis using the methods outlined in the book.
Probabilistic modeling has been applied to many different areas, including speech recognition, network performance analysis, and computational radiology. An overview of probabilistic modeling is given in the first chapter, and the authors effectively introduce the concepts without heavy abstract formalism, which for completeness they delegate to the last chapter of the book. Bayesian parameter estimation is introduced as well as maximum likelihood estimation. The authors take a pragmatic attitude toward the utility of these different approaches, with both being developed in the book.
This is followed by a treatment of pairwise alignment in Chapter 2, which begins with substitution matrices. They point out, through a number of exercises, the role of physics in influencing particular alignments (hydrophobicity, for example). Global alignment via the Gotoh algorithm and local alignment via the Smith-Waterman algorithm are both discussed very effectively. Finite state machines with accompanying diagrams are used to discuss dynamic programming approaches to sequence alignment. The BLAST and FASTA packages are briefly discussed, along with the PAM and BLOSUM matrices.
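For readers unfamiliar with local alignment, the dynamic programming recurrence behind Smith-Waterman can be sketched in a few lines of Python. This is only an illustration, not the book's presentation: the function name and the match, mismatch, and linear gap scores are assumptions chosen for the example, whereas real use would draw scores from a substitution matrix such as PAM or BLOSUM and typically use affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Scoring values are illustrative assumptions (simple match/mismatch
    scores and a linear gap penalty), not taken from the book.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best
```

The clamp at zero is what makes the alignment local: a poorly scoring prefix is simply dropped instead of being carried along, as it would be in global alignment.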
Hidden Markov models are treated thoroughly in the next chapter, with the Viterbi and Baum-Welch algorithms playing the central role. Hidden Markov models are then used in Chapter 4 for pairwise alignment. State diagrams are again used very effectively to illustrate the relevant ideas. Profile hidden Markov models, which according to the authors are the most popular application of hidden Markov models, are treated in detail in the next chapter. A very surprising application of Voronoi diagrams from computational geometry to weighting training sequences is given.
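As a rough illustration of the Viterbi algorithm mentioned above, here is a minimal Python sketch that finds the most probable state path of an HMM in log space. The function signature and the dictionary-based model layout are assumptions made for this example, not the book's notation, and the sketch assumes a non-empty observation sequence and strictly positive probabilities.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path and its log probability.

    Assumes obs is non-empty and all probabilities are > 0
    (so that math.log is defined). The dict layout is hypothetical.
    """
    # Log score of the best path ending in each state after the first symbol
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: V[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (V[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][o]))
            ptr[s] = prev
        V.append(scores)
        back.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, V[-1][last]
```

Working with log probabilities avoids numerical underflow on long sequences, which matters for biological sequences that can run to thousands of symbols.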
Several different approaches, such as Barton-Sternberg, CLUSTALW, Feng-Doolittle, MSA, simulated annealing, and Gibbs sampling, are applied to multiple sequence alignment in Chapter 6. It is very well written, the only disappointment being that only one exercise is given in the entire chapter. Phylogenetic trees are covered in Chapter 7, with particular emphasis placed on tree-building algorithms using parsimony. The next chapter discusses the same topic from a probabilistic perspective. This to me was the most interesting part of the book, as it connects the sequence alignment algorithms with evolutionary models.
The authors switch gears starting with the next chapter on transformational grammars. It is intriguing to see how concepts used in compiler construction can be generalized to the probabilistic case and then applied to computational biology. The PROSITE database is given as an example of the application of regular grammars to sequence matching. This chapter is fascinating reading, and there are a number of straightforward exercises illustrating the main points.
The last chapter covers RNA structure analysis and introduces the concept of a pseudoknot. These are not to be confused with the usual knot constructions that can be applied to the topology of DNA, but rather result from the existence of non-nested base pairs in RNA sequences. The authors discuss a number of other techniques used in RNA sequence analysis and take care to point out which ones are more practical from a computational point of view. Surprisingly, genetic algorithms and algorithms based on Monte Carlo sampling are not discussed in the book, but the authors do give references for the interested reader.
The best attribute of this book is that the authors take a pragmatic view of how mathematics can be applied to problems in computational biology. They are not dogmatic about any particular approach, but rather fit the algorithm to the problem at hand.
21 of 23 people found the following review helpful.
Fantastic Descriptions of Probabilistic Sequence Algorithms
By Bob Carpenter
I picked up this book at the recommendation of a number of colleagues in computational linguistics and speech processing as a way to find out what's going on in biological sequence analysis. I was hoping to learn about applications of the kinds of algorithms I know for handling speech and language, such as HMM decoding and context-free grammar parsing, to biological sequences. This book delivered, as recommended.
As the title implies, "Biological Sequence Analysis" focuses almost exclusively on sequence analysis. After a brief overview of statistics (more a reminder than an introduction), the first half of the book is devoted to alignment algorithms. These algorithms take pairs of sequences of bases making up DNA or sequences of amino acids making up proteins and provide optimal alignments of the sequences or of subsequences according to various statistical models of match likelihoods. Methods analyzed include edit distances with various substitution and gapping penalties (penalties for sections that don't match), Hidden Markov Models (HMMs) for alignment and also for classification against families, and finally, multiple sequence alignment, where alignment is generalized from pairs to sets of sequences. I found the section on building phylogenetic trees by means of hierarchical clustering to be the most interesting part of the book (especially given its practical application to classifying wine varietals!). The remainder of the book is devoted to higher-order grammars such as context-free grammars, and their stochastic generalization. Stochastic context-free grammars are applied to the analysis of RNA secondary structure (folding). There is a good discussion of the CYK dynamic programming algorithm for non-deterministic context-free grammar parsing, an algorithm that is easily applied to finding the best parse in a probabilistic grammar. The presentations of the dynamic programming algorithms for HMM decoding, edit distance minimization, hierarchical clustering and context-free grammar parsing are as good as I've seen anywhere. They are precise, insightful, and informative without being overly subscripted. The illustrations provided are extremely helpful, including their positioning on pages where they're relevant.
This book is aimed at biologists trying to learn about algorithms, which is clear from the terse descriptions of the underlying biological problems. The technical details were so clear, though, that I was able to easily follow the algorithms even if I wasn't always sure about the genetic applications. After studying a number of introductions to genetics and coming back to this book, I was able to follow the application discussions much more easily. This book assumes the reader is familiar with algorithms and is comfortable manipulating a lot of statistics; a gentler introduction to the very same mathematics and algorithms can be found in Jurafsky and Martin's "Speech and Language Processing". For biologists who want to see how sequence statistics and algorithms are applied to language, I would suggest Manning and Schuetze's "Foundations of Statistical Natural Language Processing". Although it is much more demanding computationally, more details on all of these algorithms, as well as much more background on the biology, along with a lot of really nifty complexity analysis, can be found in Dan Gusfield's "Algorithms on Strings, Trees and Sequences".
In these days of fly-by-night copy-editing and typesetting, I really appreciate Cambridge University Press's elegant style and attention to detail. Durbin, Eddy, Krogh and Mitchison's "Biological Sequence Analysis" is as beautiful and readable as it is useful.
14 of 16 people found the following review helpful.
Best practical introduction
By biochemprof
This is the best introduction to modern probabilistic sequence analysis methods. However, the book suffers from somewhat convoluted writing and organization. More importantly, it lacks a broader theoretical overview of the different methods. The methods are presented as a collection of tools without sufficient critical assessment of their effectiveness or the relative strengths of their underlying theoretical models. I would have welcomed more discussion of how they all fit into a larger probabilistic picture... what are the different simplifications and assumptions made for the sake of simplicity and computation?