Tuesday, October 23, 2007

How to read a paper

I recently gave a presentation in CMSC838 (How to do research) on reading scientific papers. Here are my slides, rough as they may be.

http://www.cs.umd.edu/~gasarch/838/how-to-read-a-paper.pdf

Preparing for this class was quite interesting. I found out that many others have put in their two cents on this matter. I also found this link to one of Steven Salzberg's papers:

http://www.sciencemag.org/feature/data/scope/keystone1/

The paper is presented in its entirety, nicely annotated to highlight the key points. Quite the resource for anyone learning how to read a scientific paper.

Thursday, June 28, 2007

Monday, June 25, 2007

Quality values in new sequencing technologies?

At the Finishing in the Future conference there were quite a few talks describing attempts to characterize the quality of the data produced by new sequencing technologies (primarily 454 and Solexa, the only two currently available to more than a select few). At least for 454, the quality values reported by their assembler appear to be somewhat optimistic. Jim Knight from 454 argued that these values are in fact correct, i.e., they accurately reflect how well the DNA sequence can be estimated from the raw information produced by the sequencer. If that is the case, one possible explanation for the discrepancy between the perceived quality of the 454 data and the reported values is errors introduced at other stages of the sequencing process. For example, in Sanger sequencing, PCR errors cause a degradation in signal, leading to a corresponding decrease in quality values. For 454 (and other high-throughput technologies) it is possible that some errors occurring during the amplification steps sneak through without any degradation of signal, leading to overly optimistic quality values. Is there a way to quantify the magnitude of such errors to create a better estimate of the quality of the data?
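One simple way to start quantifying this is an empirical calibration check: compare the reported quality values against error rates actually observed in the data. The sketch below is not a method presented at the conference. It assumes that quality values are Phred-scaled (Q = -10 log10 P(error)) and that reads can be aligned to a finished reference of the same organism, so that every mismatch can be counted as a sequencing error. The function names are hypothetical. Because errors introduced during amplification still show up as mismatches against the reference, whether or not they degraded the signal, a gap between the reported and empirical values within a quality bin gives a rough measure of such errors.

```python
import math
from collections import defaultdict


def phred_to_error(q):
    """Convert a Phred-scaled quality value to an error probability (assumed scale)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(p)


def empirical_quality(calls):
    """Estimate the empirical quality for each reported quality value.

    calls: iterable of (reported_quality, is_error) pairs. This assumes each
    base call has already been compared against a finished reference
    (hypothetical input; no specific aligner or file format is implied).
    Returns {reported_q: empirical_q}.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [errors, total]
    for q, is_error in calls:
        counts[q][0] += int(is_error)
        counts[q][1] += 1

    result = {}
    for q, (errors, total) in sorted(counts.items()):
        # Add-one smoothing keeps bins with zero observed errors finite.
        p = (errors + 1) / (total + 2)
        result[q] = error_to_phred(p)
    return result


# Toy, made-up example: 1000 bases reported at Q30 (expected ~1 error),
# but 10 mismatches observed against the reference.
toy_calls = [(30, True)] * 10 + [(30, False)] * 990
print(empirical_quality(toy_calls))  # empirical quality of roughly Q20 for the Q30 bin
```

In this toy case the bases labeled Q30 behave more like Q20, the kind of optimism described above. With real data, mismatches caused by genuine differences between the sequenced strain and the reference would also need to be filtered out before drawing conclusions.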

Wednesday, June 13, 2007

Finishing in the future?

Just to get the blog started: next week I will be attending a conference on "Finishing in the future". This conference brings together researchers interested in ways to improve the process of finishing a genome, the laborious process aimed at correctly reconstructing every single base in the genome of an organism. This process is quite expensive, in both time and money, and many current sequencing projects opt to produce a "draft" or "high-quality draft" instead of a complete (and correct) sequence. What would it take to reverse this trend? This is what the participants at this conference are trying to figure out.

I will be giving a keynote presentation highlighting some of the challenges encountered when attempting to automate the finishing process. I also have some ideas on how to improve the process, but more on that later.