Monday, June 25, 2007

Quality values in new sequencing technologies?

At the Finishing in the Future conference, quite a few talks described attempts to characterize the quality of the data produced by new sequencing technologies (primarily 454 and Solexa, the only two currently available to more than a select few). At least for 454, the quality values reported by their assembler appear to be a bit optimistic. Jim Knight from 454 mentioned that, in his view, these values are correct, i.e. they reflect the true accuracy of estimating the DNA sequence from the raw information produced by the sequencer. One possible explanation for the discrepancy between the perceived quality of the 454 data and the reported values is that errors are introduced at other stages of the sequencing process. For example, in Sanger sequencing, PCR errors cause a degradation in signal, leading to a corresponding decrease in quality values. For 454 (and other high-throughput technologies), it is possible that some errors occurring during the amplification steps sneak through without any degradation of signal, leading to overly optimistic quality values. Is there a way to quantify the magnitude of such errors and so arrive at a better estimate of the quality of the data?
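One way to make the question concrete is to compare reported quality values with observed error rates. Under the usual Phred convention, a quality value Q corresponds to an error probability of 10^(-Q/10). The sketch below is only an illustration, not anything 454 or the conference speakers described: it assumes the base calls have already been aligned to a trusted reference (for example, finished Sanger sequence), so that each base comes with its reported quality and a flag saying whether it disagrees with the reference. The function names and the add-one smoothing are my own assumptions.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Phred convention: Q = -10 * log10(P), so P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Inverse of phred_to_error_prob; p must be greater than 0."""
    return -10 * math.log10(p)


def empirical_quality(observations):
    """Map each reported quality value to the quality actually observed.

    observations: iterable of (reported_q, is_error) pairs, one per base,
    where is_error is True if the base call disagrees with the trusted
    reference. How these pairs are obtained (alignment, filtering) is
    outside this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [errors, total]
    for reported_q, is_error in observations:
        counts[reported_q][0] += int(is_error)
        counts[reported_q][1] += 1

    result = {}
    for q, (errors, total) in sorted(counts.items()):
        # Add-one smoothing (an assumption of this sketch) avoids
        # log10(0) when a bin happens to contain no errors.
        rate = (errors + 1) / (total + 2)
        result[q] = error_prob_to_phred(rate)
    return result


# Toy numbers, invented for illustration only (not real 454 data):
# 998 bases reported at Q30, 9 of which disagree with the reference.
toy = [(30, True)] * 9 + [(30, False)] * 989
print(empirical_quality(toy))  # the Q30 bin comes out at about Q20
```

If the empirical value for a bin falls well below the reported one, the gap gives a rough measure of the errors that never showed up as a degraded signal, whatever their source.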

1 comment:

Niranjan said...

I would first try to figure out the kinds of experimental errors 454 may suffer from by doing extensive comparisons with Sanger data. Maybe there is a pattern to the situations where 454 says it has the sequence but differs a lot from what Sanger produces.