Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reference-Free Validation of Short Read Data



    Reference-Free Validation of Short Read Data


    Jan Schröder1,2*, James Bailey1,2, Thomas Conway2, Justin Zobel1,2

    1 Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia, 2 NICTA Victoria Research Laboratory, Parkville, Victoria, Australia


    Background
    High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked.

    Results
    We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of k-mers; and analysis of distributions of k-mers. We apply our methodology to wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others.

    Conclusions
    The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.

  • #2
    This is an interesting paper, and it would be more interesting if the authors could stratified the plots by read quality. I guess what is happening is the base caller makes consistent errors on weak bases. In addition, part of the bias comes from the GC bias.

    Comment

    Latest Articles

    Collapse

    • seqadmin
      Recent Advances in Sequencing Analysis Tools
      by seqadmin


      The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
      05-06-2024, 07:48 AM
    • seqadmin
      Essential Discoveries and Tools in Epitranscriptomics
      by seqadmin




      The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
      04-22-2024, 07:01 AM

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by seqadmin, 05-10-2024, 06:35 AM
    0 responses
    20 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 05-09-2024, 02:46 PM
    0 responses
    24 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 05-07-2024, 06:57 AM
    0 responses
    21 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 05-06-2024, 07:17 AM
    0 responses
    21 views
    0 likes
    Last Post seqadmin  
    Working...
    X