
  • unique reads for downstream analysis

    I have a general question about short, fixed-length reads. I am asking with Solexa data in mind, but it may apply directly to SOLiD as well.

    Once you have the set of reads, do people prefer to reduce it to a unique, non-redundant set before analyses such as SNP discovery, ChIP-seq, etc.? Do exactly identical reads carry any information?

    For de novo assembly, Velvet behaves slightly differently with a unique set of reads than with some of the reads repeated.
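For anyone who wants to try both inputs, here is a minimal sketch of collapsing reads to a non-redundant set while keeping the copy number of each sequence. It assumes the reads are already loaded as plain strings; the function name `collapse_reads` and the example reads are made up for illustration and do not come from any particular tool.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse exactly identical reads.

    Returns the unique reads in first-seen order, plus a Counter
    recording how many copies of each sequence were present.
    """
    counts = Counter(reads)
    unique = list(counts)  # Counter keeps insertion order (Python 3.7+)
    return unique, counts

# Hypothetical toy reads, for illustration only.
reads = ["ACGTACGT", "ACGTACGT", "TTGACCAA", "ACGTACGT", "GGGTTTAA"]
unique, counts = collapse_reads(reads)
print(unique)
print(counts["ACGTACGT"])
```

Keeping the counts alongside the unique set means the copy-number information is not lost, so the same data can still feed an analysis where multiplicity matters.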
    --
    bioinfosm

  • #2
    I have a similar question. I have seen people recommend removing redundant reads for ChIP-seq to minimize the risk of amplification bias. I tried it, and it certainly affects peak finding a lot. I guess you should not reduce to unique reads for RNA-seq or small RNA-seq. I would be glad to hear about other people's experience and comments.



    • #3
      I can imagine that the answer depends on the project.
      If, in a low-coverage sequencing project, I have many reads starting at the same position, this would suggest PCR duplicates, especially for paired-end reads (assuming that PE duplicates are those for which both reads are exactly duplicated).
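A rough sketch of that paired-end check, assuming the pairs are already aligned and each pair is reduced to a tuple of chromosome, start and strand of both reads. The tuple layout and the name `mark_pair_duplicates` are illustrative assumptions, not the behaviour of any specific duplicate-marking tool.

```python
def mark_pair_duplicates(pairs):
    """Flag read pairs whose two reads both repeat an earlier pair.

    pairs: iterable of (chrom, start1, strand1, start2, strand2) tuples.
    Returns a list of booleans, True where the pair is a repeat.
    """
    seen = set()
    flags = []
    for key in pairs:
        flags.append(key in seen)  # duplicate only if BOTH reads match
        seen.add(key)
    return flags

# Hypothetical aligned pairs, for illustration only.
pairs = [
    ("chr1", 100, "+", 350, "-"),
    ("chr1", 100, "+", 350, "-"),  # both reads identical: flagged
    ("chr1", 100, "+", 362, "-"),  # same first read, different mate: kept
]
flags = mark_pair_duplicates(pairs)
print(flags)
```

Requiring both reads to match is what makes paired-end data more informative here: two independent fragments can share one start by chance much more easily than they can share both ends.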

      A highly expressed short gene in RNA-seq, on the other hand, will have many reads that start at the same position without them being PCR duplicates.
      Just removing them would then lead to an underestimate of expression level.

      The recent Sanger paper (Kozarewa et al.) calculated expected duplicate frequencies based on average coverage and read length for whole genome sequencing.
      This makes sense, but I think only if you have an even distribution of coverage across the genome.
      If something with excess coverage, such as mtDNA, is present, I would overestimate the PCR duplicate frequency if I assumed all duplicates were due to amplification bias.

      So, yes, if it were easy to distinguish between duplicates due to high coverage and PCR duplicates, it might be preferable to eliminate the PCR duplicates, at least for SNP calling and for RNA-seq, where counts matter...

      But again, in RNA-seq for example, how do you distinguish between duplicates due to high coverage and PCR duplicates? Maybe by calculating an expected duplicate frequency as in the Sanger paper, but on a gene-by-gene basis?
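To make the gene-by-gene idea concrete, here is a back-of-the-envelope sketch. It is not the calculation from Kozarewa et al.; it is a simple occupancy model that assumes read starts fall uniformly at random over the possible start positions of a gene, and it ignores strand, paired ends and uneven fragmentation. The function name and the example numbers are hypothetical.

```python
def expected_duplicate_fraction(n_reads, n_positions):
    """Expected fraction of reads sharing a start with an earlier read.

    Assumes n_reads start positions drawn uniformly at random from
    n_positions possible starts (a simplifying assumption).
    """
    if n_reads == 0:
        return 0.0
    # Expected number of distinct start positions that get hit.
    p_empty = (1.0 - 1.0 / n_positions) ** n_reads
    expected_distinct = n_positions * (1.0 - p_empty)
    return 1.0 - expected_distinct / n_reads

# Hypothetical highly expressed short gene: 1000 reads over 1000 starts.
high = expected_duplicate_fraction(1000, 1000)
# Hypothetical low-coverage region: 10 reads over 1,000,000 starts.
low = expected_duplicate_fraction(10, 1_000_000)
print(high, low)
```

Under this model, a heavily covered gene is expected to show a large share of same-start reads with no amplification bias at all, so only an excess over the expected fraction would point towards PCR duplicates.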

      May I ask what you mean by "velvet behaves slightly different"?



      • #4
        Thanks for the notes.

        Velvet does de novo assembly and gives different results if you input a non-redundant set of reads than if you input all the reads.
        Another de novo tool, Edena, produces a non-redundant set of reads before it proceeds with the assembly...
        --
        bioinfosm
