  • removing duplicate PE reads from unmappable data

    This topic has been discussed a fair bit on SeqAnswers, but I haven't found the answer to this exact question, so I am throwing it out there again. I have some 100 bp PE metagenomic data sets from rather low-complexity samples. Analysis of duplication levels in read 1 and read 2 separately shows rather high levels of duplication (25-50%). I'm interested in identifying true PCR duplicates by analyzing read 1 and read 2 together, i.e. true duplicates should be identical at both ends of the molecule. These data are from a community without reference genomes, so mapping and then using samtools rmdup is not an option. Is there an existing tool for mapping-free duplicate removal of PE reads, or do I need to cobble something together? I think something like the following could work:

    1. separate out the first xx bp of read 1 and read 2, then merge using fastq-joiner in galaxy or the like
    2. remove exact duplicates using fastx-collapser in galaxy fastx toolkit
    3. extract list of non-duplicated reads from output of fastx-collapser
    4. pull these reads out of the original fastq files for read 1 and read 2
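
If nothing off the shelf fits, the four steps above can probably be collapsed into a single pass over both files. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes plain 4-line FASTQ records, that the read 1 and read 2 files list mates in the same order, and that the set of prefix pairs fits in memory. The names `read_fastq`, `dedup_pairs`, and `prefix_len` (standing in for the "xx bp" in step 1) are made up for illustration.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples.

    Assumes strict 4-line FASTQ records with no wrapped sequence lines.
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def dedup_pairs(r1_handle, r2_handle, prefix_len=50):
    """Yield (read1, read2) record pairs, dropping exact duplicate pairs.

    A pair counts as a duplicate when the first prefix_len bases of BOTH
    mates match a pair seen earlier; the first occurrence is kept.
    Assumes the two files are in sync (same number and order of reads).
    """
    seen = set()
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        # Steps 1-2: combined key from the start of read 1 and read 2.
        key = (rec1[1][:prefix_len], rec2[1][:prefix_len])
        if key not in seen:
            seen.add(key)
            # Steps 3-4: emit the full original records for the kept pair.
            yield rec1, rec2


# Toy example: pairs a and b are identical at both ends; c differs in read 2.
r1 = io.StringIO(
    "@a/1\nACGTAC\n+\nIIIIII\n@b/1\nACGTAC\n+\nIIIIII\n@c/1\nACGTAC\n+\nIIIIII\n"
)
r2 = io.StringIO(
    "@a/2\nTTGCAA\n+\nIIIIII\n@b/2\nTTGCAA\n+\nIIIIII\n@c/2\nGGGCAA\n+\nIIIIII\n"
)
kept = [rec1[0] for rec1, _ in dedup_pairs(r1, r2, prefix_len=4)]
print(kept)
```

In the toy data, pair b is dropped because it matches pair a at both ends, while pair c is kept even though its read 1 is identical, which is the behaviour asked for above. Note that this only catches exact matches, so a single sequencing error inside the prefix hides a duplicate; a shorter `prefix_len` trades specificity for tolerance of errors toward the 3' end.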
