  • Picard Markduplicates - High number of unmatched pairs :(

    Hi everyone

    I've run into a predicament lately that I'm hoping to get some advice on. We've done paired-end Illumina whole genome sequencing on a human sample.

    I have 4 lanes of data for the sample, with 2 fastq files per lane (reads1fastq and reads2fastq), which I split into 8-9 files of 10 million reads each. When splitting the files, I made sure to split by 10million*4 lines, and the script I wrote compares the first line of each split reads1 fastq file to that of its corresponding split reads2 fastq file. I then align each fastq file with BWA (trimming reads down with the parameter q=30) to create a .sai file, and generate alignments/bam files for each pair of split reads1/reads2 files. Finally, I sort and index the small bam files and merge them into one large bam file, which is the file I'm trying to run MarkDuplicates on.
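For anyone wanting to rule out the splitting step: comparing only the first line of each chunk would miss a mismatch that starts partway through a file, so a check over every record is more thorough. Below is a minimal sketch in Python, assuming plain 4-line FASTQ records and read names that match between mates once an optional trailing /1 or /2 is dropped. The function names are hypothetical and not part of the original poster's script.

```python
import itertools


def read_names(lines):
    """Yield the read name of each FASTQ record (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only the first line of each record carries the name
        parts = line.split()
        if not parts or not parts[0].startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
        name = parts[0][1:]  # drop the leading '@' and any comment after whitespace
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]  # drop the mate suffix so both files compare equal
        yield name


def first_mismatch(lines1, lines2):
    """Return (record_index, name1, name2) for the first out-of-sync record, else None.

    A missing record in the shorter file shows up as a name of None.
    """
    pairs = itertools.zip_longest(read_names(lines1), read_names(lines2))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None
```

With two open file handles for a reads1/reads2 chunk pair, `first_mismatch(fh1, fh2)` returning `None` would suggest the split kept the mates in the same order.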

    One thing I've noticed is that MarkDuplicates is telling me I have a ridiculously high number of unmatched pairs. I ran MarkDuplicates on the smaller bam files too, and the same is true. For example, for one of the smaller bam files:

    INFO 2012-07-19 13:20:48 MarkDuplicates Read 37393628 records. 28252673 pairs never matched.

    Now I'm relatively new to this whole NGS world of data analysis, but I can't imagine having such a high number of unmatched pairs is a good thing.

    Does anyone have any advice, or has anyone encountered a similar problem? I'm wondering if I did something wrong when splitting the fastq files, or when trimming/aligning the split files.

    I should note that this DNA was extracted from FFPE tissue so it will be of lower quality than the DNA you guys are used to working with. But I want to make sure this is not a technical error on my part before blaming DNA quality.

    Thanks!

  • #2
    Did you let BWA align them as paired-reads?
    Or did you align each of the mates separately?
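One way to answer that question from the output itself is to look at the SAM FLAG field: per the SAM specification, bit 0x1 marks a read as paired, 0x2 as a proper pair, 0x4 as unmapped, and 0x8 as having an unmapped mate. The sketch below, in Python, tallies those bits from SAM text lines; the function name is hypothetical, and it assumes the input is uncompressed SAM text (for example, the output of `samtools view -h`).

```python
from collections import Counter


def flag_summary(sam_lines):
    """Tally a few SAM FLAG bits over all alignment records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        counts["records"] += 1
        if flag & 0x1:
            counts["paired"] += 1
        if flag & 0x2:
            counts["proper_pair"] += 1
        if flag & 0x4:
            counts["unmapped"] += 1
        if flag & 0x8:
            counts["mate_unmapped"] += 1
    return counts
```

If `paired` comes out as zero, the mates were most likely aligned as independent single-end reads. A large `unmapped` or `mate_unmapped` count would instead point at reads lost to trimming or poor mapping.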



    • #3
      I have the same problem. Did you find a solution?
      Would I get the same result from aligning one big file as from splitting it, aligning the split files, and merging them?



      • #4
        I have the same problem, too.
        It seems to be a problem with Picard MarkDuplicates, though, because when I run samtools flagstat the pairs appear properly matched. Did you find a solution?
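To dig into a disagreement like this, it may help to list which paired reads have only one mapped mate record in the file. The Python sketch below does that from SAM text lines. It is an independent diagnostic with a hypothetical function name, not a reimplementation of how Picard counts unmatched pairs, and it holds every read name in memory, so it is better suited to one of the small BAM files than to a merged whole-genome file.

```python
from collections import defaultdict


def orphaned_read_names(sam_lines):
    """Return sorted names of paired reads lacking a mapped record for both mates."""
    mates_seen = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        if not flag & 0x1 or flag & 0x4:
            continue  # skip unpaired reads and unmapped records
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x40:
            mates_seen[name].add(1)  # first mate in the pair
        elif flag & 0x80:
            mates_seen[name].add(2)  # second mate in the pair
    return sorted(n for n, mates in mates_seen.items() if mates != {1, 2})
```

An empty result would mean every paired, mapped read has its mate mapped in the same file.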
        Thanks!
