SEQanswers

05-17-2013, 07:30 AM   #1
chrismit
Junior Member
 
Location: Baltimore

Join Date: Aug 2012
Posts: 6
Error with Picard MarkDuplicates

I have some targeted sequencing data that I'm aligning as follows:
1) bowtie2 alignment to the targeted regions only
2) take the unaligned reads and align them to the whole genome
3) remove all unmapped/secondary alignments from each file
4) merge using Picard
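Step 3 is a filter on the SAM FLAG field. As a minimal sketch, assuming plain-text SAM records and hypothetical helper names (`keep_record`, `filter_sam_lines`), it drops any record with the unmapped bit (0x4) or the secondary bit (0x100) set. Dropping supplementary records (0x800) as well is an optional assumption, off by default:

```python
# Hypothetical helpers for step 3: drop unmapped and secondary records from SAM text.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def keep_record(flag: int, drop_supplementary: bool = False) -> bool:
    """Return True if a record with this FLAG should survive step 3."""
    mask = FLAG_UNMAPPED | FLAG_SECONDARY
    if drop_supplementary:
        mask |= FLAG_SUPPLEMENTARY
    return flag & mask == 0


def filter_sam_lines(lines, drop_supplementary=False):
    """Yield header lines unchanged and only the alignment lines that pass keep_record."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines are always kept
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if keep_record(flag, drop_supplementary):
            yield line
```

On real BAM files the same filter is usually run with `samtools view -b -F 0x104`, where 0x104 is 0x4 | 0x100.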

The issue is that when I run MarkDuplicates, some reads are aligned in both files. I don't understand how this is possible, since the reads aligned in each case should be mutually exclusive.
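If the two passes really were mutually exclusive, no read name would have a primary mapped record in both files. A minimal sketch of that check, assuming plain-text SAM input and hypothetical helper names (`mapped_names`, `overlapping_reads`); a non-empty result lists the reads that show up as aligned in both files:

```python
def mapped_names(sam_lines):
    """Collect QNAMEs of primary, mapped records (not unmapped, secondary or supplementary)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & (0x4 | 0x100 | 0x800) == 0:
            names.add(fields[0])
    return names


def overlapping_reads(target_sam, genome_sam):
    """Read names with a primary mapped record in both files (expected to be empty)."""
    return mapped_names(target_sam) & mapped_names(genome_sam)
```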
05-17-2013, 09:01 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150

My guess is that the "unaligned reads" you took after step 1 may have included read pairs where only one mate was unmapped, but that's just a guess.
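That guess can be checked from the FLAG of each first-pass record: bit 0x1 marks a paired read, 0x4 says this read is unmapped and 0x8 says its mate is unmapped. A minimal sketch of the classification (the function name `pair_status` is hypothetical); records in the "one mate mapped" class are the ones that can leave an alignment in the first file while the pair is also sent to the whole-genome pass:

```python
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8


def pair_status(flag: int) -> str:
    """Classify a record by its own and its mate's mapping bits."""
    if not flag & FLAG_PAIRED:
        return "unpaired"
    self_unmapped = bool(flag & FLAG_UNMAPPED)
    mate_unmapped = bool(flag & FLAG_MATE_UNMAPPED)
    if self_unmapped and mate_unmapped:
        return "both unmapped"
    if self_unmapped or mate_unmapped:
        return "one mate mapped"  # the case that can leak into both files
    return "both mapped"
```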

More to the point, the workflow you have described is not really the accepted standard for analyzing targeted sequence data. You should just align all your reads to the whole genome from the start. After cleaning up your alignment (e.g. marking duplicates, removing secondary alignments, etc.), focus on your targeted regions. This is more correct bioinformatically and will avoid the problem you are now dealing with.
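A minimal sketch of the "focus on your targeted regions" step, assuming the targets are BED-style intervals (0-based, half-open) and alignments are given as SAM RNAME/POS/CIGAR values. The helper names are hypothetical, and the linear scan is chosen for clarity rather than speed:

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def ref_span(pos: int, cigar: str):
    """Convert a 1-based SAM POS plus CIGAR into a 0-based half-open [start, end)."""
    start = pos - 1
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return start, start + length


def build_index(bed_intervals):
    """Group (chrom, start, end) intervals by chromosome, sorted by start."""
    index = {}
    for chrom, s, e in bed_intervals:
        index.setdefault(chrom, []).append((s, e))
    for ivs in index.values():
        ivs.sort()
    return index


def on_target(index, chrom, pos, cigar):
    """True if the aligned span overlaps any target interval on the same chromosome."""
    if cigar == "*":
        return False  # no alignment, so nothing to overlap
    start, end = ref_span(pos, cigar)
    return any(s < end and start < e for s, e in index.get(chrom, []))
```

In practice this is usually done directly on the BAM with `samtools view -L targets.bed` (the BED file name here is a placeholder) or with bedtools, rather than by hand.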
05-17-2013, 09:08 AM   #3
chrismit
Junior Member
 
Location: Baltimore

Join Date: Aug 2012
Posts: 6

You're right about the read pairs: bowtie2 writes both mates to the unaligned file if either one fails to map (but keeps the mapped mate's alignment in the BAM file as well).

I did it this way because I wanted to overestimate possible SNVs, since for the current project a conservative estimate of non-SNVs is desired. I'm going to just switch steps 1 and 2 and remove reads with a partial alignment from the FASTQ.
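The planned fix, sending only pairs with no aligned mate to the second pass, can be sketched as follows, assuming (qname, flag) tuples taken from the first-pass alignment; `fully_unmapped_pairs` is a hypothetical name:

```python
from collections import defaultdict

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def fully_unmapped_pairs(records):
    """Return read names whose primary records are all unmapped.

    `records` is an iterable of (qname, flag) tuples. Any name with a mapped
    primary mate is excluded, so partially aligned pairs are not realigned.
    """
    mapped_by_name = defaultdict(list)
    for qname, flag in records:
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # only primary records describe the pair
        mapped_by_name[qname].append(not flag & FLAG_UNMAPPED)
    return {name for name, states in mapped_by_name.items() if not any(states)}
```

With samtools, a similar selection is usually made with `samtools fastq -f 12`, which keeps only records where both the read (0x4) and its mate (0x8) are unmapped.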
All times are GMT -8.