SEQanswers

Old 08-15-2014, 02:04 PM   #1
shuoguo
Member
 
Location: Memphis

Join Date: Sep 2012
Posts: 23
MarkDuplicates strange results?

I have a small BAM for testing and found that the read counts before and after deduplication do not seem to match.
I used the most recent Picard toolkit.

Before duplicate removal, flagstat reports 178306 duplicates.
After duplicate removal, it reports 0.

But the total number of reads dropped by only 1586 (453052 - 451466).

Can anyone give any insight?

$ samtools flagstat 5.bam
453052 + 0 in total (QC-passed reads + QC-failed reads)
178306 + 0 duplicates
447531 + 0 mapped (98.78%:-nan%)
453052 + 0 paired in sequencing
226526 + 0 read1
226526 + 0 read2
437506 + 0 properly paired (96.57%:-nan%)
443874 + 0 with itself and mate mapped
3657 + 0 singletons (0.81%:-nan%)
5006 + 0 with mate mapped to a different chr
4378 + 0 with mate mapped to a different chr (mapQ>=5)

$ samtools flagstat 5.dedup.bam
451466 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
445945 + 0 mapped (98.78%:-nan%)
451466 + 0 paired in sequencing
225684 + 0 read1
225782 + 0 read2
436046 + 0 properly paired (96.58%:-nan%)
442404 + 0 with itself and mate mapped
3541 + 0 singletons (0.78%:-nan%)
5002 + 0 with mate mapped to a different chr
4374 + 0 with mate mapped to a different chr (mapQ>=5)
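
The mismatch can be made explicit by comparing the two flagstat reports line by line. The following is a minimal sketch, not part of samtools or Picard: `flagstat_count` and `compare` are hypothetical helper names, and the input strings are just a subset of the lines from the two reports above.

```python
import re

def flagstat_count(text, label):
    """Return the QC-passed count of the first flagstat line whose
    description starts with `label` (hypothetical helper)."""
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ \d+ (.*)", line)
        if m and m.group(2).startswith(label):
            return int(m.group(1))
    raise KeyError(label)

def compare(before, after):
    """Per-category drop in counts between two flagstat reports."""
    labels = ["in total", "read1", "read2", "singletons"]
    drop = {k: flagstat_count(before, k) - flagstat_count(after, k) for k in labels}
    drop["flagged duplicates"] = flagstat_count(before, "duplicates")
    return drop

# Lines copied from the two reports in this post.
BEFORE = """453052 + 0 in total (QC-passed reads + QC-failed reads)
178306 + 0 duplicates
226526 + 0 read1
226526 + 0 read2
3657 + 0 singletons (0.81%:-nan%)"""

AFTER = """451466 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
225684 + 0 read1
225782 + 0 read2
3541 + 0 singletons (0.78%:-nan%)"""

summary = compare(BEFORE, AFTER)
for label, value in summary.items():
    print(f"{label}: {value}")
```

With these numbers, far fewer reads were removed than were flagged as duplicates beforehand, and the read1 and read2 counts did not drop by the same amount.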
Old 08-15-2014, 02:21 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

The question then becomes exactly how you removed the duplicates.
Old 08-15-2014, 02:23 PM   #3
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I don't know the answer to your question, but it looks like after duplicate removal there is a different number of read1's than read2's. That's not good! If two reads are PCR duplicates, then obviously their mates will be as well. I don't know if that behavior is intentional or a bug (I don't use Picard), but I would not want to do that to my data.
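
One way to test the expectation that mates share their duplicate status is to group records by read name and look for pairs where only one mate carries the duplicate flag. This is a minimal sketch under stated assumptions: it reads SAM-formatted text lines (for example from `samtools view -h`), uses the FLAG bits defined in the SAM specification, and `inconsistent_duplicate_pairs` is a hypothetical helper name. The toy records are made up for illustration and contain only the first two SAM columns.

```python
from collections import defaultdict

DUPLICATE = 0x400      # PCR or optical duplicate
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment

def inconsistent_duplicate_pairs(sam_lines):
    """Return read names whose primary records disagree on the duplicate flag."""
    dup_states = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only compare the primary record of each mate
        dup_states[name].add(bool(flag & DUPLICATE))
    # A name with both True and False has one flagged and one unflagged mate.
    return sorted(name for name, states in dup_states.items() if len(states) == 2)

# Made-up toy records: QNAME and FLAG only.
toy = [
    "@HD\tVN:1.4",
    "pairA\t1123", "pairA\t1171",  # both mates flagged as duplicates
    "pairB\t99",   "pairB\t1171",  # only read2 flagged
    "pairC\t99",   "pairC\t147",   # neither mate flagged
]
print(inconsistent_duplicate_pairs(toy))
```

Running this on the BAM before removal would show whether the 178306 flagged reads come in complete pairs or not.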
Old 08-15-2014, 02:26 PM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

@Brian: Note that there's a change in singletons as well.
Old 08-15-2014, 07:24 PM   #5
shuoguo
Member
 
Location: Memphis

Join Date: Sep 2012
Posts: 23

Oh, I missed that part. Does Picard actually remove reads?
I did set the option to remove duplicate reads to true, though.

This BAM was subsampled from a larger one at a 0.5% ratio using samtools.
Could samtools have sampled reads without keeping the pairs together?

Thanks, I will post my command once I am back at my computer.
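
On the subsampling question: as far as I know, `samtools view -s` makes its keep-or-drop decision from a hash of the read name, so both mates of a pair get the same decision. The equal read1 and read2 counts (226526 each) in the first flagstat report are consistent with that. The sketch below only illustrates the idea; `keep_read` is a hypothetical function and MD5 is a stand-in, not the hash samtools actually uses.

```python
import hashlib

def keep_read(name, fraction, seed=0):
    """Decide from the read name alone whether to keep a record.

    Both mates share a name, so they are always kept or dropped together.
    Illustration only: MD5 is a stand-in hash, not what samtools uses.
    """
    digest = hashlib.md5(f"{seed}:{name}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return value < fraction

# Made-up read names; each name stands for both mates of a pair.
names = [f"read{i}" for i in range(10000)]
kept = [n for n in names if keep_read(n, 0.005)]
print(len(kept), "of", len(names), "pairs kept")
```

Because the decision depends only on the name and the seed, rerunning with the same seed selects the same pairs.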

Quote:
Originally Posted by Brian Bushnell View Post
I don't know the answer to your question, but it looks like after duplicate removal there is a different number of read1's than read2's. That's not good! If two reads are PCR duplicates, then obviously their mates will be as well. I don't know if that behavior is intentional or a bug (I don't use Picard), but I would not want to do that to my data.
Old 11-23-2016, 06:06 PM   #6
Wort John
Junior Member
 
Location: shanghai,China

Join Date: Nov 2016
Posts: 1

Quote:
Originally Posted by dpryan View Post
@Brian: Note that there's a change in singletons as well.
What is a singleton? Does it mean a single read? Could you explain it in detail? Thank you!
Old 11-23-2016, 09:22 PM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

In this context, a singleton is a pair in which only one of the reads mapped.
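
In terms of SAM FLAG bits, that definition can be sketched as follows. This mirrors my understanding of how `samtools flagstat` counts singletons (a paired read that is itself mapped while its mate is unmapped); `is_singleton` is a hypothetical helper, not a samtools function.

```python
PAIRED = 0x1         # read is paired in sequencing
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped

def is_singleton(flag):
    """True for a paired read that mapped while its mate did not."""
    return bool(flag & PAIRED) and not (flag & UNMAPPED) and bool(flag & MATE_UNMAPPED)

# Example FLAG values:
#   73  = paired + mate unmapped + read1   (the mapped half of the pair)
#   133 = paired + unmapped + read2        (the unmapped half of the pair)
#   99  = paired + proper pair + mate reverse + read1 (both mates mapped)
for flag in (73, 133, 99):
    print(flag, is_singleton(flag))
```

Note that only the mapped read of such a pair is counted; its unmapped mate is not.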
Old 10-03-2018, 05:47 AM   #8
juanita
Junior Member
 
Location: Europe

Join Date: Sep 2018
Posts: 2

I have a similar issue: after removing duplicates with Picard, I ran ValidateSamFile and got a substantial number of MATE_NOT_FOUND errors.

Before removing duplicates (with ValidateSamFile):
"No errors found"

After removing duplicates:
"ERROR:MATE_NOT_FOUND 1072882"
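
To see which reads lost their mate, one option is to list paired reads for which only read1 or only read2 is present in the file. This is a rough sketch under stated assumptions, not a reimplementation of ValidateSamFile: it works on SAM text lines, ignores secondary and supplementary alignments, and `reads_missing_mate` is a hypothetical helper. The toy records are made up and contain only QNAME and FLAG.

```python
from collections import defaultdict

PAIRED = 0x1
SECONDARY = 0x100
READ1 = 0x40
READ2 = 0x80
SUPPLEMENTARY = 0x800

def reads_missing_mate(sam_lines):
    """Return names of paired reads for which only one mate is present."""
    mates_seen = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & READ1:
            mates_seen[name].add(1)
        if flag & READ2:
            mates_seen[name].add(2)
    return sorted(name for name, mates in mates_seen.items() if mates != {1, 2})

# Made-up toy records: pairB has lost its read1.
toy_records = [
    "pairA\t99", "pairA\t147",
    "pairB\t147",
    "pairC\t73", "pairC\t133",
]
print(reads_missing_mate(toy_records))
```

This keeps one small set per read name in memory, so for a large BAM it may be more practical to run it per chromosome or on a name-sorted stream.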