Is it OK to split an aligned BAM by chromosome and then run MarkDuplicates on each of the files?
-
I wouldn't worry about splitting by chromosome and running MarkDuplicates; in fact, it seems quite astute to me. Do you expect gene duplicates, and is that why you are doing it? Or is it due to limited computational resources, or another issue?
I have not tried this, but I would be interested to see whether the results differ between the split-by-chromosome BAMs and the full BAM. Can you report back if you try it?
-
Originally posted by Heisman: No, in the sense that you'll miss PE reads where each read of the pair maps to a different chromosome. This is a key advantage of using MarkDuplicates over the samtools method.
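To get a rough idea of how many reads the quoted concern applies to, one could count mapped pairs whose mate sits on a different reference. This is only a sketch: the function name `count_cross_chromosome_reads` is made up for illustration, and it assumes plain SAM text as input (for example the output of `samtools view`), relying only on the FLAG, RNAME and RNEXT columns as defined in the SAM format.

```python
def count_cross_chromosome_reads(sam_lines):
    """Return (cross, total): mapped paired reads whose mate is on another
    reference, and all mapped paired reads considered.

    Hypothetical helper for illustration; expects SAM text lines.
    """
    total = 0
    cross = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Keep primary, paired records where both the read and its mate are mapped:
        # 0x1 paired, 0x4 read unmapped, 0x8 mate unmapped,
        # 0x100 secondary, 0x800 supplementary.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        total += 1
        # RNEXT is "=" when the mate is on the same reference, "*" when unknown.
        if rnext not in ("=", "*", rname):
            cross += 1
    return cross, total
```

Feeding it the lines of a SAM stream gives the number of reads that would end up separated from their mates after a per-chromosome split. How much that matters for the duplicate calls is exactly what would need testing.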
-
I've been asked to make a pipeline run faster on a distributed system. We can parallelize most of the steps (alignment, some of the GATK steps, etc.), but MarkDuplicates is quite time-consuming. If I split by chromosome and then run on multiple machines, it finishes much faster, but I don't want it to affect the results.
I am planning tests to compare the resulting deduped BAMs soon. I will post the results.
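One possible way to run such a comparison, sketched in Python under some assumptions: both outputs are available as SAM text (for instance via `samtools view`), and a record is identified by its read name, read1/read2 bits, secondary/supplementary bits, reference and position. The helper names `duplicate_keys` and `compare_duplicate_calls` are invented for this example; the only fact relied on is that the duplicate flag is bit 0x400 in the SAM FLAG field.

```python
def duplicate_keys(sam_lines):
    """Collect an identifying key for every record flagged as a duplicate (0x400)."""
    dups = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x400:
            # Key: name, read1/read2 bits (0x40 | 0x80),
            # secondary/supplementary bits (0x100 | 0x800), reference, position.
            dups.add((qname, flag & 0xC0, flag & 0x900, fields[2], fields[3]))
    return dups


def compare_duplicate_calls(whole_sam, split_sam):
    """Compare duplicate calls from the whole-BAM run and the merged per-chromosome runs."""
    whole = duplicate_keys(whole_sam)
    split = duplicate_keys(split_sam)
    return {
        "only_whole": whole - split,  # flagged only when run on the full BAM
        "only_split": split - whole,  # flagged only in the split-by-chromosome run
        "shared": whole & split,      # flagged in both
    }
```

Empty `only_whole` and `only_split` sets would suggest the split run marked the same records. A non-empty difference does not by itself prove duplicates were missed, since the two runs might simply have kept a different read from the same duplicate set, so the differing records are worth inspecting by hand.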
-
Read through this FAQ for possible tips/explanations to make it faster: http://sourceforge.net/apps/mediawik...=Main_Page#FAQ
bruce01, I have no idea how common they are, but if you want to remove the duplicates, I don't think you can split the BAM file up by chromosome (I could in theory be wrong; I've never considered doing this).