
**HESmith** · 10-24-2016, 11:21 AM

For starters, why don't you identify a few reads that are duplicated in the facility-generated BAM, then compare to the same reads in your self-generated BAM? If you have trouble interpreting the results, post the reads here so we can help.

**fh331** · 10-24-2016, 01:17 PM

Originally posted by HESmith:

For starters, why don't you identify a few reads that are duplicated in the facility-generated BAM, then compare to the same reads in your self-generated BAM? If you have trouble interpreting the results, post the reads here so we can help.

Hi HESmith,
Thanks for the reply. Do you mean I should just extract some duplicated reads from the facility-generated BAM and see if they're present in my self-generated BAM? The number of reads is very similar between the two BAMs.

My impression is that the extra tools used by the facility somehow flag reads as duplicates, but when I decompress and realign the reads, the flags are lost, and hence no duplicates are reported.
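
For context, a read's duplicate status in a BAM is a single bit (0x400, decimal 1024) of the SAM FLAG field, so a freshly realigned BAM reports no duplicates until a tool such as Picard MarkDuplicates sets that bit. A minimal Python sketch of reading the bit, assuming the FLAG values have already been extracted (for example from column 2 of `samtools view` output); the FLAG values below are hypothetical:

```python
from typing import Iterable

# 0x400 (decimal 1024) is the SAM FLAG bit for "PCR or optical duplicate".
DUPLICATE_BIT = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_BIT)


def count_duplicates(flags: Iterable[int]) -> int:
    """Count records whose FLAG marks them as duplicates."""
    return sum(1 for flag in flags if is_duplicate(flag))


# Hypothetical FLAG values: 99 and 147 form a properly paired read pair;
# 1123 is 99 with the duplicate bit added (99 + 1024).
example_flags = [99, 147, 1123, 16, 0]
print(count_duplicates(example_flags))  # 1
```

On a real file, `samtools view -c -f 1024 file.bam` counts the same thing: records with the 0x400 bit set.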

**HESmith** · 10-24-2016, 01:35 PM

Question: how did you determine that your self-aligned reads did not contain any duplicates?

**fanli** · 10-24-2016, 01:35 PM

I'd find it unlikely that a ChIP library had 0% duplication. ChIP libraries are generally highly duplicated, because you are sequencing from a very limited amount of input template.
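
Why limited input drives duplication can be illustrated with an idealized model. The assumption here is that every molecule in the library is equally likely to be sequenced, which ignores PCR amplification bias. Drawing n reads with replacement from N distinct template molecules yields on average N(1 - (1 - 1/N)^n) distinct molecules, so the duplicate fraction climbs quickly once n approaches or exceeds N. A sketch with hypothetical library sizes:

```python
import math


def expected_distinct(n_reads: int, n_molecules: int) -> float:
    """Expected distinct molecules seen when n_reads are drawn uniformly,
    with replacement, from n_molecules: N * (1 - (1 - 1/N) ** n).
    expm1/log1p keep the result accurate when 1/N is tiny."""
    return -n_molecules * math.expm1(n_reads * math.log1p(-1.0 / n_molecules))


def expected_duplicate_fraction(n_reads: int, n_molecules: int) -> float:
    """Fraction of reads that repeat an already-sequenced molecule."""
    return 1.0 - expected_distinct(n_reads, n_molecules) / n_reads


# Same hypothetical sequencing depth, a small and a large library.
depth = 20_000_000
for molecules in (5_000_000, 500_000_000):
    frac = expected_duplicate_fraction(depth, molecules)
    print(f"{molecules:,} molecules: {frac:.1%} duplicates")
```

Under these assumptions, the smaller library is far more duplicated at the same depth; real ChIP libraries also carry amplification bias, which this model does not capture.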

**fh331** · 10-24-2016, 01:45 PM

@HESmith

`bamtools stats -in /path/to/my/bam/self-aligned.bam`

**fh331** · 10-24-2016, 01:54 PM

Originally posted by fanli:

I'd find it unlikely that a ChIP library had 0% duplication. ChIP libraries are generally highly duplicated, because you are sequencing from a very limited amount of input template.

Hi fanli,

I agree with you. I only used the `bamtools stats` function to get a quick idea, so I am assuming bamtools isn't very stringent in marking duplicates. If I use Picard MarkDuplicates, I expect to see some level of duplication. I will post updated results tomorrow.

Also, what level of duplication is normal?

**HESmith** · 10-24-2016, 04:07 PM

Originally posted by fh331:

Do you mean I should just extract some duplicated reads from the facility-generated BAM and see if they're present in my self-generated BAM?

Duplicate reads are identified by alignment (chromosome/position) information. You want to determine whether that information is the same for the facility and self alignments. Find a few duplicates in the former, then examine the same reads in the latter. There are three possible outcomes: the alignment information matches (which means that bamtools is not counting the duplicates), it differs (indicating a discrepancy between the aligners), or the duplicates are missing from the latter (indicating that duplicates were removed).
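
The three-way check above can be sketched in Python. This assumes, hypothetically, that for each read name you have already pulled the alignment (chromosome, position, strand) and the duplicate flag from both BAMs, for example by parsing `samtools view` output; the `Alignment` type, the `classify` function, and the toy reads are illustrative only:

```python
from typing import Dict, NamedTuple


class Alignment(NamedTuple):
    chrom: str
    pos: int          # leftmost mapping position (SAM POS column)
    reverse: bool     # True if mapped to the reverse strand
    duplicate: bool   # True if FLAG bit 0x400 is set


def classify(read: str,
             facility: Dict[str, Alignment],
             mine: Dict[str, Alignment]) -> str:
    """Apply the three-way check to one read flagged as a duplicate
    in the facility BAM."""
    if read not in mine:
        return "missing: duplicates were removed"
    a, b = facility[read], mine[read]
    if (a.chrom, a.pos, a.reverse) == (b.chrom, b.pos, b.reverse):
        return "same alignment: bamtools is not counting the duplicate"
    return "different alignment: the aligners disagree"


# Hypothetical toy data keyed by read name.
facility = {
    "readA": Alignment("chr1", 1000, False, True),
    "readB": Alignment("chr2", 5000, True, True),
    "readC": Alignment("chr3", 42, False, True),
}
mine = {
    "readA": Alignment("chr1", 1000, False, False),
    "readB": Alignment("chr2", 5123, True, False),
}
for name, aln in facility.items():
    if aln.duplicate:
        print(name, "->", classify(name, facility, mine))
```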

**Chipper** · 10-25-2016, 02:37 AM

Use `samtools tview`.

Always look at the reads, not just the stats. What matters is the number of unique fragments, not the duplication rate: 80% duplicates would be useless if you sequenced 2 M reads (leaving 400 K unique fragments), but may be OK if you sequenced 200 M (leaving 40 M).
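
The arithmetic behind that point is unique fragments = total reads × (1 - duplicate fraction). A minimal Python check of the two scenarios in the post:

```python
def unique_fragments(total_reads: int, duplicate_fraction: float) -> int:
    """Number of non-duplicate reads left after removing duplicates."""
    if not 0.0 <= duplicate_fraction <= 1.0:
        raise ValueError("duplicate_fraction must be between 0 and 1")
    # round() absorbs floating-point error (1 - 0.8 is not exactly 0.2)
    return round(total_reads * (1.0 - duplicate_fraction))


for total in (2_000_000, 200_000_000):
    print(f"{total:,} reads at 80% duplicates -> "
          f"{unique_fragments(total, 0.80):,} unique")
# 2,000,000 reads at 80% duplicates -> 400,000 unique
# 200,000,000 reads at 80% duplicates -> 40,000,000 unique
```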

**fh331** · 10-25-2016, 05:53 AM

Originally posted by HESmith:

Duplicate reads are identified by alignment (chromosome/position) information. You want to determine whether that information is the same for the facility and self alignments. Find a few duplicates in the former, then examine the same reads in the latter. There are three possible outcomes: the alignment information matches (which means that bamtools is not counting the duplicates), it differs (indicating a discrepancy between the aligners), or the duplicates are missing from the latter (indicating that duplicates were removed).

I ran Picard MarkDuplicates on my self-aligned BAMs, and when I check the stats with bamtools after marking duplicates, it reports the same level of duplication as the facility's BAM. So I think I just wasn't doing it the right way: after running BWA, I need to run MarkDuplicates before checking the stats. Something learnt by a newbie!
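
What the marking step adds can be illustrated with a deliberately simplified sketch for single-end reads: reads sharing the same chromosome, 5' position, and strand are grouped, one read per group is kept, and the rest get FLAG bit 0x400. This is not Picard's actual algorithm (Picard uses clipping-adjusted 5' positions, uses both mates' positions for pairs, and by default keeps the read or pair with the highest sum of base qualities). The `Read` fields and the highest-MAPQ tie-break here are hypothetical choices for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DUPLICATE_BIT = 0x400


@dataclass
class Read:
    name: str
    chrom: str
    start: int      # leftmost aligned position (1-based, like SAM POS)
    end: int        # rightmost aligned position
    reverse: bool   # mapped to the reverse strand
    mapq: int       # hypothetical tie-break: keep the highest MAPQ
    flag: int = 0


def five_prime_key(read: Read) -> Tuple[str, int, bool]:
    """Fragment key: the 5' end is `end` for reverse-strand reads."""
    pos = read.end if read.reverse else read.start
    return (read.chrom, pos, read.reverse)


def mark_duplicates(reads: List[Read]) -> None:
    """Set the 0x400 bit on all but one read per 5' key (in place)."""
    groups: Dict[Tuple[str, int, bool], List[Read]] = {}
    for read in reads:
        groups.setdefault(five_prime_key(read), []).append(read)
    for group in groups.values():
        keeper = max(group, key=lambda r: r.mapq)
        for read in group:
            if read is not keeper:
                read.flag |= DUPLICATE_BIT


reads = [
    Read("r1", "chr1", 100, 149, False, 60),
    Read("r2", "chr1", 100, 149, False, 40),  # same 5' end and strand as r1
    Read("r3", "chr1", 100, 149, True, 60),   # same span, other strand
    Read("r4", "chr1", 120, 169, True, 60),
    Read("r5", "chr1", 110, 149, True, 30),   # reverse, 5' end 149 like r3
]
mark_duplicates(reads)
print([r.name for r in reads if r.flag & DUPLICATE_BIT])  # ['r2', 'r5']
```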

**HESmith** · 10-25-2016, 05:58 AM

Glad that you were able to sort out the problem.

**fh331** · 10-25-2016, 05:59 AM

Originally posted by Chipper:

Use `samtools tview`.

Always look at the reads, not just the stats. What matters is the number of unique fragments, not the duplication rate: 80% duplicates would be useless if you sequenced 2 M reads (leaving 400 K unique fragments), but may be OK if you sequenced 200 M (leaving 40 M).

Hi Chipper,

Thanks for the reply. How does tview work? I can't seem to find anything about it besides this page, which isn't very informative: http://samtools.sourceforge.net/tview.shtml

**HESmith** · 10-25-2016, 06:03 AM

'tview' is a terminal-based genome viewer. It would allow a quick spot-check of duplication (by visualizing the endpoints of the aligned reads), but it wouldn't calculate the fraction of your reads that are unique.
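
Because tview only displays the reads, the fraction of unique reads has to be computed separately. One simple definition, assuming single-end reads keyed by (chromosome, 5' position, strand), is distinct keys divided by total reads. A Python sketch with hypothetical keys:

```python
from typing import Hashable, Sequence


def fraction_unique(keys: Sequence[Hashable]) -> float:
    """Distinct alignment keys divided by total reads (0.0 for no reads)."""
    if not keys:
        return 0.0
    return len(set(keys)) / len(keys)


# Hypothetical (chromosome, 5' position, strand) keys for five reads.
keys = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 149, "-"),
        ("chr2", 500, "+"), ("chr1", 100, "+")]
print(fraction_unique(keys))  # 0.6
```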

**fh331** · 10-25-2016, 06:08 AM

Originally posted by fh331:

Hi Chipper,

Thanks for the reply. How does tview work? I can't seem to find anything about it besides this page, which isn't very informative: http://samtools.sourceforge.net/tview.shtml

Found it in the samtools manual! Thanks.

**fh331** · 10-25-2016, 06:21 AM

@HESmith

Thanks very much. I really appreciate the help from all the experienced users.

For future reference, what can I do to avoid such high duplication levels in ChIP-seq samples? Is it better to start with more DNA and use fewer PCR cycles during library preparation? Any tips would make my life much easier!

Advice on PE ChIP Alignment & Pre-processing Steps??
