SEQanswers

Old 03-25-2011, 01:42 AM   #1
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9
Expected percentage of mapped reads in ChIP-seq experiment

Dear all,


I am starting to work with ChIP-seq data to identify transcription factor binding sites, and we have just received a test run from the company so that we can decide whether to go ahead with the sequencing. We are using HiSeq. For the test run we got 100,000 reads from each library. For each sample we have also sequenced the corresponding input. For the samples, 40-50% of reads mapped to the human genome (allowing max 2 mismatches), whereas the input had ~90% mapped reads. I was expecting a lower percentage of mapped reads in the input, not in the sample. I used bowtie for the alignment. I would like to know if anyone has had similar percentages of mapped reads in ChIP-seq experiments.
Thanks

Andreia
Old 03-25-2011, 01:59 AM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

This could depend on a lot of factors but 40-50% sounds low on the face of it. How does the quality score distribution of the ChIPped sequences look compared to the input? What read lengths are you using?
Old 03-25-2011, 02:12 AM   #3
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

Thanks for answering! The read length is 49 bp. What do you mean by the quality score distribution? I have checked the average QS per position on the read, and it is similar between the sample and the input. At the 5' end we have an average of ~38 and at the 3' end ~34.
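To illustrate what "average QS per position" means here, below is a minimal Python sketch. It is not the procedure actually used in this thread. It assumes plain four-line FASTQ records and a Phred+33 quality encoding; both are assumptions (older Illumina pipelines used an offset of 64, hence the `phred_offset` parameter), and the function name and toy reads are made up for the example.

```python
import io

def mean_quality_per_position(fastq_handle, phred_offset=33):
    """Return the mean Phred score at each read position.

    Assumes four-line FASTQ records (no wrapped sequence lines).
    """
    sums = []    # sums[i] = total quality observed at position i
    counts = []  # counts[i] = number of reads that reach position i
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 3:  # the quality string is the 4th line of a record
            continue
        quals = line.rstrip("\n")
        for i, ch in enumerate(quals):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - phred_offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]

# Toy input: two 4-base reads (made-up data, Phred+33 assumed).
toy_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII55\n"
)
means = mean_quality_per_position(toy_fastq)
print(means)
```

Running the same function on the sample and on the input and comparing the two lists position by position gives the kind of 5' versus 3' comparison described above.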
Old 03-25-2011, 02:34 AM   #4
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

One more detail: this is single-end sequencing.
Old 03-25-2011, 02:38 AM   #5
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

I was thinking that maybe the quality scores would be lower at the 3' end for the sample, but that doesn't appear to be the case. I'm not sure what the explanation could be then, maybe something in the sample preparation?
Old 03-25-2011, 02:45 AM   #6
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

It could be that the enrichment was far from perfect, but why does the input have such a high proportion of mapped reads? Shouldn't the input have a lower proportion, since it is genomic DNA and so contains repeats and telomeric regions, which map to many locations?
Old 03-25-2011, 02:56 AM   #7
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

Well, maybe ... but input DNA sequencing also does not give an unbiased representation of the genome; open-chromatin regions such as TSSs are overrepresented there too, see e.g.

http://www.plosone.org/article/info:...l.pone.0005241
http://www.biomedcentral.com/1471-2164/12/134
Old 03-25-2011, 03:08 AM   #8
ttnguyen
Member
 
Location: Ireland

Join Date: Mar 2010
Posts: 41

Just wondering: are the "mapped reads" unique (I mean, did you set -m 1 in Bowtie)? If so, 90% mapped reads (at only 49 bp in length) seems very high, given that ~50% of the human genome is masked by RepeatMasker.

BTW, it would be helpful to see the distribution of sequence quality using FastQC:
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Old 03-25-2011, 04:27 AM   #9
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

Thanks for your message. It took me a while to reply because I was checking how many reads were uniquely mapped. In the samples ~40-50% were uniquely mapped, and in the input ~80%. In the attachment you can see the distribution of the QS; Lib1 is a sample and Lib2 is the corresponding input.
Thanks for the help.
Old 03-25-2011, 04:44 AM   #10
ttnguyen
Member
 
Location: Ireland

Join Date: Mar 2010
Posts: 41

I could not find the attachment. I think 40-50% in the samples is normal, but 80% in the input is quite high. Would it be worth marking PCR duplicates, or comparing your mapping rates with public datasets (e.g. ENCODE; I think they had input as well)?
Old 03-25-2011, 04:47 AM   #11
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

Sorry, I just noticed there was a problem with the attachment.
Attached Images
File Type: png per_base_quality_Lib2_resized2.png (13.3 KB, 28 views)
Old 03-25-2011, 04:49 AM   #12
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

This image is for the sample; the previous one was for the input.
Attached Images
File Type: png per_base_quality_Lib2_resized2.png (9.9 KB, 12 views)
Old 03-25-2011, 04:50 AM   #13
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

Can you explain what you mean by marking PCR duplicates?
Old 03-25-2011, 04:59 AM   #14
ttnguyen
Member
 
Location: Ireland

Join Date: Mar 2010
Posts: 41

These are very good quality scores, I think, as the average is high and the variation is low (even at the 3' end). Could you show your Bowtie command?
Old 03-25-2011, 05:03 AM   #15
andreiafonseca
Junior Member
 
Location: Portugal

Join Date: Mar 2008
Posts: 9

-f -a --best --strata -v 2 hg19 fasta

Then I selected the unique alignments from these.

Can you tell me what you mean by marking PCR duplicates?
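One way the "select the unique alignments" step could be done is sketched below in Python. This is an illustration under assumptions, not the script actually used: it assumes bowtie's default tab-separated output (read name in the first column) produced with `-a --best --strata`, so a read that appears on exactly one line has a single alignment in its best stratum. The function name, the toy alignment lines, and the total read count are made up for the example.

```python
from collections import Counter
import io

def count_unique_reads(alignment_handle, total_reads):
    """Count reads that appear exactly once in a bowtie -a alignment file."""
    hits = Counter()
    for line in alignment_handle:
        if not line.strip():
            continue  # skip blank lines
        read_name = line.split("\t", 1)[0]  # column 1 is the read name
        hits[read_name] += 1
    unique = sum(1 for n in hits.values() if n == 1)
    mapped = len(hits)  # reads with at least one reported alignment
    return unique, mapped, unique / total_reads, mapped / total_reads

# Toy alignments: r1 and r3 map once, r2 maps twice, a 4th read is unmapped.
toy_alignments = io.StringIO(
    "r1\t+\tchr1\t100\tACGT\tIIII\t0\t\n"
    "r2\t+\tchr1\t500\tACGT\tIIII\t1\t\n"
    "r2\t-\tchr7\t900\tACGT\tIIII\t1\t\n"
    "r3\t-\tchr2\t300\tACGT\tIIII\t0\t\n"
)
unique, mapped, unique_frac, mapped_frac = count_unique_reads(
    toy_alignments, total_reads=4
)
print(unique, mapped, unique_frac, mapped_frac)
```

Bowtie's `-m 1` option, mentioned earlier in the thread, takes a different route: it suppresses all alignments for any read with more than one reportable alignment, so the filtering happens inside the aligner rather than afterwards.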
Old 03-25-2011, 05:06 AM   #16
ttnguyen
Member
 
Location: Ireland

Join Date: Mar 2010
Posts: 41

Quote:
Originally Posted by andreiafonseca
Can you explain what you mean by marking PCR duplicates?
The DNA fragments may be duplicated due to PCR amplification, which means you may have many reads mapped to the same position. Picard can help with removing duplicates.
http://picard.sourceforge.net/comman...overview.shtml

You can find more information about that in this forum as well.
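To make the idea concrete, here is a minimal Python sketch of position-based duplicate marking for single-end reads. It is a simplification and not Picard's actual algorithm: it treats reads with the same chromosome, strand, and 5' position as duplicates and keeps whichever comes first, whereas real tools also take base qualities and clipping into account. The function name, tuple layout, and toy coordinates are assumptions made for the example.

```python
def mark_duplicates(alignments):
    """Flag single-end duplicates by (chrom, strand, 5' position).

    alignments: iterable of (read_name, chrom, strand, leftmost_pos, read_len).
    Returns a list of (read_name, is_duplicate); the first read seen at a
    given key is kept, later ones are flagged.
    """
    seen = set()
    flagged = []
    for name, chrom, strand, pos, length in alignments:
        # On the minus strand the 5' end of the read is its rightmost base.
        five_prime = pos if strand == "+" else pos + length - 1
        key = (chrom, strand, five_prime)
        flagged.append((name, key in seen))
        seen.add(key)
    return flagged

# Toy alignments with 49 bp reads (made-up coordinates).
toy = [
    ("r1", "chr1", "+", 100, 49),
    ("r2", "chr1", "+", 100, 49),  # same start and strand as r1: duplicate
    ("r3", "chr1", "-", 100, 49),  # same offset, opposite strand: kept
    ("r4", "chr2", "+", 100, 49),
]
flags = mark_duplicates(toy)
dup_rate = sum(is_dup for _, is_dup in flags) / len(flags)
print(flags, dup_rate)
```

A high duplicate rate in one library but not the other would be one thing to look at when the sample and the input differ this much in mapping rate.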
Old 03-25-2011, 05:14 AM   #17
ttnguyen
Member
 
Location: Ireland

Join Date: Mar 2010
Posts: 41

I think choosing the -v alignment mode is fine, since the QS is good.
All times are GMT -8.