| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| target mapped read percentage | eren | Illumina/Solexa | 0 | 08-13-2011 11:06 PM |
| Percentage of mapped reads ? | zack80.liu | Bioinformatics | 6 | 03-01-2011 10:08 AM |
| low percentage of reads mapped | rahilsethi | SOLiD | 3 | 09-13-2010 07:01 AM |
| SOLiD SAGE: low percentage of reads mapped | rahilsethi | SOLiD | 0 | 09-09-2010 12:04 PM |
| Percent of ChIP-seq reads mapped | kmcarr | Illumina/Solexa | 2 | 04-02-2009 12:04 AM |

#1
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
Dear all,
I am starting to work with ChIP-seq data to identify transcription factor binding sites, and we have just received a test run from the company so that we can decide whether to go ahead with the sequencing. We are using a HiSeq. For the test run we got 100,000 reads per library, and for each sample we also sequenced the corresponding input. For the samples, 40-50% of reads mapped to the human genome (allowing at most 2 mismatches), whereas ~90% of the input reads mapped. I was expecting a lower percentage of mapped reads in the input, not in the sample. I used Bowtie for the alignment. Has anyone seen similar mapping percentages in ChIP-seq experiments?
Thanks,
Andreia
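For reference, a mapping percentage like the ones above can be computed directly from the alignments. This is a minimal sketch, assuming the reads were aligned with SAM output (Bowtie's -S option); `mapping_rate` is a hypothetical helper. It counts each read name once, so a read reported at several locations (as with -a) is not double-counted. Reads that failed to align must still appear in the SAM file with FLAG 0x4 set for the denominator to be right.

```python
from typing import Iterable


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Fraction of distinct reads with at least one alignment.

    Assumes SAM text: header lines start with '@', QNAME is column 1,
    FLAG is column 2, and bit 0x4 marks an unmapped read.
    """
    seen, mapped = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        seen.add(qname)
        if not flag & 0x4:  # read aligned somewhere
            mapped.add(qname)
    return len(mapped) / len(seen) if seen else 0.0


# Hypothetical toy input: r1 and r3 map, r2 and r4 do not.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr2\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(f"{mapping_rate(example):.0%}")  # 50%
```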

#2
Senior Member
Location: Stockholm, Sweden | Join Date: Feb 2008 | Posts: 319
This could depend on a lot of factors, but 40-50% sounds low on the face of it. How does the quality score distribution of the ChIPped sequences look compared to the input? What read lengths are you using?

#3
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
Thanks for answering! The read length is 49 bp. What do you mean by the quality score distribution? I have checked the average quality score (QS) per read position, and it is similar between the sample and the input: the average is ~38 at the 5' end and ~34 at the 3' end.
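A per-position average like the one described can be computed from the FASTQ quality strings. This is a minimal sketch, assuming 4-line FASTQ records and Phred+33 encoding (older Illumina pipelines used an offset of 64, so check which one your files use); `mean_quality_per_position` is a hypothetical helper name.

```python
from typing import Iterable, List


def mean_quality_per_position(fastq_lines: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred quality at each read position.

    Assumes 4-line FASTQ records; `offset` is 33 for Phred+33 encoding
    (pass 64 for Phred+64 files).
    """
    sums: List[int] = []
    counts: List[int] = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical two-read example ('I' = Q40, 'G' = Q38, '5' = Q20 in Phred+33).
records = [
    "@read1", "ACG", "+", "II5",
    "@read2", "ACG", "+", "IG5",
]
print(mean_quality_per_position(records))  # [40.0, 39.0, 20.0]
```

A drop from the 5' to the 3' end, as reported here, would show up as decreasing values along this list.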

#4
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
One more detail: this is single-end sequencing.

#5
Senior Member
Location: Stockholm, Sweden | Join Date: Feb 2008 | Posts: 319
I was thinking that maybe the quality scores would be lower at the 3' end for the sample, but that doesn't appear to be the case. I'm not sure what the explanation could be then, maybe something in the sample preparation?

#6
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
It could be that the enrichment was far from perfect, but why does the input have such a high proportion of mapped reads? Shouldn't the input have a lower proportion, since it is genomic DNA and so contains repeats and telomeric regions whose reads will map to many locations?

#7
Senior Member
Location: Stockholm, Sweden | Join Date: Feb 2008 | Posts: 319
Well, maybe... but input DNA sequencing does not give an unbiased representation of the genome either; open-chromatin regions such as TSSs are overrepresented there too. See, e.g.:
http://www.plosone.org/article/info:...l.pone.0005241
http://www.biomedcentral.com/1471-2164/12/134

#8
Member
Location: Ireland | Join Date: Mar 2010 | Posts: 41
I'm wondering whether your "mapped reads" are unique (i.e., with -m 1 set in Bowtie). If so, 90% mapped seems very high for reads only 49 bp long, given that ~50% of the human genome is masked by RepeatMasker.
By the way, it would be helpful to look at the sequence quality distribution with FastQC: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

#9
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
Thanks for your message. It took me a while to reply because I was checking how many reads mapped uniquely: ~40-50% in the samples and ~80% in the input. The attachment shows the QS distribution; Lib1 is a sample and Lib2 is the corresponding input.
Thanks for the help

#10
Member
Location: Ireland | Join Date: Mar 2010 | Posts: 41
I could not find the attachment

#11
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
Sorry, I just noticed there was a problem with the attachment.

#12
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
This image is for the sample; the previous one was for the input.

#13
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
Can you explain what you mean by marking PCR duplicates?

#14
Member
Location: Ireland | Join Date: Mar 2010 | Posts: 41
I think this is a very good QS distribution: the average is high and the variation is low, even at the 3' end. Could you show your Bowtie command?

#15
Junior Member
Location: Portugal | Join Date: Mar 2008 | Posts: 9
-f -a --best --strata -v 2 hg19 fasta
Then, from these, I selected the unique alignments. Can you tell me what you mean by marking PCR duplicates?
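Selecting unique alignments from `-a --best --strata` output amounts to keeping reads that are reported exactly once, because with --best --strata Bowtie prints only the alignments in the best mismatch stratum. A minimal sketch, assuming Bowtie's default tab-separated output with the read name in the first column; `unique_read_names` is a hypothetical helper. (Bowtie's -m 1 option suppresses multi-mapping reads at alignment time instead.)

```python
from collections import Counter
from typing import Iterable, List


def unique_read_names(bowtie_lines: Iterable[str]) -> List[str]:
    """Names of reads reported exactly once.

    Assumes Bowtie default output (tab-separated, read name in column 1)
    produced with -a --best --strata, so every line for a read belongs
    to its best mismatch stratum.
    """
    counts = Counter(line.split("\t", 1)[0] for line in bowtie_lines if line.strip())
    return [name for name, n in counts.items() if n == 1]


# Hypothetical toy output: r2 aligns equally well to two places.
lines = [
    "r1\t+\tchr1\t100\tACGT\tIIII\t0\t",
    "r2\t+\tchr1\t200\tACGT\tIIII\t0\t",
    "r2\t-\tchr5\t900\tACGT\tIIII\t0\t",
    "r3\t-\tchr3\t42\tACGT\tIIII\t0\t",
]
unique = unique_read_names(lines)
print(unique)  # ['r1', 'r3']

# Unaligned reads never appear in the output, so divide by the number of
# input reads (e.g. 100,000 in this test run), not by the output line count.
total_reads = 5  # hypothetical: one read did not align at all
print(f"{len(unique) / total_reads:.0%} uniquely mapped")  # 40% uniquely mapped
```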

#16
Member
Location: Ireland | Join Date: Mar 2010 | Posts: 41
http://picard.sourceforge.net/comman...overview.shtml
You can find more information about that in this forum as well.
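Marking PCR duplicates means flagging reads that are likely copies of the same original fragment made during library amplification. For single-end data, reads that align to the same chromosome and strand with the same 5' position are treated as duplicates, and only one copy is kept. The sketch below is a simplified illustration of that idea, not Picard's MarkDuplicates implementation. It assumes ungapped alignments (as Bowtie 1 produces), so the 5' end of a reverse-strand read is its leftmost position plus its length minus one, and it keeps the copy with the highest sum of base qualities. `Alignment`, `five_prime` and `mark_duplicates` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Alignment(NamedTuple):
    name: str
    chrom: str
    strand: str    # '+' or '-'
    start: int     # leftmost aligned base, 0-based
    length: int    # aligned length (ungapped alignment assumed)
    qual_sum: int  # sum of base qualities, used to choose the kept copy


def five_prime(aln: Alignment) -> int:
    # Forward reads start at their leftmost base; reverse reads at their rightmost.
    return aln.start if aln.strand == "+" else aln.start + aln.length - 1


def mark_duplicates(alns: List[Alignment]) -> Set[str]:
    """Return names of reads flagged as duplicates.

    Reads sharing chromosome, strand and 5' position are treated as copies
    of one fragment; the copy with the highest quality sum is kept.
    """
    best: Dict[Tuple[str, str, int], Alignment] = {}
    dups: Set[str] = set()
    for aln in alns:
        key = (aln.chrom, aln.strand, five_prime(aln))
        kept = best.get(key)
        if kept is None:
            best[key] = aln
        elif aln.qual_sum > kept.qual_sum:
            dups.add(kept.name)  # the new read replaces the kept one
            best[key] = aln
        else:
            dups.add(aln.name)
    return dups


# Hypothetical 49 bp reads: a/b share a '+' 5' end; c/d share a '-' 5' end (100).
reads = [
    Alignment("a", "chr1", "+", 100, 49, 1800),
    Alignment("b", "chr1", "+", 100, 49, 1900),
    Alignment("c", "chr1", "-", 52, 49, 1700),
    Alignment("d", "chr1", "-", 52, 49, 1600),
]
print(sorted(mark_duplicates(reads)))  # ['a', 'd']
```

Because c and d sit on the opposite strand from a and b, they are not merged with them even though all four 5' ends are at position 100.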

#17
Member
Location: Ireland | Join Date: Mar 2010 | Posts: 41
I think the -v alignment mode you chose is fine, since the QS is good.