  • Expected percentage of mapped reads in ChIP-seq experiment

    Dear all,


    I am starting to work with ChIP-seq data to identify transcription factor binding sites, and we have just received a test run from the sequencing company so that we can decide whether to go ahead with the full sequencing. We are using a HiSeq. For the test run we got 100,000 reads from each library, and for each sample we also sequenced the corresponding input. For the samples, 40-50% of reads mapped to the human genome (allowing a maximum of 2 mismatches), whereas for the input ~90% of reads mapped. I was expecting the lower percentage of mapped reads in the input, not in the sample. I used Bowtie for the alignment. I would like to know whether anyone has seen similar percentages of mapped reads in ChIP-seq experiments.
    Thanks

    Andreia

  • #2
    This could depend on a lot of factors but 40-50% sounds low on the face of it. How does the quality score distribution of the ChIPped sequences look compared to the input? What read lengths are you using?


    • #3
      Thanks for answering! The read length is 49 bp. What do you mean by the quality score distribution? I have checked the average QS per position on the read, and it is similar between the sample and the input: on average ~38 at the 5' end and ~34 at the 3' end.
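
For readers who want to reproduce the per-position check described in this post, here is a minimal sketch. It assumes the reads are in a 4-line-per-record FASTQ file with Phred+33 quality encoding; the thread does not state the encoding, so `phred_offset` may need to be 64 for older Illumina output. The function name and the two example reads are made up for illustration.

```python
from collections import defaultdict
from io import StringIO


def mean_quality_per_position(fastq_handle, phred_offset=33):
    """Return a list of mean Phred scores, one entry per read position."""
    totals = defaultdict(int)  # position -> sum of quality scores
    counts = defaultdict(int)  # position -> number of reads covering it
    for line_number, line in enumerate(fastq_handle):
        # In a 4-line FASTQ record the quality string is the 4th line.
        if line_number % 4 != 3:
            continue
        for position, char in enumerate(line.rstrip("\r\n")):
            totals[position] += ord(char) - phred_offset
            counts[position] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


# Two made-up 4 bp reads (assumed Phred+33: 'I' = 40, '5' = 20).
example = StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII55\n"
)
means = mean_quality_per_position(example)
print(means)  # [40.0, 40.0, 30.0, 30.0]
```

Running this on the sample and the input separately and comparing the two lists position by position gives the comparison described above.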


      • #4
        One more detail: this is single-end sequencing.


        • #5
          I was thinking that maybe the quality scores would be lower at the 3' end for the sample, but that doesn't appear to be the case. I'm not sure what the explanation could be then, maybe something in the sample preparation?


          • #6
            It could be that the enrichment was far from perfect, but why does the input have such a high proportion of mapped reads? Shouldn't the input have a lower proportion, since it is genomic DNA and so contains repeats and telomeric regions that map to many locations?


            • #7
              Well, maybe ... but input DNA sequencing does not give an unbiased representation of the genome either; open-chromatin regions such as TSSs are overrepresented there too, see e.g.:

              Background The growth of sequencing-based Chromatin Immuno-Precipitation studies call for a more in-depth understanding of the nature of the technology and of the resultant data to reduce false positives and false negatives. Control libraries are typically constructed to complement such studies in order to mitigate the effect of systematic biases that might be present in the data. In this study, we explored multiple control libraries to obtain better understanding of what they truly represent. Methodology First, we analyzed the genome-wide profiles of various sequencing-based libraries at a low resolution of 1 Mbp, and compared them with each other as well as against aCGH data. We found that copy number plays a major influence in both ChIP-enriched as well as control libraries. Following that, we inspected the repeat regions to assess the extent of mapping bias. Next, significantly tag-rich 5 kbp regions were identified and they were associated with various genomic landmarks. For instance, we discovered that gene boundaries were surprisingly enriched with sequenced tags. Further, profiles between different cell types were noticeably distinct although the cell types were somewhat related and similar. Conclusions We found that control libraries bear traces of systematic biases. The biases can be attributed to genomic copy number, inherent sequencing bias, plausible mapping ambiguity, and cell-type specific chromatin structure. Our results suggest careful analysis of control libraries can reveal promising biological insights.

              Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis.


              • #8
                Just wondering: are the "mapped reads" unique (I mean, did you set -m 1 in Bowtie)? If so, 90% mapped reads (at only 49 bp in length) seems very high, given that ~50% of the human genome is masked by RepeatMasker.

                BTW, it would be helpful to see the distribution of sequence quality using FastQC:


                • #9
                  Thanks for your message. It took me a while because I was checking how many reads were uniquely mapped. In the samples ~40-50% were uniquely mapped and in the input ~80%. In the attachment you can see the distribution of the QS; Lib1 is a sample and Lib2 is the corresponding input.
                  Thanks for the help.
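
The difference between "mapped" and "uniquely mapped" percentages discussed here can be made concrete with a small sketch. It assumes the aligner was run in a mode that reports every alignment for a read, so a read name appears once per reported alignment; the function name `mapping_rates` and the example read names are hypothetical.

```python
from collections import Counter


def mapping_rates(total_reads, aligned_read_names):
    """Return (percent mapped, percent uniquely mapped).

    aligned_read_names holds one entry per reported alignment, so a read
    that aligns to three places appears three times.
    """
    hits_per_read = Counter(aligned_read_names)
    mapped = len(hits_per_read)  # reads with at least one alignment
    unique = sum(1 for n in hits_per_read.values() if n == 1)
    return 100.0 * mapped / total_reads, 100.0 * unique / total_reads


# Made-up example: 10 reads sequenced; r1..r4 align once, r5 aligns 3 times.
names = ["r1", "r2", "r3", "r4", "r5", "r5", "r5"]
pct_mapped, pct_unique = mapping_rates(10, names)
print(pct_mapped, pct_unique)  # 50.0 40.0
```

Both percentages use the total number of sequenced reads as the denominator, which is what makes the sample and input figures comparable.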


                  • #10
                    I could not find the attachment. I think 40-50% in the samples is normal, but 80% in the input is quite high. Would it be worth marking PCR duplicates, or comparing your mapping rates with public datasets (e.g. ENCODE; I think they had input as well)?


                    • #11
                      Sorry, I just noticed there was a problem with the attachment.
                      Attached Files


                      • #12
                        This image is for the sample; the previous one was for the input.
                        Attached Files


                        • #13
                          Can you explain what you mean by marking PCR duplicates?


                          • #14
                            This is very good QS I think, as the average is high and the variation is low (even at the 3' end). Could you show your Bowtie command?


                            • #15
                              -f -a --best --strata -v 2 hg19 fasta

                              Then I selected the unique alignments from these.

                              Can you tell me what you mean by marking PCR duplicates?
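
The thread ends without an answer to this question, so here is a rough illustration of the general idea, not of any poster's method. For single-end reads, duplicate marking is commonly taken to mean flagging reads that align to the same chromosome, strand and 5' position, on the assumption that they may be PCR copies of one original fragment. Dedicated tools such as Picard MarkDuplicates or samtools markdup do this on real alignment files; the function and tuples below are a simplified, hypothetical sketch of the concept.

```python
def mark_duplicates(alignments):
    """Flag alignments sharing chromosome, strand and 5' position.

    alignments: iterable of (read_name, chrom, five_prime_pos, strand).
    Returns a list of (read_name, is_duplicate) in input order; the first
    read seen at each position is kept, later ones are flagged.
    """
    seen = set()
    flagged = []
    for read_name, chrom, five_prime_pos, strand in alignments:
        key = (chrom, five_prime_pos, strand)
        flagged.append((read_name, key in seen))
        seen.add(key)
    return flagged


# Made-up alignments for illustration only.
alignments = [
    ("r1", "chr1", 1000, "+"),
    ("r2", "chr1", 1000, "+"),  # same start and strand as r1 -> duplicate
    ("r3", "chr1", 1000, "-"),  # opposite strand -> not a duplicate
    ("r4", "chr2", 1000, "+"),
]
flags = mark_duplicates(alignments)
duplicate_rate = 100.0 * sum(is_dup for _, is_dup in flags) / len(flags)
print(flags, duplicate_rate)
```

Comparing the duplicate rate of the sample with that of the input is one way to follow up on the suggestion made earlier in the thread; a much higher rate in one library may point to low library complexity, although that is only one possible explanation.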
