  • Read distribution at high sequence depth

    Hello everyone,

    Recently, I've been looking at two coverage plots from the same human material, sequenced twice at an average depth of 8x (run1) and 30x (run2). I noticed (only by eye) a marked difference in the read distribution: run2 shows high peaks and a somewhat messy picture, while the read distribution in run1 looks very "nice" and fairly flat. Is there a problem in the data, or is this the usual picture when you deal with data of very high sequencing depth (> 30x)? Is there maybe some kind of "exponential" gain in particular genomic regions (GC-rich or GC-poor, repetitive regions, etc.) that becomes more and more pronounced the higher the sequencing depth gets?

    I'd be very interested in your opinions and experiences and would be very thankful for some ideas.

    Cheers,
    Christoph

  • #2
    Originally posted by ForeignMan:
    Recently, I've been looking at two coverage plots from the same human material, sequenced twice at an average depth of 8x (run1) and 30x (run2). I noticed (only by eye) a marked difference in the read distribution: run2 shows high peaks and a somewhat messy picture, while the read distribution in run1 looks very "nice" and fairly flat. Is there a problem in the data, or is this the usual picture when you deal with data of very high sequencing depth (> 30x)? Is there maybe some kind of "exponential" gain in particular genomic regions (GC-rich or GC-poor, repetitive regions, etc.) that becomes more and more pronounced the higher the sequencing depth gets?
    Was it the same library used in each run or were two different libraries prepared?

    Assuming the latter, at a guess, I'd say the PCR step during the second library prep has biased the result.


    • #3
      How about posting a picture of it? ("worth a thousand words")

      What kind of *seq is it? Whole genome? RNA-seq? ChIP-seq?

      If your output is BAM, try the rmdup command on the BAM and take a look at the rmdup'ed output BAM file.


      • #4
        Thanks for your answers, and sorry for leaving out so much information. I was hoping it might be a fairly general or even normal effect.
        So, the whole genome has been sequenced twice (paired-end, 100 bp per read). A new library was prepared for each run. Additionally, the first run comes from Illumina's GA II, the second one from the new HiSeq2000.
        I have the alignment (done with BWA) in BAM format and removed duplicates with Picard.
        I attached an example image of chromosome 1 (run1 is grey, run2 yellow; the y-axis runs from 0 to 40; coverage has been computed over 100,000 bp windows). It does not look that bad, but I was wondering whether these deviations, the ups and downs, can be explained by the different sequencing conditions (library, technology) alone, or whether you have to expect this in data with high sequence depth. I'm also interested in doing a copy number analysis with this sequencing data and was asking myself whether this is a common effect that can be reduced by normalization (by GC content, mappability, etc.) or whether the data is really biased.
        Thank you for your help and interest.
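
        For readers who want to reproduce this kind of plot, here is a minimal sketch of windowed coverage, assuming you already have read start positions for one chromosome. The function name and the toy data are hypothetical, and crediting each read to the window of its start position is a simplification that only holds when the window is much larger than the read.

```python
from collections import defaultdict


def windowed_coverage(read_starts, read_length, window_size):
    """Approximate mean depth per fixed-size window from read start positions.

    Each read is credited entirely to the window containing its start,
    which is a reasonable shortcut when window_size >> read_length.
    """
    bases_per_window = defaultdict(int)
    for start in read_starts:
        bases_per_window[start // window_size] += read_length
    # Mean depth = sequenced bases in the window / window length
    return {w: bases / window_size for w, bases in sorted(bases_per_window.items())}


# Toy data (hypothetical): 8,000 reads start in window 0, 12,000 in window 1
starts = [i * 12 for i in range(8000)] + [100_000 + i * 8 for i in range(12000)]
coverage = windowed_coverage(starts, read_length=100, window_size=100_000)
print(coverage)
```

        With 100 bp reads and 100,000 bp windows, 8,000 reads in a window correspond to a mean depth of 8x.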

        Last edited by ForeignMan; 05-25-2011, 12:01 AM.


        • #5
          I can't see http://imiblinux05.uni-muenster.de/~...s_coverage.jpg

          Error message is : 404

          Not Found

          The requested URL /~c_bart07/sc_circos_coverage.jpg was not found on this server.


          • #6
            Strange ... I can see the image here and it has a completely different URL.
            Maybe this direct link works:
            http://s2.postimage.org/tt5qlhll7/sc...s_coverage.jpg


            • #7
              Yes, it should be flatter (or "rounder", in your image). Good example: http://postimage.org/image/1ohvgyx6s/ You can see tumor copy number changes. My image is log-scaled; without the log it would look even flatter.

              Note the anomaly of high coverage next to centromere on short arm (a frequent occurrence near repeated regions near the centromeres). The "low coverage" has it, the high doesn't. It should be flatter.

              I do not know what is wrong and can only recommend some desperate measures: 1) don't remove duplicates and see what happens; 2) take a random sample of reads and check that they're lining up as reported (that you're not displaying hg18 alignments on an hg19 display, for instance).

              Otherwise, it's not flat (or flatter) but should be.
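
              One quick way to run the hg18/hg19 check is to look at the chromosome lengths in the BAM header. This is only a sketch: it assumes the header text was obtained separately (for example with `samtools view -H`), the function name is hypothetical, and it only knows the chr1 lengths of those two builds.

```python
# Length of chromosome 1 in the two human reference builds mentioned above
CHR1_LENGTHS = {247_249_719: "hg18", 249_250_621: "hg19"}


def guess_build(sam_header_text):
    """Guess the reference build from the @SQ line for chromosome 1."""
    for line in sam_header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        # Header fields look like TAG:VALUE, e.g. SN:chr1 and LN:249250621
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if tags.get("SN") in ("chr1", "1"):
            return CHR1_LENGTHS.get(int(tags["LN"]), "unknown")
    return "unknown"


# Toy header (hypothetical file); real text would come from the BAM itself
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr2\tLN:243199373\n"
build = guess_build(header)
print(build)
```

              If the build reported here differs from the one used by the display, the coverage track is being drawn against the wrong coordinates.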

              • #8
                Originally posted by ForeignMan:
                Strange ... I can see the image here and it has a completely different URL.
                Maybe this direct link works:
                http://s2.postimage.org/tt5qlhll7/sc...s_coverage.jpg
                Agreed - it's a bit odd. Then again, even the first one isn't exactly flat, it has a broadly similar profile except lower.

                Incidentally, how did you do the alignment? All reads against all chromosomes? I assume the material wasn't pre-separated per chromosome. And what did you do with ambiguous reads?


                • #9
                  These are TCGA reads from various sources. You can view the bigwig (zoomable wiggle files) coverage tracks for various diseases at cgwb.nci.nih.gov. Check the various NG tracks. I'm sure the various TCGA research institutions align whole genome reads against all chromosomes (or at least chr1-22, X, Y, M; not sure about the "random" or "unattached" genomic chunks), with no chromosome separation. BWA is the weapon of mass alignment used in most TCGA samples (all TCGA BAMs? ... I'm not sure). BWA assigns ambiguous reads randomly, i.e. it just picks one of the alignments. SNP calling in ambiguous regions is hard.

                  I'm wondering if there's some sort of "accordion effect" going on in your circular view. Imagine taking an accordion and wrapping it around into an O shape: the inner edge is the same length as the starting flat length, but the outer edge is wavy and longer. There may be an exaggeration effect.

                  There is some vague resemblance between the high "mountain ranges" and GC content, I must admit.

                  Another desperate check: did you align all whole-genome reads against one chromosome only? Probably not, I hope.


                  • #10
                    Thanks a lot for your comments! And for the link to cgwb.nci.nih.gov.

                    Your guess was right, Richard. I used BWA for the alignment and ambiguous reads were assigned randomly. And, of course, I aligned to the complete human genome, not only to chromosome 1. I chose this one for the example only to save some space, and because no copy number change is expected for this chromosome. The profile looks very similar over the complete genome.

                    I don't think that this "accordion effect" should be very significant here, although I really like the image. Apart from the radius, this effect applies equally to all coverage profiles. Of course, one has to be careful when analysing such dense plots, but I think it works for a quick comparison since all datasets were plotted under the same conditions. If the coverage profile had been good, you would definitely see it here.

                    But I agree with tony's point about the similarity to the lower profile. That's also why I got the idea of some kind of stronger deviations at higher sequence depth (like, getting very naive now, having four times the deviation when having four times the depth), and this in correlation with specific regions of the genome. Richard's plots and the browser on cgwb.nci.nih.gov, though, look very nice and are roughly what I had expected in my case.

                    I did a copy number analysis on this "wavy" data (using FREEC) and the copy number profile looked quite OK, similar to the other "good" one, although with a few more (but not very many) artificial gains and losses. The normalization seems to take effect. I was asking myself whether there is a common tool that performs only this kind of normalization on alignment data.
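
                    As an illustration of what such a normalization step could look like, here is a minimal sketch that divides each window's coverage by the median coverage of windows with similar GC content. It is a simple median-per-GC-bin scheme chosen for illustration, not the method FREEC uses; the function name, the bin width and the toy numbers are all assumptions.

```python
import statistics
from collections import defaultdict


def gc_normalize(coverage, gc_fraction, bin_width=0.05):
    """Return coverage divided by the median coverage of its GC bin.

    Windows with typical coverage for their GC content end up near 1.0.
    """
    bins = defaultdict(list)
    for cov, gc in zip(coverage, gc_fraction):
        bins[int(gc / bin_width)].append(cov)
    medians = {b: statistics.median(values) for b, values in bins.items()}

    normalized = []
    for cov, gc in zip(coverage, gc_fraction):
        median = medians[int(gc / bin_width)]
        # Guard against empty regions where the bin median is zero
        normalized.append(cov / median if median > 0 else float("nan"))
    return normalized


# Toy windows (hypothetical): GC-rich windows are over-covered
raw = [30, 31, 29, 40, 41, 39]
gc = [0.36, 0.37, 0.38, 0.52, 0.53, 0.54]
flat = gc_normalize(raw, gc)
print([round(x, 3) for x in flat])
```

                    In the toy data the GC-rich windows sit about a third higher than the others before normalization and within a few percent of them afterwards. A real copy number signal that does not track GC content would survive this step.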

                    Thanks again for all your help and ideas!
                    Last edited by ForeignMan; 05-25-2011, 08:39 AM.


                    • #11
                      Originally posted by ForeignMan:
                      That's also why I got the idea of some kind of stronger deviations at higher sequence depth (like, getting very naive now, having four times the deviation when having four times the depth).
                      Not sure I understand you.

                      I would expect that 4x the coverage will have very close to 4x the deviation from the mean of the coverage (so about the same coefficient of variance). Over a 100K window, Poisson noise should be negligible, and every other source of bias should just scale up.
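
                      A small numerical sketch of both points, under stated assumptions: the coverage values are made up, reads are taken as 100 bp, and the Poisson term is the usual 1/sqrt(N) relative noise for N reads per window. The measure is normally called the coefficient of variation (standard deviation divided by mean), which is what "coefficient of variance" refers to here.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def poisson_relative_noise(depth, window_size, read_length):
    """Relative sampling noise 1/sqrt(N) for N reads expected in a window."""
    reads_in_window = depth * window_size / read_length
    return 1.0 / math.sqrt(reads_in_window)


# Hypothetical windowed coverage for a low-depth run
run1 = [7.0, 8.0, 9.0, 8.5, 7.5]
# The same profile with every window scaled up four-fold
run2 = [4 * c for c in run1]

cv1 = coefficient_of_variation(run1)
cv2 = coefficient_of_variation(run2)

noise_8x = poisson_relative_noise(8, 100_000, 100)
noise_30x = poisson_relative_noise(30, 100_000, 100)

print(f"CV run1 = {cv1:.4f}, CV run2 = {cv2:.4f}")
print(f"Poisson noise: 8x = {noise_8x:.2%}, 30x = {noise_30x:.2%}")
```

                      Scaling every window by four multiplies the standard deviation by four but leaves the ratio to the mean unchanged. The sampling noise in a 100,000 bp window is about 1% at 8x and lower still at 30x, so visible ups and downs of larger size have to come from systematic bias rather than from counting statistics.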


                      • #12
                        "Coefficient of variance" is exactly what I meant. Thanks tony! I was not aware of this measure, and it confirms (a bit) that the two runs are not so very different and that the deviations and bias scale up. Although it's still quite extreme and not very usual in this case, it helps me understand the results. I know that the whole experiment is a bit biased, so I guess I had to expect this kind of image.
