  • overlaying coverage plots

    I'm trying to overlay coverage plots of individual chromosomes from different experiments to get a quick overview of probable CNVs. I've tried a simple xy plot and the ggplot and plotrix packages in R (I'm a real novice in R), but my Linux machine with 64GB of memory seems unable to handle the task. I've already reduced the file size by keeping only the coordinates and the coverages derived from samtools pileup in a single file.

    Can someone comment on this and suggest a better, more memory-efficient way of doing it? Thank you.

  • #2
    I've found Hilbert plots very helpful for seeing chromosome coverage at a glance. Try http://www.bioconductor.org/packages...ilbertVis.html
    Bioconductor can store coverage compactly using run-length encoding, so your massive 64GB of memory will be fine.
    -r
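A toy illustration of why run-length encoding saves so much memory for coverage data: long stretches of identical depth collapse into a single (depth, length) pair. This is a generic Python sketch of the idea, assuming the per-base depths are already in a list; it is not Bioconductor's own `Rle` implementation.

```python
from itertools import groupby


def run_length_encode(coverage):
    """Collapse per-base depths into (depth, run_length) pairs."""
    return [(depth, sum(1 for _ in run)) for depth, run in groupby(coverage)]


def run_length_decode(runs):
    """Expand (depth, run_length) pairs back into per-base depths."""
    out = []
    for depth, length in runs:
        out.extend([depth] * length)
    return out


if __name__ == "__main__":
    coverage = [0, 0, 0, 5, 5, 5, 5, 7, 7, 0]
    runs = run_length_encode(coverage)
    print(runs)  # [(0, 3), (5, 4), (7, 2), (0, 1)]
    assert run_length_decode(runs) == coverage
```

Ten positions become four pairs here; on real chromosomes, where depth is often constant over long stretches, the saving is far larger.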

    • #3
      Are you doing the whole thing in R? I've been doing similar things and have found it's a lot quicker to get the data ready in Python or Perl first and only then use R's plotting functions.
      I extract a simple list of the start position of each read from the SAM file and sort them by chromosome and position. Then I split the genome into windows of either a fixed length (50/100/500 kb, etc.) or a fixed number of reads (50/100/200, etc.), and write a file with one line per window and columns for chromosome, start, end, number of test reads and number of normal reads. I then import this file into R, and the plotting is much more painless.
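The script itself isn't shown, so the following is only a hypothetical Python sketch of the fixed-length-window variant described above. It assumes the read starts have already been pulled out of the SAM files as (chromosome, 1-based position) pairs; the function names are made up for illustration.

```python
from collections import Counter


def bin_read_starts(starts, window):
    """Count read starts per (chromosome, window index); positions are 1-based."""
    counts = Counter()
    for chrom, pos in starts:
        counts[(chrom, (pos - 1) // window)] += 1
    return counts


def window_table(test_starts, normal_starts, window=100_000):
    """Rows of (chrom, start, end, n_test, n_normal) for every non-empty window."""
    test = bin_read_starts(test_starts, window)
    normal = bin_read_starts(normal_starts, window)
    rows = []
    # Sorting the keys sorts chromosome names as strings (chr10 before chr2).
    for chrom, idx in sorted(set(test) | set(normal)):
        start = idx * window + 1
        end = (idx + 1) * window
        # Counter returns 0 for a window in which a sample has no reads.
        rows.append((chrom, start, end, test[(chrom, idx)], normal[(chrom, idx)]))
    return rows


def write_tsv(rows, path):
    """Write the table with a header line so R's read.table(header = TRUE) can load it."""
    with open(path, "w") as fh:
        fh.write("chrom\tstart\tend\ttest\tnormal\n")
        for row in rows:
            fh.write("\t".join(map(str, row)) + "\n")
```

Only the window counts are kept in memory, not the reads themselves, so R only ever sees one row per window. Windows with no reads in either sample are simply skipped.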

      • #4
        Rosyme, thanks for suggesting the HilbertVis package. It plots the coverage without any problem. However, do you know how to adjust the scale of the y-axis? I have some regions of exceptionally high coverage that skew the whole plot, and ylim is not working.

        Henry, how do you decide on the best window size to use? Would you mind sharing your script? I was actually thinking about doing something similar to maq's cns2win.

        By the way, "overlaying" is probably not the right word: what I really want to do is superimpose one plot on top of another.
        Last edited by zlu; 08-14-2010, 06:12 AM.
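A generic workaround for a few extreme peaks flattening everything else, independent of any particular plotting package, is to clip the coverage at a high quantile before plotting (plotting log coverage is another common option). A minimal Python sketch; the 0.99 default is an arbitrary assumption, not a recommended value.

```python
def cap_coverage(values, quantile=0.99):
    """Return a copy of `values` with everything above the given quantile clipped to it."""
    if not values:
        return []
    ordered = sorted(values)
    # Lower-quantile cut-off without interpolation; quantile=1.0 leaves the data unchanged.
    cap = ordered[int(quantile * (len(ordered) - 1))]
    return [min(v, cap) for v in values]


if __name__ == "__main__":
    print(cap_coverage([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000], quantile=0.9))
    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 9]
```

Clipped regions still show up as flat-topped peaks, so they remain visible without dictating the axis range.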

        • #5
          For a quick overview you can always upload a wig/bigWig file to the UCSC Genome Browser.
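As a rough sketch of producing such a file, the Python below writes windowed mean coverage as a fixedStep wiggle track. It assumes per-base depths for one chromosome are held in a list (index 0 = position 1); the function names, window size and track name are illustrative only.

```python
def wig_lines(chrom, coverage, window=1000, track_name="coverage"):
    """Yield the lines of a fixedStep wiggle track of mean coverage per window.

    coverage: per-base read depths for one chromosome, index 0 = position 1.
    """
    yield f'track type=wiggle_0 name="{track_name}"'
    # fixedStep positions are 1-based; each value covers `span` bases.
    yield f"fixedStep chrom={chrom} start=1 step={window} span={window}"
    for i in range(0, len(coverage), window):
        chunk = coverage[i:i + window]
        # The last chunk may be shorter than `window`; it is averaged over its real length.
        yield f"{sum(chunk) / len(chunk):.2f}"


def write_wig(path, chrom, coverage, window=1000):
    """Write the track to `path`, one line per value."""
    with open(path, "w") as fh:
        for line in wig_lines(chrom, coverage, window):
            fh.write(line + "\n")
```

If the chromosome length isn't a multiple of the window, the final entry's span runs past the chromosome end, which the browser may reject; dropping or trimming that last window avoids it. For large tracks, UCSC's wigToBigWig utility converts a wig file to bigWig given a chrom.sizes file.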

          • #6
            The best window size is one of those 'how long is a piece of string' questions: it's a trade-off between signal and noise. If I want beautiful plots to put into a talk or impress my boss, I use windows of 400 reads. If I want to see small deletions or amplifications, I go down to 200 or 100, but the graphs look a bit messier. I tend to feed the data into the DNAcopy package from Bioconductor. On simulated data it can pick up events using windows of 20 reads, even though the actual graph looks like a random mass of dots.
            My script is currently embarrassing. It's the first one I ever wrote and is a bit of an unholy mess. It needs to be manually installed onto a computer to work, and it uses my wife's birthday to know when to stop because I didn't know how to end a for loop. If there's any part of it you need, I might try to tidy it up. I have a colleague who is currently preparing a proper statistical package to do all this better than I ever could.
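Windows of a fixed number of reads can be built in several ways, and the exact method isn't given here. This hypothetical Python sketch assumes the windows are defined by the normal sample (each holds `reads_per_window` normal reads), counts test reads in the same intervals, and reports a library-size-normalised log2 ratio with a 0.5 pseudo-count. Those choices are assumptions, not taken from the thread.

```python
import bisect
import math


def read_count_windows(normal_starts, test_starts, reads_per_window=400):
    """Split one chromosome into windows holding ~reads_per_window normal reads.

    normal_starts, test_starts: sorted 1-based read start positions on one chromosome.
    Returns (start, end_exclusive, n_test, n_normal, log2_ratio) per window. Trailing
    normal reads that do not fill a whole window, and test reads outside the windows,
    are ignored.
    """
    if not normal_starts or not test_starts:
        return []
    n_norm_total = len(normal_starts)
    n_test_total = len(test_starts)
    rows = []
    k = 0
    while (k + 1) * reads_per_window <= n_norm_total:
        start = normal_starts[k * reads_per_window]
        nxt = (k + 1) * reads_per_window
        # A window ends where the next one's first normal read starts (or just past the last read).
        end = normal_starts[nxt] if nxt < n_norm_total else normal_starts[-1] + 1
        n_normal = bisect.bisect_left(normal_starts, end) - bisect.bisect_left(normal_starts, start)
        n_test = bisect.bisect_left(test_starts, end) - bisect.bisect_left(test_starts, start)
        # Normalise by library size; 0.5 pseudo-counts avoid log2(0) in empty windows.
        ratio = ((n_test + 0.5) / n_test_total) / ((n_normal + 0.5) / n_norm_total)
        rows.append((start, end, n_test, n_normal, math.log2(ratio)))
        k += 1
    return rows
```

Because every window holds roughly the same number of normal reads, windows are narrow where coverage is dense and wide where it is sparse, which keeps the noise per point roughly even. A table of window positions and log2 ratios like this is the kind of input that could be read into R and segmented with DNAcopy (its `CNA()` and `segment()` functions).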
