Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Originally posted by dmacmillan View Post
    Code:
    Counting reads over window 10161 bp wide...
      Calculating GC content...
    Argument "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..." isn't numeric in division (/) at bam2windows.pl line 256, <IN> line 576287280.
    It looks like you have something strange on line 576287280 of your GC file (maybe last line?) probably "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..."

    Originally posted by dmacmillan View Post
    I still get a .tab file in the end, though it is only 10mb. Seems like it should be larger especially considering that I am running it on a tumour and normal bam file each of which is ~11gigs.
    The size of the final tab file is largely independent on the size of the input sam/bam, but on the size of the window. If you set '--window 100000' it will produce approx 30000 lines, no matter how large the original bam files. It will provide, however, higher reads per window.

    If you set '--readNum 50' instead, it will produce the number of windows required to have, on average, 50 reads per window in the sample with least number of reads. In this case, larger sam/bam files will produce larger tab files

    WARNING! CNAnorm was designed for low coverage data. In particular, if you have large bam files and set parameters to have MANY windows, function peakPloidy and addDNACopy will take very long.
    There are ways to solve this, I might write a tutorial in future...

    Originally posted by dmacmillan View Post
    However, when running R, I put my tab file into a CN object and follow the steps in the manual. But when I get to the plotPeaks this happens
    Code:
    plotPeaks(CN, special1 = 'chrX', special2 = 'chrY')
         chrX does not have enough values to estimate density. Skipping.
         chrY does not have enough values to estimate density. Skipping.
    This is just a warning, and it is beacause your reference chromosomes are not called 'chrX' and 'chrY' but, I guess, 'X' and 'Y', similarly, you should change
    Code:
    toSkip <- c("chrY", "chrM")
    to
    Code:
    toSkip <- c("Y", "M")
    I hope this helps
    Last edited by stefanoberri; 01-19-2012, 12:53 AM.

    Comment


    • #17
      Hey Stefano,

      Do you have time to explain your ways to work with large .BAM files? I am managing to run your program on my large files without any errors. However, when following the manual I end up with ~20% tumour content estimation whereas the oncologist estimated our tumour content to be 96%. I feel I may be doing something wrong in the post processing of the tab file in R. Any suggestions would be appreciated.

      Comment


      • #18
        Hi.
        To start, I would suggest to skip the smoothing step.

        Code:
        CN <- addSmooth(CN, lambda = 7)
        If you have very high coverage, the signal should be already well smoothed, and further decresing noise might over-fit mixture models.

        Lower estimation of tumour content can be due to several reasons such as high tumur heterogeneity (see the example of COLO829 as described in the sup material of the publication) or full genome duplication without homozygous deletions. In both these cases, the peak detection might be inaccurate or too "conservative" (most of the genome still diploid)

        If you trust the oncologist estimation, you could use that piece of information to obtain a more reliable normalisation. If it is really almost pure, then the main peaks should get a ratio of 0, 0.5, 1, 1.5...

        Comment


        • #19
          Hi Stefano, I had another question. If I wanted to view the ploidy at each region (like in the tab file) how would I go about doing this?

          Comment


          • #20
            I am not sure I understand what you mean. How about

            Code:
            exportTable(CN, file = "CNAnorm_table.tab", show = 'ploidy')
            It will produce a tab file with chromosome, position, ratio... and segmented normalised copy number (last column).

            alternatively, you can do
            Code:
            ploidy <- 2 * segMean.n(CN)
            However, it is the second time this question get asked. I'll put in the todo list an easier and documented way to get the ploidy from CN.

            Also, I found a bug in exportTable. Please download the latest version (1.1.4) from the developers page.

            Comment


            • #21
              Hi Stefano,
              Thanks for the CNAnorm Package.

              As PeakPloidy is really computationally time consuming, is there a way to run it as a parallel job (multithreading) using an R package such as "snow" or similar ?
              The time to compute a 60MB ".tab" file is long, very long :-)

              Thanks
              Chris

              Comment


              • #22
                Originally posted by christophe View Post
                As PeakPloidy is really computationally time consuming, is there a way to run it as a parallel job (multithreading) using an R package such as "snow" or similar ?
                No, unfortunately I haven't thought about parallel approach at the design stage. However, here a suggestion on how to make it fast and yet have high resolution.

                From the same bam files, you produce two tab files (using bam2windows.pl) One at high resolution, one at low resolution (for instance with --window 100000). At this point you load and perform the usual steps using the low resolution table (let's say creating a CNlow object) up to the "validation" step. For the normalisation, high resolution is not very important. As long as you have enough windows (a few thousand) it will work.
                Now you load the high resolution file (let's say creating CNhigh object) and perform usual GC correction, but do NOT perform peakPloidy (the computational intesive step).
                You move to the discreteNorm step
                Code:
                CNhigh <- discreteNorm(CNhigh, normBy = CNlow)
                Because it is the same sample, you can transform the high resolution data by the low resolution data. It might actually be better as there is less noise.

                alternatively, especially for testing/debugging, you can use

                Code:
                CN <- peakPloidy(CN, exclude = toSkip, method = 'density')
                this is largely independent of number of windows, so should be reasonable.

                I hope this help

                Comment


                • #23
                  Hi,

                  Is there a function to merge the segments (similar adjacent windows) in the output file resulted from exportTable to produce CNA calls?

                  Comment


                  • #24
                    Originally posted by Amjad85 View Post
                    Hi,

                    Is there a function to merge the segments (similar adjacent windows) in the output file resulted from exportTable to produce CNA calls?
                    Hi. I am afraid not. and the problem is in "similar adjacent windows". It is difficult to decide what is "similar" and what is "different enough".

                    what is, more precicely, the problem you have? Too many segmnents? Over-fragmentation?

                    Comment


                    • #25
                      Originally posted by stefanoberri View Post
                      Hi. I am afraid not. and the problem is in "similar adjacent windows". It is difficult to decide what is "similar" and what is "different enough".

                      what is, more precicely, the problem you have? Too many segmnents? Over-fragmentation?
                      Thank you for your reply. I am comparing different algorithms for CNA detection and CNAnorm is one of them. All other algorithms make CNA calls (regions of gain and loss) by merging adjacent windows and reporting mean ratio. So, I am not able to compare CNAnorm to other algorithms with this format

                      Comment


                      • #26
                        I am not sure what you mean. "merging adjacent windows and reporting mean ratio" is usually called segmentation and CNAnorm performs segmentation too (using DNAcopy).
                        In the file resulted from exportTable is in the last column. You will see several adjacent windows have the same segmented and normalised value. If in doubt please refer to dataset and example in the vignette.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM
                        • seqadmin
                          Techniques and Challenges in Conservation Genomics
                          by seqadmin



                          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                          Avian Conservation
                          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                          03-08-2024, 10:41 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 03-27-2024, 06:37 PM
                        0 responses
                        12 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-27-2024, 06:07 PM
                        0 responses
                        11 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-22-2024, 10:03 AM
                        0 responses
                        53 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-21-2024, 07:32 AM
                        0 responses
                        68 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X