  • habbas
    Junior Member
    • Nov 2014
    • 8

    summarizeOverlaps too, too slow then freeze

    I hope someone can help in this issue.


    I have 8 BAM files from an mm9 alignment, each ~4-5 GB in size. When I run summarizeOverlaps over 3 files, it takes 2-3 hours to finish, and it works, although my computer almost freezes up. But when I run all 8 files together and leave it on overnight (as it takes too long to wait), the computer freezes (although it is a 16 GB i7 Mac, so it is supposed to be powerful) and the command never returns a result.

    I am building my own TxDb from the GTF file that I used for the alignment, so that the chromosome names match (script is below).

    Do you have any tips on how I can get summarizeOverlaps to work on the 8 files to create one se object without freezing up the computer? I have been trying to do that for the past 2 weeks, always with the same result.


    Any input is appreciated.

    here’s the script:


    library("DESeq2")
    library("GenomicFeatures")
    library("Rsamtools")
    library("GenomicAlignments")
    library("GenomicRanges")

    mm9_from_cluster_gtf_txdb <- makeTranscriptDbFromGFF(file="~/Desktop/genes.gtf", format="gtf")
    head(seqlevels(mm9_from_cluster_gtf_txdb))
    saveDb(mm9_from_cluster_gtf_txdb, file="/Path/To/Libraries/TxDB/mm9_from_cluster_Ensembl_txdb.sqlite")
    exonsByGene<-exonsBy(mm9_from_cluster_gtf_txdb,by="gene")
    seqinfo(exonsByGene)

    fls <- list.files("/Path/To/BamFiles", pattern="paired.accepted_hits.bam", full.names=TRUE)
    fls

    Experiment <- c(fls[2:8], fls[1])
    Experiment

    bamLst_experiment <- BamFileList(Experiment, yieldSize=100000)
    seqinfo(bamLst_experiment)

    # This is the step that freezes the computer when I run all 8 files together.
    se_test_experiment <- summarizeOverlaps(exonsByGene, bamLst_experiment, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)
    Last edited by habbas; 12-29-2014, 02:37 PM.
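
One possible way to keep memory use bounded for a run like the one above, offered as a sketch and not as a confirmed fix for this case: count one BAM file at a time under a serial BiocParallel back-end, and save each result to disk before moving on, so that a freeze does not lose the files that already finished. This assumes that summarizeOverlaps spreads a BamFileList over several workers through BiocParallel by default. The names count_one_bam, count_with_checkpoints, and cache_dir are hypothetical and do not come from the thread.

```r
# Sketch only. `count_one_bam`, `count_with_checkpoints`, and `cache_dir`
# are hypothetical names introduced for illustration.

# Count a single BAM file with the same options as the original call.
count_one_bam <- function(bam_path, features, yield_size = 100000) {
  # Assumption: a serial back-end keeps only one chunk of reads in memory.
  # Note that register() changes the session-wide default back-end.
  BiocParallel::register(BiocParallel::SerialParam())
  bam <- Rsamtools::BamFileList(bam_path, yieldSize = yield_size)
  GenomicAlignments::summarizeOverlaps(
    features, bam,
    mode = "Union", singleEnd = FALSE,
    ignore.strand = TRUE, fragments = TRUE
  )
}

# Apply `counter` to each path in turn, caching every result as an RDS file.
# Cache files are keyed by position, so keep `bam_paths` in the same order
# between runs (or delete `cache_dir` when the order changes).
count_with_checkpoints <- function(bam_paths, counter, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  lapply(seq_along(bam_paths), function(i) {
    cache_file <- file.path(cache_dir, sprintf("counts_%02d.rds", i))
    if (file.exists(cache_file)) {
      return(readRDS(cache_file))  # reuse a result from an earlier run
    }
    res <- counter(bam_paths[[i]])
    saveRDS(res, cache_file)
    res
  })
}

# Usage (commented out; needs the packages, `exonsByGene`, and the paths in `fls`):
# library("GenomicAlignments")
# se_list <- count_with_checkpoints(
#   fls, function(p) count_one_bam(p, exonsByGene), "counts_cache")
# se_all <- do.call(cbind, se_list)  # join the per-file results column-wise
```

Counting per file trades some speed for predictability: peak memory should stay close to that of a single file, and an interrupted run can be restarted without recounting the files already cached.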
  • Wolfgang Huber
    Senior Member
    • Aug 2009
    • 109

    #2
    Habbas

    Are you using the most recent versions of R and the packages?
    See http://www.bioconductor.org/packages...lignments.html (version 1.2.1) and in fact I would even try R-devel and http://www.bioconductor.org/packages...lignments.html (version 1.3.19).

    If the problem persists, I recommend contacting the maintainer of the "GenomicAlignments" package (which contains summarizeOverlaps) directly, possibly via the Bioconductor forum.

    Kind regards
    Wolfgang
    Wolfgang Huber
    EMBL

    • habbas
      Junior Member
      • Nov 2014
      • 8

      #3
      Thank you for your reply. I am using the latest R version. I am not sure what the difference is between the two links you provided above. They both seem to lead to the same installation commands:

      source("http://bioconductor.org/biocLite.R")
      biocLite("GenomicAlignments")

      Is there a link to the R-devel version so I could try it? I even tried running the same thing for over 30 hours. It looked like it was consuming memory (~600 MB of RAM), but after 30 hours there were still no results.

      • habbas
        Junior Member
        • Nov 2014
        • 8

        #4
        Here's my sessionInfo(). Does it look right?

        R version 3.1.2 (2014-10-31)
        Platform: x86_64-apple-darwin13.4.0 (64-bit)

        locale:
        [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

        attached base packages:
        [1] stats4 parallel stats graphics grDevices utils datasets methods base

        other attached packages:
        [1] GenomicAlignments_1.2.0 Rsamtools_1.18.1 Biostrings_2.34.0 XVector_0.6.0
        [5] GenomicRanges_1.18.3 GenomeInfoDb_1.2.2 IRanges_2.0.0 S4Vectors_0.4.0
        [9] BiocGenerics_0.12.0 BiocInstaller_1.16.1

        loaded via a namespace (and not attached):
        [1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 BiocParallel_1.0.0 bitops_1.0-6
        [6] brew_1.0-6 checkmate_1.5.0 codetools_0.2-9 DBI_0.3.1 digest_0.6.4
        [11] fail_1.2 foreach_1.4.2 iterators_1.0.7 RSQLite_1.0.0 sendmailR_1.2-1
        [16] stringr_0.6.2 tools_3.1.2 zlibbioc_1.12.0
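
The sessionInfo() output above lists GenomicAlignments 1.2.0, while reply #2 mentions release version 1.2.1. A minimal base-R sketch for checking whether an installed version is behind a given release follows. The variable names are hypothetical, and the two version strings are simply the ones quoted in this thread.

```r
# Versions quoted in the thread: installed (sessionInfo) vs. release (reply #2).
installed <- "1.2.0"
release   <- "1.2.1"

# compareVersion() returns -1, 0, or 1 when the first version is
# older than, equal to, or newer than the second.
is_outdated <- compareVersion(installed, release) < 0
is_outdated

# On a live system the installed version can be read directly
# (assumes the package is installed):
# installed <- as.character(packageVersion("GenomicAlignments"))
```

If is_outdated is TRUE, updating the package before rerunning the counting step is a cheap first check, although nothing in the thread confirms that the version difference caused the slowdown.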

        • habbas
          Junior Member
          • Nov 2014
          • 8

          #5
          I posted the question on the Bioconductor support site and found a solution to the problem. Here's the link for it:
