SEQanswers

12-23-2014, 06:32 AM   #1
habbas
Junior Member
 
Location: texas

Join Date: Nov 2014
Posts: 8
summarizeOverlaps too, too slow then freeze

I hope someone can help with this issue.


I have 8 BAM files from an mm9 alignment, each ~4-5 GB in size. When I run summarizeOverlaps on 3 of the files, it takes 2-3 hours to finish, and it works, although my computer almost freezes up. But when I run all 8 files together and leave it on overnight (as it takes too long to wait), the computer freezes (although it is a 16 GB i7 Mac, so it should be powerful enough) and the command never returns anything.

I am making my own TxDb file from the GTF that I used for the alignment, so that the chromosome names match (script is below).

Do you have any tips on how I can get summarizeOverlaps to work on the 8 files to create one se file without freezing up the computer? I have been trying to do this for the past 2 weeks, always with the same result.


Any input is appreciated.

Here's the script:


library("DESeq2")
library("GenomicFeatures")
library("Rsamtools")
library("GenomicAlignments")
library("GenomicRanges")

# Build a TxDb from the GTF used for the alignment so the chromosome names match.
# (Newer GenomicFeatures releases name this function makeTxDbFromGFF.)
mm9_from_cluster_gtf_txdb <- makeTranscriptDbFromGFF(file="~/Desktop/genes.gtf", format="gtf")
head(seqlevels(mm9_from_cluster_gtf_txdb))
saveDb(mm9_from_cluster_gtf_txdb, file="/Path/To/Libraries/TxDB/mm9_from_cluster_Ensembl_txdb.sqlite")

# Group exons by gene; these are the features that reads are counted against.
exonsByGene <- exonsBy(mm9_from_cluster_gtf_txdb, by="gene")
seqinfo(exonsByGene)

# Collect the BAM files.
fls <- list.files("/Path/To/BamFiles", pattern="paired.accepted_hits.bam", full.names=TRUE)
fls

# Reorder so that the first file comes last.
Experiment <- c(fls[2:8], fls[1])
Experiment

# Read each BAM file in chunks of 100000 records.
bamLst_experiment <- BamFileList(Experiment, yieldSize=100000)
seqinfo(bamLst_experiment)

# This is the step that freezes the computer when I run all 8 files together.
se_test_experiment <- summarizeOverlaps(exonsByGene, bamLst_experiment, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)

Last edited by habbas; 12-29-2014 at 02:37 PM.
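
A minimal sketch of one thing that may be worth trying with a script like the one above. It rests on an assumption that should be checked against the GenomicAlignments documentation: that summarizeOverlaps() hands the files of a BamFileList to BiocParallel, so that several files may be read at once and memory use grows with the number of workers. Registering a serial back-end beforehand would then limit processing to one file at a time. This is not taken from the thread and is not claimed to be the fix that was eventually found.

```r
# Sketch only: cap parallel workers before calling summarizeOverlaps().
# Assumption: BiocParallel is installed; the guard lets the snippet run without it.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # SerialParam() evaluates tasks one after another in the current R process.
  BiocParallel::register(BiocParallel::SerialParam())
  # Show which back-end is now the registered default.
  print(class(BiocParallel::bpparam()))
}
```

If the assumption holds, the counting step would be slower per run but should hold only one file's chunk in memory at a time; lowering yieldSize in BamFileList() is the other obvious knob.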
12-29-2014, 07:23 AM   #2
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109

Habbas,

Are you using the most recent versions of R and the packages?
See http://www.bioconductor.org/packages...lignments.html (version 1.2.1) and in fact I would even try R-devel and http://www.bioconductor.org/packages...lignments.html (version 1.3.19).

If the problem persists, I recommend contacting the maintainer of the "GenomicAlignments" package (which contains summarizeOverlaps) directly, possibly via the Bioconductor forum.

Kind regards
Wolfgang
__________________
Wolfgang Huber
EMBL
12-29-2014, 07:40 AM   #3
habbas
Junior Member
 
Location: texas

Join Date: Nov 2014
Posts: 8

Thank you for your reply. I am using the latest R version. I am not sure what the difference is between the two links you provided above. They both seem to lead to the same installation commands:

source("http://bioconductor.org/biocLite.R")
biocLite("GenomicAlignments")

Is there a way to get a link to the R-devel version so I could try it? I even tried letting the same command run for 30 hours. It looked like it was consuming memory (~600 MB of RAM), but after 30 hours there were still no results.
12-29-2014, 07:45 AM   #4
habbas
Junior Member
 
Location: texas

Join Date: Nov 2014
Posts: 8

Here's my sessionInfo(). Does it look right?

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicAlignments_1.2.0 Rsamtools_1.18.1 Biostrings_2.34.0 XVector_0.6.0
[5] GenomicRanges_1.18.3 GenomeInfoDb_1.2.2 IRanges_2.0.0 S4Vectors_0.4.0
[9] BiocGenerics_0.12.0 BiocInstaller_1.16.1

loaded via a namespace (and not attached):
[1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 BiocParallel_1.0.0 bitops_1.0-6
[6] brew_1.0-6 checkmate_1.5.0 codetools_0.2-9 DBI_0.3.1 digest_0.6.4
[11] fail_1.2 foreach_1.4.2 iterators_1.0.7 RSQLite_1.0.0 sendmailR_1.2-1
[16] stringr_0.6.2 tools_3.1.2 zlibbioc_1.12.0
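
The session info above lists GenomicAlignments_1.2.0, while post #2 points to version 1.2.1 as the current release. A small sketch of how such a comparison could be made in base R follows; `needs_update` is a hypothetical helper written for illustration, not part of R or Bioconductor.

```r
# Hypothetical helper for illustration; not part of R or Bioconductor.
# Returns TRUE when the installed version is older than the required one.
needs_update <- function(installed, required) {
  package_version(installed) < package_version(required)
}

# Version strings taken from this thread: 1.2.0 installed, 1.2.1 released.
print(needs_update("1.2.0", "1.2.1"))  # TRUE

# In a live session the installed version could be read directly, e.g.
# packageVersion("GenomicAlignments")
```

Comparing through package_version() rather than as plain strings avoids mistakes such as treating "1.10.0" as older than "1.9.0".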
12-30-2014, 06:41 AM   #5
habbas
Junior Member
 
Location: texas

Join Date: Nov 2014
Posts: 8

I posted the question on the Bioconductor support site and found a solution to the problem. Here's the link:

https://support.bioconductor.org/p/63875/#63902
Tags
genomicalignment, rnaseq, summarizeoverlaps

All times are GMT -8.