SEQanswers

Old 01-06-2014, 08:44 PM   #1
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
STAR RNA-seq aligner installation

Hello.
I am having issues installing the STAR RNA-seq aligner.

I downloaded the reference genome from the website but am not sure how to generate the genome as described in the manual.

When I untar the hg19 folder, there is a file titled "Genome" with no extension. I tried to "head" the file to check whether it is the genome.fa (which I hope it is), but I am not able to view the contents.

Here are my parameters, and here is the error I get when trying to make the reference genome.


[acolombo@hpc-login2 hg19]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19 --genomeFastaFiles /auto/rcf-proj/sa1/data/hg19/Genome --runThreadN 16 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 0
Jan 06 21:41:42 ..... Started STAR run
Jan 06 21:41:42 ... Starting to generate Genome files
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check
Abort
Old 01-06-2014, 10:33 PM   #2
shunyip
Member
 
Location: New York

Join Date: Oct 2013
Posts: 20

Hello Arcolombo,

I believe the problem is your genome file, just as you suspect. The genome file should be a FASTA file, typically called "genome.fa".

I hope this post will help you find what you need: http://seqanswers.com/forums/showthread.php?t=5996
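
A quick way to tell a plain-text FASTA file from a file that is not one is to look at the first byte: FASTA records start with a ">" header line. Below is a minimal sketch, assuming a POSIX-like shell; the function name looks_like_fasta is made up for illustration, and it only checks the first character rather than validating the whole file.

```shell
# Hypothetical helper: succeed (exit status 0) only if the file's first
# character is '>', the marker that opens a FASTA header line.
# This is a quick sanity check, not a full FASTA validator.
looks_like_fasta() {
    # tr strips NUL bytes so binary files do not trigger shell warnings.
    first_char=$(head -c 1 "$1" | tr -d '\000')
    [ "$first_char" = ">" ]
}
```

It could be used as: looks_like_fasta /auto/rcf-proj/sa1/data/hg19/Genome && echo "FASTA" || echo "not FASTA"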
Old 01-06-2014, 11:32 PM   #3
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

The genomes you can download from the STAR website have already been prepared - no need to run genomeGenerate on them again. Just skip ahead to the alignment stage.
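
For the alignment stage, a command along these lines should work with a prepared genome directory. This is only a sketch: the reads file and thread count are placeholders, and the script prints the command rather than running it so the paths can be checked first. The options --genomeDir, --readFilesIn and --runThreadN are the ones described in the STAR manual.

```shell
# Placeholder values: replace with your own paths and thread count.
GENOME_DIR=/auto/rcf-proj/sa1/data/hg19   # unpacked, prebuilt STAR genome directory
READS=reads.fastq                         # hypothetical FASTQ file with your reads
THREADS=16

# Print the STAR alignment command (dry run) so it can be inspected before use.
star_align_cmd() {
    echo "STAR --genomeDir $GENOME_DIR --readFilesIn $READS --runThreadN $THREADS"
}

star_align_cmd
```

Once the printed line looks right, it can be pasted into the shell to start the actual alignment.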
Old 01-07-2014, 10:20 AM   #4
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142

If I wish to re-create the genome directory from a directory I used previously, together with the junctions BED file from the STAR website, how should I proceed?

I ran a STAR command line that uses the genome.fa from the UCSC site, which I already use for TopHat. I set the parameters to point to the genome directory (the UCSC hg19 directory) and to the genome.fa (from the previously used hg19 files). In the genome generation step I also added the junctions file (according to the manual this makes the alignment more accurate).

I still get an error:

[acolombo@hpc-login2 STAR]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 12
Jan 07 11:10:23 ..... Started STAR run
Jan 07 11:10:24 ... Starting to generate Genome files
Jan 07 11:15:40 ... finished processing splice junctions database ...
Jan 07 11:16:55 ... starting to sort Suffix Array. This may take a long time...
Jan 07 11:17:26 ... sorting Suffix Array chunks and saving them to disk...
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Abort
Old 01-07-2014, 12:53 PM   #5
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142

This issue came up in the earlier STAR release announcement thread, and the workaround given there was to use these parameters:

/auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19/Sequence --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 6 --genomeSAindexNbases 4


Yet this has now been processing for somewhere between 45 minutes and 1.5 hours (very slow).
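
For reference, later versions of the STAR manual give rules of thumb for these two parameters: --genomeSAindexNbases of about min(14, log2(GenomeLength)/2 - 1), and --genomeChrBinNbits of about min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The sketch below just evaluates those formulas with awk; the function names and the example genome size, sequence count and read length are illustrative assumptions. The manual also notes that longer pre-indexing strings use more memory but allow faster searches, so a value as low as 4 may be trading a lot of speed for memory.

```shell
# Suggested --genomeSAindexNbases: min(14, log2(genome_length)/2 - 1), rounded down.
suggest_sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = int(log(len) / log(2) / 2 - 1)
        if (v > 14) v = 14
        print v
    }'
}

# Suggested --genomeChrBinNbits:
# min(18, log2(max(genome_length / n_references, read_length))), rounded down.
suggest_chr_bin_nbits() {
    awk -v len="$1" -v nref="$2" -v rlen="$3" 'BEGIN {
        per_ref = len / nref
        if (per_ref < rlen) per_ref = rlen
        v = int(log(per_ref) / log(2))
        if (v > 18) v = 18
        print v
    }'
}

# Example with assumed numbers: ~3.1 Gb genome, 25 sequences, 100 bp reads.
suggest_sa_index_nbases 3100000000        # prints 14
suggest_chr_bin_nbits 3100000000 25 100   # prints 18
```

With these assumed numbers both suggestions land on the documented defaults (14 and 18), which are far from the 4 and 6 used above.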
Old 01-07-2014, 02:05 PM   #6
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142

Here are the results.

It gives an error about not enough SA indices. I am currently re-running it.
Attached Files
File Type: txt Log.txt (13.3 KB, 14 views)
Old 01-07-2014, 11:41 PM   #7
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

How much memory do you have?
Old 01-08-2014, 12:10 AM   #8
shunyip
Member
 
Location: New York

Join Date: Oct 2013
Posts: 20

Quote:
Originally Posted by ffinkernagel
How much memory do you have?
Agreed, you may have run out of disk space. How much free space do you have on your hard drive?
Old 01-09-2014, 06:57 AM   #9
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

Not disk space, RAM. STAR uses quite a lot of RAM to generate its index (last time I checked, 16 GB was not enough for a human genome).
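
On Linux, the installed RAM can be checked before starting genomeGenerate. The sketch below reads MemTotal from /proc/meminfo; the 32 GiB threshold is an assumption based on the roughly 32 GB the STAR manual quotes for a human genome, and the function name is made up for illustration.

```shell
REQUIRED_GIB=32   # assumed requirement for a human genome index; adjust as needed

# Print total physical RAM in whole GiB, rounded down
# (Linux only; prints 0 if /proc/meminfo is unavailable).
total_ram_gib() {
    if [ -r /proc/meminfo ]; then
        # MemTotal is reported in kB; convert kB -> GiB.
        awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
    else
        echo 0
    fi
}

ram_gib=$(total_ram_gib)
if [ "$ram_gib" -lt "$REQUIRED_GIB" ]; then
    echo "Only ${ram_gib} GiB of RAM detected; genomeGenerate may fail with std::bad_alloc."
else
    echo "${ram_gib} GiB of RAM detected."
fi
```

On a shared cluster, per-user limits (see ulimit -v) may be lower than the physical RAM, so a login node can run out of memory well before this number is reached.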
Old 10-25-2016, 01:30 AM   #10
anagd
Junior Member
 
Location: Spain

Join Date: Oct 2016
Posts: 1

Hello, I am having problems trying to generate the index for mouse GRCm38 from Ensembl.
STAR stops at "... sorting Suffix Array chunks and saving them to disk..." without reporting any error, so the Genome file needed for the next step is not generated.

I am running STAR under Cygwin on Windows and I have 64 GB of RAM.
I heard that the problem may lie with STAR's pre-compiled build. I am not an expert in informatics, and RNA-seq analysis is also new to me, so I don't fully understand how to compile the STAR executable. What I did was set the working directory with cd STAR/source and run STAR from there. I also added the path to the STAR executable to the PATH environment variable in the Windows system settings. Has anyone had similar problems?

I have been stuck on this step for several days and I don't know what to do. Any help is welcome. Could I use an index already generated by STAR if I cannot build my own?

I have an Intel Xeon CPU at 3.5 GHz with 4 cores and 8 logical processors. I downloaded the mouse genome and genes.gtf files from the iGenomes website, and I am using the WholeGenome.fa file from Ensembl. Is this genome too big, so that I am hitting a RAM limitation? Should I generate my index chromosome by chromosome? How long should index generation take?

This is my command:

./STAR --runMode genomeGenerate --genomeDir /cygdrive/c/Ana_Gómez_Secuenciación/CM1_FACS/20160818_Carpeta_de_trabajo_H3YJLBGXY/index --genomeFastaFiles /cygdrive/c/Ana_Gómez_Secuenciación/Genome/reference/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa --runThreadN 6 --sjdbGTFfile /cygdrive/c/Ana_Gómez_Secuenciación/Genome/GTF_files/referenceGTF/genes.gtf --sjdbOverhang 75 --genomeSAsparseD parameter 1
Old 10-25-2016, 09:31 AM   #11
cmbetts
Senior Member
 
Location: Bay Area

Join Date: Jun 2012
Posts: 105

Generally, you can't just drop a Linux binary into a Cygwin environment and expect it to run. As you alluded to, you almost certainly have to use MinGW to compile your own binaries. As someone who's slammed their head into a wall repeatedly trying to compile NGS analysis tools in Cygwin (I really wish I'd documented how I got samtools to compile properly that one time!), I'd highly recommend running Linux in a VM (I run Ubuntu Server under VirtualBox on my work-mandated Windows PC) or natively as a dual boot. You'll find nothing but pain trying to get a usable NGS environment going on Windows, while almost everything you'd want to use was designed for Linux and probably has a precompiled binary available for it (not to mention a competent command line, which is how nearly all of the tools are run).
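
A wrapper script can at least detect which environment it is running in before trying to launch a Linux binary. A minimal sketch using uname -s follows; the function name and the messages are illustrative, not part of STAR.

```shell
# Classify the current environment from `uname -s` (hypothetical helper).
platform_kind() {
    case "$(uname -s)" in
        Linux*)       echo linux ;;
        CYGWIN*)      echo cygwin ;;
        MINGW*|MSYS*) echo mingw ;;
        *)            echo other ;;
    esac
}

if [ "$(platform_kind)" = "linux" ]; then
    echo "Linux detected: a precompiled Linux STAR binary should be usable."
else
    echo "Not Linux ($(platform_kind)): a Linux binary will likely not run as-is here."
fi
```

Under Cygwin this reports "cygwin", which is a hint to either compile STAR for that environment or switch to a Linux VM.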