SEQanswers

Old 01-20-2015, 09:10 AM   #121
amolkolte
Junior Member
 
Location: Pune, India

Join Date: Dec 2012
Posts: 8
Default saturate the junctions!!

Hi,
Would it be wise to generate the genome with a GTF file and re-generate it later with the sample-specific SJ.out.tab?

I have more than 100 samples for variant analysis. Another strategy that I think could be useful is to pool 20-30 samples together (mapping being extremely fast, it was over within 40 minutes), so that the discovery of novel junctions gets saturated, and then use this final SJ.out.tab to re-generate the genome and perform sample-wise alignment. This reported ~200,000 total junctions, which is quite close to the number of exons in the human genome (~180,000).

Cheers,
Amol
Old 01-23-2015, 12:56 PM   #122
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by amolkolte View Post
Hi Amol,

You are talking about what we now call the "2-pass" approach, where junctions detected in the first pass are used as annotations in the 2nd pass. This is generally the most accurate way to map RNA-seq data. As you pointed out, there are two options:

1) Use the 1st pass in a sample-specific way.
This can be done with the --twopass1readsN <Nreads> option, where Nreads is the number of reads to map in the 1st pass. I recommend using a very big number (or -1), which maps all the reads in the 1st pass (in any case, all the reads are re-mapped in the 2nd pass). At the moment this approach cannot be used with annotations.

2) Collect all the junctions from the 1st mapping pass over multiple samples, then re-generate the genome with the full collection of junctions and re-map all reads against the new genome. This has to be done "manually" and is a bit more cumbersome, but in the end it gives you the best sensitivity. Some filtering of the 1st-pass novel junctions may be required, e.g. removing junctions on the mitochondrial genome. Some users have also reported a speed decrease in the 2nd pass with millions of junctions. You will also see an increase in multi-mappers, since reads will be able to map to multiple junctions, especially in cases of short splicing overhangs.
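A minimal sketch of this manual multi-sample workflow (the sample names, paths, thread count, and 99 bp overhang are placeholders, not Alex's exact commands):

```shell
# 1st pass: map every sample against the initial index (paths are placeholders)
for s in sample1 sample2 sample3; do
    STAR --genomeDir ./index_pass1 --readFilesIn ${s}_R1.fastq.gz ${s}_R2.fastq.gz \
         --readFilesCommand zcat --runThreadN 12 --outFileNamePrefix ${s}_pass1_
done

# Pool the discovered junctions, dropping mitochondrial ones and duplicates
# (keep chr/start/end/strand; contig names depend on the assembly)
cat *_pass1_SJ.out.tab | awk '$1 != "chrM" && $1 != "MT"' | cut -f1-4 | sort -u > SJ.pooled.tab

# Re-generate the genome with the pooled junctions
STAR --runMode genomeGenerate --genomeDir ./index_pass2 --genomeFastaFiles genome.fa \
     --sjdbFileChrStartEnd SJ.pooled.tab --sjdbOverhang 99 --runThreadN 12

# 2nd pass: re-map every sample against the junction-aware index
for s in sample1 sample2 sample3; do
    STAR --genomeDir ./index_pass2 --readFilesIn ${s}_R1.fastq.gz ${s}_R2.fastq.gz \
         --readFilesCommand zcat --runThreadN 12 --outFileNamePrefix ${s}_pass2_
done
```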

Cheers
Alex
Old 01-27-2015, 04:05 AM   #123
ruggedtextile
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 18
Default

I am trying to use the (new to me) --twopass1readsN option in STAR, but am finding it very slow, to the point that I am not sure whether STAR is crashing or not.

Using some small test data (1M PE reads) and a very small --twopass1readsN value (1000), it takes about one minute to insert the 1st pass junctions:

STAR --genomeDir /iGenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/StarIndex --readFilesIn R1.fastq.gz R2.fastq.gz --runThreadN 12 --genomeLoad NoSharedMemory --outFilterMultimapNmax 1 --outSAMtype BAM SortedByCoordinate --twopass1readsN 1000 --sjdbOverhang 99
Jan 27 12:56:01 ..... Started STAR run
Jan 27 12:56:13 ..... Started 1st pass mapping
Jan 27 12:56:15 ..... Finished 1st pass mapping
Jan 27 12:57:25 ..... Finished inserting 1st pass junctions into genome
Jan 27 12:57:25 ..... Started mapping
Jan 27 12:57:41 ..... Started sorting BAM

Running on a real dataset (~40M PE reads, with --twopass1readsN set to -1) took >18 h before I killed the job, assuming it had crashed. It was using 100% CPU that entire time, however, so it may well have simply been doing the right thing, just very slowly.

Is this timing normal for this option? Manually doing the 1st pass, genome regeneration and 2nd pass would be much quicker, I think. At this speed I won't be able to use the automatic 2-pass mode in my normal pipeline.
Old 01-29-2015, 02:06 PM   #124
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by ruggedtextile View Post
Hi,

18 hours is definitely too long; the 2nd pass should normally take just a bit longer than the 1st pass. Could you please send me the Log.out file of the failed run?

Cheers
Alex
Old 01-30-2015, 07:18 AM   #125
ruggedtextile
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 18
Default

I have narrowed the problem down to using a genome index without pre-existing annotated junctions. If I use an otherwise identical index with junctions, it runs fine (it takes a couple of minutes at most to insert the 1st pass junctions).

That is:

Code:
[15:49 guttea@Maul MODY] > STAR --genomeDir /mnt/nas_omics/shared_data/iGenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/StarAnnotatedIndex --readFilesIn data/fastq/D0-1_R1.fastq.gz data/fastq/D0-1_R2.fastq.gz --runThreadN 12 --genomeLoad NoSharedMemory --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopass1readsN -1 --sjdbOverhang 149
Jan 30 15:58:43 ..... Started STAR run
Jan 30 15:59:32 ..... Started 1st pass mapping
Jan 30 16:00:35 ..... Finished 1st pass mapping
Jan 30 16:02:23 ..... Finished inserting 1st pass junctions into genome
Jan 30 16:02:25 ..... Started mapping
Jan 30 16:03:43 ..... Started sorting BAM
Jan 30 16:04:04 ..... Finished successfully
[16:04 guttea@Maul MODY] > STAR --genomeDir /mnt/nas_omics/shared_data/iGenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/StarIndex --readFilesIn data/fastq/D0-1_R1.fastq.gz data/fastq/D0-1_R2.fastq.gz --runThreadN 12 --genomeLoad NoSharedMemory --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopass1readsN -1 --sjdbOverhang 149 --outFileNamePrefix NoExisitingAnnotation
Jan 30 16:06:00 ..... Started STAR run
Jan 30 16:06:15 ..... Started 1st pass mapping
Jan 30 16:07:19 ..... Finished 1st pass mapping
This was copied at 16:15, i.e. the run had already taken ~8 minutes at that point, and last time it hung here for 18 hours using 100% CPU. I've attached the two Log.out files. You can see that in NoExisitingAnnotation.Log.out we never get confirmation of:

Code:
Jan 30 16:01:18   Finished sorting SA indices; nInd=50917350
Hope that helps.
Attached Files
File Type: txt Annotation.Log.out.txt (19.5 KB, 6 views)
File Type: txt NoExisitingAnnotationLog.out.txt (18.3 KB, 1 views)
Old 03-18-2015, 05:26 PM   #126
vromanr_2015
Junior Member
 
Location: Australia

Join Date: Mar 2015
Posts: 5
Default memory problems while running STAR

Hi, I'm trying to use genomeGenerate, but every time I run it I get this error:

terminate called after throwing an instance of 'std::bad_alloc'
Aborted (core dumped)

I tried these options as well:
--runThreadN 12 --limitGenomeGenerateRAM 40

but I still get the same error.

I'm working with a genome of 6 Gb. Could that be the problem?

I really appreciate any help,

V
Old 03-20-2015, 07:32 PM   #127
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Hi,

For a 6 Gb genome, you would need ~60 GB of RAM, and you would need to specify --limitGenomeGenerateRAM 60000000000 (the limit is given in bytes).
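As a sketch, the genome generation command might then look like this (the index path and FASTA file name are placeholders):

```shell
# RAM limit is in bytes: here 60 GB, roughly 10x the 6 Gb genome size
STAR --runMode genomeGenerate --genomeDir ./index \
     --genomeFastaFiles genome.fa \
     --runThreadN 12 --limitGenomeGenerateRAM 60000000000
```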

Cheers
Alex

Quote:
Originally Posted by vromanr_2015 View Post
Old 03-26-2015, 05:47 PM   #128
neokao
Member
 
Location: USA

Join Date: Mar 2015
Posts: 30
Default

Hey,

I just switched from BWA to STAR for my RNA-seq analysis yesterday. I got stuck with this:

Mar 27 08:14:36 ..... Started STAR run
Mar 27 08:24:33 ..... Started mapping
Killed: 9

Then STAR stopped.
I have 20 samples of 50 bp SE reads (~40M reads/sample), and the analysis was run on my poor Mac mini (Core i7 / 16 GB RAM).
I used the Ensembl mouse genome (Mus_musculus.GRCm38.dna.primary_assembly.fa.gz and Mus_musculus.GRCm38.79.gtf.gz) as the reference.

The code I used to run the mapping:
> STAR --genomeDir ./GenomeDir --readFilesIn ./BGI_RNAseq_data_2015/01.fq,./BGI_RNAseq_data_2015/02.fq,./BGI_RNAseq_data_2015/03.fq,./BGI_RNAseq_data_2015/04.fq,./BGI_RNAseq_data_2015/05.fq,./BGI_RNAseq_data_2015/06.fq,./BGI_RNAseq_data_2015/07.fq,./BGI_RNAseq_data_2015/08.fq,./BGI_RNAseq_data_2015/09.fq,./BGI_RNAseq_data_2015/10.fq,./BGI_RNAseq_data_2015/11.fq,./BGI_RNAseq_data_2015/12.fq,./BGI_RNAseq_data_2015/13.fq,./BGI_RNAseq_data_2015/14.fq,./BGI_RNAseq_data_2015/15.fq,./BGI_RNAseq_data_2015/16.fq,./BGI_RNAseq_data_2015/17.fq,./BGI_RNAseq_data_2015/18.fq,./BGI_RNAseq_data_2015/19.fq,./BGI_RNAseq_data_2015/20.fq --runThreadN 8

I knew that STAR demands much more computing power (and memory) than BWA, but I am not sure whether my command is correct. I had no issues with BWA on the same Mac mini, though.
The log files are attached.
Attached Files
File Type: txt Log.progress.out.txt (236 Bytes, 1 views)
File Type: zip Log.out.txt.zip (4.9 KB, 5 views)
Old 03-26-2015, 07:24 PM   #129
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by neokao View Post
Hi,
To fit the mouse genome into 16 GB of RAM, you need to use the following options at the genome generation step:
--genomeSAsparseD 2 --genomeSAindexNbases 13
Hopefully this will solve the problem.
Cheers
Alex
Old 03-26-2015, 09:36 PM   #130
neokao
Member
 
Location: USA

Join Date: Mar 2015
Posts: 30
Default

Hey, Alex:

Just tried re-indexing the genome.
Code:
STAR --runMode genomeGenerate --genomeDir ./GenomeDir --genomeFastaFiles ./Mus_musculus.GRCm38.dna.primary_assembly.fa --runThreadN 8 --sjdbGTFfile ./Mus_musculus.GRCm38.79.gtf --sjdbOverhang 48 --genomeSAsparseD 2 --genomeSAindexNbases 13

It completed, and then I ran the same mapping command.

Still got the same error:
Mar 27 13:16:44 ..... Started STAR run
Mar 27 13:18:05 ..... Started mapping
Killed: 9

Any more suggestions?
Thanks.
Old 03-27-2015, 05:17 PM   #131
neokao
Member
 
Location: USA

Join Date: Mar 2015
Posts: 30
Default

Somehow I could not send private message successfully, the new log.out file is in this link https://app.box.com/s/0nqn3jknqq1ta66jl1evme8c8ldunq9k

I guess the issue started when it was creating the 8th thread. So I changed to --runThreadN 1, and it has seemed OK for 5 hours so far (still mapping). Is it normal for 20 files like those described above to take more than 5 hours? Thanks for your suggestions.
Old 03-27-2015, 08:05 PM   #132
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by neokao View Post
You can check Log.progress.out to see how many reads have been mapped. Since you are using one thread, I would expect a speed of ~100M reads per hour, so it could take ~8 hours for ~800M reads.
You can try --runThreadN 7 or 6, or you can reduce the per-thread IO buffer size with --limitIObufferSize 100000000.

Cheers
Alex
Old 03-28-2015, 02:38 PM   #133
neokao
Member
 
Location: USA

Join Date: Mar 2015
Posts: 30
Default

So it went through successfully, with a command like this:
> STAR --genomeDir ./GenomeDir/ --readFilesIn ./BGI_RNAseq_data_2015/01.fq,./BGI_RNAseq_data_2015/02.fq,./BGI_RNAseq_data_2015/03.fq,./BGI_RNAseq_data_2015/04.fq,./BGI_RNAseq_data_2015/05.fq,./BGI_RNAseq_data_2015/06.fq,./BGI_RNAseq_data_2015/07.fq,./BGI_RNAseq_data_2015/08.fq,./BGI_RNAseq_data_2015/09.fq,./BGI_RNAseq_data_2015/10.fq,./BGI_RNAseq_data_2015/11.fq,./BGI_RNAseq_data_2015/12.fq,./BGI_RNAseq_data_2015/13.fq,./BGI_RNAseq_data_2015/14.fq,./BGI_RNAseq_data_2015/15.fq,./BGI_RNAseq_data_2015/16.fq,./BGI_RNAseq_data_2015/17.fq,./BGI_RNAseq_data_2015/18.fq,./BGI_RNAseq_data_2015/19.fq,./BGI_RNAseq_data_2015/20.fq --runThreadN 1
It took ~20 hours at a speed of ~37M reads per hour (the Mac mini was in use for other things at the time).

Somehow --runThreadN 2, --runThreadN 4 and --runThreadN 6 all gave the same errors.

Anyway, the output is one ~200 GB Aligned.out.sam file. I thought I would get 20 individual SAM files to run featureCounts on. Is there any way to split it? Or do I need to redo the STAR mapping with each input .fq file individually? Thanks.

Last edited by neokao; 03-28-2015 at 03:46 PM.
Old 03-29-2015, 07:23 PM   #134
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by neokao View Post
If the read names in each of the .fastq files have distinct prefixes, you could split the resulting SAM into separate files.
Otherwise, you would have to map each of the files separately in a loop. Something like --readFilesIn XX.fq --outFileNamePrefix XX will allow storing separate output files in one directory.
It's strange that multi-threading does not work. Is it possible that other processes take up a significant chunk of RAM?
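As a sketch of the splitting approach: assuming the read names carry a sample prefix such as 01_..., the combined SAM could be split with a few lines of Python (the prefix convention, separator, and function name here are assumptions, not part of STAR):

```python
from collections import defaultdict

def split_sam_by_prefix(lines, sep="_"):
    """Bucket SAM alignment lines by the read-name prefix before `sep`.
    Header lines (starting with '@') are copied into every bucket."""
    headers = []
    buckets = defaultdict(list)
    for line in lines:
        if line.startswith("@"):
            headers.append(line)
        else:
            qname = line.split("\t", 1)[0]   # column 1 of a SAM record is QNAME
            prefix = qname.split(sep, 1)[0]  # e.g. "01" from "01_read1234"
            buckets[prefix].append(line)
    # Prepend the shared header to each per-sample chunk
    return {p: headers + recs for p, recs in buckets.items()}
```

Each value of the returned dict can then be written to its own file, e.g. 01_Aligned.out.sam, and fed to featureCounts.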

Cheers
Alex
Old 03-29-2015, 07:51 PM   #135
neokao
Member
 
Location: USA

Join Date: Mar 2015
Posts: 30
Default

Thanks, Alex.

I don't know what's causing the problem for multi-threading on my Mac mini.
I got the same error even after I rebooted the computer and started the mapping freshly.

Anyway, I went ahead and started mapping the files one by one with --runThreadN 1.
It is weird that even with --runThreadN 1 and a command like this:

> STAR --genomeDir ./GenomeDir/ --readFilesIn ./BGI_RNAseq_data_2015/01.fq --runThreadN 1

it sometimes worked and sometimes did not.

For the same .fq file, it could give the Killed: 9 error, and when I reran the exact same command, it went through successfully. Very strange.

My .fq files do have distinct prefixes, ordered by two-digit numbers as described before: 01.fq, 02.fq, etc. Could you shed more light on --readFilesIn XX.fq --outFileNamePrefix XX? Thanks.
Old 03-29-2015, 11:40 PM   #136
dietmar13
Senior Member
 
Location: Vienna

Join Date: Mar 2010
Posts: 107
Default 2-pass speed about 7 M/hr

Hello,

Is it possible that the mapping speed in the 2nd pass decreases to 7 M reads/hr when 900,000 new splice sites and a comprehensive gene model (GENCODE v19) were used for index generation? The first pass was ~100-fold faster (700 M/hr).

My exact command:
/home/ws/SW_install/STAR/STAR/source/STAR --runThreadN 31 --outSAMstrandField intronMotif --outSAMtype BAM SortedByCoordinate --genomeDir $indices --readFilesCommand zcat --readFilesIn /home/ws/data/PatientData/$b/m*/$s/t*/date*/$f1 /home/ws/data/PatientData/$b/m*/$s/t*/date*/$f2 --outFileNamePrefix $name

This runs with 31 threads on a 192 GB Scientific Linux 7 workstation.

Can I do something to improve the speed?

dietmar
Old 03-30-2015, 09:00 PM   #137
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by neokao View Post
You can map each of the FASTQ files in a separate directory, e.g. 01/, 02/, ... The output files in all directories will then have the same names, e.g. 01/Aligned.out.sam, 02/Aligned.out.sam, ...
Alternatively, you can run all STAR jobs in one directory but with different prefixes corresponding to your FASTQ files, i.e.
STAR --readFilesIn 01.fastq --outFileNamePrefix 01_
STAR --readFilesIn 02.fastq --outFileNamePrefix 02_
In this case the output files will carry the specified prefix for each run, i.e.
01_Aligned.out.sam, 02_Aligned.out.sam, ...
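Put together, a loop over neokao's 20 files might look like this (a sketch; the directory layout is taken from the commands above, the prefix scheme is an assumption):

```shell
# Map 01.fq .. 20.fq one at a time, each with its own output prefix
for i in $(seq -w 1 20); do
    STAR --genomeDir ./GenomeDir \
         --readFilesIn ./BGI_RNAseq_data_2015/${i}.fq \
         --runThreadN 1 \
         --outFileNamePrefix ${i}_
done
```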

I suspect there is some problem with RAM management, as STAR takes almost all of the available RAM.
Can you try rebooting your machine? I have heard that this helps some Mac systems "declutter" RAM.
Also, please run the "top" command while STAR is running to see how much memory is being used.

Cheers
Alex
Old 03-30-2015, 09:08 PM   #138
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by dietmar13 View Post
Hi Dietmar,

There have been some reports of a slowdown in the 2nd pass. In one case it was caused by (likely false-positive) splice junctions in the mitochondrial genome: https://groups.google.com/d/msg/rna-...Y/0jSn0vy0ccgJ.
If filtering out the chrM junctions does not help, please send me the list of junctions from the 1st pass and a few million reads for testing.
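The mitochondrial-junction filter can be a one-line awk pass over column 1 of SJ.out.tab. Contig names depend on the assembly ("MT" in Ensembl, "chrM" in UCSC); the toy input below is only illustrative:

```shell
# Toy SJ.out.tab: column 1 is the contig; one autosomal and one mitochondrial junction
printf 'chr1\t14830\t14969\t1\t1\t1\t10\t0\t38\nchrM\t710\t1601\t1\t1\t0\t500\t0\t20\n' > SJ.out.tab

# Keep only junctions that are not on the mitochondrial contig
awk '$1 != "chrM" && $1 != "MT"' SJ.out.tab > SJ.filtered.tab
cat SJ.filtered.tab
```

SJ.filtered.tab can then be used when regenerating the genome for the 2nd pass.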

Cheers
Alex
Old 03-31-2015, 07:02 AM   #139
neokao
Member
 
Location: USA

Join Date: Mar 2015
Posts: 30
Default

Quote:
Originally Posted by alexdobin View Post
I guess so too. I manually mapped these 20 .fq files, with occasional Killed: 9 errors. I found that a run could usually go through if I ran the EXACT same command again (even without rebooting OS X). However, now I am really stuck on the biggest .fq file (~6.6 GB). For that particular .fq file, I get an Abort trap: 6 error at the "..... Started sorting BAM" step. It happens every time (tried 6-7 times so far, even after a fresh reboot). I did not see anything weird with top.
I also tried --limitIObufferSize 100000000 but still got the Abort trap: 6 error.
It is frustrating, since this is the last file to map. The Log.out file for that run is attached. Thanks for the advice.
Attached Files
File Type: txt Log.out.txt (15.1 KB, 1 views)

Last edited by neokao; 03-31-2015 at 07:05 AM.
Old 04-01-2015, 03:41 PM   #140
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by neokao View Post
Please try the latest STAR release https://github.com/alexdobin/STAR/re...ag/STAR_2.4.0k - I have improved the BAM sorting, and it should now require less RAM. Also, it may be safer to set a separate RAM limit for BAM sorting, say --limitBAMsortRAM 10000000000.
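For the failing sorted-BAM run, the extra limit might be added like this (a sketch; the file name follows neokao's layout and is a placeholder):

```shell
# Cap BAM-sorting RAM at 10 GB (value is in bytes) on the 16 GB Mac mini
STAR --genomeDir ./GenomeDir \
     --readFilesIn ./BGI_RNAseq_data_2015/20.fq \
     --runThreadN 1 \
     --outSAMtype BAM SortedByCoordinate \
     --limitBAMsortRAM 10000000000
```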

Cheers
Alex