SEQanswers

Old 06-12-2013, 10:48 PM   #21
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

Dear Jon,

No worries. Thanks for letting me know.

Please make sure you are using the latest version (1.3.5-p3). We made changes to featureCounts today. Let me know if you run into any problems.

Best wishes,

Wei
Old 08-03-2013, 01:00 AM   #22
kdmurray91
Junior Member
 
Location: Canberra

Join Date: Sep 2012
Posts: 2
Default

Hello!

I've hit a problem with subread-align. I receive the following output when trying to map paired-end RNA-seq data.

Code:
$ subread-align -r sample_R1.trimmed.fq.gz -R sample_R2.trimmed.fq.gz -o sample.bam -i ../refseqs/TAIR10_gen/TAIR10_gen

Number of selected subreads = 10
Consensus threshold = 3
Number of threads=1
Number of indels allowed=5


Performing paired-end alignment:
Maximum fragment length=600
Minimum fragment length=50
Threshold on number of subreads for a successful mapping (the minor end in the pair)=1
Number of anchors=10
The directions of the two input files are: forward, reversed

Out of memory. If you are using Rsubread in R, please save your working environment and restart R.

This is on the following platform:
Code:
$ uname -a
Linux host 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux

$ free -m
             total       used       free     shared    buffers     cached
Mem:        129161      56634      72526          0        196      54215
-/+ buffers/cache:       2222     126938
Swap:        95356         12      95344

subread-align -v gives "Subread 1.3.5-p4", even though it's from the 1.3.5-p5 tarball.

I can attach the FASTQ files if you would like, as these are ~7 MB test files I've used to debug my analysis pipeline.

Kevin
Old 08-03-2013, 01:18 AM   #23
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

Dear Kevin,

It seems you provided gzipped FASTQ files to subread-align. However, subread-align does not support gzipped input files. Please unzip them and run it again to see if you still have the same problem.
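
As a minimal sketch of that step (not taken from the Subread documentation), the following decompresses each gzipped file to a plain FASTQ file and then reruns the command from the post above. The helper name decompress_fastq is made up for illustration, and the sketch assumes gunzip is on the PATH; the alignment step is skipped if subread-align or the input files are not present.

```shell
#!/bin/sh
# Hypothetical helper: decompress a gzipped FASTQ file next to the original,
# keeping the .gz file, and print the path of the uncompressed copy.
decompress_fastq() {
  src=$1
  dest=${src%.gz}          # strip the trailing .gz from the file name
  gunzip -c "$src" > "$dest" && printf '%s\n' "$dest"
}

# Usage with the file names from the post above. Nothing is run unless
# subread-align is installed and both input files exist.
if command -v subread-align >/dev/null 2>&1 \
   && [ -f sample_R1.trimmed.fq.gz ] && [ -f sample_R2.trimmed.fq.gz ]; then
  r1=$(decompress_fastq sample_R1.trimmed.fq.gz)
  r2=$(decompress_fastq sample_R2.trimmed.fq.gz)
  subread-align -r "$r1" -R "$r2" -o sample.bam \
    -i ../refseqs/TAIR10_gen/TAIR10_gen
fi
```

The original .gz files are left in place, so the uncompressed copies can be deleted once the alignment has finished.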

Best wishes,

Wei
Old 08-03-2013, 01:24 AM   #24
kdmurray91
Junior Member
 
Location: Canberra

Join Date: Sep 2012
Posts: 2
Default

Quote:
Originally Posted by shi View Post
Dear Kevin,

It seems you provided gzipped FASTQ files to subread-align. However, subread-align does not support gzipped input files. Please unzip them and run it again to see if you still have the same problem.

Best wishes,

Wei
Hi Wei,

That fixed the problem, thanks very much. Apologies for the stupid question.

Kevin
Old 08-03-2013, 01:27 AM   #25
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

There is no stupid question here. I'm glad that fixed the problem.

Cheers,
Wei
Old 09-22-2014, 02:33 AM   #26
ndaniel
Member
 
Location: Helsinki

Join Date: Feb 2009
Posts: 33
Default

Hi Shi,

How should one run subjunc for an RNA-seq experiment such that (i) each input read has only one insertion, and (ii) the maximum length of the insertion is specified by the user (that is, how does one specify the maximum length of an intron)? This is needed because the maximum intron length varies from organism to organism.
Old 09-22-2014, 03:35 PM   #27
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

Hi @ndaniel,

I don't quite understand your questions. It looks like you are asking how to specify the maximum intron size in subjunc? Firstly, an exon-spanning read may span more than one exon, so do you want to limit the detection of introns in each read to only one intron? One of the strengths of subjunc is that it can detect up to 4 introns in each read.

Secondly, you do not know what the maximum length of introns in your data is, so you'd better let subjunc detect it for you. Subjunc uses donor/acceptor sites to accurately detect the boundaries of introns.

Wei
Old 09-24-2014, 11:24 AM   #28
ndaniel
Member
 
Location: Helsinki

Join Date: Feb 2009
Posts: 33
Default

Quote:
Originally Posted by shi View Post
Hi @ndaniel,

I don't quite understand your questions. It looks like you asked how to specify the maximum intron size in subjunc?

Wei
Sorry for not explaining my question very well. :-(

Yes, what is the maximum intron size that subjunc can handle? Is there an (affine?) penalty related to intron length? Does subjunc weight an intron 10,000,000 bp long the same as one 10,000 bp long? Is subjunc able to find an intron 10,000,000 bp long? What is the minimum read overhang that subjunc can handle (e.g. 10bp, 17bp, 20bp)?

How does subjunc treat the case where a 100 bp read is split as 80+20 with an intron of length (a) 1,000 bp or (b) 100,000 bp (that is, the 20 bp segment maps 1,000 bp away from the 80 bp segment, or maps equally well 100,000 bp away)?

Quote:
Originally Posted by shi View Post
Firstly, an exon-spanning read may span more than one exon
I know, but my question is not about those kinds of reads. I am interested only in reads which span two and only two exons (and one intron).

Quote:
Originally Posted by shi View Post
so do you want to limit the detection of introns in each read to only one intron?
Yes. :-)

Quote:
Originally Posted by shi View Post
Secondly, you do not know what the maximum length of introns in your data is
Yes, I do know the maximum length of the introns in my data. Actually, quite exact estimates of this are available for annotated genomes!

Quote:
Originally Posted by shi View Post
Hi @ndaniel,

so you'd better let subjunc detect it for you
I prefer to search for introns whose lengths fall within a given range, in order to limit the search space for subjunc.

Quote:
Originally Posted by shi View Post
Hi @ndaniel,
Subjunc uses donor/acceptor sites to accurately detect the boundaries of introns.
Is subjunc able to look for intron boundaries without using the donor/acceptor sites (i.e. conventional sites)? Or does subjunc allow non-conventional donor/acceptor sites to be weighted equally with the conventional ones?

Last edited by ndaniel; 09-24-2014 at 11:28 AM.
Old 09-25-2014, 09:49 PM   #29
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

Quote:
Yes, what is the maximum intron size that subjunc can handle? Is there an (affine?) penalty related to intron length? Does subjunc weight an intron 10,000,000 bp long the same as one 10,000 bp long? Is subjunc able to find an intron 10,000,000 bp long? What is the minimum read overhang that subjunc can handle (e.g. 10bp, 17bp, 20bp)?
The maximum allowed intron size is 500,000 bases in subjunc. There is no penalty applied for intron length. Long introns and short introns are treated in the same manner. Subjunc is capable of detecting introns at any position of the reads.

Quote:
How does subjunc treat the case where a 100 bp read is split as 80+20 with an intron of length (a) 1,000 bp or (b) 100,000 bp (that is, the 20 bp segment maps 1,000 bp away from the 80 bp segment, or maps equally well 100,000 bp away)?
Subjunc treats them as equally good mapping locations.

Quote:
I prefer to search for introns whose lengths fall within a given range, in order to limit the search space for subjunc.
Subjunc is very fast. You do not need to limit the search space for it.

Quote:
Is subjunc able to look for intron boundaries without using the donor/acceptor sites (i.e. conventional sites)? Or does subjunc allow non-conventional donor/acceptor sites to be weighted equally with the conventional ones?
Use the '--allJunctions' option, which allows the detection of exon splicing that uses non-canonical donor/acceptor sites.
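
For illustration, a small wrapper that adds the option on request might look like the sketch below. The wrapper name run_subjunc, the DRY_RUN variable and all index and file names are hypothetical; only -i, -r, -o and --allJunctions are subjunc options, and the available options may differ between Subread versions.

```shell
#!/bin/sh
# Hypothetical wrapper: build a subjunc command line and either print it
# (DRY_RUN=1) or execute it. Pass "all" as the 4th argument to add
# --allJunctions; any other value leaves the default junction detection.
run_subjunc() {
  index=$1; reads=$2; outfile=$3; mode=${4:-canonical}
  set -- subjunc -i "$index" -r "$reads" -o "$outfile"
  if [ "$mode" = "all" ]; then
    set -- "$@" --allJunctions
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"     # show the command instead of running it
  else
    "$@"
  fi
}

# Print both variants without running subjunc (placeholder names).
DRY_RUN=1 run_subjunc my_index reads.fq sample_default.bam
DRY_RUN=1 run_subjunc my_index reads.fq sample_all.bam all
```

Running both variants on the same reads and comparing the reported junctions is one way to see how much the option changes the result for a given data set.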
Old 09-27-2014, 12:18 AM   #30
ndaniel
Member
 
Location: Helsinki

Join Date: Feb 2009
Posts: 33
Default

Quote:
Originally Posted by shi View Post
The maximum allowed intron size is 500,000 bases in subjunc. There is no penalty applied for intron length. Long introns and short introns are treated in the same manner. Subjunc is capable of detecting introns at any position of the reads.

Subjunc treats them as equally good mapping locations.

Subjunc is very fast. You do not need to limit the search space for it.

Use the '--allJunctions' option, which allows the detection of exon splicing that uses non-canonical donor/acceptor sites.
Thanks Shi! The answers are really great (for me at least)!

Is there any way for the user to change the 500,000 bp limit (other than by changing the source code)?

Also, what is the minimum read overhang that subjunc is able to handle (for example, is it able to map/split a read of 100 bp as 80bp+20bp, or 83bp+17bp, or 85bp+15bp, or 90bp+10bp, or 95bp+5bp)? There must be a limit for the overhang, and as far as I know no aligner would split a read of 100bp as 95bp+5bp (here most aligners would just soft-clip the last 5bp, and this is OK)!

Last edited by ndaniel; 09-27-2014 at 12:22 AM.
Old 09-29-2014, 07:09 PM   #31
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

Quote:
Is there any way for the user to change the 500,000 bp limit (other than by changing the source code)?
This is currently hard-coded, although we may allow users to change it in the future. However, I believe the best approach is to let subjunc determine the intron lengths for you and then filter the subjunc output afterwards. Subjunc has no bias toward long or short introns.
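
A rough sketch of such post-filtering follows. It assumes the subjunc output is available as SAM text and that introns appear as N operations in the CIGAR string (the usual SAM convention for a skipped reference region). The function name and the length bounds are placeholders; BAM output would first need converting to SAM, for example with samtools view -h.

```shell
#!/bin/sh
# Hypothetical filter: read SAM text on stdin and keep header lines plus
# every alignment whose CIGAR N operations (introns) all have a length
# between $1 (minimum) and $2 (maximum). Alignments without any N
# operation are kept unchanged.
filter_by_intron_length() {
  awk -v min="$1" -v max="$2" 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                 # SAM header lines pass through
    {
      cigar = $6; keep = 1
      while (match(cigar, /[0-9]+N/)) {  # next intron in the CIGAR string
        len = substr(cigar, RSTART, RLENGTH - 1) + 0
        if (len < min || len > max) { keep = 0; break }
        cigar = substr(cigar, RSTART + RLENGTH)
      }
      if (keep) print
    }'
}

# Example call (placeholder file names and bounds):
#   filter_by_intron_length 50 50000 < subjunc_output.sam > filtered.sam
```

Because the filter only drops whole alignments, a read with several introns is removed if any one of them falls outside the range.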

Quote:
Also, what is the minimum read overhang that subjunc is able to handle (for example, is it able to map/split a read of 100 bp as 80bp+20bp, or 83bp+17bp, or 85bp+15bp, or 90bp+10bp, or 95bp+5bp)? There must be a limit for the overhang, and as far as I know no aligner would split a read of 100bp as 95bp+5bp (here most aligners would just soft-clip the last 5bp, and this is OK)!
As I said in my last reply, subjunc can detect a splice site at any position in the read. It can split a 100bp read as 95bp+5bp, or even 99bp+1bp, if a confident splice site has been discovered. Subjunc achieves this by first generating a complete list of splice sites from all high-confidence junction reads (these reads typically contain splice sites near the middle of the read), and then re-aligning all the reads using the discovered splice sites. Have a look at the paper below for more details about the algorithm:

http://www.ncbi.nlm.nih.gov/pubmed/23558742
Old 10-02-2014, 10:54 PM   #32
ndaniel
Member
 
Location: Helsinki

Join Date: Feb 2009
Posts: 33
Smile

Quote:
Originally Posted by shi View Post
This is currently hard-coded, although we may allow users to change it in the future. However, I believe the best approach is to let subjunc determine the intron lengths for you and then filter the subjunc output afterwards. Subjunc has no bias toward long or short introns.

As I said in my last reply, subjunc can detect a splice site at any position in the read. It can split a 100bp read as 95bp+5bp, or even 99bp+1bp, if a confident splice site has been discovered. Subjunc achieves this by first generating a complete list of splice sites from all high-confidence junction reads (these reads typically contain splice sites near the middle of the read), and then re-aligning all the reads using the discovered splice sites. Have a look at the paper below for more details about the algorithm:

http://www.ncbi.nlm.nih.gov/pubmed/23558742
Thanks! I will definitely try subjunc!
Old 05-14-2015, 04:08 PM   #33
IonTom
Member
 
Location: Germany

Join Date: Apr 2014
Posts: 32
Default

So subjunc is now able to find non-canonical junctions?
Does this influence the alignment?

Because the subjunc help still says:

Subjunc requires donor/receptor sites to be present when detecting exon-exon junctions. It can detect up to four junction locations in each exon-spanning read.


Are there plans to make subjunc support annotation files?

What happens when only one read of a mate pair is mapped? Are both reported as unpaired?

Which settings would you recommend for 150bp single-end reads?

When building an ungapped reference index, I observed a lower percentage of aligned reads than with a gapped index. Does this make sense?


Many thanks for your help
Old 05-14-2015, 05:18 PM   #34
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

Quote:
So subjunc is now able to find non-canonical junctions?
Does this influence the alignment?

Because the subjunc help still says:

Subjunc requires donor/receptor sites to be present when detecting exon-exon junctions. It can detect up to four junction locations in each exon-spanning read.
Yes, subjunc can now detect non-canonical exon-exon junctions. Use the "--allJunctions" option. This will affect your alignment in that not only will more junctions be reported, but more exon-spanning reads will be reported too. However, because the alignment becomes more aggressive, you may see an increase in false alignments as well. You do not need to do this if you just want to perform an expression analysis.

Quote:
Are there plans to make subjunc support annotation files?
Yes, this is on our to-do list. We hope this will further improve subjunc's accuracy in junction detection. Subjunc collects all candidate junction locations from the initial scan of reads and then uses the collected junctions to realign all the reads and to remove spurious junctions. We found this already works very well.

Quote:
What happens when only one read of a mate pair is mapped? Are both reported as unpaired?
I am not sure what your question was. If only one read was mapped, this read will be reported as unpaired. But the unmapped read from the same pair will also be reported along with the mapped one.
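
To pull such reads out of the output, something along these lines could be used. This is only a sketch: it relies on the standard SAM flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped) and has not been checked against any particular Subread version, and the function name is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read SAM text on stdin and print the names of reads
# that are paired and mapped themselves (0x4 clear) but whose mate is
# unmapped (0x8 set). Bit tests use integer arithmetic so that any POSIX
# awk can run it.
mapped_with_unmapped_mate() {
  awk 'BEGIN { FS = "\t" }
    /^@/ { next }                          # skip SAM header lines
    {
      flag = $2 + 0
      paired        = flag % 2             # bit 0x1
      read_unmapped = int(flag / 4) % 2    # bit 0x4
      mate_unmapped = int(flag / 8) % 2    # bit 0x8
      if (paired && !read_unmapped && mate_unmapped) print $1
    }'
}

# Example call (placeholder file name):
#   mapped_with_unmapped_mate < subjunc_output.sam | sort -u
```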

Quote:
Which settings would you recommend for 150bp single-end reads?
The default settings should work well. Subjunc scales very well and has no problem mapping reads that are hundreds of bases long.

Quote:
When building an ungapped reference index, I observed a lower percentage of aligned reads than with a gapped index. Does this make sense?
Could you please show me your mapping commands for the ungapped and gapped indices, and also the percentage of aligned reads from each approach? Which version of the Subread package are you using?

Wei

Tags
bowtie2, subread