**Jon_Keats** · 06-06-2013, 10:49 PM

Didn't happen in the previous version, so I'm guessing it's a result of the fix they put in for my last issue. I'm impressed with how fast they are turning around fixes, which makes it worth the effort.

**sdriscoll** · 06-06-2013, 10:58 PM

Customer service makes all the difference! I don't know if you have ever tried to get help from the TopHat/Cufflinks people but man... those guys are impossible. So far Wei has been on top of things. The same goes for the BWA, STAR and RSEM devs. I've had pretty quick responses from all of them.

It's sort of convenient for me that these subread people are just getting to work when I'm about to go to sleep. I get back to the lab the next day and they've fixed stuff during the night.

**shi** · 06-07-2013, 04:21 AM

Dear sdriscoll and Jon,

Many thanks for your nice comments. We really appreciate you putting up with the bugs and helping us to improve our programs.

As you said, the SAM output bug was introduced in v1.3.5. We have fixed it in v1.3.5-p1. We also enhanced subread-buildindex so that it checks the integrity of the provided reference sequences and reports any unexpected characters in a more informative way.

The latest version, v1.3.5-p1, can be downloaded from http://subread.sourceforge.net . We have tested it more thoroughly using much bigger test datasets. Hope it works for you, but please let me know if you find any other bugs.

Best wishes,

Wei
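
For anyone who wants to check a reference before building an index, here is a minimal sketch of a pre-check that lists sequence lines containing characters other than A, C, G, T, or N. The function name `find_unexpected_bases` and the accepted alphabet are assumptions for illustration; the thread does not say which characters subread-buildindex itself treats as unexpected.

```shell
# Hypothetical helper, not part of Subread: report FASTA sequence lines that
# contain anything other than A, C, G, T, or N (upper or lower case).
# The accepted alphabet is an assumption; adjust it for your reference.
find_unexpected_bases() {
    awk '!/^>/ && /[^ACGTNacgtn]/ { print FILENAME ":" FNR ": " $0 }' "$1"
}

# Usage (file name is a placeholder):
# find_unexpected_bases reference.fa
```

Each reported line has the form `file:line_number: sequence`, so the offending position in the reference is easy to find. Header lines (starting with `>`) are skipped.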

**Jon_Keats** · 06-09-2013, 11:02 PM

Running a test tonight...

**Jon_Keats** · 06-12-2013, 09:44 PM

Hi Shi,

Just wanted to thank you; this last patch seems to be much improved. Now moving on to test featureCounts.

**shi** · 06-12-2013, 09:48 PM

Dear Jon,

No worries. Thanks for letting me know.

Please make sure you are using the latest version (1.3.5-p3). We made changes to featureCounts today. Let me know if you run into any problems.

Best wishes,

Wei

**kdmurray91** · 08-03-2013, 12:00 AM

Hello!

I've hit a problem with subread-align. I receive the following output from subread-align when trying to map paired-end RNA-seq data.

Code:

$ subread-align -r sample_R1.trimmed.fq.gz -R sample_R2.trimmed.fq.gz -o sample.bam -i ../refseqs/TAIR10_gen/TAIR10_gen

Number of selected subreads = 10
Consensus threshold = 3
Number of threads=1
Number of indels allowed=5


Performing paired-end alignment:
Maximum fragment length=600
Minimum fragment length=50
Threshold on number of subreads for a successful mapping (the minor end in the pair)=1
Number of anchors=10
The directions of the two input files are: forward, reversed

Out of memory. If you are using Rsubread in R, please save your working environment and restart R.

This is on the following platform:

Code:

$ uname -a
Linux host 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux

$ free -m
             total       used       free     shared    buffers     cached
Mem:        129161      56634      72526          0        196      54215
-/+ buffers/cache:       2222     126938
Swap:        95356         12      95344

subread-align -v gives "Subread 1.3.5-p4", even though it's from the 1.3.5-p5 tarball.

I can attach the FASTQ files if you would like, as these are ~7 MB test files I've used to debug my analysis pipeline.

Kevin

**shi** · 08-03-2013, 12:18 AM

Dear Kevin,

It seems you provided gzipped FASTQ files to subread-align for alignment, but subread-align does not support gzipped input files. Please unzip them and run it again to see if you still have the same problem.

Best wishes,

Wei
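
A minimal sketch of that workaround, assuming `gunzip` is available: decompress each gzipped FASTQ file to a plain-text copy next to it (the originals are kept) and pass the decompressed files to subread-align. The helper name `decompress_fastq` is made up for illustration; the file and index names are the ones from Kevin's command.

```shell
# Hypothetical helper: write an uncompressed copy of each .gz file next to it,
# leaving the compressed original in place.
decompress_fastq() {
    for gz in "$@"; do
        gunzip -c "$gz" > "${gz%.gz}" || return 1
    done
}

# Usage, reusing the file names from the command above:
# decompress_fastq sample_R1.trimmed.fq.gz sample_R2.trimmed.fq.gz
# subread-align -r sample_R1.trimmed.fq -R sample_R2.trimmed.fq \
#     -o sample.bam -i ../refseqs/TAIR10_gen/TAIR10_gen
```

Keeping the compressed originals costs some disk space, but it means the rest of a pipeline that expects `.fq.gz` files is not disturbed.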

**kdmurray91** · 08-03-2013, 12:24 AM

Hi Wei,

That fixed the problem, thanks very much. Apologies for the stupid question.

Kevin

**shi** · 08-03-2013, 12:27 AM

There are no stupid questions here. I'm glad that fixed the problem.

Cheers,
Wei

**ndaniel** · 09-22-2014, 01:33 AM

Hi Shi,

How should one run subjunc for an RNA-seq experiment such that (i) each input read has only one insertion, and (ii) the maximum length of the insertion is specified by the user (in other words, how does one specify the maximum length of an intron)? This is needed because the maximum intron length varies from organism to organism.

**shi** · 09-22-2014, 02:35 PM

Hi @ndaniel,

I don't quite understand your questions. It looks like you asked how to specify the maximum intron size in subjunc? Firstly, an exon-spanning read may span more than one exon, so do you want to limit the detection of introns in each read to only one intron? One of the strengths of subjunc is that it can detect up to 4 introns in each read.

Secondly, you do not know what the maximum length of introns in your data is, so you'd better let subjunc detect it for you. Subjunc uses donor/acceptor sites to accurately detect the boundaries of introns.

Wei

**ndaniel** · 09-24-2014, 10:24 AM

> Originally posted by shi:
>
> Hi @ndaniel,
>
> I don't quite understand your questions. It looks like you asked how to specify the maximum intron size in subjunc?
>
> Wei

Sorry for not explaining my question very well. :-(

Yes, what is the maximum intron size that subjunc can handle? Is there an (affine?) penalty related to intron length? Does subjunc weight an intron 10,000,000 bp long the same as one 10,000 bp long? Is subjunc able to find an intron 10,000,000 bp long? What is the minimum read overhang that subjunc can handle (e.g. 10 bp, 17 bp, 20 bp)?

How does subjunc treat the case where a 100 bp read is split as 80+20 with an intron of length (a) 1,000 bp or (b) 100,000 bp (that is, the 20 bp part maps 1,000 bp away from the 80 bp part, or maps equally well 100,000 bp away)?

> Originally posted by shi:
>
> Firstly, an exon-spanning read may span more than one exon

I know, but my question is not about those kinds of reads. I am interested only in reads that span two and only two exons (and one intron).

> Originally posted by shi:
>
> so do you want to limit the detection of introns in each read to only one intron?

Yes. :-)

> Originally posted by shi:
>
> Secondly, you do not know what the maximum length of introns in your data is

Yes, I do know the maximum length of the introns in my data. Actually, there are really exact estimates of this for annotated genomes!

> Originally posted by shi:
>
> so you'd better let subjunc detect it for you

I prefer to search for introns whose lengths are within a given range, in order to limit the search space for subjunc.

> Originally posted by shi:
>
> Subjunc uses donor/acceptor sites to accurately detect the boundaries of introns.

Is subjunc able to look for intron boundaries without using the donor/acceptor sites (i.e. the conventional sites)? Or does subjunc allow non-conventional donor/acceptor sites to be weighted equally with the conventional ones?

**shi** · 09-25-2014, 08:49 PM

> Yes, what is the maximum intron size that subjunc can handle? Is there an (affine?) penalty related to intron length? Does subjunc weight an intron 10,000,000 bp long the same as one 10,000 bp long? Is subjunc able to find an intron 10,000,000 bp long? What is the minimum read overhang that subjunc can handle (e.g. 10 bp, 17 bp, 20 bp)?

The maximum allowed intron size in subjunc is 500,000 bases. No penalty is applied for intron length; long introns and short introns are treated in the same manner. Subjunc is capable of detecting introns at any position in the reads.

> How does subjunc treat the case where a 100 bp read is split as 80+20 with an intron of length (a) 1,000 bp or (b) 100,000 bp (that is, the 20 bp part maps 1,000 bp away from the 80 bp part, or maps equally well 100,000 bp away)?

Subjunc treats them as equally good best mapping locations.

> I prefer to search for introns whose lengths are within a given range, in order to limit the search space for subjunc.

Subjunc is very fast. You do not need to limit the search space for it.

> Is subjunc able to look for intron boundaries without using the donor/acceptor sites (i.e. the conventional sites)? Or does subjunc allow non-conventional donor/acceptor sites to be weighted equally with the conventional ones?

Use the '--allJunctions' option, which allows the detection of exon splicing that uses non-canonical donor/acceptor sites.
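
A sketch of what such a call might look like for paired-end reads. It assumes subjunc accepts the same `-i`, `-r`, `-R` and `-o` options as the subread-align command shown earlier in this thread; the wrapper name and the file names are placeholders.

```shell
# Hypothetical wrapper around subjunc; only --allJunctions comes from the
# post above, the other options are assumed to mirror subread-align.
run_subjunc_all_junctions() {
    index=$1; reads1=$2; reads2=$3; output=$4
    subjunc -i "$index" -r "$reads1" -R "$reads2" -o "$output" --allJunctions
}

# Usage (placeholder file names):
# run_subjunc_all_junctions my_index sample_R1.fq sample_R2.fq sample_junctions.sam
```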

**ndaniel** · 09-26-2014, 11:18 PM

> Originally posted by shi:
>
> The maximum allowed intron size in subjunc is 500,000 bases. No penalty is applied for intron length; long introns and short introns are treated in the same manner. Subjunc is capable of detecting introns at any position in the reads.
>
> Subjunc treats them as equally good best mapping locations.
>
> Subjunc is very fast. You do not need to limit the search space for it.
>
> Use the '--allJunctions' option, which allows the detection of exon splicing that uses non-canonical donor/acceptor sites.

Thanks Shi! The answers are really great (for me at least)!

Is there any way for the user to change the 500,000 bp limit (besides changing the source code)?

Also, what is the minimum read overhang that subjunc is able to handle? For example, is it able to split a 100 bp read as 80 bp + 20 bp, or 83 bp + 17 bp, or 85 bp + 15 bp, or 90 bp + 10 bp, or 95 bp + 5 bp? There must be a limit on the overhang, and as far as I know no aligner would split a 100 bp read as 95 bp + 5 bp (here most aligners would just soft-clip the last 5 bp, and this is OK)!
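
One way to get the behaviour asked for in this thread (exactly one intron per read, intron length within a chosen range, a minimum overhang on each side) without changing subjunc is to filter the alignments after mapping. The sketch below assumes SAM-format text on standard input, in which an intron appears as an `N` operation in the CIGAR field (column 6). The function name and the thresholds in the usage line are hypothetical, and this is a post-filter, not a subjunc feature.

```shell
# Hypothetical post-filter, not part of Subread.
# Keeps SAM records whose CIGAR has exactly one intron (one N operation) with
# length in [min_intron, max_intron] and at least min_overhang aligned read
# bases (M, I, =, X) on each side of it. Header lines pass through unchanged.
# Usage: filter_single_intron min_intron max_intron min_overhang < in.sam
filter_single_intron() {
    awk -F '\t' -v min_intron="$1" -v max_intron="$2" -v min_overhang="$3" '
        /^@/ { print; next }
        {
            cigar = $6; introns = 0; intron_len = 0; left = 0; right = 0
            # Consume the CIGAR string one operation at a time.
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op == "N") { introns++; intron_len = len }
                else if (op ~ /[MI=X]/) {
                    if (introns == 0) left += len; else right += len
                }
                cigar = substr(cigar, RLENGTH + 1)
            }
            if (introns == 1 && intron_len >= min_intron &&
                intron_len <= max_intron &&
                left >= min_overhang && right >= min_overhang) print
        }'
}

# Usage (thresholds and file names are placeholders):
# filter_single_intron 50 500000 10 < sample_junctions.sam > single_intron.sam
```

With these placeholder thresholds, a read aligned as `80M1000N20M` is kept, while `95M1000N5M` (overhang below 10), `40M500N30M700N30M` (two introns) and `80M600000N20M` (intron too long) are dropped. Soft-clipped bases are not counted towards the overhang.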
