Seqanswers Leaderboard Ad

**alexdobin** · 02-09-2016, 04:26 PM

Hi @harrike,

--outSJfilterCountUniqueMin 10 10 10 10 --outFilterType BySJout
will remove all junctions that are supported by <10 uniquely mapping reads. The 4 numbers are for different intron motifs: non-canonical, GT/AG, GC/AG, AT/AC.
You can also use --outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout, which will remove junctions supported by <10 reads including unique and multimappers.

This option will work with 1-pass or 2-pass mapping.

There are also other splice junction filtering options - please check the manual for outSJfilter* otpions, they can all be used together in "AND" fashion.

Cheers
Alex

**harrike** · 02-09-2016, 07:13 PM

Hi Alex,

Great ! Thank you very much for your reply. I am testing those settings.

I am curious about what happens to those uniquely mapped reads spanning a junction with <10 reads support. Are they considered as unmapped reads?

Thanks,

Rui

**harrike** · 02-09-2016, 07:58 PM

Hi Alex,

I tried one of my libraries, using the "--outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout". In the *SJ.out.tab file, some of the junctions are still of <10 reads support (see below). Should the sum of the column 7 and 8 be >=10?

scaffold2131 22287 22614 1 1 0 62 0 39
scaffold2131 22322 22614 1 1 0 3 0 23
scaffold2131 22504 22614 1 1 0 1 0 35
scaffold2131 22527 22614 1 1 0 1 0 16
scaffold2131 22702 22827 1 1 0 40 0 44
scaffold2131 22933 23197 1 1 0 31 0 44
scaffold2131 23273 23348 1 1 0 157 0 45
scaffold2131 23479 23582 1 1 0 94 61 44
scaffold2131 23675 23875 1 1 0 134 24 44
scaffold2131 23927 24382 1 1 0 150 0 44
scaffold2131 24460 24565 1 1 0 131 23 44
scaffold2131 24621 24694 1 1 0 2 0 34
scaffold2131 24621 24699 1 1 0 1 0 40

The command I used is:

STAR --genomeDir /home/harrikex/workspot/litchi/RNAseq/star_index/ --runThreadN 24 --readFilesIn 121224_I663_FCC1L8RACXX_L6_LITgqjTABRAAPEI-62_1.fq 121224_I663_FCC1L8RACXX_L6_LITgqjTABRAAPEI-62_2.fq --outSAMtype BAM Unsorted --outFilterMultimapNmax 20 --alignIntronMax 10000 --alignMatesGapMax 10000 --outSAMstrandField intronMotif --alignEndsType EndToEnd --outFilterIntronMotifs RemoveNoncanonical --outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout --outFileNamePrefix FMO_121224_

Thanks,

Rui

**Macspider** · 02-10-2016, 02:02 AM

The 2-pass pipeline for STAR works very fine, I used it many times. Moreover, STAR with Cufflinks scores among the best results in alignment-based quantification pipelines when dealing with not very specific samples, e.g. tissue samples or whole-organism samples.

**alexdobin** · 02-10-2016, 08:02 AM

Originally posted by harrike View Post

Hi Alex,

I tried one of my libraries, using the "--outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout". In the *SJ.out.tab file, some of the junctions are still of <10 reads support (see below). Should the sum of the column 7 and 8 be >=10?

...

Thanks,

Rui

Hi Rui,

sorry, I forgot that --outSJfilterCountUniqueMin and --outSJfilterCountTotalMin filters are used in the OR fashion, i.e. if one or the other is satisifed, the junction will be kept - so you have to specify both options. If you want only the Total filter, you need to make --outSJfilterCountUniqueMin numbers larger or -1, e.g. --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 .

The reads that cross the junctions that do not pass this filter will be re-mapped forcing them not to cross these junctions. Most often this will result in soft-clipped reads, i.e. donor or acceptor part of the read over the prohibited junction will be dropped.

Cheers
Alex

**harrike** · 02-10-2016, 08:14 PM

Originally posted by alexdobin View Post

Hi Rui,

sorry, I forgot that --outSJfilterCountUniqueMin and --outSJfilterCountTotalMin filters are used in the OR fashion, i.e. if one or the other is satisifed, the junction will be kept - so you have to specify both options. If you want only the Total filter, you need to make --outSJfilterCountUniqueMin numbers larger or -1, e.g. --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 .

The reads that cross the junctions that do not pass this filter will be re-mapped forcing them not to cross these junctions. Most often this will result in soft-clipped reads, i.e. donor or acceptor part of the read over the prohibited junction will be dropped.

Cheers
Alex

Hi Alex,

Thanks for your explanation. I tried the combination of "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10", it works. It has only half of the junctions (using the default setting) reported. It seems to have greater reliability.

One more question: After the 1st-pass mapping, using the above setting, only the junctions with >=10 reads support will be reported into the *SJ.out.tab file. As I have a couple of samples, I will get a few *SJ.out.tab files. In the 2nd-pass, all these *SJ.out.tab will be used as references (annotated junctions) for the mapping, do I still need to use the setting of "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10" ? as it is possible, a few reads, <10 in one library, are mapped to a junction which is of >10 read support in another library; in this case, I still want those read not to be filtered out. On the other hand, if I don't use those settings, I feel many false positives will be introduced in the 2nd-pass mapping.

In other words, in the 2nd-pass mapping, will all the reads be preferentially mapped according the annotated junctions (provide in the *SJ.out.tab files using --sjdbFileChrStartEnd)? If this is true, I feel it is likely good without the >10 filter (--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10) in the 2nd-pass mapping.

Thanks,

Rui

**alexdobin** · 02-11-2016, 08:57 AM

Hi Rui,

the --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 filter will not apply to the "annotated" junctions, i.e. junctions that are introduced via --sjdbFileChrStartEnd or --sjdbGTFfile options. So in the 2nd pass the junctions from your combined-SJ file will not be filtered out.
However, STAR will still be detecting the junctions which never made it to combined-SJ file, and you will have to use this filter to get rid of them.

Cheers
Alex

**harrike** · 02-11-2016, 11:23 AM

Hi Alex,

Great, thanks for your clarification.

I decide to using the filter "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10" for my 2nd-pass mapping.

Rui

**biokloot** · 02-25-2016, 05:38 PM

Dear Rui and Alex,
Many thanks for asking and explaining the SJout filter combinations so well!
I want to completely suppress detection of any and all SJ in the second pass. Do need to set --outSJfilterCountUniqueMin -1 -1 -1 -1 AND --outSJfilterCountTotalMin -1 -1 -1 -1 for the second passes to achieve that?

**harrike** · 02-25-2016, 06:25 PM

Hi biokloot,

To me, all "-1" does suppress all SJ, as "-1 means no output for that motif". But I am not sure how a read spanning an intron will be mapped in this case.

I am sure Alex will give a good clarification.

Rui

**alexdobin** · 02-26-2016, 03:26 PM

Originally posted by biokloot View Post

Dear Rui and Alex,
Many thanks for asking and explaining the SJout filter combinations so well!
I want to completely suppress detection of any and all SJ in the second pass. Do need to set --outSJfilterCountUniqueMin -1 -1 -1 -1 AND --outSJfilterCountTotalMin -1 -1 -1 -1 for the second passes to achieve that?

Hi @biokloot,

the best way to suppress detection of any new junctions in 2nd pass (i.e. junctions not detected in the 1st pass) is to use --alignSJoverhang 100000 (or any number > read length).
Alternatively, you can use
--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin -1 -1 -1 -1 --outFilterType BySJout
or
--outSJfilterOverhangMin -1 -1 -1 -1 --outFilterType BySJout
but it's slightly slower.

Cheers
Alex

**biokloot** · 02-28-2016, 01:08 AM

Thanks for the quick replies, Rui and Alex!

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 29 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 25 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

STAR coupled with Cufflinks

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News