**mikesh** · 09-03-2013, 01:42 AM

Originally posted by ymc

I tried filtering. For >= 2 spanning reads, there are 80215 fusions. It crashed. Then I tried >=3 spanning reads. There are 26413 fusions. It also crashed. Next, >=4, 16920 fusions. Crashed. Next, >=5, 12334 fusions. Crashed. Next, >=6, 10651 fusions. Crashed. Next, >=7, 9248 fusions. Crashed. Next, >=8, 8352 fusions. Crashed. Next, >=9, 7652 fusions. Crashed. Next, >=10, 7066 fusions. Crashed. Next, >=11, 6532 fusions. Crashed. Next, >=12, 6107 fusions. Crashed. Next, >=13, 5707 fusions. Crashed. Next, >=14, 5389 fusions. Crashed. Next, >=15, 5109 fusions. Crashed. Next, >=16, 4867 fusions. Crashed. Next, >=17, 4641 fusions. Woohoo! Finally finishes!

I think it will be better if you just give a warning and skip the fusions you can't process...

I've checked the file you've uploaded. It was indeed the problem that I've mentioned (lack of expression data for some protein interaction partners). The package that I've shared (https://s3-eu-west-1.amazonaws.com/o...use-v1.0.3.zip) is working fine with no exceptions, while I was able to reproduce the problem with the old version.

As for filtering fusions from tophat, there are several types of reads: spanning, encompassing and contradictory. The thresholds should be set to something like >1 spanning, >10 encompassing and no contradictory reads.

Please also provide RNA-Star fusion output so I can implement and debug this input type.

**ymc** · 09-03-2013, 01:56 AM

I can't download from the URL you provided (even when I replace the ... with ncof). Can you upload it to where your home page is hosted? Thanks!

**ymc** · 09-03-2013, 02:01 AM

If I understand correctly, the 5th column is spanning and the 8th column is contradictory. But which column is "encompassing" in fusions.out?

**mikesh** · 09-03-2013, 02:13 AM

The new version will be uploaded to the home page later in the day.
I've fixed the URL, it is: https://s3-eu-west-1.amazonaws.com/o...use-v1.0.3.zip. There seems to be some trouble with URL pasting here.

As for the tophat output, columns 5-8 are:
5: span junction; 6: encompassing; 7: one mate spans, the other is encompassing; 8: contradictory (supporting the unbroken transcript)
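
To see which of these criteria removes the most candidates before settling on thresholds, a quick tally can help. This is only a rough sketch: the function name `tally_filter_failures` and its default cutoffs are made up for illustration, and it assumes whitespace-separated fields with the column layout listed above.

```shell
# Count how many candidate lines fail each criterion of the suggested filter.
# Assumed columns: $5 spanning, $6 encompassing, $8 contradictory.
# Arguments: file, min spanning (default 2, i.e. ">1"),
#            min encompassing (default 11, i.e. ">10").
tally_filter_failures() {
  awk -v min_span="${2:-2}" -v min_enc="${3:-11}" '
    {
      total++
      if ($5 < min_span) low_span++     # too few spanning reads
      if ($6 < min_enc)  low_enc++      # too few encompassing reads
      if ($8 > 0)        contra++       # has contradictory reads
      if ($5 >= min_span && $6 >= min_enc && $8 == 0) pass++
    }
    END {
      printf "total=%d low_spanning=%d low_encompassing=%d contradictory=%d pass=%d\n",
             total, low_span, low_enc, contra, pass
    }
  ' "$1"
}
# Usage (file name is an example): tally_filter_failures fusions.out
```

A line can fail more than one criterion, so the three failure counts need not add up to the number of rejected lines.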

**ymc** · 09-03-2013, 03:53 AM

By using the

awk '$5>1&&$6>10&&$8==0'

filter, my fusions.out comes down from 1.1 million lines to 2361 lines. The program finishes and I end up with 320 fusions.

There are five fusions in my sample that are confirmed experimentally:

MKL1-NIPA1 MKL1 22q13 NIPA1 15q11.2
HSPG2-TMCO4 HSPG2 1p36.1 TMCO4 1p36.13
NIPAL3-ATAD3B* NIPAL3 1p36.12 ATAD3B 1p36.33
UBFD1-CDH11* UBFD1 16p12 CDH11 16q21
SLC7A6-LRRC36 SLC7A6 16q22.1 LRRC36 16q22.1

But none of them shows up in the oncofuse output with the aforementioned filter. They all show up, however, if I use the original, unfiltered fusions.out (333,687 fusions) as input. But then how can I pick out these five from the 333,687 candidates?

FYI, in contrast, 4 out of 5 show up among the 97 fusions identified by tophat-fusion-post.

**mikesh** · 09-03-2013, 05:05 AM

It seems that the tophat-fusion-post thresholds are softer, something like
awk '$5>1&&$6>3&&$8==0'. The selection of fusions that are detected in sample is the task that should be solved by fusion detection software. Anyway, even if a fusion is detected in a sample, it is far more likely to be a passenger.

**ymc** · 09-03-2013, 06:21 PM

You can download the output generated from the same data from the URL below. It took RNA-STAR only 30 minutes to generate it, versus 10 hours for tophat-fusion.

http://uploading.com/18e39ff1/Chimeric-out-junction-gz

As to what those fields refer to, I suppose you can consult the RNA-STAR manual...

**ymc** · 09-03-2013, 06:38 PM

I tried this awk '$5>1&&$6>3&&$8==0' filter. It has 439 fusions mapped. But it also has none of the five experimentally confirmed fusions.

You said "The selection of fusions that are detected in sample is the task that should be solved by fusion detection software." But which part of the tophat-fusion pipeline is the fusion detection software? I think tophat-fusion is just a mapping step, and it is oncofuse's or tophat-fusion-post's job to do the fusion detection. What you said would make sense if Oncofuse were taking tophat-fusion-post's output.

**mikesh** · 09-04-2013, 12:31 AM

I meant that Oncofuse is for functional analysis, and that the confidence that a fusion is present in the sample at all should first be evaluated from the distribution of mapped reads.
Of course, it is good practice to pass tophat-fusion-post output to Oncofuse. But it's quite strange that your fusions were missing; I believe detectable fusions should be quite robust to filtering. How many spanning reads, etc., do they have exactly?

**ymc** · 09-04-2013, 07:16 AM

First 10 fields in fusions.out for MKL1-NIPA1. Spanning is 10 but encompassing is 0

chr15-chr22 23062318 40990677 ff 10 0 7 0 42 59 9.290000

First 10 fields in fusions.out for HSPG2-TMCO4. Spanning is 12 but encompassing is 6 and contradictory is 344

chr1-chr1 20107258 22198678 ff 12 6 13 344 65 17.409725

First 10 fields in fusions.out for NIPAL3-ATAD3B. Spanning is 39 and encompassing is 28 but contradictory is 43

chr1-chr1 1425636 24787033 rr 39 28 5 43 65 64 4.535832

First 10 fields in fusions.out for UBFD1-CDH11. Spanning is 14 and encompassing is 27 but contradictory is 852

chr16-chr16 23574050 64984920 fr 14 27 13 852 48 10.530613

First 10 fields in fusions.out for SLC7A6-LRRC36. Spanning is 6 and encompassing is 7 but contradictory is 664

chr16-chr16 67409160 68309151 rr 6 7 0 664 43 3.388890

Looks like contradictory>0 is not a reason for tophat-fusion-post to exclude a fusion gene from its output.

**ymc** · 09-04-2013, 07:55 AM

I looked at the Chimeric.out.junction.gz generated by RNA-STAR.

For MKL1-NIPA1, there are 8 spanning reads. The junction becomes chr15-chr22 23062320 40990677.

For HSPG2-TMCO4, there are 8 spanning reads. The junction becomes chr1-chr1 20107260 22198678

For NIPAL3-ATAD3B, there are 17 spanning reads. The junction becomes chr1-chr1 1425636 24787035

For UBFD1-CDH11, there are 14 spanning reads. The junction becomes chr16-chr16 23574052 64984920

For SLC7A6-LRRC36, there are 4 spanning reads. The junction becomes chr16-chr16 67409160 68309153

Looks like some of the junctions are off by 2? I noticed that there are junctions that are off by a few bases. If we include them, the spanning read count might be closer to tophat-fusion's?
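
One way to test this idea would be to pool spanning-read counts over junctions whose breakpoints lie within a couple of bases of each other. A minimal sketch, assuming the junctions have first been summarised into a hypothetical four-column table (chromosome pair, position 1, position 2, spanning-read count). That table is not the native Chimeric.out.junction layout, and `merge_nearby_junctions` is just an illustrative name.

```shell
# Merge junctions on the same chromosome pair whose two breakpoints are
# both within `tol` bases of an already-seen junction, summing read counts.
# Hypothetical input columns: pair, pos1, pos2, spanning count.
# Arguments: file, tolerance in bases (default 2).
merge_nearby_junctions() {
  awk -v tol="${2:-2}" '
    function abs(x) { return x < 0 ? -x : x }
    {
      hit = 0
      # Look for an existing cluster this junction belongs to.
      for (i = 1; i <= n; i++) {
        if ($1 == pair[i] && abs($2 - p1[i]) <= tol && abs($3 - p2[i]) <= tol) {
          sum[i] += $4
          hit = 1
          break
        }
      }
      # No match: start a new cluster at these coordinates.
      if (!hit) { n++; pair[n] = $1; p1[n] = $2; p2[n] = $3; sum[n] = $4 }
    }
    END { for (i = 1; i <= n; i++) print pair[i], p1[i], p2[i], sum[i] }
  ' "$1"
}
# Usage (file name is an example): merge_nearby_junctions junction_counts.txt 2
```

Each cluster keeps the coordinates of the first junction seen, and the search is quadratic in the number of clusters, so this is only practical on a list that has already been filtered down.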

**mikesh** · 09-14-2013, 03:45 PM

Originally posted by ymc

Looks like contradictory>0 is not a reason for tophat-fusion-post to exclude a fusion gene from its output.

Indeed, for a patient sample that criterion could be meaningless. The examples on the Tophat-fusion pages don't contain any contradictory reads, but they are derived from homogeneous cell lines. In patient samples, the majority of cells may carry normal copies of the fused genes.

Originally posted by ymc

Looks like some of the junctions are off by 2? I noticed that there are junctions that are off by a few bases. If we include them, the spanning read count might be closer to tophat-fusion's?

Are these missing reads lying inside the fused exons? If so, the total count of supporting reads should match. I think it would not be easy to compare the results of these tools. However, with a low threshold (1-2) on spanning reads and a high threshold (5-10) on spanning + encompassing reads, the lists of putative fusions from these tools should match.

I've finally modified the code so that Oncofuse now takes RNA-STAR output as input; please see http://www.unav.es/genetica/oncofuse.html.
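
For tophat-fusion output, the combined threshold suggested above could be sketched like this. The function name and default cutoffs are illustrative assumptions, column 7 (one mate spanning, the other encompassing) is left out of the sum, and contradictory reads are deliberately not used.

```shell
# Keep candidates with enough spanning reads AND enough combined support.
# Assumed columns: $5 spanning, $6 encompassing.
# Arguments: file, min spanning (default 2), min spanning+encompassing (default 5).
filter_by_combined_support() {
  awk -v min_span="${2:-2}" -v min_total="${3:-5}" '
    $5 >= min_span && ($5 + $6) >= min_total
  ' "$1"
}
# Usage (file name is an example): filter_by_combined_support fusions.out 2 5
```

With these defaults, all five fusions.out lines that ymc posted for the confirmed fusions pass, since none of them has fewer than 6 spanning reads or fewer than 10 spanning + encompassing reads.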

**NKAkers** · 09-09-2014, 10:57 AM

Can someone confirm my interpretation of these lines in the manual for Oncofuse?

P_VAL_CORR: The Bayesian probability of fusion being a passenger (class 0), given as Bonferroni-corrected P-value.
DRIVER_PROB: The Bayesian probability of fusion being a driver (class 1).

So one is a p value, the other a probability? Therefore driver-fusions should have high values (close to 1) for both columns?

Thanks in advance.

**mikesh** · 09-10-2014, 04:53 AM

Hello!

Both are initially Bayesian probabilities. The first one is the probability of being a passenger (class 0), the second of being a driver (class 1), and they sum to 1. As an RNA-Seq experiment usually produces plenty of novel fusions, a multiple testing correction should be performed. H0 here is that a fusion is a passenger; the probability of H0 is p(class 0), which is treated as the P-value. These values are corrected using the Bonferroni method. Values of p(class 1) are also provided for reference.

So P_VAL_CORR should be close to 0 and DRIVER_PROB should be close to 1 for a driver fusion.
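
For reference, a Bonferroni correction amounts to multiplying each p(class 0) by the number of tests and capping the result at 1. A minimal sketch, assuming a hypothetical two-column input (fusion ID, p(class 0)) and assuming the number of tests equals the number of fusions in the file; the actual Oncofuse code may count tests differently.

```shell
# Bonferroni-correct p(class 0) and report p(class 1) = 1 - p(class 0).
# Hypothetical input columns: fusion ID, p(class 0).
# Output columns (tab-separated): ID, corrected P-value, driver probability.
bonferroni_correct() {
  awk '
    { id[NR] = $1; p[NR] = $2 }          # store every record first
    END {
      for (i = 1; i <= NR; i++) {
        corr = p[i] * NR                  # multiply by number of tests
        if (corr > 1) corr = 1            # a P-value cannot exceed 1
        printf "%s\t%.4g\t%.4g\n", id[i], corr, 1 - p[i]
      }
    }
  ' "$1"
}
# Usage (file name is an example): bonferroni_correct passenger_probs.txt
```

Note that after correction the two output columns no longer sum to 1, which is why a driver candidate is expected to show a small corrected P-value together with a driver probability near 1.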

**NKAkers** · 09-10-2014, 06:20 AM

Thank you for the explanation mikesh!

I find it a little confusing; I would suggest a more direct explanation of the value:

P_VAL_CORR: The Bonferroni-corrected P-value for the hypothesis test where H0: Fusion is passenger, and H1: Fusion is driver.

Just my 2 cents though. Thanks for the great tool!!
