
  • #16
    Originally posted by ymc View Post
    I tried filtering. For >= 2 spanning reads, there are 80215 fusions. It crashed. Then I tried >=3 spanning reads. There are 26413 fusions. It also crashed. Next, >=4, 16920 fusions. Crashed. Next, >=5, 12334 fusions. Crashed. Next, >=6, 10651 fusions. Crashed. Next, >=7, 9248 fusions. Crashed. Next, >=8, 8352 fusions. Crashed. Next, >=9, 7652 fusions. Crashed. Next, >=10, 7066 fusions. Crashed. Next, >=11, 6532 fusions. Crashed. Next, >=12, 6107 fusions. Crashed. Next, >=13, 5707 fusions. Crashed. Next, >=14, 5389 fusions. Crashed. Next, >=15, 5109 fusions. Crashed. Next, >=16, 4867 fusions. Crashed. Next, >=17, 4641 fusions. Woohoo! Finally finishes!

    I think it will be better if you just give a warning and skip the fusions you can't process...
    I've checked the file you've uploaded. It was indeed the problem that I've mentioned (lack of expression data for some protein interaction partners). The package that I've shared (https://s3-eu-west-1.amazonaws.com/o...use-v1.0.3.zip) is working fine with no exceptions, while I was able to reproduce the problem with the old version.

    As for filtering fusions from tophat, there are several types of reads: spanning, encompassing and contradictory. The thresholds should be set like >1 spanning, >10 encompassing and no contradictory.

    Please also provide RNA-Star fusion output so I can implement and debug this input type.
    Last edited by mikesh; 09-03-2013, 02:09 AM.



    • #17
      Originally posted by mikesh View Post
      I've checked the file you've uploaded. It was indeed the problem that I've mentioned (lack of expression data for some protein interaction partners). The package that I've shared (https://s3-eu-west-1.amazonaws.com/o...use-v1.0.3.zip) is working fine with no exceptions, while I was able to reproduce the problem with the old version.

      As for filtering fusions from tophat, there are several types of reads: spanning, encompassing and contradictory. The thresholds should be set like >1 spanning, >10 encompassing and no contradictory.

      Please also provide RNA-Star fusion output so I can implement and debug this input type.
      I can't download from the URL you provided (even when I replace the ... with ncof). Can you upload it to where your home page is hosted? Thanks!



      • #18
        Originally posted by mikesh View Post
        I've checked the file you've uploaded. It was indeed the problem that I've mentioned (lack of expression data for some protein interaction partners). The package that I've shared (https://s3-eu-west-1.amazonaws.com/o...use-v1.0.3.zip) is working fine with no exceptions, while I was able to reproduce the problem with the old version.

        As for filtering fusions from tophat, there are several types of reads: spanning, encompassing and contradictory. The thresholds should be set like >1 spanning, >10 encompassing and no contradictory.

        Please also provide RNA-Star fusion output so I can implement and debug this input type.
        If I understand correctly, the 5th column is spanning and the 8th column is contradictory. But which column is "encompassing" in fusions.out?



        • #19
          The new version will be uploaded to home page later in the day.
          I've fixed the URL, it is: https://s3-eu-west-1.amazonaws.com/o...use-v1.0.3.zip. There seems to be some trouble with URL pasting here..

          As for the tophat output, the columns 5-8 are:
          5: span junction, 6: encompassing, 7: one mate spans, other is encompassing, 8: contradictory (supporting unbroken transcript)
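
A minimal sketch of reading those four columns in Python, assuming the fields are whitespace-separated in the same way awk's $5 to $8 would see them. The names `FusionCounts` and `parse_counts` are made up for this illustration and are not part of TopHat-Fusion or Oncofuse; the example line is one of the records posted later in this thread.

```python
from typing import NamedTuple


class FusionCounts(NamedTuple):
    """Read-support counts from columns 5-8 of a fusions.out line (hypothetical helper)."""
    spanning: int        # column 5: reads spanning the junction
    encompassing: int    # column 6: mate pairs encompassing the junction
    span_encompass: int  # column 7: one mate spans, the other is encompassing
    contradictory: int   # column 8: reads supporting the unbroken transcript


def parse_counts(line: str) -> FusionCounts:
    """Split on whitespace and take columns 5-8 (1-based), like awk's $5..$8."""
    fields = line.split()
    return FusionCounts(*(int(value) for value in fields[4:8]))


# Example record as posted in this thread (assumed to be whitespace-separated)
example = "chr1-chr1 20107258 22198678 ff 12 6 13 344 65 17.409725"
counts = parse_counts(example)
print(counts)
```

Naming the columns this way makes it harder to mix up $6 and $8 when writing a filter by hand.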



          • #20
            By using the

            awk '$5>1&&$6>10&&$8==0'

            filter, my fusions.out comes down from 1.1 million lines to 2361 lines. The program can finish and I end up with 320 fusions.

            There are five fusions in my sample that have been confirmed experimentally:

            MKL1-NIPA1 MKL1 22q13 NIPA1 15q11.2
            HSPG2-TMCO4 HSPG2 1p36.1 TMCO4 1p36.13
            NIPAL3-ATAD3B* NIPAL3 1p36.12 ATAD3B 1p36.33
            UBFD1-CDH11* UBFD1 16p12 CDH11 16q21
            SLC7A6-LRRC36 SLC7A6 16q22.1 LRRC36 16q22.1

            But none of them show up in the oncofuse output with the aforementioned filter. They all show up, however, if I use the original, unfiltered fusions.out (333,687 fusions) as input. But then how can I pick out these five from the 333,687 candidates?

            FYI, in contrast, 4 out of 5 show up among the 97 fusions identified by tophat-fusion-post.



            • #21
              Originally posted by ymc View Post
              By using the

              awk '$5>1&&$6>10&&$8==0'

              filter, my fusions.out comes down from 1.1 million lines to 2361 lines. The program can finish and I end up with 320 fusions.

              There are five fusions in my sample that have been confirmed experimentally:

              MKL1-NIPA1 MKL1 22q13 NIPA1 15q11.2
              HSPG2-TMCO4 HSPG2 1p36.1 TMCO4 1p36.13
              NIPAL3-ATAD3B* NIPAL3 1p36.12 ATAD3B 1p36.33
              UBFD1-CDH11* UBFD1 16p12 CDH11 16q21
              SLC7A6-LRRC36 SLC7A6 16q22.1 LRRC36 16q22.1

              But none of them show up in the oncofuse output with the aforementioned filter. They all show up, however, if I use the original, unfiltered fusions.out (333,687 fusions) as input. But then how can I pick out these five from the 333,687 candidates?

              FYI, in contrast, 4 out of 5 show up among the 97 fusions identified by tophat-fusion-post.
              It seems that the tophat-fusion-post thresholds are softer, something like
              awk '$5>1&&$6>3&&$8==0'. Selecting the fusions that are really present in a sample is a task for the fusion detection software. In any case, even if a fusion is detected in a sample, it is far more likely to be a passenger than a driver.



              • #22
                You can download the output generated from the same data from the URL below. It took only 30min for RNA-STAR to generate it versus 10 hours from tophat-fusion.



                As to what those fields refer to, I suppose you can consult the RNA-STAR manual...



                • #23
                  Originally posted by mikesh View Post
                  It seems that the tophat-fusion-post thresholds are softer, something like
                  awk '$5>1&&$6>3&&$8==0'. Selecting the fusions that are really present in a sample is a task for the fusion detection software. In any case, even if a fusion is detected in a sample, it is far more likely to be a passenger than a driver.
                  I tried this awk '$5>1&&$6>3&&$8==0' filter. It has 439 fusions mapped. But it also has none of the five experimentally confirmed fusions.

                  You said "The selection of fusions that are detected in sample is the task that should be solved by fusion detection software." But which part of the tophat-fusion pipeline is the fusion detection software? I think tophat-fusion is just a mapping step, and it is the job of oncofuse or tophat-fusion-post to do the fusion detection. What you said would make sense if oncofuse took tophat-fusion-post's output as input.



                  • #24
                    Originally posted by ymc View Post
                    I tried this awk '$5>1&&$6>3&&$8==0' filter. It has 439 fusions mapped. But it also has none of the five experimentally confirmed fusions.

                    You said "The selection of fusions that are detected in sample is the task that should be solved by fusion detection software." But which part in the tophat-fusion pipeline is the fusion detection software? I think tophat-fusion is just a mapping step. I think it is oncofuse or tophat-fusion-post's job to do the fusion detection. What you said can make sense if your oncofuse is taking tophat-fusion-post's output.
                    I meant that Oncofuse is for functional analysis, and that the confidence that a fusion is present in the sample at all should first be evaluated from the distribution of mapped reads.
                    Of course, it is good practice to pass tophat-fusion-post output to Oncofuse. But it's quite strange that your fusions were missing; I believe detectable fusions should be quite robust to filtering. How many spanning reads, etc. do they have exactly?



                    • #25
                      First 10 fields in fusions.out for MKL1-NIPA1. Spanning is 10 but encompassing is 0

                      chr15-chr22 23062318 40990677 ff 10 0 7 0 42 59 9.290000

                      First 10 fields in fusions.out for HSPG2-TMCO4. Spanning is 12 but encompassing is 6 and contradictory is 344

                      chr1-chr1 20107258 22198678 ff 12 6 13 344 65 17.409725

                      First 10 fields in fusions.out for NIPAL3-ATAD3B. Spanning is 39 and encompassing is 28 but contradictory is 43

                      chr1-chr1 1425636 24787033 rr 39 28 5 43 65 64 4.535832

                      First 10 fields in fusions.out for UBFD1-CDH11. Spanning is 14 and encompassing is 27 but contradictory is 852

                      chr16-chr16 23574050 64984920 fr 14 27 13 852 48 10.530613

                      First 10 fields in fusions.out for SLC7A6-LRRC36. Spanning is 6 and encompassing is 7 but contradictory is 664

                      chr16-chr16 67409160 68309151 rr 6 7 0 664 43 3.388890

                      Looks like contradictory > 0 is not a reason for tophat-fusion-post to exclude a fusion gene from its output.
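
To make that concrete, here is a small Python sketch that reports which criterion rejects each of the five confirmed fusions under the two awk filters from this thread, and what happens if the $8==0 test is dropped. The counts are copied from the records above; the helper name `failed_criteria` is made up, and dropping $8==0 is only an experiment, not a claim about what tophat-fusion-post actually does.

```python
# (spanning, encompassing, contradictory) = columns 5, 6 and 8 as posted above
CONFIRMED = {
    "MKL1-NIPA1": (10, 0, 0),
    "HSPG2-TMCO4": (12, 6, 344),
    "NIPAL3-ATAD3B": (39, 28, 43),
    "UBFD1-CDH11": (14, 27, 852),
    "SLC7A6-LRRC36": (6, 7, 664),
}


def failed_criteria(span, enc, contra, min_enc, require_no_contra=True):
    """Return the names of the criteria a record fails; an empty list means it passes."""
    failed = []
    if not span > 1:                       # $5>1
        failed.append("spanning")
    if not enc > min_enc:                  # $6>10 (strict) or $6>3 (soft)
        failed.append("encompassing")
    if require_no_contra and contra != 0:  # $8==0
        failed.append("contradictory")
    return failed


for name, counts in CONFIRMED.items():
    strict = failed_criteria(*counts, min_enc=10)
    soft = failed_criteria(*counts, min_enc=3)
    relaxed = failed_criteria(*counts, min_enc=3, require_no_contra=False)
    print(f"{name}: strict fails {strict}; soft fails {soft}; soft without $8==0 fails {relaxed}")
```

With these numbers, four of the five pass the soft filter once the contradictory test is dropped; MKL1-NIPA1 still fails because its encompassing count is 0.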



                      • #26
                        I looked at the Chimeric.out.junction.gz generated by RNA-STAR.

                        For MKL1-NIPA1, there are 8 spanning reads. The junction becomes chr15-chr22 23062320 40990677.

                        For HSPG2-TMCO4, there are 8 spanning reads. The junction becomes chr1-chr1 20107260 22198678

                        For NIPAL3-ATAD3B, there are 17 spanning reads. The junction becomes chr1-chr1 1425636 24787035

                        For UBFD1-CDH11, there are 14 spanning reads. The junction becomes chr16-chr16 23574052 64984920

                        For SLC7A6-LRRC36, there are 4 spanning reads. The junction becomes chr16-chr16 67409160 68309153

                        Looks like some of the junctions are off by 2? I also noticed that there are junctions that are off by a few bases. If we include them, the spanning read count might be closer to tophat-fusion's?
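
A rough Python sketch of that idea: treat two junctions as the same if both breakpoints agree within a small tolerance, and pool the spanning reads of everything that falls inside the window. The coordinates and counts are the ones posted in this thread; `same_junction`, `pooled_spanning` and the default tolerance of 2 bases are assumptions made for this illustration, not something either tool provides.

```python
# name -> ((chrom pair, breakpoint 1, breakpoint 2), spanning reads), as posted in this thread
TOPHAT = {
    "MKL1-NIPA1": (("chr15-chr22", 23062318, 40990677), 10),
    "HSPG2-TMCO4": (("chr1-chr1", 20107258, 22198678), 12),
    "NIPAL3-ATAD3B": (("chr1-chr1", 1425636, 24787033), 39),
    "UBFD1-CDH11": (("chr16-chr16", 23574050, 64984920), 14),
    "SLC7A6-LRRC36": (("chr16-chr16", 67409160, 68309151), 6),
}
STAR = {
    "MKL1-NIPA1": (("chr15-chr22", 23062320, 40990677), 8),
    "HSPG2-TMCO4": (("chr1-chr1", 20107260, 22198678), 8),
    "NIPAL3-ATAD3B": (("chr1-chr1", 1425636, 24787035), 17),
    "UBFD1-CDH11": (("chr16-chr16", 23574052, 64984920), 14),
    "SLC7A6-LRRC36": (("chr16-chr16", 67409160, 68309153), 4),
}


def same_junction(a, b, tol=2):
    """True if two junctions share a chromosome pair and both breakpoints agree within tol bases."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= tol and abs(a[2] - b[2]) <= tol


def pooled_spanning(target, junctions, tol=2):
    """Sum spanning reads over all (junction, reads) pairs within tol bases of target."""
    return sum(reads for junction, reads in junctions if same_junction(target, junction, tol))


for name, (th_junction, th_reads) in TOPHAT.items():
    exact = pooled_spanning(th_junction, STAR.values(), tol=0)
    near = pooled_spanning(th_junction, STAR.values(), tol=2)
    print(f"{name}: tophat-fusion {th_reads}, RNA-STAR exact match {exact}, within 2 bases {near}")
```

With the coordinates as posted, none of the five pairs match exactly, and each differs by exactly 2 bases on one side. Running the pooling over the full Chimeric.out.junction file rather than just these five lines would show whether nearby junctions close the gap in read counts.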



                        • #27
                          Originally posted by ymc View Post
                          First 10 fields in fusions.out for MKL1-NIPA1. Spanning is 10 but encompassing is 0

                          chr15-chr22 23062318 40990677 ff 10 0 7 0 42 59 9.290000

                          First 10 fields in fusions.out for HSPG2-TMCO4. Spanning is 12 but encompassing is 6 and contradictory is 344

                          chr1-chr1 20107258 22198678 ff 12 6 13 344 65 17.409725

                          First 10 fields in fusions.out for NIPAL3-ATAD3B. Spanning is 39 and encompassing is 28 but contradictory is 43

                          chr1-chr1 1425636 24787033 rr 39 28 5 43 65 64 4.535832

                          First 10 fields in fusions.out for UBFD1-CDH11. Spanning is 14 and encompassing is 27 but contradictory is 852

                          chr16-chr16 23574050 64984920 fr 14 27 13 852 48 10.530613

                          First 10 fields in fusions.out for SLC7A6-LRRC36. Spanning is 6 and encompassing is 7 but contradictory is 664

                          chr16-chr16 67409160 68309151 rr 6 7 0 664 43 3.388890

                          Looks like contradictory > 0 is not a reason for tophat-fusion-post to exclude a fusion gene from its output.
                          Indeed, for a patient sample that criterion could be meaningless. The examples from the Tophat-fusion pages don't contain any contradictory reads, but they are derived from homogeneous cell lines. In a patient sample, the majority of cells could carry normal copies of the fused genes.

                          Originally posted by ymc View Post
                          I looked at the Chimeric.out.junction.gz generated by RNA-STAR.

                          For MKL1-NIPA1, there are 8 spanning reads. The junction becomes chr15-chr22 23062320 40990677.

                          For HSPG2-TMCO4, there are 8 spanning reads. The junction becomes chr1-chr1 20107260 22198678

                          For NIPAL3-ATAD3B, there are 17 spanning reads. The junction becomes chr1-chr1 1425636 24787035

                          For UBFD1-CDH11, there are 14 spanning reads. The junction becomes chr16-chr16 23574052 64984920

                          For SLC7A6-LRRC36, there are 4 spanning reads. The junction becomes chr16-chr16 67409160 68309153

                          Looks like some of the junctions are off by 2? I also noticed that there are junctions that are off by a few bases. If we include them, the spanning read count might be closer to tophat-fusion's?
                          Are these missing reads lying inside the fused exons? If yes, then the total count of supporting reads should match. I think it would not be easy to compare the results of these tools. However, if you choose a low threshold (1-2) on spanning reads and a high threshold (5-10) on spanning + encompassing reads, the lists of putative fusions from these tools should match.
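
As a sketch of that rule in Python, applied to the five confirmed fusions with the tophat-fusion counts posted earlier in this thread. The function name `passes_combined`, the inclusive comparisons (>=) and the defaults of 2 and 10 are assumptions picked from the suggested ranges, not Oncofuse or tophat-fusion-post settings.

```python
# name -> (spanning, encompassing), columns 5 and 6 of the records posted in this thread
CONFIRMED = {
    "MKL1-NIPA1": (10, 0),
    "HSPG2-TMCO4": (12, 6),
    "NIPAL3-ATAD3B": (39, 28),
    "UBFD1-CDH11": (14, 27),
    "SLC7A6-LRRC36": (6, 7),
}


def passes_combined(spanning, encompassing, min_spanning=2, min_support=10):
    """Low bar on spanning reads, higher bar on spanning + encompassing reads."""
    return spanning >= min_spanning and spanning + encompassing >= min_support


for name, (spanning, encompassing) in CONFIRMED.items():
    total = spanning + encompassing
    verdict = "kept" if passes_combined(spanning, encompassing) else "dropped"
    print(f"{name}: spanning={spanning}, spanning+encompassing={total} -> {verdict}")
```

Because the rule sums the two read types and ignores contradictory reads, a fusion like MKL1-NIPA1 with 10 spanning and 0 encompassing reads is kept rather than rejected.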

                          I've finally modified the code so Oncofuse now takes RNASTAR output as input, please see http://www.unav.es/genetica/oncofuse.html.



                          • #28
                            Can someone confirm my interpretation of these lines in the manual for Oncofuse?

                            P_VAL_CORR: The Bayesian probability of fusion being a passenger (class 0), given as Bonferroni-corrected P-value.
                            DRIVER_PROB: The Bayesian probability of fusion being a driver (class 1).

                            So one is a p value, the other a probability? Therefore driver-fusions should have high values (close to 1) for both columns?

                            Thanks in advance.



                            • #29
                              Originally posted by NKAkers View Post
                              Can someone confirm my interpretation of these lines in the manual for Oncofuse?

                              P_VAL_CORR: The Bayesian probability of fusion being a passenger (class 0), given as Bonferroni-corrected P-value.
                              DRIVER_PROB: The Bayesian probability of fusion being a driver (class 1).

                              So one is a p value, the other a probability? Therefore driver-fusions should have high values (close to 1) for both columns?

                              Thanks in advance.
                              Hello!

                              Both are initially Bayesian probabilities. The first is the probability of being a passenger (class 0) and the second the probability of being a driver (class 1), and the two sum to 1. Since an RNA-Seq experiment usually produces plenty of novel fusions, a multiple testing correction should be performed. H0 here is that a fusion is a passenger; the probability of H0 is p(class 0), which is reported as the P-value. These P-values are corrected using the Bonferroni method. Values of p(class 1) are also provided for reference.

                              So for a driver fusion, P_VAL_CORR should be close to 0 and DRIVER_PROB should be close to 1.
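
For illustration, a minimal Python sketch of how the two columns relate under a textbook Bonferroni correction. The p(class 0) values are made up, and taking the number of tests to be the number of fusions in the input is an assumption; this is not taken from the Oncofuse source.

```python
def bonferroni(p_values):
    """Textbook Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical p(class 0) values for three fusions, made up for this example
p_passenger = [0.001, 0.2, 0.6]

p_val_corr = bonferroni(p_passenger)          # corrected probability of H0 (passenger)
driver_prob = [1.0 - p for p in p_passenger]  # p(class 1) = 1 - p(class 0), left uncorrected

for p0, corrected, p1 in zip(p_passenger, p_val_corr, driver_prob):
    print(f"p(class 0)={p0}  P_VAL_CORR={corrected:.3f}  DRIVER_PROB={p1:.3f}")
```

Note that after the correction the two columns no longer sum to 1, which is why a fusion can have a fairly high DRIVER_PROB and still not have a small P_VAL_CORR.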



                              • #30
                                Thank you for the explanation mikesh!

                                I find it a little confusing, I would suggest a more direct explanation of the value:

                                P_VAL_CORR: The Bonferroni-corrected P-value for the hypothesis test where H0: Fusion is passenger, and H1: Fusion is driver.

                                Just my 2 cents though. Thanks for the great tool!!
