  • genomeHunter
    Member
    • Apr 2013
    • 26

    Generating splicing graph

    Hello everyone,

    I have a set of gapped-aligned RNA-seq reads and I want to generate the splicing graph. I was wondering if you could recommend a tool.

    Cheers,
    GH
  • dietmar13
    Senior Member
    • Mar 2010
    • 107

    #2
    e.g. SpliceGrapher


    • genomeHunter
      Member
      • Apr 2013
      • 26

      #3
      Thanks dietmar13. We have tried it, but it's slow and some of the results are very strange. We also tried Cufflinks with the --GTF-guide option, but it takes a lifetime to run and generates a ton of two-exon transcripts.

      I am looking for a simple and reliable tool that just generates all possible isoforms from the reads.

      GH
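For readers new to the idea, a splicing graph can be sketched in a few lines: exons are nodes, an edge joins two exons when an observed intron links the end of one to the start of the next, and every source-to-sink path is a candidate isoform. This is a minimal sketch under stated assumptions: the exon and intron coordinates are already known (for example, intron bounds taken from the N gaps of the gapped alignments), the function names are hypothetical, and it is not how SpliceGrapher or Cufflinks work internally.

```python
from collections import defaultdict


def build_splice_graph(exons, introns):
    """Connect exon a -> exon b when an observed intron joins a's end to b's start.

    exons: list of (start, end); introns: list of (donor_end, acceptor_start).
    """
    intron_set = set(introns)
    graph = defaultdict(list)
    for a in exons:
        for b in exons:
            if (a[1], b[0]) in intron_set:
                graph[a].append(b)
    return graph


def enumerate_isoforms(graph, exons):
    """Depth-first enumeration of every source-to-sink path (candidate isoform)."""
    targets = {b for succs in graph.values() for b in succs}
    sources = [e for e in exons if e not in targets]
    paths = []

    def walk(node, path):
        succs = graph.get(node, [])
        if not succs:  # no outgoing intron: this path ends here
            paths.append(path)
            return
        for nxt in succs:
            walk(nxt, path + [nxt])

    for s in sorted(sources):
        walk(s, [s])
    return paths


if __name__ == "__main__":
    exons = [(100, 200), (300, 400), (500, 600)]
    introns = [(200, 300), (400, 500), (200, 500)]  # the last one skips the middle exon
    for path in enumerate_isoforms(build_splice_graph(exons, introns), exons):
        print(path)
```

With three exons and one exon-skipping intron this yields two isoforms. The number of paths can grow exponentially with the number of alternative exons, which is one reason real assemblers score or prune paths rather than listing them all.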


      • shi
        Wei Shi
        • Feb 2010
        • 236

        #4
        Hi GH,

        Not sure if this is useful to you, but you may try the Subjunc program included in the Subread package (http://subread.sourceforge.net). Subjunc finds all possible exon-exon junctions from RNA-seq reads. It uses a novel read mapping paradigm called 'seed-and-vote' to map reads and discover exon-exon junctions (http://nar.oxfordjournals.org/conten...kt214.abstract). It is an extremely fast junction detector.

        Cheers
        Wei
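The voting idea behind seed-and-vote can be illustrated with a toy: cut the read into short seeds, look each one up in a genome index, and let every hit vote for the offset at which the whole read would start. A read that crosses an exon-exon junction collects votes at two offsets, and their difference is the intron length. This is only a simplified reading of the idea, not Subread or Subjunc code: the real tools use longer subreads, a proper index and a lot of extra logic, and `kmer_index`, `vote` and the sequences below are made up for illustration.

```python
from collections import Counter


def kmer_index(genome, k):
    """Map every k-mer of the genome to the list of positions where it occurs."""
    index = {}
    for i in range(len(genome) - k + 1):
        index.setdefault(genome[i:i + k], []).append(i)
    return index


def vote(read, genome, k=4, step=1):
    """Toy seed-and-vote: each seed votes for the genomic offset at which the
    whole read would start if that seed were placed correctly."""
    index = kmer_index(genome, k)
    votes = Counter()
    for j in range(0, len(read) - k + 1, step):
        for pos in index.get(read[j:j + k], []):
            votes[pos - j] += 1
    return votes.most_common(2)  # the two best-supported offsets


if __name__ == "__main__":
    exon1, intron, exon2 = "AAGCGACCAG", "T" * 10, "GGACAAGGCC"
    genome = exon1 + intron + exon2
    read = exon1[4:] + exon2[:6]  # spans the exon1/exon2 junction
    (off1, n1), (off2, n2) = vote(read, genome)
    print(off1, off2, "-> candidate intron of length", off2 - off1)
```

Here the read starts at genome position 4: its first six bases vote for offset 4 and its last six for offset 14, and the 10-base gap between the two is exactly the poly-T "intron" in the toy genome.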


        • genomeHunter
          Member
          • Apr 2013
          • 26

          #5
          Thank you so much Wei! I saw the paper a while ago and I will definitely give it a try.

          GH


          • dietmar13
            Senior Member
            • Mar 2010
            • 107

            #6
            RNA-Seq Unified Mapper (RUM)

            Comparing different mappers on 3 million 70-base PE RNA-seq reads:

            RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

            RUM provides several output files for these spliced reads / junctions...


            • genomeHunter
              Member
              • Apr 2013
              • 26

              #7
              Very interesting. We have been using STAR because we found it to be much (~25-50X) faster than Bowtie2, while being more accurate, but we have not tried RUM yet.

              Your stats indicate a nearly 50% improvement over STAR. Have you seen any other performance evaluations for RUM?

              GH
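The "nearly 50%" figure can be checked directly from the spliced-read counts in the comparison above (in thousands); the dictionary and helper name are only for illustration.

```python
# Spliced-read counts (thousands) from the 3-million-read comparison above.
counts = {"RUM": 189, "STAR": 127, "TopHat": 124, "Subread/Subjunc": 99}


def relative_gain(a, b):
    """Percentage by which count a exceeds count b."""
    return 100.0 * (a - b) / b


for name in ("STAR", "TopHat", "Subread/Subjunc"):
    print(f"RUM vs {name}: +{relative_gain(counts['RUM'], counts[name]):.1f}%")
# RUM vs STAR: +48.8%
# RUM vs TopHat: +52.4%
# RUM vs Subread/Subjunc: +90.9%
```

So RUM reported about 49% more spliced reads than STAR in this run. The same helper gives about +2.0% for STAR with annotations versus RUM 1.10 (192,899 vs 189,054). Counts alone say nothing about whether the extra alignments are correct.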


              • alexdobin
                Senior Member
                • Feb 2009
                • 161

                #8
                Originally posted by dietmar13:
                Comparing different mappers on 3 million 70-base PE RNA-seq reads:

                RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

                RUM provides several output files for these spliced reads / junctions...
                Hi Dietmar,

                I was wondering if you could share the details of this evaluation. I have compared RUM with STAR in our paper, and RUM showed similar or lower sensitivity to junctions on both simulated and real data. Were you using annotations for both RUM and STAR in this evaluation? If you ran STAR without annotations, you would see approximately 50% fewer spliced reads, which could explain this large difference.

                Cheers
                Alex


                • dietmar13
                  Senior Member
                  • Mar 2010
                  • 107

                  #9
                  Hi Alex,
                  > ... you could share the details ...
                  Of course. STAR was run without annotations, but RUM with annotations, so perhaps you are right; I have to test the new STAR with annotations:
                  RUM 1.10
                  STAR 2.0.0
                  (I know, both are somewhat outdated, RUM perhaps more so.)

                  Illumina 2 x 76 PE.

                  For STAR, the default parameter file (<parametersDefault>) with these options, typical for mapping 2 x 76 Illumina reads:
                  Code:
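                  # Keep the genome in shared memory between runs (LoadAndKeep); allow at most
                  # 4 mismatches and at most 0.1 mismatches per mapped base; require >= 40 matched bases.
                  STAR --genomeDir <genome> --genomeLoad LoadAndKeep \
                     --outFilterMismatchNmax 4 --outFilterMismatchNoverLmax 0.1 \
                     --outFilterMatchNmin 40 \
                     --readFilesIn <sample#1_1.fastq> <sample#1_2.fastq>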
                  Code:
                  perl RUM_runner.pl lib/rum.config_hg19 <sample#1_1.fastq>,,,<sample#1_2.fastq> \
                  $tmp 12 $name -limitBowtieNU
                  RSeQC results: see the attached picture.
                  Last edited by dietmar13; 04-07-2013, 01:35 PM.


                  • shi
                    Wei Shi
                    • Feb 2010
                    • 236

                    #10
                    The comparisons should be performed not only in terms of mapping percentage but, more importantly, in terms of accuracy. Our evaluation results show that Subjunc is much more accurate than competing methods on simulated data and SEQC data (Tables 6 and 7 in http://nar.oxfordjournals.org/conten...kt214.abstract).
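One common way to score accuracy, given a known truth set, is junction-level sensitivity and precision. The sketch below is a generic illustration, not the evaluation procedure of the paper cited above; junctions are assumed to be (chromosome, intron start, intron end) tuples, and both "tools" are hypothetical.

```python
def junction_metrics(predicted, truth):
    """Return (sensitivity, precision) of predicted junctions against a truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


truth = {("chr1", 200, 300), ("chr1", 400, 500), ("chr2", 1000, 1500), ("chr2", 2000, 2600)}
# Hypothetical tool A reports more junctions, two of them false.
tool_a = truth - {("chr2", 2000, 2600)} | {("chr3", 5, 50), ("chr3", 60, 90)}
# Hypothetical tool B reports fewer junctions, all of them correct.
tool_b = truth - {("chr2", 2000, 2600)}
print(junction_metrics(tool_a, truth))  # (0.75, 0.6)
print(junction_metrics(tool_b, truth))  # (0.75, 1.0)
```

Both tools find the same true junctions, but tool A reports more in total; counting reported junctions (or mapped reads) alone would favour A, while precision shows that B is more accurate.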


                    • shi
                      Wei Shi
                      • Feb 2010
                      • 236

                      #11
                      Originally posted by dietmar13:
                      Comparing different mappers on 3 million 70-base PE RNA-seq reads:

                      RUM (189 k) >> STAR (127 k) > tophat (124 k) > subread/subjunc (99 k) spliced reads mapped.

                      RUM provides several output files for these spliced reads / junctions...
                      Did you run subread or subjunc here? For mapping junction reads, you should run subjunc. Also, what was the version you used?

                      Wei


                      • dietmar13
                        Senior Member
                        • Mar 2010
                        • 107

                        #12
                        Alex, STAR won by about 2% (STAR with annotations: 192,899 spliced reads; RUM 1.10: 189,054; I should try RUM 2...).
                        Why do the statistics differ? RSeQC reports 192,899 spliced reads, but the STAR log file reports even more, 202,082.

                        Wei: I can't assess accuracy because I don't know the true number of spliced reads...
                        subread (-J) -> subjunc (v.1.3.1)

                        STAR 2.3.0 with gencode.v14 annotation:

                        RSeQC:
                        Total Records: 2424478
                        QC failed: 0
                        Optical/PCR duplicate: 0
                        Non Primary Hits 131750
                        Unmapped reads: 0
                        Multiple mapped reads: 110496

                        Uniquely mapped: 2182232
                        Read-1: 1091116
                        Read-2: 1091116
                        Reads map to '+': 1091116
                        Reads map to '-': 1091116
                        Non-splice reads: 1989333
                        Splice reads: 192899
                        Reads mapped in proper pairs: 2182232
                        whereas the STAR log file says:
                        Mapping speed, Million of reads per hour | 270.55

                        Number of input reads | 3081258
                        Average input read length | 152
                        UNIQUE READS:
                        Uniquely mapped reads number | 1091116
                        Uniquely mapped reads % | 35.41%
                        Average mapped length | 145.84
                        Number of splices: Total | 202082
                        Number of splices: Annotated (sjdb) | 193745
                        Number of splices: GT/AG | 199463
                        Number of splices: GC/AG | 1217
                        Number of splices: AT/AC | 164
                        Number of splices: Non-canonical | 1238
                        Mismatch rate per base, % | 1.97%
                        Deletion rate per base | 0.01%
                        Deletion average length | 1.48
                        Insertion rate per base | 0.01%
                        Insertion average length | 1.93
                        MULTI-MAPPING READS:
                        Number of reads mapped to multiple loci | 55248
                        % of reads mapped to multiple loci | 1.79%
                        Number of reads mapped to too many loci | 76
                        % of reads mapped to too many loci | 0.00%
                        UNMAPPED READS:
                        % of reads unmapped: too many mismatches | 0.00%
                        % of reads unmapped: too short | 62.79%
                        % of reads unmapped: other | 0.01%
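Part of the discrepancy can be checked with a little arithmetic on the two outputs above. The figures are consistent with STAR's log counting read pairs (its "Average input read length" of 152 is 2 x 76) while RSeQC counts each mate separately; this pairs-versus-mates reading is an interpretation, not something either tool states in this output.

```python
# Figures copied from the RSeQC and STAR Log.final.out output above.
rseqc = {"uniquely_mapped": 2182232, "non_splice": 1989333, "splice": 192899}
star = {"input_reads": 3081258, "unique_reads": 1091116, "avg_input_len": 152}

# Assumption: RSeQC counts mates, STAR counts pairs (Read-1 == Read-2 == 1091116 in RSeQC).
assert rseqc["uniquely_mapped"] == 2 * star["unique_reads"]
# RSeQC splits every uniquely mapped mate into spliced or non-spliced.
assert rseqc["non_splice"] + rseqc["splice"] == rseqc["uniquely_mapped"]

print(round(100 * star["unique_reads"] / star["input_reads"], 2))  # 35.41, as in the log
print(star["avg_input_len"] / 2)  # 76.0 bases per mate
```

So RSeQC's 192,899 counts spliced mates, whereas STAR's 202,082 counts splices rather than reads.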


                        • shi
                          Wei Shi
                          • Feb 2010
                          • 236

                          #13
                          Hi Dietmar,

                          To evaluate the junction detectors rigorously, you may have to create simulated data to test them. For example, you can generate exon-spanning reads from the human genome using the annotated exon information; this will let you assess both the sensitivity and the accuracy of the alternative methods. It would also be interesting to compare the speed of these methods.

                          You may consider using 100 bp reads instead of 75 bp reads, because state-of-the-art sequencers now typically generate ~100 bp reads. You may see different methods behave differently when you use longer reads.

                          Cheers,
                          Wei
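A minimal sketch of that kind of simulation, assuming a single transcript whose exons are given as sorted, half-open (start, end) intervals on a genome string: each read takes some bases from the end of one exon and the rest from the start of the next, and the intron it crosses is kept as the truth. The function name and inputs are hypothetical, and a realistic simulator would also add sequencing errors, quality scores, paired ends and both strands.

```python
import random


def simulate_junction_reads(genome, exons, read_len, n, seed=0):
    """Return n (read, (intron_start, intron_end)) pairs; each read spans one junction
    between consecutive exons. Exons must be long enough for a valid split to exist."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n):
        i = rng.randrange(len(exons) - 1)
        (s1, e1), (s2, e2) = exons[i], exons[i + 1]
        # Bases taken from the upstream exon: 1..read_len-1, limited by both exon lengths.
        left = rng.randint(max(1, read_len - (e2 - s2)), min(read_len - 1, e1 - s1))
        read = genome[e1 - left:e1] + genome[s2:s2 + read_len - left]
        reads.append((read, (e1, s2)))  # intron is the half-open interval [e1, s2)
    return reads


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(300))
    exons = [(0, 50), (100, 160), (200, 260)]
    for read, junction in simulate_junction_reads(genome, exons, 30, 3):
        print(read, junction)
```

The returned intron coordinates serve as the truth set against which each aligner's reported junctions can be scored.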


                          • shi
                            Wei Shi
                            • Feb 2010
                            • 236

                            #14
                            We will be happy to share with you the simulation data and also the code for generating these data if you want to use them in your evaluation.

                            Cheers,
                            Wei


                            • alexdobin
                              Senior Member
                              • Feb 2009
                              • 161

                              #15
                              Originally posted by dietmar13:
                              Alex, STAR won by about 2% (STAR with annotations: 192,899 spliced reads; RUM 1.10: 189,054; I should try RUM 2...).
                              Why do the statistics differ? RSeQC reports 192,899 spliced reads, but the STAR log file reports even more, 202,082.
                              In the Log.final.out, STAR counts the total number of "splices" - you can get it by counting the total number of N-operations in CIGARs of all unique alignments. Since some spliced reads can have more than one splice, the number of splices is bigger than the number of spliced reads, which is output by RSeQC, I guess.
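The distinction between splices and spliced reads is easy to reproduce from the CIGAR strings (column 6 of a SAM file) of the unique alignments: count the N operations in each alignment, then count how many alignments have at least one. A minimal sketch; selecting the unique alignments (for example by STAR's NH:i:1 tag) and reading the SAM file are left out, and the function names are illustrative.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_splices(cigar):
    """Number of N (skipped region, i.e. intron) operations in one CIGAR string."""
    return sum(1 for _, op in CIGAR_OP.findall(cigar) if op == "N")


def summarize(cigars):
    """Return (spliced_alignments, total_splices) for a list of CIGAR strings."""
    per_alignment = [count_splices(c) for c in cigars]
    return sum(1 for n in per_alignment if n > 0), sum(per_alignment)


print(summarize(["76M", "30M500N46M", "20M300N30M900N26M", "40M1I35M"]))  # (2, 3)
```

One alignment crossing two introns is one spliced read but two splices, which is how a splice count can exceed a spliced-read count.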
