  • TopHat with/without annotation, and Cufflinks with annotation?

    Hi,

    I want to know what happens if I

    (1) run TopHat with annotation and use accepted_hits.bam to run Cufflinks with annotation

    (2) run TopHat without annotation and use accepted_hits.bam to run Cufflinks with annotation

    Does anyone know how the output differs between the two?

    Thanks

    Best regards!

  • #2
    Annotation

    I am interested in this issue as well. The manual says that TopHat will use the exon records in the annotation file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions. I suspect that without an annotation the software might be more sensitive to novel junctions, but that's just a guess. To test this, I have recently run several lanes/samples of data through TopHat both with and without annotation, and the last of the Cufflinks runs should finish by this weekend. Is there anything specific you are interested in? If so, let me know and I'd gladly look into it.

    • #3
      Annotation

      Also, keep me in the loop and let me know if you find anything. Thanks!
      Last edited by rhcr56; 08-10-2011, 07:41 PM.

      • #4
        How can you use TopHat without annotation? You need a reference to map the reads.

        • #5
          Originally posted by chenyao
          How can you use TopHat without annotation? You need a reference to map the reads.
          You can read the paper, or ask the authors, to see how it detects splice junctions without annotation.


          http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract


          TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

          You can run TopHat either with or without annotation.

          Of course it needs a reference sequence; "annotation" means the GTF file, not the reference sequence.
          Last edited by louis7781x; 08-11-2011, 01:36 AM.

          • #6
            Originally posted by louis7781x
            You can read the paper, or ask the authors, to see how it detects splice junctions without annotation.


            http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract


            TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

            You can run TopHat either with or without annotation.

            Of course it needs a reference sequence; "annotation" means the GTF file, not the reference sequence.
            But I don't see any TopHat option for supplying the annotation file.

            • #7
              Originally posted by chenyao
              But I don't see any TopHat option for supplying the annotation file.
              Supplying your own junctions:

              The options below allow you to validate your own junctions with your RNA-Seq data. Note that the chromosome names in the files provided with the options below must match the names in the Bowtie index. These names are case-sensitive.

              -G/--GTF <GTF 2.2 file>

              Supply TopHat with a list of gene model annotations. TopHat will use the exon records in this file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions even if they would not normally be covered by the initial mapping.

              -j/--raw-juncs <.juncs file>

              Supply TopHat with a list of raw junctions. Junctions are specified one per line, in a tab-delimited format. Records look like:
              <chrom> <left> <right> <+/->

              left and right are zero-based coordinates, and specify the last character of the left sequence to be spliced to the first character of the right sequence, inclusive. That is, the last and the first positions of the flanking exons. Users can convert junctions.bed (one of the TopHat outputs) to this format using bed_to_juncs < junctions.bed > new_list.juncs, where bed_to_juncs can be found in the same folder as tophat.

              --no-novel-juncs

              Only look for reads across junctions indicated in the supplied GFF or junctions file. (ignored without -G/-j)
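
As a rough illustration of the .juncs layout described above, here is a minimal Python sketch that turns one junctions.bed record into `<chrom> <left> <right> <+/->` lines. It assumes the record has the standard 12 BED columns (strand in column 6, block sizes in column 11, block starts in column 12). The function name `bed_line_to_juncs` and the sample record are made up for illustration, and this is not the source of the bed_to_juncs script itself.

```python
def bed_line_to_juncs(line):
    """Convert one BED12 record into tab-delimited .juncs lines (sketch)."""
    cols = line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = cols[0], int(cols[1]), cols[5]
    # Columns 11 and 12 are comma-separated lists, possibly with a trailing comma.
    block_sizes = [int(x) for x in cols[10].strip(",").split(",")]
    block_starts = [int(x) for x in cols[11].strip(",").split(",")]
    juncs = []
    for i in range(len(block_sizes) - 1):
        # Zero-based position of the last base of the left block.
        left = chrom_start + block_starts[i] + block_sizes[i] - 1
        # Zero-based position of the first base of the right block.
        right = chrom_start + block_starts[i + 1]
        juncs.append("\t".join([chrom, str(left), str(right), strand]))
    return juncs


# Hypothetical record: two 20 bp blocks flanking a 100 bp intron.
example = "chr1\t100\t240\tJUNC00000001\t5\t+\t100\t240\t255,0,0\t2\t20,20\t0,120"
for junc in bed_line_to_juncs(example):
    print(junc)
```

For the sample record the left block ends at position 119 and the right block starts at position 220, so the single output line carries those two coordinates.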

              • #8
                TopHat with or without GTF file?

                I have been using TopHat without a GTF file and counting the mapped reads with HTSeq. The results looked good. Now I am wondering whether I should be using a GTF file. I will run it again with a GTF file to look at the differences. Has anyone else noticed which works best?

                • #9
                  I ran TopHat without annotation, and the percentage of aligned reads is not as good as with other aligners, say Elandrna. Of course, when we tested the alignment coverage, we chose the RefSeq annotation. Does anyone know what's going on?

                  • #10
                    a few more reads with annotation file

                    Hi,
                    I found a few more reads when I specified an annotation file.
                    Lana

                    • #11
                      The advantage of using the annotation GTF is in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode a junction is only defined if a read flanks it by at least 8 bp on both sides (you can modify this setting):

                      -a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.

                      So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp lie in one exon, as the GTF basically says this is a known junction, go ahead and align to it.

                      In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads, as a low read count increases the number of undefined junctions due to the anchor-length setting. As the number of reads increases, so does the chance of defining the same junction in de novo mode. Obviously, the read length also makes a difference: the longer your reads, the less the GTF annotation improves the percent aligned.

                      By default I always use the GTF option.
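
The anchor-length rule described in this post can be illustrated with a small Python sketch. It is a simplified model and not TopHat's implementation: a read is described only by its length and by how many of its bases lie to the left of the junction, and all function names are hypothetical.

```python
def anchors(read_length, bases_left_of_junction):
    """Return (left, right) base counts on each side of the junction."""
    return bases_left_of_junction, read_length - bases_left_of_junction


def supports_novel_junction(read_length, bases_left_of_junction, min_anchor=8):
    """True if the read has at least min_anchor bases on both sides."""
    left, right = anchors(read_length, bases_left_of_junction)
    return min(left, right) >= min_anchor


def junction_is_reported(reads, min_anchor=8):
    """A junction needs at least one read with a full anchor on both sides.

    reads is a list of (read_length, bases_left_of_junction) tuples.
    """
    return any(supports_novel_junction(n, k, min_anchor) for n, k in reads)


# Hypothetical 50 bp reads over the same junction.
low_coverage = [(50, 4)]               # only 4 bp in the left exon
higher_coverage = [(50, 4), (50, 20)]  # a second read anchors well on both sides
print(junction_is_reported(low_coverage))     # False
print(junction_is_reported(higher_coverage))  # True
```

In this model the lone 4 bp read cannot define the junction on its own, which is the situation where, according to the post, a supplied GTF lets the read align anyway.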

                      • #12
                        What comparison (tools, files) are you using to determine better alignment rates? For example, do you look at coverageBed output, or at FPKM and confidence levels?

                        Can you say a bit about the tools and resources you use to compare your TopHat/Cufflinks results with and without a GTF?

                        Thanks!

                        Cynthia

                        • #13
                          Cynthia,
                          I use HTSeq to count the number of reads per gene.

                          Then I see that I am getting more reads for some genes when I use the GTF option.

                          Lana
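
A per-gene comparison like the one described here could be sketched in Python as follows. The sketch assumes two-column, tab-separated count tables (gene ID, count) of the kind htseq-count writes, with summary rows such as `no_feature` or `__no_feature` that should be skipped. The gene names and counts are made up, and the function names are hypothetical.

```python
SPECIAL = {"no_feature", "ambiguous", "too_low_aQual", "not_aligned",
           "alignment_not_unique"}


def parse_counts(text):
    """Parse 'gene<TAB>count' lines, skipping htseq-count summary rows."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, value = line.split("\t")
        # Summary rows appear with or without a leading "__" prefix.
        if gene.lstrip("_") in SPECIAL:
            continue
        counts[gene] = int(value)
    return counts


def genes_with_more_reads(without_gtf, with_gtf):
    """Return {gene: gain} for genes counted higher in the with-GTF run."""
    gains = {}
    for gene, n_with in with_gtf.items():
        gain = n_with - without_gtf.get(gene, 0)
        if gain > 0:
            gains[gene] = gain
    return gains


# Made-up counts for illustration only.
run_without = parse_counts("geneA\t100\ngeneB\t40\n__no_feature\t12\n")
run_with = parse_counts("geneA\t100\ngeneB\t47\n__no_feature\t9\n")
print(genes_with_more_reads(run_without, run_with))  # {'geneB': 7}
```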

                          • #14
                            In my case, just samtools flagstat or Picard CollectAlignmentSummaryMetrics

                            Code:
                            samtools flagstat MyTophatBam.bam > MyMetrics.txt
                            
                            or
                            
                            java -Xmx2g -jar CollectAlignmentSummaryMetrics.jar INPUT=MyTophatBam.bam OUTPUT=MyMetrics.txt VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=MyGenome.fa ASSUME_SORTED=true IS_BISULFITE_SEQUENCED=false
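
To compare two such flagstat reports (one run with a GTF, one without), something like the following Python sketch could pull out the headline numbers. It assumes the usual flagstat line layout of `<passed> + <failed> <description>`. The two sample reports are invented, and the function name `flagstat_counts` is hypothetical.

```python
import re

# Matches lines such as "900 + 0 mapped (90.00%:nan%)".
LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def flagstat_counts(flagstat_text):
    """Return QC-passed 'total' and 'mapped' counts from flagstat output."""
    counts = {}
    for line in flagstat_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        passed, description = int(match.group(1)), match.group(3)
        if description.startswith("in total"):
            counts["total"] = passed
        elif description.startswith("mapped"):
            counts["mapped"] = passed
    return counts


# Invented reports for illustration only.
without_gtf = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
900 + 0 mapped (90.00%:nan%)
"""
with_gtf = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
930 + 0 mapped (93.00%:nan%)
"""

gain = flagstat_counts(with_gtf)["mapped"] - flagstat_counts(without_gtf)["mapped"]
print(gain)  # 30
```

Keep in mind that flagstat counts alignment records in the BAM file, so a read reported at several locations contributes more than once.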

                            • #15
                              some confusion

                              Originally posted by Jon_Keats
                              The advantage of using the annotation GTF is in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode a junction is only defined if a read flanks it by at least 8 bp on both sides (you can modify this setting):

                              -a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.

                              So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp lie in one exon, as the GTF basically says this is a known junction, go ahead and align to it.

                              In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads, as a low read count increases the number of undefined junctions due to the anchor-length setting. As the number of reads increases, so does the chance of defining the same junction in de novo mode. Obviously, the read length also makes a difference: the longer your reads, the less the GTF annotation improves the percent aligned.

                              By default I always use the GTF option.
                              Sorry, I may be confusing "anchor-length" with "segment-length".
                              If I use "--segment-length" 25, I understand that each read is cut into segments of at least this length, but a "--min-anchor-length" of 8 is clearly smaller than that.
                              Does this mean that as few as 8 bp of my read can support a junction, rather than needing at least my segment length?

                              thanks

                              song
