  • TopHat with/without annotation, and Cufflinks with annotation?

    Hi,

    I want to know the difference between these two workflows:

    (1) run TopHat with annotation and use accepted_hits.bam to run Cufflinks with annotation

    (2) run TopHat without annotation and use accepted_hits.bam to run Cufflinks with annotation

    Does anyone know what the difference in the output would be?

    Thanks

    Best regards!

  • #2
    Annotation

    I am interested in this issue as well. The manual says that TopHat will use the exon records in the annotation file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions. I suspect that without annotation the software might be more sensitive to novel junctions, but that's just a guess. To test this, I have recently run several lanes/samples of data through TopHat both with and without annotation, and the last of the Cufflinks runs should be finished by this weekend. Is there anything specific you are interested in? If so, let me know and I'd gladly look into it.
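As an illustration of "using exon records to build a set of known splice junctions", here is a minimal Python sketch. It is not TopHat's code: it assumes a GTF 2.2 file whose exon lines carry a `transcript_id` attribute, and the function name `gtf_to_juncs` is hypothetical. Each pair of consecutive exons in a transcript defines one junction, reported in the `<chrom> <left> <right> <+/->` layout TopHat uses for raw junction files (zero-based last base of the left exon, zero-based first base of the right exon).

```python
import re
from collections import defaultdict

# Pulls the value out of e.g.  transcript_id "t1";  in GTF column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_juncs(gtf_lines):
    """Return sorted (chrom, left, right, strand) junctions implied by GTF exons.

    GTF coordinates are 1-based and inclusive; the returned left/right values
    are zero-based (last base of the left exon, first base of the right exon).
    """
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end), ...]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define junctions here
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    junctions = set()  # a set, because transcripts of one gene often share introns
    for (chrom, strand, _transcript), blocks in exons.items():
        blocks.sort()
        # Each pair of consecutive exons in a transcript encloses one intron.
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            junctions.add((chrom, left_end - 1, right_start - 1, strand))
    return sorted(junctions)
```

To write a junction file, open the GTF and print each returned tuple tab-separated; the set removes junctions shared by several transcripts of the same gene.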



    • #3
      Annotation

      Also, keep me in the loop and let me know if you find anything. Thanks!
      Last edited by rhcr56; 08-10-2011, 07:41 PM.



      • #4
        How can you use TopHat without annotation? You need a reference to map the reads.


        • #5
          Originally posted by chenyao
          How can you use TopHat without annotation? You need a reference to map the reads.
          See the paper, or ask the authors, for how it can detect splice junctions without annotation:


          http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract


          TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

          You can choose to run TopHat with or without annotation.

          Of course it needs the reference sequence; "annotation" means the GTF file, not the reference sequence.
          Last edited by louis7781x; 08-11-2011, 01:36 AM.



          • #6
            Originally posted by louis7781x
            Of course it needs the reference sequence; "annotation" means the GTF file, not the reference sequence.
            But I don't see any TopHat option for supplying the annotation file.


            • #7
              Originally posted by chenyao
              But I don't see any TopHat option for supplying the annotation file.
              Supplying your own junctions:

              The options below allow you to validate your own junctions with your RNA-Seq data. Note that the chromosome names in the files provided with the options below must match the names in the Bowtie index. These names are case-sensitive.

              -G/--GTF <GTF 2.2 file>

              Supply TopHat with a list of gene model annotations. TopHat will use the exon records in this file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions even if they would not normally be covered by the initial mapping.
              -j/--raw-juncs <.juncs file>

              Supply TopHat with a list of raw junctions. Junctions are specified one per line, in a tab-delimited format. Records look like:
              <chrom> <left> <right> <+/->

              left and right are zero-based coordinates, and specify the last character of the left sequence to be spliced to the first character of the right sequence, inclusive. That is, the last and the first positions of the flanking exons. Users can convert junctions.bed (one of the TopHat outputs) to this format using bed_to_juncs < junctions.bed > new_list.juncs, where bed_to_juncs can be found in the same folder as tophat.
              --no-novel-juncs Only look for reads across junctions indicated in the supplied GFF or junctions file. (ignored without -G/-j)
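The bed_to_juncs conversion mentioned above can be sketched in Python to show how the coordinates relate. This is not the script shipped with TopHat; it assumes junctions.bed holds BED12 records with two blocks, one per anchor, so the left coordinate is the last base of the first block and the right coordinate is the first base of the second block. The function names are hypothetical.

```python
def bed_line_to_junc(line):
    """Convert one junctions.bed record to a .juncs line, or return None.

    Assumes TopHat-style BED12 records with two blocks, one per anchor:
    left = last base of the first block, right = first base of the second
    block, both zero-based as the .juncs format requires.
    """
    line = line.strip()
    if not line or line.startswith(("track", "#")):
        return None  # skip the track header and blank lines
    cols = line.split()
    chrom, chrom_start, strand = cols[0], int(cols[1]), cols[5]
    # blockSizes / blockStarts may end with a trailing comma, e.g. "50,50,"
    sizes = [int(x) for x in cols[10].split(",") if x]
    starts = [int(x) for x in cols[11].split(",") if x]
    left = chrom_start + starts[0] + sizes[0] - 1
    right = chrom_start + starts[1]
    return f"{chrom}\t{left}\t{right}\t{strand}"


def bed_to_juncs(lines):
    """Convert an iterable of junctions.bed lines to .juncs lines."""
    return [j for j in map(bed_line_to_junc, lines) if j is not None]
```

Passing `sys.stdin` to `bed_to_juncs` and printing the result mimics the `bed_to_juncs < junctions.bed > new_list.juncs` usage quoted above.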



              • #8
                TopHat with or without GTF file?

                I have been using TopHat without a GTF file and counting the mapped reads with HTSeq. The results looked good, but now I am wondering whether I should be using a GTF file. I will run it again with the GTF file to look at the differences. Has anyone else noticed which works best?


                • #9
                  I ran TopHat without annotation, and the percentage of aligned reads is not as good as with other aligners, e.g. Elandrna. Of course, when we tested the alignment coverage, we chose the RefSeq annotation. Does anyone know what's going on?


                  • #10
                    A few more reads with annotation file

                    Hi,
                    I found a few more reads when designating an annotation file.
                    Lana



                    • #11
                      The advantage of using the annotation GTF is in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode, a junction is only defined if a read flanks it by at least 8 bp on both sides. You can modify this setting:

                      -a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.

                      So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp fall in one exon, as the GTF basically says "this is a known junction, go ahead and align to it."
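The anchor-length argument can be made concrete with a toy Python model. It is a simplification, not TopHat's implementation: each spliced read is reduced to a hypothetical `SplicedRead(junction, left_overhang, right_overhang)`, a junction is found de novo only when some read has at least `min_anchor` bases on each side (default 8, minimum 3, as in the option text above), and junctions supplied from the annotation are accepted without that read support.

```python
from typing import NamedTuple, Tuple

Junction = Tuple[str, int, int]  # (chrom, left, right); a hypothetical junction key


class SplicedRead(NamedTuple):
    junction: Junction
    left_overhang: int   # read bases on the upstream exon
    right_overhang: int  # read bases on the downstream exon


def accepted_junctions(reads, min_anchor=8, known=()):
    """Toy model of junction acceptance; not TopHat's actual algorithm.

    De novo, a junction is kept only if at least one read spans it with
    min_anchor or more bases on each side. Junctions listed in `known`
    (e.g. derived from a GTF) are kept without that read support.
    """
    if min_anchor < 3:
        raise ValueError("min_anchor must be at least 3")
    found = {r.junction for r in reads
             if r.left_overhang >= min_anchor and r.right_overhang >= min_anchor}
    return found | set(known)


def count_aligned(reads, junctions):
    """Reads crossing an accepted junction align, even with a short overhang."""
    return sum(1 for r in reads if r.junction in junctions)
```

With two reads that each have only 4 bp on one side of a junction j1, and two reads on j2 of which one is well anchored, de novo mode keeps only j2 (2 reads aligned), while supplying j1 as a known junction lets all 4 reads align.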

                      In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads: a low read count increases the number of undefined junctions because of the anchor-length setting, and as the number of reads increases, so does the chance of defining the same junction in de novo mode. Read length also makes a difference: the longer your reads, the less the GTF annotation improves the percentage aligned.
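The read-count and read-length effects can be estimated with a back-of-the-envelope model. The assumptions are ours, not TopHat's: the junction falls uniformly at one of the L - 1 internal positions of a spanning read of length L, and spanning reads are independent. One read then anchors the junction with p = (L - 2a + 1)/(L - 1) for anchor length a, and n reads find it de novo with probability 1 - (1 - p)^n.

```python
def p_read_anchors(read_length, min_anchor=8):
    """Probability that one spanning read has >= min_anchor bases on each side.

    Assumes the junction lies uniformly at one of the read_length - 1 internal
    positions of the read; a simplifying assumption, not TopHat behaviour.
    """
    if read_length < 2:
        raise ValueError("a spanning read needs at least 2 bases")
    good_positions = max(read_length - 2 * min_anchor + 1, 0)
    return good_positions / (read_length - 1)


def p_junction_found_de_novo(n_reads, read_length, min_anchor=8):
    """Probability that at least one of n independent spanning reads anchors it."""
    return 1 - (1 - p_read_anchors(read_length, min_anchor)) ** n_reads


# Rows: read length; columns: 1, 2 and 5 spanning reads.
for length in (36, 50, 100):
    row = [p_junction_found_de_novo(n, length) for n in (1, 2, 5)]
    print(length, " ".join(f"{p:.3f}" for p in row))
```

For 36 bp reads, a single spanning read anchors the junction with probability 21/35 = 0.6 in this model; both more reads and longer reads push the de novo detection probability toward 1, which is the regime where the GTF adds little.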

                      By default I always use the GTF option.



                      • #12
                        What comparisons (tools, files) are you using to determine the better alignment rates? For example, do you look at coverageBed output, or at FPKM and confidence levels?

                        Can you discuss the tools and resources you use to compare your TopHat/Cufflinks results with and without a GTF?

                        Thanks!

                        Cynthia



                        • #13
                          Cynthia,
                          I use HTSeq to count the number of reads per gene.

                          I see that I am getting more reads for some genes when I use the GTF option.

                          Lana
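One way to see which genes gain reads with the GTF option is to diff the two htseq-count tables. A minimal Python sketch, assuming the default htseq-count output of one `gene_id<TAB>count` line per gene followed by summary rows whose names begin with `__`; the function names are hypothetical.

```python
def read_htseq_counts(lines):
    """Parse htseq-count output (gene_id<TAB>count), skipping the __ summary rows."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_id, count = line.split("\t")
        if gene_id.startswith("__"):
            continue  # __no_feature, __ambiguous, __alignment_not_unique, ...
        counts[gene_id] = int(count)
    return counts


def gene_count_changes(without_gtf, with_gtf):
    """Per-gene change (with GTF minus without GTF), listing only genes that differ."""
    genes = sorted(set(without_gtf) | set(with_gtf))
    changes = {g: with_gtf.get(g, 0) - without_gtf.get(g, 0) for g in genes}
    return {g: d for g, d in changes.items() if d != 0}
```

Each table can be loaded with `with open(path) as fh: counts = read_htseq_counts(fh)`; genes missing from one run are treated as zero counts.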



                          • #14
                            In my case, just samtools flagstat or Picard CollectAlignmentSummaryMetrics:

                            Code:
                            samtools flagstat MyTophatBam.bam > MyMetrics.txt
                            
                            or
                            
                            java -Xmx2g -jar CollectAlignmentSummaryMetrics.jar INPUT=MyTophatBam.bam OUTPUT=MyMetrics.txt VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=MyGenome.fa ASSUME_SORTED=true IS_BISULFITE_SEQUENCED=false
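To compare a GTF run with a no-GTF run from two flagstat files, a small Python sketch can pull the QC-passed total and mapped counts out of each. It assumes the usual `N + M in total` / `N + M mapped (...)` line layout of samtools flagstat; the function names are hypothetical. Note that flagstat counts alignment records, so secondary alignments of multi-mapped reads are included in these numbers.

```python
import re

# "N + M in total (...)" and "N + M mapped (...)"; requiring "mapped" right
# after the counts deliberately skips the "primary mapped" line of newer samtools.
TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total")
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def parse_flagstat(text):
    """Return (total, mapped) QC-passed record counts from samtools flagstat text."""
    total = mapped = None
    for line in text.splitlines():
        total_match = TOTAL_RE.match(line)
        mapped_match = MAPPED_RE.match(line)
        if total_match and total is None:
            total = int(total_match.group(1))
        elif mapped_match and mapped is None:
            mapped = int(mapped_match.group(1))
    if total is None or mapped is None:
        raise ValueError("input does not look like samtools flagstat output")
    return total, mapped


def mapped_gain(without_gtf_text, with_gtf_text):
    """Extra mapped records in the GTF run relative to the run without a GTF."""
    return parse_flagstat(with_gtf_text)[1] - parse_flagstat(without_gtf_text)[1]
```

For example, read the two metrics files (names of your choosing) into strings and pass them to `mapped_gain`; a positive value means the GTF run mapped more records.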



                            • #15
                              some confusion

                              Originally posted by Jon_Keats
                              The advantage of using the annotation GTF is for the mapping of reads from low expressed transcripts that cross an exon-exon junction. In the de novo mode a junction is only defined if a read flanks on both sides by at least 8 bp.
                              Sorry, I may be confusing "anchor length" with "segment length".
                              If I use --segment-length 25, reads are cut into segments of this length, but --min-anchor-length 8 is certainly smaller than that.
                              Does this mean my reads can actually be cut down to 8 bp to support a junction, rather than needing pieces at least as long as my segment length?

                              thanks

                              song
