  • Tophat building Bowtie index from gtf file

    Not so much a problem as an inquiry. When I run TopHat, it tells me that it is building a Bowtie index from the .gtf file I supplied. However, I already supplied a Bowtie2 index in the command.

    Was wondering if I am doing something wrong or if it needs both types of indexes?

  • #2
    OK, I just checked, and my Cuffdiff output file gene_exp.diff shows the same value in the first three columns instead of the gene ID and gene name, so I assume something is wrong in my process.

    I got the 22 chromosome files (including x and y) from NCBI and the genes.gtf file from them as well. I built a Bowtie2 index with the 22 chr. files with x and y. I checked to make sure they had the same first column identifiers.

    I ran Tophat for the 2 samples (and their replicates) and then took their accepted_hits.bam and used CuffDiff to find the difference. I tried many different ways but can't get the gene_exp.diff file to have information in the first three columns.

    TL;DR: Ran TopHat then Cuffdiff; the first three columns of gene_exp.diff are identical, which I don't think is right. Help!

    • #3
      Bump, still having the recurring problem.

      • #4
        Originally posted by Aholton
        Not so much a problem as an inquiry. When I run TopHat, it tells me that it is building a Bowtie index from the .gtf file I supplied. However, I already supplied a Bowtie2 index in the command.

        Was wondering if I am doing something wrong or if it needs both types of indexes?
        If you supply TopHat with a GTF annotation file, it first extracts transcript sequences from your genome based on the annotations and builds a Bowtie index of the transcript FASTA file. I assume the index you supplied to TopHat is for the genome sequence. When -G/--GTF is used, TopHat first attempts to align reads directly to the transcripts, then aligns the reads that remain unaligned to the genome.

        If you are planning on aligning several data sets to the same genome/annotation, it would be a waste of time for TopHat to rebuild the transcript index every time. For this reason TopHat also has the --transcriptome-index option. The first time you run TopHat, supply it together with -G: the -G option provides the annotation, and --transcriptome-index tells TopHat where to store the index it builds. In subsequent runs you can omit the -G option and use --transcriptome-index alone to point TopHat at the prebuilt transcript index. Check out the "Supplying your own transcript annotation data:" section in the TopHat Manual.
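
To make the two-step pattern above concrete, here is a rough sketch that only assembles the command lines (it does not run TopHat). The helper name `tophat_command` and all file names (`genome_index`, `genes.gtf`, `transcriptome_data/known`, the FASTQ files) are placeholders I made up for illustration; check the option spelling against the manual for your TopHat version.

```python
import shlex


def tophat_command(genome_index, reads, transcriptome_index, gtf=None):
    """Build a TopHat argument list (hypothetical helper, not part of TopHat).

    Pass gtf on the first run so TopHat builds the transcript index and
    stores it under transcriptome_index; leave gtf as None on later runs
    so the stored index is reused.
    """
    cmd = ["tophat"]
    if gtf is not None:
        cmd += ["-G", gtf]  # annotation is only needed when building the index
    cmd.append("--transcriptome-index=" + transcriptome_index)
    cmd.append(genome_index)  # basename of the Bowtie2 genome index
    cmd.extend(reads)
    return cmd


# First run: builds and stores the transcript index (placeholder paths).
first_run = tophat_command(
    "genome_index", ["sample1.fastq"], "transcriptome_data/known", gtf="genes.gtf"
)
# Later runs: reuse the stored index, no -G.
later_run = tophat_command(
    "genome_index", ["sample2.fastq"], "transcriptome_data/known"
)

print(" ".join(shlex.quote(part) for part in first_run))
print(" ".join(shlex.quote(part) for part in later_run))
```

Either list could be handed to `subprocess.run` once the paths point at real files.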

        • #5
          Originally posted by Aholton
          OK, I just checked, and my Cuffdiff output file gene_exp.diff shows the same value in the first three columns instead of the gene ID and gene name, so I assume something is wrong in my process.

          I got the 22 chromosome files (including x and y) from NCBI and the genes.gtf file from them as well. I built a Bowtie2 index with the 22 chr. files with x and y. I checked to make sure they had the same first column identifiers.

          I ran Tophat for the 2 samples (and their replicates) and then took their accepted_hits.bam and used CuffDiff to find the difference. I tried many different ways but can't get the gene_exp.diff file to have information in the first three columns.

          TL;DR: Ran TopHat then Cuffdiff; the first three columns of gene_exp.diff are identical, which I don't think is right. Help!
          This is expected depending on the annotation supplied.

          First of all, there is a slight error in the cuffdiff manual's description of the gene_exp.diff format: it says that there are 13 columns, with the first three being Tested id, gene, locus. In the current output there are actually 14 columns, with a 'gene_id' column added between test_id and gene. Here is the header and two lines from a recent output of mine (note that the headers don't line up directly over the corresponding data columns due to text formatting):

          Code:
          test_id	gene_id	gene	locus	sample_1	sample_2	status	value_1	value_2	log2(fold_change)	test_stat	p_value	q_value	significant
          AT1G01080	AT1G01080	AT1G01080	1:45295-47019	fae1_7-8	fae1_9-10	OK	23.3894	15.1937	-0.622382	1.24939	0.211521	0.706688	no
          AT1G01090	AT1G01090	PDH-E1 ALPHA	1:47484-49286	fae1_7-8	fae1_9-10	OK	609.513	569.592	-0.0977292	0.153759	0.8778	0.999999	no
          You can see that in the first line the test_id, gene_id and gene are all the same, whereas in the second there is a common name in the gene column. This is all a function of what information is present in your annotation (GTF) file which cufflinks/cuffdiff is able to parse out.
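
If you want to see how many of your genes actually picked up a separate name, something like the sketch below would do it. It is only an illustration: the function name `split_by_gene_name` is made up, the sample text is just the first three columns of the two rows above, and treating "-" or an empty field as "no name" is an assumption on my part.

```python
import csv
import io

# First three columns of the rows shown above (remaining columns omitted).
SAMPLE = (
    "test_id\tgene_id\tgene\n"
    "AT1G01080\tAT1G01080\tAT1G01080\n"
    "AT1G01090\tAT1G01090\tPDH-E1 ALPHA\n"
)


def split_by_gene_name(handle):
    """Separate gene_ids that have a distinct name in the 'gene' column
    from those where the column just repeats the id (or is blank/'-')."""
    named, unnamed = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = row["gene"].strip()
        if gene in ("", "-", row["gene_id"]):
            unnamed.append(row["gene_id"])
        else:
            named.append(row["gene_id"])
    return named, unnamed


named, unnamed = split_by_gene_name(io.StringIO(SAMPLE))
print("with a gene name:", named)
print("id repeated:", unnamed)
```

For a real run, replace `io.StringIO(SAMPLE)` with `open("gene_exp.diff", newline="")`. If every gene ends up in the second list, the annotation most likely carries no separate gene names for cuffdiff to report.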

          • #6
            OK, thank you so much! That makes much more sense now.
