**simonandrews** · 09-15-2009, 07:08 AM

The gi|... names you see are the FASTA file headers from the NCBI human assembly. Each chromosome in that assembly comes in a separate file, and what you are seeing are the accession codes for those separate files.

The full header for the first accession you found is:

>gi|29823169|ref|NT_025004.13|Hs18_25160 Homo sapiens chromosome 18 genomic contig, reference assembly

You therefore need to find all the accessions for the different chromosomes and replace them with the corresponding chromosome name.

Alternatively, you could edit the original FASTA files and change the first line of each so that it contains just a chromosome name, e.g.:

>chr18

...and then rebuild the genome index and run TopHat again. This should put usable chromosome names into your output files.
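For anyone who would rather script the header edit than do it by hand, here is a minimal Python sketch. It assumes each header's description contains the phrase "chromosome <name>", as in the example header above; the function names are made up for illustration and are not part of TopHat or Bowtie. If a file holds several sequences for the same chromosome, this would give them identical names, so check the output before building the index.

```python
import re


def chrom_name_from_header(header):
    """Return a UCSC-style name such as 'chr18' for an NCBI FASTA header.

    Assumption: the header description contains 'chromosome <name>'.
    """
    match = re.search(r"chromosome\s+(\w+)", header)
    if match is None:
        raise ValueError(f"no chromosome name found in header: {header!r}")
    return "chr" + match.group(1)


def rename_fasta_headers(lines):
    """Yield FASTA lines with every '>' header replaced by '>chrN'."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + chrom_name_from_header(line) + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


# Example use (file names are hypothetical):
# with open("hs_ref_chr18.fa") as src, open("chr18.fa", "w") as dst:
#     dst.writelines(rename_fasta_headers(src))
```

After rewriting the headers, the index still has to be rebuilt from the edited files before rerunning TopHat.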

**statsteam** · 09-15-2009, 07:20 AM

Thank you simon.
I just started bowtie-build with fasta files containing only chromosome names.

Statsteam

**melody** · 09-15-2009, 07:33 AM

As in the output above, a few lines of the coverage.wig file are:

track type=bedGraph name="TopHat - read coverage"
gi|29823169|ref|NT_025004.13|Hs18_25160 0 9580 0
gi|29823169|ref|NT_025004.13|Hs18_25160 9580 9655 1
So does position 9580 have 1 hit or 0?

**statsteam** · 09-15-2009, 08:36 AM

Originally posted by melody:

As in the output above, a few lines of the coverage.wig file are:

track type=bedGraph name="TopHat - read coverage"
gi|29823169|ref|NT_025004.13|Hs18_25160 0 9580 0
gi|29823169|ref|NT_025004.13|Hs18_25160 9580 9655 1
So does position 9580 have 1 hit or 0?

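The last number on each line is the data column, because the output is in bedGraph format: chromosome, start, end, value.
Once you replace the accession with the correct chromosome name and paste the lines into the browser, it will draw a graph based on the values in that data column.

In this example, it will draw 0 for chr18:0-9580 and then 1 for chr18:9580-9655. bedGraph intervals are zero-based and half-open, so the base at zero-based position 9580 falls in the second interval and has a value of 1.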

-Statsteam
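To make the bedGraph reading concrete, here is a rough Python sketch that parses lines like the ones quoted above, swaps the accession for a chromosome name, and looks up the value at a position. The function names and the name mapping are hypothetical, written only for this example; the interval test assumes the zero-based, half-open coordinates that bedGraph uses.

```python
def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples, skipping track/comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def rename_chroms(intervals, name_map):
    """Replace chromosome names found in name_map; leave others untouched."""
    for chrom, start, end, value in intervals:
        yield name_map.get(chrom, chrom), start, end, value


def value_at(intervals, chrom, pos):
    """Return the value covering zero-based position pos, or None."""
    for c, start, end, value in intervals:
        # Half-open interval: start is included, end is not.
        if c == chrom and start <= pos < end:
            return value
    return None


lines = [
    'track type=bedGraph name="TopHat - read coverage"',
    "gi|29823169|ref|NT_025004.13|Hs18_25160 0 9580 0",
    "gi|29823169|ref|NT_025004.13|Hs18_25160 9580 9655 1",
]
# Hypothetical mapping from NCBI accession header to UCSC chromosome name.
name_map = {"gi|29823169|ref|NT_025004.13|Hs18_25160": "chr18"}
intervals = list(rename_chroms(parse_bedgraph(lines), name_map))
```

With these three lines, `value_at(intervals, "chr18", 9579)` falls in the first interval and `value_at(intervals, "chr18", 9580)` falls in the second.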

**sdriscoll** · 09-23-2009, 02:57 PM

Just to add to this discussion: when using sequencing data from mice, I found it worked best for all of my source references to come from UCSC. I used the per-chromosome FASTA files from UCSC's downloads area to build my Bowtie index, and I used UCSC's Table Browser to produce the GTF file (which I converted to GFF3 using scripts from Sequence Ontology). Only when I had built everything from those sources did I get reliable output files that worked straight away with the UCSC browser. In fact, when I used the NCBI reference (and swapped the chromosome names out for UCSC's names), the output from TopHat didn't even align with the genome.
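One quick sanity check before building anything is to confirm that the sequence names in the FASTA files match the names in the first column of the annotation file. The sketch below is only an illustration with made-up function names. It catches naming mismatches (for example "18" versus "chr18") but cannot detect the case where the names agree and the coordinates come from a different assembly.

```python
def fasta_names(lines):
    """Collect sequence names (first word after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def annotation_names(lines):
    """Collect column-1 sequence names from GTF/GFF lines, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def mismatched_names(fasta_lines, annotation_lines):
    """Return annotation sequence names that have no matching FASTA header."""
    return annotation_names(annotation_lines) - fasta_names(fasta_lines)


# Example use (file names are hypothetical):
# with open("chr18.fa") as fa, open("genes.gtf") as gtf:
#     print(mismatched_names(fa, gtf))
```

An empty result means every annotated sequence name has a FASTA header with the same name.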

**RockChalkJayhawk** · 03-23-2010, 07:40 AM

Originally posted by sdriscoll:

Just to add to this discussion: when using sequencing data from mice, I found it worked best for all of my source references to come from UCSC. I used the per-chromosome FASTA files from UCSC's downloads area to build my Bowtie index, and I used UCSC's Table Browser to produce the GTF file (which I converted to GFF3 using scripts from Sequence Ontology). Only when I had built everything from those sources did I get reliable output files that worked straight away with the UCSC browser. In fact, when I used the NCBI reference (and swapped the chromosome names out for UCSC's names), the output from TopHat didn't even align with the genome.

What config file is needed to use the Sequence Ontology script? I can't find any documentation for it.

**NGS newbie** · 05-16-2011, 06:09 PM

Originally posted by sdriscoll:

Just to add to this discussion: when using sequencing data from mice, I found it worked best for all of my source references to come from UCSC. I used the per-chromosome FASTA files from UCSC's downloads area to build my Bowtie index, and I used UCSC's Table Browser to produce the GTF file (which I converted to GFF3 using scripts from Sequence Ontology). Only when I had built everything from those sources did I get reliable output files that worked straight away with the UCSC browser. In fact, when I used the NCBI reference (and swapped the chromosome names out for UCSC's names), the output from TopHat didn't even align with the genome.

I have that exact problem, but is there a way to fix it if all I have is either the raw file or the .bam and .bam.bai files? Do I need to ask my core personnel to realign using the UCSC files? Any help would be greatly appreciated.

Using TopHat output files with UCSC genome browser