SEQanswers

Old 05-15-2015, 10:36 AM   #1
rodrigo.duarte88
Member
 
Location: London

Join Date: Jan 2015
Posts: 10
bowtie index file & matching GTF for tophat2, for specific human genome cytoband

Are there example bowtie2 reference files (used for indexing with bowtie2-build) and a matching GTF file that I can use as a standard for my mapping?

I am having trouble creating a bowtie index that matches the GTF file for a specific cytoband of the human genome.

Also, if you have any comments on narrowing down the region of interest, I'd appreciate them. (I am currently doing this to test my files, as it's my first time doing these analyses.)
Old 05-15-2015, 10:39 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Illumina's iGenomes site hosts bundles (sequence/index/annotation) for several genomes: http://support.illumina.com/sequenci...e/igenome.html
Old 05-17-2015, 02:19 AM   #3
rodrigo.duarte88
Member
 
Location: London

Join Date: Jan 2015
Posts: 10

But, for example, say I downloaded the hg19 bundle from the iGenomes page. In it I find the chr10.fa reference file, from which I create the bowtie2 index (bt2 files). Then there's also a GTF file for the whole genome.
So is it correct if I run tophat like this:
tophat -p 8 -G hg19.gtf chr10 reads1.fastq,reads2.fastq

Every time I include the GTF file it simply won't work. I also tried assembling transcripts with cufflinks using a BAM file generated by tophat without the GTF file (only the chr10 bowtie index as reference).

The thing is: do I need to manipulate these files before using them in a run?

I am sorry! I am very new to this and tried several times changing small things in these files, but it simply didn't work.
Old 05-17-2015, 02:25 AM   #4
rodrigo.duarte88
Member
 
Location: London

Join Date: Jan 2015
Posts: 10

And yeah, I am aware that the first column of the GTF file needs to match the headers of my chr10.fa reference file (which I check with the command "bowtie2-inspect --names chr10"). So I manually changed the headers of the FASTA file, created new index files, and tried to run it, and it also didn't work (which is expected, since it lost all the genomic coordinate information, haha).
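
A minimal sketch of the name check described here, assuming a plain-text FASTA and a tab-delimited GTF. The function name `missing_seqnames` and both file paths are hypothetical. It lists sequence names that occur in column 1 of the GTF but not among the FASTA headers; an empty result suggests the names agree.

```shell
# Print sequence names found in the GTF (column 1) but absent from the FASTA.
# Both arguments are file paths supplied by the caller (hypothetical names).
missing_seqnames() {
    gtf=$1
    fasta=$2
    tmp_fa=$(mktemp)
    tmp_gtf=$(mktemp)
    # FASTA names: text after '>' up to the first whitespace.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$tmp_fa"
    # GTF names: column 1 of every non-comment line.
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$tmp_gtf"
    # comm -23 keeps lines that are unique to the first file (the GTF names).
    comm -23 "$tmp_gtf" "$tmp_fa"
    rm -f "$tmp_fa" "$tmp_gtf"
}
```

For example, `missing_seqnames hg19.gtf chr10.fa` would be expected to list every chromosome other than chr10 if the GTF covers the whole genome. The same comparison could presumably be made against the output of "bowtie2-inspect --names" instead of the FASTA headers.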

I don't know how to do this, that's why I was asking for an example of files ready for a run.. hehe

Thanks a lot for your attention!
Old 05-17-2015, 04:15 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

I am not sure why you are trying to recreate the index files since the bundle contains BowtieIndex and Bowtie2Index, which already have the index files.

GTF files define features in the genome. If you want to add additional ones, then as long as you stick to the correct format you should not need to touch the reference or the index files.

If you make any change to the reference itself (adding or deleting even a single base), then you will need to recreate the index files and edit the GTF file as well, since you will have effectively changed the coordinates of the original genome reference.

Are you trying to look at a specific region of the chromosome? If so, you can extract just the reads aligning in that region using samtools (check out the view command). Then use the original (or modified) GTF file with the extracted region in the new BAM file.
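
The region extraction suggested above could be sketched roughly as follows. This assumes samtools 1.3 or newer is on the PATH (older releases use a different "sort" syntax); the function name `extract_region` and all file names are placeholders, not part of any tool.

```shell
# Sort, index, and extract one region from a BAM file.
# Assumes samtools >= 1.3; the function name and file names are placeholders.
extract_region() {
    if [ "$#" -ne 3 ]; then
        echo "usage: extract_region <in.bam> <region> <out.bam>" >&2
        return 2
    fi
    in_bam=$1
    region=$2
    out_bam=$3
    sorted="${in_bam%.bam}.sorted.bam"
    # Region queries need a coordinate-sorted, indexed BAM.
    samtools sort -o "$sorted" "$in_bam" &&
    samtools index "$sorted" &&
    # -b writes BAM output; the region is given as chrom:start-end.
    samtools view -b -o "$out_bam" "$sorted" "$region"
}
```

A call might look like `extract_region accepted_hits.bam chr10:1000000-2000000 region.bam`, where the coordinates are made up for illustration. If the input BAM is already coordinate-sorted, the sort step is redundant and can be dropped.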
Old 05-17-2015, 04:38 AM   #6
rodrigo.duarte88
Member
 
Location: London

Join Date: Jan 2015
Posts: 10

Quote:
Originally Posted by GenoMax View Post
Are you trying to look at a specific region of the chromosome? If so you can extract just the reads aligning in that region using samtools (check out the view command). Then use original (or modified) GTF file with this extracted region in the new BAM file.
Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations..

Thanks so much for your help!
Old 05-17-2015, 05:56 AM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Quote:
Originally Posted by rodrigo.duarte88 View Post
Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations..
There are a few ways of doing this, as I alluded to above.

1. Doing the alignment to the entire genome followed by the samtools view option to extract a particular region would be more straightforward (with a sorted and indexed bam file).
Code:
$ samtools view your.bam chr1:10000-20000
2. If you really want to work with just the region of interest then (using the iGenomes file bundle)
a. Get that sequence from the "genome.fa" (use the appropriate chromosome sequence from this file) using the bedtools "getfasta" option: http://bedtools.readthedocs.org/en/l.../getfasta.html
b. Create appropriate indexes (bowtie/bowtie2) using this file.
c. Select appropriate regions from the GTF file. You will have to adjust the coordinates appropriately (not sure how big a region you are looking at).
Accomplishing a and b is straightforward. c would be difficult.
3. You could use BioMart or UCSC table browser to extract the sequence and the annotations. Make the indexes using that sequence file.

Unless you are really constrained for computational power, going with option 1 is the most straightforward.
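
For anyone attempting step 2c anyway, here is a minimal sketch of the coordinate adjustment, assuming a standard tab-delimited GTF with 1-based, inclusive coordinates. The function name `shift_gtf` and the `new_name` argument are hypothetical; `new_name` should be whatever header the extracted FASTA record actually carries. Note that BED intervals (as used by bedtools) are 0-based and half-open, so the region bounds need converting before they are passed in here.

```shell
# Keep GTF features lying entirely inside chrom:start-end (1-based, inclusive)
# and rewrite their coordinates relative to the extracted sequence.
# new_name is the header of the extracted FASTA record (hypothetical).
shift_gtf() {
    gtf=$1
    chrom=$2
    start=$3
    end=$4
    new_name=$5
    awk -F '\t' -v OFS='\t' -v c="$chrom" -v s="$start" -v e="$end" -v n="$new_name" '
        /^#/ { next }                      # skip comment lines
        $1 == c && ($4 + 0) >= (s + 0) && ($5 + 0) <= (e + 0) {
            $1 = n                         # rename to match the new FASTA header
            $4 = $4 - s + 1                # region start becomes position 1
            $5 = $5 - s + 1
            print
        }' "$gtf"
}
```

Features that straddle the region boundary are dropped rather than clipped, so transcripts near the edges may lose exons. That is part of why this route is harder to get right than extracting reads from a whole-genome alignment.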

Last edited by GenoMax; 05-17-2015 at 06:01 AM.
Old 05-17-2015, 06:09 AM   #8
rodrigo.duarte88
Member
 
Location: London

Join Date: Jan 2015
Posts: 10

I will try option 1, since from what I've been reading it's the more correct approach.

But anyway, you've been very helpful! Thanks so much!!
Tags
narrow region
