SEQanswers


rodrigo.duarte88 05-15-2015 10:36 AM

bowtie index file & matching GTF for tophat2, for specific human genome cytoband
 
Are there example bowtie2 reference files (the FASTA used for indexing with bowtie2-build) and a GTF file that matches them, so I can use them as a standard for my mapping?

I am having trouble creating a bowtie index that matches the GTF file for a specific cytoband of the human genome.

Also, if you have any comments on narrowing down the region of interest, I'd appreciate them. (I am currently doing this to test my files, as it's my first time doing these analyses.)

GenoMax 05-15-2015 10:39 AM

Illumina's iGenomes site hosts bundles (sequence/index/annotation) for several genomes: http://support.illumina.com/sequenci...e/igenome.html

rodrigo.duarte88 05-17-2015 02:19 AM

But, for example, say I downloaded the hg19 version from the iGenomes page. In the archive I find the chr10.fa reference file, from which I create the bowtie index (bt2 files). Then there's also a GTF file for the whole genome.
So is it correct if I run tophat like this:
tophat -p 8 -G hg19.gtf chr10 reads1.fastq,reads2.fastq

Every time I supply the GTF file it simply won't work. I also tried assembling transcripts with cufflinks using a BAM file generated by tophat without the GTF file (only the chr10 bowtie index reference).

The thing is: do I need to manipulate these files before using them in a run?

I am sorry! I am very new to this and tried several times changing small things in these files, but it simply didn't work.

rodrigo.duarte88 05-17-2015 02:25 AM

And yeah, I am aware that the first column of the GTF file needs to match the headers of my chr10.fa reference file (which I check with the command "bowtie2-inspect --names chr10"). So I manually changed the headers of the FASTA file, created new index files, and tried to run, and it also didn't work (which is expected since it lost all the genomic coordinate information haha).
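The name check described above can be sketched with standard Unix tools. This is only an illustration on two tiny made-up files (`ref.fa`, `genes.gtf` and their contents are hypothetical); it assumes the aligner uses the FASTA header text up to the first whitespace as the sequence name, and that the only mismatch is a missing "chr" prefix in the GTF.

```shell
# Demo inputs (hypothetical; substitute your own FASTA and GTF).
workdir=$(mktemp -d)
printf '>chr10 demo sequence\nACGTACGTAC\n' > "$workdir/ref.fa"
printf '10\tdemo\texon\t2\t8\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > "$workdir/genes.gtf"

# Sequence names in the FASTA: header text up to the first whitespace.
fasta_names() { grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*//' | sort -u; }
# Sequence names in the GTF: column 1 of every non-comment line.
gtf_names() { grep -v '^#' "$1" | cut -f1 | sort -u; }

fasta_names "$workdir/ref.fa" > "$workdir/fasta_names.txt"
gtf_names "$workdir/genes.gtf" > "$workdir/gtf_names.txt"

# Names used by the GTF that the reference does not contain.
missing=$(comm -13 "$workdir/fasta_names.txt" "$workdir/gtf_names.txt")
echo "In GTF but not in FASTA: ${missing:-none}"

# Assumed fix for this demo: add a "chr" prefix to GTF column 1.
# Only the label changes; the coordinates in columns 4 and 5 are untouched.
awk 'BEGIN { FS = OFS = "\t" } /^#/ { print; next } { $1 = "chr" $1; print }' \
    "$workdir/genes.gtf" > "$workdir/genes.chr.gtf"

gtf_names "$workdir/genes.chr.gtf" > "$workdir/gtf_names_fixed.txt"
missing_after=$(comm -13 "$workdir/fasta_names.txt" "$workdir/gtf_names_fixed.txt")
echo "After renaming: ${missing_after:-none}"
```

Renaming the GTF column rather than the FASTA headers leaves the reference alone, so an existing index would not need to be rebuilt.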

I don't know how to do this, which is why I was asking for an example of files ready for a run.. hehe

Thanks a lot for your attention!

GenoMax 05-17-2015 04:15 AM

I am not sure why you are trying to recreate the index files since the bundle contains BowtieIndex and Bowtie2Index, which already have the index files.
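As a rough sketch of running against the prebuilt index, assuming the hg19 UCSC archive was unpacked into `Homo_sapiens/UCSC/hg19` and follows the usual bundle layout (the directory, the read file names and the `run` helper are assumptions, not taken from this thread). The helper only prints the command unless `DRY_RUN=0` is set. Note that TopHat treats comma-separated files as one set of reads; if the two files are paired-end mates they are given as two space-separated arguments, as below.

```shell
# Hypothetical location of the unpacked iGenomes hg19 (UCSC) bundle.
IGENOMES_DIR=${IGENOMES_DIR:-Homo_sapiens/UCSC/hg19}

# Assumed bundle layout: index basename (without .1.bt2 etc.) and gene annotation.
BT2_INDEX="$IGENOMES_DIR/Sequence/Bowtie2Index/genome"
GTF="$IGENOMES_DIR/Annotation/Genes/genes.gtf"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Whole-genome index plus the matching whole-genome GTF from the same bundle.
# reads_1.fastq / reads_2.fastq are placeholder names for paired-end mates.
run tophat -p 8 -G "$GTF" -o tophat_out "$BT2_INDEX" reads_1.fastq reads_2.fastq
```

Because the index and the GTF come from the same bundle, their sequence names and coordinates should already agree, so neither file needs editing.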

GTF files define features in the genome. If you want to add additional ones, then as long as you stick to the correct format you should not need to touch the reference or the index files.

If you make any change to the reference sequence itself (adding or deleting even a single base), you will need to recreate the index files and edit the GTF file as well, since you will have effectively changed the coordinates of the original genome reference.

Are you trying to look at a specific region of the chromosome? If so, you can extract just the reads aligning in that region using samtools (check out the view command). Then use the original (or modified) GTF file with the new BAM file containing this extracted region.

rodrigo.duarte88 05-17-2015 04:38 AM

Quote:

Originally Posted by GenoMax (Post 172670)
Are you trying to look at a specific region of the chromosome? If so you can extract just the reads aligning in that region using samtools (check out the view command). Then use original (or modified) GTF file with this extracted region in the new BAM file.

Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations..

Thanks so much for your help!

GenoMax 05-17-2015 05:56 AM

Quote:

Originally Posted by rodrigo.duarte88 (Post 172671)
Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations..

There are a few ways of doing this, as I alluded to above.

1. Doing the alignment to the entire genome followed by the samtools view option to extract a particular region would be more straightforward (with a sorted and indexed bam file).
Code:

$ samtools view your.bam chr1:10000-20000
2. If you really want to work with just the region of interest, then (using the iGenomes file bundle):
a. Get that sequence from "genome.fa" (use the appropriate chromosome sequence from this file) using the bedtools "getfasta" command: http://bedtools.readthedocs.org/en/l.../getfasta.html
b. Create the appropriate indexes (bowtie/bowtie2) from this file.
c. Select the appropriate regions from the GTF file. You will have to adjust the coordinates to match the extracted sequence (not sure how big a region you are looking at).
Accomplishing a and b is straightforward; c would be difficult.
3. You could use BioMart or UCSC table browser to extract the sequence and the annotations. Make the indexes using that sequence file.
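Step 2c above can be sketched with awk. This is a minimal illustration on a made-up GTF, not a general tool: the region, the file names and the demo features are hypothetical. It assumes the region is given as 1-based inclusive coordinates, keeps only features that lie entirely inside it (so transcripts that straddle a boundary lose exons), and renames column 1 to `chrom:start-end` with a 0-based start, which is how bedtools getfasta names an extracted sequence by default as far as I know. Check the real name with "bowtie2-inspect --names" before relying on it.

```shell
# Hypothetical region of interest (1-based, inclusive GTF-style coordinates).
REGION_CHR=chr10
REGION_START=1001
REGION_END=2000
# Assumed name of the extracted sequence (0-based start, as in a BED interval).
NEW_NAME="${REGION_CHR}:$((REGION_START - 1))-${REGION_END}"

# Demo GTF: one feature inside, one straddling the end, one outside, one on another chromosome.
workdir=$(mktemp -d)
{
    printf 'chr10\tdemo\texon\t1100\t1200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    printf 'chr10\tdemo\texon\t1900\t2100\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n'
    printf 'chr10\tdemo\texon\t5000\t5100\t.\t+\t.\tgene_id "g3"; transcript_id "t3";\n'
    printf 'chr1\tdemo\texon\t1100\t1200\t.\t+\t.\tgene_id "g4"; transcript_id "t4";\n'
} > "$workdir/genes.gtf"

# Keep features fully inside the region, rename the sequence, and shift
# start (column 4) and end (column 5) so the region's first base becomes position 1.
awk -v chr="$REGION_CHR" -v s="$REGION_START" -v e="$REGION_END" -v name="$NEW_NAME" '
    BEGIN { FS = OFS = "\t" }
    $1 == chr && $4 >= s && $5 <= e {
        $1 = name
        $4 = $4 - s + 1
        $5 = $5 - s + 1
        print
    }
' "$workdir/genes.gtf" > "$workdir/region.gtf"

cat "$workdir/region.gtf"
```

The off-by-one handling is the part most worth checking by hand on a known feature, since the BED interval given to getfasta is 0-based while GTF coordinates are 1-based.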

Unless you are really constrained for computational power, going with option 1 is the most straightforward.
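Option 1 might look roughly like the following. It is a sketch under several assumptions: samtools 1.3 or newer (older releases used a different "sort" syntax), a TopHat output named `tophat_out/accepted_hits.bam`, a placeholder region, and the bundle GTF path; none of these names come from this thread. The `run` helper only prints each command unless `DRY_RUN=0` is set.

```shell
# Hypothetical inputs; adjust to your own files and region.
BAM=${BAM:-tophat_out/accepted_hits.bam}
REGION=${REGION:-chr10:1000000-2000000}
GTF=${GTF:-Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf}

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Region queries need a coordinate-sorted, indexed BAM.
# (TopHat output is normally already sorted, in which case the sort is redundant.)
run samtools sort -o sorted.bam "$BAM"
run samtools index sorted.bam

# Write only the alignments overlapping the region to a new BAM, then index it.
run samtools view -b -o region.bam sorted.bam "$REGION"
run samtools index region.bam

# The original whole-genome GTF still applies, because coordinates are unchanged.
run cufflinks -p 8 -G "$GTF" -o cufflinks_out region.bam
```

Since the reads keep their genome-wide coordinates, no GTF editing is needed with this route, which is the main reason it is simpler than rebuilding a regional reference.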

rodrigo.duarte88 05-17-2015 06:09 AM

I will try option 1 since, as far as I've been reading, it's the more correct approach.

But anyway, you've been very helpful! Thanks so much!!

