**GenoMax** · 05-15-2015, 10:39 AM

Illumina's iGenomes site hosts bundles (sequence/index/annotation) for several genomes: http://support.illumina.com/sequenci...e/igenome.html

**rodrigo.duarte88** · 05-17-2015, 02:19 AM

But, for example, say I downloaded the hg19 version from the iGenomes page. In the bundle I find the chr10.fa reference file, from which I create the bowtie index (bt2 files). Then there's also a GTF file for the whole genome.
So is it correct if I run tophat like this:
tophat -p 8 -G hg19.gtf chr10 reads1.fastq,reads2.fastq

Every time I include the GTF file it simply won't work. I also tried assembling transcripts with cufflinks using a BAM file generated by tophat without the GTF file (only the chr10 bowtie index as reference).

The thing is: do I need to manipulate these files before putting them for a run?

I am sorry! I am very new to this and tried several times changing small things in these files, but it simply didn't work.

**rodrigo.duarte88** · 05-17-2015, 02:25 AM

And yeah, I am aware the first column of the GTF file needs to match the headers of my chr10.fa reference file (which I check with the command "bowtie2-inspect --names chr10"). So I manually changed the headers of the FASTA file, created new index files and tried to run, and it also didn't work (which is expected since it lost all the genomic coordinate information haha).

I don't know how to do this, that's why I was asking for an example of files ready for a run.. hehe

Thanks a lot for your attention!

**GenoMax** · 05-17-2015, 04:15 AM

I am not sure why you are trying to recreate the index files since the bundle contains BowtieIndex and Bowtie2Index, which already have the index files.

GTF files define features in the genome. If you want to add additional ones, then as long as you stick to the correct format you should not need to touch the reference or the index files.

If you make any changes to the reference itself (adding or deleting even a single base) then you will need to recreate the index files and edit the GTF file as well, since you will have effectively changed the coordinates of the original genome reference.
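A quick sanity check for the naming question raised earlier in the thread is to compare the first column of the GTF with the FASTA headers. The following is only a sketch: it assumes uncompressed text files, and the function name `gtf_names_missing_from_fasta` is made up for illustration.

```shell
# Hypothetical helper: print sequence names that appear in the GTF
# but not in the FASTA reference. No output means every name matches.
gtf_names_missing_from_fasta() {
    gtf="$1"; fasta="$2"
    fa_names=$(mktemp); gtf_names=$(mktemp)
    # FASTA names: text after '>' up to the first whitespace
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fa_names"
    # GTF names: first column, ignoring comment lines
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$gtf_names"
    # comm -23 keeps lines that are only in the first (GTF) list
    comm -23 "$gtf_names" "$fa_names"
    rm -f "$fa_names" "$gtf_names"
}
```

For example, `gtf_names_missing_from_fasta hg19.gtf chr10.fa` would list every chromosome the GTF annotates that a chr10-only reference does not contain.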

Are you trying to look at a specific region of the chromosome? If so, you can extract just the reads aligning in that region using samtools (check out the view command). Then use the original (or modified) GTF file with the reads from this region in the new BAM file.

**rodrigo.duarte88** · 05-17-2015, 04:38 AM

Originally posted by GenoMax:

Are you trying to look at a specific region of the chromosome? If so, you can extract just the reads aligning in that region using samtools (check out the view command). Then use the original (or modified) GTF file with the reads from this region in the new BAM file.

Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations..

Thanks so much for your help!

**GenoMax** · 05-17-2015, 05:56 AM

Originally posted by rodrigo.duarte88:

Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations..

There are a few ways of doing this, as I alluded to above.

1. Aligning to the entire genome and then using samtools view to extract a particular region would be the most straightforward (this needs a sorted and indexed BAM file).

Code:

$ samtools view your.bam chr1:10000-20000

2. If you really want to work with just the region of interest then (using the iGenomes file bundle)

a. Get that sequence from the "genome.fa" (use the appropriate chromosome sequence from this file) using the bedtools "getfasta" option: http://bedtools.readthedocs.org/en/l.../getfasta.html
b. Create appropriate indexes (bowtie/bowtie2) using this file.
c. Select appropriate regions from the GTF file. You will have to adjust the coordinates appropriately (not sure how big a region you are looking at).

Accomplishing a and b is straightforward. c would be difficult.

3. You could use BioMart or the UCSC Table Browser to extract the sequence and the annotations, then make the indexes from that sequence file.
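Option 2 could look roughly like the sketch below. Steps a and b are shown only as comments because they need bedtools and bowtie2 installed. The region (chr10, BED start 1000000, end 2000000), the file names and the helper `subset_gtf` are placeholders for illustration, not part of the iGenomes bundle. The sketch assumes getfasta names the extracted sequence `chrom:start-end` from the BED line, so the GTF is rewritten to use that same name. Features that only partly overlap the region are dropped, which can truncate genes at the edges.

```shell
# a. Extract the region (BED start is 0-based, end is 1-based inclusive):
#      printf 'chr10\t1000000\t2000000\n' > region.bed
#      bedtools getfasta -fi genome.fa -bed region.bed -fo region.fa
# b. Build a bowtie2 index from the extracted sequence:
#      bowtie2-build region.fa region

# c. Hypothetical helper: keep GTF features that lie entirely inside the
#    region and rewrite them relative to the extracted sequence.
#    Arguments: chromosome, 0-based start (as in the BED line), end, GTF file.
subset_gtf() {
    awk -F'\t' -v OFS='\t' -v chrom="$1" -v start0="$2" -v end="$3" '
        /^#/ { next }                        # skip comment lines
        $1 == chrom && $4 > start0 && $5 <= end {
            $1 = chrom ":" start0 "-" end    # assumed getfasta header name
            $4 -= start0                     # shift start to region coordinates
            $5 -= start0                     # shift end to region coordinates
            print
        }' "$4"
}
```

With these placeholders, `subset_gtf chr10 1000000 2000000 genes.gtf > region.gtf` would turn a feature at chr10:1000001-1000100 into one at 1-100 on the extracted sequence.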

Unless you are really constrained for computational power, going with option 1 is the most straightforward.
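For option 1, the sort, index and extract steps might be wrapped up as in the sketch below. It assumes a samtools 1.x installation (where `sort` takes `-o`); the function name `extract_region` and the `.sorted.bam` naming are illustrative choices. If the BAM from tophat is already coordinate-sorted, the sort step is redundant but harmless.

```shell
# Hypothetical helper: sort and index a BAM, then write the reads
# overlapping one region to a new BAM file.
# Arguments: input BAM, region (e.g. chr1:10000-20000), output BAM.
extract_region() {
    in_bam="$1"; region="$2"; out_bam="$3"
    sorted="${in_bam%.bam}.sorted.bam"
    samtools sort -o "$sorted" "$in_bam" &&      # coordinate-sort the alignments
    samtools index "$sorted" &&                  # region queries need an index
    samtools view -b -o "$out_bam" "$sorted" "$region"   # keep reads in the region
}
```

A call such as `extract_region accepted_hits.bam chr1:10000-20000 region.bam` would then give a smaller BAM that can be passed to cufflinks together with the original GTF file.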

**rodrigo.duarte88** · 05-17-2015, 06:09 AM

I will try running option 1 since it's more correct, as far as I've been reading.

But anyway, you've been very helpful! Thanks so much!!


bowtie index file & matching GTF for tophat2, for specific human genome cytoband
