**mrawlins** · 10-21-2010, 09:40 AM

Using BLAST is likely a terrible idea if you have tons of data; it just isn't fast enough. A wide array of software has been designed to match Illumina reads to a genome. My personal favorite (after attempting to use BFAST, BWA and BioScope) is Bowtie. The Bowtie/TopHat/Cufflinks pipeline is very popular for this sort of thing.

What you want to do first is map your reads to the genome (which comes in the FASTA file). This FASTA file should include all the contigs for all chromosomes and/or plasmids, so that intergenic reads get mapped properly too. The tab file with annotations comes later, for quantifying how many reads mapped to each gene. I wrote a little Java program to do this, but I'm pretty sure TopHat/Cufflinks will do this for you. I wouldn't recommend merging the sequences and annotations before mapping: you would lose all information from intergenic sequences, which can provide quite a bit of biologically relevant data.
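For anyone curious what the counting step looks like, here is a minimal sketch in Python (not the Java program mentioned above, which isn't shown in this thread). It assumes a hypothetical four-column tab file (contig, start, end, gene ID) and reads already reduced to (contig, position) pairs by the mapper; the function names and example data are made up for illustration. Reads that hit no gene are tallied as "intergenic" rather than dropped.

```python
import csv
import io
from collections import Counter


def load_genes(tab_handle):
    """Read (contig, start, end, gene_id) rows from a tab-delimited annotation.

    The four-column layout is an assumption; adjust it to your own tab file.
    """
    genes = []
    for row in csv.reader(tab_handle, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        contig, start, end, gene_id = row
        genes.append((contig, int(start), int(end), gene_id))
    return genes


def count_reads(genes, read_positions):
    """Count mapped reads per gene; reads outside every gene are 'intergenic'."""
    counts = Counter()
    for contig, pos in read_positions:
        # Simple linear scan over all genes; coordinates are inclusive.
        hits = [g for c, s, e, g in genes if c == contig and s <= pos <= e]
        if hits:
            for gene_id in hits:
                counts[gene_id] += 1
        else:
            counts["intergenic"] += 1
    return counts


# Hypothetical example data, for illustration only.
annotation = io.StringIO(
    "chr1\t100\t200\tgeneA\n"
    "chr1\t300\t400\tgeneB\n"
    "plasmid1\t1\t50\tgeneC\n"
)
reads = [("chr1", 150), ("chr1", 199), ("chr1", 250), ("chr1", 300), ("plasmid1", 60)]

genes = load_genes(annotation)
counts = count_reads(genes, reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The linear scan is fine for a toy example, but with millions of reads you would want a sorted or interval-indexed lookup, or simply let an existing tool do the counting.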

If you're not comfortable programming, I suggest TopHat/Cufflinks. There are already quite a few discussions on using them in this forum, as well as some pretty good tutorials on the website.

Good luck, and welcome to the community!

**arg** · 10-21-2010, 10:53 AM

Hey mrawlins,
thanks for your quick response. I will try Bowtie right away... I hope I can understand it.

merging a tab and a fasta file
