08-17-2011, 11:51 PM   #1
AEB
Junior Member
 
Location: Denmark

Join Date: Aug 2011
Posts: 1
Bowtie. Different outputs from equivalent(?) inputs.

Hello all

I'm currently trying to align & annotate lots of short sequences to the human genome (from Ensembl) using Bowtie (and R).

When the query sequences are given on the command line (with -c) as a comma-separated list, I cannot get Bowtie to yield the same result as when I use a self-created FASTQ file. I suspect the problem is the default (Phred) read quality I choose in the FASTQ file. Clearly, if Bowtie is given sequences on the command line it must assume some default read quality, but what is that default? I cannot find the answer in the Bowtie manual, but I suspect it is Phred quality 40 (corresponding to ASCII character "h"(?)), since this quality is used with other commands.

Using "h" as the default read quality, however, does not give exactly the same results. Where am I taking a wrong turn?
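
To make the encoding question concrete: the character that stands for a given Phred score depends on the ASCII offset of the quality encoding. Below is a minimal Python sketch of that arithmetic only. The helper names `phred_to_char` and `char_to_phred` are made up for illustration, and which offset Bowtie actually assumes for a given run is not established here and should be checked against the manual.

```python
def phred_to_char(score: int, offset: int) -> str:
    """Return the ASCII character that encodes a Phred score at the given offset."""
    return chr(score + offset)


def char_to_phred(char: str, offset: int) -> int:
    """Return the Phred score that an ASCII character encodes at the given offset."""
    return ord(char) - offset


if __name__ == "__main__":
    # The two common FASTQ conventions use offsets 33 and 64.
    for offset in (33, 64):
        print(f"Phred 40 at offset {offset}: {phred_to_char(40, offset)!r}")
    # How "h" would be read under each convention.
    for offset in (33, 64):
        print(f"'h' at offset {offset}: Phred {char_to_phred('h', offset)}")
```

So "h" means Phred 40 only under the offset-64 convention; read with offset 33 it would be Phred 71, and Phred 40 would be written as "I".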

Minimal example: Running

bowtie -a --fullref Homo_sapiens.GRCh37.63.cdna.all TestFASTQ.fq test1.txt
bowtie -c -a --fullref Homo_sapiens.GRCh37.63.cdna.all AAATTGCTCTTAGCATA test2.txt

where TestFASTQ.fq is simply

@Seq1
AAATTGCTCTTAGCATA
+
hhhhhhhhhhhhhhhhh

does not give the same results.
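
For anyone who wants to reproduce the test file with a different default quality character, here is a small Python sketch that builds a single FASTQ record with one constant quality character. The function name `fastq_record` is hypothetical; it only formats text and does not call Bowtie.

```python
def fastq_record(name: str, seq: str, qual_char: str) -> str:
    """Format one FASTQ record with the same quality character at every base."""
    if len(qual_char) != 1:
        raise ValueError("qual_char must be a single character")
    # Four lines: header, sequence, separator, quality string of equal length.
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


if __name__ == "__main__":
    # Reproduces the record shown above; swap "h" for another character to test it.
    print(fastq_record("Seq1", "AAATTGCTCTTAGCATA", "h"), end="")
```

Writing the returned string to TestFASTQ.fq gives a file of the same shape as the one above, so the quality character can be varied without editing the file by hand.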

The output from my R script (which filters and formats the Bowtie output) is

> genes1
[1] "ENSG00000135829" "ENSG00000135829" "ENSG00000135829" "ENSG00000151789"
[5] "ENSG00000127081" "ENSG00000122042" "ENSG00000162894" "ENSG00000187699"
[9] "ENSG00000187699" "ENSG00000231890" "ENSG00000182749" "ENSG00000233124"
[13] "ENSG00000228002" "ENSG00000101040" "ENSG00000101040" "ENSG00000112773"
[17] "ENSG00000112773" "ENSG00000112773"
> genes2
[1] "ENSG00000135829" "ENSG00000135829" "ENSG00000135829" "ENSG00000151789"
[5] "ENSG00000127081" "ENSG00000122042" "ENSG00000162894" "ENSG00000231890"
[9] "ENSG00000187699" "ENSG00000187699" "ENSG00000182749" "ENSG00000233124"
[13] "ENSG00000228002" "ENSG00000101040" "ENSG00000101040" "ENSG00000112773"
[17] "ENSG00000112773" "ENSG00000112773"

(EDIT: The two vectors above differ at positions 8 and 10)
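
A quick way to check whether the two vectors differ in content or only in order is sketched below in Python, using the values copied from the output above. The helper name `differing_positions` is made up for illustration. Note that this compares gene IDs only, so it cannot tell whether the underlying Bowtie alignments (position, strand, mismatches) are identical.

```python
from collections import Counter

genes1 = [
    "ENSG00000135829", "ENSG00000135829", "ENSG00000135829", "ENSG00000151789",
    "ENSG00000127081", "ENSG00000122042", "ENSG00000162894", "ENSG00000187699",
    "ENSG00000187699", "ENSG00000231890", "ENSG00000182749", "ENSG00000233124",
    "ENSG00000228002", "ENSG00000101040", "ENSG00000101040", "ENSG00000112773",
    "ENSG00000112773", "ENSG00000112773",
]
genes2 = [
    "ENSG00000135829", "ENSG00000135829", "ENSG00000135829", "ENSG00000151789",
    "ENSG00000127081", "ENSG00000122042", "ENSG00000162894", "ENSG00000231890",
    "ENSG00000187699", "ENSG00000187699", "ENSG00000182749", "ENSG00000233124",
    "ENSG00000228002", "ENSG00000101040", "ENSG00000101040", "ENSG00000112773",
    "ENSG00000112773", "ENSG00000112773",
]


def differing_positions(a, b):
    """Return the 1-based positions (as in R) where two equal-length lists differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print("Positions that differ:", differing_positions(genes1, genes2))
    # Counter compares the lists as multisets, i.e. ignoring order.
    print("Same IDs ignoring order:", Counter(genes1) == Counter(genes2))
```

For these two vectors the same IDs occur the same number of times in both, so at the gene-ID level the difference is one of ordering rather than of which genes are hit.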

Can anyone help me?

Thanks!
AEB

PS. Does anyone know how to make Bowtie return gene symbols, i.e. get DHX9 for ENSG00000135829 and so on?

Last edited by AEB; 08-18-2011 at 01:41 AM.