SEQanswers


AEB 08-17-2011 11:51 PM

Bowtie. Different outputs from equivalent(?) inputs.
 
Hello all

I'm currently trying to align & annotate lots of short sequences to the human genome (from Ensembl) using Bowtie (and R).

When the query sequences are given on the command line (with -c) as a comma-separated list, I cannot get Bowtie to yield the same result as when I supply them in a self-created FASTQ file. I suspect the problem is the default (Phred) read quality I chose for the FASTQ file. If Bowtie is given sequences on the command line it must assume some default read quality, but what is that default? I cannot find the answer in the Bowtie manual, but I suspect it is Phred quality 40 (corresponding to the ASCII character "h"(?)), since this quality is used with other commands.

Using "h" as the default read quality, however, does not give exactly the same results. Where am I taking the wrong turn?
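
For reference, the Phred score that a FASTQ quality character encodes depends on the ASCII offset in use. The following is a minimal Python sketch of that arithmetic for the two common conventions, Phred+33 and Phred+64. The function name `phred_from_char` is made up for illustration, and which offset Bowtie actually applies depends on its quality options (for example --phred33-quals or --phred64-quals), so this should be checked against the manual for the version in use.

```python
def phred_from_char(ch: str, offset: int) -> int:
    """Return the Phred score encoded by one FASTQ quality character.

    offset is the ASCII offset of the encoding: 33 (Phred+33) or 64 (Phred+64).
    """
    return ord(ch) - offset


# Decode the same character under both conventions.
for offset in (33, 64):
    print(f"'h' with offset {offset}: Phred {phred_from_char('h', offset)}")
# 'h' with offset 33: Phred 71
# 'h' with offset 64: Phred 40
```

So "h" means Phred 40 only when the file is read as Phred+64. Read as Phred+33, the same character decodes to 71, and Phred 40 would be written as "I" instead.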

Minimal example: Running

bowtie -a --fullref Homo_sapiens.GRCh37.63.cdna.all TestFASTQ.fq test1.txt
bowtie -c -a --fullref Homo_sapiens.GRCh37.63.cdna.all AAATTGCTCTTAGCATA test2.txt

where the TestFASTQ.fq is simply

@Seq1
AAATTGCTCTTAGCATA
+
hhhhhhhhhhhhhhhhh

does not give the same results.

The output from my R script (which filters and formats the Bowtie output) is

> genes1
[1] "ENSG00000135829" "ENSG00000135829" "ENSG00000135829" "ENSG00000151789"
[5] "ENSG00000127081" "ENSG00000122042" "ENSG00000162894" "ENSG00000187699"
[9] "ENSG00000187699" "ENSG00000231890" "ENSG00000182749" "ENSG00000233124"
[13] "ENSG00000228002" "ENSG00000101040" "ENSG00000101040" "ENSG00000112773"
[17] "ENSG00000112773" "ENSG00000112773"
> genes2
[1] "ENSG00000135829" "ENSG00000135829" "ENSG00000135829" "ENSG00000151789"
[5] "ENSG00000127081" "ENSG00000122042" "ENSG00000162894" "ENSG00000231890"
[9] "ENSG00000187699" "ENSG00000187699" "ENSG00000182749" "ENSG00000233124"
[13] "ENSG00000228002" "ENSG00000101040" "ENSG00000101040" "ENSG00000112773"
[17] "ENSG00000112773" "ENSG00000112773"

(EDIT: The two vectors above differ at positions 8 and 10.)
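
The two vectors can also be compared as multisets, to see whether the difference is only one of ordering. This is a small Python sketch using the IDs printed above; the variable names simply mirror the R objects, and the check covers only these filtered gene IDs, not the other columns of the Bowtie output.

```python
from collections import Counter

# The 7 leading and 8 trailing IDs are identical in both vectors.
head = ["ENSG00000135829"] * 3 + [
    "ENSG00000151789", "ENSG00000127081", "ENSG00000122042", "ENSG00000162894",
]
tail = [
    "ENSG00000182749", "ENSG00000233124", "ENSG00000228002",
    "ENSG00000101040", "ENSG00000101040",
] + ["ENSG00000112773"] * 3

# Positions 8 to 10 as printed by R for each vector.
genes1 = head + ["ENSG00000187699", "ENSG00000187699", "ENSG00000231890"] + tail
genes2 = head + ["ENSG00000231890", "ENSG00000187699", "ENSG00000187699"] + tail

# 1-based positions (as in R) where the two vectors disagree.
differing = [i + 1 for i, (a, b) in enumerate(zip(genes1, genes2)) if a != b]

# Same IDs with the same counts, regardless of order?
same_multiset = Counter(genes1) == Counter(genes2)

print(differing)      # [8, 10]
print(same_multiset)  # True
```

Both vectors contain the same IDs with the same counts. Only the order of ENSG00000187699 and ENSG00000231890 is swapped, so at the gene-ID level the two runs differ in the order of the hits rather than in which genes were hit.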

Can anyone help me?

Thanks!
AEB

PS. Does anyone know how to make Bowtie return gene symbols, i.e. get DHX9 for ENSG00000135829 and so on?

zee 08-18-2011 12:34 AM

You should use the org.Hs.eg.db Bioconductor package to convert between human gene symbols and Ensembl IDs.

