SEQanswers

08-17-2011, 11:51 PM   #1
AEB
Junior Member
 
Location: Denmark

Join Date: Aug 2011
Posts: 1
Bowtie. Different outputs from equivalent(?) inputs.

Hello all

I'm currently trying to align & annotate lots of short sequences to the human genome (from Ensembl) using Bowtie (and R).

When the query sequences are given on the command line (with -c) as a comma-separated list, I cannot get Bowtie to yield the same result as when I supply them in a self-created FASTQ file. I suspect the cause is the default (Phred) read quality I chose for the FASTQ file. Clearly, if Bowtie is given sequences on the command line it must assume some default read qualities, but what is that default value? I cannot find the answer in the Bowtie manual, but I suspect it is Phred quality 40 (corresponding to ASCII character "h"(?)), since this quality is used with other commands.

Using "h" as the default read quality, however, does not give exactly the same results. Where am I taking the wrong turn?
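Which Phred score a quality character stands for depends on the ASCII offset used to decode it. The sketch below shows the arithmetic for the two common FASTQ encodings, Phred+33 and Phred+64. The function names are made up for illustration. Which offset Bowtie applies to a given file is an assumption to verify against the manual (Bowtie has `--phred33-quals` and `--phred64-quals` options for this).

```python
# Convert between FASTQ quality characters and Phred scores.
# The offset (33 or 64) is an assumption about how the file is encoded.

def phred_score(ch: str, offset: int) -> int:
    """Phred score encoded by a single quality character."""
    return ord(ch) - offset

def phred_char(score: int, offset: int) -> str:
    """Quality character that encodes a given Phred score."""
    return chr(score + offset)

for offset in (33, 64):
    print(f"offset {offset}: 'h' decodes to Phred {phred_score('h', offset)}, "
          f"Phred 40 is written as {phred_char(40, offset)!r}")
```

So "h" only means Phred 40 if the file is decoded as Phred+64. Decoded as Phred+33, Phred 40 would be written as "I" instead.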

Minimal example: Running

bowtie -a --fullref Homo_sapiens.GRCh37.63.cdna.all TestFASTQ.fq test1.txt
bowtie -c -a --fullref Homo_sapiens.GRCh37.63.cdna.all AAATTGCTCTTAGCATA test2.txt

where the TestFASTQ.fq is simply

@Seq1
AAATTGCTCTTAGCATA
+
hhhhhhhhhhhhhhhhh

does not give the same results.

The output from my R script (which filters and formats the Bowtie output) is

> genes1
[1] "ENSG00000135829" "ENSG00000135829" "ENSG00000135829" "ENSG00000151789"
[5] "ENSG00000127081" "ENSG00000122042" "ENSG00000162894" "ENSG00000187699"
[9] "ENSG00000187699" "ENSG00000231890" "ENSG00000182749" "ENSG00000233124"
[13] "ENSG00000228002" "ENSG00000101040" "ENSG00000101040" "ENSG00000112773"
[17] "ENSG00000112773" "ENSG00000112773"
> genes2
[1] "ENSG00000135829" "ENSG00000135829" "ENSG00000135829" "ENSG00000151789"
[5] "ENSG00000127081" "ENSG00000122042" "ENSG00000162894" "ENSG00000231890"
[9] "ENSG00000187699" "ENSG00000187699" "ENSG00000182749" "ENSG00000233124"
[13] "ENSG00000228002" "ENSG00000101040" "ENSG00000101040" "ENSG00000112773"
[17] "ENSG00000112773" "ENSG00000112773"

(EDIT: The two vectors above differ at positions 8 and 10)
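One way to check whether the two vectors differ in content or only in order is to compare them as multisets. The sketch below uses Python rather than R, purely for illustration. The lists are copied from the output above and the variable names are my own.

```python
from collections import Counter

# Gene IDs copied from the genes1 and genes2 output above.
genes1 = [
    "ENSG00000135829", "ENSG00000135829", "ENSG00000135829", "ENSG00000151789",
    "ENSG00000127081", "ENSG00000122042", "ENSG00000162894", "ENSG00000187699",
    "ENSG00000187699", "ENSG00000231890", "ENSG00000182749", "ENSG00000233124",
    "ENSG00000228002", "ENSG00000101040", "ENSG00000101040", "ENSG00000112773",
    "ENSG00000112773", "ENSG00000112773",
]
genes2 = [
    "ENSG00000135829", "ENSG00000135829", "ENSG00000135829", "ENSG00000151789",
    "ENSG00000127081", "ENSG00000122042", "ENSG00000162894", "ENSG00000231890",
    "ENSG00000187699", "ENSG00000187699", "ENSG00000182749", "ENSG00000233124",
    "ENSG00000228002", "ENSG00000101040", "ENSG00000101040", "ENSG00000112773",
    "ENSG00000112773", "ENSG00000112773",
]

# Same IDs with the same multiplicities, regardless of order?
same_multiset = Counter(genes1) == Counter(genes2)

# 1-based positions where the two vectors disagree.
differing_positions = [
    i for i, (a, b) in enumerate(zip(genes1, genes2), start=1) if a != b
]

print(same_multiset)        # True
print(differing_positions)  # [8, 10]
```

The two vectors contain the same gene IDs the same number of times; only the order of ENSG00000187699 and ENSG00000231890 differs. That would suggest the two runs found the same hits but reported them in a different order, though the raw Bowtie output would need to be compared to confirm it.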

Can anyone help me?

Thanks!
AEB

PS. Does anyone know how to make Bowtie return gene symbols, i.e. get DHX9 for ENSG00000135829 and so on?

Last edited by AEB; 08-18-2011 at 01:41 AM.
08-18-2011, 12:34 AM   #2
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249

You should use the org.Hs.eg.db Bioconductor package to convert between human gene symbols and Ensembl IDs.