SEQanswers

03-09-2009, 09:12 AM   #1
danielsbrewer
Member
 
Location: UK

Join Date: Feb 2009
Posts: 33
Perfect match disagreement between bowtie and BLAT on human genome

Hello,

I am currently testing a number of aligners for miRNA sequencing and have come across a curious problem with bowtie. I ran bowtie with the following options, so I should get all the perfect matches:
Code:
./bowtie -p 4 --solexa-quals --best -k 100  -t h_sapiens_asm ../GDB1.fastq GDB1.map
The index file is the human genome as supplied by the makers of bowtie.

For the sequence "TGGGAATACCGGGTGCTGTAGGCTTT" I get two hits, one on chromosome 12 and the other on X. When I BLAT this sequence I get 22 hits (16 on chr1, 2 on chr12, 2 on chrX, and 1 each on chr17 and chr19).

Does anyone know why there is a difference?

I also ran the same dataset through novoalign:
Code:
./novoalign -rAll -f ../GDB1.fastq -d hsapiens > GDB1.map
and got 23 perfect matches, with one more chromosome 1 match than BLAT.

I am very confused as to why there are so many differences and would welcome any help in this area.

Thanks
03-09-2009, 11:24 AM   #2
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

Hi there,

I can't reproduce this. When I run "./bowtie -c --solexa-quals --best -k 100 h_sapiens_asm TGGGAATACCGGGTGCTGTAGGCTTT", I get 23 hits; presumably the same ones as novoalign:

sycamore:~/research/bowtie $ ./bowtie -c --solexa-quals --best -k 100 /fs/szasmg/langmead/ebwts/h_sapiens_asm TGGGAATACCGGGTGCTGTAGGCTTT
0 + gi|89161218|ref|NC_000023.9|NC_000023 68809142 TGGGAATACCGGGTGCTGTAGGCTTT IIIIIIIIIIIIIIIIIIIIIIIIII 1
0 + gi|89161190|ref|NC_000012.10|NC_000012 34249995 TGGGAATACCGGGTGCTGTAGGCTTT IIIIIIIIIIIIIIIIIIIIIIIIII 1
0 - gi|89161185|ref|NC_000001.9|NC_000001 226823814 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226826034 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226819358 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226837238 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226821599 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226814876 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226832757 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226812635 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226817117 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226828276 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226848407 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226839463 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226834998 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226841704 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226846176 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226843935 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161185|ref|NC_000001.9|NC_000001 226830515 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161190|ref|NC_000012.10|NC_000012 36841532 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|42406306|ref|NC_000019.8|NC_000019 21087771 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161218|ref|NC_000023.9|NC_000023 28910887 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
0 - gi|89161213|ref|NC_000007.12|NC_000007 139733047 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
Reported 23 alignments to 1 output stream(s)
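When comparing aligners, it can help to tally a listing like the one above by reference and strand rather than by the printed sequence. The sketch below is a minimal, hypothetical helper (`tally_bowtie_hits` is not part of bowtie); it assumes bowtie 1's default output column order (read name, strand, reference name, 0-based offset, read sequence, qualities, ...) and splits on whitespace, so it also accepts the space-separated paste shown here but assumes read names contain no spaces.

```python
from collections import Counter


def tally_bowtie_hits(lines):
    """Count alignments per (reference accession, strand) in bowtie's default output.

    Assumes bowtie 1 default columns: name, strand, reference, offset, sequence,
    qualities, ... Lines that do not look like alignments are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip summary/prompt lines such as "Reported 23 alignments ...".
        if len(fields) < 6 or fields[1] not in ("+", "-"):
            continue
        strand, ref = fields[1], fields[2]
        # Reduce names like gi|89161185|ref|NC_000001.9|NC_000001 to NC_000001.9.
        parts = ref.split("|")
        accession = parts[3] if len(parts) > 3 else ref
        counts[(accession, strand)] += 1
    return counts
```

For example, `tally_bowtie_hits(open("GDB1.map"))` would count the hits in the output file from the first post; counting by strand makes it obvious that most of the 23 hits are on the `-` strand.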

Is there another example where Bowtie does not produce the expected output that I can try?

Ben
03-10-2009, 02:59 AM   #3
danielsbrewer
Member
 
Location: UK

Join Date: Feb 2009
Posts: 33

I think I have worked out the problem. Novoalign outputs the original read sequence no matter which strand it aligns to, whereas bowtie always reports the sequence as it reads on the forward strand of the reference, so for a reverse-strand hit it prints the reverse complement of the read. I was filtering only on "TGGGAATACCGGGTGCTGTAGGCTTT", whereas the reverse-strand hits were reported under "AAAGCCTACAGCACCCGGTATTCCCA".
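A filter that checks both orientations avoids this pitfall. The sketch below is illustrative only: `reverse_complement` and `is_same_read` are hypothetical helper names, and treating `N` as its own complement is an assumption about how ambiguous bases should be handled.

```python
# Translation table for Watson-Crick complements (upper and lower case; N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_same_read(reported: str, read: str) -> bool:
    """True if a sequence printed by an aligner is the read in either orientation.

    bowtie prints the reverse complement for '-' strand hits, so filtering on
    the original read alone misses every reverse-strand alignment.
    """
    return reported in (read, reverse_complement(read))


read = "TGGGAATACCGGGTGCTGTAGGCTTT"
print(reverse_complement(read))  # AAAGCCTACAGCACCCGGTATTCCCA
```

Applied to the bowtie listing in post #2, all 23 reported sequences pass `is_same_read`, matching novoalign's count.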

Sorry for the confusion.
03-10-2009, 10:10 PM   #4
BioWizard
Member
 
Location: Houston, TX

Join Date: Mar 2009
Posts: 27

When I enter your sequence in ISAS I get 23 perfect matches. Below is a transcript of an interactive session.

========================================================
Enter next command, or type "?" (and ENTER) for list of commands.

limit=30
For each sequence, the search will stop if 30 hits are found.
Allocated buffer for 58.4 million sequences (0.0 sec.)

Enter next command, or type "?" (and ENTER) for list of commands.

sequence=TGGGAATACCGGGTGCTGTAGGCTTT

23 matches found in 9.0 micro seconds.

Match no. 1: Reverse Chr. 1 Positions 226812661..226812636, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 2: Reverse Chr. 1 Positions 226814902..226814877, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 3: Reverse Chr. 1 Positions 226817143..226817118, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 4: Reverse Chr. 1 Positions 226819384..226819359, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 5: Reverse Chr. 1 Positions 226821625..226821600, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 6: Reverse Chr. 1 Positions 226823840..226823815, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 7: Reverse Chr. 1 Positions 226826060..226826035, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 8: Reverse Chr. 1 Positions 226828302..226828277, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 9: Reverse Chr. 1 Positions 226830541..226830516, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 10: Reverse Chr. 1 Positions 226832783..226832758, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 11: Reverse Chr. 1 Positions 226835024..226834999, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 12: Reverse Chr. 1 Positions 226837264..226837239, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 13: Reverse Chr. 1 Positions 226839489..226839464, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 14: Reverse Chr. 1 Positions 226841730..226841705, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 15: Reverse Chr. 1 Positions 226843961..226843936, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 16: Reverse Chr. 1 Positions 226846202..226846177, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 17: Reverse Chr. 1 Positions 226848433..226848408, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 18: Reverse Chr. 7 Positions 139733073..139733048, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 19: Forward Chr. 12 Positions 34249996..34250021, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 20: Reverse Chr. 12 Positions 36841558..36841533, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 21: Reverse Chr. 19 Positions 21087797..21087772, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 22: Reverse Chr. 23 Positions 28910913..28910888, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions

Match no. 23: Forward Chr. 23 Positions 68809143..68809168, 0 Mismatches

TGGGAATACCGGGTGCTGTAGGCTTT
TGGGAATACCGGGTGCTGTAGGCTTT
0 substitutions
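The ISAS positions can be checked against bowtie's output once the coordinate conventions are lined up. bowtie reports a 0-based offset of the leftmost aligned base on the forward strand, and NC_000023 is the RefSeq accession for chromosome X, which ISAS labels "Chr. 23". The ISAS convention assumed below (1-based inclusive positions, printed start..end for forward matches and end..start for reverse matches) is inferred by comparing the two listings in this thread, not taken from ISAS documentation; `bowtie_offset_to_isas` is a hypothetical helper.

```python
from typing import Tuple


def bowtie_offset_to_isas(offset0: int, read_len: int, strand: str) -> Tuple[int, int]:
    """Convert a bowtie 0-based leftmost offset to an ISAS-style position pair.

    Assumption (inferred from this thread): ISAS prints 1-based inclusive
    coordinates, start..end for forward and end..start for reverse matches.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    start = offset0 + 1           # 0-based -> 1-based
    end = start + read_len - 1    # inclusive end of a read_len-long match
    return (start, end) if strand == "+" else (end, start)


# Chr. 1 reverse hit: bowtie offset 226812635, ISAS "226812661..226812636"
print(bowtie_offset_to_isas(226812635, 26, "-"))  # (226812661, 226812636)
```

Under this assumption, each of the 23 bowtie offsets maps onto one of the 23 ISAS matches, so the three tools agree once strand reporting is accounted for.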
All times are GMT -8.