SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
Bowtie: Ultrafast and memory-efficient alignment of short reads to the human genome Ben Langmead Literature Watch 2 03-04-2013 03:06 AM
The best short read aligner Deutsche Bioinformatics 4 04-14-2011 08:12 PM
Short Read Micro re-Aligner Paper nilshomer Literature Watch 0 10-29-2010 10:59 AM
New Short Read Aligner sparks Bioinformatics 48 08-26-2009 09:01 AM
Very Short Read aligner Rupinder Bioinformatics 1 06-02-2009 08:10 PM

Reply
 
Thread Tools
Old 11-05-2009, 07:17 AM   #261
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Quote:
Originally Posted by liu3zhen View Post
Sorry for the confusing question.
If I specify -k 4 --best, multiple alignments will be reported even one best hit is found. I'm wondering is it possible only the best hit (only 1) is report for this case, rather than 4 alignments reported in best-to-worst order.
Please take a look at the documentation for the --strata option in the manual. If that doesn't do what you'd like, please post a small example where Bowtie won't report what you'd like.

Ben
Ben Langmead is offline   Reply With Quote
Old 11-05-2009, 07:20 AM   #262
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 316
Default

Quote:
Originally Posted by liu3zhen View Post
Sorry for the confusing question.
If I specify -k 4 --best, multiple alignments will be reported even one best hit is found. I'm wondering is it possible only the best hit (only 1) is report for this case, rather than 4 alignments reported in best-to-worst order.
Maybe you can specify the option "-m 1". Then all the multiple reads will not be reported.
Xi Wang is offline   Reply With Quote
Old 11-08-2009, 11:42 PM   #263
ramouz87
Member
 
Location: Doha, Qatar

Join Date: Oct 2009
Posts: 34
Default

Quote:
Originally Posted by Ben Langmead View Post
Hi Ramzi,

The options you're looking for are almost certainly -I/-X and --ff/--fr/--rf. You need to have a reasonably good idea of the expected insert size and specify an appropriate range with -I/-X. You should also confirm that your paired-end protocol produces pairs in the fw/rev orientation. This is the typical configuration for Illumina. If your paired-end data has a different orientation, change it with --ff or --rf.

Hope that helps,
Ben
Hi Ben,
thanks for your reply, Indeed setting the -X parameter to 5000 and the orientation to (--rf [although it's the standard illumina paired end protocol]) and clipping 24 pair on the reverse sequence lead to 52.35% :
#bowtie -t -p 16 -X 5000 --rf ./ref/h_sapiens_37_asm -1 ./fastq/s_8_1_sequence.fq -2 ./fastq/s_8_2_sequence_50b.fq ./align/pronest_5000_rf_2_50b.bowtie.align --un ./unalign/pronest_5000_rf_2_50b.unalign.fq
# reads with at least one reported alignment: 3486518 (52.35%)
# reads that failed to align: 3173993 (47.65%)
Do you think alignment could be improved further ?
thanks again for help and congratulation for the good work.
Regards,
Ramzi
ramouz87 is offline   Reply With Quote
Old 11-11-2009, 05:07 PM   #264
amaer
Member
 
Location: San Diego

Join Date: Oct 2009
Posts: 15
Default Gzipped files?

Hi Ben, Thanks for the update. Do you have an estimate about when Bowtie would be able to handle gzipped files for input and output?

Thanks,
Andreia

Quote:
Originally Posted by Ben Langmead View Post
Hi amaer,

Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules . But by end-of-year is a reasonable guess.

Thanks,
Ben
amaer is offline   Reply With Quote
Old 11-11-2009, 05:44 PM   #265
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Hi Andreia,

Quote:
Originally Posted by amaer View Post
Hi Ben, Thanks for the update. Do you have an estimate about when Bowtie would be able to handle gzipped files for input and output?

Thanks,
Andreia
This is the first time I've heard that requested. Of course, it's easy to pipe or 'tee' bowtie's output directly to gzip; e.g.:

Quote:
$ ./bowtie -u 10 e_coli reads/e_coli_1000.fq | gzip -c > alns.gz
# reads processed: 10
# reads with at least one reported alignment: 7 (70.00%)
# reads that failed to align: 3 (30.00%)
Reported 7 alignments to 1 output stream(s)
$ gzip -dc alns.gz
r0 - gi|110640213|ref|NC_008253.1| 3658049 ATGCTGGAATGGCGATAGTTGGGTGGGTATCGTTC 45567778999:9;;<===>?@@@@AAAABCCCDE 0 32:T>G,34:G>A
r1 - gi|110640213|ref|NC_008253.1| 1902085 CGGATGATTTTTATCCCATGAGACATCCAGTTCGG 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r2 - gi|110640213|ref|NC_008253.1| 3989609 CATAAAGCAACAGTGTTATACTATAACAATTTTGA 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r5 + gi|110640213|ref|NC_008253.1| 4249841 CAGCATAAGTGGATATTCAAAGTTTTGCTGTTTTA EDCCCBAAAA@@@@?>===<;;9:99987776554 0
r7 + gi|110640213|ref|NC_008253.1| 4086913 GCATATTGCCAATTTTCGCTTCGGGGATCAGGCTA EDCCCBAAAA@@@@?>===<;;9:99987776554 0
r8 + gi|110640213|ref|NC_008253.1| 2679194 GGTTCAGTTCAGTATACGCCTTATCCGGCCTACGG EDCCCBAAAA@@@@?>===<;;9:99987776554 0 14:A>T,33:C>G
r9 - gi|110640213|ref|NC_008253.1| 2430559 GCCTGTTCGGCGTTGAGGGTAATGAAATCATCGCC 45567778999:9;;<===>?@@@@AAAABCCCDE 0
For unpaired input, it's also possible to pipe 'gzip -dc's output directly to bowtie and specify '-' (meaning "standard in") as the input file. E.g.:

Quote:
$ head -40 reads/e_coli_1000.fq | gzip -c > tmpreads.gz
$ gzip -dc tmpreads.gz | ./bowtie e_coli -
r0 - gi|110640213|ref|NC_008253.1| 3658049 ATGCTGGAATGGCGATAGTTGGGTGGGTATCGTTC 45567778999:9;;<===>?@@@@AAAABCCCDE 0 32:T>G,34:G>A
r1 - gi|110640213|ref|NC_008253.1| 1902085 CGGATGATTTTTATCCCATGAGACATCCAGTTCGG 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r2 - gi|110640213|ref|NC_008253.1| 3989609 CATAAAGCAACAGTGTTATACTATAACAATTTTGA 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r5 + gi|110640213|ref|NC_008253.1| 4249841 CAGCATAAGTGGATATTCAAAGTTTTGCTGTTTTA EDCCCBAAAA@@@@?>===<;;9:99987776554 0
r7 + gi|110640213|ref|NC_008253.1| 4086913 GCATATTGCCAATTTTCGCTTCGGGGATCAGGCTA EDCCCBAAAA@@@@?>===<;;9:99987776554 0
r8 + gi|110640213|ref|NC_008253.1| 2679194 GGTTCAGTTCAGTATACGCCTTATCCGGCCTACGG EDCCCBAAAA@@@@?>===<;;9:99987776554 0 14:A>T,33:C>G
r9 - gi|110640213|ref|NC_008253.1| 2430559 GCCTGTTCGGCGTTGAGGGTAATGAAATCATCGCC 45567778999:9;;<===>?@@@@AAAABCCCDE 0
# reads processed: 10
# reads with at least one reported alignment: 7 (70.00%)
# reads that failed to align: 3 (30.00%)
Reported 7 alignments to 1 output stream(s)
But paired-end reads, unfortunately, generally come as two separate files. Bowtie supports a simple alternative format for paired-end reads (1 pair per line, tab-delimited fields are read name, mate 1 sequence, mate 1 qualities, mate 2 sequence, mate 2 qualities) which is enabled with the --12 option. If the reads are in that format and you use the --12 option, you can again pipe the output of 'gzip -dc' and use '-' as the input file. I have a script (that I'll include in the next Bowtie release) that converts a pair of (perhaps gzipped) FASTQ files into that format. That script's output can be conveniently piped to Bowtie e.g.:

Quote:
$ gzip -c reads/e_coli_1000_1.fq > tmp1.fq.gz
$ gzip -c reads/e_coli_1000_2.fq > tmp2.fq.gz
$ perl scripts/fastq_to_tabbed.pl -1 tmp1.fq.gz -2 tmp2.fq.gz | ./bowtie -u 10 --12 - e_coli
r0/2 + gi|110640213|ref|NC_008253.1| 2963701 GAATACTGGCGGATTACCGGGGAAGCTGGAGC EDCCCBAAAA@@@@?>===<;;9:99987776 0
r0/1 - gi|110640213|ref|NC_008253.1| 2963907 CTGACCGGCAGGAGTATGAAGGATGCGGAAGAATA 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r1/2 + gi|110640213|ref|NC_008253.1| 4118712 AATGTGAAAACGCCATCGATGGAACAGGCAAT EDCCCBAAAA@@@@?>===<;;9:99987776 0
r1/1 - gi|110640213|ref|NC_008253.1| 4118838 GGCGTAGATGTCGGCGCGAAAAAAGAGATCTATCA 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r2/2 + gi|110640213|ref|NC_008253.1| 4158419 AACGCGCGTTATCGTGCCGGTCCATTACGCGG EDCCCBAAAA@@@@?>===<;;9:99987776 0
r2/1 - gi|110640213|ref|NC_008253.1| 4158521 CGCTCAGGGCGTGATGTCCACTTACAAAGGGCGTG 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r3/1 + gi|110640213|ref|NC_008253.1| 2116984 CGATGCAGATGCGTACCACCTGGACCAGGCCTTTC EDCCCBAAAA@@@@?>===<;;9:99987776554 2
r3/2 - gi|110640213|ref|NC_008253.1| 2117146 CGTTTATCTGGCTGTTTATCCGACGCCCGAAA 67778999:9;;<===>?@@@@AAAABCCCDE 2
r4/2 + gi|110640213|ref|NC_008253.1| 1796722 GCACAACATCGGGAGGGTAAGATTTGTGACGA EDCCCBAAAA@@@@?>===<;;9:99987776 0
r4/1 - gi|110640213|ref|NC_008253.1| 1796883 CACTTCAGCCGCCACTTCTGGCACCAGCAAAGTCA 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r5/2 + gi|110640213|ref|NC_008253.1| 136755 ATGGCCGCACTTGTAGAGCTGCTGAAAAACCC EDCCCBAAAA@@@@?>===<;;9:99987776 0
r5/1 - gi|110640213|ref|NC_008253.1| 136879 TGGCTGCTGTCGCCAAAGGCGAAGCTAAATCCCCG 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r6/2 + gi|110640213|ref|NC_008253.1| 2717264 CCCATTTCCGCGTCTCAAATAATTTGCTGGAG EDCCCBAAAA@@@@?>===<;;9:99987776 0
r6/1 - gi|110640213|ref|NC_008253.1| 2717443 AACAGATAAAAAGCCTGGGTCAGCGCCGTATACGC 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r7/1 + gi|110640213|ref|NC_008253.1| 1073149 GTATCGCTGTTTTCCAGTTGTTCAAGATAAGAAAA EDCCCBAAAA@@@@?>===<;;9:99987776554 0
r7/2 - gi|110640213|ref|NC_008253.1| 1073350 TGATGTTGCAACTTTGTGCAACCGTGTTAAAT 67778999:9;;<===>?@@@@AAAABCCCDE 0
r8/1 + gi|110640213|ref|NC_008253.1| 177630 CCACGGTTGATGCTGGCATCGCTGATTGGTGCGTT EDCCCBAAAA@@@@?>===<;;9:99987776554 0
r8/2 - gi|110640213|ref|NC_008253.1| 177789 AGCATTAGCGCGCCGGATATGAAGGTGAACGA 67778999:9;;<===>?@@@@AAAABCCCDE 0
r9/1 + gi|110640213|ref|NC_008253.1| 1642197 GCGCCTTAATGCGCTACTCCACCAGCAATTACGTA EDCCCBAAAA@@@@?>===<;;9:99987776554 0
r9/2 - gi|110640213|ref|NC_008253.1| 1642268 TCAACTGGCCCTCACCGCCAGATGCCACGCGA 67778999:9;;<===>?@@@@AAAABCCCDE 0
# reads processed: 10
# reads with at least one reported alignment: 10 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 10 paired-end alignments to 1 output stream(s)
To me, delegating zipping and unzipping to external programs is vastly preferable to building it into Bowtie, since it enables a multitude of compression formats without unnecessarily complicating Bowtie or detracting from its portability. So I'm unlikely to build that into Bowtie unless I hear many more people ask for it.

Let me know if the suggestions don't work or if you'd like be to send you the paired-end script.

Thanks,
Ben
Ben Langmead is offline   Reply With Quote
Old 11-12-2009, 02:37 PM   #266
amaer
Member
 
Location: San Diego

Join Date: Oct 2009
Posts: 15
Default eliminating redundancy from the output

Quote:
Originally Posted by Ben Langmead View Post
Hi Andreia,

This is the first time I've heard that requested. Of course, it's easy to pipe or 'tee' bowtie's output directly to gzip; (...) To me, delegating zipping and unzipping to external programs is vastly preferable to building it into Bowtie, since it enables a multitude of compression formats without unnecessarily complicating Bowtie or detracting from its portability. So I'm unlikely to build that into Bowtie unless I hear many more people ask for it.

Let me know if the suggestions don't work or if you'd like be to send you the paired-end script.

Thanks,
Ben
Thanks, Ben, I would have liked to have the option of reading a zipped/gzipped file since that's how we receive the data from Solexa sequencing provider. Besides, the FASTQ files are quite big, and I'd prefer to leave them uncompressed, rather than decompress them for analysis and then recompress them. But I can certainly understand your point.

In the mean time, I have a few more questions/suggestions? about the output. I had 20 million Solexa 36mer reads run against a division of Genbank (lots of redundancy obviously) and got a huge output file - over 100 GB. It took over 3 hrs on 8 CPUs (64 bit) (I ran it with the options -a --best --strata -n 2).

1. Could we have the option that in the outfile we don't print the read sequence and/or quality score? or other ways to reduce the size of the output file, while not being quite "concise" style?
2. In the 'concise' mode, could we print the NAME (e.g. gi/accession up to the 1st whitespace) of the reference sequence, instead of the index?

Thanks,
Andreia
amaer is offline   Reply With Quote
Old 11-16-2009, 02:17 AM   #267
ramouz87
Member
 
Location: Doha, Qatar

Join Date: Oct 2009
Posts: 34
Default @RG Header

Hi Ben,
Is there a way to have @RG header when generating the sam output file with bowtie ?
Thanks in advance.
ramouz87 is offline   Reply With Quote
Old 11-16-2009, 07:13 AM   #268
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Quote:
Originally Posted by amaer View Post
I would have liked to have the option of reading a zipped/gzipped file since that's how we receive the data from Solexa sequencing provider. Besides, the FASTQ files are quite big, and I'd prefer to leave them uncompressed, rather than decompress them for analysis and then recompress them. But I can certainly understand your point.
To be clear, I am suggesting a solution to your problem; just not one that builds gzip into Bowtie, which is my preference.

Quote:
Originally Posted by amaer View Post
In the mean time, I have a few more questions/suggestions? about the output. I had 20 million Solexa 36mer reads run against a division of Genbank (lots of redundancy obviously) and got a huge output file - over 100 GB. It took over 3 hrs on 8 CPUs (64 bit) (I ran it with the options -a --best --strata -n 2).

1. Could we have the option that in the outfile we don't print the read sequence and/or quality score? or other ways to reduce the size of the output file, while not being quite "concise" style?
2. In the 'concise' mode, could we print the NAME (e.g. gi/accession up to the 1st whitespace) of the reference sequence, instead of the index?
If the size of the output is a problem and not all of the information output by Bowtie is needed, I would strongly suggest a strategy of postprocessing bowtie's default output, perhaps by piping its output to a tool like awk or by wrapping the bowtie invocation it in a perl script or something similar. That's far more flexible than relying solely on Bowtie's output mode. Here's a simple example where read sequences and qualities are suppressed but the output is otherwise identical:

Quote:
$ ./bowtie -u 10 e_coli reads/e_coli_1000.fq | awk -v OFS="\t" '{print $1,$2,$3,$4,$7,$8}'
# reads processed: 10
# reads with at least one reported alignment: 7 (70.00%)
# reads that failed to align: 3 (30.00%)
Reported 7 alignments to 1 output stream(s)
r0 - gi|110640213|ref|NC_008253.1| 3658049 0 32:T>G,34:G>A
r1 - gi|110640213|ref|NC_008253.1| 1902085 0
r2 - gi|110640213|ref|NC_008253.1| 3989609 0
r5 + gi|110640213|ref|NC_008253.1| 4249841 0
r7 + gi|110640213|ref|NC_008253.1| 4086913 0
r8 + gi|110640213|ref|NC_008253.1| 2679194 0 14:A>T,33:C>G
r9 - gi|110640213|ref|NC_008253.1| 2430559 0
It's likely that I'll deprecate --concise mode in a future release, so writing a script or a wrapper to postprocess Bowtie's output is highly recommended.

Hope that helps,
Ben
Ben Langmead is offline   Reply With Quote
Old 11-16-2009, 07:35 AM   #269
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Quote:
Originally Posted by ramouz87 View Post
Is there a way to have @RG header when generating the sam output file with bowtie ?
Hi ramouz,

I guess I'm not too familiar with how other tools set these fields. Are they typically set by the user? I.e., should I add a command-line option where the user can specify values for ID, SM, etc.?

Thanks,
Ben
Ben Langmead is offline   Reply With Quote
Old 11-16-2009, 10:54 PM   #270
amaer
Member
 
Location: San Diego

Join Date: Oct 2009
Posts: 15
Default concise mode and speed

Quote:
Originally Posted by Ben Langmead View Post
(...) piping its output to a tool like awk or by wrapping the bowtie invocation it in a perl script or something similar. That's far more flexible than relying solely on Bowtie's output mode. Here's a simple example where read sequences and qualities are suppressed but the output is otherwise identical:
(...)
It's likely that I'll deprecate --concise mode in a future release, so writing a script or a wrapper to postprocess Bowtie's output is highly recommended.

Hope that helps,
Ben
Thanks a lot, Ben, that's very helpful. I have noticed that in my case running in the --concise mode is MUCH faster than the full mode (more than twice as fast). I don't mind using indexes instead of real IDs that much, but you do 'lose' the mismatch information. a concise mode with the mismatch information added would be nice.

I also noticed that if I have for example a target of 1 million Genbank sequences of 20Kb each, if I concatenate them in a single Fasta sequence before building the index it speeds up the run by about 15-20%.
amaer is offline   Reply With Quote
Old 11-16-2009, 11:06 PM   #271
pythonlovesbowtie
Junior Member
 
Location: Xishuangbanna

Join Date: Nov 2009
Posts: 5
Default How to limite the reported read length?

Hello everybody. Not sure how to address my question. I mean, bowtie will skip reads that are less than 4 characters. How can I make it skip reads less than 5 or 6 or more? Can I set that?

Thanks very much!
pythonlovesbowtie is offline   Reply With Quote
Old 11-17-2009, 12:08 AM   #272
ramouz87
Member
 
Location: Doha, Qatar

Join Date: Oct 2009
Posts: 34
Default

Quote:
Originally Posted by Ben Langmead View Post
Hi ramouz,

I guess I'm not too familiar with how other tools set these fields. Are they typically set by the user? I.e., should I add a command-line option where the user can specify values for ID, SM, etc.?

Thanks,
Ben
Hi Ben
I'm trying to run some structural variation programs such BreakDancer, Pindel, ... and seems that problem is due to missing info in the header. I guess that there's no other choice than specifing the header manually so would be a good idea to add that option.
I guess effort should focus on a standard format to avoid compatibility problem.
Thanks Ben.
ramouz87 is offline   Reply With Quote
Old 11-24-2009, 02:17 PM   #273
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 481
Default bowtie missed read mapped by blat

I am confused about this read being missed by bowtie. Did I miss something here

Here is the blat result on reference sequence
HTML Code:
BLASTN 2.2.11 [blat]

Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool

Query= read
         (75 letters)

Database: Zv7_scaffolds.fa 
           7494 sequences; 1,440,319,812 total letters

Searching.done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

Zv7_scaffold910                                                       132   5e-31
Zv7_scaffold910                                                       128   9e-30
...

>Zv7_scaffold910 
          Length = 7751464

 Score = 132 bits (341), Expect = 5e-31
 Identities = 72/75 (96%)
 Strand = Minus / Plus

Query: 75      agtctgcttttccatataaaactgagaagaagagactgcagccttgaacaaacttgggaa 16
               ||||||||||||||||||||||||||||||||||||||||||||||| |||||||| |||
Sbjct: 5660145 agtctgcttttccatataaaactgagaagaagagactgcagccttgatcaaacttgcgaa 5660204

Query: 15      gtcttaacttacacg 1
               |||| ||||||||||
Sbjct: 5660205 gtctgaacttacacg 5660219


While this is what bowtie reports

HTML Code:
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -n 3 -p 4 -t zv7scaffold -c CGTGTAAGTTAAGACT
TCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT
Time loading forward index: 00:00:16
Time loading mirror index: 00:00:16
Seeded quality full-index search: 00:00:01
No results
Time searching: 00:00:33
Overall time: 00:00:33
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Old 11-30-2009, 10:21 AM   #274
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 481
Default

<bump>

any comments?
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Old 11-30-2009, 10:56 AM   #275
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Quote:
Originally Posted by bioinfosm View Post
<bump>

any comments?
Looks like the -e ceiling is exceeded. When input is provided via -c, qualities are assumed to be 'I' (which, rounded, becomes Phred 30). So try -e 90 or higher.

Ben
Ben Langmead is offline   Reply With Quote
Old 11-30-2009, 12:50 PM   #276
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 481
Default not the optimum mapping of a read

Thanks Ben, setting it that way worked.

But why don't I get the optimum alignment, that is seen using blat! Even using the -a option gives sub-optimal hits..

Here is the real read with Q values, that ended up unmapped
@HWI-E4:1:87:1633:1127#0/1
CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT
+
B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%%

HTML Code:
$  /home/m049157/build/bowtie-0.10.0/bowtie --best 
-p 4 -t -e 90 -n 3 zv7scaffold w ww
Time loading forward index: 00:00:01
Time loading mirror index: 00:00:01
Seeded quality full-index search: 00:00:00
Reported 1 alignments to 1 output stream(s)
Time searching: 00:00:02
Overall time: 00:00:02

$ cat ww
HWI-E4:1:87:1633:1127#0/1       +       Zv7_scaffold1183        139472  CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT     B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%%     0      18:G>C,27:A>T,57:G>T,68:C>G,69:A>C,70:G>A,71:A>G,72:C>A,74:G>T


-- removing -n 3
$  /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -e 90 zv7scaffold w ww
$ cat ww
HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold724 463363  AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B     0       18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:A>T,71:T>C,72:G>T,74:C>A


-- using -a to report all aln
$  /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -e 90 -a zv7scaffold w ww
$ cat ww
HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold724 463363  AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B     0       18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:A>T,71:T>C,72:G>T,74:C>A
HWI-E4:1:87:1633:1127#0/1       +       Zv7_scaffold1183        139472  CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT     B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%%     0      18:G>C,27:A>T,57:G>T,68:C>G,69:A>C,70:G>A,71:A>G,72:C>A,74:G>T
HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold2650        169222  AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B     0      18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:C>T,71:T>C,72:G>T,73:A>G,74:T>A
Thanks,
__________________
--
bioinfosm

Last edited by bioinfosm; 11-30-2009 at 01:14 PM.
bioinfosm is offline   Reply With Quote
Old 11-30-2009, 02:46 PM   #277
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 481
Default

When I limited my reference sequence to the blat hit region, I got the hit with 3 mis-matches, however, not before I increased the -e option to -e 80. Why would I not get this hit previously, when I used -a -e 90 to report all hits?

And why do I have to do -n 3, when the seed length by default is 28, and there are no more than 2 mis-matches in 28bp?

HTML Code:
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -n 3 -e 80 -a www w ww
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:00:00
Reported 1 alignments to 1 output stream(s)
Time searching: 00:00:00
Overall time: 00:00:00
$ cat ww
HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold910 5660144 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B       0       10:G>T,18:C>G,27:T>A
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Old 12-01-2009, 05:47 AM   #278
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Quote:
Originally Posted by bioinfosm View Post
When I limited my reference sequence to the blat hit region, I got the hit with 3 mis-matches, however, not before I increased the -e option to -e 80. Why would I not get this hit previously, when I used -a -e 90 to report all hits?

And why do I have to do -n 3, when the seed length by default is 28, and there are no more than 2 mis-matches in 28bp?

HTML Code:
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -n 3 -e 80 -a www w ww
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:00:00
Reported 1 alignments to 1 output stream(s)
Time searching: 00:00:00
Overall time: 00:00:00
$ cat ww
HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold910 5660144 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B       0       10:G>T,18:C>G,27:T>A
Hi bioinfosm,

Try using the --maxbts or -y options to increase the amount of searching effort put in by Bowtie. Note that -n 2 and -n 3 modes are not fully fully sensitive by default to avoid excessive backtracking (see manual section on Maq-like alignment).

That alignment does have 3 mismatches in the seed (at 0-based offsets 10, 18 and 27 from the 5' end).

Hope that helps,
Ben
Ben Langmead is offline   Reply With Quote
Old 12-05-2009, 01:06 AM   #279
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 316
Question

Hi,

I am confused by the bowtie options again. I used the options "-a --best --strata", but got a result as below:

Code:
Read1 16      chr1    7947971 255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:50 NM:i:0
Read1 16      chr12   48275260        255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:12A7T29    NM:i:2
The result shows that there are two hits for this read: one hits to chr1 (where the sequence from) perfectly, and the other hits to chr12 with 2 mismatches. However, my expectation is to make bowtie only report the best hit (namely the hit to chr1) by using the options "-a --best --strata". Why I get this weird result?
Thanks in advance.
--
Xi
__________________
Xi Wang
Xi Wang is offline   Reply With Quote
Old 12-07-2009, 09:05 AM   #280
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 481
Default

I think that is to do with the seed length. For your seed length, are both reads equally good hits!

Quote:
Originally Posted by Xi Wang View Post
Hi,

I am confused by the bowtie options again. I used the options "-a --best --strata", but got a result as below:

Code:
Read1 16      chr1    7947971 255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:50 NM:i:0
Read1 16      chr12   48275260        255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:12A7T29    NM:i:2
The result shows that there are two hits for this read: one hits to chr1 (where the sequence from) perfectly, and the other hits to chr12 with 2 mismatches. However, my expectation is to make bowtie only report the best hit (namely the hit to chr1) by using the options "-a --best --strata". Why I get this weird result?
Thanks in advance.
--
Xi
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:20 PM.


Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.