#261 | |
|
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 197
|
Quote:
Ben |
#262 | |
|
Senior Member
Location: Newcastle, Australia Join Date: Oct 2009
Posts: 307
|
Quote:
#263 | |
|
Member
Location: College Park, MD Join Date: Oct 2009
Posts: 32
|
Quote:
Thanks for your reply. Indeed, setting the -X parameter to 5000, setting the orientation to --rf (although it's the standard Illumina paired-end protocol), and clipping 24 bp from the reverse reads led to 52.35% of reads aligning:
#bowtie -t -p 16 -X 5000 --rf ./ref/h_sapiens_37_asm -1 ./fastq/s_8_1_sequence.fq -2 ./fastq/s_8_2_sequence_50b.fq ./align/pronest_5000_rf_2_50b.bowtie.align --un ./unalign/pronest_5000_rf_2_50b.unalign.fq
# reads with at least one reported alignment: 3486518 (52.35%)
# reads that failed to align: 3173993 (47.65%)
Do you think the alignment could be improved further? Thanks again for your help, and congratulations on the good work.
Regards, Ramzi
#264 |
|
Member
Location: San Diego Join Date: Oct 2009
Posts: 15
|
Hi Ben,
Thanks for the update. Do you have an estimate of when Bowtie will be able to handle gzipped files for input and output?
Thanks, Andreia
#265 | ||||
|
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 197
|
Hi Andreia,
Quote:
Quote:
Quote:
Quote:
Let me know if the suggestions don't work or if you'd like me to send you the paired-end script.
Thanks, Ben
#266 | |
|
Member
Location: San Diego Join Date: Oct 2009
Posts: 15
|
Quote:
In the meantime, I have a few more questions and suggestions about the output. I ran 20 million 36-mer Solexa reads against a division of GenBank (lots of redundancy, obviously) and got a huge output file, over 100 GB. It took over 3 hours on 8 CPUs (64-bit), with the options -a --best --strata -n 2.
1. Could we have an option to omit the read sequence and/or quality scores from the output file, or some other way to reduce the size of the output without going all the way to the "concise" style?
2. In "concise" mode, could we print the NAME of the reference sequence (e.g. the gi/accession, up to the first whitespace) instead of its index?
Thanks, Andreia
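Until such an option exists, one workaround is to drop the two columns after the fact, for example by piping Bowtie's output through a small filter. The following is only a minimal sketch: it assumes Bowtie's default tab-separated output (read name, strand, reference name, 0-based offset, read sequence, qualities, count of other alignments, mismatch descriptors), and the function name `strip_seq_and_qual` is hypothetical, not part of Bowtie.

```python
import io

# Column indices (0-based) assumed for Bowtie's default tab-separated output:
# 0 read name, 1 strand, 2 reference name, 3 offset,
# 4 read sequence, 5 qualities, 6 other-alignment count, 7 mismatches.
DROP_COLUMNS = {4, 5}


def strip_seq_and_qual(lines):
    """Yield each alignment line without the sequence and quality columns."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kept = [f for i, f in enumerate(fields) if i not in DROP_COLUMNS]
        yield "\t".join(kept)


if __name__ == "__main__":
    # Made-up record used purely for illustration.
    sample = io.StringIO("read1\t+\tgi|123\t100\tACGT\tIIII\t0\t2:A>C\n")
    for out in strip_seq_and_qual(sample):
        print(out)
```

The sequence and qualities can be recovered later from the original FASTQ file by read name, so nothing is lost as long as the input reads are kept.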
#267 |
|
Member
Location: College Park, MD Join Date: Oct 2009
Posts: 32
|
Hi Ben,
Is there a way to have an @RG header in the SAM output file generated by bowtie? Thanks in advance.
#268 | |||
|
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 197
|
Quote:
Quote:
Quote:
Hope that helps, Ben |
#269 | |
|
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 197
|
Quote:
I guess I'm not too familiar with how other tools set these fields. Are they typically set by the user? I.e., should I add a command-line option where the user can specify values for ID, SM, etc.? Thanks, Ben |
#270 | |
|
Member
Location: San Diego Join Date: Oct 2009
Posts: 15
|
Quote:
I also noticed that, for a target of, say, 1 million GenBank sequences of 20 kb each, concatenating them into a single FASTA sequence before building the index speeds up the run by about 15-20%.
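The post does not say how the sequences were concatenated. One possible approach, sketched below, joins the records with a spacer and keeps a table of intervals so that hits on the joined sequence can be mapped back to the original accessions. The spacer of N characters and the names `concatenate_fasta` and `locate` are assumptions made for illustration, not something taken from the thread.

```python
def concatenate_fasta(records, spacer="N" * 50):
    """Join (name, sequence) pairs into one sequence.

    Returns the joined sequence and a list of (name, start, end) intervals,
    0-based and end-exclusive, locating each original record inside it.
    """
    parts = []
    intervals = []
    position = 0
    for index, (name, seq) in enumerate(records):
        if index > 0:
            # Separate records so a read is less likely to span two of them.
            parts.append(spacer)
            position += len(spacer)
        parts.append(seq)
        intervals.append((name, position, position + len(seq)))
        position += len(seq)
    return "".join(parts), intervals


def locate(intervals, offset):
    """Map an offset in the joined sequence back to (name, local_offset)."""
    for name, start, end in intervals:
        if start <= offset < end:
            return name, offset - start
    return None  # the offset falls inside a spacer
```

For a million records, the linear scan in `locate` would be slow; a binary search over the interval starts would be the obvious refinement.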
#271 |
|
Junior Member
Location: Xishuangbanna Join Date: Nov 2009
Posts: 5
|
Hello everybody. I'm not sure how best to phrase my question: bowtie skips reads that are shorter than 4 characters. How can I make it skip reads shorter than 5, 6, or more characters? Is there a setting for that?
Thanks very much!
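Independently of any aligner setting, short reads can be removed before alignment. A minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); the function name `filter_fastq_by_length` is hypothetical:

```python
import io


def filter_fastq_by_length(handle, min_length):
    """Yield 4-line FASTQ records whose sequence has at least min_length bases."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if len(seq.strip()) >= min_length:
            yield header + seq + plus + qual


if __name__ == "__main__":
    # Made-up reads used purely for illustration.
    data = io.StringIO("@r1\nACG\n+\nIII\n@r2\nACGTAC\n+\nIIIIII\n")
    for record in filter_fastq_by_length(data, 5):
        print(record, end="")
```

The filtered records can be written to a new file and passed to bowtie in place of the original reads.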
#272 | |
|
Member
Location: College Park, MD Join Date: Oct 2009
Posts: 32
|
Quote:
I'm trying to run some structural variation programs such as BreakDancer and Pindel, and it seems the problem is due to missing information in the header. I guess there's no other choice than specifying the header manually, so it would be a good idea to add that option. I think the effort should focus on a standard format to avoid compatibility problems. Thanks Ben.
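Specifying the header manually can be done as a post-processing step on the SAM file. The sketch below adds an @RG line with ID and SM fields after the existing header lines and tags every alignment record with a matching RG:Z field, following the SAM format. The function name `add_read_group` and the example values are hypothetical, and which @RG fields a given downstream tool requires is not something this thread establishes.

```python
def add_read_group(sam_lines, rg_id, sample):
    """Yield SAM lines with an @RG header line and an RG:Z tag on each record."""
    rg_header = "@RG\tID:{}\tSM:{}".format(rg_id, sample)
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            # Existing header lines pass through unchanged.
            yield line
            continue
        if not header_written:
            # First alignment record: the header block has ended.
            yield rg_header
            header_written = True
        yield line + "\tRG:Z:" + rg_id
    if not header_written:
        # Input had no alignment records; still emit the header line.
        yield rg_header


if __name__ == "__main__":
    # Made-up SAM content used purely for illustration.
    sam = [
        "@HD\tVN:1.0\n",
        "r1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    for out in add_read_group(sam, "lane8", "sampleA"):
        print(out)
```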
#273 |
|
Senior Member
Location: USA Join Date: Jan 2008
Posts: 480
|
I am confused about this read being missed by bowtie. Did I miss something here?
Here is the BLAT result against the reference sequence: HTML Code:
BLASTN 2.2.11 [blat]
Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
Query= read
(75 letters)
Database: Zv7_scaffolds.fa
7494 sequences; 1,440,319,812 total letters
Searching.done
                                                     Score    E
Sequences producing significant alignments:          (bits) Value
Zv7_scaffold910                                        132   5e-31
Zv7_scaffold910                                        128   9e-30
...
>Zv7_scaffold910
Length = 7751464
Score = 132 bits (341), Expect = 5e-31
Identities = 72/75 (96%)
Strand = Minus / Plus
Query: 75      agtctgcttttccatataaaactgagaagaagagactgcagccttgaacaaacttgggaa 16
               ||||||||||||||||||||||||||||||||||||||||||||||| |||||||| |||
Sbjct: 5660145 agtctgcttttccatataaaactgagaagaagagactgcagccttgatcaaacttgcgaa 5660204
Query: 15      gtcttaacttacacg 1
               |||| ||||||||||
Sbjct: 5660205 gtctgaacttacacg 5660219
While this is what bowtie reports: HTML Code:
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -n 3 -p 4 -t zv7scaffold -c CGTGTAAGTTAAGACT TCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT
Time loading forward index: 00:00:16
Time loading mirror index: 00:00:16
Seeded quality full-index search: 00:00:01
No results
Time searching: 00:00:33
Overall time: 00:00:33
#275 |
|
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 197
|
#276 |
|
Senior Member
Location: USA Join Date: Jan 2008
Posts: 480
|
Thanks Ben, setting it that way worked.
But why don't I get the optimal alignment that BLAT finds? Even the -a option reports only sub-optimal hits. Here is the real read with quality values, which ended up unmapped:
@HWI-E4:1:87:1633:1127#0/1
CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT
+
B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%%
HTML Code:
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -e 90 -n 3 zv7scaffold w ww
Time loading forward index: 00:00:01
Time loading mirror index: 00:00:01
Seeded quality full-index search: 00:00:00
Reported 1 alignments to 1 output stream(s)
Time searching: 00:00:02
Overall time: 00:00:02
$ cat ww
HWI-E4:1:87:1633:1127#0/1 + Zv7_scaffold1183 139472 CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%% 0 18:G>C,27:A>T,57:G>T,68:C>G,69:A>C,70:G>A,71:A>G,72:C>A,74:G>T
-- removing -n 3
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -e 90 zv7scaffold w ww
$ cat ww
HWI-E4:1:87:1633:1127#0/1 - Zv7_scaffold724 463363 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B 0 18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:A>T,71:T>C,72:G>T,74:C>A
-- using -a to report all aln
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -e 90 -a zv7scaffold w ww
$ cat ww
HWI-E4:1:87:1633:1127#0/1 - Zv7_scaffold724 463363 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B 0 18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:A>T,71:T>C,72:G>T,74:C>A
HWI-E4:1:87:1633:1127#0/1 + Zv7_scaffold1183 139472 CGTGTAAGTTAAGACTTCCCAAGTTTGTTCAAGGCTGCAGTCTCTTCTTCTCAGTTTTATATGGAAAAGCAGACT B427?@?=@@>07?@ABBBB@?<>AB=4@B?<+/B42@82;A?A<4.,@:6<)9:<8(0998(-/;A=3%%%%%% 0 18:G>C,27:A>T,57:G>T,68:C>G,69:A>C,70:G>A,71:A>G,72:C>A,74:G>T
HWI-E4:1:87:1633:1127#0/1 - Zv7_scaffold2650 169222 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B 0 18:C>G,27:T>A,57:C>A,68:G>C,69:T>G,70:C>T,71:T>C,72:G>T,73:A>G,74:T>A
Last edited by bioinfosm; 11-30-2009 at 12:14 PM.
#277 |
|
Senior Member
Location: USA Join Date: Jan 2008
Posts: 480
|
When I limited my reference sequence to the BLAT hit region, I got the hit with 3 mismatches, but not until I increased the -e option to -e 80. Why did I not get this hit previously, when I used -a -e 90 to report all hits?
And why do I have to use -n 3, when the default seed length is 28 and there are no more than 2 mismatches in 28 bp? HTML Code:
$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -n 3 -e 80 -a www w ww
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:00:00
Reported 1 alignments to 1 output stream(s)
Time searching: 00:00:00
Overall time: 00:00:00
$ cat ww
HWI-E4:1:87:1633:1127#0/1 - Zv7_scaffold910 5660144 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B 0 10:G>T,18:C>G,27:T>A
#278 | |
|
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 197
|
Quote:
Try using the --maxbts or -y options to increase the amount of search effort Bowtie puts in. Note that the -n 2 and -n 3 modes are not fully sensitive by default, to avoid excessive backtracking (see the manual section on Maq-like alignment). That alignment does have 3 mismatches in the seed (at 0-based offsets 10, 18 and 27 from the 5' end).
Hope that helps, Ben
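The seed count can be checked directly from the mismatch column of Bowtie's default output. A minimal sketch, assuming the descriptor format shown in the posts above (comma-separated `offset:ref>read` items, with offsets 0-based from the 5' end of the read) and the default seed length of 28; the function names are hypothetical:

```python
def mismatch_offsets(descriptor):
    """Parse a descriptor such as '10:G>T,18:C>G' into a list of offsets."""
    if not descriptor:
        return []
    return [int(item.split(":", 1)[0]) for item in descriptor.split(",")]


def seed_mismatches(descriptor, seed_length=28):
    """Count mismatches whose offset from the 5' end lies inside the seed."""
    return sum(1 for off in mismatch_offsets(descriptor) if off < seed_length)


if __name__ == "__main__":
    # Descriptors copied from the alignments reported earlier in the thread.
    scaffold910 = "10:G>T,18:C>G,27:T>A"
    scaffold1183 = "18:G>C,27:A>T,57:G>T,68:C>G,69:A>C,70:G>A,71:A>G,72:C>A,74:G>T"
    print(seed_mismatches(scaffold910))
    print(seed_mismatches(scaffold1183))
```

With a 28-base seed covering offsets 0 to 27, the Zv7_scaffold910 hit has all three of its mismatches inside the seed, which is why it needs -n 3, whereas the Zv7_scaffold1183 hit has nine mismatches overall but only two of them in the seed.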
#279 |
|
Senior Member
Location: Newcastle, Australia Join Date: Oct 2009
Posts: 307
|
Hi,
I am confused by the bowtie options again. I used the options "-a --best --strata", but got the result below: Code:
Read1 16 chr1 7947971 255 50M * 0 0 ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG IIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:50 NM:i:0
Read1 16 chr12 48275260 255 50M * 0 0 ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG IIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:12A7T29 NM:i:2
Thanks in advance. -- Xi
__________________
Xi Wang
#280 | |
|
Senior Member
Location: USA Join Date: Jan 2008
Posts: 480
|
I think that has to do with the seed length. Within your seed length, both alignments are equally good hits.
Quote:
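The seed-length explanation can be checked from the SAM fields alone. As far as the Bowtie manual describes it, in -n mode a stratum is defined by the number of mismatches in the seed, not in the whole read, which would be consistent with both records carrying XA:i:0. The sketch below converts the MD field into mismatch offsets measured from the read's 5' end. It assumes ungapped alignments (no `^` deletions in MD), and the function name `read_mismatch_offsets` is hypothetical.

```python
import re


def read_mismatch_offsets(md, flag, read_length):
    """Return mismatch offsets measured from the read's 5' end.

    md is the MD:Z value without its prefix, e.g. '12A7T29'.
    Assumes an ungapped alignment (no '^' deletions in MD).
    """
    offsets = []
    position = 0  # offset along the reference-oriented sequence
    for number, base in re.findall(r"(\d+)|([A-Za-z])", md):
        if number:
            position += int(number)  # run of matching bases
        else:
            offsets.append(position)  # a single mismatched base
            position += 1
    if flag & 16:  # 0x10: read aligned to the reverse strand
        offsets = sorted(read_length - 1 - off for off in offsets)
    return offsets


if __name__ == "__main__":
    # Values taken from the chr12 record in the post above.
    print(read_mismatch_offsets("12A7T29", 16, 50))
```

For the chr12 record, the two mismatches sit at offsets 29 and 37 from the 5' end of the read, both outside a default 28-base seed (offsets 0 to 27), so the alignment has zero seed mismatches just like the perfect chr1 hit.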