SEQanswers

Old 09-08-2009, 02:20 AM   #1
mingkunli
Member
 
Location: Germany

Join Date: Jan 2009
Posts: 41
What's in bwa's .sai file?

Besides the content shown in the .sam file (the alignment of the best match and the number of suboptimal/all hits), the .sai file also seems to contain some information about the suboptimal hits. Is it possible to look at the details of these hits?
With the command "bwa samse -n INT", I can only get the positions where they mapped and the number of mismatches.

For paired-end data, does "bwa sampe" consider the suboptimal hits? That is, the best-best match may violate the distance constraint while a suboptimal-suboptimal or suboptimal-best pair falls within a reasonable distance.
Old 09-08-2009, 03:53 AM   #2
henry
Member
 
Location: china

Join Date: Sep 2009
Posts: 36

The .sai file is the output of the aln command. It contains the suffix array coordinates of all the short reads that were loaded. For bwa, aligning a read is equivalent to searching for the suffix array interval of the substrings of the chromosome that match the read. Once the interval in the suffix array is known, we can get the positions of the short read.
If you want to know in detail how the bwa algorithm works, you can read "Fast and accurate short read alignment with Burrows-Wheeler transform" (Heng Li and Richard Durbin), which has been published in Bioinformatics.
It took me a couple of days to fully trace and understand the MAQ and BWA algorithms. ^ ^
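
To make the "suffix array interval" idea concrete, here is a minimal Python sketch. It is only an illustration and not bwa's code: bwa finds the interval by backward search on a BWT index and allows mismatches and gaps, while this toy builds a plain suffix array and does an exact-match binary search. The function names and the toy reference are made up for the example.

```python
import bisect


def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, read):
    """Return the half-open interval [lo, hi) of suffix array rows that start with `read`."""
    m = len(read)
    # Truncating sorted suffixes to length m keeps them in sorted order,
    # so a binary search on the truncated list is valid.
    prefixes = [text[i:i + m] for i in sa]
    lo = bisect.bisect_left(prefixes, read)
    hi = bisect.bisect_right(prefixes, read)
    return lo, hi


reference = "GATTACAGATTA"  # toy reference, assumption for illustration
sa = build_suffix_array(reference)

lo, hi = sa_interval(reference, sa, "ATTA")
# The interval itself is just two integers; the reference positions are
# recovered by looking up the suffix array rows inside the interval.
positions = sorted(sa[lo:hi])
print("interval:", (lo, hi), "hits:", hi - lo, "positions:", positions)
```

The point of the sketch is that an interval of width N stands for N hits, and the actual coordinates only appear once the rows are looked up, which matches the idea that the later steps (samse/sampe) turn intervals into positions.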

Best

Jing
Old 09-08-2009, 04:26 AM   #3
henry
Member
 
Location: china

Join Date: Sep 2009
Posts: 36

I haven't used bwa to process paired-end reads, so I have no hands-on experience yet.
"BWA first finds the positions of all the good hits, sorts them according to the chromosomal coordinates and then does a linear scan through all the potential hits to pair the two ends." According to this description of the BWA algorithm from the paper I mentioned above, the distance constraint is already taken into account when the two ends are paired, so the situation you describe should not happen.
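
The quoted description (collect the hits, sort by coordinate, scan linearly to pair the ends) can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not bwa's implementation: hits are plain (chromosome, position) tuples, strand and orientation are ignored, and `max_dist` is a hypothetical distance limit.

```python
def pair_hits(hits1, hits2, max_dist):
    """Pair hits of end 1 with hits of end 2 that lie on the same
    chromosome and at most `max_dist` bases apart.

    hits1, hits2: lists of (chrom, pos) tuples for the two ends.
    Returns a list of ((chrom, pos) of end 1, (chrom, pos) of end 2).
    """
    # Tag every hit with the end it came from and sort by coordinate.
    tagged = sorted(
        [(c, p, 1) for c, p in hits1] + [(c, p, 2) for c, p in hits2]
    )
    pairs = []
    window = []  # earlier hits still within max_dist of the current hit
    for chrom, pos, end in tagged:
        # Drop hits on other chromosomes or too far upstream.
        window = [h for h in window if h[0] == chrom and pos - h[1] <= max_dist]
        for c, p, e in window:
            if e != end:  # only pair hits from opposite ends
                if e == 1:
                    pairs.append(((c, p), (chrom, pos)))
                else:
                    pairs.append(((chrom, pos), (c, p)))
        window.append((chrom, pos, end))
    return pairs


# Made-up example: the first hit of each end is the "best" one.
end1 = [("chr1", 1000), ("chr2", 5000)]   # best, suboptimal
end2 = [("chr2", 5300), ("chr3", 42)]     # best, suboptimal
print(pair_hits(end1, end2, max_dist=500))
```

In this toy input the two best hits of end 1 and end 2 are on different chromosomes, yet the suboptimal hit of end 1 pairs with the best hit of end 2, because every hit passed to the scan is a pairing candidate.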
Old 09-08-2009, 04:48 AM   #4
henry
Member
 
Location: china

Join Date: Sep 2009
Posts: 36

Sorry, I misunderstood you. In MAQ, the two best hits are kept in the queue for each end. If there are multiple consistent hit pairs, the mapping qualities are set much lower, so suboptimal hits are considered in MAQ. For BWA, there is no description of the details of how the two ends are paired, but suboptimal hits should be considered there as well.

best

Jing
Old 09-08-2009, 05:30 AM   #5
mingkunli
Member
 
Location: Germany

Join Date: Jan 2009
Posts: 41

Hey Jing, thanks for your help.
I also got a reply from the author (thanks to lh3):
1) Both optimal and suboptimal hits are stored in .sai files, but only approximate chromosomal positions are available. Detailed alignments are reconstructed by samse and sampe.
2) sampe considers suboptimal hits in pairing.

However, there is no way to generate the detailed alignments for these suboptimal hits (in SAM format) using samse or sampe.
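
The compact per-hit information that "bwa samse -n INT" does report can still be pulled out of the SAM output with a few lines of Python. This is a minimal sketch that assumes a bwa release writing the alternative hits to an XA:Z tag, with each hit given as chr,pos,CIGAR,NM and terminated by a semicolon, where the sign of pos encodes the strand; other releases may print the hits differently. The SAM record in the example is made up.

```python
def parse_xa(sam_line):
    """Extract alternative hits from the XA:Z tag of one SAM record.

    Returns a list of (chrom, strand, pos, cigar, nm) tuples, or an
    empty list if the record has no XA tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    hits = []
    for tag in fields[11:]:  # optional tags start at column 12
        if not tag.startswith("XA:Z:"):
            continue
        for entry in tag[len("XA:Z:"):].split(";"):
            if not entry:  # skip the empty piece after the final ';'
                continue
            chrom, pos, cigar, nm = entry.split(",")
            strand = "-" if pos.startswith("-") else "+"
            hits.append((chrom, strand, abs(int(pos)), cigar, int(nm)))
    return hits


# Hypothetical record, for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "100", "0", "36M", "*", "0", "0",
    "A" * 36, "I" * 36,
    "XT:A:R", "NM:i:0", "XA:Z:chr2,+1200,36M,1;chr5,-300,36M,2;",
])
for hit in parse_xa(record):
    print(hit)
```

Each tuple gives only the summary of a suboptimal hit, not a full SAM record for it, which is consistent with the limitation described above.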
Old 09-08-2009, 06:43 PM   #6
henry
Member
 
Location: china

Join Date: Sep 2009
Posts: 36

Hi mingkunli,

Thank you for sharing this. ^ ^

Best

Jing