SEQanswers

Old 01-19-2011, 03:13 PM   #1
davetang
Member
 
Location: Japan

Join Date: Jul 2010
Posts: 11
Default SAM/BAM MD tag

Dear all,

I am writing a parser for the MD tag in SAM/BAM files because I couldn't find one. I am interested in tallying the alignment mismatches and the MD field contains the information I need.

The SAM manual gives this example:

The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string "10A5^AC6" means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.
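
As a rough illustration of the description quoted above, here is a minimal Python sketch that splits an MD string into match, mismatch, and deletion tokens. The function name `parse_md` and the token labels are made up for this example, and the regular expression assumes the MD grammar given in the SAM specification, `[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*`. It is not the samtools implementation.

```python
import re

# One token is either a run of digits (matches), a caret followed by
# letters (bases deleted from the reference), or a single letter
# (the reference base at a mismatch).
MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def parse_md(md):
    """Split an MD string into (kind, value) tuples (hypothetical helper)."""
    ops = []
    pos = 0
    for m in MD_TOKEN.finditer(md):
        if m.start() != pos:
            # finditer silently skips text it cannot match, so check for gaps
            raise ValueError(f"unexpected character at offset {pos} in {md!r}")
        pos = m.end()
        number, deletion, mismatch = m.groups()
        if number is not None:
            ops.append(("match", int(number)))
        elif deletion is not None:
            ops.append(("deletion", deletion))
        else:
            ops.append(("mismatch", mismatch))
    if pos != len(md):
        raise ValueError(f"unexpected character at offset {pos} in {md!r}")
    return ops


print(parse_md("10A5^AC6"))
```

For the manual's string "10A5^AC6" this yields 10 matches, a mismatch where the reference has A, 5 matches, a deletion of AC, and 6 matches.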

I was wondering how the MD field would describe a 2bp deletion that is immediately followed by a mismatch, e.g.


R: AAAAAAAAAAATTTTTACGTTTTT
Q: AAAAAAAAAAGTTTTT--ATTTTT


since written naively this would be "10A5^ACG5", which reads the same as a 3bp deletion of ACG.

Perhaps I need to incorporate the CIGAR information to parse these cases properly, or perhaps they never occur? Of course, if a parser that handles this is already available, I would much prefer to use it.

Thank you in advance,

Dave
Old 01-19-2011, 03:29 PM   #2
davetang
Member
 
Location: Japan

Join Date: Jul 2010
Posts: 11
Default

I should have looked before posting my question, but this post may still be useful to someone else. Here is a CIGAR string and its MD tag:

CIGAR: 30M8D6M   MD: 27T2^ATGCATTT0G3T1

A 0 separates the 8 deleted bases from the single mismatch that immediately follows the deletion.
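
To show why the 0 matters when tallying, here is a small Python sketch that counts mismatches and deleted bases straight from the MD string. `tally_md` is a made-up name, and the regular expression assumes the MD grammar from the SAM specification. Treat it as a sketch, not as what samtools or Bio::DB::Sam do internally.

```python
import re


def tally_md(md):
    """Return (mismatches, deleted_bases) for an MD string (hypothetical helper)."""
    mismatches = 0
    deleted = 0
    # Each findall tuple is (digits, deleted bases, mismatch base);
    # the groups that did not take part in the match are empty strings.
    for _number, deletion, mismatch in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md):
        if deletion:
            deleted += len(deletion)
        elif mismatch:
            mismatches += 1
    return mismatches, deleted


print(tally_md("27T2^ATGCATTT0G3T1"))  # MD tag from this post
print(tally_md("10A5^AC0G5"))          # 2bp deletion, then a mismatch
print(tally_md("10A5^ACG5"))           # same letters without the 0
```

With the 0, the second string counts as two mismatches and two deleted bases. Without it, the third string counts as one mismatch and three deleted bases, so the two cases are only distinguishable because of the separator.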
Old 03-10-2012, 01:55 PM   #3
Mad_bess
Junior Member
 
Location: Paris

Join Date: Mar 2012
Posts: 1
Default

Could you please share the source code of the parser? Would be sooo nice of you.
Old 03-10-2012, 03:51 PM   #4
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Bio::DB::Sam has a parser that does this.
Old 03-10-2012, 05:50 PM   #5
davetang
Member
 
Location: Japan

Join Date: Jul 2010
Posts: 11
Default

Quote:
Originally Posted by Mad_bess
Could you please share the source code of the parser ? Would be sooo nice of you :)
Hello,

As nilshomer suggested, use the Perl module Bio::DB::Sam. I wrote some code that uses the module:

http://davetang.org/muse/2011/01/28/perl-and-sam/

I wrote the code myself, so caveat emptor.

Cheers,

Dave
Old 03-24-2012, 06:07 AM   #6
adrian
Member
 
Location: baltimore

Join Date: Oct 2009
Posts: 89
Default What is the meaning of '0' in MD tag

I could never understand the purpose of the 0 in the MD tag. Could you help me understand it, please?
Thanks,
Adrian
Old 03-24-2012, 08:25 AM   #7
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

I haven't found out why they are necessary, but it sometimes helps to have them as a visual separator. They generally occur between adjacent SNPs, or between a deletion and a SNP that follows it.

For example, "5^AC0C5" with a cigar "5M2D6M", or "5A0C5" with a cigar "12M". In the former it is easy to see where the deletion ends (the 0) and the next base (a C SNP) starts.
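
A quick way to sanity-check pairs like these is to compare the lengths implied by the CIGAR and by the MD string. The Python sketch below is only an illustration: the function names are made up, and it assumes that M, = and X in the CIGAR correspond to the matches and mismatches in MD, and that D corresponds to the ^ deletions.

```python
import re


def cigar_lengths(cigar):
    """Return (aligned_bases, deleted_bases) implied by a CIGAR string."""
    aligned = deleted = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "M=X":
            aligned += int(length)
        elif op == "D":
            deleted += int(length)
    return aligned, deleted


def md_lengths(md):
    """Return (aligned_bases, deleted_bases) implied by an MD string."""
    aligned = deleted = 0
    for number, deletion, _mismatch in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md):
        if number:
            aligned += int(number)    # run of matches, possibly "0"
        elif deletion:
            deleted += len(deletion)  # bases after the caret
        else:
            aligned += 1              # a single mismatched base
    return aligned, deleted


def md_matches_cigar(cigar, md):
    """True when both strings imply the same aligned and deleted lengths."""
    return cigar_lengths(cigar) == md_lengths(md)


print(md_matches_cigar("5M2D6M", "5^AC0C5"))
print(md_matches_cigar("12M", "5A0C5"))
```

This only compares totals, so it would not catch a deletion placed at the wrong offset. It is meant as a cheap consistency check, not a full validation.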

The SAMtools code puts them in, so others follow its lead. You could ask on the samtools help list.
Old 07-08-2020, 02:34 PM   #8
ihoskins
Junior Member
 
Location: Colorado, USA

Join Date: Jun 2016
Posts: 2
Default

For clarification on the 0, see this blog: https://lh3.github.io/2018/03/27/the...and-the-md-tag

As nilshomer indicates, the 0 is there to delineate mismatches from the reference in the context of deletions.

Tags
bam, md tag, parser, sam
