SEQanswers

Bio.X2Y 06-28-2010 10:38 AM

SAM Format - SEQ field '='
 
Hi,

I was just looking through the SAM format spec again and I've come across something that confuses me.

The text for the 'SEQ' field is:
"query SEQuence; = for a match to the reference; n/N/. for ambiguity; cases are not maintained"

In practice, I've never seen a "=" in the SEQ field; is this supposed to be an optional way of using the format, i.e. using a combination of "=" and "N" instead of the actual query sequence?

Thanks,
Bio

epigen 06-29-2010 06:31 AM

You're right, that is somewhat confusing.
By chance, I just read on the samtools manual page that samtools fillmd with the -e option can "convert the read base to = if it is identical to the aligned reference base. Indel caller does not support the = bases at the moment."

I guess reporting the actual query sequence (along with the CIGAR string) makes much more sense in terms of computation because you don't need to look up the bases in the reference. And by using N, you'd lose all SNP information.
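To make the trade-off concrete, here is a minimal Python sketch of both directions: replacing matching bases with "=" and restoring them again. It is only an illustration of the idea, not how samtools implements it. The function names (mask_matches, unmask_matches), the plain-string reference, and the 1-based pos argument (as in the SAM POS field) are assumptions made for this example.

```python
import re

# One CIGAR element: a length followed by an operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def _rewrite(seq, cigar, ref, pos, pick):
    """Walk the CIGAR and rebuild SEQ, calling pick(read_base, ref_base)
    for every aligned position. pos is 1-based, as in the SAM POS field."""
    out, q, r = [], 0, pos - 1
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":            # consumes both read and reference
            out.extend(pick(seq[q + i], ref[r + i]) for i in range(n))
            q += n
            r += n
        elif op in "IS":           # insertion / soft clip: read only, kept as-is
            out.append(seq[q:q + n])
            q += n
        elif op in "DN":           # deletion / skipped region: reference only
            r += n
        # H and P consume neither the read nor the reference
    return "".join(out)

def mask_matches(seq, cigar, ref, pos):
    """Replace read bases that equal the aligned reference base with '='."""
    return _rewrite(seq, cigar, ref, pos,
                    lambda b, rb: "=" if b.upper() == rb.upper() else b)

def unmask_matches(seq, cigar, ref, pos):
    """Restore the original bases; this is the step that needs the reference."""
    return _rewrite(seq, cigar, ref, pos,
                    lambda b, rb: rb if b == "=" else b)

ref = "ACGTACGTAC"
masked = mask_matches("GTTCG", "5M", ref, 3)
print(masked)                                # ==T==
print(unmask_matches(masked, "5M", ref, 3))  # GTTCG
```

Note that mismatches stay visible in the masked form, so SNP information is kept; only the restore step requires a reference lookup.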

Bio.X2Y 06-29-2010 07:05 AM

Thanks epigen, I guess it's an optional way of using the format then.

maubp 04-25-2012 05:26 AM

Using = in the SEQ field is useful for compressing the data, see:
http://blastedbio.blogspot.co.uk/201...mpression.html
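As a rough illustration of why this helps, the toy Python sketch below compresses the same set of reads with and without "=" masking. Everything in it is made up for the example: the random reference, the ungapped 50 bp reads with one substitution each, and the use of zlib on plain text. Real SAM/BAM files will behave differently in detail.

```python
import random
import zlib

random.seed(0)  # fixed seed so the toy data is reproducible

# Hypothetical toy data: a random reference and ungapped reads sampled from it.
ref = "".join(random.choice("ACGT") for _ in range(2000))
reads = []
for _ in range(200):
    start = random.randrange(len(ref) - 50)
    read = list(ref[start:start + 50])
    # Introduce one substitution so every read carries a mismatch.
    i = random.randrange(50)
    read[i] = random.choice([b for b in "ACGT" if b != read[i]])
    reads.append((start, "".join(read)))

def mask(start, read):
    """Replace bases that match the reference with '=' (ungapped reads only)."""
    return "".join("=" if b == ref[start + i] else b
                   for i, b in enumerate(read))

plain = "\n".join(r for _, r in reads).encode()
masked = "\n".join(mask(s, r) for s, r in reads).encode()

plain_size = len(zlib.compress(plain))
masked_size = len(zlib.compress(masked))
print("plain SEQ, compressed: ", plain_size, "bytes")
print("masked SEQ, compressed:", masked_size, "bytes")
```

The masked text is mostly long runs of "=", which a general-purpose compressor handles far better than the raw bases.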

(I know it is an old post, but the new similar thread feature of the forum brought it to my attention)


All times are GMT -8.