SEQanswers

06-28-2010, 10:38 AM   #1
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
SAM Format - SEQ field '='

Hi,

I was just looking through the SAM format spec again and I've come across something that confuses me.

The text for the 'SEQ' field is:
"query SEQuence; = for a match to the reference; n/N/. for ambiguity; cases are not maintained"

In practice, I've never seen a "=" in the SEQ field; is this supposed to be an optional way of using the format, i.e. using a combination of "=" and "N" instead of the actual query sequence?

Thanks,
Bio
06-29-2010, 06:31 AM   #2
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

You're right, that is somewhat confusing.
By chance, I just read on the samtools manual page that samtools fillmd using option -e can "convert the read base to = if it is identical to the aligned reference base. Indel caller does not support the = bases at the moment."

I guess reporting the actual query sequence (along with the CIGAR string) makes much more sense in terms of computation because you don't need to look up the bases in the reference. And by using N, you'd lose all SNP information.
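To make the reference lookup concrete, here is a minimal Python sketch of what replacing matching bases with "=" involves, and how the original read is recovered afterwards. It is not samtools' implementation: the function names `mask_matches` and `unmask_matches` are hypothetical, and the sketch assumes the whole reference is available as an in-memory string and that POS is 1-based as in the SAM spec.

```python
import re

# CIGAR operations as defined in the SAM spec: a length followed by an op code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def _walk(seq, cigar, pos, reference, convert):
    """Walk the alignment and apply convert(read_base, ref_base) to aligned bases.

    pos is the 1-based leftmost mapping position (the SAM POS field).
    Inserted and soft-clipped bases have no reference partner and are copied as-is.
    """
    out = []
    q = 0        # 0-based offset into the read
    r = pos - 1  # 0-based offset into the reference
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":            # consumes read and reference
            for i in range(n):
                out.append(convert(seq[q + i], reference[r + i]))
            q += n
            r += n
        elif op in "IS":           # consumes read only
            out.append(seq[q:q + n])
            q += n
        elif op in "DN":           # consumes reference only
            r += n
        # H and P consume neither read nor reference
    return "".join(out)


def mask_matches(seq, cigar, pos, reference):
    """Replace each read base identical to its aligned reference base with '='."""
    return _walk(seq, cigar, pos, reference,
                 lambda b, ref: "=" if b.upper() == ref.upper() else b)


def unmask_matches(masked, cigar, pos, reference):
    """Restore the read sequence by looking up each '=' in the reference."""
    return _walk(masked, cigar, pos, reference,
                 lambda b, ref: ref if b == "=" else b)


reference = "ACGTACGTAC"
masked = mask_matches("GTTCG", "5M", 3, reference)
print(masked)                                      # ==T==
print(unmask_matches(masked, "5M", 3, reference))  # GTTCG
```

Mismatches stay visible in the masked SEQ, so SNP information is kept, but any tool that needs the full read has to do the reference lookup shown in `unmask_matches`.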
06-29-2010, 07:05 AM   #3
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46

Thanks epigen, I guess it's an optional way of using the format then.
04-25-2012, 05:26 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Using = in the SEQ field is useful for compressing the data, see:
http://blastedbio.blogspot.co.uk/201...mpression.html

(I know it is an old post, but the new similar thread feature of the forum brought it to my attention)
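As a rough illustration of the compression point, the sketch below compares the zlib-compressed size of plain read sequences with the same reads written using "=" for matching bases. Everything in it is an assumption made for the example: the reference and reads are synthetic random data, each read is 50 bases with exactly one mismatch and no indels, and zlib stands in for whatever compression a real BAM pipeline applies. It is not taken from the linked blog post.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the toy data is reproducible

READ_LEN = 50
N_READS = 200

# Synthetic reference; real data would come from a FASTA file.
reference = "".join(rng.choice("ACGT") for _ in range(100_000))

plain_reads = []
masked_reads = []
for _ in range(N_READS):
    start = rng.randrange(len(reference) - READ_LEN)
    bases = list(reference[start:start + READ_LEN])

    # Introduce exactly one mismatch at a random position.
    i = rng.randrange(READ_LEN)
    bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])

    plain_reads.append("".join(bases))
    # Same read with every matching base written as '='.
    masked_reads.append("=" * i + bases[i] + "=" * (READ_LEN - 1 - i))

plain_size = len(zlib.compress("\n".join(plain_reads).encode("ascii")))
masked_size = len(zlib.compress("\n".join(masked_reads).encode("ascii")))

print("plain SEQ, compressed bytes: ", plain_size)
print("masked SEQ, compressed bytes:", masked_size)
```

Long runs of "=" are highly repetitive, which is why the masked form compresses better on data like this; how large the saving is on real alignments depends on the mismatch rate and on how the other SAM fields compress.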