SEQanswers


Bio.X2Y 06-14-2010 11:01 AM

SAM Format - Hard Clip
 
Hiya,

I was wondering if anyone can shed light on the following question about hard clipping in the SAM format (I'm not able to pinpoint an answer in the spec):

If a query read aligns with H's on either end (i.e. hard-clipped), I understand the query sequence itself is truncated in the SEQ field.
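
To make the truncation concrete, here is a minimal Python sketch (the helper name `expected_seq_length` is made up for illustration, not part of any SAM library). It assumes the usual rule that only the M, I, S, = and X operations correspond to bases stored in SEQ, so hard-clipped (H) bases are absent while soft-clipped (S) bases are still present.

```python
import re

# One CIGAR operation: a length followed by an op code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def expected_seq_length(cigar):
    """Sum the lengths of the CIGAR ops whose bases are stored in SEQ.

    Hypothetical helper: H (hard clip) is deliberately excluded,
    S (soft clip) is included.
    """
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")

print(expected_seq_length("5H20M3H"))  # hard-clipped ends are not in SEQ
print(expected_seq_length("5S20M3S"))  # soft-clipped ends are in SEQ
```

With hard clipping the 5 + 3 clipped bases do not count towards SEQ; with soft clipping they do.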

My question is: in these cases, should POS reflect
(a) the index of the first aligned base in the reference, or
(b) the index of the base in the reference corresponding to the first H?

Thanks for your time.
Bio.

mrawlins 06-14-2010 11:34 AM

I don't recall having seen a SAM file with an H at the beginning of the CIGAR, only at the end. If the H is at the end (or somehow in the middle) then it shouldn't affect the position.

I would guess the position reflects the first base corresponding to the first H, though that may not be true. My reasoning is, if you're not starting from the first H, why even include the hard clipping in the first place? Since hard-clipped sequence isn't included in the reported alignment, the only real reason for including it is to indicate the offset from the stated position.

That being said, I'm just guessing here. If somebody knows conclusively, go with whatever they tell you.

Good Luck

Bio.X2Y 06-14-2010 11:45 AM

Thanks mrawlins! I'm leaning towards that interpretation too.

I'm trying to get a definitive stance since I'm in the process of writing some code that supports sam, and I'd prefer not to add another non-compliant application to the mix!

Anybody have something conclusive?
Thanks for your time

nilshomer 06-14-2010 11:49 AM

Quote:

Originally Posted by Bio.X2Y (Post 20159)
Thanks mrawlins! I'm leaning towards that interpretation too.

I'm trying to get a definitive stance since I'm in the process of writing some code that supports sam, and I'd prefer not to add another non-compliant application to the mix!

Anybody have something conclusive?
Thanks for your time

Since the spec. is ambiguous in this case, and it probably should not be, could you send your email to the samtools help list (samtools-help@lists.sourceforge.net)? I think the spec. could benefit from this question.

lh3 06-14-2010 12:04 PM

SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?

nilshomer 06-14-2010 12:12 PM

Quote:

Originally Posted by lh3 (Post 20161)
SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?

I was reading 1.4.3, since this is where you would ask such a question. Very clear, sorry :o.

Bio.X2Y 06-14-2010 12:15 PM

Thanks guys.

lh3, I'm afraid I still find that a bit ambiguous -

does "clipped sequence" refer to the "sequence that is clipped" (implying the whole thing, including the clipped part) or the "region of the sequence that remains after clipping"?

Which way are you taking it?

Thanks

Bio.X2Y 06-14-2010 12:38 PM

As I think about it, the second one seems to make the most sense - "region of the sequence that remains after clipping".
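
A minimal Python sketch of that reading, assuming POS is the 1-based reference coordinate of the first aligned base, so leading H (or S) operations do not shift it. The function names `reference_span` and `unclipped_start` are hypothetical, chosen only for this example.

```python
import re

# One CIGAR operation: a length followed by an op code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference; clips (S, H) do not.
REF_CONSUMING = "MDN=X"

def reference_span(pos, cigar):
    """Return (start, end), 1-based inclusive, covered on the reference.

    Assumes POS already points at the first aligned base, so the
    start is POS itself regardless of any clipping in the CIGAR.
    """
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    return pos, pos + ref_len - 1

def unclipped_start(pos, cigar):
    """Where the read would start had the leading clipped bases aligned."""
    clipped = 0
    for n, op in CIGAR_OP.findall(cigar):
        if op not in "SH":
            break  # stop at the first non-clip operation
        clipped += int(n)
    return pos - clipped

print(reference_span(100, "5H20M3H"))   # same span as plain 20M at POS 100
print(unclipped_start(100, "5H20M3H"))  # POS minus the 5 hard-clipped bases
```

Under interpretation (b) the caller would instead have to add the leading clip length to POS before using it; with interpretation (a) the clip length is only needed if you want the unclipped start.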

Thanks for your input everyone.

