SEQanswers

Old 06-14-2010, 10:01 AM   #1
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
Default SAM Format - Hard Clip

Hiya,

I was wondering if anyone can shed light on the following question about hard clipping in the SAM format (I'm not able to pinpoint an answer in the spec):

If a query read aligns with H's on either end (i.e. hard-clipped), I understand the query sequence itself is truncated in the SEQ field.
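To illustrate the truncation described above, here is a minimal sketch (not taken from the spec or from samtools; the function names are made up for illustration). It assumes the usual SAM rule that only the M, I, S, = and X operations account for bases stored in SEQ, so hard-clipped (H) bases are absent from SEQ while soft-clipped (S) bases are still present.

```python
import re

# One CIGAR element: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations whose bases are stored in the SEQ field (H is deliberately absent).
SEQ_CONSUMING = set("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5H20M3S' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def expected_seq_length(cigar):
    """Number of bases that should appear in SEQ for this CIGAR."""
    return sum(n for n, op in parse_cigar(cigar) if op in SEQ_CONSUMING)


def original_read_length(cigar):
    """Length of the read before hard clipping (SEQ bases plus H bases)."""
    return sum(n for n, op in parse_cigar(cigar)
               if op in SEQ_CONSUMING or op == "H")


if __name__ == "__main__":
    cigar = "5H20M3S"
    print(expected_seq_length(cigar))   # bases present in SEQ
    print(original_read_length(cigar))  # bases in the original read
```

For the CIGAR `5H20M3S`, SEQ holds the 20 matched bases plus the 3 soft-clipped ones; the 5 hard-clipped bases only show up in the CIGAR.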

My question is: in these cases, should POS reflect
(a) the index of the first aligned base in the reference, or
(b) the index of the reference base corresponding to the first H?

Thanks for your time.
Bio.
Old 06-14-2010, 10:34 AM   #2
mrawlins
Member
 
Location: Retirement - Not working with bioinformatics anymore.

Join Date: Apr 2010
Posts: 63
Default

I don't recall having seen a SAM file with an H at the beginning of the CIGAR, only at the end. If the H is at the end (or somehow in the middle) then it shouldn't affect the position.

I would guess the position reflects the first base corresponding to the first H, though that may not be true. My reasoning is: if you're not starting from the first H, why include the hard clipping in the first place? Since hard-clipped sequence isn't included in the reported alignment, the only real reason for including it is to indicate the offset from the stated position.

That being said, I'm just guessing here. If somebody knows conclusively, go with whatever they tell you.

Good Luck
Old 06-14-2010, 10:45 AM   #3
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
Default

Thanks mrawlins! I'm leaning towards that interpretation too.

I'm trying to get a definitive stance since I'm in the process of writing some code that supports sam, and I'd prefer not to add another non-compliant application to the mix!

Anybody have something conclusive?
Thanks for your time
Old 06-14-2010, 10:49 AM   #4
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Bio.X2Y View Post
Thanks mrawlins! I'm leaning towards that interpretation too.

I'm trying to get a definitive stance since I'm in the process of writing some code that supports sam, and I'd prefer not to add another non-compliant application to the mix!

Anybody have something conclusive?
Thanks for your time

Since the spec. is ambiguous in this case, and it probably should not be, could you send your email to the samtools help list (samtools-help@lists.sourceforge.net)? I think the spec. could benefit from this question.
Old 06-14-2010, 11:04 AM   #5
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?
Old 06-14-2010, 11:12 AM   #6
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by lh3 View Post
SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?

I was reading 1.4.3, since this is where you would ask such a question. Very clear, sorry.
Old 06-14-2010, 11:15 AM   #7
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
Default

Thanks guys.

lh3, I'm afraid I still find that a bit ambiguous:

does "clipped sequence" refer to the "sequence that is clipped" (implying the whole thing, including the clipped part) or the "region of the sequence that remains after clipping"?

Which way are you taking it?

Thanks
Old 06-14-2010, 11:38 AM   #8
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
Default

As I think about it, the second one seems to make the most sense - "region of the sequence that remains after clipping".

Thanks for your input everyone.
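A minimal sketch of that reading, i.e. option (a) from the first post: POS is the reference coordinate of the first aligned base, and leading H (or S) operations do not shift it. This is only an illustration under that interpretation, not code from samtools; `reference_span` and `unclipped_start` are hypothetical helper names. It assumes the usual rule that M, D, N, = and X are the operations that advance along the reference.

```python
import re

# One CIGAR element: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference; clips (S, H) do not.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return (start, end), 1-based inclusive, of the aligned region.

    POS is used as given: it already points at the first aligned base,
    so leading clips are ignored when locating the start.
    """
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    return pos, pos + ref_len - 1


def unclipped_start(pos, cigar):
    """Where the read would begin if its leading clips had been aligned.

    This is what option (b) would have stored in POS; here it has to be
    derived by subtracting the leading S/H lengths.
    """
    clipped = 0
    for n, op in CIGAR_OP.findall(cigar):
        if op not in "SH":
            break
        clipped += int(n)
    return pos - clipped


if __name__ == "__main__":
    print(reference_span(100, "5H20M"))   # aligned region on the reference
    print(unclipped_start(100, "5H20M"))  # hypothetical start including the clip
```

With POS = 100 and CIGAR `5H20M`, the aligned region covers reference positions 100 to 119; the 5 hard-clipped bases would have sat at 95 to 99, but that offset is not encoded in POS.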
Tags
hard clip, sam

All times are GMT -8.