#1 |
Member
Location: Europe Join Date: Apr 2010
Posts: 46
|
Hiya,
I was wondering if anyone can shed light on the following question about hard clipping in the SAM format (I'm not able to pinpoint an answer in the spec): If a query read aligns with H's on either end of its CIGAR (i.e. it is hard-clipped), I understand the query sequence itself is truncated in the SEQ field. My question is: in these cases, should POS reflect (a) the index of the first aligned base in the reference, or (b) the index of the reference base corresponding to the first H? Thanks for your time. Bio. |
#2 |
Member
Location: Retirement - Not working with bioinformatics anymore. Join Date: Apr 2010
Posts: 63
|
I don't recall having seen a SAM file with an H at the beginning of the CIGAR, only at the end. If the H is at the end (or somehow in the middle) then it shouldn't affect the position.
I would guess the position reflects the first base corresponding to the first H, though that may not be true. My reasoning is: if you're not starting from the first H, why even include the hard clipping in the first place? Since hard-clipped sequence isn't included in the reported alignment, the only real reason for including it is to indicate the offset from the stated position. That being said, I'm just guessing here. If somebody knows conclusively, go with whatever they tell you. Good luck. |
#3 |
Member
Location: Europe Join Date: Apr 2010
Posts: 46
|
Thanks mrawlins! I'm leaning towards that interpretation too.
I'm trying to get a definitive answer since I'm in the process of writing some code that supports SAM, and I'd prefer not to add another non-compliant application to the mix! Does anybody have something conclusive? Thanks for your time. |
#4 | |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]() Quote:
|
|
#5 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
|
SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?
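For what it's worth, here is a minimal Python sketch of how I read that definition. It assumes (as later revisions of the spec state more explicitly) that POS is the leftmost reference position of the first CIGAR operation that consumes a reference base, so H and S contribute nothing to the reference coordinates. The function name `reference_span` is just for illustration and is not from any library.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference. H and S (clipping),
# I (insertion) and P (padding) do not.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return (start, end) as 1-based inclusive reference coordinates.

    Assumes POS already points at the first aligned base, so leading
    hard or soft clips do not shift the start.
    """
    ref_len = sum(
        int(length)
        for length, op in CIGAR_OP.findall(cigar)
        if op in REF_CONSUMING
    )
    return pos, pos + ref_len - 1


if __name__ == "__main__":
    # A read hard-clipped on both ends: the span depends only on the 10M.
    print(reference_span(100, "5H10M3H"))
```

Under this reading, a record with POS 100 and CIGAR `5H10M3H` covers the same reference bases as one with POS 100 and CIGAR `10M`.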
|
#6 |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
#7 |
Member
Location: Europe Join Date: Apr 2010
Posts: 46
|
Thanks guys.
lh3, I'm afraid I still find that a bit ambiguous. Does "clipped sequence" refer to the "sequence that is clipped" (implying the whole thing, including the clipped part) or the "region of the sequence that remains after clipping"? Which way are you taking it? Thanks |
#8 |
Member
Location: Europe Join Date: Apr 2010
Posts: 46
|
As I think about it, the second one seems to make the most sense - "region of the sequence that remains after clipping".
Thanks for your input everyone. |
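In case it helps anyone writing a parser: if POS is taken as the first aligned base (interpretation (a) from the original question), then the coordinate described in (b) can still be recovered by subtracting the leading clip lengths from POS. Below is a rough Python sketch of that arithmetic; `leading_clip` and `unclipped_start` are hypothetical helper names of mine, and I am assuming leading soft clips should be subtracted along with hard clips.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def leading_clip(cigar):
    """Total number of bases clipped (H or S) before the first other op."""
    total = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op not in "SH":
            break
        total += int(length)
    return total


def unclipped_start(pos, cigar):
    """Reference position where the read would start if nothing were clipped.

    Assumes POS is the first aligned base, i.e. clipping does not shift it.
    The result can be less than 1 when the clipped bases hang off the
    start of the reference.
    """
    return pos - leading_clip(cigar)


if __name__ == "__main__":
    print(unclipped_start(100, "5H10M"))  # first aligned base is still 100
```

So the H at the start of a CIGAR is not wasted information under interpretation (a): it records how far the original read extended to the left of POS.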
Tags |
hard clip, sam |