I would like to know whether I should include the intron size (CIGAR N) and soft-clipped bases (CIGAR S) in the TLEN field of a SAM file.
For example, if the positions and CIGAR strings of the two mates are as follows:
Mate 1: pos 1000 CIGAR 25M10000N25M
Mate 2: pos 11300 CIGAR 40M10S
I can think of four different TLEN values:
1. 11300 - 1000 + 40 = 10340 (include intron)
2. 11300 - 1000 + 40 + 10 = 10350 (include intron + softclip)
3. 11300 - 1000 + 40 - 10000 = 340 (don't include intron or softclip)
4. 11300 - 1000 + 40 + 10 - 10000 = 350 (include softclip but not intron)
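To make the arithmetic above easy to check, here is a minimal Python sketch that reproduces the four candidate values from the positions and CIGAR strings. The helper names (`cigar_lengths`, `tlen_candidates`) are my own and purely illustrative, and the calculation just mirrors the sums listed above (it only counts M, N and S). It does not say which of the four is the correct TLEN.

```python
import re

# Matches one CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return the total length per CIGAR operation letter."""
    totals = {}
    for length, op in CIGAR_RE.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    return totals


def tlen_candidates(pos1, cigar1, pos2, cigar2):
    """Compute the four candidate values listed in the question.

    Simplification: only M is counted as aligned length for mate 2,
    N is taken from mate 1, and S is taken from mate 2, exactly as in
    the hand calculation.
    """
    intron = cigar_lengths(cigar1).get("N", 0)
    mate2 = cigar_lengths(cigar2)
    aligned2 = mate2.get("M", 0)
    softclip2 = mate2.get("S", 0)

    base = pos2 - pos1 + aligned2  # includes the intron, no soft clip
    return {
        "intron": base,
        "intron + softclip": base + softclip2,
        "neither": base - intron,
        "softclip only": base + softclip2 - intron,
    }


candidates = tlen_candidates(1000, "25M10000N25M", 11300, "40M10S")
for label, value in candidates.items():
    print(f"{label}: {value}")
```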
Thank you.