**maubp** · 01-18-2013, 05:31 PM

A similar issue was reported on the CRAM mailing list last year; it turned out to be a Picard bug:

Template length is off by 1 · Issue #20 · vadimzalunin/crammer

https://github.com/vadimzalunin/crammer/issues/20

Code:

$ samtools view NA12878.mapped.illumina.mosaik.CEU.exome.20110411.chr20.bam 20:64000-65000 | cut -f -9
SRR098401.102568768 147 20 63932 64 76M = 63912 -95
SRR098401.7377046 99 20 63987 65 76M = 641...

What version of the cram-tools do you have?

**BAMseek** · 01-19-2013, 01:33 AM

The magnitude of TLEN is "the number of bases from the leftmost mapped base to the rightmost mapped base".

So you should be able to check the other read and calculate TLEN to see which is right.

Assuming that the other read also has 76 matches, then I would think

TLEN = 69404 + 76 - 69171 = 309.
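The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "leftmost mapped base to rightmost mapped base" definition; the function name `tlen_magnitude` and its parameters are made up for this example, and the 76 aligned bases for both reads are the same assumption as in the calculation above.

```python
def tlen_magnitude(pos1, ref_len1, pos2, ref_len2):
    """Number of bases from the leftmost to the rightmost mapped base.

    pos1/pos2 are 1-based leftmost mapping positions (the SAM POS field);
    ref_len1/ref_len2 are the number of reference bases each read covers.
    """
    start = min(pos1, pos2)
    # Last reference base covered by a read is pos + ref_len - 1.
    end = max(pos1 + ref_len1 - 1, pos2 + ref_len2 - 1)
    return end - start + 1


# Positions from this thread, assuming both reads align as 76M.
print(tlen_magnitude(69171, 76, 69404, 76))  # 309
```

In the SAM record the sign then depends on which read is leftmost: the leftmost read gets a positive TLEN and the rightmost a negative one.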

Which aligner did you use for the original BAM file?

Justin

**priesgo** · 01-21-2013, 01:53 AM

Thanks for your answers!
maubp, I am using version 1.00-b244, which is linked against Picard 1.79.
Justin, this data is an Illumina exome from the 1000 Genomes Project. According to the project information, it was aligned with Mosaik (I don't know which version).

By the way, this is the mate:

Code:

SRR107049.155163702     83      1       69404   39      76M     =       69171   -308    TGGTGACCCCCATAGCCATGGGCTGTGACAGATATAGAGCAATATGCAAGCCCCTACACTACACTACAATTATGTG    ##############################################EABFCC@G?>E=E;<H>F>>F@>B=>>B>S    RG:Z:SRR107049  NM:i:4  OQ:Z:##############################################?CBECCAEC=>;A<:FBDCBDEDDDDADDD

The value of 309 also makes sense to me. By the way, Justin, I saw you assumed that this read had a length of 76, just like its mate. That is correct, as you can see above, but do mates usually have the same length?

Pablo.

**BAMseek** · 01-21-2013, 12:51 PM

Hi Pablo,
Yeah, it looks like Picard might be giving the right answer on this. I'm not sure which downstream tools rely on TLEN; the only ones I can think of are possibly the genome viewers.

Paired-end reads won't necessarily be the same length (for example, if the reads were trimmed to remove adapters). And even when they are the same length, they may not have the same number of aligned bases. I think the length of an alignment on the reference is #M (match/mismatch) + #D (deletions) + #N (skips), plus #= and #X if the aligner writes those operators. Soft clips (bases at the ends of the read that aren't aligned) don't count towards the length of the alignment. I just got lucky that your mate also happened to be 76M.
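As a rough sketch of that rule, the aligned length on the reference can be computed from the CIGAR string like this. The function name `reference_length` is made up for the example; besides M, D and N it also counts the `=` and `X` operators, which consume reference bases in the same way according to the SAM specification.

```python
import re

# One CIGAR element: a count followed by a single operator character.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that consume reference bases.
REF_CONSUMING = set("MDN=X")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment with this CIGAR."""
    return sum(
        int(count)
        for count, op in _CIGAR_OP.findall(cigar)
        if op in REF_CONSUMING
    )


print(reference_length("76M"))        # 76
print(reference_length("5S70M1D1M"))  # 72: the soft clip is not counted
```

The rightmost mapped base of a read is then POS + reference_length(CIGAR) - 1, which is the value needed to check TLEN.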

best,
Justin

**priesgo** · 01-22-2013, 12:10 AM

Thanks again Justin,

Yes, it seems to be an issue with the original file; I don't know whether it comes from Mosaik or somewhere else.

Pablo.

CRAM compression and TLEN SAM's field