Seqanswers Leaderboard Ad

**cascoamarillo** · 12-12-2012, 03:31 PM

Ok, I'll try to explain with some data:
samtools|bcftools find this variant:

Code:

scaffold_739	50165	.	A	G	21	.	DP=39;VDB=0.0011;AF1=0.5004;AC1=1;DP4=0,11,0,28;MQ=20;FQ=3.54;PV4=1,3.8e-43,1,0.13

When you trace it you find the region in your bam file (or mpileup) like:

Code:

EAS1745:7:120:13118:14919	16	scaffold_739	50153	255	30M	*	0	0	ATGCAATAAACG*G*CATTGTCTCTGTTGTCA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIII	XA:i:1	MD:Z:12A17	NM:i:1

where the "16" means it is the reverse complimentary of the read, with the variant between * (I added that). But actually the read is
>EAS1745:7:120:13118:14919
TGACAACAGAGACAATG*C*CGTTTATTGCAT

So the real variant should be T->C instead of A->G as the becftools calls it (also happens with VarScan). Is there a way to do this correction without going one by one read?

cheers

**kmcarr** · 12-12-2012, 07:44 PM

Originally posted by cascoamarillo View Post

So the real variant should be T->C instead of A->G as the becftools calls it (also happens with VarScan). Is there a way to do this correction without going one by one read?

A->G on the Top|Forward|Watson strand is equally as real as the T->C on the Bottom|Reverse|Crick strand. Variants are always reported as the change to the reference sequence which in this case is A->G. The VCF file is correct; if you try to "fix" it you will actually be breaking it.

**cascoamarillo** · 12-13-2012, 08:38 AM

Thanks for the reply and for your point of view. But in my case (directional RNA-Seq), you can have reads matching the forward or the reverse strand and I want to see the RNA-editing that could happend in those real reads (it is not paired-end RNA-seq run).

In other words, in an ADAR (adenosine deaminase) is acting in one point of a transcript, editing C->U, you should be able to detect the C->T in the RNA-seq reads (comparing it with the reference). But if the reads is matching the reverse strand, what you get is G->A, which is not the real editing that is happenig in the cell.
I'm just sorprised that samtools|bcftools is not able to distinguish those scenarios; or at least I'm not able to find it.

**HESmith** · 12-13-2012, 09:05 AM

Directionality is preserved at the alignment step, so you could split your data based on alignment orientation and then proceed with variant calling on the two data sets.

**kmcarr** · 12-13-2012, 09:35 AM

Originally posted by cascoamarillo View Post

Thanks for the reply and for your point of view. But in my case (directional RNA-Seq), you can have reads matching the forward or the reverse strand and I want to see the RNA-editing that could happend in those real reads (it is not paired-end RNA-seq run).

In other words, in an ADAR (adenosine deaminase) is acting in one point of a transcript, editing C->U, you should be able to detect the C->T in the RNA-seq reads (comparing it with the reference). But if the reads is matching the reverse strand, what you get is G->A, which is not the real editing that is happenig in the cell.
I'm just sorprised that samtools|bcftools is not able to distinguish those scenarios; or at least I'm not able to find it.

It's not surprising at all since samtools/bcftools are adhering to the universally accepted practice of always reporting variation with respect to the published reference.

The problem lies in the fact that you are trying to use a genomic reference to describe events occurring at the transcript level. Instead of aligning your reads to a genome reference, align them to a reference made of the transcripts from your organism. That way each transcript is a separate entity and is represented in its coding direction. The drawback with this approach is the presence of multiple isoforms of the same gene being present in your reference set. This will mean many reads will map to multiple transcripts (i.e. shared exons in isoforms). You will have to adjust your mapping/reporting parameters to permit multiple alignments.

**cascoamarillo** · 12-13-2012, 12:57 PM

Thanks for the replies!
You are right kmcarr, it should be better to compare it with a transcript reference; but anyway I still have sense and antisense reads to the transcripts (yes, it's crazy stuff). I'll follow HESmith suggestion and split the bam files in two. Thanks guys.

**syfo** · 12-14-2012, 01:40 AM

Originally posted by cascoamarillo View Post

anyway I still have sense and antisense reads to the transcripts (yes, it's crazy stuff)

Maybe more interesting than crazy..

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Variant call in directional RNA-seq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News