Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Variant call in directional RNA-seq

    Hi,
    I'm looking for single nucleotide variants (SNV) in my RNA-seq data. But after following bowtie mapping -> samtools mpileup -> bcftools pipeline; I see that it doesn't keep the directionality of the reads. It just take the reference and the alternate, ignoring if the mismatch is based on the forward or the reverse strand. So after looking at all the parameter that these programs can take into, I'm not able to find anything regarding this issue. The question seems pretty stupid and I'm feeling so because I can't find an answer for that; how to maintain the directionality in these case?

    Thanks

  • #2
    Ok, I'll try to explain with some data:
    samtools|bcftools find this variant:

    Code:
    scaffold_739	50165	.	A	G	21	.	DP=39;VDB=0.0011;AF1=0.5004;AC1=1;DP4=0,11,0,28;MQ=20;FQ=3.54;PV4=1,3.8e-43,1,0.13
    When you trace it you find the region in your bam file (or mpileup) like:

    Code:
    EAS1745:7:120:13118:14919	16	scaffold_739	50153	255	30M	*	0	0	ATGCAATAAACG*G*CATTGTCTCTGTTGTCA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIII	XA:i:1	MD:Z:12A17	NM:i:1
    where the "16" means it is the reverse complimentary of the read, with the variant between * (I added that). But actually the read is
    >EAS1745:7:120:13118:14919
    TGACAACAGAGACAATG*C*CGTTTATTGCAT

    So the real variant should be T->C instead of A->G as the becftools calls it (also happens with VarScan). Is there a way to do this correction without going one by one read?

    cheers

    Comment


    • #3
      Originally posted by cascoamarillo View Post
      So the real variant should be T->C instead of A->G as the becftools calls it (also happens with VarScan). Is there a way to do this correction without going one by one read?
      A->G on the Top|Forward|Watson strand is equally as real as the T->C on the Bottom|Reverse|Crick strand. Variants are always reported as the change to the reference sequence which in this case is A->G. The VCF file is correct; if you try to "fix" it you will actually be breaking it.

      Comment


      • #4
        Thanks for the reply and for your point of view. But in my case (directional RNA-Seq), you can have reads matching the forward or the reverse strand and I want to see the RNA-editing that could happend in those real reads (it is not paired-end RNA-seq run).

        In other words, in an ADAR (adenosine deaminase) is acting in one point of a transcript, editing C->U, you should be able to detect the C->T in the RNA-seq reads (comparing it with the reference). But if the reads is matching the reverse strand, what you get is G->A, which is not the real editing that is happenig in the cell.
        I'm just sorprised that samtools|bcftools is not able to distinguish those scenarios; or at least I'm not able to find it.

        Comment


        • #5
          Directionality is preserved at the alignment step, so you could split your data based on alignment orientation and then proceed with variant calling on the two data sets.

          Comment


          • #6
            Originally posted by cascoamarillo View Post
            Thanks for the reply and for your point of view. But in my case (directional RNA-Seq), you can have reads matching the forward or the reverse strand and I want to see the RNA-editing that could happend in those real reads (it is not paired-end RNA-seq run).

            In other words, in an ADAR (adenosine deaminase) is acting in one point of a transcript, editing C->U, you should be able to detect the C->T in the RNA-seq reads (comparing it with the reference). But if the reads is matching the reverse strand, what you get is G->A, which is not the real editing that is happenig in the cell.
            I'm just sorprised that samtools|bcftools is not able to distinguish those scenarios; or at least I'm not able to find it.
            It's not surprising at all since samtools/bcftools are adhering to the universally accepted practice of always reporting variation with respect to the published reference.

            The problem lies in the fact that you are trying to use a genomic reference to describe events occurring at the transcript level. Instead of aligning your reads to a genome reference, align them to a reference made of the transcripts from your organism. That way each transcript is a separate entity and is represented in its coding direction. The drawback with this approach is the presence of multiple isoforms of the same gene being present in your reference set. This will mean many reads will map to multiple transcripts (i.e. shared exons in isoforms). You will have to adjust your mapping/reporting parameters to permit multiple alignments.

            Comment


            • #7
              Thanks for the replies!
              You are right kmcarr, it should be better to compare it with a transcript reference; but anyway I still have sense and antisense reads to the transcripts (yes, it's crazy stuff). I'll follow HESmith suggestion and split the bam files in two. Thanks guys.

              Comment


              • #8
                Originally posted by cascoamarillo View Post
                anyway I still have sense and antisense reads to the transcripts (yes, it's crazy stuff)
                Maybe more interesting than crazy..

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Current Approaches to Protein Sequencing
                  by seqadmin


                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                  04-04-2024, 04:25 PM
                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 04-11-2024, 12:08 PM
                0 responses
                30 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 10:19 PM
                0 responses
                32 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 09:21 AM
                0 responses
                28 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-04-2024, 09:00 AM
                0 responses
                53 views
                0 likes
                Last Post seqadmin  
                Working...
                X