  • RNA-seq SNP-calling without a complete reference

    I am working on a project that seeks to call SNPs for a non-model organism with no existing reference genome or transcriptome using multiplexed Illumina RNA-seq data.

    I used Trinity to assemble a partial 'reference' transcriptome of the most highly expressed transcripts for which we had sufficient coverage, as well as many fragments of lower-expressed transcripts. Then I used BWA to map all data for multiple individuals back to that reference, and finally used GATK to call SNPs.

    However, I am running into an issue where reads derived from paralogous genes or a multigene family are mapping back to the same reference contig, creating false SNPs in divergent positions. My evidence of this is that in general one 'allele' (actually a slightly divergent gene) is supported by significantly fewer than half of the reads for a given individual that is called a heterozygote. These 'SNPs' are also generally observed across several individuals, leading me to believe that these are not sequencing/library prep errors.

    I think that I will be able to identify these cases with some statistic, but I am wondering if there is a good way to modify the corresponding SAM files to remove the mis-mapped reads, then re-genotype. Has anyone else encountered similar issues, and if so how did you deal with it?
    Last edited by shoegame2001; 12-06-2011, 03:49 PM.
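One way to approach the "modify the SAM files, then re-genotype" idea is to drop alignments that carry many mismatches before calling SNPs again. The sketch below is only an illustration, not a tested pipeline. It assumes the aligner wrote the optional `NM:i:` edit-distance tag, and the thresholds `min_mapq=20` and `max_nm=2` are arbitrary placeholders. With an incomplete reference, paralogous reads can still map with high MAPQ, so a mismatch cutoff will not catch every case and may also discard reads carrying true SNPs.

```python
def edit_distance(optional_fields):
    """Return the NM tag value (edit distance) from SAM optional fields, or None."""
    for field in optional_fields:
        if field.startswith("NM:i:"):
            return int(field[5:])
    return None


def keep_record(line, min_mapq=20, max_nm=2):
    """Decide whether a SAM line survives the filter.

    Header lines are always kept. Thresholds are assumed values for
    illustration and would need tuning for real data.
    """
    if line.startswith("@"):
        return True
    cols = line.rstrip("\n").split("\t")
    flag = int(cols[1])
    if flag & 0x4:                 # read is unmapped
        return False
    if int(cols[4]) < min_mapq:    # column 5 is MAPQ
        return False
    nm = edit_distance(cols[11:])  # optional fields start at column 12
    return nm is None or nm <= max_nm


def filter_sam(lines, min_mapq=20, max_nm=2):
    """Return only the SAM lines that pass keep_record."""
    return [l for l in lines if keep_record(l, min_mapq, max_nm)]


if __name__ == "__main__":
    example = [
        "@SQ\tSN:contig1\tLN:1000",
        "r1\t0\tcontig1\t10\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1",
        "r2\t0\tcontig1\t12\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:5",
    ]
    for kept in filter_sam(example):
        print(kept)
```

Note that dropping one read of a pair leaves the mate's flags inconsistent, so a paired-end workflow would need to remove or repair the mate as well before re-genotyping.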

  • #2
    Hi, friend,

    You may try my program: EBARDenovo for RNA-Seq.
    Download EBARDenovo for free. Highly-accurate de novo assembler of paired-end RNA-Seq. A highly-accurate search-based de novo assembler of paired-end RNA-Seq for advance transcriptomic study.



    It is a 64-bit Windows command-line program that requires .NET.

    EBARDenovo can assemble lower-expressed transcripts even when their coverage depth is very low (e.g., 1.5).


    Frank H.T. Chu from Taiwan


    • #3
      I’m in the same boat, my friend. Right now I am using Oases for assembly; after trialing several assembly programs, I found it did the best job with my transcriptomes. I then used SOAPaligner in conjunction with SOAPsnp. This trial is still underway, and I will update you as soon as I compile my results. I would love to hear whether you have made any progress using different programs or pipelines.
      Thanks
      Last edited by Nico55; 12-14-2011, 06:38 PM.


      • #4
        RNA-seq SNP-calling without a complete reference

        Hi all,

        I also tried Oases for de novo transcriptome assembly and am quite satisfied with the output.
        But now I am wondering how to obtain SNP positions from a de novo assembly.
        Can we just rely on the SNP positions reported by variant callers such as samtools, GigaBayes, or FreeBayes, or do we need to write an in-house script?

        In my case, I'm working with a diploid plant. Some people say that makes it easier, but for me it is still a challenge.

        Hope to hear comments from you guys.
        Thanks!


        • #5
          Hi shoegame2001,

          Did you figure out a solution to your problem?
          I'm currently facing the same problem.
          I have Illumina paired-end RNA-seq reads and a reference transcriptome.
          However, I have no idea how to get SNP calls from my data set.
          Thanks for any advice.


          • #6
            As far as I can tell, there is no software designed for SNP-calling in RNA-seq data in the absence of a reference genome. Aligning reads back to a de novo assembled transcriptome and then filtering based on the proportion of reads supporting the alternative allele in called heterozygotes as well as deviation from Hardy-Weinberg results in a more reliable SNP set, but I am afraid there are still false positives that slip through.
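The two filters mentioned in this post (read proportion supporting the alternative allele in called heterozygotes, and deviation from Hardy-Weinberg) could be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's actual script: the function names, the 0.3 allele-balance cutoff, and the use of a one-degree-of-freedom chi-square test at the 0.05 level (critical value 3.84) are all choices made here for the example. The chi-square approximation is also unreliable when few individuals are sampled.

```python
def allele_balance(ref_reads, alt_reads):
    """Fraction of reads supporting the alternative allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 0.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    # Skip classes with zero expectation to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_filters(het_read_counts, genotype_counts,
                   min_balance=0.3, max_chi2=3.84):
    """Keep a SNP only if every heterozygote is roughly balanced and the
    genotype counts do not deviate strongly from Hardy-Weinberg.

    het_read_counts: list of (ref_reads, alt_reads), one per heterozygote call.
    genotype_counts: (n_hom_ref, n_het, n_hom_alt) across individuals.
    Both thresholds are assumed values for illustration.
    """
    for ref_reads, alt_reads in het_read_counts:
        balance = allele_balance(ref_reads, alt_reads)
        if not (min_balance <= balance <= 1.0 - min_balance):
            return False
    return hwe_chi_square(*genotype_counts) <= max_chi2


if __name__ == "__main__":
    # Every individual called heterozygous: a typical paralog signature.
    print(passes_filters([(10, 9)], (0, 10, 0)))
    # Balanced heterozygote and genotype counts in HW proportions.
    print(passes_filters([(10, 9)], (5, 10, 5)))
```

A site where every individual is called heterozygous, or where the minor "allele" is consistently supported by far fewer than half of the reads, would be rejected by this kind of filter, which matches the pattern described at the start of the thread.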


            • #7
              Hi, friends,

              You may try my program: EBARDenovo for RNA-Seq.
              EBARDenovo can now output SNP locations in the contigs when run with the -P parameter.

              It is a 64-bit Windows command-line program that requires .NET.
              You can run it on a Windows PC with 16 GB of RAM for 30-40 GB of FASTQ RNA-Seq data.
              In our experiments, EBARDenovo was more accurate than Trinity and Oases.

              Hsueh-Ting Chu
