Thread | Thread Starter | Forum | Replies | Last Post |
RNA-seq SNP calling softwore | huangjun | RNA Sequencing | 8 | 07-23-2013 12:51 AM |
De novo SNP calling in absence of complete reference assembly | fcr | De novo discovery | 15 | 09-21-2012 03:34 AM |
Editing fasta , reference base in snp calling samtools | moriah | Bioinformatics | 2 | 08-10-2011 12:11 AM |
SNP calling from a reference sequence | blackrabite | Genomic Resequencing | 2 | 05-21-2011 09:48 PM |
Hierarchical reference-free SNP calling | Marius | Bioinformatics | 1 | 12-27-2010 09:38 AM |
#1 |
Member
Location: California Join Date: Dec 2010
Posts: 21
I am working on a project that seeks to call SNPs for a non-model organism with no existing reference genome or transcriptome using multiplexed Illumina RNA-seq data.
I used Trinity to assemble a partial 'reference' transcriptome of the most highly expressed transcripts for which we had sufficient coverage, as well as many fragments of lower-expressed transcripts. Then I used BWA to map all data for multiple individuals back to that reference, and finally used GATK to call SNPs.

However, I am running into an issue where reads derived from paralogous genes or a multigene family are mapping back to the same reference contig, creating false SNPs at divergent positions. My evidence for this is that, in general, one 'allele' (actually a slightly divergent gene) is supported by significantly fewer than half of the reads for a given individual that is called a heterozygote. These 'SNPs' are also generally observed across several individuals, leading me to believe that they are not sequencing/library prep errors.

I think that I will be able to identify these cases with some statistic, but I am wondering if there is a good way to modify the corresponding SAM files to remove the mis-mapped reads, then re-genotype. Has anyone else encountered similar issues, and if so, how did you deal with them?

Last edited by shoegame2001; 12-06-2011 at 03:49 PM.
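One possible form of the "some statistic" mentioned in this post is an exact binomial test on allele balance: at a true heterozygous site, reference and alternative reads are expected in roughly equal numbers, so a strong skew hints at collapsed paralogs. The following is a minimal sketch under that assumption only. The function names and the `alpha` threshold are hypothetical, it is not part of GATK or any other tool, and it ignores allele-specific expression, which can also skew read counts in RNA-seq.

```python
from math import exp, lgamma, log


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    observing k successes in n trials.
    """
    def log_pmf(i: int) -> float:
        # log of C(n, i) * p^i * (1 - p)^(n - i), computed in log space
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))

    probs = [exp(log_pmf(i)) for i in range(n + 1)]
    # Small relative tolerance so symmetric outcomes are treated as ties.
    cutoff = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))


def flag_skewed_het(ref_reads: int, alt_reads: int, alpha: float = 0.01) -> bool:
    """Return True if a called heterozygote deviates from a 50:50 read ratio.

    alpha is an illustrative threshold, not a recommended value.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return False  # no coverage, nothing to test
    return binom_two_sided_p(alt_reads, n) < alpha


# Hypothetical read counts for two heterozygous calls
print(flag_skewed_het(45, 5))   # strongly skewed site
print(flag_skewed_het(26, 24))  # balanced site
```

Sites flagged in several individuals at once would be the most suspicious, since the post notes that the false 'SNPs' recur across individuals.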
#2 | |
Junior Member
Location: Taiwan Join Date: Dec 2011
Posts: 5
Hi, friend,
You may try my program: EBARDenovo for RNA-Seq. https://sourceforge.net/projects/ebardenovo It's a 64-bit Windows command-line program that runs on .NET. EBARDenovo can assemble lower-expressed transcripts even when their coverage depth is very low (e.g., 1.5). Frank H.T. Chu from Taiwan
#3 |
Junior Member
Location: Wa. Join Date: Dec 2011
Posts: 7
I'm in the same boat, my friend. Right now I am using Oases to assemble; after trialing several assembly programs, I found it did the best work with my transcriptomes. I then ran SOAPaligner in conjunction with SOAPsnp. This trial is still underway; I will update you as soon as I compile my results. I would love to hear whether you have made any progress using different programs or pipelines.
Thanks
Last edited by Nico55; 12-14-2011 at 06:38 PM.
#4 |
Member
Location: montreal Join Date: Jan 2011
Posts: 31
Hi all,
I also tried Oases for de novo transcriptome assembly and am quite satisfied with the output. But now I am wondering how to obtain SNP positions from a de novo assembly. Can we just rely on the SNP positions reported by variant callers (e.g. samtools, GigaBayes, FreeBayes), or do we need to write an in-house script? In my case, I'm working with a diploid plant. Some people say that makes it easier, but for me it's still a challenge. I hope to hear comments from you guys. Thanks!
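On the in-house script question: if the variant caller writes VCF, the contig name and 1-based position of each variant are already in the first two columns, so a short script may be enough to list SNP positions on the assembled contigs. Below is a minimal sketch that assumes plain tab-separated VCF text. The function name `snp_positions` and the contig names in the example are made up for illustration, and it does no quality filtering.

```python
def snp_positions(vcf_lines):
    """Yield (contig, pos, ref, alt) for single-base substitutions.

    Expects an iterable of VCF text lines. Header lines, indels, and
    records without an alternative allele are skipped.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the column header, and blank lines
        fields = line.rstrip("\n").split("\t")
        contig, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # ALT may list several alleles
            if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
                yield contig, pos, ref, alt


# Hypothetical records; contig names are invented for the example
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_0001\t152\t.\tA\tG\t50\tPASS\t.",
    "contig_0001\t310\t.\tAT\tA\t40\tPASS\t.",   # deletion, skipped
    "contig_0002\t77\t.\tC\tT,G\t60\tPASS\t.",   # two alternative alleles
]

for record in snp_positions(example_vcf):
    print(record)
```

To read a real file, the same function can be called on an open file handle, e.g. `snp_positions(open(path))`, where `path` points to the caller's uncompressed VCF output.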
#5 |
Senior Member
Location: China Join Date: Sep 2009
Posts: 199
Hi shoegame2001,
Did you figure out a solution to your problem? I'm currently facing the same one. I have Illumina paired-end RNA-seq reads and a reference transcriptome, but I have no idea how to get SNP calls from my data set. Thanks for any advice.
#6 |
Member
Location: California Join Date: Dec 2010
Posts: 21
As far as I can tell, there is no software designed for SNP calling in RNA-seq data in the absence of a reference genome. Aligning reads back to a de novo assembled transcriptome and then filtering on (1) the proportion of reads supporting the alternative allele in called heterozygotes and (2) deviation from Hardy-Weinberg equilibrium gives a more reliable SNP set, but I am afraid some false positives still slip through.
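The Hardy-Weinberg part of this filter can be sketched as a chi-square goodness-of-fit test on genotype counts at one site. This is only an illustration under stated assumptions: the function names are hypothetical, the individuals are assumed to come from one randomly mating population, and the chi-square approximation is rough for small samples, where an exact test would be safer. Collapsed paralogs tend to show up as an excess of heterozygotes, so the sketch also reports that excess.

```python
from math import erfc, sqrt


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg proportions.

    Takes counts of the two homozygous genotypes and the heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        return 1.0  # monomorphic site, nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom
    return erfc(sqrt(chi2 / 2.0))


def het_excess(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Observed minus expected heterozygote fraction (positive means excess)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return n_ab / n - 2 * p * (1 - p)


# Hypothetical genotype counts: every individual called heterozygous
print(hwe_chi2_p(0, 20, 0) < 0.001)
print(het_excess(0, 20, 0))
```

A site where nearly every individual is called heterozygous, as in the example counts, is the pattern expected when two diverged gene copies map to one contig.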
#7 | |
Junior Member
Location: Taiwan Join Date: Dec 2011
Posts: 5
Hi, friends,
You may try my program: EBARDenovo for RNA-Seq. EBARDenovo can now output SNP locations in the contigs with the parameter (-P). Please check: https://sourceforge.net/projects/ebardenovo It's a 64-bit Windows command-line program that runs on .NET. You can run it on a Windows PC with 16 GB of RAM for 30-40 GB of FASTQ RNA-Seq data. In our experiments, EBARDenovo is more accurate than Trinity and Oases. Hsueh-Ting Chu