
mlacencio 02-03-2015 03:06 PM

How can I distinguish assembly error from true splice or isoforms in RNA-seq?
1 Attachment(s)
Hi everyone!

I have received thousands of transcripts generated from a non-stranded RNA-seq library, and I have just annotated them (I used an in-house bash script to extract the most informative and frequent annotations from the BLAST hits in the XML results). However, I have found that many transcripts share the same annotation, e.g. two transcripts have both been annotated as 1-aminocyclopropane-1-carboxylate oxidase, and so on.

Please find attached a file containing some examples of BLAST alignments of these transcripts. Would you consider these cases to be the result of misassembly? How can I distinguish assembly errors from true splice variants or isoforms in RNA-seq? Moreover, can I treat two highly similar reverse-complement transcripts as a single transcript, since this is non-stranded RNA-seq?

Best regards,


Brian Bushnell 02-03-2015 03:53 PM

Many organisms have multiple copies of genes that may be slightly different. I would worry about classifying two sequences with only 89% or 95% identity as the same transcript. Assembly error rates should be far below that.

But it may depend on the organism... is it haploid or diploid/polyploid?
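To make the identity argument concrete, here is a minimal Python sketch of how percent identity could be computed from a pairwise alignment and compared against a cutoff. The function names and the 99% cutoff are assumptions chosen for illustration; the thread does not give a specific threshold, only that assembly error rates should be far below the 5-11% divergence seen here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    # A column matches only if both bases are equal and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical cutoff, not taken from the thread: pairs below it are
# treated as too divergent to be explained by assembly error alone.
ASSEMBLY_ERROR_CUTOFF = 99.0


def likely_distinct(aligned_a: str, aligned_b: str,
                    cutoff: float = ASSEMBLY_ERROR_CUTOFF) -> bool:
    """True if the pair is more divergent than the assumed cutoff."""
    return percent_identity(aligned_a, aligned_b) < cutoff
```

Under that assumed cutoff, pairs at 89% or 95% identity would be flagged as probably distinct sequences (gene copies or alleles) rather than one transcript with assembly errors.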

GenoMax 02-03-2015 05:16 PM

The examples you included in your file are good hits over most of the length of those contigs. What method did you use to assemble the contigs? What was the average depth that led to that consensus sequence? Since this is a non-stranded library, you have a 50-50 chance of sequencing either strand.
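Because either strand can be sequenced, a contig and its reverse complement can represent the same molecule. A minimal Python sketch of a strand-agnostic comparison follows; the function names are made up for illustration, and it only detects exact matches. Highly similar (not identical) pairs would still need an alignment tool such as BLAST run against both strands.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one strand-independent representative of a sequence:
    the lexicographically smaller of the sequence and its reverse
    complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def same_ignoring_strand(a: str, b: str) -> bool:
    """True if a and b are identical or exact reverse complements."""
    return canonical(a) == canonical(b)
```

For example, `same_ignoring_strand("ATGC", "GCAT")` is true because `GCAT` is the reverse complement of `ATGC`.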

Brian Bushnell 02-03-2015 05:28 PM

I may have misinterpreted something... are the BLAST alignments you posted of the transcriptome against itself, or against some other database?

mlacencio 02-03-2015 05:50 PM

Hi Brian and GenoMax!

The BLAST alignments I posted are of the transcriptome against itself. Also, the transcripts are from a tetraploid organism.

GenoMax, the sequences were assembled by someone else. I will have to check with him how he assembled the reads.



Brian Bushnell 02-03-2015 05:58 PM

If it's tetraploid, then depending on the organism's degree of heterozygosity, those may very well be the same transcript from different ploidies (which would not really be considered misassemblies)... or they could be two copies of the same gene with different genomic coordinates. I don't know that there's an easy way to tell. If possible, I'd try to inbreed the organism as much as possible before doing assemblies.

All times are GMT -8.