Hello all,
I want to make you aware of a suite of tools that may be useful to anyone working with RNA-Seq data, transcriptomics, or gene identification. We've developed three tools for handling transcriptomic data: HashMatch, Supersplat, and TAU.
HashMatch is a simple Perl script that aligns reads against an indexed reference, reporting perfect matches only, at approximately 1 million reads per minute.
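To make the idea of perfect-match alignment against an indexed reference concrete, here is a minimal Python sketch. It is not HashMatch's actual code (HashMatch is a Perl script); the function names `build_index` and `align_exact`, the fixed read length `k`, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-length substring of the reference to its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_exact(reads, index, k):
    """Return {read: [positions]} for reads of length k that match the reference perfectly."""
    hits = {}
    for read in reads:
        # A membership test does not create new keys in a defaultdict.
        if len(read) == k and read in index:
            hits[read] = index[read]
    return hits


# Toy data (hypothetical): a 12-base reference and three 4-base reads.
reference = "ACGTACGTTAGC"
index = build_index(reference, 4)
hits = align_exact(["ACGT", "TTAG", "GGGG"], index, 4)
print(hits)
```

Each read costs a single hash lookup once the index is built, which is why this style of aligner can be fast, at the price of missing any read that contains a mismatch.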
Supersplat, implemented in C++, empirically identifies all locations in a genomic reference sequence that indicate a potential intron. It uses the genomic reference and high-throughput RNA-Seq datasets to identify these potential splice junctions without needing to construct coverage islands, and it does not base its predictions on canonical splice-junction donor/acceptor dinucleotides. Supersplat maps approximately 11.4 million short reads per hour.
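As a rough illustration of finding potential introns from reads alone, the Python sketch below splits a read into two pieces, looks for exact matches of both pieces in the reference, and reports the gap between them as a candidate intron. This is not Supersplat's implementation (which is written in C++); the function names and the `min_anchor`, `min_intron`, and `max_intron` parameters are assumptions for this example. Note that it never checks for GT/AG or any other dinucleotides.

```python
def find_all(text, pattern):
    """Return every 0-based start of pattern in text (overlaps allowed)."""
    starts = []
    pos = text.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = text.find(pattern, pos + 1)
    return starts


def candidate_junctions(read, reference, min_anchor=4, min_intron=5, max_intron=50):
    """Return sorted (intron_start, intron_end) pairs, 0-based and half-open."""
    junctions = set()
    # Try every split that leaves at least min_anchor bases on each side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        for lpos in find_all(reference, left):
            intron_start = lpos + len(left)
            for rpos in find_all(reference, right):
                gap = rpos - intron_start
                if min_intron <= gap <= max_intron:
                    junctions.add((intron_start, rpos))
    return sorted(junctions)


# Toy data (hypothetical): two exons separated by a non-canonical intron.
exon1, intron, exon2 = "ACGTTGCA", "CCCCCCCCCC", "TGGATCGA"
reference = exon1 + intron + exon2
read = exon1[-5:] + exon2[:5]  # a read spanning the exon-exon boundary
junctions = candidate_junctions(read, reference)
print(junctions)
```

On real data the same read can support several split points when bases at the exon and intron boundaries coincide, so a practical tool would need to rank or filter the candidates, for example by the number of supporting reads.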
TAU (the Transcription-unit Assembly Utility) is an algorithm designed to use transcriptomic data of any kind to empirically define transcriptional units within a target reference genome. Written in Perl, it can use any transcriptomic evidence that can be aligned to the target reference, and it is designed to leverage the extreme depth of RNA-Seq data. TAU accepts alignments generated by many programs, and because it relies only on alignments and not on sequence properties, it can use short-read, long-read, and full-length EST data to build gene models. Using spliced alignment data, TAU can annotate any form of alternative splicing, and through basic prediction algorithms it can predict the directionality of novel genes with good accuracy.
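One simple way to picture building transcriptional units from alignments alone is to merge alignments whose genomic spans overlap. The Python sketch below does only that; it is not TAU's actual algorithm (TAU is written in Perl and also handles alternative splicing and directionality), and the `assemble_units` function and the block representation are assumptions for illustration.

```python
def assemble_units(alignments):
    """Merge alignments whose genomic spans overlap into (start, end) units.

    Each alignment is a list of (start, end) blocks, 0-based and half-open,
    ordered by position. A spliced alignment has more than one block, and
    its span runs from its first block's start to its last block's end.
    """
    spans = sorted((blocks[0][0], blocks[-1][1]) for blocks in alignments)
    units = []
    for start, end in spans:
        if units and start < units[-1][1]:
            # Overlaps the current unit: extend it if needed.
            units[-1][1] = max(units[-1][1], end)
        else:
            units.append([start, end])
    return [tuple(unit) for unit in units]


# Toy data (hypothetical coordinates): the spliced second alignment
# links the first and third alignments into a single unit.
alignments = [
    [(100, 150)],
    [(140, 200), (300, 350)],
    [(320, 400)],
    [(1000, 1050)],
]
units = assemble_units(alignments)
print(units)
```

Because the input is just coordinates, the same routine works whether the blocks came from short reads, long reads, or full-length ESTs.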
All three tools are available here: http://mocklerlab-tools.cgrb.oregonstate.edu/ (manuscripts for TAU and Supersplat are also available, with more detail on their design and performance).