Dear SEQanswers community,
I am working on Illumina RNA-Seq data from human samples. While it is quite clear to me how to map the reads against the human reference while taking splice junctions into account, using tools such as TopHat or MapSplice, I have an additional requirement that makes things more interesting.
In my data, the sampled tissue is infected by one or more viral species that may induce the expression of viral genes. I would now like to identify these expressed viral transcripts together with the human transcripts. The idea is that viral transcripts can be detected by collecting all reads that cannot be mapped to the human reference and then re-mapping these reads against a collection of known viral genomes (perhaps with BLAST or even Smith-Waterman, since viral genomes tend to be short and highly variable).
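To make the first step of that idea concrete, here is a minimal sketch of how the unaligned reads could be pulled out of the human mapping and turned back into FASTQ for the second pass. It assumes the mapper writes unaligned reads into its SAM output with FLAG bit 0x4 set, which is an assumption: some mappers put them in a separate file or omit them entirely. The function name `unmapped_fastq` and the toy records are made up for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_fastq(sam_lines):
    """Yield one FASTQ record (as a 4-line string) per unmapped SAM record."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & UNMAPPED:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


# Toy input (hypothetical reads): one aligned to chr1, one unaligned.
toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",
]
records = list(unmapped_fastq(toy_sam))
print("".join(records), end="")
```

The resulting FASTQ records would then be the input for the re-mapping against the viral genomes.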
However, this strikes me as inelegant, and it also requires post-processing of the main mapping to identify unaligned reads. Therefore my question is: are there any mapping tools geared towards RNA-Seq that allow multiple reference genomes and map each read to the most likely genome while also considering splice junctions? This would be a special case of RNA-Seq for metagenomics data, with one large mammalian reference genome and many very small reference genomes. Bonus points if the mapper can tolerate a higher number of mismatches (viral transcripts can be quite variable). Thanks a bunch for any leads.