I've been working with smRNA data for a while now, trying to put together a pipeline to replace the AB smRNA tools we were previously using for SOLiD data. I'm curious to know what others are using. Searching SEQanswers, I found a number of people still running AB's smRNA pipeline, but it seems to have lost some steam and has not been updated in over a year.
The key requirements for the pipeline were: handling adapters (by trimming reads), identifying unique alignments, the ability to handle reads that are potentially longer than the reference sequences, and speed and accuracy of alignments.
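For pipelines that do trim adapters, here is a minimal base-space sketch of 3' adapter trimming. It is an illustration, not the approach used here. It assumes exact matching with no mismatches, a hypothetical adapter sequence (`ADAPTER`), and a hypothetical `min_overlap` threshold for partially sequenced adapters. SOLiD colourspace reads would have to be trimmed in colour space or decoded first, and this sketch does neither.

```python
# Hypothetical 3' adapter for illustration only; substitute your kit's sequence.
ADAPTER = "ATCTCGTATG"


def trim_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Strip a 3' adapter from a base-space read using exact matching.

    If the full adapter occurs in the read, everything from its first
    occurrence onward is removed. Otherwise the longest read suffix of at
    least `min_overlap` bases that equals a prefix of the adapter (a
    partially sequenced adapter) is removed. Mismatches are not tolerated.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # No full adapter: try the longest possible partial adapter first.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


if __name__ == "__main__":
    insert = "TAGCTTATCAGACTGATGTTGA"
    print(trim_adapter(insert + ADAPTER + "CCG", ADAPTER))  # full adapter present
    print(trim_adapter(insert + ADAPTER[:6], ADAPTER))      # partial adapter at 3' end
    print(trim_adapter("ACGTACGT", ADAPTER))                # no adapter found
```

Checking for the full adapter first avoids chopping a read at a short, coincidental match. The partial-match loop only catches adapters cut off by the end of the read.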
I have found that, with SHRiMP's support for seed and extension, I did not have to remove adapters from the reads, which removes a step from my pipeline. I'm currently aligning my data to the mature smRNA sequences obtained from miRBase.org (v16), as we're currently not interested in novel miRs. This has yielded some interesting results. The pipeline is as follows:
1) Run SAET on the colourspace reads to increase accuracy (optional step)
2) Align reads to the human mature miRBase v16 sequences, with output in SAM format
3) Convert SAM to BAM using Picard
4) Sort the BAM using Picard
5) Collect metrics (including GC content, quality distribution, etc.)
6) Count the miRs identified in the aligned data
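Step 6, combined with the unique-alignment requirement, could look something like the following Python sketch. It is not the Perl code mentioned in this post. The function name `count_unique_mirs` is hypothetical. The sketch assumes SAM text input, such as the file from step 2, and assumes each alignment of a read is written as its own record rather than packed into optional tags. It treats a read as "unique" when all of its mapped records point to a single mature miRNA.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set

SAM_FLAG_UNMAPPED = 0x4  # FLAG bit: segment unmapped


def count_unique_mirs(sam_lines: Iterable[str]) -> Counter:
    """Count reads whose mapped SAM records all point to one reference.

    Assumes each alignment of a read is written as its own SAM record
    (multi-hits are not packed into optional tags). Header lines and
    unmapped records are skipped.
    """
    refs_per_read: Dict[str, Set[str]] = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & SAM_FLAG_UNMAPPED or rname == "*":
            continue
        refs_per_read[qname].add(rname)

    counts: Counter = Counter()
    for refs in refs_per_read.values():
        if len(refs) == 1:  # unambiguous: exactly one mature miRNA
            counts[next(iter(refs))] += 1
    return counts


# Toy SAM records (illustrative only, not real data).
EXAMPLE_SAM = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:hsa-miR-21\tLN:22",
    "@SQ\tSN:hsa-let-7a\tLN:22",
    "@SQ\tSN:hsa-let-7b\tLN:22",
    "r1\t0\thsa-miR-21\t1\t255\t22M\t*\t0\t0\t*\t*",
    "r2\t0\thsa-let-7a\t1\t255\t22M\t*\t0\t0\t*\t*",
    "r2\t0\thsa-let-7b\t1\t255\t22M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    for mir, n in sorted(count_unique_mirs(EXAMPLE_SAM).items()):
        print(mir, n)
```

In the toy input, `r2` hits two references and is dropped, and `r3` is unmapped. The function also accepts an open file handle in place of the list. Reads that hit several near-identical mature sequences are discarded rather than split between them. Whether to discard them or assign them fractionally is a design choice.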
SHRiMP has always had decent sensitivity, and the latest version has dramatically increased its speed as well.
Can anyone else shed some light on how they analyze smRNA data?
PS: If anyone wants some of my Perl code, you're more than welcome to it. Please PM me.
Richard