I am a little confused about the small RNA analysis pipeline used for analyzing SOLiD results.
Here is what I have read everywhere.
1. Trim the adaptors and convert from csfasta to csfastq/fastq
2. Align them to the genome
3. Match them with miRBase to get the read count per miRNA sequence.
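For reference, here is a minimal sketch of what step 3 might amount to once the reads have genomic coordinates. It assumes the alignments and the miRBase miRNA annotations have already been parsed into simple intervals (for example, from the aligner's output and a miRBase coordinate file). The `Interval` type, the function name, and all coordinates are hypothetical. Real pipelines would normally use a dedicated counting tool and an interval index rather than this linear scan.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    # Hypothetical record: 0-based, half-open coordinates [start, end)
    chrom: str
    start: int
    end: int
    strand: str


def count_reads_per_mirna(alignments, mirnas, min_overlap=1):
    """Count aligned reads that overlap each annotated miRNA on the same strand.

    alignments: list of Interval (one per aligned read)
    mirnas: dict mapping a miRNA name to its Interval
    A read overlapping two annotated miRNAs is counted for both.
    """
    counts = Counter({name: 0 for name in mirnas})  # keep zero-count miRNAs
    for read in alignments:
        for name, feat in mirnas.items():
            if read.chrom != feat.chrom or read.strand != feat.strand:
                continue
            overlap = min(read.end, feat.end) - max(read.start, feat.start)
            if overlap >= min_overlap:
                counts[name] += 1
    return dict(counts)


# Toy example with hypothetical coordinates
mirnas = {
    "mirA": Interval("chr1", 100, 122, "+"),
    "mirB": Interval("chr2", 500, 522, "-"),
}
alignments = [
    Interval("chr1", 101, 121, "+"),  # overlaps mirA
    Interval("chr1", 99, 120, "+"),   # overlaps mirA
    Interval("chr1", 101, 121, "-"),  # wrong strand for mirA
    Interval("chr2", 400, 420, "-"),  # same chromosome as mirB but no overlap
]
print(count_reads_per_mirna(alignments, mirnas))  # {'mirA': 2, 'mirB': 0}
```

Because the counting works on genome coordinates, reads that fall outside any annotated miRNA keep their coordinates. Those leftover positions are what later novel-miRNA discovery works from.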
My question is: why do you need to align to the reference genome? Why can't we just collapse the reads into unique sequences and then BLAST them against miRBase to get the counts?
The unaligned reads could then be BLASTed against the genome to discover new miRNAs.
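A minimal sketch of the alternative described above follows. It collapses identical trimmed reads into unique sequences with copy numbers, then assigns them to mature miRNAs. As a simplified stand-in for BLAST, it uses exact sequence matching, so isomiRs with trimmed or extended ends would be left unmatched. The function names, the toy reads, and the toy mature sequences are all hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical trimmed reads into {unique sequence: copy number}."""
    return Counter(reads)


def count_against_mature(unique_reads, mature):
    """Assign unique reads to mature miRNAs by exact sequence match.

    mature: dict mapping a miRNA name to its mature sequence in RNA letters,
    as miRBase distributes them. U is converted to T to match DNA reads.
    A read whose sequence is shared by several mature miRNAs is counted
    once for each of them.
    Returns (counts per miRNA, unmatched unique reads with copy numbers).
    """
    seq_to_names = {}
    for name, seq in mature.items():
        seq_to_names.setdefault(seq.upper().replace("U", "T"), []).append(name)

    counts = Counter({name: 0 for name in mature})  # keep zero-count miRNAs
    unmatched = Counter()
    for seq, n in unique_reads.items():
        names = seq_to_names.get(seq)
        if names is None:
            unmatched[seq] = n  # candidates for a later genome search
        else:
            for name in names:
                counts[name] += n
    return dict(counts), dict(unmatched)


# Toy example with hypothetical sequences
reads = ["TTAGCCAT", "TTAGCCAT", "GGCATTAC", "CCGTAAGT"]
mature = {"mirX": "UUAGCCAU", "mirY": "GGCAUUAC"}
counts, unmatched = count_against_mature(collapse_reads(reads), mature)
print(counts)     # {'mirX': 2, 'mirY': 1}
print(unmatched)  # {'CCGTAAGT': 1}
```

This sketch has no genomic coordinates. As a result, a read that also occurs elsewhere in the genome, or one shared by several miRNAs, cannot be told apart from the others.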
I am really new to this and cannot find a reasonable explanation for aligning to the reference genome first.
If someone could explain this to me or suggest a paper, that would be great!
Thank you