Hi all,
We are doing some short RNA sequencing (human) and are having an internal debate about the best approach for mapping the reads. We seem to be split along three lines; here are the options:
1) Map the reads to the miRNA sequences instead of the whole genome. My thought is that this is not the best approach, as you are biasing your reads toward those regions when in fact some reads may map "better" to some other part of the genome. Also, there are short RNAs other than miRNAs, although miRNAs are probably the most abundant. Others in our group disagree and feel this is the best method for looking at the miRNAs.
2) Map the reads uniquely to the whole genome (i.e. 1 read, 1 location). This way you are only dealing with reads that go to a single place, which would make differential expression analysis easier. This is my preferred option.
3) Map the reads non-uniquely to the whole genome (i.e. 1 read may go to multiple places). Since the human genome is very repetitive, this approach would account for that. However, because 1 read would be counted at many locations, differential expression analysis and normalization become rather difficult. Also, since a read can map to many places, potentially hundreds or more, you are assuming that each of those locations is equally expressed, which is most likely not the case, and you will not know the exact location in the genome of your molecule of interest. In this case you may really be looking at specific sequences that are differentially expressed rather than locations or annotated elements.
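To make the counting problem in options 2 and 3 concrete, here is a minimal Python sketch on made-up data. The read IDs, locus names, and the count_reads helper are all hypothetical and do not come from any particular aligner. The "fractional" mode (each read contributes 1/n to each of its n locations) is just one common way to handle multi-mapped reads, and it builds in exactly the equal-expression assumption described in option 3.

```python
from collections import defaultdict

# Hypothetical toy alignments: read ID -> loci the read aligns to equally well.
alignments = {
    "read1": ["locusA"],
    "read2": ["locusA"],
    "read3": ["locusA", "locusB"],
    "read4": ["locusB", "locusC", "locusD", "locusE"],
}

def count_reads(alignments, mode):
    """Count reads per locus under one of three illustrative schemes.

    unique     - keep only reads with exactly one location (option 2)
    multi      - count a read once at every location it maps to (option 3)
    fractional - give each of a read's n locations a weight of 1/n,
                 which assumes all n locations are equally expressed
    """
    counts = defaultdict(float)
    for loci in alignments.values():
        n = len(loci)
        if mode == "unique":
            if n == 1:
                counts[loci[0]] += 1.0
        elif mode == "multi":
            for locus in loci:
                counts[locus] += 1.0
        elif mode == "fractional":
            for locus in loci:
                counts[locus] += 1.0 / n
        else:
            raise ValueError(f"unknown mode: {mode}")
    return dict(counts)

unique = count_reads(alignments, "unique")
multi = count_reads(alignments, "multi")
fractional = count_reads(alignments, "fractional")

for name, counts in [("unique", unique), ("multi", multi), ("fractional", fractional)]:
    print(name, counts, "total =", sum(counts.values()))
```

With these 4 toy reads, the "multi" totals come to 8 (more than the number of reads sequenced, which is what complicates normalization), the "fractional" totals stay at 4, and "unique" keeps only the 2 reads that map to a single locus.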
Based on these options, does anyone have any thoughts or experiences? We are trying all 3 approaches to see which works best, although "best" can be rather subjective, as it may just mean getting the results you want to see.
Thanks