Hi,
I would like to filter tRNA/rRNA reads out of my small RNA-seq data. However, after filtering, I still get reads in those regions when I count.
What I did is the following:
1. Download the tRNA/rRNA subset from UCSC by selecting repClass = 'tRNA' and 'rRNA'
2. Create a FASTA file from that using bedtools getfasta
3. Map my FASTQ files to a bowtie index built from that FASTA file, using less strict criteria than for my actual mapping (-v 3, i.e. 3 mismatches allowed), and write the unmapped reads to a new FASTQ via --un
4. Map those unmapped reads to the genome using stricter criteria (-v 0)
5. In R, count reads over the tRNA/rRNA BED regions that I originally downloaded, using summarizeOverlaps with mode='IntersectionStrict'
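To make steps 2 to 4 concrete, here is a minimal sketch that only assembles the corresponding command lines as argument lists and prints them; it does not run anything. All file names and the `filtering_commands` helper are hypothetical placeholders, and the options beyond -v and --un (for example -S for SAM output) are assumptions rather than the exact settings used.

```python
import shlex


def filtering_commands(genome_fa, genome_index, rna_bed, reads_fq, prefix="filter"):
    """Return the command lines for steps 2-4 as argument lists (nothing is executed).

    All paths are placeholders; genome_index is assumed to be an existing
    bowtie (version 1) index of the same genome as genome_fa.
    """
    rna_fa = f"{prefix}.rna.fa"
    rna_index = f"{prefix}.rna_index"
    unmapped_fq = f"{prefix}.unmapped.fq"
    return [
        # Step 2: extract the tRNA/rRNA sequences from the genome FASTA.
        ["bedtools", "getfasta", "-fi", genome_fa, "-bed", rna_bed, "-fo", rna_fa],
        # Build a bowtie index from the extracted sequences.
        ["bowtie-build", rna_fa, rna_index],
        # Step 3: permissive mapping (up to 3 mismatches); reads that fail
        # to align are written to unmapped_fq via --un.
        # Note: older bowtie 1 releases take the index as a positional
        # argument as shown here; newer releases expect it after -x.
        ["bowtie", "-v", "3", "--un", unmapped_fq, "-S",
         rna_index, reads_fq, f"{prefix}.rna.sam"],
        # Step 4: strict mapping (no mismatches) of the remaining reads to the genome.
        ["bowtie", "-v", "0", "-S",
         genome_index, unmapped_fq, f"{prefix}.genome.sam"],
    ]


if __name__ == "__main__":
    for cmd in filtering_commands("genome.fa", "genome_index", "rna.bed", "reads.fq"):
        print(shlex.join(cmd))
```

Writing the steps out this way makes it easier to see exactly which reference each mapping step uses and which file feeds into the next step.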
Although there are only a few, some reads are still counted in those regions. How is that possible, and what could be the reason?
Any help is greatly appreciated.