Hello,
I recently got data back from a single-end RNA-seq run. The FastQC report shows many overrepresented sequences in my samples. I BLASTed these sequences and all of them were identified as rRNA. I would like to remove these rRNA sequences from my samples before beginning any analysis.
To do so, I ran Trimmomatic with a custom .fa file containing the overrepresented sequences, expecting it to remove all of them. However, this did not remove the rRNA sequences from my samples. I am not sure whether the problem is my custom "adapter" .fa file or my command. Below is an excerpt of the .fa file I created containing the overrepresented sequences I'd like removed (the actual file has about 100 sequences):
>seq
GCCGACATCGCCGCAGACCCCTGACGCCTTTGACGTGGGCCGATCCCCGC
>seq
CCGACATCGCCGCAGACCCCTGACGCCTTTGACGTGGGCCGATCCCCGCC
>seq
GGCGAAGGTGGCTCGCGGCTCCGGCCGTGAGCTTTACAGCGCCCCCTCGC
>seq
GGGACGGCCGCTCGGTGCGGGAGGATCCCCTCGTGGGACCTCTCCCCGGC
Also, here is the command line I tried to use:
java -jar /opt/linux/centos/7.x/x86_64/pkgs/trimmomatic/0.33/bin/trimmomatic.jar SE -phred33 ~/bigdata/mahi_mucus/rawreads/966/C1.fastq C1_trialtrim.fastq.gz ILLUMINACLIP:~/bigdata/mahi_mucus/overrepseq/ca1topoverrep.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Has anyone had any success using Trimmomatic to remove custom overrepresented sequences? Any tips would be greatly appreciated. I am open to other methods as well.
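To make the goal concrete, here is a minimal Python sketch of the behavior I am after: dropping every read that contains one of the overrepresented sequences, rather than trimming it. This is only an illustration under some assumptions. The function names are hypothetical and not part of Trimmomatic or cutadapt, matching is exact and forward-strand only (so reads with mismatches or reverse-complement hits would be missed), and the FASTQ is assumed to be unwrapped with four lines per record.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def parse_fasta_sequences(lines: Iterable[str]) -> List[str]:
    """Collect sequences from FASTA lines, joining wrapped lines per record."""
    sequences: List[str] = []
    current: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        sequences.append("".join(current))
    return sequences


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; an incomplete trailing record is ignored."""
    it = iter(lines)
    for header, seq, sep, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def drop_reads_containing(
    records: Iterable[FastqRecord], contaminants: List[str]
) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence contains none of the contaminants."""
    for record in records:
        seq = record[1].upper()
        if not any(contaminant in seq for contaminant in contaminants):
            yield record


if __name__ == "__main__":
    # Tiny in-memory demo; real input would come from the .fa and .fastq files.
    fasta = [">seq", "GCCGACATCGCCGCAGACCCCTGACGCCTTTGACGTGGGCCGATCCCCGC"]
    fastq = [
        "@read1\n",
        "GCCGACATCGCCGCAGACCCCTGACGCCTTTGACGTGGGCCGATCCCCGC\n",
        "+\n",
        "I" * 50 + "\n",
        "@read2\n",
        "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT\n",
        "+\n",
        "I" * 40 + "\n",
    ]
    contaminants = parse_fasta_sequences(fasta)
    for header, seq, sep, qual in drop_reads_containing(parse_fastq(fastq), contaminants):
        print(header, seq, sep, qual, sep="\n")
```

With about 100 short sequences the nested scan should be tolerable for a trial run, but it would only catch reads that contain one of the listed sequences exactly, not rRNA reads in general.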
P.S. I also tried cutadapt, which did not work either.
Let me know if there is any other information that would be helpful. Thanks everyone!