Old 11-28-2018, 10:48 AM   #1
Junior Member
Location: California

Join Date: Nov 2018
Posts: 2
Removing overrepresented sequences (rRNA contamination) from RNAseq raw data


I recently got data back from a single-end RNAseq run. The FastQC report showed many overrepresented sequences in my samples. I BLASTed these sequences and all of them were identified as rRNA. I would like to remove these rRNA sequences from my samples before beginning any analysis.

To do so, I ran Trimmomatic with a custom .fa file containing the overrepresented sequences, which in theory should remove all of them. However, it did not remove the rRNA sequences from my samples. I am not sure whether the problem is my custom "adapter" .fa file or my command. Below is an excerpt of the .fa file I created containing the overrepresented sequences I'd like removed (there are about 100 sequences in the actual file):


Also, here is the command line I tried to use:

java -jar /opt/linux/centos/7.x/x86_64/pkgs/trimmomatic/0.33/bin/trimmomatic.jar SE -phred33 ~/bigdata/mahi_mucus/rawreads/966/C1.fastq C1_trialtrim.fastq.gz ILLUMINACLIP:~/bigdata/mahi_mucus/overrepseq/ca1topoverrep.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Has anyone had any success using Trimmomatic to remove custom overrepresented sequences? Any tips would be greatly appreciated. I am open to other methods as well.
P.S. I also tried cutadapt, which did not work either.

Let me know if there is any other information that would be helpful. Thanks everyone!
nandr009
Old 11-30-2018, 06:54 AM   #2
Senior Member
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

I suggest that you use a tool from the BBMap suite, along with a copy of the rDNA repeat for your organism (if available), to bin reads into an rRNA pool and a pool for everything else.

There are other tools, such as SortMeRNA, that do something similar.
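
To make the binning idea concrete, here is a minimal Python sketch of splitting reads into an rRNA pool and a rest pool by shared k-mers with a reference. It is only an illustration of the general principle, not how BBMap or SortMeRNA are implemented, and it is far too slow for real data. The function names, the k value, and the `min_hits` threshold are all assumptions made up for this example.

```python
from typing import Iterable, List, Set, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_index(ref_seqs: Iterable[str], k: int) -> Set[str]:
    """Collect k-mers from both strands of every reference sequence."""
    index: Set[str] = set()
    for ref in ref_seqs:
        ref = ref.upper()
        index |= kmers(ref, k)
        index |= kmers(reverse_complement(ref), k)
    return index


def bin_reads(
    reads: Iterable[Tuple[str, str]],
    ref_index: Set[str],
    k: int,
    min_hits: int = 1,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (name, sequence) reads into (matched, unmatched) lists.

    A read goes to the matched (rRNA) pool when at least min_hits of its
    distinct k-mers are present in the reference index.
    """
    matched, unmatched = [], []
    for name, seq in reads:
        hits = len(kmers(seq, k) & ref_index)
        if hits >= min_hits:
            matched.append((name, seq))
        else:
            unmatched.append((name, seq))
    return matched, unmatched


if __name__ == "__main__":
    # Toy data only: a made-up "reference" and two made-up reads.
    toy_ref = "ACGTACGTTAGCCGATAGCTAGCTAGGATCC"
    index = build_reference_index([toy_ref], k=8)
    toy_reads = [
        ("from_ref", toy_ref[5:25]),
        ("unrelated", "TTTTTTTTTTGGGGGGGGGGCCCCC"),
    ]
    rrna_pool, rest_pool = bin_reads(toy_reads, index, k=8)
    print("rRNA pool:", [name for name, _ in rrna_pool])
    print("rest pool:", [name for name, _ in rest_pool])
```

Unlike adapter trimming, which clips a matching stretch off a read, this kind of binning classifies whole reads, so the rest pool can go straight into downstream analysis.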
GenoMax

overrepresented sequences, rna sequencing, rrna contamination

All times are GMT -8.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO