SEQanswers

11-28-2018, 10:48 AM   #1
nandr009
Junior Member
 
Location: California

Join Date: Nov 2018
Posts: 1
Removing overrepresented sequences (rRNA contamination) from RNAseq raw data

Hello,

I recently got data back from a single-end RNAseq run. The FastQC report showed many overrepresented sequences in my samples. I BLASTed these sequences and all of them were identified as rRNA. I would like to remove these rRNA sequences from my samples before beginning any analysis.

To do so, I tried Trimmomatic with a custom .fa file containing the overrepresented sequences, which in theory should remove all of them. However, this did not remove the rRNA sequences from my samples. I am not sure whether the problem is my custom "adapter" .fa file or my command. Below is an excerpt of the .fa file I created containing the overrepresented sequences I'd like removed (there are about 100 sequences in the actual file):

>seq
GCCGACATCGCCGCAGACCCCTGACGCCTTTGACGTGGGCCGATCCCCGC
>seq
CCGACATCGCCGCAGACCCCTGACGCCTTTGACGTGGGCCGATCCCCGCC
>seq
GGCGAAGGTGGCTCGCGGCTCCGGCCGTGAGCTTTACAGCGCCCCCTCGC
>seq
GGGACGGCCGCTCGGTGCGGGAGGATCCCCTCGTGGGACCTCTCCCCGGC



Also, here is the command line I tried to use:

java -jar /opt/linux/centos/7.x/x86_64/pkgs/trimmomatic/0.33/bin/trimmomatic.jar SE -phred33 ~/bigdata/mahi_mucus/rawreads/966/C1.fastq C1_trialtrim.fastq.gz ILLUMINACLIP:~/bigdata/mahi_mucus/overrepseq/ca1topoverrep.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36


Has anyone had any success using Trimmomatic to remove custom overrepresented sequences? Any tips would be greatly appreciated. I am open to other methods as well.
P.S. I also tried cutadapt, which did not work either.

Let me know if there is any other information that would be helpful. Thanks everyone!
11-30-2018, 06:54 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,813

I suggest using bbsplit.sh from the BBMap suite, along with a copy of the rDNA repeat for your organism (if available), to bin reads into an rRNA pool and a pool with everything else.

There are other tools, like SortMeRNA, that do something similar.
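
For anyone who wants to see the binning idea in miniature, here is a rough Python sketch. It is not how bbsplit.sh or SortMeRNA work internally; it only flags a read as rRNA when the read shares at least one exact k-mer with a reference sequence (either strand). The function names and the choice of k are assumptions made for illustration, and a real run should use a proper rRNA reference and one of the tools above.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k (empty set if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(ref_seqs: Iterable[str], k: int) -> Set[str]:
    """Collect k-mers from both strands of every reference sequence."""
    index: Set[str] = set()
    for seq in ref_seqs:
        seq = seq.upper()
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines, e.g. at end of file
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip(), seq, qual


def bin_reads(reads: Iterable[Read], index: Set[str],
              k: int) -> Tuple[List[Read], List[Read]]:
    """Split reads into (rrna, rest) by exact shared k-mers."""
    rrna: List[Read] = []
    rest: List[Read] = []
    for read in reads:
        seq = read[1].upper()
        hit = any(seq[i:i + k] in index for i in range(len(seq) - k + 1))
        (rrna if hit else rest).append(read)
    return rrna, rest
```

To try it on real files, the lines could come from `open("C1.fastq")` (or `gzip.open(..., "rt")` for compressed input), and the two returned lists can be written back out as FASTQ. Exact k-mer matching will miss reads that carry sequencing errors inside every k-mer, which is one reason the dedicated tools are the better choice for production use.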
Today, 10:09 AM   #3
heyyou
Junior Member
 
Location: tabriz

Join Date: Dec 2018
Posts: 1

I'm looking forward to hearing about the solution you used to solve this problem.
Tags
overrepresented sequences, rna sequencing, rrna contamination

All times are GMT -8.

