SEQanswers

Old 10-06-2016, 10:02 AM   #1
steepale
Junior Member
 
Location: Lansing, MI

Join Date: Aug 2013
Posts: 2
Default Additional question regarding BBtools

Hello,
This thread helped me with an issue I'm dealing with, but I've become stuck and would appreciate some advice.

I am trying to use BBMap/BBTools to fix a file that is either incorrectly interleaved or formatted as single-end while actually containing a mix of paired and single-end reads.

I interleaved some paired-end Illumina RNA-seq reads in order to run them through a program called 'sortmerna' to remove rRNA. Here is the testformat.sh result for the interleaved input file:


Code:
/mnt/home/steepale/Apps/bbmap/testformat.sh ./data/017798-1_1_AGTGAG_L001_interleaved_001_100K.fastq
sanger	fastq	raw	interleaved	125bp

If BBTools can do this itself (filter out rRNA), I would appreciate any advice. The output file seems either to be incorrectly interleaved or to be in single-end format. Here is the testformat.sh result for the output file I am interested in:


Code:
/mnt/home/steepale/Apps/bbmap/testformat.sh ./data/017798-1_1_AGTGAG_L001_norRNA_001_100K.fastq
-1	fastq	raw	single-ended

I would ultimately like to take the reads from this output file and separate them into forward and reverse reads in two files, since I want to map them with tophat.

Is anyone familiar with BBTools and how to fix such an issue? I've hit a roadblock.

Last edited by steepale; 10-06-2016 at 10:05 AM.
Old 10-06-2016, 10:19 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,993
Default

I have moved your question to a new thread to make it visible.

The solution to your problem can come from BBTools itself. There is a program called bbsplit.sh that you can use with your original data (R1/R2 files). Provide it with the rDNA repeat sequence (provided one is available for your genome). BBSplit will then write any reads that align to it to one file, whereas the rest go to a different file. BBMap is also splice-aware, so you could use it to align your RNA-seq data (it should perform better than tophat).

You can use reformat.sh from BBMap to see if the reads are correctly interleaved:

Code:
$ reformat.sh in=reads.fastq verifypairing
Then optionally de-interleave them:

Code:
$ reformat.sh in=reads.fastq out1=r1.fastq out2=r2.fastq
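To make concrete what "correctly interleaved" means, here is a minimal awk sketch that walks an interleaved FASTQ and reports the first pair whose read names disagree. It is only an illustration, not how reformat.sh works internally. The function name check_interleaved is made up, and it assumes plain 4-line records with names of the form "@id/1" and "@id/2", or "@id 1:N:0:..." where everything after the first space is ignored.

```shell
# Hypothetical helper: check that records 1+2, 3+4, ... of an interleaved
# FASTQ share a read name. Assumes 4-line records (no wrapped sequences).
check_interleaved() {
    awk '
        NR % 4 == 1 {
            name = $1                  # keep the header up to the first space
            sub(/\/[12]$/, "", name)   # drop a trailing /1 or /2 if present
            rec++
            if (rec % 2 == 1) {
                first = name           # first read of a candidate pair
            } else if (name != first) {
                printf "mismatch at pair %d: %s vs %s\n", rec / 2, first, name
                bad = 1
                exit                   # jumps to END
            }
        }
        END {
            if (bad) exit 1
            if (rec % 2 == 1) { print "odd number of reads"; exit 1 }
            printf "ok: %d pairs\n", rec / 2
        }
    ' "$1"
}
```

Running check_interleaved reads.fastq returns a non-zero exit status as soon as two adjacent reads do not share a name, which is the situation you would expect after a filter has dropped one mate of a pair.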

Last edited by GenoMax; 10-06-2016 at 10:22 AM.
Old 10-06-2016, 12:16 PM   #3
steepale
Junior Member
 
Location: Lansing, MI

Join Date: Aug 2013
Posts: 2
Default

Here's some additional advice that might add to the conversation.

Also, thanks GenoMax, I've located chicken-specific rDNA clusters and am lifting them over to the correct genome build.

"Assuming the reads stayed in the same order but sortmerna just removed some of them, you can use repair.sh like this:

repair.sh in=017798-1_1_AGTGAG_L001_norRNA_001_100K.fastq out1=r1.fq out2=r2.fq outs=single.fq fint

If the reads were reordered you'd need the "repair" flag instead of "fint", but they probably were not. The "repair" flag will always work; it just uses more memory than "fint".
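The idea behind fixing interleaving when read order is unchanged can be sketched in a few lines of awk: adjacent reads that share a name are kept as a pair, and anything left without its mate goes to a singletons file. This is a rough illustration under stated assumptions, not repair.sh's actual code. The function name split_pairs is made up, and it assumes 4-line records with "/1" and "/2" suffixes or space-delimited names.

```shell
# Hypothetical helper: split_pairs in.fq r1.fq r2.fq single.fq
# Pairs up adjacent reads with the same name; unmatched reads go to single.fq.
split_pairs() {
    : > "$2"; : > "$3"; : > "$4"      # make sure all three outputs exist
    awk -v r1="$2" -v r2="$3" -v single="$4" '
        # Read name without any comment field or /1, /2 suffix
        function key(h,   a) { split(h, a, " "); sub(/\/[12]$/, "", a[1]); return a[1] }

        NR % 4 == 1 { k = key($0); rec = "" }   # header line starts a record
        { rec = rec $0 "\n" }                   # accumulate the 4 lines
        NR % 4 == 0 {
            if (held != "" && k == heldkey) {
                printf "%s", held > r1          # held read and current read are mates
                printf "%s", rec  > r2
                held = ""
            } else {
                if (held != "") printf "%s", held > single   # held read lost its mate
                held = rec; heldkey = k
            }
        }
        END { if (held != "") printf "%s", held > single }
    ' "$1"
}
```

Because only one read is held in memory at a time, this kind of pass needs very little memory, which is consistent with the point that "fint" is cheaper than "repair". It cannot recover pairs whose mates are no longer adjacent.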

However, you can avoid this problem in the first place if you use BBDuk for kmer-matching to remove rRNAs, if you have the ribosomal sequence, since BBDuk will keep pairs together:

bbduk.sh in=interleaved.fq out1=filtered1.fq out2=filtered2.fq outm1=rrna1.fq outm2=rrna2.fq ref=ribosomes.fa k=31

You can also use a bulk set of ribosomal sequences like Silva, but using species-specific ribosomal sequences is much more precise."
Old 10-06-2016, 12:53 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,993
Default

I would not bother doing any liftover since the sequence of rDNA is unlikely to change between builds.

Use one copy of the full rDNA repeat (don't bother with multiple copies, since those are just tandem repeats in most organisms) with whichever tool (bbduk or bbsplit) you choose.