**GenoMax** · 10-06-2016, 09:19 AM

I have moved your question to a new thread to make it visible.

The solution to your problem can come from BBTools itself. The suite includes a program called bbsplit.sh that you can run on your original data (the R1/R2 files). Give it the rDNA repeat sequence (provided one is available for your genome). BBSplit will then write any reads that align to it to one file, whereas the rest go to a different file. BBMap is also splice-aware, so you could use it to align your RNA-seq data (it should perform better than TopHat).
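As a rough sketch of what that BBSplit call could look like: the file names below are hypothetical placeholders, and the flags follow the usage line printed by bbsplit.sh, so check them against your installed version. The command is built as a string and printed first so you can inspect it before anything runs.

```shell
# Hypothetical file names; substitute your own paired reads and rDNA reference.
r1=reads_R1.fastq
r2=reads_R2.fastq
rdna=rDNA_repeat.fa

# Reads that map to the reference go to out_<refname>.fastq (% is replaced by
# the reference name); pairs that do not map go to the outu1/outu2 files.
cmd="bbsplit.sh in1=$r1 in2=$r2 ref=$rdna basename=out_%.fastq outu1=clean_R1.fastq outu2=clean_R2.fastq"

# Print the command for inspection.
echo "$cmd"

# Run it only when BBTools is on PATH and all inputs exist.
if command -v bbsplit.sh >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ] && [ -f "$rdna" ]; then
  $cmd
fi
```

The `clean_R1.fastq`/`clean_R2.fastq` files would then be the rDNA-depleted reads to carry forward.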

You can use reformat.sh from BBMap to check whether the reads are correctly interleaved:

Code:

$ reformat.sh in=reads.fastq verifypairing

Then, optionally, de-interleave them:

Code:

$ reformat.sh in=reads.fastq out1=r1.fastq out2=r2.fastq
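If you want to see what a pairing check amounts to without BBTools, here is a minimal sketch in plain shell and awk. It is not how reformat.sh is implemented; it only assumes 4-line FASTQ records and read names that match once a trailing `/1` or `/2` (or anything after the first space) is dropped. The function name `check_interleaved` is made up for this example.

```shell
# check_interleaved FILE: exit 0 if every consecutive pair of records shares a
# read name, exit 1 otherwise (including an odd number of records).
# Assumes 4-line FASTQ records, i.e. no wrapped sequence lines.
check_interleaved() {
  awk '
    NR % 4 == 1 {                 # header line of each record
      name = $1                   # keep only the text before the first space
      sub(/^@/, "", name)         # drop the leading @
      sub(/\/[12]$/, "", name)    # drop a trailing /1 or /2
      n++
      if (n % 2 == 1) { first = name }
      else if (name != first) { bad = 1; exit }
    }
    END { if (bad || n % 2 == 1) exit 1 }
  ' "$1"
}

# Example: check_interleaved reads.fastq && echo "pairing looks consistent"
```

A non-zero exit status here suggests the file needs repairing before it can be de-interleaved safely.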

**steepale** · 10-06-2016, 11:16 AM

Here's some additional advice which might add to the convo.

Also, thanks GenoMax. I've located chicken-specific rDNA clusters and am lifting them over to the correct genome build.

Assuming the reads stayed in the same order but sortmerna just removed some of them, you can use repair.sh like this:

repair.sh in=017798-1_1_AGTGAG_L001_norRNA_001_100K.fastq out1=r1.fq out2=r2.fq outs=single.fq fint

If the reads were reordered, you'd need the "repair" flag instead of "fint", but they probably were not. The "repair" flag will always work; it just uses more memory than "fint".
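To make the choice between the two flags explicit, here is a small sketch that prints the repair.sh command for either case. The helper name `repair_cmd` and the input name `filtered.fastq` are made up for illustration; the output names are the ones from the command above.

```shell
# repair_cmd MODE INPUT: print a repair.sh command line.
# MODE is "fint" (read order preserved, some reads removed) or
# "repair" (reads may have been reordered; uses more memory).
repair_cmd() {
  mode=$1
  input=$2
  case "$mode" in
    fint|repair) ;;
    *) echo "mode must be fint or repair" >&2; return 1 ;;
  esac
  echo "repair.sh in=$input out1=r1.fq out2=r2.fq outs=single.fq $mode"
}

# Print both variants for a hypothetical input file.
repair_cmd fint filtered.fastq
repair_cmd repair filtered.fastq
```

If you are not sure whether the upstream tool preserved read order, the "repair" variant is the safer one to run.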

However, if you have the ribosomal sequence, you can avoid this problem in the first place by using BBDuk to remove rRNA reads by kmer matching, since BBDuk keeps pairs together:

bbduk.sh in=interleaved.fq out1=filtered1.fq out2=filtered2.fq outm1=rrna1.fq outm2=rrna2.fq ref=ribosomes.fa k=31

You can also use a bulk set of ribosomal sequences such as SILVA, but the species-specific ribosomal sequences are much more precise.

**GenoMax** · 10-06-2016, 11:53 AM

I would not bother doing any liftover since the sequence of rDNA is unlikely to change between builds.

Use one copy of the full rDNA repeat (don't bother with multiple copies, since those are just tandem repeats in most organisms) with whichever tool (BBDuk or BBSplit) you choose.
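If the rDNA FASTA you downloaded holds several records, one way to keep a single copy is to take just the first record. This is only a sketch and assumes each FASTA record is one complete repeat unit, which you should confirm for your file; `first_record` is a made-up helper name.

```shell
# first_record FASTA: print only the first record (header plus sequence lines).
first_record() {
  # n counts header lines seen so far; print lines only while inside record 1.
  awk '/^>/ { n++ } n == 1' "$1"
}

# Example: first_record rDNA_clusters.fa > rDNA_repeat.fa
```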

Additional question regarding BBtools