SEQanswers

Old 10-25-2017, 10:51 PM   #281
boulund
Member
 
Location: Sweden

Join Date: Jan 2017
Posts: 18

Hi!

Is it possible to get the various quality histograms for both before and after e.g. trimming with BBDuk in a single run, or do I need to run BBDuk twice to produce metrics for before and after trimming?
That is, run once without any trimming, just outputting histograms, and then again to trim and output histograms? Or am I missing something? The histograms output by BBDuk normally show metrics after trimming/contaminant removal, right?
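One way to sidestep the question of when BBDuk computes its histograms is to run histogram-only passes on the raw file and on the trimmed file. The following is a minimal dry-run sketch of that idea, not a confirmed answer from the thread. It only prints the commands. The file names raw.fq, trimmed.fq and adapters.fa and the trimming parameters are placeholders, and it assumes BBDuk is happy to run with no trimming or filtering options set.

```shell
#!/bin/sh
# Dry run: print the three BBDuk invocations instead of executing them.
# raw.fq, trimmed.fq and adapters.fa are placeholder names.

# Histogram-only pass: no trimming or filtering options, no read output.
hist_cmd() {
    # $1 = reads file, $2 = prefix for the histogram files
    printf 'bbduk.sh in=%s qhist=%s_qhist.txt bhist=%s_bhist.txt lhist=%s_lhist.txt\n' \
        "$1" "$2" "$2" "$2"
}

# Trimming pass: adapter-trim the right end and write the trimmed reads.
trim_cmd() {
    # $1 = input reads, $2 = output reads, $3 = adapter FASTA
    printf 'bbduk.sh in=%s out=%s ref=%s ktrim=r k=23 mink=11 hdist=1\n' "$1" "$2" "$3"
}

hist_cmd raw.fq before
trim_cmd raw.fq trimmed.fq adapters.fa
hist_cmd trimmed.fq after
```

Because each histogram pass reads a file that is already in its final state, the "before" and "after" numbers do not depend on how a single trimming run orders its statistics.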

By the way, I might mention that I finally tried to assemble my very large background sample using Tadpole, and then align my primary sample to that to remove 'background/contamination' reads. It produced a fairly poor assembly overall, but at least it ran to completion on the 500GB background sample on my memory-constrained machine (64GB). The kmer-based approach was just too memory-consuming.

Last edited by boulund; 10-25-2017 at 10:54 PM.
Old 11-20-2017, 09:38 AM   #282
mcmc
Member
 
Location: Midwest, USA

Join Date: Jan 2016
Posts: 14
Seal not printing outu file

Hi - I'm running seal to map reads to ref genomes. This is the command I ran:

Code:
seal in="${samplename}_nonribo.fq.gz" ref="${all4genomes}" pattern="${samplename}_out_%.fq.gz" outu="${samplename}_unmapped.fq.gz" ambig=all stats="${samplename}_mapstats.txt"
It's outputting the matched reads per reference sequence using the 'pattern', but it's not making the 'outu' file of unmapped reads (and there should be some unmapped reads, according to my stderr output).

Is there another trick to making this file?

Thanks,
MC
Old 01-19-2018, 11:06 AM   #283
catagui
Junior Member
 
Location: Florida, USA

Join Date: Mar 2017
Posts: 3

Hi, I would like to use bbduk to filter the reads that map to a genome. I have 48 paired samples and want to make sure I understand how to input the files.
My samples are paired-end and are labeled as F and R (plus _1 and _2).
Should I put them all after in=? Or use in2= for the reverse reads? Does interleaved mean that I put each pair together?
e.g. in= S1_F_paired_1.fq,S1_R_paired_2.fq,S10_F_paired_1.fq,S10_R_paired_2.fq...

I have them as two separate lines at the moment:
S1_F_paired_1.fq,S10_F_paired_1.fq,S11_F_paired_1.fq,S12_F_paired_1.fq,S13_F_paired_1.fq..
S1_R_paired_2.fq,S10_R_paired_2.fq,S11_R_paired_2.fq,S12_R_paired_2.fq,S13_R_paired_2.fq..
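For separate forward and reverse files, BBDuk takes the forward file as in1= and the reverse file as in2=, one pair per run. A minimal dry-run sketch of looping over samples with the naming scheme above follows. It only prints the commands. genome.fa and the output file names are placeholders, and the assumption is that the reads matching the genome are wanted in the outm files.

```shell
#!/bin/sh
# Dry run: pair each forward file with its reverse mate and print one
# bbduk.sh command per sample. Nothing is executed.

# Derive the reverse-read file name from a forward-read file name,
# e.g. S1_F_paired_1.fq -> S1_R_paired_2.fq (naming taken from the post).
mate_for() {
    local fwd="$1"
    printf '%s_R_paired_2.fq\n' "${fwd%_F_paired_1.fq}"
}

# Print the command for one sample. genome.fa and the output names
# are placeholders.
bbduk_cmd() {
    local fwd="$1"
    local rev sample
    rev=$(mate_for "$fwd")
    sample=${fwd%_F_paired_1.fq}
    printf 'bbduk.sh in1=%s in2=%s ref=genome.fa outm1=%s_matched_1.fq outm2=%s_matched_2.fq out1=%s_unmatched_1.fq out2=%s_unmatched_2.fq\n' \
        "$fwd" "$rev" "$sample" "$sample" "$sample" "$sample"
}

for fwd in S1_F_paired_1.fq S10_F_paired_1.fq S11_F_paired_1.fq; do
    bbduk_cmd "$fwd"
done
```

Replacing the fixed list with a glob such as *_F_paired_1.fq would cover all 48 samples, and piping the printed lines to sh would actually run them.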

Thanks,

Catalina
Old 02-14-2018, 01:28 AM   #284
DrYak
Member
 
Location: South Africa

Join Date: Sep 2013
Posts: 12
Entropy filtering: Java ArrayIndexOutOfBoundsException

Hi,

I'm having an issue while trying to filter a fastq file using an entropy filter. The library protocol used ribozero so there are a lot of poly T sequences that I would like to remove.

I have successfully removed adapter and phiX contamination from the file but when I try the entropy filter (with various -Xmx settings or none) I get a java array error.

There are ~137 million 100 bp unpaired reads in the fastq file and they have been filtered for adapters, low quality and phiX (using BBDuk).

I'm working on a node with 24 cores and 128 GiB of RAM running CentOS Linux release 7.3.1611 and java version "1.7.0_131".

Command and error messages follow:

$ bbduk.sh -Xmx8g in=seq.fq out=seq_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
java -Djava.library.path=/apps/chpc/bio/bbmap/jni/ -ea -Xmx8g -Xms8g -cp /apps/chpc/bio/bbmap/current/ jgi.BBDukF -Xmx8g in=fish-coral_1_filtered_clean.fq out=fish-coral_1_filtered_clean_0-1-entrop-filtered.fq outm=low_complexity-0-1.fq entropy=0.1
Executing jgi.BBDukF [-Xmx8g, in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]
Version 37.90 [-Xmx8g, in=seq.fq, out=seq_0-1-entrop-filtered.fq, outm=low_complexity-0-1.fq, entropy=0.1]

Initial:
Memory: max=8232m, free=8061m, used=171m

Input is being processed as unpaired
Started output streams: 0.038 seconds.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 39
at structures.EntropyTracker.averageEntropy(EntropyTracker.java:302)
at structures.EntropyTracker.passes(EntropyTracker.java:348)
at jgi.BBDukF$ProcessThread.run(BBDukF.java:2583)
Exception in thread "Thread-28" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-25" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-8" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-17" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-24" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-13" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-15" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-11" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-14" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-7" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-10" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-29" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-20" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-22" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-16" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-26" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-19" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-12" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-18" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-27" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-21" java.lang.ArrayIndexOutOfBoundsException
Processing time: 0.192 seconds.

Input: 34841 reads 3436691 bases.
Low entropy discards: 2157 reads (6.19%) 215168 bases (6.26%)
Total Removed: 2181 reads (6.26%) 216121 bases (6.29%)
Result: 32660 reads (93.74%) 3220570 bases (93.71%)

Time: 0.255 seconds.
Reads Processed: 34841 136.55k reads/sec
Bases Processed: 3436k 13.47m bases/sec


Any suggestions?

Thanks.
Dave
Old 02-25-2018, 08:21 PM   #285
Amazonmatt
Junior Member
 
Location: Pennsylvania

Join Date: Nov 2015
Posts: 1

Hi. Does anybody have any idea why bbduk is only reading (and trimming) 364 reads from my file?

The HPC is using BBDUK 36.32.

Here's my code:

bbduk.sh in1=Vireo1_R1_001.fastq.gz in2=Vireo1_R2_001.fastq.gz out1=Vireo1_R1_trimmed.fastq.gz out2=Vireo1_R2_trimmed.fastq.gz ref=/opt/bbmap/36.32/bbmap/resources/adapters.fa threads=12 k=19 mink=5 hdist=1 ktrim=r qtrim=r minlength=36 trimq=14

I checked the header of both the input and the (short) output file. They both appear to be formatted correctly, so there isn't a file corruption issue that I can detect. Also, zcat shows a reasonable number of reads for the input file (about 26 million reads), and the input file size is correct.

I'm stumped.
Old 02-26-2018, 01:15 PM   #286
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,759

That is a pretty old version of BBMap. I suggest that you start by upgrading to the latest version.

It seems unlikely but are the rest of the reads failing other limits you have set?
Old 02-26-2018, 08:10 PM   #287
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 442

When you say you checked the header, do you mean you looked at the 364th and 365th reads? What happens if you take some other random set of reads from the input and use that? Like
zcat Vireo1_R1_001.fastq.gz | head -2000 | tail -1000 > test_R1.fastq
(and likewise for R2). What happens if you run just read 1? Are the two files out of sync?
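A quick way to test the out-of-sync idea is to count the records in each mate file and compare. The sketch below is a coarse check under the assumption of standard 4-line FASTQ records. It compares counts only, not read names, and the function names are made up for illustration.

```shell
#!/bin/sh
# Count FASTQ records (4 lines each) in a plain or gzip-compressed file.
count_reads() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { printf "%d\n", NR / 4 }'
}

# Succeed only if both mates contain the same number of records.
# This is a coarse check; it does not compare read names.
same_read_count() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example with the file names from the post; skipped if the files are absent.
if [ -f Vireo1_R1_001.fastq.gz ] && [ -f Vireo1_R2_001.fastq.gz ]; then
    if same_read_count Vireo1_R1_001.fastq.gz Vireo1_R2_001.fastq.gz; then
        echo "R1 and R2 have the same number of reads"
    else
        echo "R1 and R2 differ in read count"
    fi
fi
```

Equal counts do not prove the pairs line up, but unequal counts are enough to show that the two files cannot be processed as mates.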
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 03-08-2018, 02:42 PM   #288
cb841011
Junior Member
 
Location: State College

Join Date: Mar 2018
Posts: 2
Can genome be used to filter RNA-seq reads?

Hi!

I have plant RNA-seq reads that are contaminated with fungal reads. I have access to a draft genome of the fungus. Is it possible to use BBduk or Seal to filter fungal RNA reads away from plant RNA reads using the DNA sequence of the contaminant?

Thanks,
Chris
Old 03-08-2018, 04:00 PM   #289
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,759

You should use bbsplit for this purpose. Provide the genome for your fungus alongside the plant and bin the reads.
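As a rough sketch of that binning step, a BBSplit call takes a comma-separated list of references and writes one output file per reference. The snippet below only prints such a command. All file names are placeholders, and the second reference stands for whatever plant sequence (or stand-in) is available.

```shell
#!/bin/sh
# Dry run: print a BBSplit command that bins reads by reference.
# All file names are placeholders.
bbsplit_cmd() {
    # $1 = reads, $2 = contaminant reference, $3 = host (or stand-in) reference
    printf 'bbsplit.sh in=%s ref=%s,%s basename=binned_%%.fq outu=unassigned.fq\n' \
        "$1" "$2" "$3"
}

bbsplit_cmd infected_reads.fq fungus_draft.fa plant_or_relative.fa
```

The % in basename is replaced by each reference name, so reads binned to the fungus and to the plant land in separate files, with reads matching neither going to the outu file.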
Old 03-09-2018, 06:23 AM   #290
cb841011
Junior Member
 
Location: State College

Join Date: Mar 2018
Posts: 2

Hey GenoMax,

Thank you for the reply! I had not heard of bbsplit. Unfortunately, I don't have a genomic sequence for the plant, only for the fungus.

How does this change my options?

Thanks!
Old 03-09-2018, 08:13 AM   #291
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,759

@cb841011: Since this was also cross-posted and discussed on Biostars I will add a reference to the thread here: https://www.biostars.org/p/302864/

You could always use a closely related grass genome (if one is available). There would be some loss of real data (or gain of false positives) but since you don't have the genome of your grass it is about the best you can do.

Since you have
Quote:
Draft genome of the fungus
RNA-seq reads from non-infected grass
RNA-seq reads from infected grass (contains grass and fungal transcripts)
RNA-seq reads from the fungus growing in culture
You could assemble transcriptomes (using Trinity) from non-infected grass and then fungus. Use those to see if you are able to find any new transcripts showing up in the infected grass.
Old 03-19-2018, 03:10 AM   #292
nicorascovan
Junior Member
 
Location: Rosario, Argentina

Join Date: Feb 2010
Posts: 9
Error

Hello,

I am trying to run bbduk on my server with the following command:
~/soft/bbmap/bbduk.sh in=myfile.fastq.gz out=myfile_filtered.fq outm=myfile_low_complexity.fq entropy=0.5

and I get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: java.util.concurrent.ThreadLocalRandom
at java.lang.J9VMInternals.verifyImpl(Native Method)
at java.lang.J9VMInternals.verify(J9VMInternals.java:72)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:134)
at jgi.BBDukF.<clinit>(BBDukF.java:4267)
at java.lang.J9VMInternals.initializeImpl(Native Method)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
Caused by: java.lang.ClassNotFoundException: java.util.concurrent.ThreadLocalRandom
at java.net.URLClassLoader.findClass(URLClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:660)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:346)
at java.lang.ClassLoader.loadClass(ClassLoader.java:626)
... 6 more
Could not find the main class: jgi.BBDukF. Program will exit.

Any idea about why this could be happening?


A second question: If I want to change WAYS=7 to WAYS=1 in order to run bbduk on my laptop, how do I change it and re-compile, as suggested in the README file?

Thanks,

Nicolas.
Old 03-19-2018, 03:36 AM   #293
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,759

What OS are you using? Did you move any of the files around after you downloaded and uncompressed the software?

I am not sure where the WAYS=7 option is in the README. You should be able to run bbduk on your laptop. Set threads=N if you want to limit resource usage.
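One thing worth ruling out, as a guess rather than something confirmed in this thread: java.util.concurrent.ThreadLocalRandom only exists from Java 7 onward, so a NoClassDefFoundError for it is what a Java 6 runtime would produce. The sketch below checks the major version of the java found on PATH. The helper names are made up for illustration.

```shell
#!/bin/sh
# Extract the major Java version from a version string such as
# "1.6.0_45" (-> 6), "1.7.0_131" (-> 7) or "11.0.2" (-> 11).
java_major() {
    printf '%s\n' "$1" | awk -F'[._-]' '{ if ($1 == 1) print $2; else print $1 }'
}

# Read the quoted version string printed by "java -version" (on stderr).
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Report on the java found on PATH, if any.
if command -v java >/dev/null 2>&1; then
    v=$(installed_java_version)
    if [ -n "$v" ] && [ "$(java_major "$v")" -lt 7 ]; then
        echo "Java $v predates Java 7, so ThreadLocalRandom is unavailable"
    else
        echo "Java version found: $v"
    fi
fi
```

If the reported version is below 7, pointing the shell at a newer JVM before calling bbduk.sh would be the first thing to try.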
Old 06-12-2018, 08:55 AM   #294
horvathdp
Member
 
Location: Fargo

Join Date: Dec 2011
Posts: 63

I have run into a bit of a problem with the adapter trimming. It keeps leaving the sequence "AGATCGG" at the end of reads when I run, for example:

./bbduk.sh -Xmx1g in1=read1_R1.fastq in2=read1_R2.fastq out1=cleanread1_R1.fastq out2=cleanread1_R2.fastq ktrim=r ref=resources/adapters.fa k=28 mink=12 hdist=1

My library was made with the NEBNext Ultra Directional kit with NEBNext primers 1-48. Is there an updated adapters.fa list that will hit these sequences?
Old 06-12-2018, 06:24 PM   #295
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,759

@horvathdp: You can provide the NEBNext primers in a separate multi-FASTA file, then use that file with bbduk.sh. Also, with paired-end reads, use the options "tpe tbo" to remove residual adapter bases at the ends of reads.
Old 06-13-2018, 04:39 AM   #296
horvathdp
Member
 
Location: Fargo

Join Date: Dec 2011
Posts: 63

Thanks! For those options, do I just add -tpe -tbo to the command?
Old 06-13-2018, 04:56 AM   #297
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,759

Quote:
Originally Posted by horvathdp
Thanks! For those options, do I just add -tpe -tbo to the command?
No hyphens. Just tpe and tbo.
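Put together, the earlier command with both flags appended would look roughly like the output of the sketch below. BBDuk options are bare words or key=value pairs, so tpe and tbo go in without hyphens. A 7-base remnant such as AGATCGG is shorter than mink=12, so k-mer matching alone cannot see it, which is where overlap-based trimming for paired reads helps. nebnext_primers.fa is a hypothetical name for a user-made multi-FASTA file, and the snippet only prints the command.

```shell
#!/bin/sh
# Dry run: print the original command with tpe and tbo appended (no hyphens).
# nebnext_primers.fa is a hypothetical name for a user-made multi-FASTA file.
paired_trim_cmd() {
    # $1 = read 1 file, $2 = read 2 file, $3 = comma-separated adapter FASTA list
    printf './bbduk.sh -Xmx1g in1=%s in2=%s out1=clean%s out2=clean%s ktrim=r ref=%s k=28 mink=12 hdist=1 tpe tbo\n' \
        "$1" "$2" "$1" "$2" "$3"
}

paired_trim_cmd read1_R1.fastq read1_R2.fastq resources/adapters.fa,nebnext_primers.fa
```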
Old 07-08-2018, 12:02 PM   #298
kokyriakidis
Junior Member
 
Location: Thessaloniki, Greece

Join Date: Jul 2018
Posts: 7
Trimming long and short

Hi, do I need to follow a different approach when trimming and filtering short vs. long mate-pair reads (Nextera)? If so, could someone elaborate on the pipeline?
Old 07-12-2018, 04:24 AM   #299
kokyriakidis
Junior Member
 
Location: Thessaloniki, Greece

Join Date: Jul 2018
Posts: 7

GenoMax, do we have to trim mate-pair reads differently? I ask because they have the internal adapter. I am not asking about the Nextera mate pairs, but about reads made with MatePairSamplePrep v2. Do we have to reverse complement them?
Old 07-12-2018, 08:59 PM   #300
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,759

Have you looked at the in-line help for "splitnextera.sh"? The adapters for Mate Pair libraries are in the "adapters.fa" file so you should be able to trim them as usual (Nextera_LMP_Read1_External_Adapter, Nextera_LMP_Read2_External_Adapter)