Seqanswers Leaderboard Ad

**necrosnake91** · 09-09-2019, 07:41 PM

Hi to everyone.

I have an issue with bbduk.sh and I,m looking for some help. At this moment I'm calculating the percentage of reads with a quality equal or higher than 30. Using bbduk.sh, I'm running this command:

```../bbmap/bbduk.sh in1=../t01-R1.fastq in2=../t01-R2.fastq trimq=30 qtrim=rl ````

and the output looks like:

````

Max memory cannot be determined. Attempting to use 1400 MB.
If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
java -ea -Xmx1400m -Xms1400m -cp /Users/rodolfochavez/RNASeq/bbmap/current/ jgi.BBDuk in1=../t01-R1.fastq in2=../t01-R2.fastq trimq=30 qtrim=rl
Executing jgi.BBDuk [in1=../t01-R1.fastq, in2=../t01-R2.fastq, trimq=30, qtrim=rl]
Version 38.59

No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
0.019 seconds.
Initial:
Memory: max=1468m, total=1468m, free=1439m, used=29m

Input is being processed as paired
Processing time: 2.250 seconds.

Input: 3000000 reads 225505684 bases.
QTrimmed: 2167934 reads (72.26%) 96158343 bases (42.64%)
Total Removed: 444188 reads (14.81%) 96158343 bases (42.64%)
Result: 2555812 reads (85.19%) 129347341 bases (57.36%)

Time: 2.252 seconds.
Reads Processed: 3000k 1331.91k reads/sec
Bases Processed: 225m 100.12m bases/sec

```
And the result of the analysis is printed in the screen of my terminal. My question is, how can I save the statistics Results from this analysis in a file?

Thanks a lot

**GenoMax** · 09-09-2019, 07:57 PM

You can simply capture the STDERR output (where this is bring written) to a file.

**necrosnake91** · 09-11-2019, 07:10 PM

GenoMax,

Thanks a lot, you solved my issue!

**pilyric** · 11-13-2019, 07:43 AM

I've made a mistake that I did not use certain version of BBduk.

**coredump** · 04-23-2020, 07:01 AM

I would like to extract paired reads with exact matches to a list of 25bp kmers and bbduk seemed like a good tool to use. However, bbduk seems to be returning some 'outm' reads that don't have matches to the kmer list.

I've attached a file containing my bbduk command, a single pair of reads and a reference fasta containing the 2 kmers (including reverse complemented sequences) that the bbduk stats file reports as matching.

Using grep to search the input reads for each kmer sequence produces no matches. Please let me know if I have an incorrect bbduk option set.

Attached Files

bbduk.kmer.test.tar.gz (721 Bytes, 110 views)

**jmwhitha** · 08-12-2020, 08:09 AM

Good afternoon Brian and GenoMax,

Hope you two are doing well all things considered

I have many metagenomes to assemble these days and am trying to make fewer and longer scaffolds and reduce wall clock time by reducing errors in my 150x2 reads.

The following commands result in an aqhist distribution that bothers me. It has a gradual incline from Q17-33 (from ~0 to 3.5 million reads), an obvious dip around Q35 (down to ~0.5 million reads), and a rapid rise to Q38 (~10 million reads). My Phred linear base quality (qhist) seems good with a gradually decline from ~38 to ~34 and a small up swing to ~35.5 at the end.

#Cleaning commands:
#Paths removed for simplicity.

#Force Trim Poor Quality Ends
bbduk.sh -Xmx1g in=input.fastq out=trim151.fq ftr=149
bbduk.sh -Xmx1g in=trim151.fq out=ftrimmed.fq ftl=7

#Kmer Trim Adapters
bbduk.sh -Xmx1g in=ftrimmed.fq out=ktrimmed.fq ktrim=r k=23 mink=11 hdist=1 ref=truseq.fa.gz tbo tpe

#Quality Trimming
/usr/local/usrapps/bioinfo/env_jmw/bbtools/lib/bbduk.sh -Xmx1g in=ktrimmed.fq out=qtrimmed.fq minlen=100 qtrim=rl trimq=10

#Contaminant Removal
bbduk.sh -Xmx1g in=qtrimmed.fq out=bbclean.fq k=31 ref=sequencing_artifacts.fa.gz, phix_adapters.fa.gz, phix174_ill.ref.fa.gz

Questions:
1) Why is this happening?
2) Should it be fixed?
3) How can it be fixed (suggested command please)?

I greatly appreciate your help and any additional comments for improvement.

Thank you,
Jason

**jmwhitha** · 08-13-2020, 03:04 AM

Good morning,

It seems that the aqhist for my raw fastq file has the same dip at Q35. So, please disregard the previous questions. If you could tell me if peaks and dips are common or not uncommon, and if you have any insights about why the read quality scores have peaks and dips, I'd really appreciate it.

Thank you,
Jason

**Lamma** · 10-29-2020, 02:23 AM

When quality trimming raw reads to a phred score there is often conflicting advice surrounding to what Q score one should filter too.

I have read many of your post and replies and from what I gather there really is no right answer as it depends on what you want to do.

Say for example I have some illumina short reads and some pacbio long reads that I will to do the following with:

Hybrid assembly
Illumina only de novo assembly
Pacbio only de novo assembly

I have been previously told that a score of Q30 is highly typical to use however I have read that anything above 27 is just unnecessary and potentially damaging to generating good assemblies. On the bbduk help pages on Seqanswers Q10 is used in every example.

Thus is Q10 a good general base point for quality trimming to provide decent assemblies or is this also to high?

How one one work out the above on their own without needing to come here to answers?

Secondly, should we trim pacbio reads in either a hybrid assembly or pacbio de novo assembly?

**floem7** · 03-30-2021, 02:14 PM

What exactly "artifacts" mean in bbduk?

**gastonviarengo** · 05-10-2021, 07:04 AM

Hello everyone,

I've installed bbmap throw 'sudo apt-get install bbmap' and also by downloading the package, untaring it and adding the path to the directory in the .bashrc file. However, when running "bbduk.sh --version" or a script "bbduk.sh -Xmx ..." I get the following error:

/usr/bin/bbduk.sh: línea 344: /usr/share/bbmap/calcmem: No existe el archivo o el directorio
/usr/bin/bbduk.sh: línea 345: setEnvironment: orden no encontrada
/usr/bin/bbduk.sh: línea 346: parseXmx: orden no encontrada
/usr/bin/bbduk.sh: línea 350: freeRam: orden no encontrada
java -Xmxm -Xmsm -cp /usr/share/java/bbmap.jar jgi.BBDuk --version
Invalid maximum heap size: -Xmxm
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

I've tried -Xmx with different amounts of memory and receive the same error. Also, the folder /calcmem does indeed do not exist. Any ideas on how to solve this problem?

Thank you!

**J_bioinf** · 07-06-2021, 09:52 AM

BBDUK trimming and poor quality fastqc

Hi

I am very new to bioinformatics and I am attempting to trim reads using BBDUK. My libraries were prepared using the lexogen quantseq 3' mRNA fwd kit.

According to their recommendation I have performed the following:

runID*R1_001.fastq; do cat $i | bbduk.sh in=stdin.fq out=${i}_trimmed_clean ref=/data/resources/polyA.fa.gz,/data/resources/truseq.fa.gz k=13 ktrim=r useshortkmers=t mink=5 qtrim=t trimq=10 minlength=20; done

However my reads quality on fastqc has not improved much. Do I need to go ahead and remove illumina sequences? Does anyone happen to know the specific illumina adapter sequence?

Thank you.

**Pluto** · 07-23-2021, 06:23 AM

Adaptor trimming is not working

This is what my script looks like:

Ordered=t #Set to true to output reads in same order as input
Ktrim=r #once a reference kmer is matched in a read, that kmer and all the bases to the right will be trimmed
K=21 #specifies the kmer size
Mink=8 #"mink" allows it to use shorter kmers at the ends of the read
Hdist=2 #number of permitted mismatches

for Prefix in `ls -1 *_R1.fastq.gz | sed 's/_R1.fastq.gz//'`
do

bbduk.sh -Xmx128g in1=$Prefix\_R1.fastq.gz in2=$Prefix\_R2.fastq.gz out1=$Prefix\_clean_R1.fastq.gz out2=$Prefix\_clean_R2.fastq.gz ref=$adapters ordered=$Ordered ktrim=$Ktrim k=$K mink=$Mink hdist=$Hdist tpe tbo

done

My R1 and R2 files consist of concatenated sequences from different runs. Do you think this could be the reason?

Many thanks

**emortiz** · 08-26-2021, 10:54 PM

I have repeated Brian Bushnell's comparison among adaptor trimmers, this time including the most recent versions of Cutadapt, Trimmomatic, and fastp. Here are my commands:

Code:

# Cutadapt 3.4:
time cutadapt -m 21 -j 0 -b "file:gruseq.fa" -B "file:gruseq.fa" -o cutadapt_R1.fq.gz -p cutadapt_R2.fq.gz dirty_R1.fq.gz dirty_R2.fq.gz

# Trimmomatic 0.39:
time trimmomatic PE -phred33 dirty_R1.fq.gz dirty_R2.fq.gz trimmomatic_R1.fq.gz trimmomatic_U1.fq.gz trimmomatic_R2.fq.gz trimmomatic_U2.fq.gz ILLUMINACLIP:gruseq.fa:2:28:10:2:keepBothReads MINLEN:21

# fastp 0.22.0:
time fastp -w 8 -Q -l 21 --adapter_fasta gruseq.fa --detect_adapter_for_pe --in1 dirty_R1.fq.gz --in2 dirty_R2.fq.gz --out1 fastp_R1.fq.gz --out2 fastp_R2.fq.gz

# bbduk.sh 38.92:
time bbduk.sh in=dirty_R#.fq.gz out=bbduk_R#.fq.gz ref=gruseq.fa ktrim=r mink=12 hdist=1 minlen=21 tpe tbo

# bbduk.sh 38.92 (x2):
time bbduk.sh ktrim=r minlength=21 interleaved=f tpe tbo ref=gruseq.fa in=dirty_R#.fq.gz out=stdout.fq k=21 mink=11 hdist=2 | bbduk.sh ktrim=r minlength=21 interleaved=f tpe tbo ref=gruseq.fa in=stdin.fq out=bbduk_x2_R#.fq.gz k=19 mink=9 hdist=1

And these were the results:
No code has to be inserted here.I still prefer bbdu.sh for its speed and high accuracy. fastp had slightly higher accuracy but it sometimes mistakes genomic sequence for adaptor (see this post). However Cutadapt now is clearly more accurate (but the slowest by far). I wonder if anybody can recommend some settings that could increase bbduk's accuracy a little more?

**lituan** · 09-27-2021, 09:52 PM

bbduk hdist=0 does not work

Hi

I want to search CCGG in reads, so I tried following command

bbduk.sh in=test.fq literal=CCGG k=4 hdist=0 overwrite=t out=test.result.fq

in test.fq, only one read, no CCGG

@A00582:707:HJ3NTDSX2:2:1101:5267:1658 1:N:0:AACCTC
TATTAAGCAGAAGGGCAGGCTGGAAAATCCTCTTCAGCAGAACGGTGGACTGAGGCTCACTGCTATCAAGGTGGACAGGCTTCTCTGCTCAGCAAACCAGGCTCACCCAGGGGTGCTCTACACAGACTCGGGCTCGCTAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

but , result this read is matched

BBDuk version 37.62
Initial:
Memory: max=210091m, free=200226m, used=9865m

Added 1 kmers; time: 0.151 seconds.
Memory: max=210091m, free=192553m, used=17538m

Input is being processed as unpaired
Started output streams: 0.269 seconds.
Processing time: 0.116 seconds.

Input: 1 reads 140 bases.
Contaminants: 1 reads (100.00%) 140 bases (100.00%)
Total Removed: 1 reads (100.00%) 140 bases (100.00%)
Result: 0 reads (0.00%) 0 bases (0.00%)

Time: 0.564 seconds.
Reads Processed: 1 0.00k reads/sec
Bases Processed: 140 0.00m bases/sec

it seems hdist does not work

Could you give some advice

**GenoMax** · 09-28-2021, 03:13 AM

@lituan you will need to use k=2 or less with such a short pattern (CCGG). Even then it may not work well.

This may be a case where you would want to use a different package called Seqkit. Specific tool would be "seqkit grep".

Topics	Statistics	Last Post
Evaluating Genome Sequencing for ECMO Patients in the NICU by seqadmin Started by seqadmin, 12-17-2024, 10:28 AM	0 responses 22 views 0 likes	Last Post by seqadmin 12-17-2024, 10:28 AM
New Genetic Toolkit Refines Studies on Gene Function and Disease by seqadmin Started by seqadmin, 12-13-2024, 08:24 AM	0 responses 42 views 0 likes	Last Post by seqadmin 12-13-2024, 08:24 AM
Study Links Brain Mechanism to Emotional Responses in Animals and Humans by seqadmin Started by seqadmin, 12-12-2024, 07:41 AM	0 responses 28 views 0 likes	Last Post by seqadmin 12-12-2024, 07:41 AM
Study Identifies Ribosomal RNA Fingerprints as Early Cancer Biomarkers by seqadmin Started by seqadmin, 12-11-2024, 07:45 AM	0 responses 42 views 0 likes	Last Post by seqadmin 12-11-2024, 07:45 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News