**Brian Bushnell** · 09-16-2014, 07:11 PM

Oh - that's an intentional protection from overwriting files. Just delete the output file first or add the "overwrite" flag.

**lyw1** · 09-17-2014, 09:44 AM

High contaminants

Thanks.

Input is being processed as unpaired

Input: 385043 reads 10781204 bases.
Contaminants: 341911 reads (88.80%) 9573508 bases (88.80%)
Result: 43132 reads (11.20%) 1207696 bases (11.20%)

What is the definition of contaminants? The rate looks very high.

**lyw1** · 09-17-2014, 10:02 AM

I need to read 30 nt of each sequence. The MiSeq read 32 nt during sequencing, so many sequences have NN at the last 2 positions. Is this related to the high contaminant rate?

**Brian Bushnell** · 09-17-2014, 10:23 AM

Are you using bbduk.sh? That's the only one that prints anything about contaminants. Can you show your specific command line?

Anyway, if you tried filtering out adapters and you got a result like that, it means you have almost no product and mostly adapter sequence.

**lyw1** · 09-17-2014, 10:28 AM

Yes, bbduk.sh.

Input is being processed as unpaired

Input: 385043 reads 10781204 bases.
Contaminants: 341911 reads (88.80%) 9573508 bases (88.80%)
Result: 43132 reads (11.20%) 1207696 bases (11.20%)

**Brian Bushnell** · 09-17-2014, 10:34 AM

Please give me the exact command line (what you typed before you hit enter).

**lyw1** · 09-17-2014, 10:37 AM

k=16 shows higher contaminants than k=26

zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ bbduk.sh -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_26.txt k=26 fbm
java -ea -Xmx1g -cp /home/zheng/Desktop/bbmap/current/ jgi.BBDukF -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_26.txt k=26 fbm
Executing jgi.BBDukF [-Xmx1g, in=probe48mix25fg_S7_L001_R2_001.fastq, ref=ngs13template.fasta, stats=probe48mix25fg_S7_L001_R2_001_26.txt, k=26, fbm]

No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
Initial:
Memory: free=237m, used=14m

Added 13 kmers; time: 0.023 seconds.
Memory: free=228m, used=23m

Input is being processed as unpaired

Input: 159642 reads 4469976 bases.
Contaminants: 130724 reads (81.89%) 3660272 bases (81.89%)
Result: 28918 reads (18.11%) 809704 bases (18.11%)

Time: 0.197 seconds.
Reads Processed: 159k 811.47k reads/sec
Bases Processed: 4469k 22.72m bases/sec
zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ ^C
zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ bduk.sh -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_16.txt k=16 fbm
bduk.sh: command not found
zheng@zheng-XPS-8500:~/Desktop/bbmap/20140916ngs$ bbduk.sh -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_16.txt k=16 fbm
java -ea -Xmx1g -cp /home/zheng/Desktop/bbmap/current/ jgi.BBDukF -Xmx1g in=probe48mix25fg_S7_L001_R2_001.fastq ref=ngs13template.fasta stats=probe48mix25fg_S7_L001_R2_001_16.txt k=16 fbm
Executing jgi.BBDukF [-Xmx1g, in=probe48mix25fg_S7_L001_R2_001.fastq, ref=ngs13template.fasta, stats=probe48mix25fg_S7_L001_R2_001_16.txt, k=16, fbm]

No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
Initial:
Memory: free=237m, used=14m

Added 143 kmers; time: 0.028 seconds.
Memory: free=228m, used=23m

Input is being processed as unpaired

Input: 159642 reads 4469976 bases.
Contaminants: 151727 reads (95.04%) 4248356 bases (95.04%)
Result: 7915 reads (4.96%) 221620 bases (4.96%)

**Brian Bushnell** · 09-17-2014, 11:02 AM

So... that's telling you that you are getting matches between the stuff in your input file (probe48mix25fg_S7_L001_R2_001.fastq) and your reference file (ngs13template.fasta). And a shorter kmer will always find more matches in the presence of error.

probe48mix25fg_S7_L001_R2_001_26.txt will contain a list of which reference sequences were seen, and how many times they were seen.
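The effect of kmer length on matching in the presence of error can be illustrated with a minimal Python sketch. This is not BBDuk's implementation: it only checks for shared exact kmers on the forward strand, and the 30 bp reference, the error position, and the function names are all made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def matches_reference(read, reference, k):
    """True if the read shares at least one exact kmer with the reference."""
    return not kmers(read, k).isdisjoint(kmers(reference, k))


# Hypothetical 30 bp reference (not taken from the thread).
reference = "ACGTTGCAAGGCTTACCGATGGATCCTGAA"

# Simulate a single sequencing error at position 20 (G -> T).
error_pos = 20
read = reference[:error_pos] + "T" + reference[error_pos + 1:]

# With k=16 the first 16 bases form an error-free kmer, so the read matches.
print(matches_reference(read, reference, 16))  # True

# With k=26 every kmer in a 30 bp read spans position 20, so none match.
print(matches_reference(read, reference, 26))  # False
```

A single error destroys every kmer that overlaps it, and longer kmers overlap it more often, which is why the shorter kmer reports more matching reads.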

**lyw1** · 09-17-2014, 12:30 PM

"And a shorter kmer will always find more matches in the presence of error."

Here k=16 shows fewer matching sequences than k=26:

for k=16
Input: 159642 reads 4469976 bases.
Contaminants: 151727 reads (95.04%) 4248356 bases (95.04%)
Result: 7915 reads (4.96%) 221620 bases (4.96%)

for k=26
Input: 159642 reads 4469976 bases.
Contaminants: 130724 reads (81.89%) 3660272 bases (81.89%)
Result: 28918 reads (18.11%) 809704 bases (18.11%)

**Brian Bushnell** · 09-17-2014, 12:55 PM

In this case, the output is misleading... BBDuk assumes that the ref file is a file of contaminants because that's what I originally designed it for. So "Contaminants" actually means "Things that match the reference". I may change the wording eventually.

In other words, 95.04% of the reads matched the reference for K=16 and 81.89% did for K=26.
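As a quick check of that reading, the percentages in the two logs can be recomputed from the read counts. The sketch below is only arithmetic on the numbers posted above; the function name is made up.

```python
def match_summary(total_reads, matched_reads):
    """Return (matched %, unmatched %) rounded to two decimals."""
    unmatched_reads = total_reads - matched_reads
    matched_pct = round(100 * matched_reads / total_reads, 2)
    unmatched_pct = round(100 * unmatched_reads / total_reads, 2)
    return matched_pct, unmatched_pct


# "Contaminants" = reads that matched the reference; "Result" = the rest.
print(match_summary(159642, 151727))  # k=16 -> (95.04, 4.96)
print(match_summary(159642, 130724))  # k=26 -> (81.89, 18.11)
```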

**lyw1** · 09-17-2014, 12:56 PM

Great, thanks.

Zheng

**lyw1** · 09-17-2014, 03:55 PM

Is there a size limitation for the reference sequences? It does not work when I add a 20 bp reference sequence.

**Brian Bushnell** · 09-17-2014, 06:07 PM

The minimum reference length is the same as the kmer length. So, if k=30, it will not work with anything shorter than a 30bp reference.
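The reason follows from how many kmers a sequence can supply: a sequence of length L contains L - k + 1 kmers, and none at all when L is shorter than k. A small sketch of that count (the function name is made up, and this is not BBDuk code):

```python
def kmer_count(seq_len, k):
    """Number of k-length windows in a sequence of seq_len bases."""
    return max(0, seq_len - k + 1)


print(kmer_count(20, 30))  # 0: a 20 bp reference yields no 30-mers
print(kmer_count(30, 30))  # 1: a 30 bp reference yields exactly one

# One layout that would reproduce the "Added 13 kmers" (k=26) and
# "Added 143 kmers" (k=16) lines in the logs above is 13 references of
# 26 bp each, assuming all kmers are distinct. The reference file is not
# shown in the thread, so this is only an inference.
print(13 * kmer_count(26, 26))  # 13
print(13 * kmer_count(26, 16))  # 143
```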

**lyw1** · 09-18-2014, 11:59 AM

Thanks.

How do you separate unambiguousReads and ambiguousReads in bbmap.sh?

**Brian Bushnell** · 09-18-2014, 12:10 PM

Ambiguously mapped reads get an "XT:A:R" tag in the sam output, while unambiguously mapped reads get "XT:A:U".

You can also forbid ambiguously-mapping reads using the flag "ambig=toss", which will consider them unmapped.
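Given those tags, one way to split an existing sam file afterwards is to look at the optional fields of each record. The following is a minimal sketch, assuming the tags appear exactly as described above; the function name and the sample records are hypothetical and not real BBMap output.

```python
def split_by_ambiguity(sam_lines):
    """Split SAM records into (unambiguous, ambiguous) lists by XT tag."""
    unambiguous, ambiguous = [], []
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # The first 11 SAM columns are mandatory; optional tags follow.
        tags = fields[11:]
        if "XT:A:R" in tags:
            ambiguous.append(line)
        elif "XT:A:U" in tags:
            unambiguous.append(line)
        # Records with neither tag (e.g. unmapped reads) are ignored here.
    return unambiguous, ambiguous


# Hypothetical records for illustration only.
sample = [
    "@HD\tVN:1.4",
    "read1\t0\tref1\t1\t40\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U",
    "read2\t0\tref1\t5\t3\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
unambig, ambig = split_by_ambiguity(sample)
print(len(unambig), len(ambig))  # 1 1
```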
