SEQanswers

Old 02-22-2016, 01:22 PM   #61
DNA Sorcerer
Member
 
Location: Canada

Join Date: Mar 2010
Posts: 24

testformat says: illumina fastq raw single-ended 108bp

As far as I remember this was a HiSeq run.

I tried the reformat line suggested by Brian, but the process stops after a while with errors, apparently because it runs short of memory. I will try to fix that and run it again.
__________________
Hi there
Old 02-22-2016, 09:16 PM   #62
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hmm... can you post the errors? Reformat by default uses very little memory, which is all that should be needed for a correctly-formatted file containing reads. It will run out of memory if you run it, without increasing the default memory allocation, on extremely long sequences (over tens of megabases) such as the human genome. It will never run out of memory on a correctly-formatted Illumina fastq file.

So, it would also be helpful if you could post the results of "head" (the first 10 lines of the file).
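For anyone following along, a quick structural check can complement "head". The sketch below is not part of BBTools; it is a minimal Python illustration that assumes plain four-line FASTQ records (no line-wrapped sequences) and reports the first record whose header, separator line, or sequence/quality lengths look wrong. The function name first_bad_record is made up for this example.

```python
import io

def first_bad_record(handle):
    """Return (record_number, reason) for the first malformed record, or None.

    Assumes four-line FASTQ records: header, sequence, '+' line, qualities.
    """
    record = 0
    while True:
        header = handle.readline()
        if not header:
            return None  # reached end of file without finding a problem
        record += 1
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            return record, "header does not start with '@'"
        if not plus.startswith("+"):
            return record, "third line does not start with '+'"
        if len(seq) != len(qual):
            return record, "sequence and quality lengths differ"

# Small in-memory demo; a real file would be opened with open(path, "r").
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII\n")
print(first_bad_record(demo))
```

Because it stops at the first problem, it can be pointed at a large file without waiting for the whole thing to be read, although a single enormous line would still be loaded into memory by readline.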
Old 02-23-2016, 05:51 AM   #63
DNA Sorcerer
Member
 
Location: Canada

Join Date: Mar 2010
Posts: 24

See below. I ran it on only one of the files because doing both would go over my storage quota.

Quote:
java -da -Xmx200m -cp /home/cslamovi/CLARKSCV1.2.2-b/bbmap/current/ jgi.ReformatReads -da ibq qin=33 in=scratch/s_3_1_sequence.fastq out=scratch/fixed_1.fq
Executing jgi.ReformatReads [-da, ibq, qin=33, in=scratch/s_3_1_sequence.fastq, out=scratch/fixed_1.fq]

Input is being processed as unpaired
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at fileIO.ByteFile1.fillBuffer(ByteFile1.java:180)
at fileIO.ByteFile1.nextLine(ByteFile1.java:136)
at stream.FASTQ.toReadList(FASTQ.java:648)
at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:111)
at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:96)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
Input: 32775200 reads 3539721600 bases
Output: 32775200 reads (100.00%) 3539721600 bases (100.00%)

Time: 807.191 seconds.
Reads Processed: 32775k 40.60k reads/sec
Bases Processed: 3539m 4.39m bases/sec
Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
at jgi.ReformatReads.process(ReformatReads.java:1032)
at jgi.ReformatReads.main(ReformatReads.java:45)
Old 02-23-2016, 06:21 AM   #64
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,815

Looks like you only ran this with 200MB of RAM. Can you try with -Xmx2g?

How old is this data BTW (in years)?
Old 02-23-2016, 09:20 PM   #65
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Reformat should never run out of memory with the default settings and short (<200kbp) reads. I think the input file is corrupt, and should be re-downloaded. The corruption probably occurs somewhere around the 32.77 millionth read, but it's hard to be sure...
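The stack trace above fails inside ByteFile1.nextLine / fillBuffer while copying an array. One possible cause (an assumption, not something confirmed in this thread) is a damaged region containing a very long line with no newline characters. A minimal Python sketch for locating such a line with bounded memory follows; longest_line is a hypothetical helper, not a BBTools function.

```python
import io

def longest_line(handle, chunk_size=1 << 20):
    """Return (line_number, length_in_bytes) of the longest line.

    `handle` must be opened in binary mode. The data is read in fixed-size
    chunks, so memory use stays bounded even if one line is enormous.
    """
    best_line, best_len = 0, 0
    line_no, current = 1, 0
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            end = chunk.find(b"\n", start)
            if end == -1:
                current += len(chunk) - start  # line continues in next chunk
                break
            current += end - start
            if current > best_len:
                best_line, best_len = line_no, current
            line_no += 1
            current = 0
            start = end + 1
    if current > best_len:  # final line without a trailing newline
        best_line, best_len = line_no, current
    return best_line, best_len

# In-memory demo; for a real file use open(path, "rb").
print(longest_line(io.BytesIO(b"@r1\nACGT\n+\nIIII\n")))
```

With four-line records, dividing the reported line number by four (rounding up) gives the approximate read number, which could be compared against the "32.77 millionth read" estimate. For 108 bp reads, any line far longer than that would be suspicious.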
Old 03-09-2016, 02:25 AM   #66
FridaJoh
Junior Member
 
Location: Sweden

Join Date: Jan 2016
Posts: 1

Hi

I came across this when searching for a way to demultiplex non-overlapping paired end reads that were sequenced using combinatorial barcodes. I don't suppose there is a way of doing that somehow using seal (or other tools?).

Quote:
Originally Posted by Brian Bushnell View Post
It is almost possible to do this with Seal, which outputs reads into bins based on kmer matching.

seal.sh in=reads.fq pattern=%.fq k=6 restrictleft=6 mm=f ref=barcodes.fa rcomp=f

That would require a file "barcodes.fa" like this:
>AACTGA
AACTGA
>GGCCTT
GGCCTT

etc., with one fasta entry per barcode, so the output reads would be in file AACTGA.fq and so forth. This is sort of a common request, so maybe I will make it unnecessary to provide a fasta file of the barcodes. Does that matter to you either way?

However, BBDuk has the flags "skipr1" and "skipr2", which allow it to only do kmer operations on one read or the other. Seal currently lacks this, but it's essential for processing inline barcodes. I'll add it for the next release.
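The "barcodes.fa" layout described in the quote (one fasta entry per barcode, with the barcode as both name and sequence) is easy to generate from a plain list. The following is only a sketch in Python; barcodes_to_fasta is a made-up helper name, and the restriction to the letters A, C, G, T and N is an assumption of this example.

```python
def barcodes_to_fasta(barcodes):
    """Return FASTA text with one record per barcode, named after the barcode."""
    seen = set()
    lines = []
    for raw in barcodes:
        barcode = raw.strip().upper()
        if not barcode or barcode in seen:
            continue  # skip blank entries and duplicates
        if set(barcode) - set("ACGTN"):
            raise ValueError(f"unexpected character in barcode: {raw!r}")
        seen.add(barcode)
        lines.append(f">{barcode}")
        lines.append(barcode)
    return "\n".join(lines) + "\n"

# Writing the result to disk would look like:
#   with open("barcodes.fa", "w") as out:
#       out.write(barcodes_to_fasta(["AACTGA", "GGCCTT"]))
print(barcodes_to_fasta(["AACTGA", "GGCCTT"]), end="")
```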
Old 03-09-2016, 05:30 AM   #67
mcauchy
Junior Member
 
Location: oswego, ny

Join Date: Oct 2015
Posts: 5

Newbie here! I have unzipped and untarred bbmap, but it won't run any commands. I have a Linux VirtualBox VM on Windows 10. Am I missing some software needed to use BBMap?
Old 03-09-2016, 05:39 AM   #68
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,815

Quote:
Originally Posted by mcauchy View Post
Newbie here! I have unzipped and untarred bbmap, but it won't run any commands. I have a Linux VirtualBox VM on Windows 10. Am I missing some software needed to use BBMap?
What do you mean by "it won't run any commands"? Can you see the shell scripts in the "bbmap" folder? Change to the bbmap directory, try the following command, and see if it prints help output on screen.

Code:
$ ./bbmap.sh
Old 03-09-2016, 05:57 AM   #69
mcauchy
Junior Member
 
Location: oswego, ny

Join Date: Oct 2015
Posts: 5

What I mean is I run:
$ ./repair.sh in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

...and get:
java -ea -Xmx-211m -cp /media/sf_D_DRIVE/bbmap/current/ jgi.SplitPairsAndSingles rp in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
Invalid maximum heap size: -Xmx-211m
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Old 03-09-2016, 06:00 AM   #70
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,815

How much memory have you allocated to the VM? You should have at least 2 GB so that enough is available for programs to run.
Old 03-09-2016, 06:51 AM   #71
mcauchy
Junior Member
 
Location: oswego, ny

Join Date: Oct 2015
Posts: 5

I have allocated 2.9 GB, which is all I have to give. It seems that is not enough. Thanks for your help.
Old 03-09-2016, 07:12 AM   #72
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,815

Quote:
Originally Posted by mcauchy View Post
I have allocated 2.9 GB, which is all I have to give. It seems that is not enough. Thanks for your help.
That may be true, but in case BBMap was not able to allocate RAM correctly, can you try running the command as follows:

Code:
$ ./repair.sh -Xmx2g in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
Old 03-09-2016, 09:49 AM   #73
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Also it seems to me that '-Xmx-211m' is odd. Why the negative 211? I am not sure that makes a difference but it might.
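The thread does not establish why the wrapper script ended up with a negative value, so the following is only a generic sketch, not BBMap's actual logic. It shows, in Python, one way to derive a heap flag from Linux's /proc/meminfo that can never come out negative, plus a simplified check that would reject a flag like '-Xmx-211m'. The 85 percent share, the helper names, and the 200 MB floor (borrowed from the -Xmx200m seen earlier in the thread) are assumptions of this example, and the JVM applies further rules that this check does not cover.

```python
import re

def parse_mem_available_kb(meminfo_text):
    """Extract MemAvailable (in kB) from the text of /proc/meminfo, or None."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def xmx_flag(available_kb, percent=85, floor_mb=200):
    """Build a -Xmx flag from available memory, clamped to a positive floor."""
    if available_kb is None or available_kb <= 0:
        return f"-Xmx{floor_mb}m"
    heap_mb = available_kb * percent // (100 * 1024)  # integer arithmetic only
    return f"-Xmx{max(heap_mb, floor_mb)}m"

def is_valid_xmx(flag):
    """Simplified check: positive integer with an optional k/m/g suffix."""
    return re.fullmatch(r"-Xmx[1-9]\d*[kKmMgG]?", flag) is not None

sample = "MemTotal:        3000000 kB\nMemAvailable:    2048000 kB\n"
flag = xmx_flag(parse_mem_available_kb(sample))
print(flag, is_valid_xmx(flag), is_valid_xmx("-Xmx-211m"))
```

Passing an explicit -Xmx value on the command line, as suggested above, sidesteps any automatic detection entirely.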
Old 03-09-2016, 10:40 AM   #74
mcauchy
Junior Member
 
Location: oswego, ny

Join Date: Oct 2015
Posts: 5

Didn't run for very long....

$ ./repair.sh -Xmx2g in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

java -ea -Xmx2g -cp /media/sf_D_DRIVE/bbmap/current/ jgi.SplitPairsAndSingles rp -Xmx2g in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
Executing jgi.SplitPairsAndSingles [rp, -Xmx2g, in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq, in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq, out1=fixed1.fq, out2=fixed2.fq, outsingle=single.fq]

Set INTERLEAVED to false
Started output stream.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:580)
at java.util.HashMap.addEntry(HashMap.java:879)
at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
at java.util.HashMap.put(HashMap.java:505)
at jgi.SplitPairsAndSingles.repair(SplitPairsAndSingles.java:751)
at jgi.SplitPairsAndSingles.process3_repair(SplitPairsAndSingles.java:538)
at jgi.SplitPairsAndSingles.process2(SplitPairsAndSingles.java:304)
at jgi.SplitPairsAndSingles.process(SplitPairsAndSingles.java:230)
at jgi.SplitPairsAndSingles.main(SplitPairsAndSingles.java:45)
Old 03-09-2016, 10:43 AM   #75
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,815

How about setting the VM aside and running BBMap directly on Windows 10? How much RAM is there on the machine? BBMap is written in Java and will run there, but you would need to use the Windows form of the BBMap command line.
Old 03-09-2016, 03:34 PM   #76
mcauchy
Junior Member
 
Location: oswego, ny

Join Date: Oct 2015
Posts: 5

I'm not sure what the syntax would be in DOS. I've tried ./bbmap.sh, /bbmap.sh and bbmap.sh

Could you tell me what the command would be?
Old 03-09-2016, 03:51 PM   #77
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,815

Quote:
Originally Posted by mcauchy View Post
I'm not sure what the syntax would be in DOS. I've tried ./bbmap.sh, /bbmap.sh and bbmap.sh

Could you tell me what the command would be?
This information is in the BBMap thread. Here is how. Note that you need to substitute the correct path to the "current" directory on your machine, and keep the space between that path and align2.BBMap:

Code:
c:\> java -Xmx3g -cp c:\path_to\current align2.BBMap in=reads.fq out=mapped.sam
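The same pattern (java, a heap flag, -cp pointing at the "current" directory, a class name, then key=value parameters) appears in the expanded commands printed earlier in this thread, for example with jgi.ReformatReads. As an illustration only, a small Python helper can assemble that argument list; bbtools_command is a hypothetical name and the path is a placeholder exactly as in the command above.

```python
def bbtools_command(current_dir, main_class, heap, params):
    """Return an argument list for running a BBTools class directly with java.

    `params` is a dict of key=value options; a dict is used because "in"
    is a reserved word in Python and cannot be a keyword argument.
    """
    args = ["java", f"-Xmx{heap}", "-cp", current_dir, main_class]
    args += [f"{key}={value}" for key, value in params.items()]
    return args

cmd = bbtools_command(
    r"c:\path_to\current",          # placeholder path, as in the post above
    "align2.BBMap",
    "3g",
    {"in": "reads.fq", "out": "mapped.sam"},
)
print(" ".join(cmd))

# To actually launch it (requires Java and a real path):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with Windows paths.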
Old 03-11-2016, 07:25 PM   #78
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi mcauchy,

Were you able to resolve this? The shell scripts (anything.sh) do not work in Windows, so the full java command syntax is needed. If you are still having trouble, please tell me the location of bbmap.sh (which is probably something like C:\something\bbmap\current\bbmap.sh).

-Brian
Old 03-12-2016, 04:16 AM   #79
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,815

@Brian: How much memory does repair.sh need?
Old 03-12-2016, 08:22 AM   #80
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by GenoMax View Post
@Brian: How much memory does repair.sh need?
That depends. If you have an interleaved file and the interleaving was broken because some reads were discarded, you can run it with the flag "fixinterleaving" and it only needs a trivial amount of memory (at any given time at most 2 reads need to be remembered).
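To illustrate why the interleaved case needs so little memory, here is a rough Python sketch. It is not the BBTools implementation; it assumes reads are (name, sequence) tuples and that mates are named with trailing "/1" and "/2", which is only one of several naming conventions. Only one pending read is held between iterations.

```python
def base_name(name):
    """Strip a trailing /1 or /2 so that mates share one key (assumed naming)."""
    return name[:-2] if name.endswith(("/1", "/2")) else name

def fix_interleaving(reads):
    """Yield (read1, read2) pairs and (read, None) singletons from one stream."""
    pending = None  # the single read waiting to see whether its mate follows
    for read in reads:
        if pending is None:
            pending = read
        elif (base_name(pending[0]) == base_name(read[0])
              and pending[0].endswith("/1") and read[0].endswith("/2")):
            yield pending, read
            pending = None
        else:
            yield pending, None  # the mate never arrived; emit as a singleton
            pending = read
    if pending is not None:
        yield pending, None

stream = [("a/1", "AC"), ("a/2", "GT"), ("b/2", "TT"), ("c/1", "AA"), ("c/2", "CC")]
for pair in fix_interleaving(stream):
    print(pair)
```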

For an arbitrarily disordered file or pair of files, in the worst case, it would store all reads in memory, so the amount of memory needed would be somewhat greater than the size of the uncompressed files.

But in the common case of a pair of files that are ordered correctly but some reads were deleted in each file without removing their mate, the amount of memory needed is proportional to the number of singleton reads.
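The stack trace earlier in the thread shows repair failing while inserting into a hash map, which fits this description. A simplified Python sketch of name-based re-pairing follows; it is an illustration under assumptions (reads as (name, sequence) tuples, mates named with trailing "/1" and "/2", both inputs read in lockstep), not the actual SplitPairsAndSingles code. It also records the peak number of reads held in memory.

```python
from itertools import zip_longest

def base_name(name):
    """Strip a trailing /1 or /2 so that mates share one key (assumed naming)."""
    return name[:-2] if name.endswith(("/1", "/2")) else name

def repair(reads1, reads2):
    """Re-pair two read streams by name.

    Returns (pairs, singletons, peak_pending), where peak_pending is the
    largest number of unmatched reads held in memory at any point.
    """
    pending = {}  # base name -> (mate_number, read) still waiting for a mate
    pairs = []
    peak = 0
    for r1, r2 in zip_longest(reads1, reads2):
        for mate, read in ((1, r1), (2, r2)):
            if read is None:
                continue  # this input is already exhausted
            key = base_name(read[0])
            waiting = pending.get(key)
            if waiting is not None and waiting[0] != mate:
                del pending[key]
                pairs.append((read, waiting[1]) if mate == 1 else (waiting[1], read))
            else:
                pending[key] = (mate, read)
                peak = max(peak, len(pending))
    singletons = [read for _, read in pending.values()]
    return pairs, singletons, peak

# Ordered files where b/2 was deleted without removing its mate b/1.
reads1 = [("a/1", "A"), ("b/1", "C"), ("c/1", "G"), ("d/1", "T")]
reads2 = [("a/2", "A"), ("c/2", "G"), ("d/2", "T")]
pairs, singletons, peak = repair(reads1, reads2)
print(len(pairs), singletons, peak)
```

In this ordered example only a couple of reads are ever pending at once. If the two inputs were in completely unrelated orders, the pending table could grow to hold nearly every read, which corresponds to the worst case described above.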
All times are GMT -8. The time now is 05:28 AM.

