Seqanswers Leaderboard Ad

**vingomez** · 09-16-2014, 06:46 AM

Thanks Brian for your effort in providing bioinformatic applications,

As you mentioned in your post, BBsplit can use two paired-end files as input. In addition to the two files; Can I add a third file (e.g. merged read file) as input?

Thanks again

P.S. In previous post "java -Xmx29g -cp /path/to/current align2.BBSplit", must said align2.BBSplitter"

**Brian Bushnell** · 09-16-2014, 08:32 AM

Originally posted by vingomez View Post

Thanks Brian for your effort in providing bioinformatic applications,

As you mentioned in your post, BBsplit can use two paired-end files as input. In addition to the two files; Can I add a third file (e.g. merged read file) as input?

No, you'll have to do that in two runs, one for the paired reads and one for the merged reads. But ultimately that won't affect the output or runtime (other than the fact that the index will need to be loaded twice, and you'll end up with 2 sam files that need to be merged).

P.S. In previous post "java -Xmx29g -cp /path/to/current align2.BBSplit", must said align2.BBSplitter"

Fixed, thanks!

**sdriscoll** · 11-10-2014, 01:08 PM

After running bbsplit once using the syntax: ref=ref1.fa,ref2.fa, is it possible to re-use that index on subsequent runs using the path= parameter?

**Brian Bushnell** · 11-10-2014, 01:37 PM

Yes, it is. Also, with BBSplit, I think it will try to regenerate the index every time as long as "ref=" is specified, even if it already exists, so only do that once.

**arkilis** · 01-22-2015, 07:38 PM

How to use the bbsplit to check the match details.

i.e. I might want to know the 1 seq from my reads match which seq in the ref? Is it possible to do that?

Cheers,
a

**Brian Bushnell** · 01-22-2015, 09:35 PM

Hi arkilis,

The standard way to split files with bbsplit is like this:

(index)
bbsplit.sh ref=x.fa,y.fa,z.fa,...

(map)
bbsplit.sh in=reads.fq basename=out_%.sam

If you change the name of the output files to ".fq", it will come out in fastq format, which is often more useful, though reformat.sh can convert sam to fastq. Anyway, when you run bbsplit like this, it will produce one output file per reference, containing all of the reads that match that reference better than the others. In this case, it would produce 3 output files - out_x.sam, out_y.sam, and out_z.sam. You could examine the contents of the specific sam files to see exactly which contig/scaffold the read hit, but if you just need the reads split by reference file, that's how you do it.

**vingomez** · 02-24-2015, 12:34 PM

Hi Brian,

Originally posted by Brian Bushnell View Post

For example, if you had a library of something that was contaminated with e.coli and salmonella, you could do this:

For paired reads in 2 files, you would do this:

bbsplit.sh in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu1=clean1.fq outu2=clean2.fq

This will produce 3 output files:
out_ecoli.fq (ecoli reads)
out_salmonella.fq (salmonella reads)
clean1.fq and clean2.fq (unmapped reads)

For the example provide above (minor edit for clarity):

(1) Is there a flag/command to generate paired reads for mapped reads (e.g. out_ecoli_r1.fq out_ecoli_r2.fq, out_salmonella_r1 out_salmonella_r2.fq.fq)?

(2) Will this flag/command work with BBMap?

Thanks
Vicente

**Brian Bushnell** · 02-24-2015, 12:43 PM

Hi Vicente,

The output will come out interleaved (alternating read1/read2) if the input was paired. There's currently not a way to make BBSplit write twin output files for the "basename" output streams (though I'll make a note to add that feature). You can de-interleave the output like this:

reformat.sh in=out_ecoli.fq out1=out_ecoli_r1.fq out2=out_ecoli_r2.fq

BBMap does not understand the "basename" flag so it doesn't work for splitting, but BBMap's "out", "outu", and "outm" all allow output of twin files using "out1" and "out2", etc.

**vingomez** · 02-24-2015, 12:47 PM

Thanks for you dedication and rapid response.

Vicente

**vingomez** · 02-26-2015, 08:12 AM

BBSplit: limit amount of ref

Hi Brian,

I tried BBSplit with four ref files without any success. Is there any limit to the number of ref files?

Code:

java -Xmx29g -cp /path/to/current align2.BBSplitter in1=clean_ecc_100_r1.fastq.gz in2=clean_ecc_100_r2.fastq.gz ref=/path/to/resources/ref1.fasta,/path/to/resources/ref2.fasta,/path/to/resources/ref3.fasta,/path/to/resources/ref4.fasta basename=out_%.fastq.gz out1=unmapped_r1.fastq.gz out2=unmapped_r2.fastq.gz ambig2=split

However it work fine with 3 ref files. I tried with interactions of various group of 3 ref files with success.

P.S. When do you think is better to perform BBSplit after or before ecc/quality trim?

Thanks again
Vicente

**Brian Bushnell** · 02-26-2015, 11:54 AM

Hi Vicente,

I just tested BBSplit and it worked fine for me:

Code:

bbsplit.sh -Xmx1g ref=Clostridium_perfringensATCC_13124.fasta,Desulfotomaculum_gibsoniae_DSM_7213.fasta,Hirschia_baltica_ATCC_49814.fasta,Nocardiopsis_dassonvillei_DSM_43111.fasta

randomreads.sh -Xmx1g reads=1000 out=reads.fq

bbsplit.sh -Xmx1g in=reads.fq basename=o_%.sam

produced 4 output files:

Code:

bushnell@gpint109:/global/scratch2/sd/bushnell/splittest$ ls -l
total 38144
-rw-rw-r-- 1 bushnell bushnell  3311027 Feb 26 10:24 Clostridium_perfringensATCC_13124.fasta
-rw-rw-r-- 1 bushnell bushnell  4952723 Feb 26 10:24 Desulfotomaculum_gibsoniae_DSM_7213.fasta
-rw-rw-r-- 1 bushnell bushnell  3599251 Feb 26 10:23 Hirschia_baltica_ATCC_49814.fasta
-rw-rw-r-- 1 bushnell bushnell  6652569 Feb 26 10:23 Nocardiopsis_dassonvillei_DSM_43111.fasta
-rw-rw-r-- 1 bushnell bushnell    81824 Feb 26 11:43 o_Clostridium_perfringensATCC_13124.sam
-rw-rw-r-- 1 bushnell bushnell   125594 Feb 26 11:43 o_Desulfotomaculum_gibsoniae_DSM_7213.sam
-rw-rw-r-- 1 bushnell bushnell    81422 Feb 26 11:43 o_Hirschia_baltica_ATCC_49814.sam
-rw-rw-r-- 1 bushnell bushnell   181854 Feb 26 11:43 o_Nocardiopsis_dassonvillei_DSM_43111.sam
-rw-rw-r-- 1 bushnell bushnell   352650 Feb 26 11:42 reads.fq
drwxrwsr-x 4 bushnell bushnell      512 Feb 26 10:24 ref

When you say "without success", what exactly happens? And by the way, to catch unmapped reads, you need to set "outu1" and "outu2", not "out1" and "out2".

As for your other question - you can do quality-trimming before or after, but error-correction should be done after. Generally, I would do quality-trimming before if you use BBSplit's default settings, which require high identity, or after if you reduce the identity threshold to, say, minid=0.75 (which is closer to BBMap's default). But error-correction is best done after, on each file individually, so that homologous areas shared between the genomes don't affect each other.

**vingomez** · 02-26-2015, 01:11 PM

Hi Brian,

Sorry for the lack of information. When I said "without success": the program stop and the prompt appear (see below - last 9 lines) . Again only when I used 4 ref files. I used the same commands listed in the previous post (with and without the ambig2=split flag) and used the flags outu1 and outu2 (as you suggested) with the same outcome. Maybe a JAVA conflict?

HTML Code:

Loaded Reference:       0.010 seconds.
Loading index for chunk 1-0, build 1
Generated Index:        0.039 seconds.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at align2.BBIndex.analyzeIndex(BBIndex.java:115)
        at align2.BBMap.loadIndex(BBMap.java:405)
        at align2.BBMap.main(BBMap.java:32)
        at align2.BBSplitter.main(BBSplitter.java:45)
sol1:/work/MP/bbmap%

Thanks
Vicente

**Brian Bushnell** · 02-26-2015, 04:03 PM

Hi Vicente,

There appears to be something wrong with the index, or possibly fasta files. Would it be possible for you to zip the references together and email them to me so I can try to replicate this?

-Brian

**vingomez** · 03-02-2015, 01:34 PM

Hi Brian,

Originally posted by Brian Bushnell View Post

Hi Vicente,

There appears to be something wrong with the index, or possibly fasta files. Would it be possible for you to zip the references together and email them to me so I can try to replicate this?

-Brian

Definitely, the index files were corrupted. Just deleted the ref directory, repeat index and map - no problem.

Thanks again

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, 04-25-2024, 11:49 AM	0 responses 15 views 0 likes	Last Post by seqadmin 04-25-2024, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 62 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Introducing BBSplit: Read Binning Tool for Metagenomes and Contaminated Libraries

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News