SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   bbtools: how to properly set up rqcfilter (http://seqanswers.com/forums/showthread.php?t=78986)

sghignone 11-06-2017 02:53 AM

bbtools: how to properly set up rqcfilter
 
Dear all,
I've recently obtained some PE RNA-seq data from JGI. They have been cleaned using the rqcfilter tool (v. 37.17). Since I would like to uniform the cleaning method extending it to other previously obtained Illumina data, I would like to set up my server to be able to run rqcfilter. I'm aware that this script has been hardcoded to run on on Genepool...but it worth to give a try.

Here follow the steps run by the RQCfilter, as deduced by the reproduce outputs, on my PE data:

>clumpify.sh pigz=t unpigz=t zl=4 reorder in1=OMISSIS.fastq.gz out1=TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz passes=1

>bbduk.sh ktrim=r ordered minlen=49 minlenfraction=0.33 mink=11 tbo tpe rcomp=f overwrite=true k=23 hdist=1 hdist2=1 ftm=5 pigz=t unpigz=t zl=4 ow=true in1=TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz rqc=hashmap outduk=ktrim_kmerStats1.txt stats=ktrim_scaffoldStats1.txt loglog ref=adapters2.fa

>rm TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz

>bbduk.sh maq=10,0 trimq=6 qtrim=r ordered overwrite=true maxns=1 minlen=49 minlenfraction=0.33 k=25 hdist=1 pigz=t unpigz=t zl=6 cf=t barcodefilter=crash ow=true in1=TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=synth1.fq.gz rqc=hashmap outduk=kmerStats1.txt stats=scaffoldStats1.txt loglog ref=pJET1.2.fasta

>rm TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz

>bbduk.sh ordered overwrite=true k=20 hdist=1 pigz=t unpigz=t zl=6 ow=true in1=TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=synth2.fq.gz outduk=kmerStats2.txt stats=scaffoldStats2.txt loglog ref=short.fa

>rm TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz

>bbmap.sh ordered quickmatch k=13 idtag=t printunmappedcount ow=true qtrim=rl trimq=10 untrim build=1 null path= pigz=t unpigz=t zl=6 minid=.95 idfilter=.95 maxindel=3 minhits=2 bw=12 bwr=0.16 null maxsites2=10 tipsearch=0 in1=TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outu1=TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=microbes.fq.gz scafstats=commonMicrobes.txt

>rm TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz

>bbmap.sh ordered k=14 idtag=t usemodulo printunmappedcount ow=true qtrim=rl trimq=10 untrim kfilter=25 maxsites=1 tipsearch=0 minratio=.9 maxindel=3 minhits=2 bw=12 bwr=0.16 fast=true maxsites2=10 outm=human.fq.gz path= refstats=refStats.txt pigz=t unpigz=t zl=9 in1=TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outu1=OMISSIS.anqrpht.fastq.gz

>rm TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz

>bbmerge.sh loose overwrite=true in1=OMISSIS.anqrpht.fastq.gz ihist=ihist_merge.txt outc=cardinality.txt pigz=t unpigz=t zl=9 adapters=adapters2.fa

>kmercountexact.sh overwrite=true in1=OMISSIS.anqrpht.fastq.gz khist=khist.txt peaks=peaks.txt unpigz=t

I have highlighted in bold the files that, at a first glance, are missing. Where can I find them? while it's pretty clear what could be the content of adapters2.fa and pJET1.2.fasta, what's the content of the short.fa file?

about the contaminant removal, should I retrieve also human and cat/dog/mouse/microbial references or the script does it itself?

Thanks for any hints you may provide!
Cheers,
Stefano

GenoMax 11-06-2017 03:56 AM

I think you are better off asking this question to JGI tech support who gave you this data.

@Brian Bushnell (author of BBMap) works at JGI and contributes regularly on this forum. He has been busy with something for last few days and I have not seen him here. You could also try emailing him directly.

sghignone 11-06-2017 04:24 AM

I supposed he has been busy...I 've wrote him last week and got no replies, that's why I posted here. I'll give another try by email and with JGI staff.
Thanks!
s.-


All times are GMT -8. The time now is 03:39 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.