Dear all,
I've recently obtained some PE RNA-seq data from JGI. They have been cleaned using the rqcfilter tool (v. 37.17). Since I would like to uniform the cleaning method extending it to other previously obtained Illumina data, I would like to set up my server to be able to run rqcfilter. I'm aware that this script has been hardcoded to run on on Genepool...but it worth to give a try.
Here follow the steps run by the RQCfilter, as deduced by the reproduce outputs, on my PE data:
>clumpify.sh pigz=t unpigz=t zl=4 reorder in1=OMISSIS.fastq.gz out1=TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz passes=1
>bbduk.sh ktrim=r ordered minlen=49 minlenfraction=0.33 mink=11 tbo tpe rcomp=f overwrite=true k=23 hdist=1 hdist2=1 ftm=5 pigz=t unpigz=t zl=4 ow=true in1=TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz rqc=hashmap outduk=ktrim_kmerStats1.txt stats=ktrim_scaffoldStats1.txt loglog ref=adapters2.fa
>rm TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbduk.sh maq=10,0 trimq=6 qtrim=r ordered overwrite=true maxns=1 minlen=49 minlenfraction=0.33 k=25 hdist=1 pigz=t unpigz=t zl=6 cf=t barcodefilter=crash ow=true in1=TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=synth1.fq.gz rqc=hashmap outduk=kmerStats1.txt stats=scaffoldStats1.txt loglog ref=pJET1.2.fasta
>rm TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbduk.sh ordered overwrite=true k=20 hdist=1 pigz=t unpigz=t zl=6 ow=true in1=TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=synth2.fq.gz outduk=kmerStats2.txt stats=scaffoldStats2.txt loglog ref=short.fa
>rm TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbmap.sh ordered quickmatch k=13 idtag=t printunmappedcount ow=true qtrim=rl trimq=10 untrim build=1 null path= pigz=t unpigz=t zl=6 minid=.95 idfilter=.95 maxindel=3 minhits=2 bw=12 bwr=0.16 null maxsites2=10 tipsearch=0 in1=TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outu1=TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=microbes.fq.gz scafstats=commonMicrobes.txt
>rm TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbmap.sh ordered k=14 idtag=t usemodulo printunmappedcount ow=true qtrim=rl trimq=10 untrim kfilter=25 maxsites=1 tipsearch=0 minratio=.9 maxindel=3 minhits=2 bw=12 bwr=0.16 fast=true maxsites2=10 outm=human.fq.gz path= refstats=refStats.txt pigz=t unpigz=t zl=9 in1=TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outu1=OMISSIS.anqrpht.fastq.gz
>rm TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbmerge.sh loose overwrite=true in1=OMISSIS.anqrpht.fastq.gz ihist=ihist_merge.txt outc=cardinality.txt pigz=t unpigz=t zl=9 adapters=adapters2.fa
>kmercountexact.sh overwrite=true in1=OMISSIS.anqrpht.fastq.gz khist=khist.txt peaks=peaks.txt unpigz=t
I have highlighted in bold the files that, at a first glance, are missing. Where can I find them? while it's pretty clear what could be the content of adapters2.fa and pJET1.2.fasta, what's the content of the short.fa file?
about the contaminant removal, should I retrieve also human and cat/dog/mouse/microbial references or the script does it itself?
Thanks for any hints you may provide!
Cheers,
Stefano
I've recently obtained some PE RNA-seq data from JGI. They have been cleaned using the rqcfilter tool (v. 37.17). Since I would like to uniform the cleaning method extending it to other previously obtained Illumina data, I would like to set up my server to be able to run rqcfilter. I'm aware that this script has been hardcoded to run on on Genepool...but it worth to give a try.
Here follow the steps run by the RQCfilter, as deduced by the reproduce outputs, on my PE data:
>clumpify.sh pigz=t unpigz=t zl=4 reorder in1=OMISSIS.fastq.gz out1=TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz passes=1
>bbduk.sh ktrim=r ordered minlen=49 minlenfraction=0.33 mink=11 tbo tpe rcomp=f overwrite=true k=23 hdist=1 hdist2=1 ftm=5 pigz=t unpigz=t zl=4 ow=true in1=TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz rqc=hashmap outduk=ktrim_kmerStats1.txt stats=ktrim_scaffoldStats1.txt loglog ref=adapters2.fa
>rm TEMP_CLUMP_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbduk.sh maq=10,0 trimq=6 qtrim=r ordered overwrite=true maxns=1 minlen=49 minlenfraction=0.33 k=25 hdist=1 pigz=t unpigz=t zl=6 cf=t barcodefilter=crash ow=true in1=TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=synth1.fq.gz rqc=hashmap outduk=kmerStats1.txt stats=scaffoldStats1.txt loglog ref=pJET1.2.fasta
>rm TEMP_TRIM_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbduk.sh ordered overwrite=true k=20 hdist=1 pigz=t unpigz=t zl=6 ow=true in1=TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz out1=TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=synth2.fq.gz outduk=kmerStats2.txt stats=scaffoldStats2.txt loglog ref=short.fa
>rm TEMP_FILTER1_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbmap.sh ordered quickmatch k=13 idtag=t printunmappedcount ow=true qtrim=rl trimq=10 untrim build=1 null path= pigz=t unpigz=t zl=6 minid=.95 idfilter=.95 maxindel=3 minhits=2 bw=12 bwr=0.16 null maxsites2=10 tipsearch=0 in1=TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outu1=TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outm=microbes.fq.gz scafstats=commonMicrobes.txt
>rm TEMP_FILTER2_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbmap.sh ordered k=14 idtag=t usemodulo printunmappedcount ow=true qtrim=rl trimq=10 untrim kfilter=25 maxsites=1 tipsearch=0 minratio=.9 maxindel=3 minhits=2 bw=12 bwr=0.16 fast=true maxsites2=10 outm=human.fq.gz path= refstats=refStats.txt pigz=t unpigz=t zl=9 in1=TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz outu1=OMISSIS.anqrpht.fastq.gz
>rm TEMP_MICROBE_8352bfa6997ae09cc4039681816c_OMISSIS.anqrpht.fastq.gz
>bbmerge.sh loose overwrite=true in1=OMISSIS.anqrpht.fastq.gz ihist=ihist_merge.txt outc=cardinality.txt pigz=t unpigz=t zl=9 adapters=adapters2.fa
>kmercountexact.sh overwrite=true in1=OMISSIS.anqrpht.fastq.gz khist=khist.txt peaks=peaks.txt unpigz=t
I have highlighted in bold the files that, at a first glance, are missing. Where can I find them? while it's pretty clear what could be the content of adapters2.fa and pJET1.2.fasta, what's the content of the short.fa file?
about the contaminant removal, should I retrieve also human and cat/dog/mouse/microbial references or the script does it itself?
Thanks for any hints you may provide!
Cheers,
Stefano
Comment