Hi all,
I'm using PBcR to assemble PacBio reads and error-correct them with PE250 Illumina data. Before we obtained the Illumina data, our collaborators assembled the PacBio data by itself with HGAP3 into one contig of ~6.7 Mb. Given PacBio's high error rate, we wanted to correct the reads with our Illumina data.
We have 2,056,898 PE250 pairs. We had to quality-trim them aggressively because of a low-quality run.
We have 117,765 PacBio reads with an average read length of ~4.5 kb.
When I use PBcR to correct and assemble, I end up with 310 contigs and a 4.5 Mb genome, well short of the ~6.7 Mb HGAP3 contig.
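For context, here is a rough back-of-the-envelope coverage estimate from the numbers above. It is only a sketch: it assumes the ~6.7 Mb HGAP3 size is about right, treats every Illumina read as a full 250 bp (so it is an upper bound, since trimming shortened them), and uses the approximate 4.5 kb mean for PacBio. The variable names are just for illustration.

```shell
#!/bin/sh
# Rough coverage estimate: total bases / genome size.
# Integer arithmetic only, so results are truncated to whole-fold coverage.

genome_size=6700000      # bp, assumed from the ~6.7 Mb HGAP3 contig

# Illumina: pairs * 2 reads per pair * 250 bp per read.
# Upper bound, because quality trimming shortened the reads.
illumina_pairs=2056898
illumina_bases=$(( $illumina_pairs * 2 * 250 ))
illumina_cov=$(( $illumina_bases / $genome_size ))

# PacBio: number of reads * approximate mean read length (~4.5 kb).
pacbio_reads=117765
pacbio_bases=$(( $pacbio_reads * 4500 ))
pacbio_cov=$(( $pacbio_bases / $genome_size ))

echo "Illumina: ~${illumina_cov}x (before trimming)"
echo "PacBio:   ~${pacbio_cov}x"
```

With these inputs it comes out to roughly 153x Illumina (before trimming) and 79x PacBio, so on paper the raw depth of both data sets looks substantial.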
The commands I ran were based on the website documentation.
Code:
~/tools/wgs-8.3rc1/Linux-amd64/bin/PBcR -length 500 -partitions 200 -l NAME -s pacbio.spec -fastq PacBio.fastq genomeSize=6700000 illumina.frg
And here is the reference code:
Code:
% cd sampleData/
% <wgs>/<Linux-amd64>/bin/fastqToCA -libraryname illumina -technology illumina -type sanger -innie -reads illumina.fastq > illumina.frg
% <wgs>/<Linux-amd64>/bin/PBcR -length 500 -partitions 200 -l lambdaIll -s pacbio.spec -fastq pacbio.filtered_subreads.fastq genomeSize=50000 illumina.frg
Does anyone have an idea of what's going on here and how to improve this? Thanks in advance.