I've been trying to construct a de novo assembly of a mammalian genome for some time now. Currently I have an incomplete genome assembled from Illumina data with ALLPATHS-LG, and I would like to use PBJelly to fill in the gaps using PacBio reads.
I ran the test data successfully, but the pipeline doesn't seem to work on my real data: I'm seeing essentially no improvement in assembly quality after running PBJelly with my PacBio reads. I'm getting a lot of errors in the logs, especially at the setup and mapping stages. About twenty percent of my scaffold references are giving me this error in setup (first code block below):
I'm not seeing any other errors in setup, though. In extraction, I get this kind of output (second code block below):
And so forth for the rest of my data. Again, it appears to be throwing out another 20% of the data. Support is where I start to see even more issues, with both of these messages coming up in large numbers (third code block below):
I've checked the reads with FastQC and their quality doesn't look noticeably lower than I would expect, so I find this very confusing. I'm running PBJelly with all the defaults. Is there anything that might be confounding my analysis and producing these results? I'd be happy to post more log data if it would be helpful.
Code:
2015-03-19 09:48:26,814 [DEBUG] Scaffold scaffold_40566|ref0053720 is empty
Code:
2015-03-24 11:25:12,545 [INFO] Parsing /scratch/02985/emg2497/mouse_genome_project/pbjelly_nojoblimit/pacbioreads/Pacbio_A05_1.1.mod.fastq
2015-03-24 11:25:18,887 [INFO] Loaded 53626 Reads
2015-03-24 11:25:21,197 [INFO] Parsed 12357 Reads
2015-03-24 11:25:21,197 [INFO] Parsing /scratch/02985/emg2497/mouse_genome_project/pbjelly_nojoblimit/pacbioreads/Pacbio_A05_1.2.mod.fastq
2015-03-24 11:25:24,073 [INFO] Loaded 48605 Reads
2015-03-24 11:25:28,346 [INFO] Parsed 11056 Reads
Code:
2015-03-20 14:02:14,425 [DEBUG] Hit for m140207_170145_42153_c100619042550000001823119607181456_s1_p0/2576/2848_6155 has mapq 0 - below threshold 200
2015-03-20 14:02:14,429 [DEBUG] Hit for m140207_170145_42153_c100619042550000001823119607181456_s1_p0/2782/17335_18304 has mapq 0 - below threshold 200
2015-03-20 14:02:30,989 [DEBUG] gapSup
2015-03-20 14:02:30,989 [DEBUG] - Strand on m140207_170145_42153_c100619042550000001823119607181456_s1_p0/16349/3190_8591
2015-03-20 14:02:30,989 [DEBUG] RightDist 202 remainSeq -25
2015-03-20 14:02:30,990 [DEBUG] LeftDist -4938 remainSeq -25
2015-03-20 14:02:30,990 [DEBUG]
2015-03-20 14:02:30,990 [DEBUG] gapSup
2015-03-20 14:02:30,990 [DEBUG] - Strand on m140207_170145_42153_c100619042550000001823119607181456_s1_p0/16349/3190_8591
2015-03-20 14:02:30,990 [DEBUG] RightDist -3599 remainSeq -25
2015-03-20 14:02:30,990 [DEBUG] LeftDist -1217 remainSeq -25
2015-03-20 14:02:30,990 [DEBUG] span support
2015-03-20 14:02:30,990 [DEBUG]
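To put exact numbers on these messages rather than rough percentages, the log files could be tallied with something like the minimal Python sketch below. The regular expressions are assumptions inferred only from the log lines quoted in this post, not from any documented PBJelly log format, and `summarize_log` is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# Patterns are assumptions inferred from the log excerpts in this post,
# not from any documented PBJelly log format.
EMPTY_RE = re.compile(r"\[DEBUG\] Scaffold (\S+) is empty")
LOADED_RE = re.compile(r"\[INFO\] Loaded (\d+) Reads")
PARSED_RE = re.compile(r"\[INFO\] Parsed (\d+) Reads")
MAPQ_RE = re.compile(r"has mapq (\d+) - below threshold (\d+)")


def summarize_log(lines):
    """Tally the message types quoted above (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # findall is used so that several messages on one physical line
        # are all counted.
        counts["empty_scaffolds"] += len(EMPTY_RE.findall(line))
        counts["reads_loaded"] += sum(int(n) for n in LOADED_RE.findall(line))
        counts["reads_parsed"] += sum(int(n) for n in PARSED_RE.findall(line))
        counts["hits_below_mapq"] += len(MAPQ_RE.findall(line))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py <log file>
    with open(sys.argv[1]) as handle:
        summary = summarize_log(handle)
    for key, value in sorted(summary.items()):
        print(f"{key}\t{value}")
    if summary["reads_loaded"]:
        ratio = summary["reads_parsed"] / summary["reads_loaded"]
        print(f"parsed/loaded\t{ratio:.3f}")
```

Running it on each stage's log separately would give the count of empty scaffolds in setup, the parsed-to-loaded read ratio in extraction, and the number of hits rejected for low mapq in support.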
Does anyone have any advice? Any insight at all would be very welcome.