I have a Supernova assembly of 10x Genomics data, along with 4 SMRT cells of PacBio Sequel data. The general workflow so far has been:
Supernova (using 10x Genomics data)
SSPACE-LongRead (using PacBio Sequel data)
GapFiller (using 10x Genomics data)
PBJelly (using PacBio Sequel data)
I saw steady improvement of the assembly up through GapFiller, but when I ran PBJelly with default settings, the output seemed worse than the input. Our guiding metrics were total assembly length (we expect about 400 Mb) and BUSCO completeness. The GapFiller results looked good: 414 Mb total length, with 88.8% of core genes found by BUSCO. The default PBJelly output, however, grew to 550 Mb, and BUSCO completeness dropped to 82.8%.
To address the length problem, I then ran PBJelly with only internal gap filling enabled. This performed better, but the assembly was still too long at 500 Mb, and BUSCO completeness was still slightly worse than the input at 88.4% (one fewer core gene than in the GapFiller assembly used as input).
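To compare the GapFiller and PBJelly outputs on the same footing, a small summary script along these lines could help. It is a minimal sketch, assuming plain (possibly multi-line) FASTA files and assuming that gaps are encoded as runs of N; the function names (`read_fasta`, `summarize_assembly`, `percent_over_expected`) are hypothetical and not part of Supernova, GapFiller, or PBJelly.

```python
import re
import sys

EXPECTED_LENGTH = 400_000_000  # expected genome size (~400 Mb), taken from the post


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain, possibly multi-line FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_assembly(records):
    """Return total length, sequence count, N50, and N-run (gap) statistics."""
    lengths = []
    n_gaps = 0
    gap_bases = 0
    for _header, seq in records:
        lengths.append(len(seq))
        runs = re.findall(r"[Nn]+", seq)  # assumption: gaps are stored as runs of N
        n_gaps += len(runs)
        gap_bases += sum(len(run) for run in runs)
    lengths.sort(reverse=True)
    total = sum(lengths)
    n50, running = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first length at which half the assembly is covered
            n50 = length
            break
    return {
        "total_length": total,
        "n_sequences": len(lengths),
        "n50": n50,
        "n_gaps": n_gaps,
        "gap_bases": gap_bases,
    }


def percent_over_expected(total, expected=EXPECTED_LENGTH):
    """Percentage by which the assembly exceeds (or falls short of) the expected size."""
    return (total - expected) / expected * 100


if __name__ == "__main__":
    # Usage: python assembly_stats.py gapfiller.fasta pbjelly.fasta
    for path in sys.argv[1:]:
        stats = summarize_assembly(read_fasta(path))
        print(path, stats, f"{percent_over_expected(stats['total_length']):+.1f}% vs expected")
```

Running it on the GapFiller output and on each PBJelly output would show whether the extra length comes with more sequences, a different N50, or a change in gap content, which may help narrow down which PBJelly stage is adding sequence.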
I could use some advice on how to tune PBJelly for this project. Are there particular metrics of the input assembly I can look at to guide my choice of parameters? Any advice would be greatly appreciated.
Thanks,
John