Hi, the default SMRT Analysis pipeline keeps long reads that have at least 3 CCS passes. For my project, I want to require >= 6 passes.
I know I can filter them by polymerase read length (e.g. >= 6 x CCS length + 7 x adapter length). But the length of each amplicon differs, so a single fixed cutoff does not work: I want to filter each long read based on its own CCS length.
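To make the per-read version of that length rule concrete, here is a rough Python sketch of what I mean. The function names are hypothetical, and the 45 bp adapter length is only an assumed placeholder, so substitute the real adapter length for your library. It only does the arithmetic on lengths; it does not read or write bas.h5 files.

```python
def min_polymerase_read_length(ccs_length, min_passes=6, adapter_length=45):
    """Smallest polymerase read length that could hold `min_passes` passes.

    Follows the rule of thumb from the post: N passes of the insert
    plus N + 1 adapters. `adapter_length` is an assumed value.
    """
    if ccs_length <= 0 or min_passes <= 0 or adapter_length < 0:
        raise ValueError("lengths and pass count must be positive")
    return min_passes * ccs_length + (min_passes + 1) * adapter_length


def passes_length_filter(polymerase_read_length, ccs_length,
                         min_passes=6, adapter_length=45):
    """True if this read is long enough relative to its own CCS length."""
    threshold = min_polymerase_read_length(ccs_length, min_passes, adapter_length)
    return polymerase_read_length >= threshold


if __name__ == "__main__":
    # Made-up (polymerase read length, CCS length) pairs for illustration.
    reads = [(7000, 1000), (5000, 1000), (3500, 500)]
    for polymerase_len, ccs_len in reads:
        keep = passes_length_filter(polymerase_len, ccs_len)
        print(polymerase_len, ccs_len, "keep" if keep else "drop")
```

This is only an approximation of the pass count, since partial first and last passes and low-quality regions also contribute to the polymerase read length.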
I know bash5tools.py can filter the reads by number of passes, but it only outputs FASTA, FASTQ, or data format results. After filtering, I want to use BLASR to map the reads. BLASR has an option to map the CCS read first and then map the subreads to the window where the CCS read maps, so I think I should use bas.h5 rather than FASTQ as the input.
So the question is: how can I filter CCS reads by a minimum number of passes and write the filtered data out as bas.h5?
Thanks.