Are there any standalone tools for detecting chimeric reads/clones in de novo WGS datasets, such as cosmid/BAC clones, Sanger reads, Moleculo mini-assemblies, or error-corrected PacBio/Nanopore sequences?
The assembly is de novo, with no known reference. We have ~50X Illumina and ~40X PacBio datasets for our organism (genome size ~9 Mb).
All the tools I've found so far target either amplicon sequencing (usearch, vsearch, OCTOPUS, etc.) or fusion transcript detection (mainly cancer RNA-Seq projects).
But I can find no reference to tools that perform chimera junction detection and splitting on a 20X-100X WGS dataset of ~Q40 reads.
PS: Most de novo assemblers (PHRAP, MIRA, Celera) do have a chimera detection module, but quite often it is not configurable, and it struggles when presented with 10 kb+ reads.
Ideally, one would like a tool that can detect chimeric junctions based on k-mers or alignments with other reads in the dataset and split the chimeric reads at those junctions, with *.fastq or *.fasta + *.qual output.
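To make the k-mer variant of that idea concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not taken from any existing tool: the function names (`count_kmers`, `split_chimeric_read`) and the parameters `k` and `min_count` are hypothetical, and it assumes upper-case A/C/G/T reads held in memory. A k-mer seen fewer than `min_count` times across the whole dataset is treated as unsupported, and a read is cut wherever an internal run of unsupported k-mers occurs.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return kmer if kmer <= rc else rc


def count_kmers(reads, k):
    """Count canonical k-mers over all reads (each read also counts its own k-mers)."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def split_chimeric_read(seq, counts, k, min_count=2):
    """Split a read at internal runs of k-mers seen fewer than min_count times.

    Assumption: a run of unsupported k-mers flanked by supported k-mers on
    both sides marks a candidate chimeric junction. Unsupported runs that
    touch either end of the read are left alone.
    """
    n = len(seq) - k + 1
    if n <= 0:
        return [seq]
    weak = [counts[canonical(seq[i:i + k])] < min_count for i in range(n)]
    pieces, start, i = [], 0, 0
    while i < n:
        if not weak[i]:
            i += 1
            continue
        j = i
        while j < n and weak[j]:
            j += 1
        if i > 0 and j < n:
            # The last supported k-mer before the run covers bases up to i + k - 2,
            # and the first supported k-mer after the run starts at base j.
            pieces.append(seq[start:i + k - 1])
            start = j
        i = j
    pieces.append(seq[start:])
    return pieces
```

Calling `split_chimeric_read(read, count_kmers(all_reads, k), k)` returns the list of pieces for one read, or a single-element list if nothing is cut. The obvious weakness is that an isolated sequencing error also produces a run of unsupported k-mers, so this sketch would cut there too; a usable tool would need read-to-read alignment evidence (and quality-aware output) on top of this.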