I have a SNP discovery pipeline for a specific bacterium that outputs BAM and VCF files. To this pipeline I have added steps that take the unmapped reads, assemble them using ABySS, and run a local BLAST search on the resulting contigs. This lets me identify contigs that do not align to the reference, which in turn shows whether there has been any contamination in the sample, library prep, or instrument.
However, ABySS removes reads that do not overlap with any other read. If I seed my FASTQ files with a low level of "contaminated" reads/fragments, these sequences have no k-mer overlaps, so they are thrown out and not included in the output files. I would like to see any reads that are not part of the contigs in the ABySS output. Will ABySS (or any other assembler) output the fragments that do not contribute to the assembly? Is it possible to determine the unused reads/fragments?
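To make concrete what I mean by "unused" reads, here is a minimal Python sketch of the check I have in mind: flag every read that shares no k-mer with any assembled contig, in either orientation. This is only an illustration, not something ABySS does or exposes; the function names (`kmers`, `reverse_complement`, `unused_reads`), the in-memory dict of reads, and the default `k=21` are all my own assumptions, and a real pipeline would more likely map the reads back to the contigs and pull out the unmapped ones.

```python
# Toy illustration only: names, inputs, and the k value are hypothetical.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unused_reads(reads, contigs, k=21):
    """Return names of reads that share no k-mer with any contig.

    reads   -- dict mapping read name to sequence
    contigs -- iterable of contig sequences
    Reads shorter than k have no k-mers and are therefore reported as unused.
    """
    contig_kmers = set()
    for contig in contigs:
        # Index both strands so read orientation does not matter.
        contig_kmers |= kmers(contig, k)
        contig_kmers |= kmers(reverse_complement(contig), k)
    return [name for name, seq in reads.items()
            if kmers(seq, k).isdisjoint(contig_kmers)]
```

With a small `k` for readability, `unused_reads({"r1": "GTACGT", "r2": "GGGGCCCC"}, ["ACGTACGTTAGC"], k=4)` would report only `r2`, which is the kind of list I would like to get for the seeded "contaminated" reads.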