Hi everyone,
I am just getting started using SVDetect and am having some trouble interpreting some strange results.
I am using mapped 300 bp PE reads (some are trimmed to less than 300 bp) which have been filtered for "abnormally" mapped reads with the script provided with SVDetect (BAM_preprocessingPairs.pl). I still need to take a closer look at the abnormal reads to see what's going on with them. Mostly they should have an insert size below or above a certain cutoff, or they could have an incorrect mapping orientation (not RF).
After running SVDetect, I see that most of the variants are strange "undefined" variants that all have a separation distance of 299-300 bp (see the example below). I have almost no insight into what could be causing this. It looks like an artifact of some sort and is probably not biologically relevant, although the reads are mapped to a mitochondrial genome, which does have clusters of genes and repeats. Could this be caused by some reads being shorter than 300 bp? SVDetect is expecting 300 bp reads based on the input I gave it, but some of the reads are quality trimmed and hence might be shorter.
A small number of the variants seem to be correct (based on previously published data). See the "Normal expected variants" below.
Also, when I compare the variants between two different samples (using links2compare in SVDetect), all of the undefined variants are in common, and only insertions/deletions/inversions are unique to each sample. The samples are two different genotypes, which are expected to have different variants compared to the reference.
Any suggestions as to what could be causing the undefined variants would be greatly appreciated!
Weird "undefined" variants
Code:
chr_type SV_type BAL_type chromosome1 start1-end1 average_dist chromosome2 start2-end2 nb_pairs score_strand_filtering score_order_filtering score_insert_size_filtering final_score breakpoint1_start1-end1 breakpoint2_start2-end2
INTRA UNDEFINED UNBAL chrNC_007579.1 24705-25998 299 chrNC_007579.1 24706-25999 607 100% 100% 100% 1 24806-24705 25999-25898
INTRA UNDEFINED UNBAL chrNC_007579.1 38201-39491 299 chrNC_007579.1 38202-39492 568 100% 100% 100% 1 38299-38201 39492-39394
INTRA UNDEFINED UNBAL chrNC_007579.1 38704-39997 299 chrNC_007579.1 38705-39998 550 100% 100% 100% 1 38805-38704 39998-39897
Normal expected variants
Code:
chr_type SV_type BAL_type chromosome1 start1-end1 average_dist chromosome2 start2-end2 nb_pairs score_strand_filtering score_order_filtering score_insert_size_filtering final_score breakpoint1_start1-end1 breakpoint2_start2-end2
INTRA INVERSION UNBAL chrNC_007579.1 105503-106542 23335 chrNC_007579.1 128858-129910 312 100% 100% - 1 105350-105503 128718-128858
INTRA INVERSION UNBAL chrNC_007579.1 156971-157791 28358 chrNC_007579.1 185278-186246 174 99% 100% - 0.994 157791-158163 186246-186470
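In case it helps anyone reproduce this, here is a rough Python sketch of how the links could be screened for the pattern described above. It assumes the links file is whitespace-delimited with exactly the 15 columns shown in the header, and the function names, the 300 bp read length default and the 5 bp tolerance are my own illustrative choices, not anything defined by SVDetect. For each link it reports whether average_dist sits close to the read length and how much the two cluster intervals overlap each other.

```python
from collections import Counter

# Column names copied from the header of the links output shown above.
COLUMNS = [
    "chr_type", "SV_type", "BAL_type", "chromosome1", "start1-end1",
    "average_dist", "chromosome2", "start2-end2", "nb_pairs",
    "score_strand_filtering", "score_order_filtering",
    "score_insert_size_filtering", "final_score",
    "breakpoint1_start1-end1", "breakpoint2_start2-end2",
]


def parse_links(text):
    """Parse whitespace-delimited link rows, skipping the header and odd lines."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or fields[0] == "chr_type":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def span(coords):
    """Turn 'a-b' into an ordered (start, end) tuple of ints."""
    a, b = (int(x) for x in coords.split("-"))
    return min(a, b), max(a, b)


def overlap_fraction(row):
    """Shared bases between the two clusters, relative to the shorter cluster."""
    s1, e1 = span(row["start1-end1"])
    s2, e2 = span(row["start2-end2"])
    shared = max(0, min(e1, e2) - max(s1, s2) + 1)
    return shared / min(e1 - s1 + 1, e2 - s2 + 1)


def near_read_length(row, read_length=300, tolerance=5):
    """True if average_dist is within `tolerance` bp of the read length.

    read_length and tolerance are assumptions chosen for this data set.
    """
    return abs(int(row["average_dist"]) - read_length) <= tolerance


# Two rows taken from the tables above: one UNDEFINED, one INVERSION.
EXAMPLE = """
chr_type SV_type BAL_type chromosome1 start1-end1 average_dist chromosome2 start2-end2 nb_pairs score_strand_filtering score_order_filtering score_insert_size_filtering final_score breakpoint1_start1-end1 breakpoint2_start2-end2
INTRA UNDEFINED UNBAL chrNC_007579.1 24705-25998 299 chrNC_007579.1 24706-25999 607 100% 100% 100% 1 24806-24705 25999-25898
INTRA INVERSION UNBAL chrNC_007579.1 105503-106542 23335 chrNC_007579.1 128858-129910 312 100% 100% - 1 105350-105503 128718-128858
"""

rows = parse_links(EXAMPLE)
print(Counter(r["SV_type"] for r in rows))
for r in rows:
    print(r["SV_type"], r["average_dist"],
          near_read_length(r), round(overlap_fraction(r), 3))
```

On the rows above, the UNDEFINED link has its two clusters covering almost the same interval (shifted by 1 bp), while the INVERSION link has two well-separated clusters. That is at least consistent with the undefined links being an artifact rather than real rearrangements, though it does not say what causes them.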