Dear all,
I am looking at sequence data from a fungal strain, mapped to an assembled reference genome:
Number of reads: 5.5 million, paired-end
Platform: Illumina HiSeq 2000
Read length: 126 bp
I want to count the insertions/deletions (indels) between different strains, so I used BCFtools to list indels with reasonable coverage in CDS regions. However, there are too many calls and they are not convincing, because many of them fall in repetitive sequence (e.g. GCAACAGCAGCAACAGCAACAGCAGCAACA/GCA).
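One way to make "they include repetitions" concrete is to flag indels whose inserted or deleted bases are a tandem repeat unit that also recurs right after the event in the reference. The Python sketch below is a heuristic, not a BCFtools feature. It assumes the indels have been left-aligned (e.g. with `bcftools norm -f`), so that any repeat extends to the right of the event. It also assumes you supply `downstream`, the reference bases immediately after the REF allele (for example fetched with `samtools faidx`). All function names and the `min_copies` threshold are hypothetical.

```python
def indel_sequence(ref, alt):
    """Return the inserted or deleted bases of a VCF-style indel.

    Strips the bases shared at the start of REF and ALT (the VCF anchor);
    whatever remains on the longer allele is the indel sequence.
    """
    n = 0
    while n < min(len(ref), len(alt)) and ref[n] == alt[n]:
        n += 1
    return ref[n:] if len(ref) > len(alt) else alt[n:]


def repeat_unit(seq):
    """Smallest unit u such that seq == u * k (the whole seq if there is none)."""
    for p in range(1, len(seq) + 1):
        if len(seq) % p == 0 and seq[:p] * (len(seq) // p) == seq:
            return seq[:p]
    return seq


def tandem_copies(context, unit):
    """Count exact back-to-back copies of unit at the start of context."""
    if not unit:
        return 0
    n = 0
    while context.startswith(unit, n * len(unit)):
        n += 1
    return n


def in_tandem_repeat(ref, alt, downstream, min_copies=2):
    """Flag an indel whose repeat unit recurs right after the event.

    downstream: reference bases immediately following the REF allele
    (hypothetical input that you must extract from the reference FASTA).
    """
    unit = repeat_unit(indel_sequence(ref, alt))
    return tandem_copies(downstream, unit) >= min_copies


# A 1-bp CAG deletion followed by two more CAG copies is flagged,
# while a TC insertion in non-repetitive context is not.
print(in_tandem_repeat("ACAG", "A", "CAGCAGT"))     # True
print(in_tandem_repeat("AG", "AGTC", "GATTACA"))    # False
```

This check only recognises exact repeats. The example above is a mixed GCA/ACA repeat, so its 27-bp deleted sequence has no shorter exact unit, and this function would not flag it. Imperfect repeats like that one are better handled by intersecting the calls with a tandem-repeat annotation of the reference.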
Please also see the attached IGV screenshot.
Do you know a way to filter for meaningful indels? Which criteria would you apply?
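As a starting point for criteria, a filter could require a minimum QUAL and a minimum read depth. It could also require that most reads at the site carry the indel. That last criterion assumes the fungal strain is haploid, so a real indel should be supported by close to 100% of reads, whereas alignment artefacts in repeats often show mixed support. The Python sketch below applies these three checks to single-sample VCF data lines. It assumes the FORMAT column contains AD (e.g. from `bcftools mpileup -a AD`) and one ALT allele per record. The function names and threshold values are hypothetical and would need tuning on real data.

```python
def sample_support(line):
    """Return (depth, alt_fraction) for the first sample of a VCF data line.

    Assumes FORMAT includes AD (ref,alt read counts) and a single ALT allele.
    """
    fields = line.rstrip("\n").split("\t")
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    ref_count, alt_count = (int(x) for x in sample["AD"].split(",")[:2])
    depth = ref_count + alt_count
    return depth, (alt_count / depth if depth else 0.0)


def passes_filters(line, min_qual=30.0, min_depth=10, min_alt_fraction=0.8):
    """Keep a call only if QUAL, depth and ALT-read fraction all pass.

    The default thresholds are hypothetical placeholders.
    """
    fields = line.rstrip("\n").split("\t")
    qual = float(fields[5]) if fields[5] != "." else 0.0  # missing QUAL fails
    depth, alt_fraction = sample_support(line)
    return (qual >= min_qual
            and depth >= min_depth
            and alt_fraction >= min_alt_fraction)


# 18 of 20 reads support the insertion: kept.
print(passes_filters("chr1\t100\t.\tA\tAT\t50\t.\tDP=20\tGT:AD\t1:2,18"))
```

Combining this with the repeat flag, or simply reporting repeat-context indels separately, keeps the count restricted to well-supported calls.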
Best,
Tom