Dear all,
I am having trouble analysing CRISPR experiments by sequencing. I use PCR to amplify the target sequence of my sgRNA (a typical PCR product of 400-500 bp) and then sequence it paired-end on the MiSeq platform with 250 bp reads. My goals are to:
-determine the number of reads containing indels at the target site, to infer the percentage of edited sequences in my sample
-determine the locations of the indels
-annotate the variant types (synonymous,...) and determine their frequency in the pool of reads containing indels
I am having difficulty choosing the best tools for this. I plan to apply the usual first steps of read processing (filtering on Phred quality,...) and then to use BWA-MEM or Bowtie2 to align the reads to the PCR amplicon sequence. Are these aligners suitable for this application?
My first idea for indel quantification was to process the BAM files to remove PCR duplicates and make them compatible with GATK HaplotypeCaller (using Picard). I see GATK used mostly for WGS applications, but is it a good tool for calling indels in PCR product sequencing? In addition, if it is suitable, should I trim the reads around the expected zone of NHEJ before the analysis (for CRISPR, let's say the target sequence +/- 10 bp), or should I use the whole reads and perform indel realignment before variant calling? If GATK is not appropriate, does anyone know of more suitable tools?
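To make the first two goals concrete, here is a minimal Python sketch of the kind of counting I have in mind. It assumes the reads are already aligned to the amplicon and that the SAM POS and CIGAR fields are available for each read. The function names, the cut-site coordinate and the +/- 10 bp window are illustrative assumptions only, not an established method or the API of any tool.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(pos, cigar):
    """Return a list of (kind, ref_start, length) for each indel in a read.

    pos is the 1-based leftmost reference position (SAM POS field).
    For an insertion, ref_start is the reference base just after it.
    """
    events = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "I":
            # Insertions consume no reference bases.
            events.append(("I", ref, length))
        elif op == "D":
            events.append(("D", ref, length))
            ref += length
        elif op in "MN=X":
            ref += length
        # S, H and P consume no reference bases.
    return events


def is_edited(pos, cigar, cut_site, window=10):
    """True if any indel overlaps cut_site +/- window (hypothetical rule)."""
    lo, hi = cut_site - window, cut_site + window
    for kind, start, length in indels_from_cigar(pos, cigar):
        end = start + length - 1 if kind == "D" else start
        if start <= hi and end >= lo:
            return True
    return False


def percent_edited(alignments, cut_site, window=10):
    """Percentage of (pos, cigar) alignments with an indel near the cut site."""
    alignments = list(alignments)
    if not alignments:
        return 0.0
    edited = sum(is_edited(p, c, cut_site, window) for p, c in alignments)
    return 100.0 * edited / len(alignments)


# Toy input: made-up alignments against an amplicon with the cut at base 200.
toy = [(1, "250M"), (1, "195M5D50M"), (1, "200M2I48M"), (1, "50M1D199M")]
print(percent_edited(toy, cut_site=200))
```

In this toy input only the second and third reads carry an indel inside the window; the 1 bp deletion at position 51 in the last read is ignored, which is the behaviour I would want if far-away indels are mostly sequencing or PCR errors.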
So, as you can see, it is a little confusing at the moment... I hope my questions are clear; if not, do not hesitate to tell me.
Thank you for your help.
Nicolas