SEQanswers

Old 04-30-2015, 10:25 AM   #1
nmerienn
Member
 
Location: Switzerland

Join Date: Sep 2014
Posts: 12
Indel quantification with MiSeq

Dear all,

I am having trouble analysing CRISPR experiments by sequencing. I use PCR to amplify the target sequence of my sgRNA (a classical PCR product of 400-500 bp) and then sequence it paired-end on the MiSeq platform with 250 bp reads. My goals are to:
-determine the number of reads containing indels at the target site, to infer the percentage of edited sequences in my sample
-determine the locations of the indels
-annotate the variant types (synonymous,...) and determine their frequency in the pool of reads containing indels

I am having difficulty choosing the best tools for this. I plan to apply the classical first steps of read processing (filtering on Phred quality,...) and then use BWA-MEM or Bowtie2 to align against the PCR amplicon sequence. Are these aligners suitable for such applications?

My first idea for indel quantification was to process the BAM files to remove PCR duplicates and make them compatible with GATK HaplotypeCaller (using the Picard tools). I have seen GATK used mostly for WGS applications, but is it a good tool for calling indels in PCR product sequencing? In addition, if it is suitable, should I trim the reads around the expected zone of NHEJ before the analysis (for CRISPR, let's say the target sequence +/- 10 bp), or should I use the whole reads and perform indel realignment before variant calling? If GATK is not appropriate, does anyone know of other, more suitable tools?

So, as you can see, it is a little confusing for the moment... I hope my questions are clear; if not, do not hesitate to tell me.
Thank you for your help.

Nicolas
Old 05-01-2015, 06:24 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

You may also wish to try out pindel and cortex. I've found both of them to be useful in addition to GATK. Unfortunately I do not have a 'single step' answer.
Old 05-01-2015, 06:56 AM   #3
nmerienn
Member
 
Location: Switzerland

Join Date: Sep 2014
Posts: 12

Dear Westerman,

Thank you for your answer. I suspected that the answer would not be easy...
If I understand correctly how pindel and cortex work, they do the same kind of analysis as HaplotypeCaller. However, I have found pindel to be more efficient than GATK for longer indels. So do you think I should run both analyses in parallel and then "merge" the results, or just use pindel and/or cortex as another confirmation of the results obtained with GATK? Also, I had difficulty finding confirmation that these tools are compatible with indel analysis on amplicon sequencing; could you confirm whether this is the case?

Thank you very much for your help.
Nicolas
Old 05-01-2015, 07:08 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

I would merge the three of them together. I say this because, at least for the dataset I was recently processing, I could manually look at the alignments (via IGV) and see places where one or the other missed an InDel.
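
One simple way to picture such a merge is a union of the calls that records which caller supports each one. The sketch below is only an illustration under stated assumptions: each caller's output has already been parsed into `(chrom, pos, ref, alt)` tuples, the indels have been left-aligned and normalized so that the same event has the same representation in every call set, and `merge_calls` is a hypothetical helper, not part of GATK, pindel or cortex.

```python
from collections import defaultdict


def merge_calls(calls_by_caller):
    """Union the indel calls of several callers.

    calls_by_caller: {caller_name: iterable of (chrom, pos, ref, alt)}.
    Returns {call: sorted list of callers reporting it}, ordered by call.
    """
    support = defaultdict(set)
    for caller, calls in calls_by_caller.items():
        for call in calls:
            support[call].add(caller)
    # Sort the calls so the output order does not depend on input order.
    return {call: sorted(callers) for call, callers in sorted(support.items())}
```

Calls supported by a single caller are the ones worth checking by eye in IGV, as described above.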
Old 05-01-2015, 08:12 AM   #5
nmerienn
Member
 
Location: Switzerland

Join Date: Sep 2014
Posts: 12

Thank you for your help. I will try it this way and come back if I have problems.
Nicolas
Old 05-01-2015, 08:52 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

If you are interested in longer indels, I suggest you map with BBMap and not do indel realignment before variant calling. Bowtie2 and bwa-mem will only find short indels.
Old 05-01-2015, 09:54 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by Brian Bushnell
If you are interested in longer indels, I suggest you map with BBMap and not do indel realignment before variant calling. Bowtie2 and bwa-mem will only find short indels.
While I believe that bowtie2/bwa cannot find long indels by themselves, it is my understanding that programs such as cortex and pindel can find long indels by looking at the bowtie/bwa mapping for indel breakpoints.

That said, I should try BBMap on my dataset. It is a very good program that I should use more often.
Old 05-02-2015, 02:03 AM   #8
nmerienn
Member
 
Location: Switzerland

Join Date: Sep 2014
Posts: 12

I found in publications that indels created by the CRISPR system are often small and centered on the cleavage site (the majority are less than 10 bp). Is that too large to be tolerated by Bowtie 2 or BWA-MEM? In particular, I found that the thresholds for the number of indels, mismatches and gaps can be set manually in Bowtie 2.
I will first try Bowtie 2 (to get a first idea of the results) and use IGV to look manually at the indel sizes. I will then compare with BBMap to check whether the results differ between the two.
Thank you for your comments!
Nicolas
Old 05-02-2015, 09:43 AM   #9
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Perhaps publications describing the length of indels created by CRISPR only describe indels under 10bp because they are detecting them using tools that can only find indels under 10bp. I am not an expert on CRISPR, but when analyzing bacterial RNA-seq data, I found numerous deletions in the several hundred bp range within or adjacent to CRISPR-related genes. They were kind of interesting in that they clearly clustered together, but the boundaries often did not perfectly agree, and I don't know exactly what they were. I don't know if that's relevant to what you're studying, though.
Old 05-04-2015, 09:02 AM   #10
nmerienn
Member
 
Location: Switzerland

Join Date: Sep 2014
Posts: 12

Indeed, this could be an explanation. In our case, we are editing the mammalian genome with an adapted CRISPR system (targeting a specific gene). So there are no real CRISPR-related genes as observed in bacteria, with the presence of the repeated spacers. However, large indels at the target sites have been found previously, but in rare cases. So I will do both analyses to determine the relative presence of large and short indels at our target site.
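
The relative presence of large and short indels could be summarized along these lines. This is a minimal sketch: it assumes the indels have already been tallied as `{(kind, length): read_count}`, it uses the 10 bp figure mentioned earlier in the thread as an arbitrary cut-off, and `size_classes` is a hypothetical name.

```python
from collections import Counter


def size_classes(events, threshold=10):
    """Percentage of indel-carrying reads per (kind, size class).

    events: {(kind, length): read_count}, kind being "I" or "D".
    Indels shorter than threshold are "short", the others "large".
    """
    counts = Counter()
    for (kind, length), n in events.items():
        size = "short" if length < threshold else "large"
        counts[(kind, size)] += n
    total = sum(counts.values())
    if not total:
        return {}
    return {key: 100.0 * n / total for key, n in counts.items()}
```

Running it on the output of each aligner would show whether the share of large indels depends on the mapping tool.
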
Thank you very much for your advice.
Nicolas
Tags
alignment, crispr, gatk, indels, miseq