SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Design of barcoding primers for ion torrent NGS of a CRISPR genome wide screen pierre sujobert Sample Prep / Library Generation 0 05-10-2014 05:18 AM
NGS data-QC, Kmer, Genome size finding bioman1 Bioinformatics 1 01-24-2014 03:23 AM
Help with finding complementary sequences in ngs data Ami Bioinformatics 1 04-02-2013 11:48 PM
Finding Perfect matches and mismatches bhargavpbk88 Bioinformatics 2 01-19-2012 07:00 AM

Reply
 
Thread Tools
Old 07-24-2014, 01:40 AM   #1
shambo
Junior Member
 
Location: London

Join Date: May 2012
Posts: 6
Default Finding CRISPR gRNA off-target matches using NGS mappers

Hi all,

To keep it brief I have about 3 million, 23bp sequences (gRNAs) which are putative CRIPSR-Cas9 sites, all of which I need to screened for off-target sites against the hg19 genome. I'm looking for mappings with up to 4 mis-matches, but for ALL alignments to be reported.

I've been using an R package called CRISPRseek, but to scan each gRNA would take ages, so I thought I would look at using NGS mappers to do the job.

The Sanger use bwa with these params:

bwa aln -n 5 -o 0 -l 20 -k 4 -N -m 1000000000 ${GENOME} ${CRISPR_FQ_FILE}

I tried this using just one gRNA, and the number of hits I got back were about 150 fewer than the R package function (350 vs 500). I manually looked at some sequences bwa missed, and I see no reason why bwa would reject them.

The question is whether there is another aligner I could consider? STAR will not work, and I don't think bowtie will allow up to 4mm. Alternatively, if there is a tweak to the above params I would be happy to hear it.

TIA,

Sham
shambo is offline   Reply With Quote
Old 07-24-2014, 03:56 AM   #2
lre1234
Senior Member
 
Location: US

Join Date: Aug 2011
Posts: 106
Default

While using a mapper to find matches is actually a nice idea, I am not sure that it would work. From my understanding (and i'm a little rusty on all of the details) of the CRISPR system is that it is not just the 23 nt sequence that is important, but also the few bp's that follow your sequence of interest also see (http://www.sciencemag.org/content/339/6121/823.full) . For example:

5'-YOUR23NTSEQUENCE-NGG-3'

Where the N is a wild card and can be any base followed by a GG. Essentially, you are looking for regions in the genome in which have your sequence (with or without some mismatches) followed by the NGG sequence. I would assume(?) that the R- program is taking this into account, but when mapping your not which may be why the mapping missed some sites. Perhaps, trying to artificially add the NGG to your sequence then re-map and see what you get might improve the system (not sure if this would work or if it is even possible).

Also, see here, in which they have a tool that can be used, but it might be a little cumbersome with 3 million sequences:
http://www.genome-engineering.org/crispr/?page_id=159
lre1234 is offline   Reply With Quote
Old 07-24-2014, 04:59 AM   #3
shambo
Junior Member
 
Location: London

Join Date: May 2012
Posts: 6
Default

You're absolutely right, there is a 3 base PAM (NGG) on the end of the sequence. In my case all the sequences had the NGG when they were supplied to the mapper.

You are also right when you say the R package takes the preservation of the NGG into account when looking for off-target sites, which is why I was surprised when bwa returned fewer when it is free to mis-match anywhere within the sequence. Mu thought was to use the mapper and then screen for hits where the NGG rule is preserved.

I was pretty much copying what was written here

http://www.sanger.ac.uk/htgt/wge/crispr_help

so in principle it seems it should work, so why it doesn't is bit of a mystery to me right now.

Thanks,

Sham
shambo is offline   Reply With Quote
Old 07-24-2014, 05:26 AM   #4
lre1234
Senior Member
 
Location: US

Join Date: Aug 2011
Posts: 106
Default

Maybe, when you are doing the mapping, you are getting mismatches in the 'GG' which is resulting in the lower number of hits. Not sure if it's possible, but maybe there is a way to tell BWA to allow 4 mismatches, but the last 'GG' has to be fixed and no mismatches allowed in those 2 bps. This way, the only mismatches are in the rest of the fragment.

Another thought would be that you would want to potential search the genome for the 'NGG' sequence first, then see if the rest of your sequence can align following that location with allowing the up to 4-mismatches.
lre1234 is offline   Reply With Quote
Old 07-24-2014, 05:38 AM   #5
shambo
Junior Member
 
Location: London

Join Date: May 2012
Posts: 6
Default

I had this thought too. Looking at the bwa command line, they specify the seed as -l 20bp with -k4 mismatches. I bumped up the number of mm to -k 5 (just to see) and not many more sites were found.

I think the second part of your suggestion would be quite tricky to go given the number of NGG's in the genome, but if you have a way of doing that quickly I'd be happy to hear.

Thanks for the brain storming session! I think I might try and contact the sanger guys and see if there is something I'm missing.

Sham
shambo is offline   Reply With Quote
Old 07-24-2014, 09:39 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Sham,

You might try BBMap, which is quite sensitive and supports short seeds (value of 'k'). By default k=13 and it gets exponentially slower (with higher sensitivity) as you reduce it, but it should still be fast enough for only 3m short sequences. If 9 is not sensitive enough you can try reducing it as low as 7, perhaps. The command would be something like this:

bbmap.sh -Xmx23g in=reads.fq ref=hg19.fa k=9 maxindel=4 slow out=mapped.sam ambig=all
Brian Bushnell is offline   Reply With Quote
Old 07-25-2014, 12:49 AM   #7
shambo
Junior Member
 
Location: London

Join Date: May 2012
Posts: 6
Default

Thanks Brian. I'll give it a shot and report back.
shambo is offline   Reply With Quote
Reply

Tags
cas9, crispr, mapping, ngs analysis, off-target

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:50 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO