SEQanswers

07-21-2016, 02:55 AM #1
Jane M
Senior Member
Location: Paris
Join Date: Aug 2011
Posts: 239
Allowing a high number of mismatches when mapping

Dear all,

I have 53 bp sequences, of which between 23 and 30 bases are of interest (the motifs). For simplicity, I kept only the first 17 bases. Each sample has between 5 and 23 million reads.
The reference consists of 7450 distinct sequences, of which I also kept only the first 17 bases for simplicity.
My goal is to map the motifs to the reference.

If there were no sequencing errors, I would find only 7450 distinct motifs in my samples. Most likely there was a problem during sequencing, and 25% of the reads are of poor quality.
When mapping with Bowtie using
Code:
bowtie --best --strata -v 2 -k 1 -m 1 --norc
the mapping rate is ~70-82%.
Using -v 3 on two samples increased the mapping rate by only ~1.5%.

Since my reference is small (7450 distinct sequences), I know that fewer than 17 bases (sometimes as few as 6) are enough to uniquely identify which of the 7450 reference sequences a read comes from. I therefore need to allow a higher number of mismatches in this specific case, but Bowtie's -v mode is limited to 3.
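Whether a short prefix identifies a reference and whether it can tolerate mismatches are two different questions. A minimal sketch in Python, assuming all reference prefixes have the same length (e.g. the first 17 bases) and that errors are substitutions only: if the closest pair of references differs at d positions, any read with at most (d - 1) // 2 mismatches is guaranteed to be strictly closer to its true reference than to any other. The function names are hypothetical helpers, not part of Bowtie or any other tool.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(refs: list[str]) -> int:
    """Smallest Hamming distance between any two references (checks all pairs)."""
    return min(hamming(a, b) for a, b in combinations(refs, 2))


def guaranteed_mismatches(refs: list[str]) -> int:
    """Largest m such that any read with <= m substitutions is strictly
    closer to its true reference than to every other reference."""
    return (min_pairwise_distance(refs) - 1) // 2


if __name__ == "__main__":
    refs = ["ACGTACGTAC", "ACGTTCGAAC", "TTTTACGTAC"]  # toy references
    print(min_pairwise_distance(refs), guaranteed_mismatches(refs))  # prints "2 0"
```

For 7450 references this compares about 27.7 million pairs, which is slow in pure Python but only has to run once. The bound is worst-case: an individual read with more mismatches can still have a unique closest reference, so checking best versus second-best distance per read is less conservative.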

I intend to try Bowtie2 in local mode. I am not familiar with RMAP (https://omictools.com/rmap-tool), but it also seems to fit my problem.
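With only 7450 short references, one alternative to a general-purpose aligner (a swapped-in technique, not how RMAP or Bowtie2 work) is to compare each distinct read prefix against every reference by Hamming distance, accept the closest reference if it is within a chosen mismatch limit, and reject reads whose best and second-best references are too close. A minimal sketch in Python, assuming reads and references are already trimmed to the same length and that only substitutions (no indels) occur; `assign` and `count_assignments` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign(motif: str, refs: list[str], max_mm: int, min_gap: int = 1) -> int | None:
    """Index of the closest reference, or None if it has more than max_mm
    mismatches or the runner-up is fewer than min_gap mismatches behind."""
    best_d, best_i, second_d = None, None, None
    for i, ref in enumerate(refs):
        d = hamming(motif, ref)
        if best_d is None or d < best_d:
            second_d = best_d  # previous best becomes the runner-up
            best_d, best_i = d, i
        elif second_d is None or d < second_d:
            second_d = d
    if best_d is None or best_d > max_mm:
        return None
    if second_d is not None and second_d - best_d < min_gap:
        return None  # ambiguous between two or more references
    return best_i


def count_assignments(reads, refs, max_mm, min_gap=1):
    """Count reads per reference index. Identical reads are collapsed first,
    so each distinct motif is compared against the references only once."""
    counts = Counter()
    unassigned = 0
    for motif, n in Counter(reads).items():
        idx = assign(motif, refs, max_mm, min_gap)
        if idx is None:
            unassigned += n
        else:
            counts[idx] += n
    return counts, unassigned


if __name__ == "__main__":
    refs = ["AAAAAA", "CCCCCC", "GGGGGG"]
    reads = ["AAAAAA", "AAAAAT", "AAACCC", "CCCCCA", "AAAAAA"]
    print(count_assignments(reads, refs, max_mm=2))  # (Counter({0: 3, 1: 1}), 1)
```

Here the mismatch limit is just a parameter rather than a hard cap of 3, and `min_gap` sets how much closer the best reference must be than the runner-up. Pure Python will be slow if many distinct noisy motifs remain after collapsing, so this is better suited to exploring the data than to replacing a real mapper.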

Could you please give me some suggestions/ideas to deal with this particular case?
Thank you very much for your help.
07-21-2016, 04:09 AM #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA
Join Date: Jan 2014
Posts: 2,707

I suggest you try BBMap, which is quite tolerant of low identity; it typically allows mapping down to around 60-70% identity. For very high sensitivity, try this command:

Code:
bbmap.sh in=reads.fq out=mapped.sam vslow minid=0.6 maxindel=5 k=11
Using only the first 17 bp of the sequences will hurt BBMap's ability to map them, though; you need to use the full sequences.
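One way to see why very short reads are a problem: seed-and-extend aligners typically need at least one exact k-mer match between read and reference before they try an alignment. The sketch below assumes that simplified model (it is not BBMap's actual algorithm) and borrows k=11 from the command above; `surviving_seeds` is a hypothetical helper that counts how many length-k windows of a read contain no mismatch.

```python
from __future__ import annotations


def surviving_seeds(read_len: int, k: int, mismatch_positions: set[int]) -> int:
    """Number of length-k windows in a read that contain none of the given
    0-based mismatch positions, i.e. windows usable as exact-match seeds."""
    if k > read_len:
        return 0
    return sum(
        1
        for start in range(read_len - k + 1)
        if not any(start <= p < start + k for p in mismatch_positions)
    )


if __name__ == "__main__":
    # One mismatch at position 8: it lies inside every 11-mer of a 17 bp read,
    # while a 53 bp read carrying the same mismatch still has clean 11-mers.
    print(surviving_seeds(17, 11, {8}), surviving_seeds(53, 11, {8}))  # prints "0 34"
```

Under this model, a single mismatch anywhere in positions 6-10 of a 17 bp read leaves no 11-mer seed at all, whereas a 53 bp read offers 43 windows to draw from.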
07-24-2016, 11:56 PM #3
Jane M

Thank you, Brian, for your suggestion.
I am running some tests with Bowtie on shorter sequences; if that doesn't work, I will try BBMap. The maximum length I can use is 23 bp. Would that be sufficient?

Last edited by Jane M; 07-25-2016 at 01:08 AM.
07-25-2016, 10:10 AM #4
Brian Bushnell

23 is fine, but more bases will always increase specificity. If your sequences are 53 bp, why are you cutting them down to 23?
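A rough way to put numbers on the length/specificity trade-off: if bases were independent and uniform, the chance that a random k-mer lies within m mismatches of one fixed k-mer is sum over i = 0..m of C(k, i) * 3^i / 4^k, so across N references compared at the same anchored position the expected number of chance hits is N times that. Real shRNA sequences are not random, so treat this as a back-of-the-envelope sketch assuming ungapped, position-anchored comparison; `expected_random_hits` is a hypothetical helper.

```python
from math import comb


def expected_random_hits(k: int, max_mm: int, n_refs: int) -> float:
    """Expected number of references that a random k-mer matches with at
    most max_mm mismatches, assuming i.i.d. uniform bases and one anchored,
    ungapped comparison per reference."""
    # Number of distinct k-mers within max_mm substitutions of a fixed k-mer.
    neighbourhood = sum(comb(k, i) * 3**i for i in range(max_mm + 1))
    return n_refs * neighbourhood / 4**k


if __name__ == "__main__":
    for k, m in [(17, 3), (17, 6), (23, 6)]:
        print(k, m, expected_random_hits(k, m, 7450))
```

With 7450 references this gives roughly 0.009 expected chance hits for 17 bp with 3 mismatches, but about 4.7 for 17 bp with 6 mismatches; at 23 bp, 6 mismatches is back to roughly 0.009. In other words, the extra bases are what pay for the extra mismatches.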
07-26-2016, 01:31 AM #5
Jane M

Thank you for your answer.

I am working on an shRNA screen. The first 22-30 bases are common to all sequences, and the following 23-31 bases correspond to the shRNA in each sequence.
Since quality drops toward the end of the reads (in fact from the middle onward), I use the minimum number of bases, counted from the left, needed to discriminate between the shRNAs.
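The "minimum number of bases from the left" can be computed directly from the reference list. A minimal sketch in Python, assuming the references are the shRNA sequences with the common upstream part already removed: it reports, for each reference, the shortest prefix that no other reference shares, plus the length at which all references become distinct. Both function names are hypothetical, and these lengths only guarantee discrimination for error-free reads; they leave no margin for mismatches.

```python
from __future__ import annotations

from collections import Counter


def per_reference_unique_prefix(refs: list[str]) -> list[int | None]:
    """For each reference, the shortest prefix length not shared with any
    other reference (None if it never becomes unique, e.g. a duplicate)."""
    result: list[int | None] = [None] * len(refs)
    for n in range(1, max(len(r) for r in refs) + 1):
        counts = Counter(r[:n] for r in refs)
        for i, r in enumerate(refs):
            if result[i] is None and len(r) >= n and counts[r[:n]] == 1:
                result[i] = n
    return result


def shortest_unique_prefix_length(refs: list[str]) -> int:
    """Smallest n such that the first n bases of all references are distinct."""
    for n in range(1, min(len(r) for r in refs) + 1):
        if len({r[:n] for r in refs}) == len(refs):
            return n
    raise ValueError("references are not distinct within their common length")


if __name__ == "__main__":
    refs = ["ACGTAA", "ACGTCA", "TTGCAA"]  # toy references
    print(per_reference_unique_prefix(refs), shortest_unique_prefix_length(refs))
    # prints "[5, 5, 1] 5"
```

The per-reference values show how much the required length varies between shRNAs, while the overall value is the length a single fixed trim must reach.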
All times are GMT -8.