SEQanswers

Old 03-16-2011, 05:37 AM   #1
m_elena_bioinfo
Member
 
Location: Ospedali Riuniti di Bergamo, ITALY

Join Date: Oct 2009
Posts: 99
Aligning reads in homologous regions

Hi NGS users,
I'm analyzing SureSelect sequencing data.
I noticed that, in the mapping step, BWA/SAMtools aligns reads to my target genes and also to regions that are homologous to my target genes but are not covered by the baits.
Does anyone know how to correct this bias? This inappropriate alignment could affect SNP calling.
Thanks a lot,
bye!
Old 03-16-2011, 05:58 AM   #2
stefanoberri
Member
 
Location: Cambridge area, UK

Join Date: Jan 2010
Posts: 35

Hi. I have never done capture, but we will very soon, so I have thought about these problems. I hope my contribution is useful.

You could use, as the "reference genome", a file that contains only your targets.

However, the pull-down is probably not 100% on target, so this might introduce artifacts.

The baits, on the other hand, might actually pull down a region very similar to their intended target, so your sequence may really come from a homologous region, in which case the aligner gets it right.
Old 03-16-2011, 06:42 AM   #3
m_elena_bioinfo
Member
 
Location: Ospedali Riuniti di Bergamo, ITALY

Join Date: Oct 2009
Posts: 99

Thanks a lot, Stefano.
I have already tried aligning my reads against my target sequences only (using the SureSelect design as the reference genome), but from experience I can say that this forces the alignment too much, with a high risk of false positives.

I don't know whether my problem is experimental (in the enrichment protocol) or in the mapping, because it is not a problem of repetitive regions but of enrichment by homology.
Old 03-16-2011, 06:45 AM   #4
stefanoberri
Member
 
Location: Cambridge area, UK

Join Date: Jan 2010
Posts: 35

Just out of curiosity, how similar are the homologous region and your target?
Old 03-16-2011, 06:53 AM   #5
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95

Limiting your reference is a bad idea: reads from a similar region that differs by a single base will look like a SNP unless both regions are included in the reference.
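To make the point above concrete, here is a toy sketch in Python. The two sequences, the read, and the naive ungapped `best_hit` search are all made up for illustration; a real aligner such as BWA works very differently, but the effect of leaving the homologous region out of the reference is the same.

```python
def mismatches(read, ref, pos):
    """0-based reference positions where the read differs from ref when placed at pos."""
    window = ref[pos:pos + len(read)]
    return [pos + i for i, (a, b) in enumerate(zip(read, window)) if a != b]


def best_hit(read, references):
    """Return (name, pos, mismatch_positions) for the ungapped placement with fewest mismatches."""
    best = None
    for name, seq in references.items():
        for pos in range(len(seq) - len(read) + 1):
            mm = mismatches(read, seq, pos)
            if best is None or len(mm) < len(best[2]):
                best = (name, pos, mm)
    return best


# Hypothetical target and homologous region, identical except at index 10.
target = "ACGTTGCAAGGCTTACGGAT"
paralog = "ACGTTGCAAGACTTACGGAT"

# A read that truly originates from the homologous region.
read = paralog[4:16]

# Target-only reference: the read is forced onto the target with one mismatch,
# which a variant caller could report as a SNP.
target_only = best_hit(read, {"target": target})

# Full reference: the read matches the homologous region perfectly instead.
full = best_hit(read, {"target": target, "paralog": paralog})

print(target_only)
print(full)
```

With only the target in the reference, the read lands on the target with a single mismatch; with both sequences present, it is placed on the homologous region with no mismatches, so no spurious variant appears in the target.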

A common approach to dealing with this is to scale the quality of your alignment based on the uniqueness of the mapping (following the GATK best practices will do this).
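One simple way to act on mapping uniqueness is to drop alignments below a mapping quality (MAPQ) threshold before SNP calling. The sketch below is a minimal illustration in plain Python working on SAM text lines; the threshold of 20, the helper `sam_record`, and the example reads are assumptions for illustration, not values recommended in this thread.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield header lines plus mapped alignments whose MAPQ is at least min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line, pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # FLAG is column 2, MAPQ is column 5
        if flag & 0x4:  # 0x4 = read unmapped
            continue
        if mapq >= min_mapq:
            yield line


def sam_record(name, flag, chrom, pos, mapq, cigar="50M"):
    """Build a minimal 11-column SAM line (hypothetical helper for the example)."""
    cols = [name, str(flag), chrom, str(pos), str(mapq), cigar, "*", "0", "0", "*", "*"]
    return "\t".join(cols)


example = [
    "@SQ\tSN:chr1\tLN:100000",
    sam_record("unique_read", 0, "chr1", 1050, 60),
    sam_record("ambiguous_read", 0, "chr1", 70500, 0),  # MAPQ 0: placement is ambiguous
    sam_record("unmapped_read", 4, "*", 0, 0, "*"),
]

kept = list(filter_by_mapq(example))
kept_names = [line.split("\t")[0] for line in kept if not line.startswith("@")]
print(kept_names)
```

On a real BAM file, `samtools view -q 20` applies the same kind of MAPQ cutoff. Note that this only removes ambiguous placements; it does nothing about reads that map confidently to a homologous region.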

While some of the off-target effect you are seeing is likely due to incorrect mapping, there is likely off-target capture as well, because you are enriching for these regions by hybridization.
Old 03-16-2011, 06:53 AM   #6
m_elena_bioinfo
Member
 
Location: Ospedali Riuniti di Bergamo, ITALY

Join Date: Oct 2009
Posts: 99

About 94% homology.
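As a rough back-of-the-envelope check, the sketch below treats the 94% figure as per-base sequence identity and assumes the differences are spread uniformly and independently along the region. Both are simplifying assumptions (differences usually cluster, and exons tend to be more conserved), and the 100 bp read length is also just an example value.

```python
def expected_differences(identity, read_length):
    """Expected number of positions in a read that distinguish the two regions."""
    return (1.0 - identity) * read_length


def prob_no_difference(identity, read_length):
    """Probability that a read covers no distinguishing position at all,
    assuming independent, uniformly distributed differences."""
    return identity ** read_length


identity = 0.94      # figure quoted in the thread, read here as per-base identity
read_length = 100    # hypothetical read length

print(expected_differences(identity, read_length))
print(prob_no_difference(identity, read_length))
```

Under these assumptions a 100 bp read spans roughly six distinguishing positions, and reads with none at all should be rare. That would suggest most reads can be placed unambiguously on average, with problems concentrated in sub-regions where local identity is much higher than 94%.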
Old 03-16-2011, 07:03 AM   #7
m_elena_bioinfo
Member
 
Location: Ospedali Riuniti di Bergamo, ITALY

Join Date: Oct 2009
Posts: 99

Adamdeluca, you are right!
In my alignment pipeline I always use GATK realignment and base quality score recalibration, but I still observe reads mapped to homologous regions in the final BAM.
So the question is: is this an experimental problem of enrichment by homology (in which case I cannot do anything), or is there a way (a parameter, a script...) to tell when reads have been mapped off target in homologous genes?

And if the problem is incorrect mapping (I use BWA and SAMtools for this step), how can I limit mapping to other regions? In some cases the mapping quality of the reads is very low, so I can recognize the off-target region; in other cases I have high mapping quality and high coverage, so I cannot distinguish target from off-target (with the risk of false positives in SNP calling).
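One way to start separating the two cases described above is to tally reads by whether they overlap a bait interval and by whether their mapping quality is high or low. The following is a minimal sketch in plain Python on SAM text lines; the bait coordinates, the MAPQ cutoff of 20, the helper `sam_record`, and the example reads are all hypothetical. The interpretation is only a heuristic: many off-target reads with high MAPQ would hint at genuine off-target capture, while off-target reads with low MAPQ would hint at ambiguous mapping.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Convert 1-based SAM POS plus CIGAR into a 0-based half-open (start, end) span."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    start = pos - 1
    return start, start + length


def overlaps_bait(chrom, start, end, baits):
    """True if [start, end) overlaps any bait interval on the same chromosome."""
    return any(s < end and start < e for s, e in baits.get(chrom, []))


def classify(sam_lines, baits, min_mapq=20):
    """Count reads per category: unmapped, or on/off target combined with high/low MAPQ."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag, chrom, pos, mapq, cigar = int(f[1]), f[2], int(f[3]), int(f[4]), f[5]
        if flag & 0x4 or cigar == "*":
            counts["unmapped"] += 1
            continue
        start, end = reference_span(pos, cigar)
        where = "on_target" if overlaps_bait(chrom, start, end, baits) else "off_target"
        conf = "high_mapq" if mapq >= min_mapq else "low_mapq"
        counts[where + "/" + conf] += 1
    return counts


def sam_record(name, flag, chrom, pos, mapq, cigar="100M"):
    """Build a minimal 11-column SAM line (hypothetical helper for the example)."""
    cols = [name, str(flag), chrom, str(pos), str(mapq), cigar, "*", "0", "0", "*", "*"]
    return "\t".join(cols)


# Hypothetical bait intervals, BED-style: 0-based, half-open.
baits = {"chr1": [(1000, 1200)]}

reads = [
    sam_record("r1", 0, "chr1", 1050, 60),   # inside the bait, confident
    sam_record("r2", 0, "chr7", 5000, 60),   # homologous region, confident
    sam_record("r3", 16, "chr7", 5100, 0),   # homologous region, ambiguous
    sam_record("r4", 4, "*", 0, 0, "*"),     # unmapped
]

counts = classify(reads, baits)
print(dict(counts))
```

Reads in the `off_target/high_mapq` category are the ones a MAPQ filter cannot remove, so they are the ones worth inspecting by eye in a genome browser.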
Old 03-16-2011, 07:27 AM   #8
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95

Quote:
Originally Posted by m_elena_bioinfo
Adamdeluca, you are right!
In my alignment pipeline I always use GATK realignment and base quality score recalibration, but I still observe reads mapped to homologous regions in the final BAM.
So the question is: is this an experimental problem of enrichment by homology (in which case I cannot do anything), or is there a way (a parameter, a script...) to tell when reads have been mapped off target in homologous genes?

And if the problem is incorrect mapping (I use BWA and SAMtools for this step), how can I limit mapping to other regions? In some cases the mapping quality of the reads is very low, so I can recognize the off-target region; in other cases I have high mapping quality and high coverage, so I cannot distinguish target from off-target (with the risk of false positives in SNP calling).

Great questions. Let me know if you figure them out.

By manually inspecting reads in these types of regions, I can see both obvious mismapping and obvious off-target capture.