SEQanswers > Applications Forums > Genomic Resequencing



03-22-2011, 11:56 PM   #1
Mali Salmon
Member
Location: UK
Join Date: Jul 2008
Posts: 24
SNP calling for captured sequence data

Hi All
I am currently analysing captured sequence data (DNA sequencing on the Illumina platform), with the aim of finding somatic mutations.
Could you please advise on the following issues:
1. Should I align the reads to the captured gene sequences or to the whole genome (and then select the reads that fall within the regions of interest)?
2. Is it important to remove duplicate reads before SNP calling? Duplicate reads can be PCR artifacts, but since I am dealing with captured data, maybe there is a higher chance of getting duplicate reads anyway?
For example, for one SNP I get 515 reads covering that position before removing duplicates, but only 4 reads after removing them.
3. Would you recommend filtering out SNPs that are supported by reads from only one strand?
Thanks
Mali

03-23-2011, 12:39 AM   #2
ulz_peter
Senior Member
Location: Graz, Austria
Join Date: Feb 2010
Posts: 219

Hi Mali,

1) We align to the whole genome (that also gives you hints about enrichment quality, e.g. counts of on- and off-target reads).
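A rough way to get the on/off-target read counts mentioned above is to test each aligned read for overlap with the target intervals. The sketch below is a minimal Python illustration, assuming reads and targets are already available as (chrom, start, end) tuples in 0-based half-open coordinates. The function name and the `padding` parameter are hypothetical, and it uses a naive linear scan rather than the interval indexing that dedicated tools (for example Picard's hybrid-selection metrics) rely on.

```python
def on_target_fraction(reads, targets, padding=0):
    """Fraction of aligned reads that overlap at least one target interval.

    reads:   list of (chrom, start, end) alignments, 0-based half-open
    targets: list of (chrom, start, end) BED intervals, 0-based half-open
    padding: bases added to each side of every target (hypothetical knob)
    """
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start - padding, end + padding))

    on_target = 0
    for chrom, start, end in reads:
        # Two half-open intervals overlap if each one starts before the other ends.
        if any(t_start < end and start < t_end for t_start, t_end in by_chrom.get(chrom, [])):
            on_target += 1
    return on_target / len(reads) if reads else 0.0


targets = [("chr1", 1000, 1200), ("chr2", 500, 700)]
reads = [
    ("chr1", 1150, 1250),  # overlaps the chr1 target
    ("chr1", 5000, 5100),  # off-target
    ("chr2", 650, 750),    # overlaps the chr2 target
    ("chrX", 10, 110),     # off-target: chromosome not targeted at all
]
print(on_target_fraction(reads, targets))  # 0.5
```

Only a whole-genome alignment can report reads like the chrX one. With a target-only reference they would either be forced onto the targets or stay unaligned.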

2) If you are looking for somatic mutations, I would get rid of PCR duplicates before SNP calling. I don't know the details of your experimental setup, but if you expect mutations at low frequency (e.g. because the sample is contaminated with cells not carrying the mutation, as in tumours), this is an important step. If you've got paired-end data, duplicate detection is a lot easier, because PCR duplicates have the same start and end positions for both mates, which rarely occurs by chance.

There is definitely a higher chance of duplicate reads in captured DNA, keeping in mind that each enrichment includes amplification steps.
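The paired-end point can be illustrated with a simplified duplicate finder. This is a sketch of the idea, not Picard's implementation. It assumes each pair is summarised by its chromosome, the 5' reference positions of both mates and the orientation of mate 1 (the `ReadPair` type and `find_duplicates` function are hypothetical), and it keeps the pair with the highest summed base quality in each group. Passing `use_mate=False` mimics single-end data, where only one end of the fragment is known.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    name: str
    chrom: str
    start1: int       # 5' reference position of mate 1
    start2: int       # 5' reference position of mate 2
    reverse1: bool    # True if mate 1 maps to the reverse strand
    sum_quality: int  # summed base qualities of both mates


def find_duplicates(pairs, use_mate=True):
    """Return names of pairs judged to be duplicates of a better-quality pair.

    Pairs are grouped by chromosome, mate-1 position and orientation, plus the
    mate-2 position when use_mate is True. Within a group, the pair with the
    highest summed base quality is kept and every other pair is reported.
    """
    kept = {}
    duplicates = set()
    for pair in pairs:
        key = (pair.chrom, pair.start1, pair.reverse1)
        if use_mate:
            key += (pair.start2,)
        best = kept.get(key)
        if best is None:
            kept[key] = pair
        elif pair.sum_quality > best.sum_quality:
            duplicates.add(best.name)  # the previous best is demoted to duplicate
            kept[key] = pair
        else:
            duplicates.add(pair.name)
    return duplicates


pairs = [
    ReadPair("a", "chr1", 1000, 1250, False, 3000),
    ReadPair("b", "chr1", 1000, 1250, False, 3200),  # same ends as "a"
    ReadPair("c", "chr1", 1000, 1310, False, 3100),  # same mate-1 start, different mate-2
]
print(sorted(find_duplicates(pairs)))                  # ['a']
print(sorted(find_duplicates(pairs, use_mate=False)))  # ['a', 'c']
```

With the mate position included, only "a" is reported. Ignoring it also merges "c", even though its other end shows it is a different fragment.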

3) As far as I know, SNPs seen on only one strand are nearly all false positives, but any corrections are welcome.
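One common way to put a number on "seen on only one strand" is a Fisher's exact test on the forward/reverse read counts for the reference and alternate alleles. GATK's FisherStrand (FS) annotation is a Phred-scaled version of this idea. The following is a self-contained sketch with hypothetical function names and made-up counts, not GATK's code:

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Hypergeometric probability of the table whose top-left cell is x,
        # with all row and column totals held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance keeps exact ties from being lost to rounding.
    p = sum(q for q in map(table_prob, range(lowest, highest + 1))
            if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def strand_bias_phred(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand-bias score: 0 means no evidence, larger means more bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * log10(p)) if p > 0 else float("inf")


print(round(strand_bias_phred(10, 10, 5, 5), 1))  # 0.0: alt reads split like ref reads
print(round(strand_bias_phred(10, 10, 8, 0), 1))  # 16.0: all alt reads on the forward strand
```

Where to put the cut-off on such a score is a judgement call.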

Just out of curiosity: when you check somatic mutations, are you also interested in quantifying them (e.g. in tumour genetics: what percentage of the input sample is derived from the tumour)?
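For a heterozygous somatic point mutation, the variant allele fraction (VAF) gives a first estimate of that percentage. A minimal sketch, assuming a diploid region with no copy-number change and wild-type contaminating cells (the function name and the counts are hypothetical):

```python
def mutant_cell_fraction(alt_reads, total_reads, mutant_copies=1, ploidy=2):
    """Estimate the fraction of cells carrying a somatic point mutation from its VAF.

    Assumptions: every mutant cell carries `mutant_copies` of `ploidy` alleles,
    contaminating cells are wild type, and there is no copy-number change, so
    VAF = fraction * mutant_copies / ploidy.
    """
    if total_reads <= 0:
        raise ValueError("no coverage at this position")
    vaf = alt_reads / total_reads
    # Sampling noise can push the naive estimate above 1, so cap it.
    return min(1.0, vaf * ploidy / mutant_copies)


# Hypothetical counts: 12 of 60 reads carry the alternate allele.
print(mutant_cell_fraction(12, 60))  # 0.4
```

Because PCR duplicates can distort allele counts, it makes sense to compute this after duplicates have been removed or flagged.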

03-23-2011, 12:40 AM   #3
Bruins
Member
Location: Groningen
Join Date: Feb 2010
Posts: 78

Hi,

1. The danger of aligning to only a certain region is that the aligner will try to find a spot in that region for every read, so reads that actually originate elsewhere in the genome get placed there. If you do decide to align to only a region, make sure you use very strict settings. (For example, CLCBio cannot handle aligning HiSeq exome reads to the entire genome, so we have no choice but to align per chromosome.) Long story short: align to the entire genome (including supercontigs, if any) if that is computationally possible.
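The misplacement problem can be shown with a toy ungapped aligner that simply picks the position with the fewest mismatches. Everything here is hypothetical and far simpler than a real aligner. A read that truly comes from a paralogous sequence lands on the target when only the target is in the reference, lands on its true origin when the paralog is included, and is rejected outright under a strict mismatch limit (`max_mismatches`, a stand-in for "strict settings").

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches=None):
    """Place a read where it has the fewest mismatches (toy ungapped aligner).

    reference: dict of contig name -> sequence.
    Returns (contig, offset, mismatches), or None if the best hit exceeds
    max_mismatches.
    """
    best = None
    for name, seq in reference.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = mismatches(read, seq[offset:offset + len(read)])
            if best is None or mm < best[2]:
                best = (name, offset, mm)
    if best is not None and max_mismatches is not None and best[2] > max_mismatches:
        return None
    return best


target = "ACGTTGCAAGGCTTAC"        # hypothetical captured region
paralog = "TTACGTTGCTAGGCATACGG"   # hypothetical similar sequence elsewhere in the genome
read = "ACGTTGCTAGGCAT"            # a read that really comes from the paralog

print(best_hit(read, {"target": target}))                      # ('target', 0, 2)
print(best_hit(read, {"target": target, "paralog": paralog}))  # ('paralog', 2, 0)
print(best_hit(read, {"target": target}, max_mismatches=1))    # None
```

In the target-only case, the two mismatches would look like two variants in the captured gene.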

2. We flag duplicates (Picard MarkDuplicates) but do not remove them. Our downstream analysis tools (GATK, SAMtools, Picard) recognise the flags.
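Flagging works because duplicate status is stored in bit 0x400 (1024) of the SAM FLAG field, so downstream tools can skip such records without deleting them. Current samtools mpileup, for example, skips duplicate-flagged reads by default. A minimal sketch of that check on SAM text lines (the records and the function name are hypothetical):

```python
DUPLICATE = 0x400  # SAM FLAG bit 1024: PCR or optical duplicate
UNMAPPED = 0x4     # SAM FLAG bit 4: segment unmapped


def usable_read_names(sam_lines):
    """Yield names of mapped, non-duplicate records from SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignments
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (DUPLICATE | UNMAPPED):
            continue  # the record stays in the file; it is just not counted
        yield name


sam = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r2\t1123\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",  # 99 + 1024: marked duplicate
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",                # unmapped
]
print(list(usable_read_names(sam)))  # ['r1']
```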

3. In some situations, filtering those out may cause you to lose true-positive variants. Perhaps you can flag such cases instead. If one of these variants seems very interesting (because it passes all your other filter steps), you can always go back and eyeball it.

These are very interesting issues and I'd like to read other people's methods and opinions too!

03-23-2011, 12:45 AM   #4
ulz_peter
Senior Member
Location: Graz, Austria
Join Date: Feb 2010
Posts: 219

Regarding strand bias:
GATK SNP calling outputs an SB value in its VCF output, but does anyone know a threshold value for regarding a SNP as strand biased?

03-23-2011, 01:12 AM   #5
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181

Quote:
Originally Posted by ulz_peter
Regarding strand bias:
GATK SNP calling outputs an SB value in its VCF output, but does anyone know a threshold value for regarding a SNP as strand biased?
I sequence target-enriched genes, looking for somatic mutations.

For 1000 targeted genes, I get ~80X median coverage per targeted bp before marking duplicates (this drops to 40X afterwards).
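"Median coverage per target bp" can be computed by counting, for every targeted base, how many reads cover it and taking the median over all those bases. A minimal sketch with hypothetical intervals (0-based half-open, non-overlapping targets) shows how collapsing reads that share coordinates lowers the figure, as in the 80X to 40X drop above:

```python
from statistics import median


def median_target_depth(reads, targets):
    """Median per-base read depth over all target bases.

    reads:   list of (chrom, start, end) alignments, 0-based half-open
    targets: list of (chrom, start, end) intervals, 0-based half-open, non-overlapping
    """
    depths = []
    for t_chrom, t_start, t_end in targets:
        depth = [0] * (t_end - t_start)
        for chrom, start, end in reads:
            if chrom != t_chrom:
                continue
            # Only the part of the read that falls inside the target counts.
            for pos in range(max(start, t_start), min(end, t_end)):
                depth[pos - t_start] += 1
        depths.extend(depth)
    return median(depths) if depths else 0


targets = [("chr1", 0, 10)]
all_reads = [("chr1", 0, 10), ("chr1", 0, 10), ("chr1", 5, 15)]  # first two share coordinates
deduplicated = [("chr1", 0, 10), ("chr1", 5, 15)]
print(median_target_depth(all_reads, targets))     # 2.5
print(median_target_depth(deduplicated, targets))  # 1.5
```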

These are the "hard filters" I use for GATK SNP calling:

--clusterWindowSize 20
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)"
--filterName "HARD_TO_VALIDATE"
-B:mask,Bed,$sampleID.indel.mask.bed
--maskName InDel
--filterExpression "QUAL < 30.0 || DP < 5 || QD < 1.5 || HRun > 5 || SB > -10.0 || MQ < 25.0"
--filterName StdFilter
--filterExpression "DP < 8 || QD < 2.5"
--filterName LowConf
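To make the logic of these expressions explicit, here is a plain-Python sketch of how the three `--filterExpression`/`--filterName` pairs label a variant, given its annotation values as a dict. It is an illustration, not GATK code: the mask and cluster options are not modelled, the example values are made up, and, as with VariantFiltration, failing variants are labelled rather than dropped.

```python
def hard_filter(ann):
    """Return the filter names a variant would receive; an empty list means PASS.

    `ann` maps annotation names (QUAL, DP, QD, HRun, SB, MQ, MQ0) to numbers.
    Each check mirrors one --filterExpression/--filterName pair from the post.
    """
    labels = []
    # "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)"; DP > 0 guards the division.
    if ann["MQ0"] >= 4 and ann["DP"] > 0 and ann["MQ0"] / (1.0 * ann["DP"]) > 0.1:
        labels.append("HARD_TO_VALIDATE")
    # "QUAL < 30.0 || DP < 5 || QD < 1.5 || HRun > 5 || SB > -10.0 || MQ < 25.0"
    if (ann["QUAL"] < 30.0 or ann["DP"] < 5 or ann["QD"] < 1.5
            or ann["HRun"] > 5 or ann["SB"] > -10.0 or ann["MQ"] < 25.0):
        labels.append("StdFilter")
    # "DP < 8 || QD < 2.5"
    if ann["DP"] < 8 or ann["QD"] < 2.5:
        labels.append("LowConf")
    return labels


good = {"QUAL": 250.0, "DP": 40, "QD": 6.2, "HRun": 1, "SB": -120.5, "MQ": 58.0, "MQ0": 0}
print(hard_filter(good))                 # []
print(hard_filter(dict(good, SB=-2.0)))  # ['StdFilter']
print(hard_filter(dict(good, DP=6)))     # ['LowConf']
print(hard_filter(dict(good, MQ0=8)))    # ['HARD_TO_VALIDATE']
```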

03-23-2011, 01:19 AM   #6
Mali Salmon
Member
Location: UK
Join Date: Jul 2008
Posts: 24

Thank you all for your replies.
The data I have are from people who carry a known heterozygous mutation, and we want to check whether we can find this mutation using capture + sequencing. We also expect contamination with cell populations not carrying the mutation, so if I understand you correctly, removing or flagging duplicate reads is an important step.
Regarding MarkDuplicates: samtools will ignore reads flagged as duplicates, right? So I expect to get the same SNPs whether I remove duplicates or flag them.
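Whether such a variant is detectable at all depends on depth and on the expected allele fraction: a heterozygous mutation carried by a fraction f of the cells gives an expected VAF of about f/2. A minimal binomial sketch (the function name and scenario are hypothetical; it assumes independent reads, which PCR duplicates violate):

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """Probability of observing at least `min_alt_reads` alt reads among `depth` reads.

    Binomial model: each read independently carries the alt allele with
    probability `vaf`. PCR duplicates break this independence, which is one
    reason to collapse them before asking this question.
    """
    p_fewer = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
                  for k in range(min_alt_reads))
    return 1.0 - p_fewer


# Hypothetical scenario: heterozygous mutation in 20% of cells -> expected VAF 0.5 * 0.2 = 0.1.
for depth in (4, 40, 100):
    print(depth, round(detection_probability(depth, 0.1, 4), 3))
```

At a depth of 4 unique reads, requiring 4 alternate reads almost never succeeds, which is why a large drop in depth after deduplication matters for low-fraction mutations.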

05-04-2011, 02:05 AM   #7
gavin.oliver
Senior Member
Location: UK
Join Date: Jan 2010
Posts: 110

Hi all,

Sorry to resurrect an old thread, but there's something I don't understand regarding strand bias.

When we talk about strand-biased SNPs, I assume we mean cases where the mutation appears on one strand but not the other.

However, the original post is about captured DNA data. As far as I am aware, the capture process probes only one strand, so how would we even become aware of strand-biased SNPs in a set-up like this?

Responses would be GREATLY appreciated as I am designing a similar experiment and this has me stumped.

Cheers

Afterthought: the capture can probe the forward strand or the reverse strand, but not both together, as the baits would hybridise to one another during synthesis. Does this mean that two completely separate capture/sequencing processes would be required to capture both strands and discover strand-biased SNPs?

Last edited by gavin.oliver; 05-04-2011 at 02:19 AM.

05-04-2011, 02:51 AM   #8
Mali Salmon
Member
Location: UK
Join Date: Jul 2008
Posts: 24

As I understood it (from the experimentalist who did the capturing), the baits probe one strand, but this was followed by a PCR amplification step (using the adapter indexes), so the sequenced fragments are not strand-specific.
So if I understand it correctly, the steps were as follows:
1. Library preparation (DNA fragmentation + adapters ligation)
2. Hybridisation with RNA baits (Myselect)
3. PCR amplification
Mali

05-04-2011, 02:55 AM   #9
gavin.oliver
Senior Member
Location: UK
Join Date: Jan 2010
Posts: 110

I'm a bit dumb when it comes to lab protocols, but doesn't that mean you would just be creating the reverse complement of the strand you captured, and then sequencing it?

Thus, the true genomic complement (perhaps containing a strand-biased polymorphism) is still lost?

05-05-2011, 12:05 AM   #10
gavin.oliver
Senior Member
Location: UK
Join Date: Jan 2010
Posts: 110

Hi all, does anyone have any thoughts on my question? Surely it's not so difficult.

05-05-2011, 12:14 AM   #11
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181

Quote:
Originally Posted by gavin.oliver
Hi all, does anyone have any thoughts on my question? Surely it's not so difficult.
If I understand correctly, you can capture either strand, because one strand holds the polymorphism while the other holds its complement.

Strand bias is just a PCR artifact, not an inherent characteristic of the genome you're probing.
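It may help to separate the two meanings of "strand" in this discussion. The two genomic strands carry complementary bases, whereas the strand bias reported by variant callers compares reads that map in the forward orientation with reads that map in the reverse orientation. Reverse-mapped reads are reverse-complemented back into reference orientation, so both report the same allele. A toy sketch with a hypothetical sequence and variant:

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_at(read_seq, read_is_reverse, offset):
    """Base at a reference offset, reported in reference (top-strand) orientation.

    Toy assumptions: the read starts at reference offset 0 and has no indels.
    Reads from the other strand are reverse-complemented back, which is also how
    SAM/BAM stores the sequence of reverse-mapped reads.
    """
    oriented = reverse_complement(read_seq) if read_is_reverse else read_seq
    return oriented[offset]


reference = "GATTACAGT"
mutant_top = reference[:4] + "G" + reference[5:]  # hypothetical A>G at offset 4
mutant_bottom = reverse_complement(mutant_top)     # the same molecule read from the other strand

print(mutant_bottom)                                   # ACTGCAATC
print(mutant_bottom[len(mutant_bottom) - 1 - 4])       # C: the raw read carries the complement
print(allele_at(mutant_top, False, 4))                 # G
print(allele_at(mutant_bottom, True, 4))               # G
```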

Last edited by NGSfan; 05-05-2011 at 12:16 AM.

05-05-2011, 12:14 AM   #12
Mali Salmon
Member
Location: UK
Join Date: Jul 2008
Posts: 24

I suppose they used both the forward and reverse adapters as PCR primers, so you get fragments from both strands.
Mali

05-05-2011, 12:40 AM   #13
gavin.oliver
Senior Member
Location: UK
Join Date: Jan 2010
Posts: 110

Quote:
Originally Posted by NGSfan
Strand bias is just a PCR artifact, not an inherent characteristic of the genome you're probing.
Unless I am mistaken, mutations can be induced during genomic strand separation/transcription. These occur on only one of the two genomic strands, giving a mutation that is present on just one strand, while the opposite strand at this genomic location still appears normal.

This single-strand capture method would fail to detect cases like these.

05-05-2011, 02:15 AM   #14
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181

Yes, mutations could happen during genomic strand separation and/or transcription. I would assume that the DNA mismatch repair machinery would pick up on these and fix the daughter strands.


Maybe my assumptions are not valid for the kind of analysis you are doing? Are you looking at defects in mismatch repair?

05-05-2011, 02:21 AM   #15
gavin.oliver
Senior Member
Location: UK
Join Date: Jan 2010
Posts: 110

Not specifically. I am just trying to confirm that mutations of this kind could not be picked up using a strand-specific capture.

It's simply for my own knowledge.

All times are GMT -8.