Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Variant Calling for Exome Capture Analysis

    HI

    I am trying various tools to call variants in the exome data i have to get to standardize our work flow. I have few parameters to discuss, like what should be the min read depth to get less false positive variant calls. Ans when we do indel calls what should we use for indel-supported reads parameter.
    I am analyzing GATK and samtools for the same. If any body having experience in this area please let me know the acceptable parameters to use for comparison.
    Thanks

    Saurabh

  • #2
    I'm interested in this too. Keep me posted on the indel/SNP options please? I'm not finding BioScope to be satisfactory for clients' needs.

    Comment


    • #3
      I tried both samtools and GATK for indel calling for BWA alignment results. The numbers from Samtools are way more than by GATK. But when i see the overlap between them that looks promising. Samtools is covering allmost all the indel calls from GATK. I didn't check the exculsive calls from Samtools.
      For this I used 6X coverage criterion and min 5 indel-supported reads to call as a indel. Probably best thing is to check the exculsive calls from samtools using IGV and flag them as false positive.

      Comment


      • #4
        Originally posted by sbaheti View Post
        I tried both samtools and GATK for indel calling for BWA alignment results. The numbers from Samtools are way more than by GATK. But when i see the overlap between them that looks promising. Samtools is covering allmost all the indel calls from GATK. I didn't check the exculsive calls from Samtools.
        For this I used 6X coverage criterion and min 5 indel-supported reads to call as a indel. Probably best thing is to check the exculsive calls from samtools using IGV and flag them as false positive.
        I'm running both samtools and GATK as well for our exome-captured data. I'm curious why you would consider samtools-exclusive indels as false positives.

        Comment


        • #5
          Remember to set a quality threshold (about 50) on indels. Also the best indel caller so far is believed to be Dindel.

          Comment


          • #6
            On spot check on the exculsive Indels from Samtools, i have noticed that there are two indels close to each other and samtools call one as a indel not the other. I am not sure why and GATK discards both of them. But i cant generalize this as i just checked few of those only. I am not sure we can get the comparision if we run default Samtools and GATK as both have different defulat parameters.

            Comment


            • #7
              Thanks lh3, i will definately try threshold for Indel as 50 and will look how Dindel works..
              Last edited by sbaheti; 08-27-2010, 01:34 PM.

              Comment


              • #8
                Originally posted by lh3 View Post
                Remember to set a quality threshold (about 50) on indels. Also the best indel caller so far is believed to be Dindel.
                How did the belief that dindel was the best indel caller came about? I haven't seen any comparison papers (at the least I am not aware of them) so I would honestly like to know.

                Comment


                • #9
                  Hi all,

                  For the indels that you are looking for, what are your read lengths and what are the sizes of the indels you expect to find?

                  Thanks,
                  Naomi

                  Comment


                  • #10
                    Originally posted by Nomijill View Post
                    Hi all,

                    For the indels that you are looking for, what are your read lengths and what are the sizes of the indels you expect to find?

                    Thanks,
                    Naomi
                    I am looking for accuarte indel calls with less false positives. We want to use the indel caller for exome capture analysis. The read length we normally use are 50 and 75. I am not sure about the size of the indels, but we always target for short indels.

                    Thanks,
                    Saurabh

                    Comment


                    • #11
                      I have one more question
                      We were looking at MAQ aligned reads for a dataset, calling variants using MAQ and Samtools, and the 2 results are not as similar as we expected. Could you comment on something we are missing here, or this is what you would expect too!

                      Results : (These numbers are for SNPs)
                      I am following convention as (Aligner_Variant caller)
                      MAQ_MAQ = 13525(Exclusive)
                      MAQ_SAMTOOLS = 21701 (Exclusive)
                      COMMON = 69689

                      SCRIPTS USED:
                      $ samtools pileup -vcf *.sorted.bam > *.pileup

                      Call the variant using samtool.pl perl script and using Varfilter and –N 3 max SNPs in a window
                      $ samtools.pl varFilter -N 3 -D 100 *.pileup > *.raw.variant

                      Discard the calls for variant if phred is less than 40
                      $ cat *.raw.snp | awk ‘$6>=40’ > *.raw.filtered.snp


                      $VER/maq cns2snp consensus.cns > cns.snp
                      $VER/maq indelpe $genome $map_file > cns.indel.pe
                      $VER/scripts/maq.pl SNPfilter -f cns.indel -F cns.indel.pe cns.snp > cns.filter.snp
                      $VER/maq cns2win consensus.cns > cns.win
                      awk '$5>=40' cns.filter.snp > cns.final.snp

                      Thanks,
                      Saurabh

                      Comment


                      • #12
                        Originally posted by Lee Sam View Post
                        How did the belief that dindel was the best indel caller came about? I haven't seen any comparison papers (at the least I am not aware of them) so I would honestly like to know.
                        If you were part of the 1000 genomes project, you would know. The paper is coming.

                        Comment


                        • #13
                          Originally posted by lh3 View Post
                          If you were part of the 1000 genomes project, you would know. The paper is coming.
                          Doh! I've just spent a weekend running GATK :-(

                          [ flame mode on ]
                          BTW, I'm not a master of variations and I'm probably the last who can say anything about, nevertheless I don't think 1000 Genomes team holds the Truth. I wish I was part of that project, but unfortunately I have to work somewhere else...
                          [ flame mode off ]

                          d
                          Last edited by dawe; 08-31-2010, 01:23 AM.

                          Comment


                          • #14
                            Originally posted by lh3 View Post
                            If you were part of the 1000 genomes project, you would know. The paper is coming.
                            I have to politely disagree, since I do not know and I am a member of the project. I have not seen a re-alignment method comparison within the 1000 genomes project.

                            Comment


                            • #15
                              Just to make sure -- we are talking about PINDEL here, right? Or does the 1000 genomes have a new indel caller that is still unpublished (called DINDEL)

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              9 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              50 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              67 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X