  • Variant Calling outside Torrent Suite and TVC

    Hello,

    After trying for weeks to build Torrent Suite 4 on Red Hat Linux without success (the programs compiled but do not work on my Linux architecture), and after exploring all the tools they "offer" inside their analysis suite, I'm scared.

    They use their own variant caller (tvc), a tool that supports hotspots, freebayes and normalized measurements. In my opinion it is an in-house variant caller built to deal with the homopolymer problem that we all know.

    They also use a modified version of GATK that includes the IndelAssembly toolkit. I think it's their own modified version of GATK. I can't find any information about IndelAssembly (a toolkit intended to find large indels) outside Life's realm.

    I would like to analyze some Ion Torrent data, but I don't like this way of doing it. I would like to know which tools you are using for variant calling outside the standard route from Life and their Torrent Suite one-click analyzer, ideally tools supported by the whole scientific community. For me, a gold-standard version of GATK would be fine.

    My purpose at this moment is to analyze AmpliSeq exomes.

    Thanks !
    Last edited by gmarco; 03-25-2014, 04:17 AM.

  • #2
    If you analyse human data, GATK is a very good tool.

    • #3
      I'm using GATK as I do with Illumina data. Variant calling with UnifiedGenotyper is very slow for me.

      • #4
        Could you ask your service provider to analyze the data on their Torrent Suite? It is extremely simple on the server.

        • #5
          Hello, I'm analyzing AmpliSeq runs on custom gene panels, using a 316 chip.

          I also wanted to try something different, so I tried GATK. After the MarkDuplicates step, I found that because my amplicons start and end at the same locations in the alignment, they are marked as PCR duplicates and removed from the next steps.

          So I can't do this step. If I follow the best practices guidelines and apply hard filtering, I get a lot more variants than with the Ion variantCaller pipeline.

          I mean approx. 30 variants per barcode with the Ion pipeline, and approx. 150 with GATK after the hard filtering step.

          I don't know how worrying this disagreement is.

          What experience do you have?

          • #6
            Ampliseq library will be mostly duplicates by design. You should not be filtering these reads out.

            GATK is likely calling false positives because it does not have any ion specific rules.
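
A minimal sketch of why this happens, assuming a simplified duplicate rule that keys only on chromosome, start position and strand (real duplicate markers also consider mate positions and read quality). The coordinates and read count are hypothetical.

```python
def mark_duplicates(reads):
    """Flag every read after the first with the same (chrom, start, strand).

    Simplified stand-in for a duplicate marker, for illustration only.
    """
    seen = set()
    flags = []
    for chrom, start, strand in reads:
        key = (chrom, start, strand)
        flags.append(key in seen)  # True means "flagged as duplicate"
        seen.add(key)
    return flags


# Hypothetical amplicon: every read starts at the same primer position.
amplicon_reads = [("chr7", 55241600, "+")] * 1000
flags = mark_duplicates(amplicon_reads)
print(f"{sum(flags)} of {len(flags)} reads flagged as duplicates")
```

With amplicon data nearly every read shares its start coordinate with another, so almost the whole library is flagged even though the reads come from distinct template molecules.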

            • #7
              I'm glad to see people who think like me about the Torrent Server Suite and its toolbox.
              I'm used to dealing with PGM data, and here is my pipeline:

              - Alignment is done with bwasw. I tested several alignment programs (tmap, novoalign), and even though the percentage of mapped reads is 4-5% lower than with tmap, the indel mismatch is better.

              - For an exome I use MarkDuplicates.jar from Picard. For targeted sequencing it's not recommended, because 80-90% of reads will be marked.

              - Then I use FreeBayes to call SNPs and UnifiedGenotyper to call indels. I prefer FreeBayes for SNPs because I can set the minimum variant frequency and be more sensitive than UnifiedGenotyper, but it requires additional filtering (strand bias, quality, etc.).
              For more specificity without too much work, I suggest using UnifiedGenotyper for both SNPs and indels.
              But to be honest, you have a better chance of calling a true indel by flipping a coin. There are too many false positives, and for an exome it's a misery.
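
A rough sketch of the kind of minimum variant frequency filter described above, applied as post-processing. It assumes FreeBayes-style INFO fields DP (depth) and AO (alternate observations); the thresholds of 5% and 5 reads are hypothetical and not taken from this thread.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=200;RO=180;AO=20' into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def alt_fraction(info):
    """Fraction of reads supporting the first ALT allele."""
    fields = parse_info(info)
    depth = int(fields["DP"])
    alt_reads = int(fields["AO"].split(",")[0])  # first ALT allele only
    return alt_reads / depth if depth else 0.0


def passes(info, min_fraction=0.05, min_alt_reads=5):
    """Keep a call only if it has enough supporting reads and frequency.

    Both thresholds are illustrative assumptions.
    """
    alt_reads = int(parse_info(info)["AO"].split(",")[0])
    return alt_reads >= min_alt_reads and alt_fraction(info) >= min_fraction


print(passes("DP=200;RO=180;AO=20"))
print(passes("DP=200;RO=197;AO=3"))
```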
              Last edited by arnaud83; 04-10-2014, 11:52 PM.

              • #8
                @arnaud83: how do you do the strand bias filtering?

                • #9
                  Originally posted by c_ro87
                  @arnaud83: how you do the strand bias filtering?

                  For strand bias based on Fisher's exact test (UnifiedGenotyper), I use a Phred-scaled threshold of 60 (p = 0.000001). Variants below this threshold are kept.
                  FreeBayes doesn't include strand bias in the VCF output, but you can easily compute it with some programming skills.
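
One way such a computation could look: a sketch that assumes you can obtain per-strand counts of reference and alternate observations for each call (to my knowledge FreeBayes reports these in INFO as SRF, SRR, SAF and SAR). It uses a plain two-sided Fisher's exact test, which is not guaranteed to match GATK's FS value exactly.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(low, high + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


def phred_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand bias: -10 * log10(p); higher means more bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return -10.0 * math.log10(max(p, 1e-300))  # guard against log10(0)


def keep_variant(ref_fwd, ref_rev, alt_fwd, alt_rev, threshold=60.0):
    """Keep calls whose strand bias is below the threshold (60 ~ p = 1e-6)."""
    return phred_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev) < threshold


print(phred_strand_bias(50, 50, 10, 10))  # balanced strands
print(phred_strand_bias(50, 50, 20, 0))   # ALT seen on one strand only
```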

                  • #10
                    I also kind of gave up on the Ion Torrent Suite.

                    Currently I am using NextGenMap for alignment.

                    For variant calling I use Platypus. The principle is kind of similar to FreeBayes,
                    but the QC statistics and filters are much more complete. As in, everything you can think of.

                    One additional filter I use is implemented in the Bioconductor VariantTools package.
                    It tells you at how many different read positions a variant was found.
                    This is kind of important, as removing PCR duplicates is not really possible for
                    amplicon data.
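
To illustrate the idea behind that filter (this is not the VariantTools implementation), here is a small sketch that counts the distinct within-read offsets at which a variant is seen. It assumes ungapped alignments described only by their start coordinate; all positions are hypothetical.

```python
def distinct_read_positions(read_starts, variant_pos):
    """Count distinct offsets within the reads at which the variant falls.

    Assumes ungapped alignments, so offset = variant_pos - read_start.
    """
    return len({variant_pos - start for start in read_starts})


# Amplicon-like support: all reads start at the same primer position.
amplicon_support = [100] * 50
# Shotgun-like support: reads start at scattered positions.
shotgun_support = [100, 120, 140, 120]

print(distinct_read_positions(amplicon_support, 150))
print(distinct_read_positions(shotgun_support, 150))
```

A variant supported at only one read position is harder to distinguish from a systematic, position-specific error, which is why this count is a useful substitute when duplicate removal is off the table.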
                    Last edited by IonTom; 04-24-2014, 11:54 AM.

                    • #11
                      I did not know these tools. Thank you.
                      I will test them.

                      • #12
                        @arnaud83: How did they work for you ?


                        There is a nice paper discussing the topic of using aligners on ion torrent data:
                        http://www.biomedcentral.com/1471-2164/15/264/

                        Background: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms.

                        Results: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established.

                        Conclusions: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.
                        Last edited by IonTom; 04-24-2014, 01:09 PM.

                        • #13
                          The homopolymer issue in Ion Torrent data can be partly mitigated by setting frequency thresholds based on mixture fractions. The solution can be applied as post-processing or integrated into one of these variant caller applications. We're working on a paper right now that demonstrates the methodology in a production lab environment (vs. the academic environment you see in most papers).

                          • #14
                            Originally posted by IonTom
                            @arnaud83: How did they work for you ?


                            There is a nice paper discussing the topic of using aligners on ion torrent data:
                            http://www.biomedcentral.com/1471-2164/15/264/

                            Well, to be honest, I'm a little bit disappointed by Mosaik. The paper mentioned above shows promising results, but I obtained worse results than with bwa or tmap.

                            • #15
                              I'm very happy to see that this topic has received so many answers. I'm willing to try all these tools.

                              Originally posted by wolfpack14
                              The homopolymer issue in IonTorrent can be semi-mitigated through setting frequency thresholds based on mixture fractions. The solution can be applied through post-processing or integrated into one of these variant caller applications. We're working on a paper right now that demonstrates the methodology in a productional lab environment (vs. academic environment you see in most papers).

                              Hello wolfpack, do you have an ETA?

                              I experienced very, very slow GATK UnifiedGenotyper variant calling on Ion Torrent exome data. Has anyone else had this issue?

                              Ion Torrent data has two major issues:
                              1 - Dealing with the homopolymer problem (how the hell are we supposed to filter those reads, or deal with them?)
                              2 - Setting up correct variant calling settings.
                              Last edited by gmarco; 06-09-2014, 12:39 AM.
