  • gmarco
    Member
    • Oct 2012
    • 36

    Variant Calling outside Torrent Suite and TVC

    Hello,

    After trying for weeks to build Torrent Suite 4 on Red Hat Linux without success (the programs compiled, but they do not work on my Linux architecture), and after exploring all the tools they "offer" inside their analysis suite, I'm scared.

    They use their own variant caller (tvc), a tool that supports hotspots, FreeBayes and normalized measurements. In my opinion it is an in-house variant caller built to deal with the homopolymer problem that we all know.

    They also use their own modified version of GATK, which includes the IndelAssembly toolkit (intended to find large indels). I can't find any information about IndelAssembly outside Life's own material.

    I would like to analyze some Ion Torrent data, but I don't like this approach. I would like to know which tools you are using for variant calling outside the standard route from Life and their Torrent Suite one-click analyzer, ideally tools supported by the whole scientific community. For me, a standard version of GATK would be fine.

    My purpose at this moment is to analyze AmpliSeq exomes.

    Thanks !
    Last edited by gmarco; 03-25-2014, 04:17 AM.
  • TiborNagy
    Senior Member
    • Mar 2010
    • 329

    #2
    If you analyse human data, GATK is a very good tool.

    • gmarco
      Member
      • Oct 2012
      • 36

      #3
      I'm using GATK as I do with Illumina data, but variant calling with UnifiedGenotyper is very slow.

      • snetmcom
        Senior Member
        • Oct 2008
        • 159

        #4
        Could you ask your service provider to analyze the data on their Torrent Suite? It is extremely simple on the server.

        • c_ro87
          Member
          • Feb 2012
          • 12

          #5
          Hello, I'm analyzing AmpliSeq runs on custom gene panels, using a 316 chip.

          I also want to try something different, so I tried GATK as well. After the MarkDuplicates step, however, I found that because my amplicons start and end at the same locations in the alignment, the reads are marked as PCR duplicates and removed from the following steps.

          So I can't run this step. If I otherwise follow the Best Practices guidelines and apply hard filtering, I get many more variants than with the Ion variantCaller pipeline.

          I mean approx. 30 variants per barcode with the Ion pipeline, and approx. 150 with GATK after the hard filtering step.

          I don't know how to judge this disagreement.

          what experience do you have?
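
One way to start judging a disagreement like this is to split the two call sets into shared and caller-specific variants. Below is a minimal Python sketch, assuming both callers wrote plain-text VCF and that variants can be matched on (chrom, pos, ref, alt). The function names and the toy records are made up for illustration. Indels that the two callers represent differently would need normalization first, which this sketch does not do.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # one key per ALT allele
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_callsets(a_lines, b_lines):
    """Split two call sets into shared and caller-specific variants."""
    a, b = variant_keys(a_lines), variant_keys(b_lines)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Toy records (hypothetical), standing in for a TVC and a GATK VCF.
tvc_lines = ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\tG", "chr2\t200\t.\tC\tT"]
gatk_lines = ["chr1\t100\t.\tA\tG", "chr2\t250\t.\tG\tGA", "chr3\t300\t.\tT\tC,G"]

result = compare_callsets(tvc_lines, gatk_lines)
for name, variants in result.items():
    print(name, len(variants))
```

The caller-specific variants are the ones worth inspecting by eye, for example in a genome browser, before trusting either pipeline.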

          • snetmcom
            Senior Member
            • Oct 2008
            • 159

            #6
            Ampliseq library will be mostly duplicates by design. You should not be filtering these reads out.
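
To see why, here is a deliberately simplified Python sketch of coordinate-based duplicate marking. It assumes each read is reduced to a (chrom, start, end, strand) tuple and that every read after the first one with the same tuple is flagged. Real tools such as Picard MarkDuplicates apply more detailed rules, so treat this only as an illustration of the effect. The read coordinates are made up.

```python
from collections import Counter


def duplicate_fraction(reads):
    """Fraction of reads flagged when all but one read per coordinate key is a duplicate."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    flagged = sum(n - 1 for n in counts.values())
    return flagged / len(reads)


# Amplicon-style data: every read from one amplicon shares start, end and strand.
amplicon_reads = [("chr7", 1000, 1150, "+")] * 90 + [("chr7", 2000, 2140, "-")] * 10

# Shotgun-style data: fragment ends are scattered, so keys rarely collide.
shotgun_reads = [("chr7", 1000 + i, 1150 + i, "+") for i in range(100)]

print(duplicate_fraction(amplicon_reads))  # 0.98
print(duplicate_fraction(shotgun_reads))   # 0.0
```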

            GATK is likely calling false positives because it does not have any Ion-specific rules.

            • arnaud83
              Junior Member
              • Apr 2014
              • 6

              #7
              I'm glad to see people who think like me about the Torrent Server Suite and its Tool Box.
              I'm used to dealing with PGM data, and here is my pipeline:

              -Alignment is done with bwasw. I tested several alignment programs (tmap, novoalign), and even though the percentage of mapped reads is 4-5% lower than with tmap, the indel mismatch is better.

              -For an exome I use MarkDuplicates.jar from Picard. For targeted sequencing it's not recommended, because 80-90% of the reads will be marked.

              -Then I use FreeBayes to call SNPs and UnifiedGenotyper to call indels. I prefer FreeBayes for SNPs because I can set the minimum variant frequency and be more sensitive than UnifiedGenotyper, but it requires additional filtering (strand bias, quality, etc.).
              For more specificity without too much work, I suggest UnifiedGenotyper for both SNPs and indels.
              But to be honest, you have a better chance of calling a true indel by flipping a coin. There are too many false positives, and for an exome it's a misery.
              Last edited by arnaud83; 04-10-2014, 11:52 PM.
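
The kind of additional filtering mentioned for FreeBayes calls can be sketched in a few lines of Python. This is only an illustration: it assumes the INFO column carries FreeBayes-style `DP` (total depth) and `AO` (alternate observations) tags, looks only at the first ALT allele, and uses thresholds that are placeholders rather than recommended values.

```python
def parse_info(info):
    """Turn an INFO string such as 'DP=200;AO=12' into a dict of strings."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def passes_filters(qual, info, min_af=0.05, min_qual=30.0, min_depth=20):
    """Keep a call only if depth, quality and alternate allele fraction are high enough."""
    fields = parse_info(info)
    depth = int(fields["DP"])
    alt_obs = int(fields["AO"].split(",")[0])  # first ALT allele only
    if depth < min_depth or qual < min_qual:
        return False
    return alt_obs / depth >= min_af


# Hypothetical records: (QUAL, INFO)
records = [(50.0, "DP=200;AO=12"), (50.0, "DP=200;AO=4"), (10.0, "DP=200;AO=100")]
kept = [rec for rec in records if passes_filters(*rec)]
print(kept)
```

Raising `min_af` trades sensitivity for specificity, which is the same trade-off described above between FreeBayes and UnifiedGenotyper.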

              • c_ro87
                Member
                • Feb 2012
                • 12

                #8
                @arnaud83: how do you do the strand bias filtering?

                • arnaud83
                  Junior Member
                  • Apr 2014
                  • 6

                  #9
                  Originally posted by c_ro87:
                  @arnaud83: how do you do the strand bias filtering?
                  For strand bias based on Fisher's exact test (UnifiedGenotyper), I use a threshold of 60 (Phred-scaled, i.e. p = 0.000001). Variants below this threshold are kept.
                  FreeBayes doesn't include strand bias in the vcf output, but you can easily compute this with some programming skills
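
A rough Python sketch of such a computation follows. It assumes per-strand observation counts are available for each variant (FreeBayes writes INFO tags named SRF, SRR, SAF and SAR that can serve this purpose) and it computes a plain two-sided Fisher's exact test, Phred-scaled. GATK's own FS annotation applies extra processing, so the numbers will only approximate it. The function names and example counts are made up.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def phred(p):
    """Phred-scale a p-value: 1e-6 maps to 60."""
    return -10.0 * math.log10(max(p, 1e-300))


def strand_bias_score(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand bias from forward/reverse counts of REF and ALT reads."""
    return phred(fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev))


balanced = strand_bias_score(50, 50, 10, 10)  # ALT reads on both strands
skewed = strand_bias_score(50, 50, 20, 0)     # ALT reads on one strand only
print(round(balanced, 2), round(skewed, 2))
```

With a cutoff of 60, a variant is kept when its score is below 60, matching the threshold described above.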

                  • IonTom
                    Member
                    • Apr 2014
                    • 32

                    #10
                    I also kind of gave up on the IonTorrent Suite.

                    Currently I am using NextGenMap for alignment.

                    For variant calling I use Platypus. The principle is similar to FreeBayes,
                    but the QC statistics and filters are much more complete: pretty much everything you can think of.

                    One additional filter I use is implemented in the Bioconductor VariantTools package.
                    It tells you at how many different positions within the reads a variant was found.
                    This is kind of important as removing PCR duplicates is not really possible for
                    Amplicon data.
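
The idea behind a read-position filter can be sketched in plain Python. This is not the VariantTools implementation, only an illustration under simplifying assumptions: each supporting read is a (start, end, strand) tuple, the offset is measured from the 5' end of the read, and indels and soft clipping are ignored. The coordinates are made up.

```python
def read_offsets(variant_pos, reads):
    """Set of distinct within-read offsets at which the variant position is observed."""
    offsets = set()
    for start, end, strand in reads:
        if not (start <= variant_pos <= end):
            continue  # read does not cover the variant
        # Measure from the 5' end of the read, which depends on the strand.
        offsets.add(variant_pos - start if strand == "+" else end - variant_pos)
    return offsets


# One amplicon sequenced from both directions: only two distinct offsets.
amplicon = [(100, 250, "+")] * 5 + [(100, 250, "-")] * 5
# Reads with scattered starts: one offset per read.
scattered = [(100, 250, "+"), (120, 260, "+"), (140, 300, "+"), (200, 300, "+")]

print(len(read_offsets(150, amplicon)))   # 2
print(len(read_offsets(150, scattered)))  # 3
```

A variant seen at only one offset in many reads is more likely to be a systematic artifact than one seen at several offsets.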
                    Last edited by IonTom; 04-24-2014, 11:54 AM.

                    • arnaud83
                      Junior Member
                      • Apr 2014
                      • 6

                      #11
                      I did not know these tools. Thank you.
                      I will test them.

                      • IonTom
                        Member
                        • Apr 2014
                        • 32

                        #12
                        @arnaud83: How did they work for you ?


                        There is a nice paper discussing the topic of using aligners on ion torrent data:
                        http://www.biomedcentral.com/1471-2164/15/264/

                        Background: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms.

                        Results: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established.

                        Conclusions: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.
                        Last edited by IonTom; 04-24-2014, 01:09 PM.

                        • wolfpack14
                          Member
                          • Jan 2014
                          • 12

                          #13
                          The homopolymer issue in IonTorrent can be partly mitigated by setting frequency thresholds based on mixture fractions. The solution can be applied through post-processing or integrated into one of these variant caller applications. We're working on a paper right now that demonstrates the methodology in a production lab environment (vs. the academic environment you see in most papers).

                          • arnaud83
                            Junior Member
                            • Apr 2014
                            • 6

                            #14
                            Originally posted by IonTom:
                            @arnaud83: How did they work for you ?


                            There is a nice paper discussing the topic of using aligners on ion torrent data:
                            http://www.biomedcentral.com/1471-2164/15/264/

                            Well, to be honest, I'm a little bit disappointed by mosaik. The mentioned paper shows promising results, but I obtained worse results than with bwa or tmap.

                            • gmarco
                              Member
                              • Oct 2012
                              • 36

                              #15
                              I'm very happy to see that this topic has received so many answers. I'm willing to try all these tools.

                              Originally posted by wolfpack14:
                              The homopolymer issue in IonTorrent can be partly mitigated by setting frequency thresholds based on mixture fractions. The solution can be applied through post-processing or integrated into one of these variant caller applications. We're working on a paper right now that demonstrates the methodology in a production lab environment (vs. the academic environment you see in most papers).
                              Hello wolfpack, do you have any ETA?

                              I'm experiencing very, very slow GATK UnifiedGenotyper runs when calling variants on Ion Torrent exomes. Has anyone else had this issue?

                              Ion Torrent data has 2 major issues:
                              1 - Dealing with the homopolymer problem (how on earth are we supposed to filter those reads, or otherwise deal with them?)
                              2 - Setting up correct variant calling settings.
                              Last edited by gmarco; 06-09-2014, 12:39 AM.
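
One possible way to handle the homopolymer issue in post-processing is to demand a higher allele fraction for calls that sit inside long homopolymer runs. The Python sketch below only illustrates that general idea. It is not the unpublished method mentioned earlier in the thread, and the threshold values and function names are hypothetical placeholders.

```python
def homopolymer_run_length(reference, pos):
    """Length of the run of identical bases containing 0-based index pos."""
    base = reference[pos]
    left = pos
    while left > 0 and reference[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(reference) and reference[right + 1] == base:
        right += 1
    return right - left + 1


def min_allele_fraction(run_length, base_threshold=0.05, per_base_penalty=0.05,
                        max_threshold=0.5):
    """Required allele fraction, raised for every base beyond a run of 3 (assumed values)."""
    extra = max(0, run_length - 3) * per_base_penalty
    return min(max_threshold, base_threshold + extra)


reference = "ACGTTTTTTAG"  # toy sequence with a run of six T bases
run = homopolymer_run_length(reference, 5)
print(run, min_allele_fraction(run))
```

For an indel record, the run would normally be measured at the base after the VCF anchor position, a detail this sketch leaves out.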
