  • wizard_ofchaos
    Junior Member
    • May 2014
    • 3

    Ion Torrent PGM Data Analysis problem!!

    Hi everybody,

    Recently I sequenced some genes (PCR amplicons; approx. 5.5 kb per gene) from Candida glabrata to figure out whether they contain mutations that can be linked to echinocandin resistance. I used 55 different C. glabrata strains and the Ion Torrent PGM (two different runs, on 318C and 316 chips) to get the sequences. I am now analyzing the data with CLC Genomics Workbench 7 and have found some unexpected results, and I was wondering whether anybody has seen something like this or knows what's going on.

    When I map the reads to a reference, one of the genes shows a deletion at a site that results in a frameshift and a premature stop codon. There is nothing wrong with a result like that in itself, but the striking thing is that when I check the raw data, I see that the deletion is caused by a missing G on the reverse strand: about 99% of the forward-strand reads show GGG, whereas almost 95% of the reverse-strand reads show only GG. I checked the rest of the sequence and this happens nowhere else, although there are many other places where runs of multiple G's (up to 5 in a row) cause no problems at all. This is true for all 55 samples and both runs.
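The per-strand comparison described above can be sketched as a small tally of homopolymer run lengths. This is only an illustration: the read tuples, the `offset` field, and the toy sequences are hypothetical, and the reads are assumed to be already converted to reference orientation and trimmed so that `offset` points at the first base of the run.

```python
from collections import Counter


def run_length_at(seq: str, start: int, base: str = "G") -> int:
    """Return the length of the run of `base` that begins at index `start`."""
    n = 0
    while start + n < len(seq) and seq[start + n] == base:
        n += 1
    return n


def tally_by_strand(reads, base: str = "G"):
    """Count observed run lengths separately for '+' and '-' strand reads.

    `reads` is an iterable of (strand, sequence, offset) tuples. The layout
    is a hypothetical one chosen for this sketch, not a real file format.
    """
    counts = {"+": Counter(), "-": Counter()}
    for strand, seq, offset in reads:
        counts[strand][run_length_at(seq, offset, base)] += 1
    return counts


# Toy reads (made up): forward reads carry GGG, most reverse reads carry GG.
toy_reads = [
    ("+", "ACTGGGTCA", 3),
    ("+", "ACTGGGTCA", 3),
    ("-", "ACTGGTCA", 3),
    ("-", "ACTGGTCA", 3),
    ("-", "ACTGGGTCA", 3),
]

counts = tally_by_strand(toy_reads)
print("forward:", dict(counts["+"]))
print("reverse:", dict(counts["-"]))
```

A table like this, built for the suspicious site and for a few unaffected G runs, would make the strand asymmetry easy to compare across samples.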

    Any thoughts??
  • IonTom
    Member
    • Apr 2014
    • 32

    #2
    This seems to be how Ion Torrent data normally behaves. At least in my experience, the homopolymer problem is hard to fix. You would normally expect something more stochastic, e.g. 4 G's in some reads, 6 in others, 3 in others, and so on. But the reality is more like your case.


    If you only care about coding regions, the newly published Ramics aligner could help: http://nar.oxfordjournals.org/conten...ar.gku473.full

    They even show an example that is close to your case.

    Comment

    • snetmcom
      Senior Member
      • Oct 2008
      • 159

      #3
      I agree it's not unusual to see some homopolymer errors, but I do find it unusual to see it trend on only one strand. Are you aligning to the whole genome? Have you tried aligning the data with the Ion Tools?

      Comment

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        That does sound like some kind of strand bias or CIGAR string asymmetry coming from the aligner. You could give BBMap a try; it has negligible strand bias and will produce identical CIGAR strings for a read and its reverse complement. Some aligners produce different ones, especially in the presence of homopolymer indels.
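One reason homopolymer indels invite this kind of asymmetry is that a deletion inside a run has no unique position: removing any one G from GGG gives the same read. The sketch below shows the common left-alignment convention for normalizing such deletions. It is a generic illustration under that assumption, not a description of how BBMap, CLC, or any other aligner works internally, and the reference string is made up.

```python
def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Return `ref` with `length` bases removed starting at 0-based `pos`."""
    return ref[:pos] + ref[pos + length:]


def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion as far left as possible without changing the result.

    The deletion can move one base to the left whenever the base just
    before it equals the last deleted base.
    """
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


ref = "ACTGGGTCA"  # toy reference with a GGG run at positions 3..5

# Deleting the first or the last G of the run yields the same sequence,
# so two alignments could report different positions for the same event.
assert apply_deletion(ref, 3, 1) == apply_deletion(ref, 5, 1)

# After left-alignment both representations agree on one position.
print(left_align_deletion(ref, 5, 1))
print(left_align_deletion(ref, 3, 1))
```

If forward and reverse reads are normalized differently, a variant caller may treat one event as two, so checking the placement of the gap in the alignment viewer is a cheap first test.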

        Comment

        • woodydon
          Member
          • Jan 2010
          • 52

          #5
          Originally posted by wizard_ofchaos
          Hi everybody,

          Recently I sequenced some genes (PCR amplicons; approx. 5.5 kb per gene) from Candida glabrata to figure out whether they contain mutations that can be linked to echinocandin resistance. I used 55 different C. glabrata strains and the Ion Torrent PGM (two different runs, on 318C and 316 chips) to get the sequences. I am now analyzing the data with CLC Genomics Workbench 7 and have found some unexpected results, and I was wondering whether anybody has seen something like this or knows what's going on.

          Any thoughts??
          Why would you spend money on a PGM? PGM data is bad and has long been known to have homopolymer problems. However, the company has claimed that its in-house scripts can fix the homopolymer problem. You may want to contact them first before using any conventional software that doesn't support this.

          Comment

          • wizard_ofchaos
            Junior Member
            • May 2014
            • 3

            #6
            Originally posted by snetmcom
            I agree it's not unusual to see some homopolymer errors, but I do find it unusual to see it trend on only one strand. Are you aligning to the whole genome? Have you tried aligning the data with the Ion Tools?
            I did not align to the whole genome; I used references posted on GenBank that contain the ORFs we sequenced.

            So far I have only used the CLC workbench to make the alignments, as the technician I'm working with uses it all the time and has never had problems with it. But I will try some other alignment tools soon.

            We also contacted someone from Life Technologies, but they are passing it on, as they didn't really have a clue what is going on here. We haven't had an explanation from them yet.

            Comment

            • IonTom
              Member
              • Apr 2014
              • 32

              #7
              Originally posted by wizard_ofchaos
              I did not align to the whole genome; I used references posted on GenBank that contain the ORFs we sequenced.

              So far I have only used the CLC workbench to make the alignments, as the technician I'm working with uses it all the time and has never had problems with it. But I will try some other alignment tools soon.

              We also contacted someone from Life Technologies, but they are passing it on, as they didn't really have a clue what is going on here. We haven't had an explanation from them yet.

              One explanation might come from TMAP.
              It basically combines 4 or 5 aligners. Maybe some reads were aligned by a different aligner?
              Is the read length the same in all cases?
              Last edited by IonTom; 06-18-2014, 12:34 AM.

              Comment

              • gringer
                David Eccles (gringer)
                • May 2011
                • 845

                #8
                However, the company has claimed that its in-house scripts can fix the homopolymer problem. You may want to contact them first before using any conventional software that doesn't support this.
                I don't see any way that this can be fixed completely, because the sequencing is a stochastic process. Phasing issues (which get worse as the reads get longer, but still exist in shorter reads) mean that at any point in the process more (or fewer) bases may be added than desired. It's impossible to flow bases in such a way that the same number of bases will be added to each sequence in a cluster (especially if they are non-terminating, as in the IonTorrent process), and that substantially reduces the reliability of cluster consensus counting for sequencing homopolymers.
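The stochastic picture can be made concrete with a toy simulation. Everything here is an assumption made for illustration: the error probabilities are invented, and miscalls are modeled as independent of strand and of sequence context, which real Ion Torrent flow data need not satisfy.

```python
import random


def simulate_called_lengths(true_len, n_reads, p_under, p_over, rng):
    """Simulate called homopolymer lengths under a toy error model.

    Each read independently under-calls by one base with probability
    `p_under`, over-calls by one base with probability `p_over`, and is
    otherwise called correctly.
    """
    calls = []
    for _ in range(n_reads):
        r = rng.random()
        if r < p_under:
            calls.append(true_len - 1)
        elif r < p_under + p_over:
            calls.append(true_len + 1)
        else:
            calls.append(true_len)
    return calls


rng = random.Random(0)  # fixed seed so the run is repeatable

# Same (made-up) error rates on both strands.
forward = simulate_called_lengths(3, 10_000, 0.05, 0.02, rng)
reverse = simulate_called_lengths(3, 10_000, 0.05, 0.02, rng)

frac_fwd = forward.count(2) / len(forward)
frac_rev = reverse.count(2) / len(reverse)
print(f"fraction called as GG: forward={frac_fwd:.3f}, reverse={frac_rev:.3f}")
```

Under this strand-independent model the two fractions come out close to each other, so a purely random miscall process of this kind would not by itself produce a 99% versus 5% split between strands.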

                Comment

                • wizard_ofchaos
                  Junior Member
                  • May 2014
                  • 3

                  #9
                  @gringer

                  Sure, I totally agree, but I don't see this occurring throughout the gene, which is what you would expect from a stochastic process like that. In addition, the site where I see the GGG/GG is the same in all of my samples, and I observed GGG on the forward strand only and GG on the reverse strand only.
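The strand dependence described here can be quantified with a two-sided Fisher's exact test on a 2x2 table of strand versus observed run length. The sketch below is a plain-Python version; the counts are hypothetical, chosen only to resemble the percentages reported in the first post (100 reads per strand is an assumption).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows are forward/reverse reads, columns are GGG/GG.
p_biased = fisher_exact_two_sided(99, 1, 5, 95)
# A balanced table for comparison, where strand and call are unrelated.
p_balanced = fisher_exact_two_sided(50, 50, 50, 50)

print(f"biased table:   p = {p_biased:.3g}")
print(f"balanced table: p = {p_balanced:.3g}")
```

A very small p-value only says that the call depends on strand; it does not say whether the cause is the chemistry, the base caller, or the aligner.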

                  Comment

                  • gringer
                    David Eccles (gringer)
                    • May 2011
                    • 845

                    #10
                    Oh, right. That's weird. Sorry, I can't really help you out there.

                    Comment
