  • Ion Torrent PGM Data Analysis problem!!

    Hi everybody,

    Recently I sequenced some genes (PCR amplicons; approx. 5.5 kb per gene) from Candida glabrata to find out whether they carry mutations that can be linked to echinocandin resistance. I used 55 different C. glabrata strains and the Ion Torrent PGM (two separate runs, on a 318C chip and a 316 chip) to get the sequences. I am now analyzing the data in CLC Genomics Workbench 7 and have found some unexpected results, and I was wondering whether anybody has seen something like this or knows what's going on.

    The issue is that when I map the reads to a reference, one of the genes shows a deletion at a site that results in a frameshift and a premature stop codon. There is nothing wrong with a result like that as such, but the striking thing is that when I check the raw data, I see that the deletion is caused by a missing G on the reverse strand: about 99% of the forward-strand reads show GGG, whereas almost 95% of the reverse-strand reads show only GG. I checked the rest of the sequence and this does not happen anywhere else, even though there are many other places where runs of G (up to 5 in a row) cause no problems at all. This is true for all 55 samples and both runs.

    Any thoughts??
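One way to put a number on the strand imbalance described above is a Fisher's exact test on a 2x2 table of strand versus allele, the same idea behind strand-bias annotations such as GATK's FisherStrand. Below is a minimal, standard-library-only Python sketch. The read counts in the example are hypothetical (200 reads per strand, chosen only to match the 99% and 95% figures in the post), not the actual data from these runs.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are strands (forward, reverse); columns are alleles (GGG, GG).
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1 = a + b          # forward-strand reads
    col1 = a + c          # reads showing GGG
    n = a + b + c + d     # all reads

    def p_table(x: int) -> float:
        # Probability that the top-left cell equals x, margins held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [p_table(x) for x in range(lo, hi + 1)]
    # Tiny relative tolerance so floating-point ties with p_obs are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def strand_summary(fwd_ggg: int, fwd_gg: int, rev_ggg: int, rev_gg: int) -> dict:
    """Per-strand fraction of GGG calls plus the Fisher strand-bias p-value."""
    return {
        "forward_GGG_fraction": fwd_ggg / (fwd_ggg + fwd_gg),
        "reverse_GGG_fraction": rev_ggg / (rev_ggg + rev_gg),
        "fisher_p": fisher_exact_two_sided(fwd_ggg, fwd_gg, rev_ggg, rev_gg),
    }


if __name__ == "__main__":
    # Hypothetical counts: 198/200 forward reads GGG, 10/200 reverse reads GGG.
    print(strand_summary(198, 2, 10, 190))
```

A p-value this small only says that the allele call depends on the strand, which fits a systematic, strand-specific artifact better than a real variant; it cannot by itself tell chemistry, base calling and alignment apart as the cause.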

  • #2
    Seems like it is behaving like Ion Torrent data normally does. At least in my experience, the homopolymer problem is hard to fix. Normally you would expect something more stochastic, e.g. 4 G on some reads, 6 on others, 3 on some more, and so on. But the reality is more like your case.
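The "more stochastic" expectation can be illustrated with a toy model in Python: treat each read's flow signal for a homopolymer as the true length plus Gaussian noise, and call the length by rounding. The noise model (standard deviation growing linearly with homopolymer length, 0.1 per base) is an assumption chosen for illustration, not Ion Torrent's actual signal-processing model.

```python
import random
from collections import Counter


def simulate_calls(true_len: int, n_reads: int, noise_per_base: float = 0.1,
                   seed: int = 0) -> Counter:
    """Tally homopolymer-length calls under a toy random-noise model.

    Assumption: the flow signal is Normal(true_len, noise_per_base * true_len)
    and the called length is the rounded signal, floored at zero.
    """
    rng = random.Random(seed)
    sd = noise_per_base * true_len
    return Counter(max(0, round(rng.gauss(true_len, sd))) for _ in range(n_reads))


if __name__ == "__main__":
    # Same model for both strands: errors are random and strand-independent.
    for strand, seed in (("forward", 1), ("reverse", 2)):
        calls = simulate_calls(true_len=3, n_reads=1000, seed=seed)
        print(strand, sorted(calls.items()))
```

Under this kind of model most reads on both strands get the correct call of 3, with a scattering of 2s and 4s on each strand. It does not produce one strand reading GGG and the other GG, which is why the pattern in the original post looks systematic rather than random.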


    If you only care about coding regions, the newly published Ramics aligner could help: http://nar.oxfordjournals.org/conten...ar.gku473.full

    They even show an example that is close to your case.



    • #3
      I agree it's not unusual to see some homopolymer errors, but I do find it unusual to see it trend on only one strand. Are you aligning to the whole genome? Have you tried aligning the data with the Ion Tools?



      • #4
        That does sound like some kind of strand bias or CIGAR-string asymmetry coming from the aligner. You could give BBMap a try; it has negligible strand bias and will produce identical CIGAR strings for a read and its reverse complement. Some aligners produce different ones, especially in the presence of homopolymer indels.
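The CIGAR ambiguity mentioned here arises because a one-base deletion inside a homopolymer can be placed at any position in the run and still yield the same read. The Python sketch below (function names are illustrative) enumerates those equivalent placements and shows left- and right-normalization, the usual way of making the placement canonical. If an aligner, hypothetically, normalized gaps in each read's own orientation, reverse-strand reads would in effect be right-shifted in reference coordinates, so the two strands could report the same event at different positions.

```python
def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Sequence obtained by deleting ref[pos:pos + length]."""
    return ref[:pos] + ref[pos + length:]


def equivalent_positions(ref: str, pos: int, length: int) -> list[int]:
    """All deletion start positions that give exactly the same read."""
    target = apply_deletion(ref, pos, length)
    return [p for p in range(len(ref) - length + 1)
            if apply_deletion(ref, p, length) == target]


def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion left while the resulting sequence is unchanged.

    Moving the deleted window one base left is equivalent exactly when the
    base just before the window equals the last base inside it.
    """
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


def right_align_deletion(ref: str, pos: int, length: int) -> int:
    """Mirror image: shift right while the first deleted base equals the
    base just after the window."""
    while pos + length < len(ref) and ref[pos] == ref[pos + length]:
        pos += 1
    return pos


if __name__ == "__main__":
    ref = "ATCGGGTAC"                           # toy reference with a GGG run
    print(apply_deletion(ref, 5, 1))            # ATCGGTAC
    print(equivalent_positions(ref, 5, 1))      # [3, 4, 5]
    print(left_align_deletion(ref, 5, 1))       # 3
    print(right_align_deletion(ref, 3, 1))      # 5
```

Normalizing every indel to one convention (typically leftmost in reference coordinates) regardless of read strand is what makes a read and its reverse complement yield identical CIGAR strings.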



        • #5
          Originally posted by wizard_ofchaos:
          Hi everybody,

          Recently I sequenced some genes (PCR amplicons; approx. 5.5 kb per gene) from Candida glabrata to find out whether they carry mutations that can be linked to echinocandin resistance. I used 55 different C. glabrata strains and the Ion Torrent PGM (two separate runs, on a 318C chip and a 316 chip) to get the sequences. I am now analyzing the data in CLC Genomics Workbench 7 and have found some unexpected results, and I was wondering whether anybody has seen something like this or knows what's going on.

          Any thoughts??

          Why would you spend money on PGM? PGM data is bad and has been known to have homopolymer problems for a long time. However, the company has claimed that they can fix the homopolymer problem with their in-house scripts. You may want to contact them first, before using any conventional software that doesn't support this.



          • #6
            Originally posted by snetmcom:
            I agree it's not unusual to see some homopolymer errors, but I do find it unusual to see it trend on only one strand. Are you aligning to the whole genome? Have you tried aligning the data with the Ion Tools?

            I did not align to the whole genome; I used reference sequences from GenBank that contain the ORF we sequenced.

            So far I have only used CLC Workbench for the alignments, as the technician I'm working with uses it all the time and has never had problems with it. But I will try some other alignment tools soon.

            We also contacted someone from Life Technologies, but they are passing it on, as they didn't really have a clue what is going on here. We haven't had an explanation from them yet.



            • #7
              Originally posted by wizard_ofchaos:
              I did not align to the whole genome; I used reference sequences from GenBank that contain the ORF we sequenced.

              So far I have only used CLC Workbench for the alignments, as the technician I'm working with uses it all the time and has never had problems with it. But I will try some other alignment tools soon.

              We also contacted someone from Life Technologies, but they are passing it on, as they didn't really have a clue what is going on here. We haven't had an explanation from them yet.

              One explanation might come from TMAP.
              It basically combines 4 or 5 aligners. Maybe some reads were aligned by a different one?
              Is the read length the same in all cases?
              Last edited by IonTom; 06-18-2014, 12:34 AM.
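To check whether the GG/GGG split lines up with read length, or with which TMAP algorithm placed a read, the alignments could be dumped as text (e.g. with `samtools view`) and tallied by strand. The Python sketch below parses SAM text with only the standard library, using the SAM FLAG bits 0x4 (unmapped), 0x10 (reverse strand) and 0x900 (secondary/supplementary). The optional `tag` argument is a placeholder: which tag, if any, records the mapping algorithm in TMAP output is an assumption to verify against the TMAP documentation.

```python
from collections import defaultdict
from statistics import mean


def lengths_by_strand(sam_lines, tag=None):
    """Group read lengths of primary mapped reads by strand.

    If `tag` (a two-letter SAM tag name) is given, group by
    (strand, tag value) instead; reads lacking the tag get None.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue                          # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        flag, seq = int(fields[1]), fields[9]
        if flag & 0x4 or flag & 0x900 or seq == "*":
            continue                          # unmapped, secondary/supplementary, or no SEQ
        strand = "-" if flag & 0x10 else "+"
        key = strand
        if tag is not None:
            # Optional fields look like TAG:TYPE:VALUE, starting at column 12.
            value = next((f.split(":", 2)[2] for f in fields[11:]
                          if f.startswith(tag + ":")), None)
            key = (strand, value)
        groups[key].append(len(seq))
    return dict(groups)


if __name__ == "__main__":
    # Toy SAM records (hypothetical reads, not data from this thread).
    toy = [
        "@SQ\tSN:amplicon1\tLN:5500",
        "r1\t0\tamplicon1\t100\t60\t50M\t*\t0\t0\t" + "A" * 50 + "\t*",
        "r2\t16\tamplicon1\t120\t60\t40M\t*\t0\t0\t" + "C" * 40 + "\t*",
    ]
    for key, lengths in lengths_by_strand(toy).items():
        print(key, len(lengths), "reads, mean length", mean(lengths))
```

On real data the same function could be fed the lines of `samtools view` output restricted to reads covering the GGG site.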



              • #8
                However, the company has claimed that they can fix the homopolymer problem with their in-house scripts. You may want to contact them first, before using any conventional software that doesn't support this.

                I don't see any way that this can be fixed completely, because sequencing is a stochastic process. Phasing issues (which get worse as the reads get longer, but still exist in shorter reads) mean that at any point in the process more (or fewer) bases may be added than intended. It's impossible to flow bases in such a way that the same number of bases is added to every sequence in a cluster (especially when the nucleotides are non-terminating, as in the Ion Torrent process), and that substantially reduces the reliability of cluster-consensus counting for homopolymers.
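The point that phasing gets worse with read length can be made concrete with a deliberately simple model, sketched here in Python. Assume each in-phase template on a bead independently falls out of phase on every flow with probability p_ie + p_cf (incomplete extension plus carry-forward) and never resynchronizes; the in-phase fraction after n flows is then (1 - p_ie - p_cf)^n. The parameter names and the no-resynchronization assumption are illustrative, not Ion Torrent's published phasing model.

```python
import random


def in_phase_fraction(n_flows: int, p_ie: float, p_cf: float) -> float:
    """Fraction of templates still in phase after n_flows (toy model).

    Assumption: on every flow each in-phase template independently leaves
    phase with probability p_ie + p_cf and never returns, so the in-phase
    fraction decays geometrically with the number of flows.
    """
    return (1.0 - p_ie - p_cf) ** n_flows


def simulate_in_phase(n_templates: int, n_flows: int, p_ie: float,
                      p_cf: float, seed: int = 0) -> float:
    """Monte Carlo check of in_phase_fraction under the same assumptions."""
    rng = random.Random(seed)
    p_out = p_ie + p_cf
    in_phase = sum(
        all(rng.random() >= p_out for _ in range(n_flows))
        for _ in range(n_templates)
    )
    return in_phase / n_templates


if __name__ == "__main__":
    # Illustrative rates only; real rates vary by chip, run and chemistry.
    for n in (50, 100, 200):
        print(n, round(in_phase_fraction(n, 0.01, 0.005), 3))
```

Even small per-flow rates compound over hundreds of flows, which is why the signal for a given homopolymer comes from an increasingly mixed template population as the read gets longer.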



                • #9
                  @gringer

                  Sure, I totally agree, but I don't see this happening throughout the gene, which you would expect from a stochastic process like this if that were the cause. In addition, the site where I see the GGG/GG is the same in all of my samples, and I observed GGG only on the forward strand and GG only on the reverse strand.



                  • #10
                    Oh, right. That's weird. Sorry, can't really help you out there.
