  • Variable Bismark mapping efficiency of targeted BS data for DNA methylation

    I have sequenced numerous multiplexed pools of BS amplicon-seq libraries derived from human samples on a MiSeq over the past few weeks. I have been utilising trim-galore and Bismark for alignment and am finding the mapping efficiency to be highly variable across pools:

    pool1 - ~85% - no PhiX spiked in
    pool2 - ~70% - no PhiX spiked-in
    pool3 - ~55% - 10% PhiX spiked-in
    pool4 - ~30% - 10% PhiX spiked-in

    How do I go about troubleshooting this? Which factors are likely to affect mapping efficiency?

    The MiSeq run metrics were ideal for all of these pools, so I don't think anything strange happened on the sequencing side of things. The amplicons were around 135bp before transformation to sequencing libraries, and the latter two pools (pool3 and pool4) are between 100-135bp.

    If any additional info is needed then please do ask.

    Edit:
    I forgot to look at the fastqc files for the last two pools. Quickly looked at them and most of the per base sequence content for cytosine hovers around 10-20% throughout the entire read. I assume this means that bisulfite conversion was unsuccessful before library construction? This would likely affect the mapping efficiency?
    Last edited by dross11; 05-23-2017, 10:03 AM. Reason: Fastqc
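The FastQC observation above can be reproduced by hand. The following is a minimal sketch, not FastQC's own implementation, that tallies the fraction of each base at every read position. The helper names (`open_fastq`, `read_sequences`, `per_base_fractions`) and the toy reads are made up for illustration.

```python
from collections import Counter
import gzip


def open_fastq(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def per_base_fractions(sequences):
    """Return one {base: fraction} dict per read position."""
    counts = []
    for seq in sequences:
        # Grow the list of counters to cover the longest read seen so far.
        while len(counts) < len(seq):
            counts.append(Counter())
        for pos, base in enumerate(seq):
            counts[pos][base] += 1
    return [
        {base: c[base] / sum(c.values()) for base in "ACGTN"}
        for c in counts
    ]


# Toy reads (invented) standing in for a FASTQ file; with real data use:
#   with open_fastq("sample_R1.fastq.gz") as fh:   # hypothetical file name
#       fractions = per_base_fractions(read_sequences(fh))
toy_reads = ["TTGATTGA", "TTGATCGA", "TAGGTTGA", "TTGATTGA"]
for pos, frac in enumerate(per_base_fractions(toy_reads), start=1):
    print(pos, f"C={frac['C']:.2f}", f"G={frac['G']:.2f}")
```

A cytosine fraction well above a few percent across the whole of Read 1 does not by itself prove failed conversion; reads from complementary strands also carry cytosines, so the strand configuration of the library needs to be checked as well.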

  • #2
    There are a couple of things that could affect the mapping efficiency, I'll list a few here:

    - depending on the amplification strategy you might have to use --non_directional for mapping. This might also explain why you are seeing cytosine levels of ~10-20%

    - very often the spike-in is not present at the amount you were aiming for. We have seen libraries that were supposed to have a 5% Lambda spike-in that actually contained 90% of Lambda.

    - There might be other contaminants in the library you weren't expecting (e.g. human, bacteria etc). What would help in this case would be to run Fastq_Screen on the data (also for the PhiX spike-in)

    - did you run single-end or paired-end alignments? If you had PE libraries you might want to run Read 1 in SE mode to see if that helps

    If you like, I could run a quick screen of your 4 samples. Just send me an email and attach ~100,000 sequences for each sample (gzipped, they should fit comfortably as an attachment) and I can take a quick look. Cheers, Felix
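One way to produce such a ~100,000-read subset is sketched below. It simply copies the first records of a gzipped FASTQ file; the function name `take_first_reads` and the file names are hypothetical, and taking the first reads rather than a random sample is a simplifying assumption.

```python
import gzip
import os
import tempfile
from itertools import islice


def take_first_reads(in_path, out_path, n_reads=100_000):
    """Copy the first n_reads FASTQ records (4 lines each) into a gzipped file.

    Returns the number of records actually written.
    """
    written = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        while written < n_reads:
            record = list(islice(src, 4))
            if len(record) < 4:  # end of file (or a truncated last record)
                break
            dst.writelines(record)
            written += 1
    return written


# Small demonstration on an invented 10-read file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "toy_R1.fastq.gz")
    out_path = os.path.join(tmp, "toy_R1.subset.fastq.gz")
    with gzip.open(src_path, "wt") as fh:
        for i in range(10):
            fh.write(f"@read{i}\nTTGATTGA\n+\nIIIIIIII\n")
    print(take_first_reads(src_path, out_path, n_reads=3))
```

For paired-end data, running the same function with the same `n_reads` on the Read 1 and Read 2 files keeps the pairs in sync, provided both files list the reads in the same order.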


    • #3
      I really appreciate you helping out here, Felix.
      We bisulfite-converted the DNA and then performed PCR with primers designed to either the OT or OB strand. I am brand new to the field, so do excuse my ignorance, but if my OT primer amplifies, then I expect the resulting amplicons and libraries to be in either the OT or CTOT configuration? If so, then using the --non_directional flag makes sense, but it is quite bizarre that I am getting 85% mapping efficiency with pool1 without the --non_directional flag.

      I tried the --non_directional flag with one pair of fastq files and it increased the mapping efficiency from 32% to 83%!

      My cytosine levels in the fastqc files are high because the CTOT and CTOB strands would have Cytosines; complementary to Thymine in the respective OT/OB strand?

      Someone in our lab is constructing libraries using mouse DNA so Fastq_screen is likely a good idea. I'll try this out soon.

      Read 1 in SE mode produced the same mapping efficiency percentages (~30%).


      • #4
        Sorry for the slow reply, it seems that I am not allowed to post anymore when I am at work, trying from home now...

        Hi again,

        this is where it is getting confusing, I just had to draw this out on a sheet of paper myself... I believe there are basically 2 ways of making bisulfite amplicon libraries:

        1) If we assume you are creating an amplicon against the top strand you use a primer (1) that looks like the top strand (bisulfite converted, OT) and second primer (2) that is complementary to the top (bisulfite converted, CTOT). People here design primer (1) so that it starts with the Illumina PE1 portion and then the sequences of interest, and primer (2) starts with the Illumina PE2 portion and then the sequence of interest. After the initial amplification you make the libraries with the Illumina PE primers, and consequently the OT sequence will always carry the PE1 primer and be sequenced first. This will mean that the alignment strand will be always OT for both single-end or paired-end sequencing (The second read of paired-end libraries taken alone would map to the CTOT strand, but this doesn’t happen during PE mapping).
        The same is true for OB amplicons which I left out here. If your libraries were constructed like this then you should only get alignments to OT and OB depending on which strand you targeted, and the FastQC plots should show low C content for Read1, and low G content for Read2.

        2) You could also design primers to the OT and CTOT strands as above; let's call them (A) and (B). Instead of carrying the Illumina PE portions as well, you could simply amplify the genomic loci, then perform A-tailing and subsequently ligate on the sequencing adapters. In this scenario you might end up getting the PE1 primer on either the OT side or the CTOT side, and thus you would get both OT and CTOT alignments (and also OB and CTOB if you also targeted the bottom strand). In these kinds of libraries G and C should be at a similar level in both Read1 and Read2, and the libraries will be non-directional.
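The relationship between the four strands, and why they differ in C and G content, can be sketched in silico. This is an illustration only: it assumes a completely unmethylated toy sequence (every C converts to T), and the function names are made up.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def bisulfite_convert(seq):
    """Convert every C to T (assumes no methylation, for illustration)."""
    return seq.replace("C", "T")


def four_strands(top):
    """Return the four possible read sequences for a top-strand locus (5'->3')."""
    ot = bisulfite_convert(top)            # original top, converted
    ob = bisulfite_convert(revcomp(top))   # original bottom, converted
    return {
        "OT": ot,
        "CTOT": revcomp(ot),   # complementary to original top
        "OB": ob,
        "CTOB": revcomp(ob),   # complementary to original bottom
    }


top = "ACGTTCGGCA"  # invented toy locus
for name, seq in four_strands(top).items():
    print(f"{name:5s}{seq}  C={seq.count('C')} G={seq.count('G')}")
```

In this toy case the OT and OB reads contain no C, while the CTOT and CTOB reads contain no G but do contain C. A library that mixes both kinds of reads in Read 1 would therefore show intermediate C and G levels.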

        Given that you are getting both directional results (pools 1 and 2) and non-directional results (pools 3 and 4), is there any chance that you changed the amplification protocol or primers during the course of the experiment?

        If you can bring up the mapping efficiency to over 80% I don’t think that Fastq_Screen will find any major contaminants because most of the data is already well… Let me know if you would like me to take a look at some of the data (or do a quick screen) for you.


        • #5
          So I spent the better part of yesterday reading the Bismark publication and the new version (v0.18) of the Bismark docs to improve my understanding of strand configurations. Between my reading and your reply, my understanding is as follows:

          Directional has to be designed in such a way that the OT and OB strand are tagged so that the primers adhere to these tags thus only amplify these configurations; post-bisulfite adapter tagging (PBAT) sequencing (the EpiGnome library prep workflow is a good example of this?). I suppose the tagging is the PE1/2 portion of the sequence?

          Non-directional has no such tagging so all configurations (OT,OB,CTOT,CTOB) are amplified.

          We designed the forward and reverse primers to the same strand configuration, either OT or OB (not to CTOT or CTOB at all), and they were not designed to partially anneal to the Illumina PE1/2 sequences. Could the forward/reverse primers designed to the OT configuration anneal to CTOT? If so, I believe we have designed the amplicons in a non-directional fashion; unfortunately, our lab people are also confused about what CTOT and CTOB truly are. All four pools contain different samples and primers, but the amplification protocol has remained the same. However, pool3 and pool4 contain primers designed by a student, and I cannot confirm that they were definitely designed in the correct configuration.

          Nevertheless, I have sent sample post-trimmed fastq pairs from pool1 and pool4 to your email address specified on the bismark webpage.


          • #6
            Update:

            Talking to the lab people, I found that the primers used to generate the amplicon libraries in pool1 and pool2 were produced using BisulfitePrimerSeeker, and these primers seem to have been designed specifically to the OT and OB strand configurations. The primers used to generate the amplicon libraries in pool3 and pool4 were produced using PrimerSuite and were designed to the OT and CTOB strand configurations by accident; they thought they were designing to OB and not CTOB. After aligning the pool4 fastq files with Bismark using the --non_directional flag, I ran the Bismark methylation extractor and found that the output files with OT or CTOB in their names were much larger than their counterparts with OB or CTOT in their names. I think this explains why pool1 and pool2 have a high mapping efficiency with --directional and pool3 and pool4 have a high mapping efficiency with --non_directional. Does this logic seem sound?
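The file-size comparison described above can be scripted. The sketch below assumes the extractor output files carry the strand as an underscore-delimited token in their names (for example `CpG_OT_...` or `CHH_CTOB_...`); the example file names and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Strand token delimited by underscores (or followed by a dot), so that the
# "OB" inside "CTOB" is not mistaken for a separate token.
STRAND_TOKEN = re.compile(r"(?:^|_)(CTOT|CTOB|OT|OB)(?:_|\.)")


def strand_of(filename):
    """Return 'OT', 'OB', 'CTOT' or 'CTOB' if the name carries one, else None."""
    match = STRAND_TOKEN.search(filename)
    return match.group(1) if match else None


def size_by_strand(directory):
    """Sum the sizes in bytes of all files in a directory, grouped by strand."""
    totals = defaultdict(int)
    for path in Path(directory).iterdir():
        strand = strand_of(path.name)
        if path.is_file() and strand:
            totals[strand] += path.stat().st_size
    return dict(totals)


# Invented example names; with real output, call size_by_strand(<output dir>).
for name in ["CpG_OT_pool4.txt", "CHH_CTOB_pool4.txt", "pool4_report.txt"]:
    print(name, "->", strand_of(name))
```

File size is only a rough proxy for the number of methylation calls per strand. A sample name that itself contains a token such as `_OT_` would be misassigned by this simple pattern.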


            • #7
              I just replied to this via email:

              Hi David,

              Thanks for the sequences and the other update on SeqAnswers.

              I had a look at your sequencing files, and came to the same conclusions. Both files align to the human genome with >90%, which is a good start. Pool 4 has some 5% or so of PhiX, but there are no contaminations worth noting.

              Pool 1 aligns to the OT (40%) and OB (60%) strands, but Pool 4 aligns to the OT (35%), CTOB (60%) and OB (5%) strands. I had a look at the non-deduplicated alignments in SeqMonk, and the amplicons look fantastically clean with almost no background whatsoever. Since some of the Pool 1 and Pool 4 reads overlapped perfectly, I also concluded that at least some of the regions must have been designed to the same locus but with different primers and/or using a different protocol somehow (students, eh?)

              So overall I believe that you should be just fine looking at OT/OB for pools 1 and 2, and OT/CTOB for pools 3 and 4, the information should be the same (at least theoretically).

              All the best, Felix
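The strand breakdown quoted in the email suggests a simple check. The sketch below turns per-strand alignment counts into percentages and flags a library as probably non-directional when the complementary strands (CTOT plus CTOB) exceed a cutoff. The 5% cutoff and the function names are arbitrary assumptions, not a Bismark feature.

```python
STRANDS = ("OT", "CTOT", "CTOB", "OB")


def strand_percentages(counts):
    """Convert per-strand alignment counts into percentages of the total."""
    total = sum(counts.get(s, 0) for s in STRANDS)
    if total == 0:
        raise ValueError("no alignments to summarise")
    return {s: 100.0 * counts.get(s, 0) / total for s in STRANDS}


def looks_non_directional(counts, threshold_pct=5.0):
    """True if CTOT + CTOB alignments exceed threshold_pct of all alignments.

    threshold_pct is an arbitrary illustrative cutoff.
    """
    pct = strand_percentages(counts)
    return pct["CTOT"] + pct["CTOB"] > threshold_pct


# The two strand profiles quoted in the email, entered as percentages.
ot_ob_only = {"OT": 40, "OB": 60}
ot_ctob_heavy = {"OT": 35, "CTOB": 60, "OB": 5}
print(looks_non_directional(ot_ob_only))
print(looks_non_directional(ot_ctob_heavy))
```

Counts like these are only available once the data have been aligned in non-directional mode, since a directional alignment does not attempt the CTOT and CTOB strands.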


              • #8
                Originally posted by fkrueger
                Sorry for the slow reply, it seems that I am not allowed to post anymore when I am at work, trying from home now...
                @Felix: @ECO has turned the DDoS filter back on, and the forum software is aggressively marking posts for approval after a recent attack. I try to keep an eye out and approve legitimate posts as soon as I can.


                • #9
                  Excellent, sorry for being so ranty! :P

                  I tried three different computers (PC/Mac) on site yesterday, tried posting from my phone (on Eduroam but on site as well) but then had to wait until I was at home in the evening, where the same post went through immediately (on a MacBook). So I am assuming that our Institute IP range is probably flagged up as known spam?

                  In any case, thanks for your support! Cheers, Felix


                  • #10
                    It is possible that your institute's IP was being flagged temporarily (you seem to be able to post today). Those border filter appliances are a necessary evil we have to live with now.


                    • #11
                      I couldn't tell why this would be so. In my first message yesterday I included the [ MAIL ] tags, maybe this could be seen as malicious and/or spam and thus flag the institute IP as a potential threat for a day? Just guessing here...

                      Sorry for taking this thread off-topic (but I think it should have been solved anyhow).
