  • Bismark PE mapping low efficiency

    Hi all,

    I am learning WGBS analysis with Bismark ver1.9.

    I'm running into a low mapping efficiency problem. In PE mode the mapping efficiency is only 1.8%, but when I map either of the two reads on its own in SE mode, I get 88%.
    My library is not PBAT.

    I can't solve this problem by myself. Could anyone help me?

    Here is my procedure:
    1. Removed poor-quality reads.
    2. Removed adapter sequences.
    3. Converted the hg19 reference genome with bismark_genome_preparation.
    4. Mapped with bismark in either PE or SE mode.

    PE mode
    Code:
    bismark -q --bowtie2 -N 0 -L 20 -u 10000 -X 2000 --score_min L,0,-0.6 /refgenome --1 R1.fastq --2 R2.fastq --sam  -o ./bismark_result
    ======================
    Sequence pairs analysed in total: 10000
    Number of paired-end alignments with a unique best hit: 175
    Mapping efficiency: 1.8%

    Sequence pairs with no alignments under any condition: 9817
    Sequence pairs did not map uniquely: 8
    Sequence pairs which were discarded because genomic sequence could not be extracted: 0

    Number of sequence pairs with unique best (first) alignment came from the bowtie output:
    CT/GA/CT: 78 ((converted) top strand)
    GA/CT/CT: 0 (complementary to (converted) top strand)
    GA/CT/GA: 0 (complementary to (converted) bottom strand)
    CT/GA/GA: 97 ((converted) bottom strand)

    Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 6079

    Total methylated C's in CpG context: 201
    Total methylated C's in CHG context: 7
    Total methylated C's in CHH context: 23
    Total methylated C's in Unknown context: 0

    Total unmethylated C's in CpG context: 157
    Total unmethylated C's in CHG context: 1271
    Total unmethylated C's in CHH context: 4420
    Total unmethylated C's in Unknown context: 14


    C methylated in CpG context: 56.1%
    C methylated in CHG context: 0.5%
    C methylated in CHH context: 0.5%
    C methylated in unknown context (CN or CHN): 0.0%
    =====================


    SE mode
    Code:
     bismark -q --bowtie2 -N 0 -L 20 --score_min L,0,-0.6 /refgenome --se R1.fastq --sam  -o ./bismark_result
    ======================
    Sequences analysed in total: 3014078
    Number of alignments with a unique best hit from the different alignments: 2664742
    Mapping efficiency: 88.4%

    Sequences with no alignments under any condition: 107753
    Sequences did not map uniquely: 241583
    Sequences which were discarded because genomic sequence could not be extracted: 10

    Number of sequences with unique best (first) alignment came from the bowtie output:
    CT/CT: 1329899 ((converted) top strand)
    CT/GA: 1334833 ((converted) bottom strand)
    GA/CT: 0 (complementary to (converted) top strand)
    GA/GA: 0 (complementary to (converted) bottom strand)

    Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 43696755

    Total methylated C's in CpG context: 1442578
    Total methylated C's in CHG context: 29761
    Total methylated C's in CHH context: 110235
    Total methylated C's in Unknown context: 654

    Total unmethylated C's in CpG context: 395138
    Total unmethylated C's in CHG context: 9033660
    Total unmethylated C's in CHH context: 32685383
    Total unmethylated C's in Unknown context: 13481

    C methylated in CpG context: 78.5%
    C methylated in CHG context: 0.3%
    C methylated in CHH context: 0.3%
    C methylated in Unknown context (CN or CHN): 4.6%
    ===================================
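The percentages in both reports follow directly from the raw counts: mapping efficiency is the number of unique best hits divided by the number of sequences (or pairs) analysed, and each context percentage is methylated / (methylated + unmethylated). The totals of the two runs differ so much because `-u 10000` in the PE command limits that run to the first 10,000 read pairs. A small sketch that reproduces the reported figures; `pct` is a hypothetical helper, not a Bismark tool:

```shell
#!/bin/sh
# pct N D: print 100 * N / D with one decimal place (hypothetical helper)
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f%%\n", 100 * n / d }'
}

# Mapping efficiency = unique best hits / sequences (or pairs) analysed
pct 175 10000                        # PE run, prints 1.8%
pct 2664742 3014078                  # SE run, prints 88.4%

# Context methylation = methylated / (methylated + unmethylated)
pct 201 $((201 + 157))               # CpG, PE run, prints 56.1%
pct 1442578 $((1442578 + 395138))    # CpG, SE run, prints 78.5%
```

The same arithmetic shows that "Total number of C's analysed" counts only the CpG, CHG and CHH calls: for the PE run, 201 + 7 + 23 + 157 + 1271 + 4420 = 6,079, with the 14 Unknown-context calls left out.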

    Thanks a lot,
    Taiki

  • #2
    Hi Tsutsui,

    In your case Read 1 seems to be absolutely fine, and your library is directional, so that part looks fine. If I had to guess the reason for the low mapping efficiency in PE mode, I would consider the following options:

    1. The FastQ files for R1 and R2 are not in the same order. Going back to the raw FastQ files and trimming with Trim Galore in --paired mode will fix this problem.

    2. Read 2 has particularly poor quality or suffered a disastrous fault during the run. The FastQC profile of R2 might tell you if this was the case. Again, Trim Galore should fix quality issues, at least at the 3' end.

    3. R2 is somehow special, e.g. the first 8 bp could be a UMI sequence that prevents the reads from mapping. To see whether R2 alone has a general mappability problem, you can run the same SE command as for Read 1, but you also need to include --pbat. If that efficiency is as high as for R1, then the read order is the most likely suspect.
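The first suspect, R1 and R2 files that are out of step, can be checked directly. Filtering R1 and R2 independently (for example with a single-end quality filter) can drop different reads from each file, which is one way the pairing breaks. A minimal sketch, assuming uncompressed FastQ files with four lines per record and Illumina-style IDs (the part before the first space, optionally ending in /1 or /2); `check_pair_order` is a hypothetical helper, not part of Bismark or Trim Galore:

```shell
#!/bin/sh
# check_pair_order R1.fastq R2.fastq   (hypothetical helper)
# Walks both files line by line in lockstep and compares the read IDs on
# each header line. Exit status 0 if every pair matches, 1 otherwise.
check_pair_order() {
  awk -v r2="$2" '
    {
      # read the line at the same position in R2
      if ((getline line2 < r2) <= 0) { print "R2 has fewer lines than R1"; bad = 1; exit }
      if (NR % 4 != 1) next          # only header lines carry read IDs
      split($0, a, " "); split(line2, b, " ")
      id1 = a[1]; id2 = b[1]         # ID = text before the first space
      sub(/\/[12]$/, "", id1); sub(/\/[12]$/, "", id2)
      if (id1 != id2) {
        printf "mismatch at record %d: %s vs %s\n", (NR + 3) / 4, id1, id2
        bad = 1; exit
      }
    }
    END {
      if (!bad && (getline line2 < r2) > 0) { print "R2 has more lines than R1"; bad = 1 }
      if (!bad) print "OK: read IDs match in order"
      exit bad
    }' "$1"
}
```

For example, `check_pair_order R1.fastq R2.fastq` prints the first record where the IDs disagree; gzipped files would need to be decompressed first.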

    Let me know how you are getting on. I could also take a quick look for you if you send me some 100-200K reads via email.

    Cheers, Felix



    • #3
      Hi Felix,

      Thank you for your kind reply.
      I tried Trim Galore instead of fastq_quality_filter, which I had used before.

      In the end, Trim Galore worked fine!
      I now get 84% mapping efficiency with Bismark in PE mode.

      Thank you Felix.



      • #4
        When to merge PE and SE alignments in Bismark

        Due to some R2 quality issues (I think), I am getting low paired-end mapping efficiencies. When I align the unmapped reads in single-end mode, I am able to recover quite a few of them. I am unsure where in the pipeline I should "merge" the outputs of the paired-end and single-end alignments. Can both files be given to the methylation extractor to produce one output, or do I need to merge the counts in the reports (such as the coverage output) afterwards?



        • #5
          When you have both paired-end (PE) and single-end (SE) alignments, I would run the methylation extractor on each file separately (it should auto-detect what to do), and then use the CpG* output files from both the PE and SE runs as input for bismark2bedGraph to generate a coverage file. The command should be something like this:

          Code:
          bismark2bedGraph --buffer 10G -o output_file CpG*
          I hope this is what you were looking for.
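Feeding the CpG* files from both runs to bismark2bedGraph is the route described above. If you already have separate coverage files from the PE and SE runs, pooling them amounts to summing the methylated and unmethylated counts per position and recomputing the percentage. A minimal sketch, assuming tab-separated Bismark coverage files (chromosome, start, end, methylation %, count methylated, count unmethylated) and assuming the PE and SE files come from different reads, so no call is counted twice; `merge_cov` is a hypothetical helper, not part of Bismark:

```shell
#!/bin/sh
# merge_cov FILE...   (hypothetical helper)
# Sums methylated/unmethylated counts per (chr, start, end) across coverage
# files, recomputes the methylation percentage, and sorts by position.
merge_cov() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      key = $1 FS $2 FS $3           # position key: chr, start, end
      meth[key] += $5
      unmeth[key] += $6
    }
    END {
      for (k in meth)
        print k, 100 * meth[k] / (meth[k] + unmeth[k]), meth[k], unmeth[k]
    }' "$@" | sort -k1,1 -k2,2n
}
```

For example, `merge_cov pe.cov se.cov > merged.cov`. The percentage column may be printed with different precision than Bismark's own output.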



          • #6
            Thanks so much. That sounds like it will work.



            • #7
              Just to clarify: R2 singles need to be aligned in --pbat mode to map properly?



              • #8
                Originally posted by shawpa
                Just for clarification... R2 singles need to be aligned in pbat mode to get proper mapping?
                Yes, that's correct.
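The two-stage approach discussed in this thread (align as pairs first, then align the leftover reads as single-end, R1 in the default directional mode and R2 with --pbat) can be scripted. A minimal driver sketch, assuming the PE run was made with Bismark's --unmapped option so that the reads which failed to align are written out; the helpers `run` and `rescue_se`, the DRY_RUN switch and the file names are hypothetical, so check the names Bismark actually wrote to the output folder:

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is set (hypothetical)
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# rescue_se GENOME_DIR UNMAPPED_R1 UNMAPPED_R2   (hypothetical helper)
# UNMAPPED_R1/R2: leftover reads from a PE run made with --unmapped.
rescue_se() {
  genome=$1
  un1=$2
  un2=$3
  # Read 1 of a directional library: default (directional) single-end mode
  run bismark --bowtie2 "$genome" "$un1"
  # Read 2 aligns to the complementary strands, so it needs --pbat
  run bismark --bowtie2 --pbat "$genome" "$un2"
}
```

With `DRY_RUN=1 rescue_se /refgenome unmapped_1.fq unmapped_2.fq` the two commands are only printed; without DRY_RUN they are run. The resulting alignments then go through the methylation extractor separately, as described in post #5.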
