SEQanswers

11-22-2017, 12:25 AM   #1
Ttsutsui
Junior Member
Location: Japan
Join Date: Nov 2017
Posts: 2
Bismark PE mapping low efficiency

Hi all,

I am now learning WGBS analysis using Bismark ver1.9.

I'm facing a low mapping efficiency problem. In PE mode the mapping efficiency is only 1.8%, but when I map either of the read files in SE mode I get 88% mapping efficiency.
My sample is not PBAT.

I can't solve this problem by myself. Could anyone help?

My procedure is as follows:
1. Remove reads with poor quality.
2. Remove adapter sequences.
3. Convert the hg19 reference genome with bismark_genome_preparation.
4. Map with Bismark in either PE mode or SE mode.

PE mode
Code:
bismark -q --bowtie2 -N 0 -L 20 -u 10000 -X 2000 --score_min L,0,-0.6 /refgenome --1 R1.fastq --2 R2.fastq --sam  -o ./bismark_result
======================
Sequence pairs analysed in total: 10000
Number of paired-end alignments with a unique best hit: 175
Mapping efficiency: 1.8%

Sequence pairs with no alignments under any condition: 9817
Sequence pairs did not map uniquely: 8
Sequence pairs which were discarded because genomic sequence could not be extracted: 0

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT: 78 ((converted) top strand)
GA/CT/CT: 0 (complementary to (converted) top strand)
GA/CT/GA: 0 (complementary to (converted) bottom strand)
CT/GA/GA: 97 ((converted) bottom strand)

Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

Final Cytosine Methylation Report
=================================
Total number of C's analysed: 6079

Total methylated C's in CpG context: 201
Total methylated C's in CHG context: 7
Total methylated C's in CHH context: 23
Total methylated C's in Unknown context: 0

Total unmethylated C's in CpG context: 157
Total unmethylated C's in CHG context: 1271
Total unmethylated C's in CHH context: 4420
Total unmethylated C's in Unknown context: 14


C methylated in CpG context: 56.1%
C methylated in CHG context: 0.5%
C methylated in CHH context: 0.5%
C methylated in unknown context (CN or CHN): 0.0%
=====================


SE mode
Code:
 bismark -q --bowtie2 -N 0 -L 20 --score_min L,0,-0.6 /refgenome --se R1.fastq --sam  -o ./bismark_result
======================
Sequences analysed in total: 3014078
Number of alignments with a unique best hit from the different alignments: 2664742
Mapping efficiency: 88.4%

Sequences with no alignments under any condition: 107753
Sequences did not map uniquely: 241583
Sequences which were discarded because genomic sequence could not be extracted: 10

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT: 1329899 ((converted) top strand)
CT/GA: 1334833 ((converted) bottom strand)
GA/CT: 0 (complementary to (converted) top strand)
GA/GA: 0 (complementary to (converted) bottom strand)

Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

Final Cytosine Methylation Report
=================================
Total number of C's analysed: 43696755

Total methylated C's in CpG context: 1442578
Total methylated C's in CHG context: 29761
Total methylated C's in CHH context: 110235
Total methylated C's in Unknown context: 654

Total unmethylated C's in CpG context: 395138
Total unmethylated C's in CHG context: 9033660
Total unmethylated C's in CHH context: 32685383
Total unmethylated C's in Unknown context: 13481

C methylated in CpG context: 78.5%
C methylated in CHG context: 0.3%
C methylated in CHH context: 0.3%
C methylated in Unknown context (CN or CHN): 4.6%
===================================

Thanks a lot,
Taiki
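
The percentages in the two reports above can be reproduced from the raw counts, which is a quick way to confirm that the PE run really lost almost all pairs (175 of 10000 pairs is 1.75%, reported as 1.8%, and 175 + 9817 + 8 accounts for all 10000). The following is a minimal sketch, assuming that mapping efficiency is unique best hits divided by sequences analysed and that the methylation percentage is methylated calls divided by all calls in that context; both assumptions agree with the numbers shown. The function names are hypothetical helpers, not part of Bismark.

```shell
# Mapping efficiency in percent, one decimal place.
# $1 = alignments with a unique best hit, $2 = sequences (or pairs) analysed
mapping_efficiency() {
    awk -v hits="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * hits / total }'
}

# Methylation percentage for one cytosine context, one decimal place.
# $1 = methylated C's, $2 = unmethylated C's
methylation_pct() {
    awk -v m="$1" -v u="$2" 'BEGIN { printf "%.1f\n", 100 * m / (m + u) }'
}
```

With the SE figures, `mapping_efficiency 2664742 3014078` prints 88.4 and `methylation_pct 1442578 395138` prints 78.5, matching the report.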

11-22-2017, 05:26 AM   #2
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 597

Hi Tsutsui,

In your case Read 1 seems to be absolutely fine, and your library is directional, so that all looks good. If I had to guess the reason for the low mapping efficiency in PE mode, I would consider one of the following options:

1. The FastQ files for R1 and R2 are not in the same order. Going back to the raw FastQ files and trimming with Trim Galore in --paired mode will fix this problem.

2. Read 2 has particularly poor qualities or suffered a disastrous fault during the run. The FastQC profile of R2 might tell you if this was the case. Again, Trim Galore should fix quality issues, at least at the 3' end.

3. R2 is somehow special, e.g. the first 8 bp could be a UMI sequence that prevents the reads from mapping. To see if there is a general mappability problem with R2 alone, you can run the same SE command as for Read 1, but you also need to include --pbat. If that efficiency is as high as for R1, then the read order is the most likely suspect.

Let me know how you are getting on. I could also offer to take a quick look for you if you could send some 100-200K reads via email.

Cheers, Felix
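
Option 1 above (R1 and R2 out of order) can be checked directly before mapping. The following is a minimal sketch, assuming uncompressed four-line FASTQ records and read IDs that are identical in both files apart from an optional trailing /1 or /2; `check_pair_order` is a hypothetical helper name. It holds all R1 IDs in memory, so for large files it may be more practical to run it on the first few hundred thousand lines of each file.

```shell
# Compare read IDs of two FASTQ files position by position.
# $1 = R1 FASTQ, $2 = R2 FASTQ; returns non-zero if the files are out of sync.
check_pair_order() {
    awk '
        FNR % 4 != 1 { next }                  # keep only the @header line of each record
        { id = $1; sub(/\/[12]$/, "", id) }    # first token, without a trailing /1 or /2
        NR == FNR { r1_ids[++n1] = id; next }  # first file: remember IDs by position
        { n2++; if (r1_ids[n2] != id) bad++ }  # second file: compare at the same position
        END {
            printf "R1 reads: %d, R2 reads: %d, mismatched positions: %d\n", n1, n2, bad
            exit (n1 != n2 || bad > 0)
        }
    ' "$1" "$2"
}
```

A call such as `check_pair_order R1.fastq R2.fastq` prints the two read counts and the number of positions where the IDs differ; any mismatch or a difference in read counts suggests the files were filtered independently.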

11-26-2017, 07:07 PM   #3
Ttsutsui
Junior Member
Location: Japan
Join Date: Nov 2017
Posts: 2

Hi Felix,

Thank you for your kind reply.
I tried Trim Galore instead of fastq_quality_filter, which I had used previously.

In the end, Trim Galore works fine!
I got 84% mapping efficiency in PE mode in Bismark.

Thank you Felix.
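
A likely explanation for the fix is that a filter applied to each file on its own can drop a read from one file but not its mate, which shifts the two files out of step, whereas Trim Galore in --paired mode keeps or discards both mates together. The workflow that resolved the problem might look roughly like the sketch below. The function names are hypothetical, the Bismark call uses the documented -1/-2 mate options without the tuning parameters from post #1, and the `_val_1.fq` / `_val_2.fq` output names mentioned afterwards are the usual Trim Galore naming and should be checked against the files actually produced.

```shell
# Trim both mates together so that a pair is kept or dropped as a unit.
# $1 = raw R1 FASTQ, $2 = raw R2 FASTQ
trim_pairs() {
    trim_galore --paired "$1" "$2"
}

# Map the validated pairs in paired-end mode.
# $1 = genome folder, $2 = trimmed R1, $3 = trimmed R2, $4 = output folder
map_pairs() {
    bismark --bowtie2 "$1" -1 "$2" -2 "$3" -o "$4"
}
```

For example, `trim_pairs R1.fastq R2.fastq` followed by `map_pairs /refgenome R1_val_1.fq R2_val_2.fq ./bismark_result`.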

11-27-2017, 07:51 AM   #4
shawpa
Member
Location: Pittsburgh
Join Date: Aug 2011
Posts: 68
when to merge PE and SE alignments in Bismark

Due to some R2 quality issues (I think), I am getting low paired-end mapping efficiencies. When I align the unmapped reads in single-end mode, I am able to recover quite a few of the reads. I am unsure where in the pipeline I can "merge" the outputs of the paired-end and single-end alignments. Can both files be given to the methylation extractor to produce one output file, or do I need to merge the counts afterwards, for example in the coverage output?

11-28-2017, 01:46 AM   #5
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 597

When you have both paired-end (PE) and single-end (SE) alignments, I would run the methylation extractor on the files separately (it should auto-detect what to do), and then use the CpG* output files from both PE and SE as input for bismark2bedGraph to generate a coverage file. The command should be something like this:

Code:
bismark2bedGraph --buffer 10G -o output_file CpG*
I hope this is what you were looking for?
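
Put together, the steps described in this post might look like the sketch below. It assumes both alignment files sit in the current working directory, that the extractor auto-detects PE versus SE as described, and that it writes its uncompressed CpG* context files into that same directory; `merge_pe_se_methylation` and its arguments are hypothetical names. Any older CpG* files left in the directory would also be picked up by the glob.

```shell
# Extract methylation calls from PE and SE alignments separately,
# then build one coverage file from all CpG context files.
# $1 = paired-end alignment file, $2 = single-end alignment file, $3 = output name
merge_pe_se_methylation() {
    bismark_methylation_extractor "$1" || return 1
    bismark_methylation_extractor "$2" || return 1
    # CpG* expands to the CpG context files written by both extractor runs
    bismark2bedGraph --buffer 10G -o "$3" CpG*
}
```

For example, `merge_pe_se_methylation sample_pe.bam sample_se.bam sample_merged`, where all three names are placeholders.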

11-28-2017, 05:22 AM   #6
shawpa
Member
Location: Pittsburgh
Join Date: Aug 2011
Posts: 68

Thanks so much. That sounds like it will work.

11-28-2017, 06:54 AM   #7
shawpa
Member
Location: Pittsburgh
Join Date: Aug 2011
Posts: 68

Just for clarification... R2 singles need to be aligned in pbat mode to get proper mapping?

11-28-2017, 06:55 AM   #8
fkrueger
Senior Member
Location: Cambridge, UK
Join Date: Sep 2009
Posts: 597

Quote:
Originally Posted by shawpa
Just for clarification... R2 singles need to be aligned in pbat mode to get proper mapping?
Yes, that's correct.
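
The reason is that in a directional library R2 is read from the opposite end of the fragment, so on its own it looks like a read from the complementary strands, which is the case --pbat is meant to handle. A minimal sketch of such a run, reusing the single-end options from post #1 and only adding --pbat, could look like this; `align_r2_single` and its arguments are hypothetical names.

```shell
# Align R2 reads on their own in single-end mode with --pbat.
# $1 = genome folder, $2 = R2 FASTQ, $3 = output folder
align_r2_single() {
    bismark -q --bowtie2 -N 0 -L 20 --score_min L,0,-0.6 --pbat \
        "$1" --se "$2" --sam -o "$3"
}
```

For example, `align_r2_single /refgenome R2.fastq ./bismark_result`.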

All times are GMT -8.