Elsie 09-25-2012 07:14 PM

Agilent Methyl-Seq - enrichment + BS + sequencing
Dear all, long time listener first time caller here.
I've got some Agilent Methyl-Seq data and I'm looking for feedback from anyone with experience with this kind of data, i.e. enrichment for regions of interest, then bisulfite (BS) conversion, then sequencing. I've been using Bismark (great! Thanks very much, Babraham!) and have just given Trim Galore! a whirl. Any tips on parameters, trimming etc. would be welcome: I've only got about 66% mapping efficiency and would like to improve upon that.

fkrueger 09-26-2012 12:44 AM

Hi Elsie,

66% mapping efficiency doesn't sound too bad, but could you give us some more information about your data, such as the read length, whether it was single or paired-end, the mapping parameters used, and the number of sequences that did not map at all or mapped ambiguously (this should be stated in the mapping report)?

Elsie 09-26-2012 01:44 PM

Hi fkrueger,
thanks for the reply.
100bp reads, paired-end. I ran Bismark and Trim Galore with default settings apart from specifying paired-end input, and I used the default directional mode.
Sequence pairs with no alignments under any condition: 13154097
Sequence pairs did not map uniquely: 871370
Maybe this is as good as it gets? First time with NGS data so I don't have any feeling yet for good/bad data.

fkrueger 09-26-2012 10:28 PM

Compared to the sequences that did not align at all, only a small number of sequences did not map uniquely, which shows that you don't seem to have a problem with highly repetitive sequences (which you probably wouldn't expect from a sequence capture anyway). For shotgun BS-Seq data one can typically expect around 75-80% mapping efficiency with 100bp paired-end reads; I'm not quite sure how this figure is affected by the Agilent sequence capture though.
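
For a rough sanity check on those numbers, here is a minimal Python sketch. It assumes that the mapping efficiency in the report is uniquely aligned pairs divided by all pairs analysed, and it treats the quoted 66% as exact, so the implied total is only an estimate. The variable names are made up for illustration.

```python
# Counts quoted from the mapping report earlier in the thread.
no_alignment_pairs = 13_154_097   # pairs with no alignments under any condition
ambiguous_pairs = 871_370         # pairs that did not map uniquely

# Assumption: the ~66% figure is unique pairs / total pairs analysed.
reported_efficiency = 0.66

# Ambiguous pairs relative to pairs that did not align at all.
ambiguous_vs_unaligned = ambiguous_pairs / no_alignment_pairs

# If ~66% mapped uniquely, the two counts above make up the remaining ~34%.
not_unique_pairs = no_alignment_pairs + ambiguous_pairs
implied_total_pairs = not_unique_pairs / (1 - reported_efficiency)

print(f"ambiguous / unaligned: {ambiguous_vs_unaligned:.3f}")
print(f"implied total pairs (estimate): {implied_total_pairs:,.0f}")
```

On these counts the ambiguous pairs are only a small fraction (roughly 7%) of the pairs that did not align at all, so repeats are unlikely to be the main cause of the lost reads.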

The most common causes of low mapping efficiency for paired-end sequencing (apart from quality and adapter issues) are either that the sequenced fragments are so short that the two reads of a pair completely contain each other, or that the specified maximum insert size is too small (controlled by the -X parameter; 500bp is the default). Trim Galore has an option '--trim1' to avoid the former case:


-t/--trim1          Trims 1 bp off every read from its 3' end.
                    This may be needed for FastQ files that
                    are to be aligned as paired-end data with
                    Bowtie. This is because Bowtie (1) regards
                    alignments like this:

                    R1 --------------------------->
                    R2 <---------------------------
                    as invalid (whenever a start/end coordinate
                    is contained within the other read).

We have seen in the past that simply including '--trim1' for 100bp paired-end reads managed to increase the mapping efficiency in a sample with fairly small insert sizes from ~49% to 78%!
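
To illustrate why removing a single base can help, here is a toy Python model. It is only a sketch of the geometry, not Bowtie's actual validity check: it assumes a pair is rejected when one read's interval fully contains the other's, and that adapter trimming has already left reads no longer than the fragment. The function names and coordinates are hypothetical.

```python
def read_interval(frag_start, frag_len, read_len, trim3=0, reverse=False):
    """Inclusive (left, right) coordinates covered by one mate of a pair.

    trim3 is the number of bases removed from the read's 3' end. For the
    forward mate the 3' end is on the right; for the reverse mate it is on
    the left.
    """
    frag_end = frag_start + frag_len - 1
    if reverse:
        return (frag_end - read_len + 1 + trim3, frag_end)
    return (frag_start, frag_start + read_len - 1 - trim3)


def contains(outer, inner):
    """True if interval `inner` lies entirely within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def pair_is_nested(r1, r2):
    """Toy stand-in for the 'reads contain each other' case described above."""
    return contains(r1, r2) or contains(r2, r1)


# A 100bp fragment sequenced with 100bp reads: both mates span the fragment.
r1 = read_interval(0, 100, 100)
r2 = read_interval(0, 100, 100, reverse=True)
print(r1, r2, pair_is_nested(r1, r2))

# The same pair after trimming 1bp from each 3' end: the mates are staggered.
t1 = read_interval(0, 100, 100, trim3=1)
t2 = read_interval(0, 100, 100, trim3=1, reverse=True)
print(t1, t2, pair_is_nested(t1, t2))
```

In this model the untrimmed mates cover exactly the same interval, whereas after trimming each mate sticks out by one base on its own side, so neither contains the other.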

Just as a side note: if you find that a lot of your fragments are overlapping in the middle you should use the methylation_extractor option '--no_overlap' to avoid having a coverage bias in overlapping parts of the read. I hope this helps.
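
The coverage bias mentioned in the side note can be pictured with a small Python sketch. This is a toy model of the idea only, not the methylation extractor's implementation; it assumes that in the overlapping region the calls from read 1 are kept and those from read 2 are skipped. The function name and coordinates are made up.

```python
from collections import Counter


def per_position_calls(r1, r2, no_overlap=False):
    """Count how many times each position is scored for one read pair.

    r1 and r2 are inclusive (left, right) intervals. With no_overlap=True,
    positions of read 2 that fall inside read 1 are skipped.
    """
    counts = Counter(range(r1[0], r1[1] + 1))
    for pos in range(r2[0], r2[1] + 1):
        if no_overlap and r1[0] <= pos <= r1[1]:
            continue  # already scored by read 1
        counts[pos] += 1
    return counts


# Two 100bp mates from a 160bp fragment overlap in the middle 40bp.
read1 = (0, 99)
read2 = (60, 159)

double_counted = per_position_calls(read1, read2)
single_counted = per_position_calls(read1, read2, no_overlap=True)

print(max(double_counted.values()), max(single_counted.values()))
```

Without the overlap handling, the middle of the fragment is scored twice by the same molecule; with it, every position is scored once.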

Elsie 10-02-2012 05:25 PM

Hi Felix,
thanks very much for your suggestions.
Just tried --trim1 and now have 0% mapping efficiency, argh!
I emailed Agilent to see what they do with this data (their datasheet suggests Bismark). They said to try GeneSpring, even though GeneSpring cannot yet cope with this sort of data!
thanks for your help.

fkrueger 10-03-2012 12:20 AM

Then there seems to be something going very wrong... Could you maybe email me the details (precise commands) of what you have done so I can try and assist you further?

mariebreen 02-12-2013 09:13 AM

Methyl-Seq Analysis
Hi all,

I too am having trouble with Methyl-Seq analysis.

I was wondering how many samples you multiplexed for Methyl-Seq? I have been trying to pool 4 samples and thought this may be the cause of my problems.

I am using Bismark but my initial % on target is 7% :eek: which I hope cannot be correct!

Any help would be greatly appreciated.


Elsie 02-19-2013 03:53 PM

Hi mariebreen

Sorry, I can't help with the pooling question; I just received the data with minimal background information (it is more a test of the technology than anything else). I found the comments from Felix extremely helpful and basically followed his suggestions and the Bismark manual. FYI, GeneSpring can now cope with this sort of data, so maybe you could give that a whirl and see what your results look like?


