  • mixter
    Member
    • May 2010
    • 22

    BS-Seq mapping efficiency, what can be expected?

    Hi,

    I'm looking for references on the expected and acceptable mapping efficiency (i.e. % of mappable reads) in different BS-Seq scenarios. This is motivated by my own experiments, but it should also be of general interest. So far I have seen few definite figures on this in BS-seq papers.

    100% efficiency can't be expected since there's always at least a bit of DNA degradation by bisulfite. Furthermore, there are differences between genome-wide and RRBS data, e.g. due to the amount of repeats and ambiguous reads.

    One recent publication from the Babraham Institute seems to indicate that 80-90% mapping efficiency can be routinely expected for BS-seq in base space (Fig 2b of "DNA methylome analysis using short bisulfite sequencing data", http://www.nature.com/nmeth/journal/...meth.1828.html). Did I understand that right, or was this on simulated/ideal data after all?

    But I also found a post by Felix Krueger stating that 68% mapping efficiency is already fair for BS-Seq paired-end (http://seqanswers.com/forums/showthr...?t=8140&page=3).

    About paired-end: as I understand it, mapping quality is usually slightly lower than for single-end because both mates of a pair need to be acceptable. It would also be interesting to know whether there are BS-seq-specific differences in mapping efficiency between single-end and paired-end data.
  • fkrueger
    Senior Member
    • Sep 2009
    • 627

    #2
    Hi Mixter,

    I think it is fair to say that mapping efficiency in BS-Seq is a function of the read length, although the gain in mapping efficiency gets smaller with increasing read lengths. The figure you are probably referring to (Fig. 2?) was indeed done with simulated data that did not contain any Ns.

    Real world datasets tend to contain quite a number of sequences that can't be mapped, and this is probably a combination of several factors:
    - reads that come from regions of the genome that are not actually present in the genome assembly (e.g. plenty of sequence around centromeres or towards the chromosome ends is simply masked by Ns in the genome builds)
    - reads from repetitive regions that can't be mapped uniquely
    - reads with adapter or primer contamination or other artefacts generated during library generation

    Just to give you some ballpark figures, we regularly see around 60-68% mapping efficiency for 40 bp single-end (SE) RRBS reads. I have seen some high-quality (quality- and adapter-trimmed) longer datasets of 75-100 bp that were getting close to the 80% mark, and this is already quite high for standard genomic sequence mapping.
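
    To make the figures above concrete, here is a minimal sketch of how mapping efficiency, as defined in the first post (% of mappable reads), could be computed from two read counts. The function name and the counts are made up for illustration and are not taken from any particular aligner's report.

```python
def mapping_efficiency(total_reads: int, mapped_reads: int) -> float:
    """Return mapped reads as a percentage of all reads analysed."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


# Hypothetical counts for a 40 bp single-end RRBS run (illustration only).
total = 10_000_000
mapped = 6_400_000
print(f"Mapping efficiency: {mapping_efficiency(total, mapped):.1f}%")
```

    With these made-up counts the result falls inside the 60-68% range quoted above.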

    We have seen that paired-end reads tend to increase the mapping efficiency by a few percent (up to 3 or 4% for 40 bp RRBS reads). However, this increase in mapping efficiency does not necessarily translate into a proportional increase in methylation data, because the two reads of a pair may overlap, and such overlaps generate redundant data. I have tried to write up a few more things about this in a brief RRBS guide that is available here. I believe the homepage might currently be experiencing some difficulties, but hopefully it'll be back up soon. If you have any specific queries about your dataset, don't hesitate to send me an email directly.
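
    The point about overlapping mates can be illustrated with a simplified model. It assumes both reads of a pair are sequenced inward from opposite ends of the same fragment, and that any read-through past the fragment end has been trimmed away. The function names and fragment lengths below are hypothetical, chosen only to show the arithmetic.

```python
def mate_overlap(fragment_len: int, read_len: int) -> int:
    """Number of fragment bases covered by both reads of a pair."""
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    # A read cannot cover more bases than the fragment contains.
    usable = min(read_len, fragment_len)
    return max(0, 2 * usable - fragment_len)


def redundant_fraction(fragment_len: int, read_len: int) -> float:
    """Fraction of sequenced bases that duplicate the other mate's bases."""
    usable = min(read_len, fragment_len)
    return mate_overlap(fragment_len, read_len) / (2 * usable)


# Hypothetical fragment sizes with 40 bp reads (illustration only).
for fragment in (30, 60, 100):
    overlap = mate_overlap(fragment, 40)
    fraction = redundant_fraction(fragment, 40)
    print(f"fragment {fragment} bp: overlap {overlap} bp, redundant {fraction:.0%}")
```

    Under this model a 60 bp fragment sequenced with 2 x 40 bp reads has 20 bp covered twice, so a quarter of the sequenced bases add no new methylation calls.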


    • mixter
      Member
      • May 2010
      • 22

      #3
      Many thanks! For now, we are just looking at public data sets. I just wanted to say that we found this an extremely helpful orientation.
