09-26-2012, 10:28 PM   #4
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625

Compared to the number of sequences that did not align at all, only a small number did not map uniquely, which suggests that you don't have a problem with highly repetitive sequences (which you probably wouldn't expect from a sequence capture anyway). For shotgun BS-Seq data one can typically expect around 75-80% mapping efficiency with 100bp paired-end reads; I am not quite sure how this figure is affected by the Agilent sequence capture though.

The most common causes of low mapping efficiency for paired-end sequencing (apart from quality and adapter issues) are either that the sequenced fragments are so short that both reads of the pair completely contain each other, or that the maximum insert size allowed is too small for the library (this is controlled by the -X parameter; 500bp is the default). Trim Galore has an option '--trim1' to avoid the former case:

Code:
-t/--trim1           Trims 1 bp off every read from its 3' end. 
                     This may be needed for FastQ files that 
                     are to be aligned as paired-end data with 
                     Bowtie. This is because Bowtie (1) regards
                     alignments like this:

                     R1 --------------------------->
                     R2 <---------------------------
                     
                     as invalid (whenever a start/end coordinate
                     is contained within the other read).
We have seen in the past that simply including '--trim1' for 100bp paired-end reads increased the mapping efficiency of a sample with fairly small insert sizes from ~49% to 78%!
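
To make the containment problem a bit more concrete, here is a rough Python sketch. It is only a toy model and not how Bowtie or Trim Galore work internally: the function names (read_spans, one_read_contains_other) are made up for illustration, it assumes adapter read-through has already been trimmed so a read is never longer than its fragment, and it simplifies the rule to "one read lies entirely within the other".

```python
def read_spans(fragment_len, read_len, trim_3prime=0):
    """Half-open (start, end) spans of R1 and R2 on the fragment's forward strand.

    Assumption: adapter read-through has already been trimmed, so a read
    can never be longer than the fragment it came from.
    """
    eff = min(read_len, fragment_len) - trim_3prime
    r1 = (0, eff)                            # R1 starts at the left end; its 3' end is on the right
    r2 = (fragment_len - eff, fragment_len)  # R2 starts at the right end; its 3' end is on the left
    return r1, r2


def one_read_contains_other(r1, r2):
    """True if either span lies entirely within the other (simplified rule)."""
    (s1, e1), (s2, e2) = r1, r2
    return (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)


# Compare short, read-length and long fragments, without and with 1 bp trimmed
for fragment_len in (80, 100, 150):
    for trim in (0, 1):
        r1, r2 = read_spans(fragment_len, read_len=100, trim_3prime=trim)
        print(fragment_len, trim, r1, r2, one_read_contains_other(r1, r2))
```

In this model, fragments of 100bp or shorter give two reads covering exactly the same span, and taking 1 bp off each 3' end is enough to stagger them so that neither contains the other.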

Just as a side note: if you find that a lot of your fragments are overlapping in the middle, you should use the methylation_extractor option '--no_overlap' to avoid a coverage bias in the overlapping parts of the reads. I hope this helps.
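
For a feel of how much overlap to expect, the following Python sketch counts the bases covered by both reads of a pair. Again this is only an illustration under simple assumptions (reads never extend past the fragment, no indels); overlap_length and per_base_coverage are hypothetical helper names, and this is not how the methylation extractor itself handles overlaps.

```python
def overlap_length(fragment_len, read_len):
    """Number of fragment positions covered by both R1 and R2."""
    eff = min(read_len, fragment_len)  # assumption: reads never extend past the fragment
    return max(0, 2 * eff - fragment_len)


def per_base_coverage(fragment_len, read_len):
    """Coverage (0, 1 or 2) at each fragment position from a single read pair."""
    eff = min(read_len, fragment_len)
    coverage = [0] * fragment_len
    for pos in range(eff):                               # positions covered by R1
        coverage[pos] += 1
    for pos in range(fragment_len - eff, fragment_len):  # positions covered by R2
        coverage[pos] += 1
    return coverage


# 100bp paired-end reads on fragments of different lengths
for fragment_len in (80, 150, 250):
    cov = per_base_coverage(fragment_len, read_len=100)
    print(fragment_len, overlap_length(fragment_len, 100), cov.count(2))
```

Positions with a coverage of 2 come from the same original fragment, so counting both reads there would report the same methylation call twice.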