
mikeusyk 04-04-2016 05:48 AM

Bowtie (Bismark) 100% of reads fail to align
I am using Bismark to follow the progression of methylation at specific CpG sites in different viral variants. One of Bismark's steps uses Bowtie to align paired-end reads to a C-to-T converted reference genome. At this step all reads fail to align. This is strange, since the issue is isolated to a single variant. I also aligned all of the reads for this variant to a C-to-T converted version of the reference genome with BWA, and all of them align at the expected position. Is this due to Bowtie masking the reads because of low complexity? Or is there some other difference between BWA and Bowtie that is causing the issue?

Running Bowtie on its own to align the reads to the converted reference gives the same alignment failure. I also tried adjusting the Bowtie options to allow more mismatches and a larger insert size, but the failure persists.
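For readers unfamiliar with the conversion step mentioned above, here is a minimal sketch of what an in-silico bisulfite conversion of a sequence looks like. It is only an illustration, not Bismark's own code; the function names and the toy sequence are made up for the example.

```python
def ct_convert(seq: str) -> str:
    """Replace every C with T (the fully converted forward-strand view)."""
    return seq.upper().replace("C", "T")


def ga_convert(seq: str) -> str:
    """Replace every G with A (the converted reverse-strand view)."""
    return seq.upper().replace("G", "A")


# Hypothetical toy reference, not taken from the viral genome in this thread.
ref = "ACGTCCGGTTCG"
print(ct_convert(ref))  # ATGTTTGGTTTG
print(ga_convert(ref))  # ACATCCAATTCA
```

After conversion the sequence has effectively three letters, so C/T-rich repeats become even more repetitive, which makes alignment harder in such regions.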

dpryan 04-04-2016 06:27 AM

How many of the reads get soft-clipped when you use BWA (I'm assuming via bwa-meth)? I think bismark is using end-to-end alignment, which won't work well if you need some soft-clipping (you could alternatively use Trim Galore! beforehand).
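One rough way to answer the soft-clipping question is to scan the CIGAR column of the BWA SAM output. The sketch below assumes plain-text SAM lines (for example from `samtools view`); the helper names and the three toy records are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> int:
    """Total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def soft_clip_fraction(sam_lines) -> float:
    """Fraction of mapped records that carry at least one soft-clipped base."""
    mapped = clipped = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":    # skip unmapped records
            continue
        mapped += 1
        if soft_clipped_bases(cigar) > 0:
            clipped += 1
    return clipped / mapped if mapped else 0.0


# Toy records: one full-length match, one clipped at the 3' end, one unmapped.
toy = [
    "r1\t0\tref\t1\t60\t150M\t*\t0\t0\t*\t*",
    "r2\t0\tref\t1\t60\t118M32S\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(soft_clip_fraction(toy))  # 0.5
```

If most mapped reads turn out to be clipped at the same end, an end-to-end aligner would have to pay for those bases as mismatches or gaps instead.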

mikeusyk 04-04-2016 10:04 AM

Thank you for your response, Ryan. I quality-trim the reads beforehand with PRINSEQ. I did think that misalignment in the tails could be causing the issue, and actually tried over-trimming by 20+ bases. This did not help. I made the BWA alignment with vanilla bwa, using the bisulfite-converted reference from Bismark.

Thank you for pointing me towards bwa-meth. It looks promising and I will likely just switch to it if I cannot get the bowtie to work properly.

fkrueger 04-05-2016 02:11 AM

2 Attachment(s)
Hi Mykhaylo,

I just ran a few tests with your data, and it looks like the reason for the poor alignment rates is that your data is riddled with insertions between bases 120-150.

The general quality towards the 3' end is poor but not shocking (see the attached FastQC profile).

There are, however, lots and lots of insertions towards the 3' end (up to 80% for certain positions, see the attached BamQC plot), which is the reason for the poor mapping efficiency. I suspect that something weird might have happened during the run, or maybe it is just some kind of artefact due to the sequence composition? Just briefly looking over it, there are at least 10 CTTs and other CCCTTT repeats in the region in question... Alternatively, it could of course be the case that the reference genome in that very region is simply wrong.
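A per-position insertion profile like the one described above can be approximated directly from CIGAR strings. This is only a sketch of the idea, not how BamQC computes its plot; the function names and toy CIGARs are hypothetical, and positions are counted in the orientation in which the read is stored in the SAM record.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read


def insertion_positions(cigar: str):
    """Yield the 1-based read positions that are inserted bases."""
    pos = 0
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        if op == "I":
            yield from range(pos + 1, pos + n + 1)
        if op in QUERY_CONSUMING:
            pos += n


def insertion_profile(cigars) -> Counter:
    """Count, per read position, how many reads have an inserted base there."""
    counts = Counter()
    for cigar in cigars:
        counts.update(insertion_positions(cigar))
    return counts


# Toy CIGAR strings for three 150 bp reads.
profile = insertion_profile(["120M2I28M", "150M", "125M1I24M"])
print(sorted(profile.items()))  # [(121, 1), (122, 1), (126, 1)]
```

Dividing each count by the number of reads covering that position gives a percentage comparable to the "up to 80%" figure quoted above.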

Hard trimming the reads to 110 bp and running Bismark with defaults (which are quite strict) already brought the mapping efficiency up to > 80%; allowing more indels with --score_min L,0,-0.4 brought it up to almost 97%. Just allowing more mismatches on the file as you provided it, with --score_min L,0,-0.6, also yielded 96% mapping efficiency.
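To make the --score_min settings above more concrete: a linear function L,a,b sets the minimum allowed alignment score to a + b * read length. The sketch below works this out for the settings discussed, assuming Bismark's default of L,0,-0.2 and Bowtie 2's default read-gap penalties (open 5, extend 3, so a 1 bp insertion costs 8). Mismatch penalties and any other deductions are ignored here, so the gap counts are upper bounds only.

```python
def min_score_linear(read_len: int, intercept: float, slope: float) -> float:
    """Minimum valid alignment score for a linear --score_min function L,a,b."""
    return intercept + slope * read_len


def max_single_base_insertions(threshold: float, gap_open: int = 5,
                               gap_extend: int = 3) -> int:
    """Upper bound on 1 bp read gaps that fit within the score threshold.

    Assumes each 1 bp gap costs gap_open + gap_extend and nothing else
    (no mismatches) is penalised.
    """
    return int(abs(threshold) // (gap_open + gap_extend))


settings = [
    ("default, 150 bp", 150, -0.2),
    ("default, trimmed to 110 bp", 110, -0.2),
    ("L,0,-0.4, 150 bp", 150, -0.4),
    ("L,0,-0.6, 150 bp", 150, -0.6),
]
for label, read_len, slope in settings:
    threshold = min_score_linear(read_len, 0, slope)
    gaps = max_single_base_insertions(threshold)
    print(f"{label}: min score ~ {threshold:.0f}, at most {gaps} one-base insertions")
```

Under these assumptions a 150 bp read tolerates roughly 3 single-base insertions at the default, 7 at L,0,-0.4 and 11 at L,0,-0.6, which fits with reads full of 3' insertions only aligning once the threshold is relaxed.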

Switching tools is one thing and fine (you can only hope that the data will be clipped), but you need to understand that the data provided (or potentially the genome for the region in question) is flawed.

Cheers, Felix

robertacarraro01 02-02-2017 12:54 AM

Bisulfighter for mapping of bisulfite-converted reads
Hi everyone!
I am trying to use Bisulfighter instead of Bismark for mapping and mC detection in bisulfite-converted samples. I ran bsf-call and the mapping itself appears to work. But then, when the resulting .maf file is parsed, it reports "ERROR Exception has occured". Does anyone have an idea of what the problem could be? Second question: does anyone know what the "blocks" in the .maf file mean?
Thank you very much.
