Hi,
I have a question about identifying sequence contamination in Methyl-Seq data.
As we all know, for Methyl-Seq experiments, the bisulfite conversion step converts unmethylated C's to U's, which after sequencing become T's.
I have Methyl-Seq data from a supposedly "human" sample sequenced on a MiSeq. The problem is that only 20% of the reads map to the human reference hg19 using BSMAP, an aligner designed for bisulfite sequencing data.
I want to find out where the remaining 80% of the reads come from, since they don't map to human.
I can't simply take the overrepresented sequences reported by FastQC and run a quick BLAST search to look for contamination from other organisms, because those sequences may be bisulfite converted and so would no longer match their source genome exactly.
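To make the problem concrete, here is a minimal sketch of what I mean. It is only an illustration: the sequence, the methylated position, and the helper names `bisulfite_convert` and `mismatches` are made up, and it only models reads from the C-to-T converted strand (reads from the complementary strand would show G-to-A changes instead). It also shows the usual workaround, as far as I understand it, of collapsing C to T in both the read and the reference before comparing.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion as seen in the read:
    every unmethylated C is read as T, methylated C's stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical genomic fragment; assume only the C at index 1 is methylated.
genomic = "ACGTCCGATCGGCTA"
read = bisulfite_convert(genomic, methylated_positions={1})

print(read)                       # the read as it comes off the sequencer
print(mismatches(genomic, read))  # a plain comparison sees every converted C as a mismatch

# Three-letter comparison: fully convert both sequences, then compare.
print(mismatches(bisulfite_convert(genomic), bisulfite_convert(read)))
```

With four of the five C's converted in a 15-base fragment, a plain nucleotide search sees four mismatches, while the fully converted comparison sees none. The price is reduced sequence complexity, so short reads become more ambiguous.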
Is there a tool that works like BLAST but takes bisulfite conversion into account when searching? I know I could run BSMAP on the reads that did not map to human and try to map them to other organisms, but that would take much longer.
Are there any other easy-to-use approaches I am missing?