**fkrueger** · 03-25-2011, 08:44 AM

Hi Zeam,

you are right that Bismark doesn't dynamically trim reads at the moment, so you would have to run the sequence file through an appropriate trimmer prior to alignments.

Best,
Felix

**zee** · 03-27-2011, 07:19 PM

The bismark_to_SAM_v3.pl script appears to have a small bug where the negative strand alignments are off by 1 base. This can be rectified in the script by adding 1 to the "-" strand alignment start position.
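A minimal sketch of the fix described above, written in Python rather than the Perl of the actual script. The function name and the assumption that the strand is available as "+" or "-" are illustrative only and are not taken from bismark_to_SAM_v3.pl.

```python
def fix_minus_strand_start(start: int, strand: str) -> int:
    """Return the alignment start, shifted by +1 for '-' strand alignments.

    Hypothetical helper: it only illustrates the off-by-one correction
    described in the post, not the converter's real code.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    # Only negative-strand alignments were reported as being off by one base.
    return start + 1 if strand == "-" else start
```

For example, `fix_minus_strand_start(100, "-")` gives 101, while a "+" strand position is returned unchanged.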

**olivertam** · 03-27-2011, 07:22 PM

Hi Zee,

Is this in the single-end or paired-end analysis?

Cheers,
Oliver

**zee** · 03-27-2011, 07:24 PM

Hi Oliver,

We have picked it up in paired-end analysis. However we have seen the same bit of code in the single-end subroutine code so it will probably be the same thing because these are "GA" alignments.

Z

**olivertam** · 03-27-2011, 07:35 PM

Hi Z,

I made the changes. Could you test whether it works properly now?

Thanks heaps for picking it up.

Cheers,
Oliver

Attached Files

bismark_to_SAM_v3a.pl (12.4 KB, 40 views)

**zee** · 03-27-2011, 07:55 PM

Yep, it works fine now. It would be good if you created a BAM file instead, by piping the output to "samtools view -t <chr.sizes> -bS -" to save space. Just a suggestion, because I like working with BAM files, which take less disk space.
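One way the pipe could be wired up from a script, as a rough sketch in Python. The samtools arguments are the ones quoted above; the function names, the chr.sizes path and the output path are placeholders, and samtools is assumed to be on the PATH.

```python
import subprocess


def samtools_bam_command(chr_sizes):
    """Build the samtools call quoted in the post; '-' means read SAM from stdin."""
    return ["samtools", "view", "-t", chr_sizes, "-bS", "-"]


def write_bam(sam_lines, chr_sizes, bam_path):
    """Stream newline-terminated SAM lines into samtools and store the BAM output."""
    with open(bam_path, "wb") as out:
        proc = subprocess.Popen(
            samtools_bam_command(chr_sizes),
            stdin=subprocess.PIPE,
            stdout=out,
        )
        for line in sam_lines:
            proc.stdin.write(line.encode("ascii"))
        proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("samtools view exited with a non-zero status")
```

Streaming the lines this way means the uncompressed SAM never has to be written to disk.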

**olivertam** · 03-27-2011, 08:02 PM

Hi Z,

That's not a bad idea at all. I'll look into that.

Thanks heaps.

Cheers,
Oliver

**zee** · 03-27-2011, 08:09 PM

Hi Oliver,

Attached is a version we modified for our work with novoalign-novomethyl and bismark. We output an extra SAM tag ZB:Z:GA or ZB:Z:CT so that we're able to split these out later in our pipeline.
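To illustrate the idea of the extra tag, here is a small Python sketch (not the attached Perl script). It appends ZB:Z:GA or ZB:Z:CT as an optional SAM field and later groups records by that tag. The function names are made up for this example.

```python
from collections import defaultdict


def add_zb_tag(sam_line, conversion):
    """Append a ZB:Z:<conversion> optional field to a tab-separated SAM record."""
    if conversion not in ("CT", "GA"):
        raise ValueError(f"unexpected conversion: {conversion!r}")
    return sam_line.rstrip("\n") + "\tZB:Z:" + conversion


def split_by_zb(sam_lines):
    """Group SAM records by their ZB tag value; header lines are skipped."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, carries no alignment
        fields = line.rstrip("\n").split("\t")
        # Optional fields start after the 11 mandatory SAM columns.
        tag = next((f[5:] for f in fields[11:] if f.startswith("ZB:Z:")), None)
        groups[tag].append(line)
    return dict(groups)
```

Records without a ZB tag end up under the key `None`, so they are not silently dropped.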

Attached Files

bismark2bam.pl (12.8 KB, 53 views)

**fkrueger** · 04-07-2011, 02:53 AM

Hi Zee,

Would it be OK if I stuck your bismark2bam.pl script up on our website?

Felix

**zee** · 04-07-2011, 03:05 AM

Sure thing, it's open to anybody who wants to use it.

**zeam** · 04-15-2011, 12:18 AM

Hey guys,
Recently I have been using Bismark to process my MethylC-seq data, but the mapping efficiency is not very good. The species I work on is transposon-rich (more than 60%). In most papers the mapping efficiency is greater than 70%, but for my methylome data it is about 40% for single-end reads and 68% for paired-end reads.

Has anyone encountered a similar mapping problem? Or can someone give me some suggestions about the mapping strategy? Half of my reads are single-end.

**fkrueger** · 04-15-2011, 12:38 AM

Hi Zeam,

could you give us a few more details about your actual experiment? A paired-end mapping efficiency of 68% for BS-Seq data sounds quite good to me, but 40% for SE is indeed a bit low.

What was the read length you used for the single end files, what were the mapping parameters and what did the QC of the FastQ files look like?

Also, Bismark should produce the stats:

Sequences with no alignments under any condition: 123
Sequences did not map uniquely: 73591

which should give you a feel whether sequences just fail to align (too many errors, residual adapter sequence or the like) or if they get rejected because they align in too many places (this could be indicative of a high repetitive element content). Another possibility could be that your genome of interest contains something like properly sequenced but unplaced scaffolds which could share a high sequence similarity to other chromosomes. These might also result in a high number of sequences being rejected due to non-unique mapping.
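As a rough illustration of how the two numbers can be compared, the following Python sketch pulls the counts out of report text and gives the share of rejected reads in each category. Only the two stat labels quoted above come from Bismark; the function names and the parsing approach are assumptions for this example.

```python
import re


def parse_report_counts(report_text):
    """Collect 'label: integer' lines from a mapping report into a dict."""
    counts = {}
    for line in report_text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def rejection_breakdown(counts):
    """Fraction of rejected reads that failed to align vs. mapped non-uniquely."""
    no_aln = counts["Sequences with no alignments under any condition"]
    multi = counts["Sequences did not map uniquely"]
    total = no_aln + multi
    if total == 0:
        return {"no_alignment": 0.0, "non_unique": 0.0}
    return {"no_alignment": no_aln / total, "non_unique": multi / total}
```

A breakdown dominated by "non_unique" points towards repeats or redundant scaffolds, while one dominated by "no_alignment" points towards read quality or adapter problems.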

If you like you could send me the Bismark mapping report and the FastQC report (the zipped file) to take a look, maybe it tells us more.

Best,
Felix

**zeam** · 04-15-2011, 01:44 AM

Report from bismark mapping

Hi Felix,
The two attachments are the Bismark mapping reports for a PE and an SE lane respectively. The FastQC report will be emailed to you because of its file size.

The raw read length is 100 bp. After trimming at q 13, a small proportion of the reads was shorter than 100 bp.

Thanks for replying!

Best wishes,
Zeam

Attached Files

**fkrueger** · 04-15-2011, 02:06 AM

Hi Zeam,

thanks for the attachments. From just a brief look at the mapping report, you seem to have 45 million alignments which got rejected because of ambiguous mappings. Ambiguous here does not just mean that a read also maps somewhere else; it means the read maps to at least two places with the same lowest number of mismatches.

To me this looks like you are using a newly assembled plant genome that contains either a lot of smaller scaffolds that could not be placed into the main genome or something like an unmapped_chromosome. The only solution for such a problem is to re-index your genome after removing the very small scaffolds. We once had a similar problem with Chlamydomonas, and if I recall correctly, removing the unplaceable scaffolds worked like a charm.
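A small Python sketch of the scaffold filtering step that would come before re-indexing. The length cutoff is something you would have to pick for your own assembly, and the function names are illustrative only.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_small_scaffolds(records, min_length):
    """Keep only sequences of at least min_length bases (cutoff is user-chosen)."""
    for name, seq in records:
        if len(seq) >= min_length:
            yield name, seq
```

The surviving records would then be written back out as FASTA and used to rebuild the bisulfite genome index.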

A quick word about your mapping parameters: 100 bp reads are really quite long for BS-Seq, and using -n 2 -l 28 (the default settings) tolerates quite a lot of errors. If I were you I would be much more stringent about the parameters, maybe even use something like -n 2 -l 70 or so, as sequencing errors not only allow mismappings but also lead to false methylation calls. Also, with reads this long you are likely to read into the adapter on the other side, so you might want to use an adapter trimmer on the reads as well.

Please let me know if I can be of more help,

Best,
Felix

**fkrueger** · 04-21-2011, 06:07 AM

I would like to announce that Bismark v0.5.0 has been released today.

The 3 main modifications are:

- paired-end alignments should now be performed correctly irrespective of the sequence ID format in the FastQ file. This hopefully means that the new format which will be output by the Illumina Casava version 1.8 will no longer cause Bismark to stop.

- the alignment output will now also include extra column(s) for sequence basecall quality scores (both for single and paired-end data). This should facilitate filtering on qualities later on if desired.

- fixed a bug with paired-end alignments where alignments to the CTOT strand were accidentally assigned to the CTOB strand and vice versa.
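For anyone writing downstream scripts, the sequence ID point in the first item can be illustrated with a short Python sketch. This is not how Bismark itself handles it; it is just one way to derive a common key for both mates, whether the mate number is a "/1" or "/2" suffix (older format) or sits after a space (Casava 1.8 style). The function name is made up.

```python
def pair_key(read_id):
    """Return the part of a FastQ read ID that both mates of a pair share."""
    parts = read_id.lstrip("@").split()
    if not parts:
        raise ValueError("empty read ID")
    key = parts[0]  # Casava 1.8 puts the mate number after the first space
    if key.endswith("/1") or key.endswith("/2"):
        key = key[:-2]  # older IDs carry the mate number as a suffix
    return key
```

With this, both mates of a pair give the same key in either ID format.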

All associated files can be obtained from:

http://www.bioinformatics.bbsrc.ac.uk/projects/index.html

I hope the modifications do not break too many downstream analysis scripts ... If you spot any flaws please let me know.

Best,
Felix
