SEQanswers

Go Back   SEQanswers > Applications Forums > Epigenetics



Similar Threads
Thread Thread Starter Forum Replies Last Post
htseq-count for sam and gff3 sofia17 RNA Sequencing 45 11-04-2016 03:32 PM
multiBamCov or htseq-count to count read per feature ? NicoBxl Bioinformatics 1 07-03-2012 02:05 AM
Problem using HTSeq count with SAM file without quality score flashton Bioinformatics 2 04-11-2012 03:29 AM
count reads or count base-pairs yuelics Introductions 3 07-29-2011 05:41 AM
Quantification: count reads or count base pairs? yuelics Bioinformatics 0 07-27-2011 04:48 AM

Reply
 
Thread Tools
Old 10-01-2012, 05:26 AM   #1
mixter
Member
 
Location: Munich, Germany

Join Date: May 2010
Posts: 22
Default Non-BS mismatch count in .SAM (esp. Bismark)?

Hello,

This question should be of general interest, and judging from documentation/publication I am not sure about the issue, so here it goes:

In Bismark, where can I see the number of non-BS-mismatches in aligned reads (i.e. in the SAM file)?

I'm especially asking whether the number ## in NM:i:## is reliable. As I understand, it reflects only the true non-BS-mismatches (no C->T and no G->A conversion), since it is always lower than the number of total methylated+unmethylated positions.

So, I'd just like to be sure that BS-mismatches aren't in any way taken into account in the number of the NM string.

I'm interested in Bismark's behavior but note that this question may also be interesting for other mappers like bsmap, which might report differently here.

Thanks!
mixter is offline   Reply With Quote
Old 10-01-2012, 06:42 AM   #2
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Quote:
Originally Posted by mixter View Post
Hello,

This question should be of general interest, and judging from documentation/publication I am not sure about the issue, so here it goes:

In Bismark, where can I see the number of non-BS-mismatches in aligned reads (i.e. in the SAM file)?

I'm especially asking whether the number ## in NM:i:## is reliable. As I understand, it reflects only the true non-BS-mismatches (no C->T and no G->A conversion), since it is always lower than the number of total methylated+unmethylated positions.

So, I'd just like to be sure that BS-mismatches aren't in any way taken into account in the number of the NM string.

I'm interested in Bismark's behavior but note that this question may also be interesting for other mappers like bsmap, which might report differently here.

Thanks!
Hi Mixter,

The NM field in the SAM output gives you the edit distance of the observed sequence to the genomic sequence (detailed in the XX field), so this includes all kinds of mismatchs (including bisulfite ones).

In the default version, there is no special field describing non-BS mismatches, I did have however have some applications myself in which I needed to know this number. Bismark is prepared to output the number of non-BS mismatched as an addtional column but you need to manually change the last position in the Bismark print SAM command:

locate in Bismark the SAM output command (roughly line 5873):
Code:
  
# SAM format: QNAME, FLAG, RNAME, 1-based POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL, optional fields
  print OUT join("\t",($id,$flag,$chr,$start,$mapq,$cigar,$rnext,$pnext,$tlen,$actual_seq,$qual,$NM_tag,$XX_tag,$XM_tag,$XR_tag,$XG_tag)),"\n";
and add at the end but still within the braces: $number_of_mismatches, like so:
Code:
  print OUT join("\t",($id,$flag,$chr,$start,$mapq,$cigar,$rnext,$pnext,$tlen,$actual_seq,$qual,$NM_tag,$XX_tag,$XM_tag,$XR_tag,$XG_tag,$number_of_mismatches)),"\n";
For paired-end data it is quite similar, but the variables are then called $number_of_mismatches_1 and $number_of_mismatches_2. Let me know if you require any help with this.

Cheers,
Felix
fkrueger is offline   Reply With Quote
Old 10-10-2012, 03:41 AM   #3
mixter
Member
 
Location: Munich, Germany

Join Date: May 2010
Posts: 22
Default

Hello,

Many thanks for the help. I've tried making these modifications, but it seems I need a bit more help. The mismatch number was not displayed in SAM output for either- single or paired-end (adding the numbers together with optional tags XA and XB did display those tags, no python warnings...).

I've attached the modified binary and a diff (bismark 0.7.6), it would be much appreciated if you could correct the output whenever you have time. This probably would be an attractive option for other users in a future bismark version as well, especially for bioinformatics/statistics purposes.

Thanks!
Attached Files
File Type: txt bismark.diff.txt (1.9 KB, 2 views)
File Type: zip bismark.modified.zip (49.4 KB, 1 views)
mixter is offline   Reply With Quote
Old 10-10-2012, 04:59 AM   #4
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Quote:
Originally Posted by mixter View Post
Hello,

Many thanks for the help. I've tried making these modifications, but it seems I need a bit more help. The mismatch number was not displayed in SAM output for either- single or paired-end (adding the numbers together with optional tags XA and XB did display those tags, no python warnings...).

I've attached the modified binary and a diff (bismark 0.7.6), it would be much appreciated if you could correct the output whenever you have time. This probably would be an attractive option for other users in a future bismark version as well, especially for bioinformatics/statistics purposes.

Thanks!
Hi Mixter,

I have just included the code into the current version of Bismark, 0.7.7, and it does work exactly as expected. I'll attach a copy here. It is probably a good idea to have an output for this in a coming release. Let me know how you get on.
Attached Files
File Type: pl bismark 0.7.7.pl (310.5 KB, 1 views)
fkrueger is offline   Reply With Quote
Old 10-11-2012, 01:07 AM   #5
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Quote:
Originally Posted by mixter View Post
Hello,

Many thanks for the help. I've tried making these modifications, but it seems I need a bit more help. The mismatch number was not displayed in SAM output for either- single or paired-end (adding the numbers together with optional tags XA and XB did display those tags, no python warnings...).

I've attached the modified binary and a diff (bismark 0.7.6), it would be much appreciated if you could correct the output whenever you have time. This probably would be an attractive option for other users in a future bismark version as well, especially for bioinformatics/statistics purposes.

Thanks!
Hi Mixter,

I have actually just implemented a new option '--non_bs_mm' into the current version (0.7.7) which lets you choose between the standard and the mm-extended SAM output. Let me know if you encounter any problems.


Edit: It is worth noting that this option works currently only Bowtie 1 alignments. Bowtie 2 uses a different concept than mismatches alone, and we are currently thinkout about ways of how to best implement the Bowtie 2 alignment score (AS).
Attached Files
File Type: pl bismark_0.7.7_non_bs.pl (311.3 KB, 4 views)

Last edited by fkrueger; 10-11-2012 at 07:11 AM.
fkrueger is offline   Reply With Quote
Old 10-15-2012, 04:57 AM   #6
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620
Default

Right, I have now amended Bismark to also output a number of mismatches for Bowtie 2 alignments. For this to work Bismark compares the alignments scores and CIGAR strings of best alignments, and separates out the number of mismatches and potential insertions or deletions. This will also work correctly if non-default gap opening and extension penalties were specified.

Unless there are any reports of problems with the option --non_bs_mismatches it will be part of the next release.
Attached Files
File Type: pl bismark 0.7.7 non-bisulfite mismatches all modes.pl (318.0 KB, 2 views)
fkrueger is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:25 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO