Seqanswers Leaderboard Ad

**rflrob** · 09-13-2013, 10:48 AM

This snippet of Python might do the trick for you, using the simplest possible definition of "counting mismatches": any alignment column where the characters are not all identical. It assumes you have Biopython installed and that your alignments are in Clustal format (MUSCLE claims to be able to output in this format, and I have no reason to disbelieve it).

Code:

from glob import glob

from Bio import AlignIO

for path in glob('*.aln'):
    # Read one Clustal-format alignment per file.
    alignment = AlignIO.read(path, 'clustal')
    # Count the columns that hold more than one distinct character
    # (a gap character counts as a difference).
    mismatches = sum(1 for i in range(alignment.get_alignment_length())
                     if len(set(alignment[:, i])) != 1)
    print(path, mismatches)
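
To make that definition concrete, here is a minimal sketch of the same column test on plain strings, with no Biopython needed. The helper name `count_mismatch_columns` and the three example sequences are made up for illustration. Note that under this definition a gap character counts as a difference.

```python
def count_mismatch_columns(sequences):
    """Count alignment columns whose characters are not all identical.

    `sequences` is a list of equal-length aligned strings. This is a
    hypothetical helper for illustration, not part of Biopython.
    """
    if not sequences:
        return 0
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) yields one tuple of characters per alignment column.
    return sum(1 for column in zip(*sequences) if len(set(column)) != 1)


# Made-up alignment: column 3 (G/G/C) and column 5 (-/T/T) differ.
aligned = ["ACGT-A", "ACGTTA", "ACCTTA"]
print(count_mismatch_columns(aligned))
```

If gaps should not count as mismatches for your purposes, one option would be to discard `-` from each column's set before testing its size.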

**Richard Finney** · 09-13-2013, 11:16 AM

"blat" by Jim Kent of UCSC is a good tool.

There are various ways to set up a batch script that runs it over all of your input files.

blat's PSL output includes a mismatch count for each alignment.
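
As a rough sketch of how those counts could be pulled out afterwards: in PSL, the second tab-separated column of each alignment row is `misMatches`. The function name `sum_psl_mismatches` and the example rows below are invented for illustration and are not real blat output. This assumes the standard 21-column PSL layout and simply skips any header lines.

```python
def sum_psl_mismatches(psl_lines):
    """Sum the misMatches column (second field) over PSL alignment rows.

    Hypothetical helper for illustration; `psl_lines` is any iterable of
    text lines, such as an open .psl file.
    """
    total = 0
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows have 21 fields and start with an integer match
        # count; header and separator lines do not, so they are skipped.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        total += int(fields[1])
    return total


# Made-up rows in PSL column order, preceded by a header-style line.
example = [
    "psLayout version 3\n",
    "95\t5\t0\t0\t0\t0\t0\t0\t+\tquery1\t100\t0\t100\ttarget1\t1000\t200\t300\t1\t100,\t0,\t200,\n",
    "48\t2\t0\t0\t0\t0\t0\t0\t-\tquery2\t50\t0\t50\ttarget1\t1000\t400\t450\t1\t50,\t0,\t400,\n",
]
print(sum_psl_mismatches(example))
```

To cover hundreds of files, the same function could be called on each `.psl` file found with `glob`. Keep in mind that a query can produce several hits, so summing every row may count the same sequence more than once.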

**pepperoni** · 09-13-2013, 07:25 PM

Many thanks to both of you for your quick replies. I have just tried rflrob's Python script, and it works great.


Tool for counting mismatches in hundreds of files?
