SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   BLAT or BLAST: output SNPs between reference and query? (http://seqanswers.com/forums/showthread.php?t=4138)

bbimber 02-23-2010 05:17 AM

BLAT or BLAST: output SNPs between reference and query?
 
When using either BLAT or BLAST, does anyone know of a tool or script that takes the output from the alignment and calculates the positions of difference between the reference and query?

For example, BOWTIE returns something like this for each result: 64:G>N,66:G>N

It is possible to write perl or something to parse the output of BLAST or BLAT, but I'd rather not reinvent this if it's been done before.

Thanks.

Zigster 02-23-2010 10:59 AM

Here is some old <GASP>PHP</GASP> code I wrote to do something kind of similar - basically display BLAT mismatches in red and make some %identity calls in a web application. Basically it comes down to reconstructing the alignment based on the weird BLAT qStarts and tStarts coordinates.

I am sorry for how messy this code is, but if you are forced to write this from scratch maybe it will help you get your head around the problem. BTW you can actually run PHP scripts from the command line.

PHP Code:

function displayAlignment($seq,$target,$db,$qName,$strand,$qStart,$qEnd,$tName,$tStart,$tEnd,$qStarts,$tStarts,$blockCount,$blockSizes,$qSize)
{
    if(!
$target)
    {
        
$alignHash['alignment']='no target';
        
$alignHash['verdict']='none';
        return(
$alignHash);
    }
    
$rawTargetSeq=$target;
    if(!
$seq)
    {
        
$alignHash['alignment']='no sequence';
        
$alignHash['verdict']='none';
        return(
$alignHash);
    }
    
    if(
$strand=='-')
    {
        
$seq=revcomp($seq);
    }
    
    
#query sequence
    
for($seqCnt=0,$qPos=$qStarts[0],$tPos=$tStarts[0]-$tStart;$seqCnt<sizeof($qStarts);$seqCnt++)
    {
        
$tStarts[$seqCnt]=$tStarts[$seqCnt]-$tStart;
        
#double gap
        
if($qStarts[$seqCnt]>$qPos && $tStarts[$seqCnt]>$tPos)
        {
            
$doubleGap=min($qStarts[$seqCnt]-$qPos,$tStarts[$seqCnt]-$tPos);
            for(
$i=0;$i<$doubleGap;$i++)
            {
                
$qSeq.=substr($seq,$qPos+$i,1);                    
                
$tSeq.=substr($rawTargetSeq,$tPos+$i,1);
            }
            
$qPos+=$doubleGap;
            
$tPos+=$doubleGap;
        }
        if(
$qStarts[$seqCnt]>$qPos)
        {
            for(
$i=0;$i<$qStarts[$seqCnt]-$qPos;$i++)
            {
                
$tSeq.='-'#insert in query seq, next alignment block starts later
                
$qSeq.=substr($seq,$qPos+$i,1);
            }
        }
        if(
$tStarts[$seqCnt]>$tPos)
        {
            for(
$i=0;$i<$tStarts[$seqCnt]-$tPos;$i++)
            {
                
$qSeq.='-';
                
$tSeq.=substr($rawTargetSeq,$tPos+$i,1);
            }
        }
        
$qSeq.=substr($seq,$qStarts[$seqCnt],$blockSizes[$seqCnt]);
        
$tSeq.=substr($rawTargetSeq,$tStarts[$seqCnt],$blockSizes[$seqCnt]);
        
$qPos=$qStarts[$seqCnt]+$blockSizes[$seqCnt];
        
$tPos=$tStarts[$seqCnt]+$blockSizes[$seqCnt];
    }
    
    if(
$strand=='-')
    {
        
$qSeq=revcomp($qSeq);
        
$tSeq=revcomp($tSeq);
    }

    
$mismatches=0;
    
$mismatches50=0;
    
$mismatches100=0;
    
$redMode=false;
    
$alignLen=strlen($qSeq);#arbitrary, could also be tSeq
    
for($i=0;$i<strlen($qSeq);$i++) {
        if(
preg_match('/[NP]/i',substr($qSeq$i1)) || preg_match('/[NP]/i',substr($tSeq$i1)))
        {
            
#do not count for or against
            
$alignLen--;
        }
        else
        {
            if (
strtoupper(substr($qSeq$i1)) != strtoupper(substr($tSeq$i1))) {
                
$mismatches++;
                if(
$i<50$mismatches50++;
                if(
$i<100$mismatches100++;
                if(!
$redMode)
                {
                    
$fqSeq.="<SPAN class=\"red\">";
                    
$redMode=true;
                }
                
$fqSeq.=substr($qSeq$i1);
                
$ftSeq.=substr($tSeq$i1);
            }
            else
            {    
                if(
$redMode)
                {
                    
$fqSeq.="</SPAN>";
                    
$redMode=false;
                }
                
$fqSeq.=substr($qSeq$i1);
                
$ftSeq.=substr($tSeq$i1);
            }
        }
    }
    if(
$alignLen>=100)
    {
        if(
$mismatches50<=|| $mismatches100 <=|| ((($alignLen-$mismatches)/$alignLen)*100) > 98$verdict="PASS"; else $verdict="FAIL";
        
$stats.="<br>$mismatches50 first 50 (".(((50-$mismatches50)/50)*100)."%)<br>$mismatches100 first 100 (".(((100-$mismatches100)/100)*100)."%)<br>$mismatches total mismatches (".number_format(((($alignLen-$mismatches)/$alignLen)*100),2)."%)";
    }
    else if(
$alignLen>=50)
    {
        if(
$mismatches50<=|| ((($alignLen-$mismatches)/$alignLen)*100) > 98$verdict="PASS"; else $verdict="FAIL";
        
$stats.="<br>$mismatches50 first 50 (".(((50-$mismatches50)/50)*100)."%)<br>$mismatches total mismatches (".number_format(((($alignLen-$mismatches)/$alignLen)*100),2)."%)";
    }
    else
    {
        if(
$mismatches==0$verdict="PASS"; else $verdict.="FAIL";
        
$stats.="<br>$mismatches total mismatches (".number_format(((($alignLen-$mismatches)/$alignLen)*100),2)."%)";
    }
    
        
$alignment.=$stats;
        
$alignment.="<br>query: ".$fqSeq;
        
$alignment.="<br>target:".$ftSeq;
    
#fclose($fa);
    
if(strlen($alignment)<8000)
        
$alignHash['alignment']=$alignment;
    else
        
$alignHash['alignment']='too long to display';
    
$alignHash['verdict']=$verdict;
    return(
$alignHash);




All times are GMT -8. The time now is 02:58 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.