View Single Post
Old 02-23-2010, 10:59 AM   #2
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

Here is some old <GASP>PHP</GASP> code I wrote to do something kind of similar - basically display BLAT mismatches in red and make some %identity calls in a web application. Basically it comes down to reconstructing the alignment based on the weird BLAT qStarts and tStarts coordinates.

I am sorry for how messy this code is, but if you are forced to write this from scratch maybe it will help you get your head around the problem. BTW you can actually run PHP scripts from the command line.

PHP Code:
function displayAlignment($seq,$target,$db,$qName,$strand,$qStart,$qEnd,$tName,$tStart,$tEnd,$qStarts,$tStarts,$blockCount,$blockSizes,$qSize)
{
    if(!
$target)
    {
        
$alignHash['alignment']='no target';
        
$alignHash['verdict']='none';
        return(
$alignHash);
    }
    
$rawTargetSeq=$target;
    if(!
$seq)
    {
        
$alignHash['alignment']='no sequence';
        
$alignHash['verdict']='none';
        return(
$alignHash);
    }
    
    if(
$strand=='-')
    {
        
$seq=revcomp($seq);
    }
    
    
#query sequence
    
for($seqCnt=0,$qPos=$qStarts[0],$tPos=$tStarts[0]-$tStart;$seqCnt<sizeof($qStarts);$seqCnt++)
    {
        
$tStarts[$seqCnt]=$tStarts[$seqCnt]-$tStart;
        
#double gap
        
if($qStarts[$seqCnt]>$qPos && $tStarts[$seqCnt]>$tPos)
        {
            
$doubleGap=min($qStarts[$seqCnt]-$qPos,$tStarts[$seqCnt]-$tPos);
            for(
$i=0;$i<$doubleGap;$i++)
            {
                
$qSeq.=substr($seq,$qPos+$i,1);                    
                
$tSeq.=substr($rawTargetSeq,$tPos+$i,1);
            }
            
$qPos+=$doubleGap;
            
$tPos+=$doubleGap;
        }
        if(
$qStarts[$seqCnt]>$qPos)
        {
            for(
$i=0;$i<$qStarts[$seqCnt]-$qPos;$i++)
            {
                
$tSeq.='-'#insert in query seq, next alignment block starts later
                
$qSeq.=substr($seq,$qPos+$i,1);
            }
        }
        if(
$tStarts[$seqCnt]>$tPos)
        {
            for(
$i=0;$i<$tStarts[$seqCnt]-$tPos;$i++)
            {
                
$qSeq.='-';
                
$tSeq.=substr($rawTargetSeq,$tPos+$i,1);
            }
        }
        
$qSeq.=substr($seq,$qStarts[$seqCnt],$blockSizes[$seqCnt]);
        
$tSeq.=substr($rawTargetSeq,$tStarts[$seqCnt],$blockSizes[$seqCnt]);
        
$qPos=$qStarts[$seqCnt]+$blockSizes[$seqCnt];
        
$tPos=$tStarts[$seqCnt]+$blockSizes[$seqCnt];
    }
    
    if(
$strand=='-')
    {
        
$qSeq=revcomp($qSeq);
        
$tSeq=revcomp($tSeq);
    }

    
$mismatches=0;
    
$mismatches50=0;
    
$mismatches100=0;
    
$redMode=false;
    
$alignLen=strlen($qSeq);#arbitrary, could also be tSeq
    
for($i=0;$i<strlen($qSeq);$i++) {
        if(
preg_match('/[NP]/i',substr($qSeq$i1)) || preg_match('/[NP]/i',substr($tSeq$i1)))
        {
            
#do not count for or against
            
$alignLen--;
        }
        else
        {
            if (
strtoupper(substr($qSeq$i1)) != strtoupper(substr($tSeq$i1))) {
                
$mismatches++;
                if(
$i<50$mismatches50++;
                if(
$i<100$mismatches100++;
                if(!
$redMode)
                {
                    
$fqSeq.="<SPAN class=\"red\">";
                    
$redMode=true;
                }
                
$fqSeq.=substr($qSeq$i1);
                
$ftSeq.=substr($tSeq$i1);
            }
            else
            {    
                if(
$redMode)
                {
                    
$fqSeq.="</SPAN>";
                    
$redMode=false;
                }
                
$fqSeq.=substr($qSeq$i1);
                
$ftSeq.=substr($tSeq$i1);
            }
        }
    }
    if(
$alignLen>=100)
    {
        if(
$mismatches50<=|| $mismatches100 <=|| ((($alignLen-$mismatches)/$alignLen)*100) > 98$verdict="PASS"; else $verdict="FAIL";
        
$stats.="<br>$mismatches50 first 50 (".(((50-$mismatches50)/50)*100)."%)<br>$mismatches100 first 100 (".(((100-$mismatches100)/100)*100)."%)<br>$mismatches total mismatches (".number_format(((($alignLen-$mismatches)/$alignLen)*100),2)."%)";
    }
    else if(
$alignLen>=50)
    {
        if(
$mismatches50<=|| ((($alignLen-$mismatches)/$alignLen)*100) > 98$verdict="PASS"; else $verdict="FAIL";
        
$stats.="<br>$mismatches50 first 50 (".(((50-$mismatches50)/50)*100)."%)<br>$mismatches total mismatches (".number_format(((($alignLen-$mismatches)/$alignLen)*100),2)."%)";
    }
    else
    {
        if(
$mismatches==0$verdict="PASS"; else $verdict.="FAIL";
        
$stats.="<br>$mismatches total mismatches (".number_format(((($alignLen-$mismatches)/$alignLen)*100),2)."%)";
    }
    
        
$alignment.=$stats;
        
$alignment.="<br>query: ".$fqSeq;
        
$alignment.="<br>target:".$ftSeq;
    
#fclose($fa);
    
if(strlen($alignment)<8000)
        
$alignHash['alignment']=$alignment;
    else
        
$alignHash['alignment']='too long to display';
    
$alignHash['verdict']=$verdict;
    return(
$alignHash);

__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
My blog
Twitter
Zigster is offline   Reply With Quote