Old 04-15-2014, 11:40 PM   #4
areyes

The Perl version of @dpryan's script (be careful, it's Perl! :P)

Quote:
use strict;
use warnings;

# Insert intronic bins between non-adjacent exonic parts of a flattened DEXSeq GTF.
open(my $fh, '<', "Homo_sapiens.GRCh37.68.DEXSeq.gtf") or die "Cannot open GTF: $!";
my $previousGene = "";
my $previousEnd;
while (my $line = <$fh>) {
    # Skip the per-gene aggregate lines; only exonic parts are processed.
    next if $line =~ /aggregate_gene/;
    my @lineinfo = split(/\t/, $line);
    my ($currentStart, $currentEnd) = @lineinfo[3, 4];
    my ($transcripts) = $line =~ /transcripts "(\S+)"/;
    my ($exonPart)    = $line =~ /exonic_part_number "(\S+)"/;
    my ($geneID)      = $line =~ /gene_id "(\S+)"/;
    # If at least one base separates this part from the previous part of the
    # same gene, emit an intronic bin covering the gap.
    if ($previousGene eq $geneID && $currentStart - $previousEnd > 1) {
        # Name the intron after the preceding exonic part, e.g. "003i".
        my $nPart = sprintf("%03d", $exonPart - 1) . "i";
        my $start = $previousEnd + 1;
        my $end   = $currentStart - 1;
        print join("\t", @lineinfo[0 .. 2], $start, $end, ".", $lineinfo[6], ".",
            "transcripts \"$transcripts\"; exonic_part_number \"$nPart\"; gene_id \"$geneID\"") . "\n";
    }
    print $line;
    $previousGene = $geneID;
    $previousEnd  = $currentEnd;
}
close($fh);