**lottpaul** · 11-20-2014, 09:56 AM

Creating Annovar Index

The Annovar idx format is as follows:

1. File is tab separated.

2. First line: #BIN <Bin Size> <File Size>

3. Remaining lines:
<Chromosome> <Bin Starting Position> <Starting Position in File> <Ending Position in File>

The file size and both file positions are byte offsets into the database file.
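To make the bin columns concrete, here is a minimal sketch of how a start coordinate maps to a bin and how one index record is laid out. The helper names bin_start and idx_line, and the sample values, are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Map a start coordinate to the start of the bin that contains it.
sub bin_start {
    my ($start, $bin_size) = @_;
    return int($start / $bin_size) * $bin_size;
}

# Format one index record: chromosome, bin start, and the byte range
# of the database file that holds the records of that bin.
sub idx_line {
    my ($chr, $bin, $first_byte, $last_byte) = @_;
    return join("\t", $chr, $bin, $first_byte, $last_byte) . "\n";
}

# Made-up values: a variant at position 10583 with a bin size of 1000
# falls into the bin that starts at 10000.
my $bin = bin_start(10_583, 1000);
print idx_line("1", $bin, 0, 154);
```

With a bin size of 1000, every start position from 10000 to 10999 ends up in the same record, so the index has at most one line per chromosome and bin.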

In Perl, the following routine builds the index of the file in a hash (dictionary/map); after that you just need to print it out:

Code:

#!/usr/bin/env perl
use warnings;
use strict;

die "Usage: $0 <Annovar Database File> <BIN Size>\n" unless @ARGV == 2;
my $input_file = $ARGV[0];
my $bin_size = $ARGV[1];
 
if (!-e $input_file) {
	die "$input_file not found\n";
}

my $file_size = -s $input_file;

my %index;
open(my $in, "<", $input_file) or die "Couldn't open $input_file for indexing\n";

my $previous_file_position = tell $in;

while (my $ln = <$in>) {
	
	# Check your input file: some are (chr,start,stop) and others are (id,chr,start,stop).
	# If you have the latter, change the next line to account for the id column.
	my ($chr,$start,$stop) = split "\t", $ln;
	my $bin_start = int($start/$bin_size) * $bin_size;
	my $current_file_position = tell $in;

	if (!exists $index{$chr}->{$bin_start}) {
		$index{$chr}->{$bin_start} = [$previous_file_position, $current_file_position];
	}
	else{
		$index{$chr}->{$bin_start}->[1] = $current_file_position;
	}
	
	$previous_file_position = $current_file_position;
}

close $in;

print "#BIN\t$bin_size\t$file_size\n";
foreach my $chr ((1,10..19,2,20,21,22,3..9,"MT","X","Y")){ #Ordered array to match other Annovar idx files
	foreach my $index_region (sort keys %{$index{$chr}}){
		my $start	= $index{$chr}->{$index_region}->[0];
		my $stop	= $index{$chr}->{$index_region}->[1];
		print "$chr\t$index_region\t$start\t$stop\n";
	}
}

I've checked the output against a couple idx files (clinvar20140702, AFR.sites.2012) provided by Annovar and get perfect agreement.
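For a database without a reference idx file, one way to spot-check an index is to read the byte range of a record back from the database and confirm that it contains only lines from that bin. The following is a minimal sketch under that assumption; lines_in_range is a hypothetical helper, the tiny database is made up, and how Annovar itself consumes the index is not shown in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read the bytes between two offsets of a file and return them as lines.
sub lines_in_range {
    my ($path, $first_byte, $last_byte) = @_;
    open(my $fh, "<", $path) or die "Couldn't open $path\n";
    seek($fh, $first_byte, 0) or die "Couldn't seek in $path\n";
    my $buffer = "";
    defined(read($fh, $buffer, $last_byte - $first_byte))
        or die "Couldn't read from $path\n";
    close $fh;
    return split /^/, $buffer;
}

# Made-up database: three records on chromosome 1. With a bin size of 1000,
# the first two fall into bin 0 and the third into bin 1000.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "1\t100\t100\n", "1\t900\t900\n", "1\t1500\t1500\n";
close $tmp;

# The first two lines take 10 bytes each and the third takes 12, so the
# indexing script would record bytes 20 to 32 for chromosome 1, bin 1000.
my @lines = lines_in_range($path, 20, 32);
print @lines;
```

If the range returns lines from a neighbouring bin or cuts a line in half, the offsets in the index do not match the database file.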

**molgen2** · 05-28-2015, 01:37 AM

The script is not working for me. I get an error message ("$current_position" requires explicit package name). After changing $current_position to $current_file_position in line 27, I get error messages 'Argument "chr4" isn't numeric in division (/) at ./makeannovarindex.pl line 23, <$in> line 20493.'

Then I changed line 22 from
my ($chr,$start,$stop) = split "\t", $ln;
to
my ($junk,$chr,$start,$stop) = split "\t", $ln;

The errors stopped, but I get no output (except for line 1: "#BIN 1000 24679810").

Does anybody experience the same issues? Could anyone get this script working?
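A possible explanation, inferred from the script as posted rather than confirmed by its author: the output loop only looks up chromosome names without a prefix (1 to 22, MT, X, Y), so an index keyed by names such as chr4 is built but never printed. The sketch below shows one workaround, stripping a leading "chr" before the name is used as a hash key. normalize_chr is a hypothetical helper, the sample line is made up, and whether the idx file should carry unprefixed names for a given database is an assumption to check against an index shipped with Annovar.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a leading "chr" so that names such as "chr4"
# match the unprefixed names in the script's output loop.
sub normalize_chr {
    my ($chr) = @_;
    $chr =~ s/^chr//i;
    return $chr;
}

# Made-up line in the (id, chr, start, stop) layout.
my $ln = "585\tchr4\t10500\t10501\n";
chomp $ln;
my (undef, $chr, $start, $stop) = split /\t/, $ln;
$chr = normalize_chr($chr);
print "$chr\t$start\t$stop\n";
```

Note that a mitochondrial name such as chrM would become M, which still differs from the MT entry in the script's list, so that case would need its own mapping.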

**canisirius** · 05-28-2015, 04:05 AM

Hi,

First off, thanks to lottpaul for providing the solution.

I modified a line or two, so I am attaching the script that I ended up using.

The following is the command I used to run the script.

Code:

perl compileAnnnovarIndex.pl hg19_snp138NonFlagged.txt 1000 > hg19_snp138NonFlagged.txt.idx

I hope it works for you too.

Attached Files

compileAnnnovarIndex.pl (1.2 KB, 512 views)
