SEQanswers

Old 09-21-2012, 05:23 AM   #1
canisirius
Junior Member
 
Location: Italy

Join Date: Nov 2011
Posts: 5
Creating index for Annovar database file

Hi,

Does anybody know how to create an index file for the Annovar annotation database files?

I am running Annovar with the snp135_NonFlagged annotation file, and it takes much longer than with snp135, which has an .idx file.

Thanks
Old 11-20-2014, 09:56 AM   #2
lottpaul
Junior Member
 
Location: Davis, CA

Join Date: May 2013
Posts: 1
Creating Annovar Index

The Annovar idx format is as follows:

1. File is tab separated.

2. First line: #BIN <BIN Size> <File Size>

3. Remaining lines:
<Chromosome> <BIN Starting Position> <Starting position in File> <Ending position in File>
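To make the layout above concrete, here is a minimal sketch that reads idx text back into a hash. The helper name parse_idx and the sample values are made up for illustration; they are not part of Annovar, and the sketch assumes every line follows the tab-separated layout described above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Parse the text of an idx file into (bin size, file size, index hash).
# parse_idx is a hypothetical helper name, not an Annovar function.
sub parse_idx {
    my ($text) = @_;
    my @lines  = split /\n/, $text;
    my $header = shift @lines;
    die "Empty idx text\n" unless defined $header;

    # Header line: #BIN <BIN Size> <File Size>
    my ($tag, $bin_size, $file_size) = split /\t/, $header;
    die "Not an idx header: $header\n" unless $tag eq '#BIN';

    # Remaining lines: <Chromosome> <BIN start> <file start> <file end>
    my %index;
    for my $line (@lines) {
        next unless length $line;
        my ($chr, $bin_start, $from, $to) = split /\t/, $line;
        $index{$chr}{$bin_start} = [$from, $to];
    }
    return ($bin_size, $file_size, \%index);
}

# Made-up sample values, for illustration only.
my $sample = join "\n",
    "#BIN\t1000\t5000",
    "1\t10000\t0\t120",
    "1\t11000\t120\t300",
    "";

my ($bin_size, $file_size, $index) = parse_idx($sample);
printf "bin size %d, file size %d, chr 1 bin 10000 spans bytes %d-%d\n",
    $bin_size, $file_size, @{ $index->{1}{10000} };
```

The nested hash (chromosome, then bin start) mirrors the structure used by the indexing routine in this post.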

In Perl, the following routine would create the index of the file in a hash (dictionary/map), then you'd just need to print it out:

Code:
#!/usr/bin/env perl
use warnings;
use strict;

die "Usage: $0 <Annovar Database File> <BIN Size>\n" unless @ARGV == 2;
my $input_file = $ARGV[0];
my $bin_size = $ARGV[1];
 
if (!-e $input_file) {
	die "$input_file not found\n";
}

my $file_size = -s $input_file;

my %index;
open(my $in, "<", $input_file) or die "Couldn't open $input_file for indexing\n";

my $previous_file_position = tell $in;

while (my $ln = <$in>) {
	
	#Check input file. Some are (chr,start,stop) and others are (id,chr,start,stop).
	#If you have the latter you'll need to change the next line to account for the id column   
	my ($chr,$start,$stop) = split "\t", $ln;
	my $bin_start = int($start/$bin_size) * $bin_size;
	my $current_file_position = tell $in;

	if (!exists $index{$chr}->{$bin_start}) {
		$index{$chr}->{$bin_start} = [$previous_file_position, $current_file_position];
	}
	else{
		$index{$chr}->{$bin_start}->[1] = $current_file_position;
	}
	
	$previous_file_position = $current_file_position;
}

close $in;

print "#BIN\t$bin_size\t$file_size\n";
#Only chromosomes named exactly as in this list are printed; names such as "chr4" are skipped.
foreach my $chr ((1,10..19,2,20,21,22,3..9,"MT","X","Y")){ #Ordered array to match other Annovar idx files
	foreach my $index_region (sort keys %{$index{$chr}}){
		my $start	= $index{$chr}->{$index_region}->[0];
		my $stop	= $index{$chr}->{$index_region}->[1];
		print "$chr\t$index_region\t$start\t$stop\n";
	}
}
I've checked the output against a couple idx files (clinvar20140702, AFR.sites.2012) provided by Annovar and get perfect agreement.
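
As a rough illustration of why the byte offsets are useful, the sketch below builds the same kind of index for a small temporary file and then seeks straight to one bin instead of scanning the whole file. The helper names build_index and fetch_bin and the sample records are hypothetical, and whether Annovar reads the offsets in exactly this way is an assumption on my part.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build per-bin byte ranges the same way as the routine above.
sub build_index {
    my ($path, $bin_size) = @_;
    open(my $in, '<', $path) or die "Couldn't open $path: $!\n";
    my %index;
    my $previous = tell $in;
    while (my $ln = <$in>) {
        my ($chr, $start) = split /\t/, $ln;
        my $bin_start = int($start / $bin_size) * $bin_size;
        my $current   = tell $in;
        if (exists $index{$chr}{$bin_start}) {
            $index{$chr}{$bin_start}[1] = $current;    # extend the bin
        }
        else {
            $index{$chr}{$bin_start} = [$previous, $current];
        }
        $previous = $current;
    }
    close $in;
    return \%index;
}

# Return the records of the bin containing $pos by seeking to its byte range.
# fetch_bin is a hypothetical helper, not an Annovar function.
sub fetch_bin {
    my ($path, $index, $chr, $pos, $bin_size) = @_;
    my $bin_start = int($pos / $bin_size) * $bin_size;
    my $range = $index->{$chr}{$bin_start};
    return unless $range;
    open(my $in, '<', $path) or die "Couldn't open $path: $!\n";
    seek($in, $range->[0], 0);
    my @records;
    while (tell($in) < $range->[1]) {
        my $ln = <$in>;
        last unless defined $ln;
        chomp $ln;
        push @records, $ln;
    }
    close $in;
    return @records;
}

# Made-up records in (chr, start, stop, value) layout.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "1\t10177\t10177\tA\n", "1\t10352\t10352\tT\n",
            "1\t11008\t11008\tC\n", "2\t10500\t10500\tG\n";
close $fh;

my $index = build_index($path, 1000);
my @hits  = fetch_bin($path, $index, '1', 10200, 1000);
print "$_\n" for @hits;
```

Only the two records that fall in bin 10000 of chromosome 1 are read; the rest of the file is never touched, which is where the speed-up over an unindexed file would come from.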

Last edited by lottpaul; 05-29-2015 at 08:42 AM. Reason: Complete Code
Old 05-28-2015, 02:37 AM   #3
molgen2
Junior Member
 
Location: Vienna

Join Date: May 2015
Posts: 1

The script is not working for me. I get an error message ("$current_position" requires explicit package name). After changing $current_position to $current_file_position in line 27, I get error messages such as 'Argument "chr4" isn't numeric in division (/) at ./makeannovarindex.pl line 23, <$in> line 20493.'

When I then change line 22 from
my ($chr,$start,$stop) = split "\t", $ln;
to
my ($junk,$chr,$start,$stop) = split "\t", $ln;

the errors stop, but I get no output (except for line 1: "#BIN 1000 24679810").

Has anybody else experienced the same issues? Has anyone got this script working?
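
A likely explanation for the empty output, assuming the chromosome column holds names like "chr4": the print loop in the script only walks a fixed list (1..22, "MT", "X", "Y"), so keys with a "chr" prefix are never printed. A minimal sketch of a loop that takes the chromosome names from the index itself is below. The helper name index_lines and the sample offsets are made up for illustration. A plain string sort reproduces the order of the fixed list for the names that list contains; the bins are sorted numerically here, which is a choice of this sketch (the original routine sorts them as strings).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Turn the index hash into idx body lines, using whatever chromosome
# names are present rather than a hard-coded list.
# index_lines is a hypothetical helper name.
sub index_lines {
    my ($index) = @_;
    my @lines;
    for my $chr (sort keys %$index) {
        for my $bin_start (sort { $a <=> $b } keys %{ $index->{$chr} }) {
            my ($start, $stop) = @{ $index->{$chr}{$bin_start} };
            push @lines, join("\t", $chr, $bin_start, $start, $stop);
        }
    }
    return @lines;
}

# Made-up offsets, for illustration only.
my %index = (
    chr4  => { 2000 => [40, 60], 1000 => [20, 40] },
    chr10 => { 5000 => [0, 20] },
);
print "$_\n" for index_lines(\%index);
```

Whether Annovar expects "chr4" or "4" in the idx file for a given database is something I have not verified, so it is worth comparing against an idx file shipped with Annovar for a similar database.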
Old 05-28-2015, 05:05 AM   #4
canisirius
Junior Member
 
Location: Italy

Join Date: Nov 2011
Posts: 5

Hi,

First off, thanks to lottpaul for providing the solution.

I modified a line or two, as far as I remember, so I am attaching the script that I ended up using.

The following is the command I used to run the script.

Code:
perl compileAnnnovarIndex.pl hg19_snp138NonFlagged.txt 1000 > hg19_snp138NonFlagged.txt.idx
I hope it works for you too.
Attached Files
File Type: pl compileAnnnovarIndex.pl (1.2 KB, 438 views)