**swNGS** · 04-27-2012, 01:34 PM

Okay, after a bit of investigation, I'm running it like this:

java -d64 -Xmx4g -jar ${GATK_path}/GenomeAnalysisTK.jar \
-I $current_directory/${output_fileName}.sorted.bam \
-R ${ref_genome_fasta} \
-T RealignerTargetCreator \
-o $current_directory/input.bam.list \
--known ${Thousand_genomes_indels_VCFfile}

I now realise that RealignerTargetCreator need only be run once, on one demultiplexed sample, and the end result applied back to them all.
However, the output regions list file is not the same if I run it on different samples. A quick check shows that the number of rows in the output file differs (but not by much). Can I safely ignore the differences, adopt one targets file for realignment, and apply it to all the sample BAM files individually?
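To see which intervals actually differ between two runs, rather than only comparing row counts, the two target lists can be compared as sets. This is a minimal sketch, assuming each file holds one interval per line (for example `1:1000-1010`); the function names and the sample intervals are hypothetical.

```python
def load_intervals(lines):
    """Return the set of non-empty interval strings, one per input line."""
    return {line.strip() for line in lines if line.strip()}


def compare_interval_lists(lines_a, lines_b):
    """Report intervals shared by both runs and those unique to each run."""
    a = load_intervals(lines_a)
    b = load_intervals(lines_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


if __name__ == "__main__":
    # Hypothetical contents of two target lists made from different samples.
    run_a = ["1:1000-1010", "1:5000-5003", "2:300-320"]
    run_b = ["1:1000-1010", "2:300-320", "2:900-905"]
    result = compare_interval_lists(run_a, run_b)
    print("only in A:", result["only_a"])
    print("only in B:", result["only_b"])
```

With real files, the lines would come from `open(path)` on each list. This is an exact string comparison, so intervals that overlap but have different boundaries are reported as different.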

I could not find an option to restrict RealignerTargetCreator to specific regions from the start, which would be helpful.

Chris

**Heisman** · 04-27-2012, 01:48 PM

Alright, two things:

1. You can run the realigner target creator with or without specifying candidate indels AND limiting it to only look in your targeted regions of interest. This will make it MUCH faster if you have a small target size. Look into using the "-L" option with the GATK.

2. If you update to the latest version of ANNOVAR, it is much faster than previous versions. However, if you have many samples, I suggest you wait until you have VCF files for all of them, then convert all of the VCF files to the annovar input files, and then attach a column to the end of each of these input files with a sample specific identifier. Then, concatenate all of these together and run the whole set through ANNOVAR. After this, you can then grep out each identifier to get individual files if you would like.

Let me know if any of this is not clear.
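For point 1, here is a minimal sketch of how the command from the first post might look with "-L" added. It only builds and prints the argument list; all file names are hypothetical placeholders, and the other flags are the ones already used in this thread.

```python
import shlex


def build_rtc_command(gatk_jar, ref_fasta, bam, out_intervals,
                      known_vcf=None, regions=None):
    """Build a RealignerTargetCreator argument list, optionally limited by -L."""
    cmd = [
        "java", "-Xmx4g", "-jar", gatk_jar,
        "-T", "RealignerTargetCreator",
        "-R", ref_fasta,
        "-I", bam,
        "-o", out_intervals,
    ]
    if known_vcf:
        # Optional known indels, as in the original command.
        cmd += ["--known", known_vcf]
    if regions:
        # Restrict traversal to the targeted regions of interest.
        cmd += ["-L", regions]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_rtc_command(
        "GenomeAnalysisTK.jar", "ref.fasta", "sample.sorted.bam",
        "sample.intervals", known_vcf="known_indels.vcf",
        regions="targets.bed",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` on a machine where GATK is installed.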

**swNGS** · 04-28-2012, 12:11 AM

That's really helpful. It makes sense that RealignerTargetCreator can accept a regions file, as the other GATK tools seem to. I'll look into it.

On the ANNOVAR front, the version I'm using is fairly recent (from roughly the last 8 weeks).
The stage that seems to take forever is annotation with its dbSNP file.
I was wondering: if I extracted only the dbSNP variants falling in my target regions, could I avoid presenting the entire dbSNP file to ANNOVAR?

**Heisman** · 04-28-2012, 06:49 AM

As long as you have the version from February 23 or later it should be fast: http://www.openbioinformatics.org/annovar/

Did you make sure to download the newest index files? It should have a file size of 103885428 (for dbSNP 135).

I don't think you can get away with just extracting the variants in your target regions because then the index file will be messed up.

If it's a real issue, I would go with my idea of analyzing each sample individually, then concatenating all the samples' data together with sample-specific tags and running the combined file through ANNOVAR. If you need help setting that up, let me know.
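The tagging idea can be sketched roughly as follows. This is only an illustration: it assumes tab-separated ANNOVAR input lines and assumes the extra trailing column is carried through to the output unchanged. The function names and sample IDs are hypothetical.

```python
def tag_lines(sample_id, lines):
    """Append sample_id as a final tab-separated column to each non-empty line."""
    tagged = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            tagged.append(line + "\t" + sample_id)
    return tagged


def pool_samples(samples):
    """Concatenate tagged lines from a dict of sample_id -> list of lines."""
    pooled = []
    for sample_id, lines in samples.items():
        pooled.extend(tag_lines(sample_id, lines))
    return pooled


def split_by_sample(pooled_lines):
    """Group lines by their last column and strip the tag again."""
    by_sample = {}
    for line in pooled_lines:
        body, sample_id = line.rstrip("\n").rsplit("\t", 1)
        by_sample.setdefault(sample_id, []).append(body)
    return by_sample


if __name__ == "__main__":
    # Hypothetical ANNOVAR-style input lines: chr, start, end, ref, alt.
    samples = {
        "S1": ["1\t100\t100\tA\tG\n"],
        "S2": ["2\t50\t50\tC\tT\n"],
    }
    pooled = pool_samples(samples)
    print("\n".join(pooled))
```

`split_by_sample` plays the role of the final grep step: after ANNOVAR has run once on the pooled file, the tag in the last column recovers the per-sample results.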

**Michael.James.Clark** · 06-05-2012, 11:12 AM

RE: ANNOVAR dbSNP annotation taking a long time.

Try using "--batchsize 50m" in your annotate_variation.pl command.

Dropped my run time on dbSNP135 from 20 hours to 20 minutes on 7.5M variants.

**swNGS** · 06-13-2012, 05:22 AM

I am running Annovar from the summarize_annovar.pl script.

My understanding of what you are suggesting is to edit this file at the step which annotates with dbsnp as follows:

if ($valistep{7}) {
    # Step 7: dbSNP filter annotation, with -batchsize 50m added
    $sc = "annotate_variation.pl -filter -batchsize 50m -dbtype snp$verdbsnp -buildver $buildver -outfile $outfile $queryfile $dbloc";
    print STDERR "\nNOTICE: Running step 7 with system command <$sc>\n";
    system ($sc) and die "Error running system command: <$sc>\n";
}

However, this does not seem to change the run time.

Am I doing it correctly?

**SeekAnswers** · 06-13-2012, 07:18 AM

If you are just annotating the VCF, you could split the VCF by chromosome and run the jobs in parallel, if you have the facilities to do that.
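A minimal sketch of the splitting step, assuming an uncompressed, tab-separated VCF whose header lines start with "#". The function name and the example records are hypothetical; each per-chromosome chunk keeps the full header so it remains a valid VCF.

```python
def split_vcf_by_chrom(lines):
    """Return a dict of chromosome -> list of VCF lines (header + records)."""
    header = []
    records = {}
    for line in lines:
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            chrom = line.split("\t", 1)[0]
            records.setdefault(chrom, []).append(line)
    # Every chunk gets its own copy of the header.
    return {chrom: header + recs for chrom, recs in records.items()}


if __name__ == "__main__":
    # Hypothetical three-record VCF, for illustration only.
    vcf = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t50\tPASS\t.\n",
        "1\t200\t.\tC\tT\t50\tPASS\t.\n",
        "2\t300\t.\tG\tA\t50\tPASS\t.\n",
    ]
    for chrom, chunk in split_vcf_by_chrom(vcf).items():
        print(chrom, len(chunk), "lines")
```

Each chunk can then be written to its own file and submitted as a separate annotation job.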

**binlangman** · 12-25-2013, 09:15 PM

GATK RealignerTargetCreator error

Hello! I am working with some Illumina FASTQ files. I used bwa to align the data. I then sorted, indexed and marked the bam file for PCR duplicates using picard. And then in order to realign around the indels, I used the RealignerTargetCreator. Here is the error:

INFO 11:12:43,040 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:12:43,043 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:27:51
INFO 11:12:43,043 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:12:43,043 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:12:43,048 HelpFormatter - Program Args: -T RealignerTargetCreator -R NC_010473.fasta -I dedup_reads.bam -o target_intervals.list
INFO 11:12:43,049 HelpFormatter - Date/Time: 2013/12/26 11:12:43
INFO 11:12:43,049 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:12:43,049 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:12:43,708 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:12:44,116 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 11:12:44,127 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:12:44,145 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 11:12:44,237 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 11:12:44,361 GenomeAnalysisEngine - Done preparing for traversal
INFO 11:12:44,361 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:12:44,361 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 11:12:48,747 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.7-4-g6f46d11):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/lmbin/dedup_reads.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 63; please see the GATK --help documentation for options related to this error
##### ERROR ------------------------------------------------------------------------------------------
I'm a beginner and I don't know how to solve this problem so that I can finish the realignment. Can you give me some advice?

**blakeoft** · 01-21-2014, 07:24 AM

Can you post the full command that gave you this error?

**brdido** · 04-30-2014, 09:13 AM

You are probably aligning reads from Illumina 1.3 or 1.5, which encode base qualities as Phred+64.

GATK expects the Illumina 1.8 encoding (Phred+33).

The easiest workaround is to convert the base qualities. Take a look at the Wikipedia article on the FASTQ format:

http://en.wikipedia.org/wiki/FASTQ_format

There are also command-line examples for converting them.
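The conversion itself is a fixed shift of 31 in each quality character (64 minus 33). Here is a minimal sketch, assuming plain four-line FASTQ records with Phred+64 qualities and no negative (old Solexa-style) scores; the function names and the example read are hypothetical.

```python
def phred64_to_phred33(qual):
    """Re-encode one Phred+64 quality string as Phred+33."""
    converted = []
    for ch in qual:
        q = ord(ch) - 64  # decode the Phred+64 character to a quality score
        if q < 0:
            raise ValueError("character %r is below the Phred+64 range" % ch)
        converted.append(chr(q + 33))  # re-encode the score as Phred+33
    return "".join(converted)


def convert_fastq(lines):
    """Convert the quality line (every 4th line) of four-line FASTQ records."""
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 3:
            line = phred64_to_phred33(line)
        out.append(line)
    return out


if __name__ == "__main__":
    # Hypothetical single read with Phred+64 qualities.
    record = ["@read1", "ACG", "+", "@Bh"]
    print("\n".join(convert_fastq(record)))
```

The `ValueError` is a deliberate guard: if the input is already Phred+33, most of its quality characters fall below the Phred+64 range, so the function fails instead of silently shifting the scores a second time.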

cheers

Any way to speed up GATK RealignerTargetCreator ?
