SEQanswers

Old 06-25-2014, 04:16 PM   #1
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78
GATK interval_list file header format and errors

Hi All,
I am trying to use the GATK UnifiedGenotyper with the -L option. The command works fine without the option but fails with -L.
##################
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1b22b98cdeb4a9304cb5d48026a85128 SP:Homo Sapien
@SQ SN:chr2 LN:243199373 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:a0d9851da00400dec1098a9255ac712e SP:Homo Sapien
@SQ SN:chr3 LN:198022430 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:641e4338fa8d52a5b781bd2a2c08d3c3 SP:Homo Sapien
@SQ SN:chr4 LN:191154276 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:23dccd106897542ad87d2765d28a19a1 SP:Homo Sapien
@SQ SN:chr5 LN:180915260 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:0740173db9ffd264d728f32784845cd7 SP:Homo Sapien
@SQ SN:chr6 LN:171115067 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1d3a93a248d92a729ee764823acbbc6b SP:Homo Sapien
@SQ SN:chr7 LN:159138663 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:618366e953d6aaad97dbe4777c29375e SP:Homo Sapien
@SQ SN:chr8 LN:146364022 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:96f514a9929e410c6651697bded59aec SP:Homo Sapien
@SQ SN:chr9 LN:141213431 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:3e273117f15e0a400f01055d9f393768 SP:Homo Sapien
@SQ SN:chr10 LN:135534747 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:988c28e000e84c26d552359af1ea2e1d SP:Homo Sapien
@SQ SN:chr11 LN:135006516 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:98c59049a2df285c76ffb1c6db8f8b96 SP:Homo Sapien
@SQ SN:chr12 LN:133851895 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:51851ac0e1a115847ad36449b0015864 SP:Homo Sapien
@SQ SN:chr13 LN:115169878 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:283f8d7892baa81b510a015719ca7b0b SP:Homo Sapien
@SQ SN:chr14 LN:107349540 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:98f3cae32b2a2e9524bc19813927542e SP:Homo Sapien
@SQ SN:chr15 LN:102531392 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:e5645a794a8238215b2cd77acb95a078 SP:Homo Sapien
@SQ SN:chr16 LN:90354753 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:fc9b1a7b42b97a864f56b348b06095e6 SP:Homo Sapien
@SQ SN:chr17 LN:81195210 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:351f64d4f4f9ddd45b35336ad97aa6de SP:Homo Sapien
@SQ SN:chr18 LN:78077248 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c SP:Homo Sapien
@SQ SN:chr19 LN:59128983 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1aacd71f30db8e561810913e0b72636d SP:Homo Sapien
@SQ SN:chr20 LN:63025520 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:0dec9660ec1efaaf33281c0d5ea2560f SP:Homo Sapien
@SQ SN:chr21 LN:48129895 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:2979a6085bfe28e3ad6f552f361ed74d SP:Homo Sapien
@SQ SN:chr22 LN:51304566 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:a718acaa6135fdca8357d5bfe94211dd SP:Homo Sapien
@SQ SN:chrX LN:155270560 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:7e0e2e580297b7764e31dbc80c2540dd SP:Homo Sapien
@SQ SN:chrY LN:59373566 AS:hg19 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1e86411d73e6f00a10590f976be01623 SP:Homo Sapien
chr1 564490 564532 + target_1
chr1 564533 564534 + target_2
chr1 564672 564718 + target_3
chr1 564720 564721 + target_4
##########################################
This is the second header I tried:
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr2 LN:243199373 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr3 LN:198022430 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr4 LN:191154276 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr5 LN:180915260 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr6 LN:171115067 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr7 LN:159138663 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr8 LN:146364022 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr9 LN:141213431 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr10 LN:135534747 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr11 LN:135006516 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr12 LN:133851895 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr13 LN:115169878 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr14 LN:107349540 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr15 LN:102531392 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr16 LN:90354753 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr17 LN:81195210 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr18 LN:78077248 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr19 LN:59128983 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr20 LN:63025520 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr21 LN:48129895 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chr22 LN:51304566 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chrX LN:155270560 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
@SQ SN:chrY LN:59373566 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa
chr1 564490 564532 + target_1
chr1 564533 564534 + target_2
chr1 564672 564718 + target_3
chr1 564720 564721 + target_4
#####################
I don't understand the exact header format. I googled a lot but couldn't find what the header should look like.
#############
This is the error I am getting:

INFO 17:05:07,240 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 17:05:30,725 ProgressMeter - done 1.47e+07 23.0 s 1.0 s 100.0% 23.0 s 0.0 s
INFO 17:05:30,726 ProgressMeter - Total runtime 23.49 secs, 0.39 min, 0.01 hours
INFO 17:05:30,728 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 332130 total reads (0.00%)
INFO 17:05:30,728 MicroScheduler - -> 0 reads (0.00% of total) failing BadMateFilter
INFO 17:05:30,728 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 17:05:30,729 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 17:05:30,730 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 17:05:30,730 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 17:05:30,730 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 17:05:30,731 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO 17:05:31,962 GATKRunReport - Uploaded run statistics report to AWS S3
###########################
Old 06-25-2014, 04:17 PM   #2
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78

I tried this header too
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:a0d9851da00400dec1098a9255ac712e
@SQ SN:chr3 LN:198022430 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ SN:chr4 LN:191154276 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:23dccd106897542ad87d2765d28a19a1
@SQ SN:chr5 LN:180915260 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:0740173db9ffd264d728f32784845cd7
@SQ SN:chr6 LN:171115067 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1d3a93a248d92a729ee764823acbbc6b
@SQ SN:chr7 LN:159138663 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:618366e953d6aaad97dbe4777c29375e
@SQ SN:chr8 LN:146364022 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:96f514a9929e410c6651697bded59aec
@SQ SN:chr9 LN:141213431 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:3e273117f15e0a400f01055d9f393768
@SQ SN:chr10 LN:135534747 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:988c28e000e84c26d552359af1ea2e1d
@SQ SN:chr11 LN:135006516 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ SN:chr12 LN:133851895 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:51851ac0e1a115847ad36449b0015864
@SQ SN:chr13 LN:115169878 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:283f8d7892baa81b510a015719ca7b0b
@SQ SN:chr14 LN:107349540 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:98f3cae32b2a2e9524bc19813927542e
@SQ SN:chr15 LN:102531392 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:e5645a794a8238215b2cd77acb95a078
@SQ SN:chr16 LN:90354753 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ SN:chr17 LN:81195210 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ SN:chr18 LN:78077248 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ SN:chr19 LN:59128983 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1aacd71f30db8e561810913e0b72636d
@SQ SN:chr20 LN:63025520 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ SN:chr21 LN:48129895 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ SN:chr22 LN:51304566 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:a718acaa6135fdca8357d5bfe94211dd
@SQ SN:chrX LN:155270560 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ SN:chrY LN:59373566 /mnt/idash/Genomics/data_ressources/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1e86411d73e6f00a10590f976be01623
chr1 564490 564532 + target_1
chr1 564533 564534 + target_2
chr1 564672 564718 + target_3
chr1 564720 564721 + target_4
chr1 564722 564724 + target_5
Old 06-26-2014, 02:20 AM   #3
rbagnall
Member
 
Location: Sydney, Australia

Join Date: Jun 2010
Posts: 34

GATK -L interval files do not need a header. They should be this format:

chr1:1000-1200
chr1:2004-2507
chr2:457290-457400

etc...
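
For anyone starting from a Picard-style body (contig, start, end, strand, name), here is a minimal Python sketch of rewriting each row into the chr:start-stop form shown above. The function names `to_gatk_interval` and `convert_lines` are made up for illustration, and the sketch assumes the first three whitespace-separated fields are contig, start, and end; anything after them (strand, target name) is dropped.

```python
def to_gatk_interval(line):
    """Turn 'chr1 564490 564532 + target_1' into 'chr1:564490-564532'.

    Assumption: the first three whitespace-separated fields are
    contig, start, end. Extra fields (strand, name) are discarded.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least contig, start, end: %r" % line)
    contig, start, end = fields[0], int(fields[1]), int(fields[2])
    if start > end:
        raise ValueError("start is greater than end: %r" % line)
    return "%s:%d-%d" % (contig, start, end)


def convert_lines(lines):
    """Convert body rows, skipping blank lines and '@' header lines."""
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("@"):
            continue
        out.append(to_gatk_interval(stripped))
    return out


if __name__ == "__main__":
    rows = [
        "@HD VN:1.0 SO:unsorted",
        "chr1 564490 564532 + target_1",
        "chr1 564533 564534 + target_2",
    ]
    for interval in convert_lines(rows):
        print(interval)
```

The coordinates are passed through unchanged, so this only makes sense when the input rows are already 1-based.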
Old 06-26-2014, 03:47 AM   #4
bruce01
Senior Member
 
Location: .

Join Date: Mar 2011
Posts: 157

Can also be tab separated:
Code:
CHR   POS1   POS2
Old 06-26-2014, 07:58 AM   #5
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78

It is not working. I did it like this:
chr1:564490-564532 + target_1
chr1:564533-564534 + target_2
chr1:564672-564718 + target_3
chr1:564720-564721 + target_4
chr1:564722-564724 + target_5
chr1:564739-564741 + target_6
chr1:564771-564774 + target_7
chr1:564776-564807 + target_8
chr1:564898-564956 + target_9
chr1:564965-564966 + target_10
########################
##### ERROR MESSAGE: File associated with name /mnt/oncogxA/anusha/DGFDATA_Tissue_DATA/intersectalltissues/hg19_lite_copy_HR5.interval_list is malformed: Interval file could not be parsed in any supported format. caused by Failed to parse Genome Location string: chr1:564490-564532 + target_1
##### ERROR --
Old 06-26-2014, 08:05 AM   #6
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78

chr1 564490 564532
chr1 564533 564534
chr1 564672 564718
chr1 564720 564721
chr1 564722 564724
chr1 564739 564741
chr1 564771 564774
chr1 564776 564807
##### ERROR MESSAGE: Badly formed genome loc: Contig 'chr1 564490 564532' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
##### ERROR -----
I have used the same reference fasta file many times and it worked fine.
Old 06-26-2014, 08:08 AM   #7
bruce01
Senior Member
 
Location: .

Join Date: Mar 2011
Posts: 157

Are your chromosome names "chr1", "chr2", "chr3" etc?

They seem to be "chr2 LN:243199373" or something from previous posts.

You have to use the exact same chromosome naming format that is in the reference fasta. Otherwise the software cannot find the chromosome!

When I first encountered this error I made a fasta with chromosomes name "1", "2", "3" etc for ease of use. Might be an idea.
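
One way to check the names rather than guess: in a SAM-style header (the .dict file, or the top of a Picard interval list), the contig name is whatever follows SN: on each @SQ line, and LN: is a separate length field. Below is a rough Python sketch, with made-up helper names (`contig_names`, `unknown_contigs`), that lists the interval rows whose contig is not declared in the header. It splits on any whitespace, which is an assumption made so it works on both tab- and space-separated input.

```python
def contig_names(header_lines):
    """Collect the SN: value from every @SQ line of a SAM-style header."""
    names = set()
    for line in header_lines:
        fields = line.split()
        if not fields or fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def unknown_contigs(interval_rows, names):
    """Return the rows whose first field is not a declared contig name."""
    bad = []
    for row in interval_rows:
        fields = row.split()
        if fields and fields[0] not in names:
            bad.append(row)
    return bad


if __name__ == "__main__":
    header = [
        "@HD VN:1.0 SO:unsorted",
        "@SQ SN:chr1 LN:249250621",
        "@SQ SN:chrX LN:155270560",
    ]
    rows = ["chr1 564490 564532", "1 564490 564532", "X 153990999 153991009"]
    names = contig_names(header)
    print(sorted(names))
    print(unknown_contigs(rows, names))
```

Rows flagged here are the ones a parser would most likely skip or reject as belonging to an unknown reference.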

Last edited by bruce01; 06-26-2014 at 08:17 AM. Reason: Clarity
Old 06-26-2014, 08:15 AM   #8
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78

1 564490 564532 + target_1
1 564533 564534 + target_2
1 564672 564718 + target_3
1 564720 564721 + target_4
1 564722 564724 + target_5
1 564739 564741 + target_6
1 564771 564774 + target_7
1 564776 564807 + target_8
1 564898 564956 + target_9
1 564965 564966 + target_10

I tried it like this too, but it did not work.
Old 06-26-2014, 08:34 AM   #9
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78

@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:a0d9851da00400dec1098a9255ac712e
@SQ SN:chr3 LN:198022430 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ SN:chr4 LN:191154276 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:23dccd106897542ad87d2765d28a19a1
@SQ SN:chr5 LN:180915260 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:0740173db9ffd264d728f32784845cd7
@SQ SN:chr6 LN:171115067 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1d3a93a248d92a729ee764823acbbc6b
@SQ SN:chr7 LN:159138663 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:618366e953d6aaad97dbe4777c29375e
@SQ SN:chr8 LN:146364022 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:96f514a9929e410c6651697bded59aec
@SQ SN:chr9 LN:141213431 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:3e273117f15e0a400f01055d9f393768
@SQ SN:chr10 LN:135534747 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:988c28e000e84c26d552359af1ea2e1d
@SQ SN:chr11 LN:135006516 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ SN:chr12 LN:133851895 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:51851ac0e1a115847ad36449b0015864
@SQ SN:chr13 LN:115169878 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:283f8d7892baa81b510a015719ca7b0b
@SQ SN:chr14 LN:107349540 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:98f3cae32b2a2e9524bc19813927542e
@SQ SN:chr15 LN:102531392 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:e5645a794a8238215b2cd77acb95a078
@SQ SN:chr16 LN:90354753 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ SN:chr17 LN:81195210 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ SN:chr18 LN:78077248 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ SN:chr19 LN:59128983 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1aacd71f30db8e561810913e0b72636d
@SQ SN:chr20 LN:63025520 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ SN:chr21 LN:48129895 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ SN:chr22 LN:51304566 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:a718acaa6135fdca8357d5bfe94211dd
@SQ SN:chrX LN:155270560 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ SN:chrY LN:59373566 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1e86411d73e6f00a10590f976be01623
1 564490 564532 + target_1
1 564533 564534 + target_2
1 564672 564718 + target_3

It gives me something like this:

WARNING 2014-06-26 09:32:07 IntervalList Ignoring interval for unknown reference: X:153990999-153991009
WARNING 2014-06-26 09:32:07 IntervalList Ignoring interval for unknown reference: X:153991010-153991011
WARNING 2014-06-26 09:32:07 IntervalList Ignoring interval for unknown reference: X:153991012-153991016
WARNING 2014-06-26 09:32:07 IntervalList Ignoring interval for unknown reference: Y:8979526-8979550
WARN 09:32:07,516 IntervalUtils - The interval file /mnt/oncogxA/anusha/DGFDATA_Tissue_DATA/intersectalltissues/test.interval_list contains no intervals that could be parsed.
INFO 09:32:07,518 IntervalUtils - Processing 0 bp from intervals
WARN 09:32:07,520 GenomeAnalysisEngine - The given combination of -L and -XL options results in an empty set. No intervals to process.
INFO 09:32:07,611 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 09:32:07,614 GenomeAnalysisEngine - Done preparing for traversal
Old 06-26-2014, 08:41 AM   #10
bruce01
Senior Member
 
Location: .

Join Date: Mar 2011
Posts: 157

Your reference chromosome names are: "chr1 LN:249250621" etc.

So make your interval file with those:

Code:
chr1	LN:249250621	564490	564532
chr1	LN:249250621	564533	564534
chr1	LN:249250621	564672	564718
chr1	LN:249250621	564720	564721
Unless you do this, GATK cannot parse the interval file.
Old 06-26-2014, 07:45 PM   #11
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78

achimmiri@idash-cloud-707:/mnt/oncogxA/anusha/DGFDATA_Tissue_DATA/intersectalltissues$ head test1.subtract.interval_list
chr1 LN:249250621 564490 564532
chr1 LN:249250621 564533 564534
chr1 LN:249250621 564672 564718
chr1 LN:249250621 564720 564721
chr1 LN:249250621 564722 564724
I used this with and without the header:
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:a0d9851da00400dec1098a9255ac712e
@SQ SN:chr3 LN:198022430 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ SN:chr4 LN:191154276 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:23dccd106897542ad87d2765d28a19a1
@SQ SN:chr5 LN:180915260 UR:file:/raid/references-and-indexes/hg19/hg19_lite/hg19_lite.fa M5:0740173db9ffd264d728f32784845cd7

But it is still not working, whichever way I try.
Old 06-26-2014, 08:01 PM   #12
AnushaC
Member
 
Location: San Diego

Join Date: Sep 2013
Posts: 78

I am still not able to figure out the exact format for the interval list. I have tried everything so far and none of it worked. I checked my dict file, and my header is exactly the same as the dict file, but I am not sure what the body should look like.
Old 10-06-2016, 06:49 PM   #14
QazSeDc
Junior Member
 
Location: Hong Kong

Join Date: Jun 2015
Posts: 7

I have run into exactly the same problems!
I have been trying many formats with different file suffixes (.bed, .list, .intervals, .interval_list), but only .bed and .interval_list have worked for me so far.

.bed format:
<chr> <start> <end>

.interval_list format:
<chr>:<start>-<end>

However, since .bed is 0-based and .interval_list wouldn't allow any annotation after the interval on each line, I really want to get the .list format working.
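
To illustrate the 0-based point: BED starts are 0-based and BED ends are exclusive, while the <chr>:<start>-<end> form is 1-based and inclusive, so only the start shifts by one. Here is a small Python sketch (the function name `bed_to_interval` is just for illustration) that converts one BED row and returns a name column, if present, as a separate value, since the interval string itself cannot carry it.

```python
def bed_to_interval(bed_line):
    """Convert one BED row to a 1-based inclusive 'chr:start-end' string.

    Returns (interval, name); name is None when the row has no 4th column.
    BED: 0-based start, exclusive end. Target: 1-based start, inclusive end.
    """
    fields = bed_line.split()
    if len(fields) < 3:
        raise ValueError("expected at least chrom, start, end: %r" % bed_line)
    chrom = fields[0]
    start0 = int(fields[1])
    end = int(fields[2])
    if start0 < 0 or start0 >= end:
        raise ValueError("empty or negative interval: %r" % bed_line)
    name = fields[3] if len(fields) > 3 else None
    # Only the start moves: [start0, end) covers the same bases as [start0 + 1, end].
    return "%s:%d-%d" % (chrom, start0 + 1, end), name


if __name__ == "__main__":
    for row in ["chr1\t69088\t70010\tA", "chr1\t367656\t368599"]:
        print(bed_to_interval(row))
```

Keeping the names in a side table keyed by the interval string is one way to hold on to annotations while still passing plain intervals to -L.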

I tried the GATK recommendation for the .list format and even tried copying it, but nothing worked for me.

I would like to know how AnushaC ended up solving this problem.
Old 10-07-2016, 12:08 AM   #15
QazSeDc
Junior Member
 
Location: Hong Kong

Join Date: Jun 2015
Posts: 7

So I decided to test which formats can be used and got the following results:

command:
java -jar ../GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T DepthOfCoverage -I <in.bam> -R <hg19.fa> -L <test.list> --interval_merging OVERLAPPING_ONLY -o <out.file>
---------------------------------------------------------------------------------------
test1.list

chr1 69089 70010
chr1 367657 368599
chr1 621094 622036
chr1 861320 861395
chr1 865533 865718

test1 result:
##### ERROR MESSAGE: Badly formed genome loc: Contig 'chr1 69089 70010' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
---------------------------------------------------------------------------------------
test2.list

chr1:69089-70010
chr1:367657-368599
chr1:621094-622036
chr1:861320-861395
chr1:865533-865718

test2 result:
no error message and all results have been correctly calculated.
---------------------------------------------------------------------------------------
test3.list (this format is recommended on the GATK website)

@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:a0d9851da00400dec1098a9255ac712e
...
...
@SQ SN:chrUn_gl000248 LN:39786 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:5a8e43bec9be36c7b49c84d585107776
@SQ SN:chrUn_gl000249 LN:38502 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1d78abec37c15fe29a275eb08d5af236
chr1 69089 70010
chr1 367657 368599
chr1 621094 622036
chr1 861320 861395
chr1 865533 865718

test3 result:
##### ERROR
##### ERROR MESSAGE: File associated with name test3.list is malformed: Interval file could not be parsed in any supported format. caused by Failed to parse Genome Location string: @HD VN:1.0 SO:unsorted
---------------------------------------------------------------------------------------
test4.list

@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:a0d9851da00400dec1098a9255ac712e
...
...
@SQ SN:chrUn_gl000248 LN:39786 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:5a8e43bec9be36c7b49c84d585107776
@SQ SN:chrUn_gl000249 LN:38502 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1d78abec37c15fe29a275eb08d5af236
chr1:69089-70010
chr1:367657-368599
chr1:621094-622036
chr1:861320-861395
chr1:865533-865718

test4 result:
##### ERROR MESSAGE: File associated with name test4.list is malformed: Interval file could not be parsed in any supported format. caused by Failed to parse Genome Location string: @HD VN:1.0 SO:unsorted
---------------------------------------------------------------------------------------
test5.list

chr1 69089 70010 + A
chr1 367657 368599 + B
chr1 621094 622036 + C
chr1 861320 861395 + D
chr1 865533 865718 + E

test5 result:
##### ERROR MESSAGE: Badly formed genome loc: Contig 'chr1 69089 70010 + A' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
---------------------------------------------------------------------------------------
test6.list

chr1:69089-70010 + target1
chr1:367657-368599 + target2
chr1:621094-622036 + target3
chr1:861320-861395 + target4
chr1:865533-865718 + target5

test6 result:
##### ERROR MESSAGE: File associated with name test6.list is malformed: Interval file could not be parsed in any supported format. caused by Failed to parse Genome Location string: chr1:69089-70010 + target1
---------------------------------------------------------------------------------------
In test2 the format <chr>:<start>-<end> worked well as rbagnall mentioned:
Quote:
Originally Posted by rbagnall
GATK -L interval files do not need a header. They should be this format:

chr1:1000-1200
chr1:2004-2507
chr2:457290-457400

etc...
However, I would prefer using the format <chr> <start> <end> if possible, for several reasons.
Overall, I am not sure why test1 or test3 wouldn't work.
Sorry for the long post.
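
One thing the tests above do not rule out is the delimiter and column count. A Picard-style interval list is tab-delimited: the header fields are separated by tabs, and each body row has five tab-separated columns (contig, start, end, strand, name). The test3 error quotes the @HD line, which suggests the file ended up being read line by line as plain intervals. If test3 contained spaces or only three columns, that might explain it, though this is a guess and not something confirmed in this thread. Below is a minimal Python sketch, with hypothetical names (`picard_row`, `build_interval_list`), that assembles such a file from header lines copied verbatim from the reference .dict file.

```python
def picard_row(contig, start, end, strand="+", name="."):
    """Build one five-column, tab-separated body row (1-based, inclusive)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-': %r" % strand)
    if int(start) > int(end):
        raise ValueError("start is greater than end")
    return "\t".join([contig, str(int(start)), str(int(end)), strand, name])


def build_interval_list(dict_header_lines, intervals):
    """Join header lines (used as-is) with generated body rows.

    Assumption: dict_header_lines are already tab-delimited, e.g. read
    straight from the .dict file that belongs to the reference fasta.
    """
    lines = [h.rstrip("\n") for h in dict_header_lines]
    for i, (contig, start, end) in enumerate(intervals, 1):
        lines.append(picard_row(contig, start, end, name="target_%d" % i))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    header = ["@HD\tVN:1.0\tSO:unsorted", "@SQ\tSN:chr1\tLN:249250621"]
    body = [("chr1", 69089, 70010), ("chr1", 367657, 368599)]
    print(build_interval_list(header, body), end="")
```

The generated target names are placeholders; any per-interval label could go in the fifth column instead.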

Last edited by QazSeDc; 10-07-2016 at 12:17 AM.