SEQanswers

Old 10-19-2011, 07:42 AM   #1
nexgengirl
Member
 
Location: Maryland

Join Date: Apr 2010
Posts: 31
Default Picard Interval File

Hi,

I have some Illumina TruSeq exome data and I want to use the Picard tool CalculateHsMetrics.jar to assess the hybrid selection. I downloaded the TruSeq bed file from http://www.illumina.com/support/sequ...downloads.ilmn and the reference file for mapping from ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz


My question is about the format of the interval file. I've looked at http://www.broadinstitute.org/gsa/wi...s_for_the_GATK and used that as a template for my header, but Picard complains about this header and reports that the sequence dictionaries are not the same size.

Here's what my interval_list file looks like:

@HD VN:1.0 SO:coordinate
@SQ SN:1 LN:249250621 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:1b22b98cdeb4a9304cb5d48026a85128
SP:Homo Sapiens
@SQ SN:2 LN:243199373 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:a0d9851da00400dec1098a9255ac712e
SP:Homo Sapiens
@SQ SN:3 LN:198022430 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:fdfd811849cc2fadebc929bb925902e5
SP:Homo Sapiens
@SQ SN:4 LN:191154276 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:23dccd106897542ad87d2765d28a19a1
SP:Homo Sapiens
@SQ SN:5 LN:180915260 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:0740173db9ffd264d728f32784845cd7
SP:Homo Sapiens
@SQ SN:6 LN:171115067 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:1d3a93a248d92a729ee764823acbbc6b
SP:Homo Sapiens
@SQ SN:7 LN:159138663 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:618366e953d6aaad97dbe4777c29375e
SP:Homo Sapiens
@SQ SN:8 LN:146364022 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:96f514a9929e410c6651697bded59aec
SP:Homo Sapiens
@SQ SN:9 LN:141213431 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:3e273117f15e0a400f01055d9f393768
SP:Homo Sapiens
@SQ SN:10 LN:135534747 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:988c28e000e84c26d552359af1ea2e1d
SP:Homo Sapiens
@SQ SN:11 LN:135006516 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:98c59049a2df285c76ffb1c6db8f8b96
SP:Homo Sapiens
@SQ SN:12 LN:133851895 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:51851ac0e1a115847ad36449b0015864
SP:Homo Sapiens
@SQ SN:13 LN:115169878 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:283f8d7892baa81b510a015719ca7b0b
SP:Homo Sapiens
@SQ SN:14 LN:107349540 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:98f3cae32b2a2e9524bc19813927542e
SP:Homo Sapiens
@SQ SN:15 LN:102531392 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:e5645a794a8238215b2cd77acb95a078
SP:Homo Sapiens
@SQ SN:16 LN:90354753 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:fc9b1a7b42b97a864f56b348b06095e6
SP:Homo Sapiens
@SQ SN:17 LN:81195210 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:351f64d4f4f9ddd45b35336ad97aa6de
SP:Homo Sapiens
@SQ SN:18 LN:78077248 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
SP:Homo Sapiens
@SQ SN:19 LN:59128983 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:1aacd71f30db8e561810913e0b72636d
SP:Homo Sapiens
@SQ SN:20 LN:63025520 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:0dec9660ec1efaaf33281c0d5ea2560f
SP:Homo Sapiens
@SQ SN:21 LN:48129895 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:2979a6085bfe28e3ad6f552f361ed74d
SP:Homo Sapiens
@SQ SN:22 LN:51304566 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:a718acaa6135fdca8357d5bfe94211dd
SP:Homo Sapiens
@SQ SN:X LN:155270560 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:7e0e2e580297b7764e31dbc80c2540dd
SP:Homo Sapiens
@SQ SN:Y LN:59373566 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:1fa3474750af0948bdf97d5a0ee52e51
SP:Homo Sapiens
@SQ SN:MT LN:16569 AS:GRCh37 UR:ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz M5:c68f52674c9fb33aef52dcf399755519
SP:Homo Sapiens
1 14362 14829 + chr1:14363-14829:WASH5P
1 14969 15038 + chr1:14970-15038:WASH5P
1 15795 15947 + chr1:15796-15947:WASH5P
1 16606 16765 + chr1:16607-16765:WASH5P
1 16857 17055 + chr1:16858-17055:WASH5P
1 17232 17368 + chr1:17233-17368:WASH5P
1 17605 17742 + chr1:17606-17742:WASH5P
1 69090 70008 + chr1:69091-70008:OR4F5


I guess I'm stuck and any help would be appreciated. Thanks.
Old 10-19-2011, 08:46 AM   #2
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

Check the chromosome names in the two files. The Sanger reference file uses 1, 2, 3, X, Y, while most Illumina files follow the UCSC convention (chr1, chr2, chr3, chrX, chrY), so you might need to remove the chr prefix from the bed file. This is a fairly common issue when people mix annotation sources.
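
If the bed file does use UCSC-style names, the prefix can be stripped before building the interval list. This is only a sketch: the function name strip_chr_prefix is made up for illustration, and it assumes a tab-delimited bed file with the contig name in the first column.

```shell
# Strip a leading "chr" from the first column of a BED stream so contig names
# look like 1, 2, ..., X, Y. Lines starting with "track", "browser" or "#"
# are passed through untouched.
# strip_chr_prefix is a hypothetical helper name, not part of any tool.
strip_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^(track|browser|#)/ { print; next }
         { sub(/^chr/, "", $1); print }'
}

# Demo on two records (the chrX coordinates are placeholders).
printf 'chr1\t14362\t14829\nchrX\t100\t200\n' | strip_chr_prefix
```

Note that chrM would become M rather than the MT name used in the header above, so a mitochondrial target would need separate handling.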
Old 10-19-2011, 09:01 AM   #3
nexgengirl
Member
 
Location: Maryland

Join Date: Apr 2010
Posts: 31
Default

Thanks. I checked that, and the names are the same in both files, but the error is still there.
Old 10-19-2011, 09:19 AM   #4
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

Try removing the first line of the header. I had to do this for Picard's CollectRnaSeqMetrics application, which uses the same style of list.

Code:
@HD VN:1.0 SO:coordinate
Old 10-19-2011, 11:21 AM   #5
nexgengirl
Member
 
Location: Maryland

Join Date: Apr 2010
Posts: 31
Default

Unfortunately, that did not work either. Thank you for your help.
Old 09-06-2012, 10:23 AM   #6
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 58
Default

Quote:
Originally Posted by nexgengirl
Unfortunately, that did not work either. Thank you for your help.
Did you ever find a solution for this?

I am still getting a similar error: Picard complains that my interval list does not have a header, even though I have followed all the instructions I could find to make it properly.

I've tried it with and without the @HD line as well.
Old 09-08-2012, 09:24 AM   #7
nexgengirl
Member
 
Location: Maryland

Join Date: Apr 2010
Posts: 31
Default

Yes, I did get this to work. The file is too large to attach so I'll show the header of the file below.

Here's the command I used:

java -Xmx10g -jar /path/to/picard-tools-1.54/CalculateHsMetrics.jar BAIT_INTERVALS=Truseq_for_picard_hs.bed TARGET_INTERVALS=Truseq_for_picard_hs.bed INPUT=sample.bam OUTPUT=sample.hybrid.stats.txt REFERENCE_SEQUENCE=/path/to/human_g1k_v37.fasta PER_TARGET_COVERAGE=sample.per.target.coverage.txt VALIDATION_STRINGENCY=LENIENT


#head of the file
head -130 Truseq_for_picard_hs.bed

@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:MT LN:16569
@SQ SN:GL000207.1 LN:4262
@SQ SN:GL000226.1 LN:15008
@SQ SN:GL000229.1 LN:19913
@SQ SN:GL000231.1 LN:27386
@SQ SN:GL000210.1 LN:27682
@SQ SN:GL000239.1 LN:33824
@SQ SN:GL000235.1 LN:34474
@SQ SN:GL000201.1 LN:36148
@SQ SN:GL000247.1 LN:36422
@SQ SN:GL000245.1 LN:36651
@SQ SN:GL000197.1 LN:37175
@SQ SN:GL000203.1 LN:37498
@SQ SN:GL000246.1 LN:38154
@SQ SN:GL000249.1 LN:38502
@SQ SN:GL000196.1 LN:38914
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000244.1 LN:39929
@SQ SN:GL000238.1 LN:39939
@SQ SN:GL000202.1 LN:40103
@SQ SN:GL000234.1 LN:40531
@SQ SN:GL000232.1 LN:40652
@SQ SN:GL000206.1 LN:41001
@SQ SN:GL000240.1 LN:41933
@SQ SN:GL000236.1 LN:41934
@SQ SN:GL000241.1 LN:42152
@SQ SN:GL000243.1 LN:43341
@SQ SN:GL000242.1 LN:43523
@SQ SN:GL000230.1 LN:43691
@SQ SN:GL000237.1 LN:45867
@SQ SN:GL000233.1 LN:45941
@SQ SN:GL000204.1 LN:81310
@SQ SN:GL000198.1 LN:90085
@SQ SN:GL000208.1 LN:92689
@SQ SN:GL000191.1 LN:106433
@SQ SN:GL000227.1 LN:128374
@SQ SN:GL000228.1 LN:129120
@SQ SN:GL000214.1 LN:137718
@SQ SN:GL000221.1 LN:155397
@SQ SN:GL000209.1 LN:159169
@SQ SN:GL000218.1 LN:161147
@SQ SN:GL000220.1 LN:161802
@SQ SN:GL000213.1 LN:164239
@SQ SN:GL000211.1 LN:166566
@SQ SN:GL000199.1 LN:169874
@SQ SN:GL000217.1 LN:172149
@SQ SN:GL000216.1 LN:172294
@SQ SN:GL000215.1 LN:172545
@SQ SN:GL000205.1 LN:174588
@SQ SN:GL000219.1 LN:179198
@SQ SN:GL000224.1 LN:179693
@SQ SN:GL000223.1 LN:180455
@SQ SN:GL000195.1 LN:182896
@SQ SN:GL000212.1 LN:186858
@SQ SN:GL000222.1 LN:186861
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
1 14362 14829 + chr1:14363-14829:WASH5P
1 14969 15038 + chr1:14970-15038:WASH5P
1 15795 15947 + chr1:15796-15947:WASH5P
1 16606 16765 + chr1:16607-16765:WASH5P
1 16857 17055 + chr1:16858-17055:WASH5P
1 17232 17368 + chr1:17233-17368:WASH5P
1 17605 17742 + chr1:17606-17742:WASH5P
1 69090 70008 + chr1:69091-70008:OR4F5
1 661139 665184 + chr1:661140-665184:LOC100133331
1 761586 762902 + chr1:761587-762902:NCRNA00115
1 763063 763155 + chr1:763064-763155:LOC643837
1 783033 783186 + chr1:783034-783186:LOC643837
1 787306 787490 + chr1:787307-787490:LOC643837
1 788050 788146 + chr1:788051-788146:LOC643837
1 788770 788902 + chr1:788771-788902:LOC643837
1 788956 789740 + chr1:788957-789740:LOC643837
1 803452 804055 + chr1:803453-804055:FAM41C
1 809491 810535 + chr1:809492-810535:FAM41C
1 812125 812182 + chr1:812126-812182:FAM41C
1 852952 853100 + chr1:852953-853100:FLJ39609
1 853401 853555 + chr1:853402-853555:FLJ39609
1 854204 854295 + chr1:854205-854295:FLJ39609
1 854714 854817 + chr1:854715-854817:FLJ39609
1 861120 861180 + chr1:861121-861180:SAMD11
1 861301 861393 + chr1:861302-861393:SAMD11
1 865534 865716 + chr1:865535-865716:SAMD11
1 866418 866469 + chr1:866419-866469:SAMD11
1 871151 871276 + chr1:871152-871276:SAMD11
1 874419 874509 + chr1:874420-874509:SAMD11
1 874654 874840 + chr1:874655-874840:SAMD11
1 876523 876686 + chr1:876524-876686:SAMD11
1 877515 877631 + chr1:877516-877631:SAMD11
1 877789 877868 + chr1:877790-877868:SAMD11
1 877938 878438 + chr1:877939-878438:SAMD11
1 878632 878757 + chr1:878633-878757:SAMD11
1 879077 879188 + chr1:879078-879188:SAMD11
1 879287 879583 + chr1:879288-879583:SAMD11
1 879961 880180 + chr1:879962-880180:NOC2L
1 880897 881033 + chr1:880898-881033:NOC2L
1 881552 881666 + chr1:881553-881666:NOC2L
1 881781 881925 + chr1:881782-881925:NOC2L
1 883510 883612 + chr1:883511-883612:NOC2L
1 883869 883983 + chr1:883870-883983:NOC2L
1 886506 886618 + chr1:886507-886618:NOC2L
1 887379 887519 + chr1:887380-887519:NOC2L
1 887791 887980 + chr1:887792-887980:NOC2L
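
Unlike the header in the first post, this one lists every contig in the reference, including the GL* ones, which fits the original complaint about the sequence dictionaries not being the same size. A rough way to check this before running Picard is sketched below. It assumes the BAM header has been dumped to a text file (for example with samtools view -H sample.bam > bam_header.txt) and that Picard wants the same contigs, with the same lengths, in the same order. The function and file names are hypothetical.

```shell
# Print the SN and LN values of every @SQ line as "name<TAB>length",
# in file order. Other tags (AS, UR, M5, SP) are ignored.
sq_names_lengths() {
    awk '/^@SQ/ {
             sn = ""; ln = ""
             for (i = 2; i <= NF; i++) {
                 if ($i ~ /^SN:/) sn = substr($i, 4)
                 if ($i ~ /^LN:/) ln = substr($i, 4)
             }
             print sn "\t" ln
         }' "$1"
}

# Succeeds (exit status 0) only if both files declare the same contigs,
# with the same lengths, in the same order.
same_dictionary() {
    a=$(sq_names_lengths "$1")
    b=$(sq_names_lengths "$2")
    [ "$a" = "$b" ]
}

# Demo with two tiny made-up headers: the second one lacks a GL contig.
tmp=$(mktemp -d)
printf '@SQ\tSN:1\tLN:249250621\n@SQ\tSN:GL000207.1\tLN:4262\n' > "$tmp/bam_header.txt"
printf '@SQ\tSN:1\tLN:249250621\n' > "$tmp/intervals.txt"
if same_dictionary "$tmp/bam_header.txt" "$tmp/intervals.txt"; then
    echo "dictionaries match"
else
    echo "dictionaries differ"
fi
rm -r "$tmp"
```

Running diff on the two sq_names_lengths outputs would show exactly which contigs are missing or out of order.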

Last edited by nexgengirl; 09-08-2012 at 09:27 AM.
Old 09-08-2012, 09:30 AM   #8
nexgengirl
Member
 
Location: Maryland

Join Date: Apr 2010
Posts: 31
Default

Since the file doesn't display well here, I have also attached the first 130 lines as a file so you can see how it looks in the terminal.
Attached Files
File Type: zip Truseq_for_picard_hs_first130lines.bed.zip (1.5 KB, 47 views)
Old 10-04-2012, 11:33 PM   #9
stephwen
Junior Member
 
Location: Liege, Belgium

Join Date: Jun 2011
Posts: 4
Default

I had the same problem, and I came across this small 2-liner (by a colleague) which worked for me:

Code:
samtools view -H input.bam > TruSeq-for-Picard.bed
gawk 'BEGIN {  OFS="\t"} {print $1,$2,$3,$6,$4 }' TruSeq-Exome-Targeted-Regions.bed >> TruSeq-for-Picard.bed
where TruSeq-Exome-Targeted-Regions.bed is the bed file downloaded from the Illumina website.
Old 03-12-2013, 03:28 PM   #10
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85
Default

This two-liner is extremely helpful, many thanks for that.
Old 09-10-2013, 03:29 AM   #11
mducar
Junior Member
 
Location: Boston, MA

Join Date: Jan 2011
Posts: 2
Default

Converting from BED to the Picard format is a bit more complicated than the two-liner that got posted.

The BED format specification states that BED coordinates are 0-based (the first base is 0) and that the interval excludes the end position:
http://genome.ucsc.edu/FAQ/FAQformat.html#format1

The Picard interval list, by contrast, is 1-based (the first base is 1) and includes the end position:
http://picard.sourceforge.net/javado...ervalList.html

So a region defined in a BED file as:
1 14362 14829

Needs to become the following in a Picard interval list:
1 14363 14829
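
The arithmetic for that one region can be sketched as follows. The helper name bed_to_one_based is invented for illustration, and only the first three columns are handled.

```shell
# Convert BED coordinates (0-based start, end excluded) to 1-based inclusive
# coordinates: only the start moves, by +1; the end stays the same.
# bed_to_one_based is a hypothetical helper name.
bed_to_one_based() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2 + 1, $3 }'
}

# The region from the example above.
printf '1\t14362\t14829\n' | bed_to_one_based
```

The region covers the same 467 bases in both notations: 14829 - 14362 in BED, and 14829 - 14363 + 1 in the interval list.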
Old 01-24-2015, 06:57 AM   #12
manducasexta
Member
 
Location: San Francisco

Join Date: Mar 2009
Posts: 12
Default

Like mducar (no relation) said, the positions need to be adjusted for the different numbering schemes, so change $2 to $2+1 in your awk line. Also, if your bed file has a "track" line, omit it from your intervals file. The revised two-liner would be:

samtools view -H my.bam > my.1based.intervals

gawk 'BEGIN { OFS="\t"} {print $1,$2+1,$3,$6,$4 }' my.bed | grep -v ^track >> my.1based.intervals

to verify the results:
head my.1based.intervals
cat my.1based.intervals | grep -v ^@ | head
head my.bed
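
Eyeballing the head of the file only covers the first few records. A more thorough check could look like the sketch below, which assumes the layout used in this thread (@SQ header lines followed by contig, start, end, strand, name). The function name check_intervals is made up, and the rules it applies are my reading of the 1-based inclusive convention rather than anything taken from Picard itself.

```shell
# Report problems in an interval list:
#   - a contig with no @SQ line in the header
#   - start < 1 or start > end
#   - end beyond the contig length (LN)
# Prints one message per bad line and returns non-zero if any were found.
# check_intervals is a hypothetical helper name.
check_intervals() {
    awk '/^@SQ/ {
             sn = ""; ln = 0
             for (i = 2; i <= NF; i++) {
                 if ($i ~ /^SN:/) sn = substr($i, 4)
                 if ($i ~ /^LN:/) ln = substr($i, 4) + 0
             }
             len[sn] = ln
             next
         }
         /^@/    { next }   # other header lines such as @HD
         NF == 0 { next }   # blank lines
         {
             if (!($1 in len))           { print "line " NR ": unknown contig " $1; bad = 1 }
             else if ($2 < 1 || $2 > $3) { print "line " NR ": bad range " $2 "-" $3; bad = 1 }
             else if ($3 > len[$1])      { print "line " NR ": end past contig length"; bad = 1 }
         }
         END { exit bad + 0 }' "$1"
}

# Demo: the second record still carries a chr prefix and is flagged.
tmp=$(mktemp)
printf '@SQ\tSN:1\tLN:249250621\n1\t14363\t14829\t+\tWASH5P\nchr1\t14970\t15038\t+\tWASH5P\n' > "$tmp"
check_intervals "$tmp" || echo "intervals file needs fixing"
rm "$tmp"
```

On a real file the call would be check_intervals my.1based.intervals.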
Tags
exome, hybrid, picard
