  • GATK DepthOfCoverage Error

    I'm trying to use GATK DepthOfCoverage, but I'm getting an error that I do not understand how to resolve.

    My command is:
    Code:
    java -Xmx4g -jar ~/GenomeAnalysisTK-1.3/GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R ~/b37_genomes/human_g1k_v37.fasta \
    -I PE.merged.sorted.recal.bam \
    -L ~/BED/exons.b37.bed \
    -o test.GATK.out
    But then I get:

    ##### ERROR MESSAGE: File associated with name ~/BED/exons.b37.bed is malformed: Interval file could not be parsed in any supported format. caused by BED files must be parsed through Tribble; parsing them as intervals through the GATK engine is no longer supported
    ##### ERROR ------------------------------------------------------------------------------------------

    The GATK wiki does not have many examples of what can be passed to -L. This is whole-exome sequencing data, so naturally I want to determine the coverage over all exon intervals.

    I have no clue what this Tribble thing is, or how they can not support the BED format.

    Thanks.

  • #2
    Could you post the header lines of your file and then the first few intervals as well?



    • #3
      Wow. Sorry I missed the reply to this thread. Thanks for responding.

      Here are the first few header lines of the BAM:
      Code:
      @HD	VN:1.0	GO:none	SO:coordinate
      @SQ	SN:1	LN:249250621
      @SQ	SN:2	LN:243199373
      @SQ	SN:3	LN:198022430
      @SQ	SN:4	LN:191154276
      @SQ	SN:5	LN:180915260
      @SQ	SN:6	LN:171115067
      @SQ	SN:7	LN:159138663
      @SQ	SN:8	LN:146364022
      @SQ	SN:9	LN:141213431
      Is this sufficient, or did you want the alignments as well? Thanks again for the help; this is still an issue for me.



      • #4
        The error indicates the BED file is the problem. Could you post the first 50 lines or so of it?



        • #5
          Code:
          1	14467	14587
          1	14639	14883
          1	14943	15064
          1	15671	15990
          1	16591	16719
          1	16750	17074
          1	17178	17420
          1	17443	18108
          1	18203	18448
          1	19049	19170
          1	20603	20723
          1	24448	24915
          1	29267	29389
          1	30275	30431
          1	35095	35215
          1	35245	35366
          1	35668	35788
          1	62983	63703
          1	69069	70029
          1	112710	112830
          1	120754	120966
          1	129018	129259
          1	133356	133597
          1	135235	135355
          1	135688	136176
          1	137366	137726
          1	173754	173874
          1	228233	228711
          1	259027	259147
          1	267075	267287
          1	326408	326768
          1	327177	327665
          1	327998	328118
          1	329752	329993
          1	334092	334333
          1	342357	342569
          1	350498	350618
          1	367647	368608
          1	470971	471330
          1	621084	622045
          1	639075	639195
          1	647124	647336
          1	655375	655616
          1	659720	659961
          1	661601	661721
          1	662054	662542
          1	662854	663214
          1	709552	709672
          1	717360	717480
          1	721338	721640
          It is in b37 format, and I have been consistent in using b37 throughout. This is not my design BED file, but rather "all" exons. I wouldn't think this is the issue, though. If there is no overlap, shouldn't it just report zero coverage?
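
A quick way to confirm the "consistent b37" claim is to compare the contig names in the BED file against the `@SQ` names in the BAM header from post #3. The sketch below assumes the header lines are tab-separated, as in `samtools view -H` output, and that the first BED column holds the contig name. The function names are hypothetical. It only rules out a naming mismatch such as `1` versus `chr1`; it does not explain the Tribble message itself.

```python
def sq_names(header_lines):
    """Collect the SN: values from the @SQ lines of a SAM-style header."""
    names = set()
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def bed_contigs(bed_lines):
    """Collect the contig names (first column) used in a BED file."""
    contigs = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED comment/track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contigs.add(line.split()[0])
    return contigs


def contigs_missing_from_header(header_lines, bed_lines):
    """Return BED contigs that have no matching @SQ entry, sorted."""
    return sorted(bed_contigs(bed_lines) - sq_names(header_lines))


if __name__ == "__main__":
    header = ["@HD\tVN:1.0\tGO:none\tSO:coordinate",
              "@SQ\tSN:1\tLN:249250621",
              "@SQ\tSN:2\tLN:243199373"]
    bed = ["1\t14467\t14587", "1\t14639\t14883"]
    print(contigs_missing_from_header(header, bed))  # prints []
```

An empty list for the lines posted here is consistent with the naming not being the cause of the error.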



          • #6
            So I use the "1.0.2885" version of GATK for DepthOfCoverage and I use interval files that look like this:

            Code:
            @HD     VN:1.0  SO:unsorted
            @SQ     SN:chr1 LN:249250621    UR:file:hg19.fa       M5:1b22b98cdeb4a9304cb5d48026a85128
            @SQ     SN:chr2 LN:243199373    UR:file:hg19.fa       M5:a0d9851da00400dec1098a9255ac712e
            @SQ     SN:chr3 LN:198022430    UR:file:hg19.fa       M5:641e4338fa8d52a5b781bd2a2c08d3c3
            @SQ     SN:chr4 LN:191154276    UR:file:hg19.fa       M5:23dccd106897542ad87d2765d28a19a1
            @SQ     SN:chr5 LN:180915260    UR:file:hg19.fa       M5:0740173db9ffd264d728f32784845cd7
            @SQ     SN:chr6 LN:171115067    UR:file:hg19.fa       M5:1d3a93a248d92a729ee764823acbbc6b
            @SQ     SN:chr7 LN:159138663    UR:file:hg19.fa       M5:618366e953d6aaad97dbe4777c29375e
            @SQ     SN:chr8 LN:146364022    UR:file:hg19.fa       M5:96f514a9929e410c6651697bded59aec
            @SQ     SN:chr9 LN:141213431    UR:file:hg19.fa       M5:3e273117f15e0a400f01055d9f393768
            @SQ     SN:chr10        LN:135534747    UR:file:hg19.fa       M5:988c28e000e84c26d552359af1ea2e1d
            @SQ     SN:chr11        LN:135006516    UR:file:hg19.fa       M5:98c59049a2df285c76ffb1c6db8f8b96
            @SQ     SN:chr12        LN:133851895    UR:file:hg19.fa       M5:51851ac0e1a115847ad36449b0015864
            @SQ     SN:chr13        LN:115169878    UR:file:hg19.fa       M5:283f8d7892baa81b510a015719ca7b0b
            @SQ     SN:chr14        LN:107349540    UR:file:hg19.fa       M5:98f3cae32b2a2e9524bc19813927542e
            @SQ     SN:chr15        LN:102531392    UR:file:hg19.fa       M5:e5645a794a8238215b2cd77acb95a078
            @SQ     SN:chr16        LN:90354753     UR:file:hg19.fa       M5:fc9b1a7b42b97a864f56b348b06095e6
            @SQ     SN:chr17        LN:81195210     UR:file:hg19.fa       M5:351f64d4f4f9ddd45b35336ad97aa6de
            @SQ     SN:chr18        LN:78077248     UR:file:hg19.fa       M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
            @SQ     SN:chr19        LN:59128983     UR:file:hg19.fa       M5:1aacd71f30db8e561810913e0b72636d
            @SQ     SN:chr20        LN:63025520     UR:file:hg19.fa       M5:0dec9660ec1efaaf33281c0d5ea2560f
            @SQ     SN:chr21        LN:48129895     UR:file:hg19.fa       M5:2979a6085bfe28e3ad6f552f361ed74d
            @SQ     SN:chr22        LN:51304566     UR:file:hg19.fa       M5:a718acaa6135fdca8357d5bfe94211dd
            @SQ     SN:chrM LN:16571        UR:file:hg19.fa       M5:d2ed829b8a1628d16cbeee88e88e39eb
            @SQ     SN:chrX LN:155270560    UR:file:hg19.fa       M5:7e0e2e580297b7764e31dbc80c2540dd
            @SQ     SN:chrY LN:59373566     UR:file:hg19.fa       M5:1e86411d73e6f00a10590f976be01623
            chr1    2985721 2985854 +       target_1
            chr1    3102668 3103058 +       target_2
            chr1    3160630 3160721 +       target_3
            with the header created as described here: http://www.broadinstitute.org/gsa/wi...ference_genome

            Maybe try something like that?
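
One way to try this suggestion is to convert the BED rows into the five-column layout shown above (contig, start, end, strand, name). This is a minimal sketch, not a GATK or Picard utility. It assumes the input follows the usual BED convention of 0-based, half-open coordinates and that the interval-list layout is 1-based and inclusive, so only the start shifts by one. The function name, the `+` strand default, and the `target_` naming are illustrative choices. The `@HD`/`@SQ` header still has to be copied from the sequence dictionary of the same reference used for the BAM (b37 here, so contigs are `1`, `2`, ... rather than `chr1`).

```python
def bed_to_interval_lines(bed_lines, name_prefix="target_"):
    """Convert BED rows to tab-separated interval-list rows.

    BED is 0-based, half-open; the output is 1-based, inclusive,
    so start becomes start + 1 and end is unchanged.
    """
    out = []
    count = 0
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED comment/track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        count += 1
        # Use the BED name column when present, otherwise number the targets.
        name = fields[3] if len(fields) > 3 else f"{name_prefix}{count}"
        out.append("\t".join([contig, str(start + 1), str(end), "+", name]))
    return out


if __name__ == "__main__":
    bed = ["1\t14467\t14587", "1\t14639\t14883"]
    for row in bed_to_interval_lines(bed):
        print(row)
```

The header lines and the converted rows would then be concatenated into a single file and passed to -L in place of the BED file.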
