  • Premature EOF using Picard

    Hi,

    We recently upgraded our sequencer, and the BAM files we now receive are ~5.5x larger than what we normally deal with. I have had to run Picard commands such as MarkDuplicates and AddOrReplaceReadGroups by submitting jobs from my terminal. Since doing this, I have been receiving premature EOF errors. Following a suggestion I found a while ago, I used the command $ tail problem.bam | hexdump -C to view the EOF marker. It is present, but not at the end of the file, where it has been for earlier data sets. I have pasted what I see here:

    000005f0 f8 f6 25 bc 3a 51 28 f1 64 6a 98 38 25 f9 1f 3c |..%.:Q(.dj.8%..<|
    00000600 29 71 3b 11 61 00 00 1f 8b 08 04 00 00 00 00 00 |)q;.a...........|
    00000610 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 |...BC...........|
    00000620 00 00 00 5b 57 65 64 20 4e 6f 76 20 30 37 20 31 |...[Wed Nov 07 1|
    00000630 36 3a 31 31 3a 32 38 20 45 53 54 20 32 30 31 32 |6:11:28 EST 2012|
    00000640 5d 20 6e 65 74 2e 73 66 2e 70 69 63 61 72 64 2e |] net.sf.picard.|
    00000650 73 61 6d 2e 41 64 64 4f 72 52 65 70 6c 61 63 65 |sam.AddOrReplace|
    00000660 52 65 61 64 47 72 6f 75 70 73 20 64 6f 6e 65 2e |ReadGroups done.|
    00000670 20 45 6c 61 70 73 65 64 20 74 69 6d 65 3a 20 32 | Elapsed time: 2|
    00000680 36 2e 30 36 20 6d 69 6e 75 74 65 73 2e 0a 52 75 |6.06 minutes..Ru|
    00000690 6e 74 69 6d 65 2e 74 6f 74 61 6c 4d 65 6d 6f 72 |ntime.totalMemor|
    000006a0 79 28 29 3d 35 35 39 32 31 38 36 38 38 0a |y()=559218688.|
    000006ae

    The 28-byte EOF marker runs from offset 0x607 (1f 8b 08 04 ...) to offset 0x622, just before the "[Wed Nov 07" text. It should be at the end of the file, but it is not. I am wondering whether the text after it is the summary that Picard writes, which ended up in the file because I ran this command as a job. Does anyone have an idea of what is going on? Thank you so much! Here is the command I am using:

    /opt/sharcnet/sq-tm/2.4/bin/sqsub -o sorted-lane1.marked.bam --memperproc=20G -r 7d \
    > java -jar ./MarkDuplicates.jar INPUT=sorted-lane2.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT
    Last edited by biochemMScstudent; 11-09-2012, 07:06 AM. Reason: typo

  • #2
    Somehow some debug information has ended up at the end of your BAM file. One possible cause is that the debugging output was written to stdout (which was piped to the BAM file) instead of stderr. Or it could just be a Picard bug where the debugging output has wrongly been written to the output file. But I am just guessing here.

    I'd ask the Picard developers about this if I were you...
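
One way to confirm that extra text follows the EOF marker is to look for the marker in the last few kilobytes of the file and report whatever comes after it. The sketch below assumes the standard 28-byte BGZF end-of-file block from the SAM/BAM specification (the same bytes visible in the hexdump above); the function names `read_tail` and `find_eof_marker` are made up for this example. Searching the tail for the byte pattern is a heuristic, not a full BGZF parse.

```python
import os

# Standard BGZF end-of-file marker: an empty 28-byte gzip block.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def read_tail(path, n=4096):
    """Return the last n bytes of the file at path (or the whole file if shorter)."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - n))
        return fh.read()


def find_eof_marker(tail):
    """Locate the last EOF marker in tail.

    Returns (offset, trailing): offset is the marker's position within tail,
    or None if it is absent; trailing is every byte after the marker.
    An intact BAM should give an empty trailing value.
    """
    pos = tail.rfind(BGZF_EOF)
    if pos == -1:
        return None, b""
    return pos, tail[pos + len(BGZF_EOF):]


if __name__ == "__main__":
    # Synthetic stand-in for a damaged file: marker followed by log text.
    fake_tail = b"\x00data" + BGZF_EOF + b"[Wed Nov 07] done.\n"
    offset, trailing = find_eof_marker(fake_tail)
    print("marker offset:", offset)
    print("bytes after marker:", trailing)
```

For a real file, `find_eof_marker(read_tail("problem.bam"))` would be the equivalent call; a non-empty second value means something was appended after the BAM data ended.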



    • #3
      OK, thanks very much. I was able to run the command without submitting a job, so I did get the data I need. In the future, if my files are even larger, I may have no choice but to submit Picard commands as jobs rather than running them directly on the command line, so I will look into notifying the Picard developers about this. Thanks again!



      • #4
        Hmm. Perhaps the qsub did something unexpected with the stdout/stderr pipes. You could try a simple wrapper script (qsub the shell script, the shell script calls Picard). That may solve it.
        Last edited by maubp; 11-12-2012, 07:02 AM. Reason: markup
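
The reply above suggests a shell wrapper; the same idea is sketched here in Python. It assumes that giving MarkDuplicates an explicit `OUTPUT=` file and sending the program's stdout and stderr to a separate log keeps log text out of the BAM. The helper names and file names are hypothetical, and whether this resolves the scheduler behaviour described in this thread is untested.

```python
import os
import subprocess
import sys
import tempfile


def build_picard_command(jar, input_bam, output_bam, metrics):
    """Build a MarkDuplicates command that names its output file explicitly."""
    return [
        "java", "-jar", jar,
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",  # BAM goes to a named file, not to stdout
        f"METRICS_FILE={metrics}",
        "CREATE_INDEX=true",
        "VALIDATION_STRINGENCY=LENIENT",
    ]


def run_with_log(cmd, log_path):
    """Run cmd with stdout and stderr both written to log_path; return the exit code."""
    with open(log_path, "wb") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Harmless stand-in for Picard so the redirection can be tried anywhere.
    child = [
        sys.executable, "-c",
        "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
    ]
    log_path = os.path.join(tempfile.mkdtemp(), "picard.log")
    print("exit code:", run_with_log(child, log_path))
    with open(log_path) as fh:
        print(fh.read())
```

The job scheduler would then be asked to run this wrapper, with its own `-o` file used only for scheduler output rather than as the BAM destination.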
