  • Read trimming and Picard

    Hi,


    Does anyone have a recommended read-trimming software that works with color-space data?


    Also, I'm not trying to repost, but I'm getting some odd errors and the help email list for SamTools seems dead. What is the source of this error?


    INFO 2010-10-29 09:39:34 MarkDuplicates Read 46000000 records. Tracking 687328 as yet unmatched pairs. 46728 records in RAM. Last sequence index: 9
    INFO 2010-10-29 09:39:45 MarkDuplicates Read 47000000 records. Tracking 686480 as yet unmatched pairs. 32624 records in RAM. Last sequence index: 9
    INFO 2010-10-29 09:40:08 MarkDuplicates Read 48000000 records. Tracking 684660 as yet unmatched pairs. 17477 records in RAM. Last sequence index: 9
    INFO 2010-10-29 09:40:18 MarkDuplicates Read 49000000 records. Tracking 682311 as yet unmatched pairs. 479 records in RAM. Last sequence index: 9
    [Fri Oct 29 09:40:37 CDT 2010] net.sf.picard.sam.MarkDuplicates done.
    Runtime.totalMemory()=772931584
    Exception in thread "main" net.sf.picard.PicardException: Exception writing ReadEnds to file.
    at net.sf.picard.sam.ReadEndsCodec.encode(ReadEndsCodec.java:74)
    at net.sf.picard.sam.ReadEndsCodec.encode(ReadEndsCodec.java:32)
    at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:185)
    at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:140)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:269)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:109)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:93)
    Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at net.sf.picard.sam.ReadEndsCodec.encode(ReadEndsCodec.java:71)
    ... 7 more


    I can't seem to find any documentation on it and nobody answered my last post.

    Finally, I've been reading some previous SEQanswers posts, and I wanted to see if anyone can confirm that samtools removes duplicates based on start/stop position alone and doesn't consider whether the sequences are identical. Is that right?

  • #2
    It seems you ran out of space?

    Code:
    ...
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:93)
    Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    ...
    Regarding samtools, look at this thread with comments from the author.

    It does not consider the sequence. Also, take a look at the mathematical models implemented in samtools. Entries 1.1 and 1.2 detail the chances of getting duplicates at the library and mapping level.
    -drd
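To make "does not consider the sequence" concrete, here is a minimal sketch of position-based duplicate marking. It is not the samtools implementation. The `Read` record, the grouping key (chromosome, start, end, strand), and the rule of keeping the read with the highest mapping quality are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified alignment record (not a real SAM parser)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str
    sequence: str
    mapq: int


def mark_position_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads are grouped by mapping coordinates only; the sequence field is
    never compared. Within each group the read with the highest mapping
    quality is kept (an assumed tie-break) and the rest are flagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.start, read.end, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.mapq)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


example_reads = [
    Read("r1", "chr1", 100, 150, "+", "ACGT", 30),
    Read("r2", "chr1", 100, 150, "+", "ACGA", 40),  # same position, different sequence
    Read("r3", "chr1", 200, 250, "+", "ACGT", 30),  # same sequence as r1, different position
]
example_duplicates = mark_position_duplicates(example_reads)
```

In this sketch r1 is flagged even though its sequence differs from r2, while r3 is kept even though its sequence matches r1, because only the coordinates are compared.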



    • #3
      Originally posted by drio View Post
      It seems you ran out of space?

      Code:
      ...
      at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:93)
      Caused by: java.io.IOException: No space left on device
      at java.io.FileOutputStream.writeBytes(Native Method)
      ...
      Regarding samtools, look at this thread with comments from the author.

      It does not consider the sequence. Also, take a look at the mathematical models implemented in samtools. Entries 1.1 and 1.2 detail the chances of getting duplicates at the library and mapping level.

      Thanks for the reply, Drio. I did run out of space, but I also set the MAX* param to a much higher value with the same end result. I still get the error...



      • #4
        Originally posted by JohnK View Post
        Thanks for the reply, Drio. I did run out of space, but I also set the MAX* param to a much higher value with the same end result. I still get the error...
        Why do you expect that setting the MAX* param would eliminate the "running out of space" error? Now if you said "I ran out and put a new 10 TB RAID-5 disk on my system and slapped on an extra 256 GB of memory with the same end result" then I would be concerned.

        More seriously, assuming you are running on a *nix-based system, it is possible that the program is set to save temporary files in '/tmp'. On many systems '/tmp' is actually backed by memory instead of disk. Thus it is possible to run out of "disk space" even though you have lots of disk space.

        Or you may simply be out of disk space. How much do you have free?
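One quick way to answer the "how much do you have free?" question for the temporary directory is sketched below. It uses Python's standard `shutil.disk_usage`. The helper names `free_space_gib` and `has_room` are made up for this example, and it assumes the directory Python reports as the default temp location (usually '/tmp' on Linux) is the same one the Java program writes to, which may not hold on every system.

```python
import shutil
import tempfile


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def has_room(path, needed_gib):
    """Return True if the filesystem holding `path` has at least `needed_gib` GiB free."""
    return free_space_gib(path) >= needed_gib


if __name__ == "__main__":
    tmp = tempfile.gettempdir()  # usually /tmp on Linux
    print(f"{tmp}: {free_space_gib(tmp):.1f} GiB free")
```

Running the same check against the directory that holds the BAM files makes it easy to see whether '/tmp' is the smaller of the two.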



        • #5
          Originally posted by westerman View Post
          Why do you expect that setting the MAX* param would eliminate the "running out of space" error? Now if you said "I ran out and put a new 10 TB RAID-5 disk on my system and slapped on an extra 256 GB of memory with the same end result" then I would be concerned.

          More seriously, assuming you are running on a *nix-based system, it is possible that the program is set to save temporary files in '/tmp'. On many systems '/tmp' is actually backed by memory instead of disk. Thus it is possible to run out of "disk space" even though you have lots of disk space.

          Or you may simply be out of disk space. How much do you have free?
          It was a similar issue, but my sys-admin found it. One program was eating /tmp and putting it over the top.
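If '/tmp' is shared with other programs, one possible workaround is to point Picard at a different scratch directory. The sketch below only builds the command line and does not run anything. It assumes Picard's `TMP_DIR` option and the JVM's `java.io.tmpdir` property; the jar location, file names, and scratch path are placeholders, not values from this thread.

```python
def build_markduplicates_cmd(jar, bam_in, bam_out, metrics, tmp_dir):
    """Build an argument list that runs MarkDuplicates with temp files in `tmp_dir`.

    All paths are supplied by the caller; nothing here is checked or executed.
    """
    return [
        "java",
        f"-Djava.io.tmpdir={tmp_dir}",  # JVM-level temporary directory
        "-jar", jar,
        f"INPUT={bam_in}",
        f"OUTPUT={bam_out}",
        f"METRICS_FILE={metrics}",
        f"TMP_DIR={tmp_dir}",           # Picard-level temporary directory
    ]


# Placeholder paths for illustration only.
example_cmd = build_markduplicates_cmd(
    jar="MarkDuplicates.jar",
    bam_in="sample.sorted.bam",
    bam_out="sample.dedup.bam",
    metrics="sample.metrics.txt",
    tmp_dir="/scratch/picard_tmp",
)
```

The resulting list can be passed to `subprocess.run(example_cmd, check=True)` once the scratch directory exists and has enough free space.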
