  • picard mark duplicates

    Hi,

    I'm using Picard to mark duplicates. This has worked fine for me previously on Agilent 50Mb exomes using hg18. I've now updated to Agilent 51Mb version 4 and hg19, which also works fine.

    However, when I try to run the Agilent 50Mb exomes aligned with novoalign on hg19, I get the output below. What does the "Unknown Library" line mean? (I usually only get one "Library" line.) And why do I not get a histogram?

    thanks for any help,
    Jane

    ## net.sf.picard.metrics.StringHeader
    # net.sf.picard.sam.MarkDuplicates INPUT=S1_sorted.bam OUTPUT=S1_novoalign.bam METRICS_FILE=S1_metrics.out TMP_DIR=tmp2 VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=2000000 REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
    ## net.sf.picard.metrics.StringHeader
    # Started on: Tue Oct 09 19:07:47 GMT 2012

    ## METRICS CLASS net.sf.picard.sam.DuplicationMetrics
    LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
    Unknown Library 560890 13587410 3485484 118573 1075343 0 0.081817 81249772
    Library 663607 14277689 3709621 149188 1179918 0 0.08587 81556300

  • #2
    (In reply to the original post by jgSoton, above.)

    Hi Guys,

    I have the same issue when running a BAM file produced by merging 3 different samples.
    Any ideas?
    Thanks in advance.

    Cheers,

    Fernando



    • #3
      Hi Jane,

      As you could tell from my previous post, I had the same issue. I have found a possible solution to your problem.
      I understand you aligned your files using novoalign, and I do not know whether it creates the same problem as bowtie.
      I aligned my files using bowtie1, which I thought correctly added read metadata such as library, platform and sample information. The read groups look OK if you check the header using samtools view -H yourbam.file. However, if you check read by read for the RG:Z: tag in your BAM (e.g., samtools view yourbam.file | less), you will not find it.
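
The header-versus-read check described above can be sketched in Python. This is only an illustration, assuming the input is SAM text such as `samtools view -h yourbam.file` would print; the function name `reads_per_library` and the demo records are made up for this example. It also assumes, based on the metrics output in this thread, that reads with no usable read group are the ones reported under "Unknown Library".

```python
from collections import Counter

def reads_per_library(sam_lines):
    """Count reads per library from SAM text (header lines plus records)."""
    rg_to_library = {}   # read group ID -> LB value from the @RG header lines
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@RG":
                tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
                if "ID" in tags:
                    rg_to_library[tags["ID"]] = tags.get("LB")
            continue
        # Optional tags start at column 12 of an alignment record.
        rg = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        library = rg_to_library.get(rg) if rg is not None else None
        # Assumption: no RG tag, or no matching @RG/LB, means "Unknown Library".
        counts[library or "Unknown Library"] += 1
    return counts

# Hypothetical records: r1 carries an RG tag, r2 does not.
demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@RG\tID:lane1\tLB:Library\tSM:S1",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:lane1",
    "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(reads_per_library(demo))
```

If a real file shows counts under both a named library and "Unknown Library", that would be consistent with the two rows in the metrics file above.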

      I solved this by replacing/adding the read group metadata using Picard's AddOrReplaceReadGroups (http://picard.sourceforge.net/comman...laceReadGroups).
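
For reference, a minimal sketch of how such a call could be assembled from Python. The option names (INPUT, OUTPUT, RGID, RGLB, RGPL, RGPU, RGSM) are the ones AddOrReplaceReadGroups uses; the jar location, file names and read group values below are placeholders, not values from this thread.

```python
import shlex

def add_or_replace_read_groups_cmd(input_bam, output_bam, rgid, rglb, rgpl,
                                   rgpu, rgsm, jar="AddOrReplaceReadGroups.jar"):
    """Build the argument list for Picard's AddOrReplaceReadGroups.

    The jar path is an assumption; point it at your own Picard install.
    """
    return [
        "java", "-jar", jar,
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        f"RGID={rgid}",   # read group ID
        f"RGLB={rglb}",   # library
        f"RGPL={rgpl}",   # platform
        f"RGPU={rgpu}",   # platform unit
        f"RGSM={rgsm}",   # sample name
    ]

# Placeholder values for illustration only.
cmd = add_or_replace_read_groups_cmd(
    "yourbam.file", "yourbam.rg.bam",
    rgid="group1", rglb="lib1", rgpl="illumina", rgpu="unit1", rgsm="sample1",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```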

      Please let me know if you need more help and if this solves your problem.

      Cheers,

      Fernando
      Last edited by fjrossello; 11-06-2012, 12:37 AM. Reason: typo/added info



      • #4
        Thanks Fernando,

        I just managed to work around this problem myself yesterday. It does seem to be a result of merging BAM files in samtools, which does not keep the read group information consistent across all reads.

        I have used the -r option in samtools merge (without specifying a text file containing the read groups). This seems to give me my metrics output from Picard, but I'm not sure what I'm doing to the read groups?! Since I am not using GATK it doesn't matter to me too much; samtools mpileup still seems to give the correct sample ID in the *.vcf file.
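
As far as the samtools documentation goes, merge -r attaches an RG tag to each alignment, with the value inferred from the input file name. The Python sketch below mimics that idea on SAM text. The exact naming rule (base name with a trailing .bam removed) and the replacement of any existing RG tag are assumptions here, and `inferred_rg` and `tag_record` are hypothetical helper names.

```python
import os

def inferred_rg(path):
    """Derive a read group value from a file name (assumed rule: basename minus .bam)."""
    name = os.path.basename(path)
    return name[:-4] if name.endswith(".bam") else name

def tag_record(sam_record, path):
    """Return the SAM record with its RG tag set to the value inferred from path."""
    fields = sam_record.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]
    # Drop any existing RG tag so every read from this file ends up with the same one.
    optional = [f for f in optional if not f.startswith("RG:Z:")]
    optional.append("RG:Z:" + inferred_rg(path))
    return "\t".join(mandatory + optional)

# Hypothetical record and path.
record = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:old"
print(tag_record(record, "/data/S1_sorted.bam"))
```

Note that a per-read tag like this does not by itself add @RG header lines with library (LB) information, so checking the merged header with samtools view -H is still worthwhile.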

        I think the picard option of add/replace readgroups would be a better solution. Thanks for your response.

        Jane
