  • picard mark duplicates

    Hi,

    I'm using Picard to mark duplicates. This has worked fine for me previously on Agilent 50Mb exomes using hg18. I've now updated to Agilent 51Mb version 4 and hg19, which also works fine.

    However, when I run the Agilent 50Mb exomes aligned with Novoalign to hg19, I get the output below. What does the "Unknown Library" line mean? (I usually get only one "Library" line.) And why don't I get a histogram?

    Thanks for any help,
    Jane

    ## net.sf.picard.metrics.StringHeader
    # net.sf.picard.sam.MarkDuplicates INPUT=S1_sorted.bam OUTPUT=S1_novoalign.bam METRICS_FILE=S1_metrics.out TMP_DIR=tmp2 VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=2000000 REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
    ## net.sf.picard.metrics.StringHeader
    # Started on: Tue Oct 09 19:07:47 GMT 2012

    ## METRICS CLASS net.sf.picard.sam.DuplicationMetrics
    LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
    Unknown Library 560890 13587410 3485484 118573 1075343 0 0.081817 81249772
    Library 663607 14277689 3709621 149188 1179918 0 0.08587 81556300
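One way to see that the two rows are simply statistics for two separate libraries is to recompute PERCENT_DUPLICATION for each row. The sketch below assumes Picard's usual definition, (UNPAIRED_READ_DUPLICATES + 2 * READ_PAIR_DUPLICATES) / (UNPAIRED_READS_EXAMINED + 2 * READ_PAIRS_EXAMINED), i.e. each pair counts as two reads. The numbers are copied from the metrics above; the function names are only for illustration.

```python
# Two data rows copied from the metrics output above. The real file is
# tab-separated; here we split on whitespace and treat the last eight fields
# as numbers, so a library name containing a space ("Unknown Library") survives.
METRICS_ROWS = """\
Unknown Library 560890 13587410 3485484 118573 1075343 0 0.081817 81249772
Library 663607 14277689 3709621 149188 1179918 0 0.08587 81556300
"""

COLUMNS = [
    "UNPAIRED_READS_EXAMINED", "READ_PAIRS_EXAMINED", "UNMAPPED_READS",
    "UNPAIRED_READ_DUPLICATES", "READ_PAIR_DUPLICATES",
    "READ_PAIR_OPTICAL_DUPLICATES", "PERCENT_DUPLICATION",
    "ESTIMATED_LIBRARY_SIZE",
]


def parse_rows(text):
    """Return {library_name: {column: value}} for each non-empty line."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = " ".join(fields[:-len(COLUMNS)])
        values = [float(v) for v in fields[-len(COLUMNS):]]
        rows[name] = dict(zip(COLUMNS, values))
    return rows


def percent_duplication(m):
    """Duplicated reads / examined reads, counting each pair as two reads."""
    duplicated = m["UNPAIRED_READ_DUPLICATES"] + 2 * m["READ_PAIR_DUPLICATES"]
    examined = m["UNPAIRED_READS_EXAMINED"] + 2 * m["READ_PAIRS_EXAMINED"]
    return duplicated / examined


if __name__ == "__main__":
    for name, m in parse_rows(METRICS_ROWS).items():
        print(f"{name}: recomputed {percent_duplication(m):.6f}, "
              f"reported {m['PERCENT_DUPLICATION']}")
```

Both recomputed values agree with the reported ones (0.081817 and 0.08587), so each row is an independent set of counts for the reads Picard assigned to that library name.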

  • #2
    Hi guys,

    I have the same issue when running a BAM file produced by merging three different samples.
    Any ideas?
    Thanks in advance.

    Cheers,

    Fernando

    • #3
      Hi Jane,

      As you can tell from my previous post, I had the same issue, and I have found a possible solution.
      I understand you aligned your files with Novoalign; I don't know whether it causes the same problem as Bowtie.
      I aligned my files with Bowtie 1, which I thought correctly added read metadata such as library, platform and sample information. It looks OK if you check the read groups in the header with samtools view -H yourbam.file. However, if you check read by read for the RG:Z tag (e.g., samtools view yourbam.file | less), you will not find it.
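To check this read by read, a minimal Python sketch like the one below reads SAM text (the output of samtools view -h) and counts reads per library. It assumes a plain reading of how MarkDuplicates picks a library: it looks up the read's RG:Z tag among the header's @RG lines and uses that line's LB; a read with no RG tag, an RG ID missing from the header, or an @RG line without LB ends up in "Unknown Library". As far as I can tell from the Picard source, the histogram is only written when there is exactly one library, which would also explain the missing histogram. The script name count_rg.py is hypothetical.

```python
import sys
from collections import Counter

# Label used in the metrics file for reads that cannot be tied to a library.
UNKNOWN_LIBRARY = "Unknown Library"


def header_libraries(header_lines):
    """Map each @RG ID in the header to its LB value (None if LB is absent)."""
    libraries = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        fields = {}
        for item in line.rstrip("\n").split("\t")[1:]:
            key, _, value = item.partition(":")
            fields[key] = value
        libraries[fields.get("ID")] = fields.get("LB")
    return libraries


def read_group_of(record):
    """Return the RG:Z value of one SAM record line, or None if it has none."""
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for tag in record.rstrip("\n").split("\t")[11:]:
        if tag.startswith("RG:Z:"):
            return tag[len("RG:Z:"):]
    return None


def count_libraries(sam_lines):
    """Count reads per library, using UNKNOWN_LIBRARY when no LB can be found."""
    header, libraries, counts = [], None, Counter()
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        if libraries is None:  # the header is complete once records start
            libraries = header_libraries(header)
        rg = read_group_of(line)
        library = libraries.get(rg) if rg is not None else None
        counts[library or UNKNOWN_LIBRARY] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: samtools view -h yourbam.file | python count_rg.py -
    source = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with source:
        for library, n in count_libraries(source).most_common():
            print(f"{library}\t{n}")
```

If more than one line comes out, or any reads land in "Unknown Library", the header and the per-read tags disagree in the way described above.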

      I solved this by replacing/adding the read metadata with Picard's AddOrReplaceReadGroups (http://picard.sourceforge.net/comman...laceReadGroups).
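AddOrReplaceReadGroups replaces all read groups in the file with a single new one and assigns every read to it. The sketch below only illustrates that idea on SAM text (for example from samtools view -h); it is not Picard's implementation, and the function name and the ID/LB/SM/PL values in the test are hypothetical. For real data, use the Picard tool itself.

```python
def add_or_replace_read_group(sam_lines, rg_id, library, sample,
                              platform="ILLUMINA"):
    """Yield SAM lines with every old @RG header line dropped, one new @RG
    line added, and every record's RG tag set to rg_id.

    Simplified illustration only; not Picard's implementation.
    """
    new_rg = f"@RG\tID:{rg_id}\tLB:{library}\tSM:{sample}\tPL:{platform}\n"
    rg_written = False
    for line in sam_lines:
        if line.startswith("@"):
            if not line.startswith("@RG"):  # keep @HD, @SQ, @PG, @CO lines
                yield line
            continue
        if not rg_written:  # header is finished: emit the single new @RG line
            yield new_rg
            rg_written = True
        fields = line.rstrip("\n").split("\t")
        # Keep the 11 mandatory columns and every optional tag except RG.
        tags = [t for t in fields[11:] if not t.startswith("RG:")]
        yield "\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"]) + "\n"
    if not rg_written:  # header-only input
        yield new_rg
```

The key point is that the header's @RG line (with LB) and the RG:Z tag on every read are set together, which is exactly what was missing in the merged files.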

      Please let me know if you need more help and if this solves your problem.

      Cheers,

      Fernando
      Last edited by fjrossello; 11-06-2012, 12:37 AM. Reason: typo/added info

      • #4
        Thanks Fernando,

        I just managed to work around this problem myself yesterday. It does seem to result from merging BAM files with samtools, which doesn't keep the read group info the same for all reads.

        I used the -r option of samtools merge (without specifying a text file containing the read groups). This seems to give me my metrics output from Picard, but I'm not sure what I'm doing to the read groups! Since I am not using GATK it doesn't matter to me too much; samtools mpileup still seems to give the correct sample ID in the *.vcf file.
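For anyone wondering what -r does: the samtools merge manual says it attaches an RG tag to each alignment, with the tag value inferred from the file name. The sketch below mimics that on SAM record lines, assuming the ID is the file name with the directory and a trailing ".bam" removed, and that any existing RG tag is replaced; both are my reading, not something stated in this thread. The helper names are hypothetical. Whether matching @RG lines (with an LB) end up in the merged header depends on your samtools version and on whether you pass a header file with -h, so it is worth checking with samtools view -H afterwards.

```python
import os


def rg_from_filename(path):
    """Read-group ID inferred from a file name, as -r is documented to do.
    Assumption: directory and a trailing '.bam' are stripped."""
    name = os.path.basename(path)
    return name[:-len(".bam")] if name.endswith(".bam") else name


def tag_records(path, records):
    """Yield each SAM record line from `path` with its RG tag set to the
    file-derived ID (any existing RG tag is replaced; an assumption)."""
    rg = rg_from_filename(path)
    for rec in records:
        fields = rec.rstrip("\n").split("\t")
        # Keep the 11 mandatory columns and all optional tags except RG.
        kept = fields[:11] + [t for t in fields[11:] if not t.startswith("RG:")]
        yield "\t".join(kept + [f"RG:Z:{rg}"]) + "\n"
```

So every read from S1_sorted.bam would carry RG:Z:S1_sorted, which gives each input file its own read group even if the header says nothing about its library.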

        I think Picard's AddOrReplaceReadGroups would be a better solution. Thanks for your response.

        Jane
