  • Beginner question about Picard AddOrReplaceReadGroups

    Hi,
    I am just starting to learn how to use Novocraft and GATK, and I reached a point in my analysis where GATK would not accept my file because it has no read groups. I searched the forum, and people have suggested using AddOrReplaceReadGroups from Picard. I was looking at the Picard source page, and I am not sure about some of the required options. I am trying to understand as much as I can about what I am putting through the various tools, but some of the options are unclear to me.
    Can someone help explain what the RGPU option is ("Read Group platform unit (eg. run barcode) Required")? Would I get the RGPU from the raw data, or is it something that comes with the documentation for the sequencer?
    The exome I am analyzing was sequenced by Illumina.
    Any help would be much appreciated. Thanks.
    Not sure if this is useful, but here are the first few lines of my data before alignment (file pt170.fq):
    @HWI-ST132_0459:8:1101:1113:1946#GGCTAC/1
    ATTAGAAAAGTAGATTCACATGGTTTTCCACATGTTAGAGGAATTGATAGAATTCTATTTGAACAAAGGACAGTGTTTAC
    AAATAATAGCAATGCCATAT
    +HWI-ST132_0459:8:1101:1113:1946#GGCTAC/1
    ffffffafd^deeeeaaT`adddddac\Vceeeeefff`fbeee`]KK][c_bc^dad_d]`ddZLKYYUcccY^c_ac_
    b]WabY_]__NZ[[ZcccUc
    @HWI-ST132_0459:8:1101:1247:1955#GGCTAC/1
    TAAATAATTTAAATTTCTGATCATAGCCTATTTTTGATATCACAAGGATGACGTCTTGATCTGATAGGAAGGATAAGATA
    ACAAGAGGGCCTAGACTAGT
    +HWI-ST132_0459:8:1101:1247:1955#GGCTAC/1
    gfgfggggggggggggggggggggfggggdgfegggefggggfgegeegegggcggggcgggecggg\eeeecdeegfee
    ggeggaaad^e^_acdaYd^
    @HWI-ST132_0459:8:1101:1059:1956#GGCTAC/1
    AGTAATGACTTAAATAGACATTCTAATGTGGTGCAAAGCTCACGACTCAATATTGAGTACAAAAAAAAAGCAAGTTGTAT
    GTGTTAGCCCATTCTCACAC
    +HWI-ST132_0459:8:1101:1059:1956#GGCTAC/1
    gggfggfgggggggggg_gegagdggegeeeegggadefeggegdedggaeaeecddZeeebdccgegg_edTb[eeaee
    eeeeedebeeeYbcbf]ccf
    @HWI-ST132_0459:8:1101:1227:1985#GGCTAC/1
    TGAATGACTTTGAGATATGGTGTTGGCACTGAATTAAGACAGGAGAAGACTACTGGTGATCTAAAAGGAAATAGTGTTAT
    AGTAGTAAAGAAGGAATCCA
    +HWI-ST132_0459:8:1101:1227:1985#GGCTAC/1
    ggggggggggggegggegggegegggggffggfgfgggggggfdgaegdggfgggecggeeeg_edfecedaedbfff`f
    egggfeaefeggffcgdgfc

  • #2
    Read groups are stored in your SAM/BAM file, which is generated by the mapping program (in your case, novoalign). If your data are multiplexed, you need to separate the results in the SAM/BAM file by read group. If your data are not multiplexed, it does not matter what strings you pass to AddOrReplaceReadGroups.
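
If you would rather use a meaningful RGPU than an arbitrary string, the read names in the FASTQ above already carry most of the information. What follows is only a rough sketch in Python, not an official recipe. It assumes the read names follow the older Illumina layout instrument_run:lane:tile:x:y#index/mate, and it builds the platform unit as instrument_run.lane.index, which is one common convention rather than a requirement. The helper names, the sample name "pt170", and the library name "lib1" are placeholders made up for illustration, and reusing the platform unit as RGID is likewise just an assumption.

```python
import re

# Assumed layout of older Illumina read names:
#   @<instrument_run>:<lane>:<tile>:<x>:<y>#<index>/<mate>
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/]+)/(?P<mate>[12])$"
)


def parse_header(line):
    """Split one FASTQ header line into its named fields (hypothetical helper)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised read name: " + line)
    return match.groupdict()


def platform_unit(fields):
    """Build a platform unit string as <instrument_run>.<lane>.<index>."""
    return "{instrument}.{lane}.{index}".format(**fields)


def read_group_args(fields, sample, library):
    """Return KEY=value strings for the Picard read-group options.

    The sample and library names are supplied by the caller; using the
    platform unit as RGID is an assumption made for this sketch.
    """
    unit = platform_unit(fields)
    return [
        "RGID=" + unit,
        "RGLB=" + library,
        "RGPL=illumina",
        "RGPU=" + unit,
        "RGSM=" + sample,
    ]


# First read name from the FASTQ excerpt in the question.
header = "@HWI-ST132_0459:8:1101:1113:1946#GGCTAC/1"
fields = parse_header(header)
print(platform_unit(fields))
print(" ".join(read_group_args(fields, sample="pt170", library="lib1")))
```

The printed KEY=value strings can be appended to an AddOrReplaceReadGroups command along with the input and output file options. If a single file mixes several lanes or index sequences, each combination would need its own read group, which this sketch does not handle.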
