  • Originally posted by Jane M View Post
    Not yet...
    I think I might have figured out at least part of your question with regard to the file recal_data.grp. If you look at the GATK methods and workflow page under the "Base Quality Score Recalibrator" section, it shows the recal_data.grp being used as part of the -BQSR parameter:

    Code:
    java -jar GenomeAnalysisTK.jar \
       -T PrintReads \
       -R reference.fasta \
       -I input.bam \
       -BQSR recalibration_report.grp \
       -o output.bam

    Interestingly, the documentation for PrintReads itself doesn't list the -BQSR parameter...


    • Originally posted by Jane M View Post
      Thank you AJERYC!

      Because of some troubles with my version of dbSNP, I haven't managed to run:


      but I am still wondering whether I should run the PrintReads step, since I only have one BAM file, and whether my recalibrated BAM file will be the recal_data.grp file. Any ideas?
      I'm not sure we are running the same version of GATK. For base quality score recalibration I use the following two commands:

      java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -I input.marked.realigned.fixed.bam -R hg19/hg19.fa -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv -knownSites:dbsnp,VCF dbsnp135.hg19.vcf

      java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -l INFO -R hg19.fa -I input.marked.realigned.fixed.bam -T TableRecalibration --out input.marked.realigned.fixed.recal.bam -recalFile input.recal_data.csv


      As you can see, I get two files: the recal_data file from the first command and the BAM file from the second. Maybe you are missing the second command, and that is why you don't get the BAM file.


      • Originally posted by AJERYC View Post
        I'm not sure we are running the same version of GATK. For base quality score recalibration I use the following two commands:

        java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -I input.marked.realigned.fixed.bam -R hg19/hg19.fa -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv -knownSites:dbsnp,VCF dbsnp135.hg19.vcf

        java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -l INFO -R hg19.fa -I input.marked.realigned.fixed.bam -T TableRecalibration --out input.marked.realigned.fixed.recal.bam -recalFile input.recal_data.csv


        As you can see, I get two files: the recal_data file from the first command and the BAM file from the second. Maybe you are missing the second command, and that is why you don't get the BAM file.
        The point is that we are not using the same version. You probably have a version before v2.0 and I have a version after 2.0. As of version 2.0, CountCovariates and TableRecalibration no longer exist. That's a pity, because the process was rather clear: the csv file generated at the CountCovariates step is then used at the TableRecalibration step...
        Last edited by Jane M; 09-07-2012, 02:05 AM.


        • Originally posted by fongchun View Post
          I think I might have figured out at least part of your question with regard to the file recal_data.grp. If you look at the GATK methods and workflow page under the "Base Quality Score Recalibrator" section, it shows the recal_data.grp being used as part of the -BQSR parameter:

          Code:
          java -jar GenomeAnalysisTK.jar \
             -T PrintReads \
             -R reference.fasta \
             -I input.bam \
             -BQSR recalibration_report.grp \
             -o output.bam

          Interestingly, the documentation for PrintReads itself doesn't list the -BQSR parameter...
          Ah, interesting... I had only noticed this example on the PrintReads page (http://www.broadinstitute.org/gatk/g...ntReads.html):
          java -Xmx2g -jar GenomeAnalysisTK.jar \
          -R ref.fasta \
          -T PrintReads \
          -o output.bam \
          -I input1.bam \
          -I input2.bam \
          --read_filter MappingQualityZero
          I hadn't checked the page you suggested. There it is much clearer:
          java -jar GenomeAnalysisTK.jar \
          -T PrintReads \
          -R reference.fasta \
          -I input.bam \
          -BQSR recalibration_report.grp \
          -o output.bam
          The .grp file is used as input, and there is an output BAM file.
          Thanks fongchun!
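
For readers on GATK 2.x, the two-step flow discussed in this exchange can be sketched as below: BaseRecalibrator writes the .grp report, and PrintReads with -BQSR applies it to produce the recalibrated BAM. This is a minimal sketch under assumptions, not the official workflow: the file names are placeholders, the helper names `base_recalibrator_cmd` and `print_reads_cmd` are made up for illustration, and the flags are the ones I understand GATK 2.x to accept (check with -h for your version). As far as I know, -BQSR is an engine-level argument shared by all tools, which would explain why it is missing from the PrintReads page itself.

```python
import shlex

GATK_JAR = "GenomeAnalysisTK.jar"  # assumption: the jar sits in the working directory


def base_recalibrator_cmd(reference, bam, known_sites, report):
    """Step 1: model the covariates and write the recalibration report (.grp)."""
    return ["java", "-jar", GATK_JAR,
            "-T", "BaseRecalibrator",
            "-R", reference,
            "-I", bam,
            "-knownSites", known_sites,
            "-o", report]


def print_reads_cmd(reference, bam, report, out_bam):
    """Step 2: apply the report while rewriting the reads into a new BAM."""
    return ["java", "-jar", GATK_JAR,
            "-T", "PrintReads",
            "-R", reference,
            "-I", bam,
            "-BQSR", report,
            "-o", out_bam]


# Placeholder file names; dbsnp.vcf stands for whatever known-sites VCF is used.
steps = [
    base_recalibrator_cmd("reference.fasta", "input.bam", "dbsnp.vcf", "recal_data.grp"),
    print_reads_cmd("reference.fasta", "input.bam", "recal_data.grp", "output.bam"),
]

# Only print the commands for review. To execute them, pass each list to
# subprocess.run(cmd, check=True).
for cmd in steps:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

So even with a single BAM file both steps are run: the .grp file is only the report, and the recalibrated BAM is whatever PrintReads writes to -o.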


          • GATK -dcov option???

            I have an additional question, following on from raonyguimaraes's post.
            Does anyone know the details of the GATK -dcov option in UnifiedGenotyper? I looked in the GATK manual but could not find much beyond the following:
            -dcov [50 for 4x, 200 for >30x WGS or Whole exome]
            in the link:


            Also, if it is not specified, what default value does this option take?

            If anyone knows, could you please send me a link to the relevant documentation?

            Thanks in advance


            • I am wondering whether variant quality score recalibration, the step after variant calling, is still in use. If I remember correctly, I read somewhere that it is no longer performed. In addition, the publications I have read recently do not mention this step. Do you know why it has been abandoned?
              Or what was the point of recalibrating variant quality after variant calling in the first place, given that base quality scores were already recalibrated before variant calling?


              • Originally posted by Jane M View Post
                I am wondering whether variant quality score recalibration, the step after variant calling, is still in use. If I remember correctly, I read somewhere that it is no longer performed. In addition, the publications I have read recently do not mention this step. Do you know why it has been abandoned?
                Any suggestions?


                • Originally posted by rahilsethi View Post
                  I have an additional question, following on from raonyguimaraes's post.
                  Does anyone know the details of the GATK -dcov option in UnifiedGenotyper? I looked in the GATK manual but could not find much beyond the following:
                  -dcov [50 for 4x, 200 for >30x WGS or Whole exome]
                  in the link:


                  Also, if it is not specified, what default value does this option take?

                  If anyone knows, could you please send me a link to the relevant documentation?

                  Thanks in advance
                  We had some discussion on this in the GATK forum here. Maybe that is of interest to you.

                  cheers,
                  Sophia


                  • Concerning SAM-to-BAM conversion and PCR duplicate removal, is there any reason to prefer Picard over samtools?
                    I tried Picard's SortSam and it seems to take much longer than samtools view + samtools sort.
                    I think I will use samtools, but I would like to know whether Picard has any advantages.
                    Thank you


                    • I get the following error when using GATK to perform local realignment around indels.

                      Does anyone have an idea what went wrong?

                      Code:
                      E:\EXOME DATA ANALYSIS\1 Unzipped fastq>java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -o Ot2363.bam.list -I Ot2363.marked.bam
                      INFO  13:45:07,701 HelpFormatter - --------------------------------------------------------------------------------
                      INFO  13:45:07,710 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-9-gb90951c, Compiled 2012/09/19 21:18:53
                      INFO  13:45:07,710 HelpFormatter - Copyright (c) 2010 The Broad Institute
                      INFO  13:45:07,710 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
                      INFO  13:45:07,712 HelpFormatter - Program Args: -T RealignerTargetCreator -R hg19.fa -o Ot2363.bam.list -I Ot2363.marked.bam
                      INFO  13:45:07,712 HelpFormatter - Date/Time: 2012/09/20 13:45:07
                      INFO  13:45:07,712 HelpFormatter - --------------------------------------------------------------------------------
                      INFO  13:45:07,713 HelpFormatter - --------------------------------------------------------------------------------
                      INFO  13:45:07,720 GenomeAnalysisEngine - Strictness is SILENT
                      INFO  13:45:07,723 ReferenceDataSource - Index file E:\EXOME DATA ANALYSIS\1 Unzipped fastq\hg19.fa.fai does not exist. Trying to create it now.
                      PROGRESS UPDATE: file is 15 percent complete
                      PROGRESS UPDATE: file is 28 percent complete
                      PROGRESS UPDATE: file is 39 percent complete
                      PROGRESS UPDATE: file is 54 percent complete
                      PROGRESS UPDATE: file is 67 percent complete
                      PROGRESS UPDATE: file is 77 percent complete
                      PROGRESS UPDATE: file is 89 percent complete
                      PROGRESS UPDATE: file is 99 percent complete
                      ##### ERROR ------------------------------------------------------------------------------------------
                      ##### ERROR A USER ERROR has occurred (version 2.1-9-gb90951c):
                      ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
                      ##### ERROR Please do not post this error to the GATK forum
                      ##### ERROR
                      ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
                      ##### ERROR Visit our website and forum for extensive documentation and answers to
                      ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
                      ##### ERROR
                      ##### ERROR MESSAGE: Couldn't write file E:\EXOME DATA ANALYSIS\1 Unzipped fastq\hg19.fa.fai because exception The process cannot access the file because another process has locked a portion of the file
                      ##### ERROR ------------------------------------------------------------------------------------------


                      • Originally posted by ddaneels View Post
                        I get the following error when using GATK to perform local realignment around indels.

                        Does anyone have an idea what went wrong?

                        I think the error is here:
                        Couldn't write file E:\EXOME DATA ANALYSIS\1 Unzipped fastq\hg19.fa.fai
                        Check the write permissions on that directory, the free disk space...
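
The log shows GATK trying to create hg19.fa.fai itself and failing on a file lock. One way to sidestep that, offered as a hedged suggestion and not a confirmed fix, is to build the index beforehand (for example with `samtools faidx hg19.fa`) so that GATK only has to read it. The sketch below simply reports which companion files are missing. The function name `missing_reference_files` is hypothetical, and it assumes GATK looks for `<fasta>.fai` next to the FASTA plus a `.dict` file with the FASTA extension replaced.

```python
import os


def missing_reference_files(fasta_path):
    """Return the companion files GATK is assumed to expect that do not exist yet."""
    # FASTA index, e.g. hg19.fa.fai (can be built with `samtools faidx`)
    fai = fasta_path + ".fai"
    # Sequence dictionary, e.g. hg19.dict (can be built with Picard CreateSequenceDictionary)
    seq_dict = os.path.splitext(fasta_path)[0] + ".dict"
    return [path for path in (fai, seq_dict) if not os.path.exists(path)]


if __name__ == "__main__":
    for path in missing_reference_files("hg19.fa"):
        print("missing:", path)
```

If both files are already present before GATK starts, it should not need to write into the reference directory at all.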


                        • Have you got any answer to this issue?

                          Thanks!

                          Wen

                          Originally posted by blackgore View Post
                          In following the workflow mentioned above, I've come up against an error, and I'm wondering if I'm alone in this. Has anyone experienced difficulty with using CountCovariates tool, specifically with errors regarding accessing information from the input BAM file? I've tried this with several samples, but keep getting the same error, "Bad input: Could not find any usable data in the input BAM file(s)"

                          (for those interested, the BAM files in question are not empty, and work just fine with samtools view).



                          Code:
                          java -Xmx16g -jar /$Software/GenomeAnalysisTK-1.3-17-gc62082b/GenomeAnalysisTK.jar -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                          
                          INFO  14:01:25,870 HelpFormatter - ---------------------------------------------------------------------------------
                          INFO  14:01:25,875 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-17-gc62082b, Compiled 2011/11/18 15:24:46
                          INFO  14:01:25,875 HelpFormatter - Copyright (c) 2010 The Broad Institute
                          INFO  14:01:25,876 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
                          INFO  14:01:25,876 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
                          INFO  14:01:25,877 HelpFormatter - Program Args: -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                          INFO  14:01:25,878 HelpFormatter - Date/Time: 2011/11/24 14:01:25
                          INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
                          INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
                          INFO  14:01:26,052 RodBindingArgumentTypeDescriptor - Dynamically determined type of $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf to be VCF
                          INFO  14:01:26,064 GenomeAnalysisEngine - Strictness is SILENT
                          INFO  14:01:26,815 RMDTrackBuilder - Loading Tribble index from disk for file $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                          INFO  14:01:30,532 MicroScheduler - Running the GATK in parallel mode with 8 concurrent threads
                          INFO  14:01:32,326 CountCovariatesWalker - The covariates being used here:
                          INFO  14:01:32,327 CountCovariatesWalker -      ReadGroupCovariate
                          INFO  14:01:32,327 CountCovariatesWalker -      QualityScoreCovariate
                          INFO  14:01:32,327 CountCovariatesWalker -      CycleCovariate
                          INFO  14:01:32,328 CountCovariatesWalker -      DinucCovariate
                          INFO  14:01:41,189 CountCovariatesWalker - Writing raw recalibration data...
                          INFO  14:01:44,145 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,146 HttpMethodDirector - Retrying request
                          INFO  14:01:44,149 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,149 HttpMethodDirector - Retrying request
                          INFO  14:01:44,152 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,153 HttpMethodDirector - Retrying request
                          INFO  14:01:44,155 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,155 HttpMethodDirector - Retrying request
                          INFO  14:01:44,158 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,158 HttpMethodDirector - Retrying request
                          ##### ERROR ------------------------------------------------------------------------------------------
                          ##### ERROR A USER ERROR has occurred (version 1.3-17-gc62082b):
                          ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
                          ##### ERROR Please do not post this error to the GATK forum
                          ##### ERROR
                          ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
                          ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
                          ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
                          ##### ERROR
                          ##### ERROR MESSAGE: Bad input: Could not find any usable data in the input BAM file(s).
                          ##### ERROR ------------------------------------------------------------------------------------------


                          • Hi all, and congratulations on this useful thread.

                            I am trying to reproduce the pipeline posted at the beginning of the thread as an alternative approach to SNP analysis.
                            I am not really experienced in NGS, but I have to deal with exome results coming from a MiSeq sequencer and I would like to improve on (or compare with) the results obtained through the MiSeq software (BWA, CASAVA...).
                            Following the tutorial posted by ulz_peter (thanks again), I performed the initial indexing of the reference genome (hg19) with the latest version of bwa (0.6.2) and obtained 5 files as a result:

                            hg19.amb
                            hg19.ann
                            hg19.bwt
                            hg19.pac
                            hg19.sa

                            According to other threads (http://seqanswers.com/forums/showthread.php?t=20705), the expected number of resulting files seems to be 8. Can I continue with these five files, or would it be better to work with an earlier version of bwa, just to be able to reproduce the pipeline described here?
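
As far as I know, bwa 0.6.x changed its index format and no longer writes the extra reverse-index files that older versions produced, so five files is what this version is expected to give. A quick way to confirm the index is complete before aligning is sketched below. The function name `missing_bwa_index_files` is hypothetical, and the suffix list is simply the five files listed above.

```python
import os

# The five index files reported above for bwa 0.6.2.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(prefix):
    """Return the index files for `prefix` that are absent or empty."""
    missing = []
    for suffix in BWA_INDEX_SUFFIXES:
        path = prefix + suffix
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(path)
    return missing


if __name__ == "__main__":
    print(missing_bwa_index_files("hg19") or "index looks complete")
```

An index built by one bwa version should be used with that same version for the alignment step.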

                            On the other hand, thinking about the next step in the pipeline, the tutorial suggests this BWA alignment option:

                            "the -I option tells BWA to use Illumina1.3+ qualities"

                            but if I am not mistaken, MiSeq FASTQ output is in Sanger format (Illumina 1.8+), so should I use the -I option or not?

                            I know I am asking very basic things but, you know, basic knowledge is crucial to understanding complexity. So I'll be grateful if anyone can help. I promise to keep asking when I have doubts.

                            Thanks in advance
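
On the -I question: in bwa aln, -I tells bwa that the qualities are ASCII-64 (Illumina 1.3+), so it should be left out for Sanger-encoded (ASCII-33, Illumina 1.8+) reads. When in doubt about a FASTQ file, the encoding can be guessed from the quality characters. The sketch below is a heuristic under stated assumptions: any character below ';' (ASCII 59) can only occur with offset 33, and a file whose lowest character is '@' (ASCII 64) or higher is probably offset 64. The function name `guess_phred_offset` and the 10,000-read sample size are my own choices, and the input is assumed to be plain 4-line FASTQ.

```python
import itertools


def guess_phred_offset(fastq_lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from the first `max_reads` records.

    Returns 33, 64, or None when the sampled qualities are ambiguous.
    """
    lowest = None
    # In 4-line FASTQ the quality string is every 4th line, starting at index 3.
    quality_lines = itertools.islice(fastq_lines, 3, max_reads * 4, 4)
    for line in quality_lines:
        qual = line.rstrip("\r\n")
        if not qual:
            continue
        low = min(ord(ch) for ch in qual)
        lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    if lowest < 59:
        return 33   # characters this low only occur with the Sanger offset
    if lowest >= 64:
        return 64   # consistent with Illumina 1.3+ (or uniformly high-quality Sanger reads)
    return None     # 59-63: old Solexa range, or not enough evidence


if __name__ == "__main__":
    sanger = ["@r1", "ACGT", "+", "#5?I"]
    illumina13 = ["@r1", "ACGT", "+", "hhbB"]
    print(guess_phred_offset(sanger), guess_phred_offset(illumina13))
```

The function accepts any iterable of lines, so an open file handle works as well as a list. If it reports 33 for the MiSeq reads, bwa aln would be run without -I.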


                            • I think it's important to add to the manual and the wiki that the VCF file and hg19.fasta
                              are in the GATK bundle, which can be accessed with an FTP client:
                              GATK bundle FTP server:
                              http://gatkforums.broadinstitute.org...lic-ftp-server


                              I think this is an important step to add to the wiki.


                              • bwa index file of hg19

                                Hi,
                                Since building the bwa index for hg19 takes time, is it possible to download a prebuilt version from somewhere?

                                Thanks,

                                Carol

                                Originally posted by ulz_peter View Post
                                Hi Folks,

                                As I was writing a short guide to exome analysis at our institute, I thought it might be of some use to others, especially newbies who need some kind of starting point for analysing exome data (pretty much like the RNA-seq manual I once read in an older thread...). Instead of explaining everything in 100 new threads, one could then point to that manual...

                                It is the way we do exome analysis at our Institute, but I would be happy if people help improve the manual, add their knowledge and expand it, like a common knowledge base for exome-level analysis.

                                I attached the pdf version and a .doc version within a zip folder, as the filesize was too large for uploading the doc file alone.

                                The most updated version can be found in the SeqWiki (http://seqanswers.com/wiki/How-to/exome_analysis)
                                (just to make it clear, it is not regularly updated and its only goal is to get people started on the tools often used in exome sequencing)

                                Any comments highly appreciated!

                                P.S. I added a (very) short visualization chapter
