
  • #31
    I am not affiliated with VAAST in any way, but I have used it extensively and absolutely love it. I don't want to derail this thread in any way but I can certainly answer questions about it.

    Comment


    • #32
      Originally posted by mirabilia View Post
      thanks a lot ulz_peter!
      Could you please clarify which steps of your pipeline are specific to diploid genomes, so that I can customize it for my purposes?
      I haven't found it yet, but there was a statement on the GATK homepage that the options described there (which are basically pretty much the same as the ones I use) only work for diploid genomes, and that expected shifts of allele frequency must be addressed. So the question is: what are you planning to do? Find rare alleles within some strains, sequence a genetically homogeneous strain...?

      Comment


      • #33
        Problem with SNP-calling

        Following ulz_peter's original doc, I ran into a problem when doing the SNP-calling.

        java -Xmx4g -jar /path/GenomeAnalysisTK-1.1-35-ge253f6f/GenomeAnalysisTK.jar \
        -glm BOTH \
        -R hg18.fa \
        -T UnifiedGenotyper \
        -I myinput.marked.realigned.fixed.recal.bam \
        -D dbsnp132_hg18.txt \
        -o myoutput.snps.vcf \
        -metrics snps.metrics \
        -stand_call_conf 50.0 \
        -stand_emit_conf 10.0 \
        -dcov 1000 \
        -A DepthOfCoverage \
        -A AlleleBalance \
        -L hg18_exonIntervals.bed

        This "-L" option does not work.
        I got the hg18_exonIntervals.bed from UCSC as ulz_peter's original doc shows.
        I then ran the SNP-calling without the "-L" line.
        After that, the variant quality score recalibration step did not work and generated an empty output.tranches file.

        Can somebody help me out? Thanks a lot.
        Last edited by liu_xt005; 10-17-2011, 10:19 AM.

        Comment


        • #34
          What is the error message when you specify the -L argument?

          I actually stopped using the Variant quality Score recalibration as it often did not work out for me (I never work on more than 2 exomes at a time).

          I put the version without the recalibration in the SEQanswers Wiki/How-To section. You may want to have a look there, as I will update that in the future and stop uploading newer versions of the PDF file...

          Comment


          • #35
            Originally posted by liu_xt005 View Post
            Following ulz_peter's original doc, I ran into a problem when doing the SNP-calling.

            java -Xmx4g -jar /path/GenomeAnalysisTK-1.1-35-ge253f6f/GenomeAnalysisTK.jar \
            -glm BOTH \
            -R hg18.fa \
            -T UnifiedGenotyper \
            -I myinput.marked.realigned.fixed.recal.bam \
            -D dbsnp132_hg18.txt \
            -o myoutput.snps.vcf \
            -metrics snps.metrics \
            -stand_call_conf 50.0 \
            -stand_emit_conf 10.0 \
            -dcov 1000 \
            -A DepthOfCoverage \
            -A AlleleBalance \
            -L hg18_exonIntervals.bed

            This "-L" option does not work.
            I got the hg18_exonIntervals.bed from UCSC as ulz_peter's original doc shows.
            I then ran the SNP-calling without the "-L" line.
            After that, the variant quality score recalibration step did not work and generated an empty output.tranches file.

            Can somebody help me out? Thanks a lot.
            Did you try with -B:targetIntervals,BED hg18_exonIntervals.bed?

            By the way I couldn't figure out how to use this on version 1.2

            Comment


            • #36
              Originally posted by ulz_peter View Post
              I haven't found it yet, but there was a statement on the GATK homepage that the options described there (which are basically pretty much the same as the ones I use) only work for diploid genomes, and that expected shifts of allele frequency must be addressed. So the question is: what are you planning to do? Find rare alleles within some strains, sequence a genetically homogeneous strain...?
              The first part of the analysis, up to the local realignment around indels, is of course suitable for all the genomes I have. Quality score recalibration and SNP calling need customization, and I'm still thinking about that... Actually, I'd like to catch rare alleles in sequenced bacterial populations, such as repeats, tandem repeats and structural variants, but I suspect it's a really difficult task to discriminate between real rare variants and the noise in such data.
              Obviously, any kind of suggestion is really appreciated!

              Comment


              • #37
                -L option problem solved

                Originally posted by ulz_peter View Post
                What is the error message when you specify the -L argument?

                I actually stopped using the Variant quality Score recalibration as it often did not work out for me (I never work on more than 2 exomes at a time).

                I put the version without the recalibration in the SEQanswers Wiki/How-To section. You may want to have a look there, as I will update that in the future and stop uploading newer versions of the PDF file...
                Dear ulz_peter and raonyguimaraes,

                Thanks VERY MUCH to both of you.
                The problem seems to be solved by removing the random and hap intervals from the .bed file.

                ulz_peter,
                raonyguimaraes posted a similar pipeline by Gayle Philip.
                SNPs and Indels are recalibrated/filtered separately and combined after.
                I am trying that, and think that it is a good idea to exclude Indels from Gaussian models.
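
For anyone hitting the same "-L" problem, here is a minimal sketch of one way to drop the random and hap intervals from a BED file. It assumes the unwanted records are the ones whose chromosome name (first column) contains "_random" or "_hap", as in UCSC hg18 contig names; the function name and the output file name are made up for illustration.

```shell
# Print only the BED records whose chromosome name (column 1) does not
# contain "_random" or "_hap". Takes the BED file path as its argument
# and writes the surviving lines to stdout.
# Assumption: unwanted contigs are named like chr1_random or chr6_cox_hap1.
filter_bed_contigs() {
    awk '$1 !~ /_random|_hap/' "$1"
}

# Example usage (output file name is a placeholder):
# filter_bed_contigs hg18_exonIntervals.bed > hg18_exonIntervals.filtered.bed
```

The filtered file can then be passed to "-L" in place of the original one.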

                Comment


                • #38
                  Great resource!!!
                  Just a few comments:
                  In the picard/MarkDuplicates.jar step, the option 'CREATE_INDEX=true' should be added.
                  With respect to adding read group information, instead of using the bwa sampe -r option, picard AddOrReplaceReadGroups.jar is an easier way to go as it tells you which options are required. Thanks for sharing!

                  Comment


                  • #39
                    You're absolutely right on the first one.
                    For the sampe -r option: I'd like to keep it in the bwa part, as another step handling the BAM file takes a lot more time than just doing it once... but I could add that as an alternative.
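
To make the two read-group options discussed above concrete, here is a rough sketch. The helper name and all read-group values (run1, sampleA, and so on) are hypothetical; the commented commands only show where the string would go and are not taken from the original doc.

```shell
# Build a read-group header line suitable for `bwa sampe -r`.
# Arguments (example values only): ID, sample, platform, library.
# Fields are separated by a literal backslash followed by "t", which is
# the form bwa accepts on the command line.
make_rg_line() {
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\\tLB:%s\n' "$1" "$2" "$3" "$4"
}

# Option 1: set the read group while pairing (file names are placeholders):
# bwa sampe -r "$(make_rg_line run1 sampleA ILLUMINA lib1)" \
#     hg18.fa r1.sai r2.sai r1.fq r2.fq > aligned.sam
#
# Option 2: add it afterwards to an existing BAM with Picard:
# java -jar /path/to/picard/AddOrReplaceReadGroups.jar INPUT=in.bam OUTPUT=out.bam \
#     RGID=run1 RGSM=sampleA RGPL=ILLUMINA RGLB=lib1 RGPU=unit1
```

Option 1 avoids an extra pass over the BAM file; option 2 reports which fields are required if one is left out.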

                    Comment


                    • #40
                      reference dictionary

                      When I tried to use GATK to do the local realignment according to ulz_peter's instructions, I got this error message: Invalid command line: Failed to load reference dictionary. Could anybody let me know where to get this reference dictionary, and how to use it on the command line?

                      Thanks in advance.

                      Comment


                      • #41
                        Post your command line here... and the full error message

                        Comment


                        • #42
                          Originally posted by emilyjia2000 View Post
                          When I tried to use GATK to do the local realignment according to ulz_peter's instructions, I got this error message: Invalid command line: Failed to load reference dictionary. Could anybody let me know where to get this reference dictionary, and how to use it on the command line?

                          Thanks in advance.
                          Look here: http://www.broadinstitute.org/gsa/wi...ference_genome

                          Comment


                          • #43
                            A quick comment: I would suggest that people do the SAM -> BAM conversion using Picard.
                            In fact, samtools sort generates 79 temporary files for one sample.
                            Last edited by liu_xt005; 10-26-2011, 11:48 AM.

                            Comment


                            • #44
                              Thanks for all of your quick responses. I used the command line:

                              java -Xmx4g -jar /path/to/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -o output.interval -I /path/to/reorder_dedup.bam

                              I already copied ucsc.hg19.dict into the same directory.

                              When I run this command, I get this error message:

                              ##### ERROR ------------------------------------------------------------------------------------------
                              ##### ERROR A USER ERROR has occurred (version 1.2-26-g43b0c98):
                              ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
                              ##### ERROR Please do not post this error to the GATK forum
                              ##### ERROR
                              ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
                              ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
                              ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
                              ##### ERROR
                              ##### ERROR MESSAGE: Invalid command line: Failed to load reference dictionary
                              ##### ERROR ------------------------------------------------------------------------------------------

                              Comment


                              • #45
                                If I'm not wrong, ucsc.hg19.dict should be in the same directory as the reference file.
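
One more thing that may be worth checking, sketched below: as far as I know, GATK looks for a dictionary with the same base name as the reference (hg19.dict for hg19.fa) plus a FASTA index (hg19.fa.fai), so a file called ucsc.hg19.dict next to hg19.fa might not be picked up. The helper name is made up, and the naming convention is an assumption to verify against the GATK documentation.

```shell
# Report which companion files of a reference FASTA are missing.
# Assumed naming convention: <basename>.dict and <reference>.fai
# (e.g. hg19.fa -> hg19.dict and hg19.fa.fai).
# Returns 0 if both files exist, 1 otherwise.
check_reference_companions() {
    ref="$1"
    status=0
    for f in "${ref%.*}.dict" "${ref}.fai"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}

# Possible ways to create the files (tool paths are placeholders):
# java -jar /path/to/picard/CreateSequenceDictionary.jar R=hg19.fa O=hg19.dict
# samtools faidx hg19.fa
```

Running `check_reference_companions hg19.fa` in the reference directory lists whatever is missing before GATK is started.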

                                Comment
