  • #76
    ddaneels, for clipping the 3' end I use Sickle (https://github.com/najoshi/sickle). It is generally used for read trimming, as reads tend to have a low quality on the 3' end and the 5' end, so this tool allows trimming based on the quality from the fastq file.
    Regarding the reference file, I usually use a single FASTA file that includes all chromosomes as the reference, instead of one file per chromosome. Mapping against the entire genome takes more time, but unless you want to study a specific region of your read data, it makes more sense to use the complete reference for an exploratory analysis.
    "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature



    • #77
      Trim barcodes

      Would it also be a good idea to trim off the Illumina barcode sequences? And to keep only the barcoded reads?



      • #78
        If the barcodes are still in your sequence, you need to trim them out, as they could cause serious trouble in downstream analyses. However, for our MiSeq we get the index reads as separate fastq files. Are you sure they are in your sequence?



        • #79
          I agree with ulz_peter, I don't think the barcodes are supposed to appear in the reads. Regarding Sickle, you cannot specify exactly which region you want to trim. The only things you can control are the quality threshold (the read is trimmed before and after bases that have a quality below the threshold) and the length threshold (if the trimmed read is shorter than the threshold, it is removed).
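The two thresholds described above can be illustrated with a simplified sketch. This is not Sickle's actual algorithm (Sickle works with a sliding window over the qualities); it only clips low-quality bases from the 3' end and drops reads that end up too short. The function name `trim_3prime`, the Phred+33 encoding, and the two sample reads are assumptions made up for illustration.

```shell
# Simplified 3-prime quality clipping, NOT Sickle's sliding-window algorithm.
# Assumes 4-line FASTQ records with Phred+33 qualities on stdin.
# Usage: trim_3prime MIN_QUALITY MIN_LENGTH < reads.fastq
trim_3prime() {
  awk -v q="$1" -v minlen="$2" '
    # Lookup table from printable character to its ASCII code.
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 1 { name = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
      n = length($0)
      # Walk back from the 3-prime end while the base quality is below q.
      while (n > 0 && ord[substr($0, n, 1)] - 33 < q) n--
      # Keep the read only if enough bases survive the clipping.
      if (n >= minlen) {
        print name; print substr(seq, 1, n); print "+"; print substr($0, 1, n)
      }
    }'
}

# Two made-up reads: r1 has a low-quality tail, r2 is low quality throughout.
printf '@r1\nACGTACGT\n+\nIIIII###\n@r2\nACGT\n+\n####\n' | trim_3prime 20 4
```

With a quality threshold of 20 and a length threshold of 4, the first read keeps its five high-quality bases and the second read is dropped entirely.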



          • #80
            @ulz_peter:
            Thank you very much for the effort of compiling that manual and for updating it as well! It is really useful. While going through it I found out I was on the right track.



            • #81
              CountCovariates error on BAM files

              I am experiencing the same error as described below by blackgore, with bam files that have data and work with samtools and several picard commands.
              The recalibration .csv file is not generated.
              Has there been any explanation or possible solution concerning this issue?

              cheers,
              Sophia

              Originally posted by blackgore:
              In following the workflow mentioned above, I've come up against an error, and I'm wondering if I'm alone in this. Has anyone experienced difficulty with using CountCovariates tool, specifically with errors regarding accessing information from the input BAM file? I've tried this with several samples, but keep getting the same error, "Bad input: Could not find any usable data in the input BAM file(s)"

              (for those interested, the BAM files in question are not empty, and work just fine with samtools view).



              Code:
              java -Xmx16g -jar /$Software/GenomeAnalysisTK-1.3-17-gc62082b/GenomeAnalysisTK.jar -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
              
              INFO  14:01:25,870 HelpFormatter - ---------------------------------------------------------------------------------
              INFO  14:01:25,875 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-17-gc62082b, Compiled 2011/11/18 15:24:46
              INFO  14:01:25,875 HelpFormatter - Copyright (c) 2010 The Broad Institute
              INFO  14:01:25,876 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
              INFO  14:01:25,876 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
              INFO  14:01:25,877 HelpFormatter - Program Args: -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
              INFO  14:01:25,878 HelpFormatter - Date/Time: 2011/11/24 14:01:25
              INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
              INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
              INFO  14:01:26,052 RodBindingArgumentTypeDescriptor - Dynamically determined type of $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf to be VCF
              INFO  14:01:26,064 GenomeAnalysisEngine - Strictness is SILENT
              INFO  14:01:26,815 RMDTrackBuilder - Loading Tribble index from disk for file $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
              INFO  14:01:30,532 MicroScheduler - Running the GATK in parallel mode with 8 concurrent threads
              INFO  14:01:32,326 CountCovariatesWalker - The covariates being used here:
              INFO  14:01:32,327 CountCovariatesWalker -      ReadGroupCovariate
              INFO  14:01:32,327 CountCovariatesWalker -      QualityScoreCovariate
              INFO  14:01:32,327 CountCovariatesWalker -      CycleCovariate
              INFO  14:01:32,328 CountCovariatesWalker -      DinucCovariate
              INFO  14:01:41,189 CountCovariatesWalker - Writing raw recalibration data...
              INFO  14:01:44,145 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
              INFO  14:01:44,146 HttpMethodDirector - Retrying request
              INFO  14:01:44,149 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
              INFO  14:01:44,149 HttpMethodDirector - Retrying request
              INFO  14:01:44,152 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
              INFO  14:01:44,153 HttpMethodDirector - Retrying request
              INFO  14:01:44,155 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
              INFO  14:01:44,155 HttpMethodDirector - Retrying request
              INFO  14:01:44,158 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
              INFO  14:01:44,158 HttpMethodDirector - Retrying request
              ##### ERROR ------------------------------------------------------------------------------------------
              ##### ERROR A USER ERROR has occurred (version 1.3-17-gc62082b):
              ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
              ##### ERROR Please do not post this error to the GATK forum
              ##### ERROR
              ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
              ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
              ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
              ##### ERROR
              ##### ERROR MESSAGE: Bad input: Could not find any usable data in the input BAM file(s).
              ##### ERROR ------------------------------------------------------------------------------------------



              • #82
                Looks like we have the same issue. My problem comes from using tophat as a mapper, which likes to output all mapping qualities as 255. See: https://getsatisfaction.com/gsa/topi...ovariates_tool
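One way to check whether a BAM file is affected is to tally the MAPQ column (field 5 of each SAM record). A minimal sketch, assuming the records are supplied as SAM text (for a real file, for example through `samtools view`); the function name `mapq_counts` and the three sample records are made up for illustration.

```shell
# Tally mapping qualities (SAM field 5); header lines starting with @ are skipped.
# Usage with a real file: samtools view sample.bam | mapq_counts
mapq_counts() {
  awk -F '\t' '!/^@/ { count[$5]++ } END { for (m in count) print m "\t" count[m] }' | sort -n
}

# Made-up records: two reads with MAPQ 255 and one with MAPQ 30.
printf '%s\t0\t1\t100\t%s\t4M\t*\t0\t0\tACGT\tIIII\n' r1 255 r2 255 r3 30 | mapq_counts
```

If nearly every read reports 255, the mapper did not provide usable mapping qualities, which would fit the behaviour described in this post.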



                • #83
                  Hi guys,
                  I know there are many experts here, and I hope someone can help me with this question.

                  I got the .csra file for an accession number on NCBI.

                  According to the metadata, it is PAIRED.

                  First, I ran fastq-dump on this csra file. It should produce 2 .fastq files, one for the forward and one for the reverse sequence, right? But I only got one .fastq file.


                  Then I converted the csra file into a .bam file with sam-dump and samtools.

                  Following ulz_peter's great manual, I did the local realignment around indels using the hg19 reference I downloaded from UCSC. An error occurred, saying that the input file reads and the reference have incompatible contigs: no overlapping contigs found.
                  Read contigs=[1,2,...,22, GL000235.1,GL000201.1....]
                  Reference contigs=[chr1,chr2,...,chrM]

                  GL000235.1, GL000201.1, ... are the files I downloaded using the SRA tool, because the csra file contains linked references.

                  So what kind of reference should I use? How can I do that?

                  Thank you !
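The error above means that the contig names in the BAM header and in the reference FASTA do not overlap at all. A quick way to see this is to list both sets of names and intersect them. The sketch below uses tiny made-up stand-ins for the header and the reference, and the function names are hypothetical. Names such as `1`, `2` and `GL000235.1` resemble the GRCh37/b37 naming (as in the `human_g1k_v37.fasta` used earlier in this thread), while UCSC hg19 uses `chr1`, `chr2`, and so on; that the reads were originally aligned against such a reference is an assumption.

```shell
# Contig names from SAM header text (@SQ lines, SN: tag), e.g. from `samtools view -H`.
sam_contigs() {
  awk -F '\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' | sort
}

# Contig names from FASTA text on stdin (first word after ">").
fasta_contigs() {
  awk '/^>/ { print substr($1, 2) }' | sort
}

workdir=$(mktemp -d)

# Made-up stand-ins for the BAM header and the UCSC-style reference.
printf '@SQ\tSN:1\tLN:1000\n@SQ\tSN:2\tLN:900\n@SQ\tSN:GL000235.1\tLN:50\n' | sam_contigs > "$workdir/bam.txt"
printf '>chr1\nACGT\n>chr2\nACGT\n>chrM\nACGT\n' | fasta_contigs > "$workdir/ref.txt"

# comm -12 prints only the names present in both sorted lists.
shared=$(comm -12 "$workdir/bam.txt" "$workdir/ref.txt" | wc -l)
echo "shared contigs: $((shared))"
rm -r "$workdir"
```

A count of zero reproduces the "no overlapping contigs" situation: the realignment needs a reference whose contig names match those in the BAM header.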



                  • #84
                    Hi!
                    I have a problem concatenating the single-chromosome files into a single file.
                    I used the command given in the exome analysis.pdf:
                    cat chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa
                    chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa chrM.fa > hg19.fa

                    I did check (several times) that the files were in this order.

                    I get this error message:
                    $ chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fafahr18.
                    -sh: chr10.fa: command not found

                    and

                    $ chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa chrM.fa > hg19.fa
                    -sh: chr19.fa: command not found

                    I have no idea how to fix this, since I'm not sure what the problem is. Can anyone help?



                    • #85
                      Hi!
                      Make sure you have the whole list of .fa files without any line breaks in between, as each line break will submit what is written as a command. So e.g. if you have prepared your command "offline", like

                      cat chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa [enter]
                      chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa [enter]
                      chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa chrM.fa > hg19.fa

                      the shell will treat the first string on each line (cat, chr10.fa and chr19.fa here) as the command to execute, and that is why you get these errors.

                      cheers,
                      Sophia
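If the command has to be split over several lines, ending each unfinished line with a backslash tells the shell that the command continues on the next line. A minimal sketch using tiny made-up stand-in files in a temporary directory (the real chromosome files would be the ones downloaded from UCSC):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up one-record stand-ins for the per-chromosome FASTA files.
for c in chr1 chr2 chr10 chrX; do
  printf '>%s\nACGT\n' "$c" > "$c.fa"
done

# The trailing backslash continues the command on the next line, so the shell
# sees a single cat command instead of trying to run chr10.fa as a command.
cat chr1.fa chr2.fa \
    chr10.fa chrX.fa > hg19.fa

# List the sequence headers to confirm the order in the combined file.
grep '^>' hg19.fa
```

The backslash must be the very last character on the line; a space after it breaks the continuation.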



                      • #86
                        Thank you! I'm not used to working with command lines, so I keep making all kinds of stupid mistakes.



                        • #87
                          I’m a bit embarrassed to ask for help again this soon. I’m now trying to use GATK. This is a new program to me and I ran into problems straight away. I followed the instructions in the PDF file and used the command:

                          java -Xmx4m -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -o input.bam.list -I input.AS.bam

                          and got a huge error message (below). If anyone could tell me what this error message means, I would be very grateful!

                          ##### ERROR ------------------------------------------------------------------------------------------
                          ##### ERROR stack trace
                          java.lang.ExceptionInInitializerError
                          at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.<init>(GenomeAnalysisEngine.java:146)
                          at org.broadinstitute.sting.gatk.CommandLineExecutable.<init>(CommandLineExecutable.java:53)
                          at org.broadinstitute.sting.gatk.CommandLineGATK.<init>(CommandLineGATK.java:55)
                          at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
                          Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
                          at org.reflections.Reflections.scan(Reflections.java:166)
                          at org.reflections.Reflections.<init>(Reflections.java:91)
                          at org.broadinstitute.sting.utils.classloader.PluginManager.<clinit>(PluginManager.java:79)
                          ... 4 more
                          Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
                          at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
                          at java.util.concurrent.FutureTask.get(Unknown Source)
                          at org.reflections.Reflections.scan(Reflections.java:162)
                          ... 6 more
                          Caused by: java.lang.OutOfMemoryError: Java heap space
                          at javassist.bytecode.ExceptionTable.<init>(ExceptionTable.java:58)
                          at javassist.bytecode.CodeAttribute.<init>(CodeAttribute.java:108)
                          at javassist.bytecode.AttributeInfo.read(AttributeInfo.java:78)
                          at javassist.bytecode.MethodInfo.read(MethodInfo.java:498)
                          at javassist.bytecode.MethodInfo.<init>(MethodInfo.java:79)
                          at javassist.bytecode.ClassFile.read(ClassFile.java:716)
                          at javassist.bytecode.ClassFile.<init>(ClassFile.java:85)
                          at org.reflections.adapters.JavassistAdapter.createClassObject(JavassistAdapter.java:86)
                          at org.reflections.adapters.JavassistAdapter.createClassObject(JavassistAdapter.java:22)
                          at org.reflections.scanners.AbstractScanner.scan(AbstractScanner.java:38)
                          at org.reflections.Reflections$2.run(Reflections.java:149)
                          at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                          at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
                          at java.util.concurrent.FutureTask.run(Unknown Source)
                          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                          at java.lang.Thread.run(Unknown Source)
                          ##### ERROR ------------------------------------------------------------------------------------------
                          Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
                          at java.util.Arrays.copyOfRange(Unknown Source)
                          at java.lang.String.<init>(Unknown Source)
                          at java.util.Properties.loadConvert(Unknown Source)
                          at java.util.Properties.load0(Unknown Source)
                          at java.util.Properties.load(Unknown Source)
                          at java.util.PropertyResourceBundle.<init>(Unknown Source)
                          at java.util.ResourceBundle$Control.newBundle(Unknown Source)
                          at java.util.ResourceBundle.loadBundle(Unknown Source)
                          at java.util.ResourceBundle.findBundle(Unknown Source)
                          at java.util.ResourceBundle.findBundle(Unknown Source)
                          at java.util.ResourceBundle.findBundle(Unknown Source)
                          at java.util.ResourceBundle.getBundleImpl(Unknown Source)
                          at java.util.ResourceBundle.getBundle(Unknown Source)
                          at org.broadinstitute.sting.utils.text.TextFormattingUtils.loadResourceBundle(TextFormattingUtils.java:104)
                          at org.broadinstitute.sting.gatk.CommandLineGATK.getVersionNumber(CommandLineGATK.java:135)
                          at org.broadinstitute.sting.commandline.CommandLineProgram.exitSystemWithError(CommandLineProgram.java:342)
                          at org.broadinstitute.sting.commandline.CommandLineProgram.exitSystemWithError(CommandLineProgram.java:398)
                          at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:110)



                          • #88
                            Sini, it says you have an out of memory error. Try:

                            java -Xmx4000m -jar

                            instead of

                            java -Xmx4m -jar
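The `m` suffix of `-Xmx` means megabytes, so `-Xmx4m` caps the Java heap at 4 MB, which the stack trace above shows is not even enough for GATK to start up. The hypothetical helper below only does the arithmetic to make the difference visible; it is not part of Java or GATK.

```shell
# Convert a JVM-style size such as 4m, 4000m or 4g to bytes (k/m/g are powers of 1024).
heap_bytes() {
  num=${1%[kKmMgG]}   # strip the unit suffix, if any
  case "$1" in
    *[kK]) echo $((num * 1024)) ;;
    *[mM]) echo $((num * 1024 * 1024)) ;;
    *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
    *)     echo "$num" ;;
  esac
}

echo "-Xmx4m    = $(heap_bytes 4m) bytes"
echo "-Xmx4000m = $(heap_bytes 4000m) bytes"
echo "-Xmx4g    = $(heap_bytes 4g) bytes"
```

The corrected setting is a thousand times larger than the original one; `-Xmx4g` would be a similar, slightly larger choice.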



                            • #89
                              Thank you!



                              • #90
                                why remove genes with multiple variants

                                Originally posted by raonyguimaraes:
                                Hello all, I have good news ...

                                After using annovar, I finally got to the number of 22709 variants on my data.

                                From there I'm now trying to filter based on this approach:


                                The numbers are pretty close so I think I'm on the right track

                                22709 Variants
                                11.179 Variants
                                4766 Variants
                                4222 Variants
                                removed frequency > 0.01
                                878 Variants
                                427 Variants
                                Hi, raonyguimaraes, why did you remove genes with multiple variants in your last 2 steps?
