Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
This topic is closed.
X
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Brian,

    I ran into a smaller issue with bbnorm. When trying to input and output separate files for a PE library like this:

    Code:
    bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40
    I receive this error during pass 2:

    Code:
    Exception in thread "main" java.lang.AssertionError: Please do not set 'interleaved=true' with dual input files.
    	at stream.ConcurrentGenericReadInputStream.<init>(ConcurrentGenericReadInputStream.java:132)
    	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:661)
    	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:641)
    	at kmer.KmerCount7MTA.countFastq(KmerCount7MTA.java:355)
    	at kmer.KmerCount7MTA.makeKca(KmerCount7MTA.java:222)
    	at jgi.KmerNormalize.runPass(KmerNormalize.java:1006)
    	at jgi.KmerNormalize.main(KmerNormalize.java:736)
    Setting interleaved=false doesn't change that. Outputting to a single, interleaved file (in1=xxx in2=xxx out=xxx) on the other hand works fine. Any ideas?

    Olaf

    Comment


    • #17
      Olaf,

      Currently, BBNorm uses single interleaved files for temporary storage when using multiple passes. And I have not implemented any way to specify dual files in intermediate stages, since everyone at JGI uses interleaved files for everything.

      You have two options.
      1) You could set "passes=1", which is faster, but I don't recommend it because it doesn't give as good results as 2-pass normalization.
      or
      2) You could specify only a single output file, which will get interleaved reads:

      bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out=R12.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40

      ...Then, if you need to, de-interleave it afterward:

      reformat.sh in=R12.bbnorm.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz

      Sorry for the inconvenience! I'll try to fix that by the next release, though unlike documenting the "qin" flag, this will take more work so no guarantees. Thanks for bringing it to my attention. FYI, the flag "interleaved" has no effect on output, only input.

      -Brian
      Last edited by Brian Bushnell; 06-23-2014, 06:05 PM.

      Comment


      • #18
        Thanks for the info Brian, it wasn't a big issue.

        Olaf

        Comment


        • #19
          Olaf,

          This has been fixed in the latest release, 33.04

          Comment


          • #20
            Excellent, just did a test run. This is very useful software!

            Olaf

            Comment


            • #21
              Possible to add Read Group in BBmap header?

              *sorry, probably wrong thread, I found more activity in the release announcement thread*

              Hello,
              I used BBduk to process my read pairs and then mapped them using BBmap, then sam/bam and sorted.
              I plan to use an alternative to mpileup to process this set (for comparison of the outputs), so I am trying to use GATK tools.

              When running a GATK tool, it reports the error that the readgoup is not found in the header. With other mappers, this is an option (like -R for BWA). I found a methods to manually add readgoup information to the header (such as here), but I have limited linux skills and get errors when trying that approach (command "header" not found). I am also concerned that if I put the wrong RG info, I may pooch a downstream tool.

              Is there a way to make the BBmap output compatible with GATK?
              Last edited by sdmoore; 07-05-2014, 09:34 AM. Reason: wrong thread?

              Comment


              • #22
                sdmoore,

                BBMap does not have an option for setting the readgroup, since I never encountered a situation where I needed it. But if it's useful, I can add it to the next release. The solution in your linked thread looks reasonable and I'm not sure why it didn't work for you; I will let you know if I find a better solution.

                Comment


                • #23
                  @sdmoore: You actually just want AddOrReplaceReadGroups from Picard tools. The command I expect you were going for is "samtools reheader", though that won't really do what you want since read group information is also added to each alignment.

                  @Brian: It would be great if you could add read group support. That'll be needed by anyone doing SNP calling.

                  Comment


                  • #24
                    Thanks Brian and dpryan.
                    I had to give up on bbmap for now, not for this problem (I found the AddOrReplaceReadGroups tool later: I edited the sams or the bams). Rather, the resulting vcf from mpileup on the BBmap alignments were "all over the place" (and took forever to process too), I don't know how else to describe it, large insert calls for a bunch of positions. Viewing the file was no help (tons of insert/asterisks displayed). Same mess from FreeBayes. BWA-mem and Bowtie2 assemblies don't show this and I can easily identify known errors in the reference file with either mpileup or freebayes. The assembly looked more like what I got from cushaw2 (and also dropped). We are at a stage now where we will Sanger sequence a few loci to clear things up (e.g., BWA never shows a collection of mutations that Bowtie2 does). I was hoping to have a third assembler "take sides", but I think it's faster for us to sequence and be sure.

                    Comment


                    • #25
                      sdmoore,

                      By default, BBMap will look for much longer indels than BWA/Bowtie2, over 16000bp. You can limit this with the maxindel flag (e.g. "maxindel=40"). Soft-clipping (via "local" flag) can also reduce erroneous variation calls from chimeric or low-quality reads.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Strategies for Sequencing Challenging Samples
                        by seqadmin


                        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                        03-22-2024, 06:39 AM
                      • seqadmin
                        Techniques and Challenges in Conservation Genomics
                        by seqadmin



                        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                        Avian Conservation
                        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                        03-08-2024, 10:41 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 06:37 PM
                      0 responses
                      8 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, Yesterday, 06:07 PM
                      0 responses
                      8 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-22-2024, 10:03 AM
                      0 responses
                      49 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-21-2024, 07:32 AM
                      0 responses
                      66 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X