Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Thanks for the tool. In comparison to `cutadapt`, I noticed that `bbduk` (using the options below) ran 20x faster, and excluded all of the over-represented sequences that were flagged by FastQC.

    ```
    bbmap/bbduk.sh in1=file_1.fq.gz in2=file_2.fq.gz out1=file_1_trim.fq.gz out2=file_2_trim.fq.gz ref=adapters ktrim=r k=23 mink=11 hdist=1 tpe tbo
    ```​

    Are the above options sensible for trimming paired end Illumina RNA-seq?

    Should I be using `ktrim=rl` or a lower `k`?

    I did not see adapters.fa file in the bbmap folder, so I just used the 'adapters' string as mentioned in the --help description of ref.




    I am new to processing this omic at the read level. Thank you
    Last edited by Kermit; 07-04-2023, 01:15 PM.

    Comment



    • Hello,

      I'm noticing some non-reproducibility when bbduk is writing gzipped files. The reads appear to be the same, but the output order has changed. This leads to subtle changes in metaphlan abundance levels (changes at the 4th significant didget). Is this intended behavior or a bug? Is there anyway to force bbduk to run in a fully reproducible way?

      ```
      bbduk.sh in=DC001_R1.fastq.gz out=DC001_R1.rep1.fastq.gz ftm=5 tpe tbo qtrim=rl trimq=25 minlen=50 ref=adapters,phix -Xmx16g
      bbduk.sh in=DC001_R1.fastq.gz out=DC001_R1.rep2.fastq.gz ftm=5 tpe tbo qtrim=rl trimq=25 minlen=50 ref=adapters,phix -Xmx16g
      ​```


      Comment


      • Hi,

        Getting this strange error running mapPacBio.sh. The cluster node I'm running it on has 104 cpus, 500GB mem, and 400GB storage. I was getting the same error using 104 cpus so tried lowering it to 48.

        The read fasta is 56 GB and the reference fasta is 95 MB.

        I've cut down the length of the sequence and 'Sm' string shown.

        Thanks,

        S.

        mapPacBio.sh ref=$CPREF ambig=all nodisk=t nzo=t mappedonly=t minid=0.85 out=bact-bbmap.sam scafstats=cp_bb.mapstats in=$reads fastareadlen=6000 threads=48 maxindel=100 k=15

        Loading python3/3.9.2
        Loading requirement: intel-mkl/2020.3.304
        java -ea -Xmx229788m -cp /g/data/nm31/bin/bbmap/current/ align2.BBMapPacBio build=1 overwrite=true minratio=0.40 fastareadlen=6000 ambiguous=best minscaf=100 startpad=10000 stoppad=10000 midpad=6000 ref=/home/554/ta0341/dy44/r12.18_Roepera_similis/cp-fabids.fasta ambig=all nodisk=t nzo=t mappedonly=t minid=0.75 out=bact-bbmap.sam scafstats=bact_bb.mapstats in=/home/554/ta0341/dy44/r12.18_Roepera_similis/reads.fasta fastareadlen=6000 threads=48 maxindel=100 k=15
        Executing align2.BBMapPacBio [build=1, overwrite=true, minratio=0.40, fastareadlen=6000, ambiguous=best, minscaf=100, startpad=10000, stoppad=10000, midpad=6000, ref=/home/554/ta0341/dy44/r12.18_Roepera_similis/cp-fabids.fasta, ambig=all, nodisk=t, nzo=t, mappedonly=t, minid=0.75, out=bact-bbmap.sam, scafstats=bact_bb.mapstats, in=/home/554/ta0341/dy44/r12.18_Roepera_similis/reads.fasta, fastareadlen=6000, threads=48, maxindel=100, k=15]
        Version 39.01

        Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.400
        Set OUTPUT_MAPPED_ONLY to true
        Scaffold statistics will be written to bact_bb.mapstats
        Set threads to 48
        Retaining all best sites for ambiguous mappings.
        Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.367
        Executing dna.FastaToChromArrays2 [/home/554/ta0341/dy44/r12.18_Roepera_similis/cp-fabids.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=false, minscaf=100, midpad=6000, startpad=10000, stoppad=10000, nodisk=true]

        Set genScaffoldInfo=true
        Set genome to 1

        Loaded Reference: 0.003 seconds.
        Loading index for chunk 1-1, build 1
        Indexing threads started for block 0-1
        Indexing threads finished for block 0-1
        Generated Index: 10.393 seconds.
        Analyzed Index: 33.211 seconds.
        Started output stream: 1.183 seconds.
        Creating scaffold statistics table: 0.036 seconds.
        Cleared Memory: 0.175 seconds.
        Processing reads in single-ended mode.
        Started read stream.
        Started 48 mapping threads.
        Exception in thread "Thread-43" java.lang.AssertionError: -24, -24
        4908336, 1,0,99886506,99894071,0,00,151,154695,351771,0,263804,,mSSSmSSSmSmmmSSSSSSmSSSmmSSmmmSmmmSSmmSSSSSmmmSSSSSmSSSmmSSSmmSSmSSSSSmSSSSSSmSSSSSSmSSSSSSSSSSSmmmSSSSSSSmSSmmSSmSSSSSSSSSSSSSSmmmmmmSmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmSSmmmmmmmmmmmmmmmmmmmmSSmSSSSSmmSSSSmmSSmSmSSmSSSSSSSmSSSSmSmSSSSSSSSmmmSSmSSSSSSSSSSSSmSSSmmSSmSSSmmSSSmmSSmmmmmmmSSmmmmSmSmmSmSSSSSmSSSmmSSSSSSmSSSSmmSSSmSSSSmmSmSSSmmSmSSSmSSSmmmSSSSSmSSSmSSSSmmSSmSSSSSSSmSmSSmSSSSSmSmmSSSSmSSSSSSSmSSSmSmmSSSSSmSSSmSSSmmmSSSmSmSSSmmSmSSmSSSSSmSSSmSmmSSmSSSSmSSmSSmmmmSSSSSSSmSmSSSmSmSSSSmmSmSmSmmmmSSSSSSSSSSSSmmSSmSSSmSmSSSSSSmmSmmSSmSSSmmmSSmSSSSSmmSSSSSmSSSmmSSSSmSSSmSmmSS....SSmSm, ATGAAAGAAAATATTCACATAGCTTTGTTTGTTTGATTGGCCT....CCACAAACTACGAAAATTTCTCTTTTTCCTATCTCTATCCAGATCAGCCGAAGCCAAAGCTCTTATTATTTGCTGAGAAATAGCTCGAGTAAG
        at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:407)
        at align2.AbstractMapThread.genMatchStringForSite(AbstractMapThread.java:1043)
        at align2.AbstractMapThread.genMatchString(AbstractMapThread.java:923)
        at align2.BBMapThreadPacBio.processRead(BBMapThreadPacBio.java:564)
        at align2.AbstractMapThread.run(AbstractMapThread.java:522)
        Exception in thread "Thread-20" java.lang.AssertionError: -19, -19
        9787423, 1,0,91446916,91454472,0,00,84,35773,287063,0,286984,,SSSSmSSSmSSm....SmSm, AGATAGATTAAAATAAAAA....AATATTCATCGTTTAATTCTATAATAGAATGC
        at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:407)
        at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:638)
        at align2.AbstractMapThread.genMatchStringForSite(AbstractMapThread.java:1026)
        at align2.AbstractMapThread.genMatchString(AbstractMapThread.java:923)
        at align2.BBMapThreadPacBio.processRead(BBMapThreadPacBio.java:564)
        at align2.AbstractMapThread.run(AbstractMapThread.java:522)
        Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47

        **************************************************************************
        Warning! 2 mapping threads did not terminate normally.
        Check the error log; the output may be corrupt or incomplete.
        Please submit the full stderr output as a bug report, not just this message.
        **************************************************************************




        ------------------ Results ------------------

        Genome: 1
        Key Length: 15
        Max Indel: 100
        Minimum Score Ratio: 0.367
        Mapping Mode: normal
        Reads Used: 12661524 (57423401385 bases)

        Mapping: 27513.481 seconds.
        Reads/sec: 460.19
        kBases/sec: 2087.10


        Read 1 data: pct reads num reads pct bases num bases

        mapped: 1.9671% 249070 1.9269% 1106468960
        unambiguous: 1.8172% 230081 1.8567% 1066207675
        ambiguous: 0.1500% 18989 0.0701% 40261285
        low-Q discards: 0.0000% 0 0.0000% 0

        perfect best site: 0.0023% 288 0.0000% 20624
        semiperfect site: 0.0023% 288 0.0000% 20624

        Match Rate: NA NA 78.4035% 941243945
        Error Rate: 86.9937% 248782 20.7111% 248639475
        Sub Rate: 86.8864% 248475 5.4915% 65925963
        Del Rate: 83.3997% 238504 7.8336% 94043737
        Ins Rate: 83.6599% 239248 7.3860% 88669775
        N Rate: 3.2615% 9327 0.8854% 10629277
        Exception in thread "main" java.lang.AssertionError:
        The number of reads out does not add up to the number of reads in.
        This may indicate that a mapping thread crashed.
        If you submit a bug report, include the entire console output, not just this error message.
        125046+124024+0+12412452+0+0 = 12661522 != 12661524
        at align2.AbstractMapper.printOutputStats(AbstractMapper.java:1962)
        at align2.AbstractMapper.printOutput(AbstractMapper.java:1060)
        at align2.BBMapPacBio.testSpeed(BBMapPacBio.java:491)
        at align2.BBMapPacBio.main(BBMapPacBio.java:36)

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM
        • seqadmin
          The Impact of AI in Genomic Medicine
          by seqadmin



          Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
          02-26-2024, 02:07 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 03-14-2024, 06:13 AM
        0 responses
        33 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-08-2024, 08:03 AM
        0 responses
        72 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-07-2024, 08:13 AM
        0 responses
        81 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-06-2024, 09:51 AM
        0 responses
        68 views
        0 likes
        Last Post seqadmin  
        Working...
        X