Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hello

    My sequencing output largely consists of Illumina (NextSeq and Miseq) NON-overlapping paired end reads (the occasional reads will overlap, as the DNA is not all fragmented evenly). For ease of storage and to simplify input data at the start of a workflow, I would like to combine the files.

    If I understand correctly, bbmerge is specifically for merging overlapping reads? In which case I am best off simply using the bbmap reformat.sh to combine the reads in to one interleaved file?

    Code:
    reformat.sh in1=read1.fq in2=read2.fq out=reads.fq
    Last edited by SDH; 10-16-2018, 11:14 PM. Reason: typo.

    Comment


    • shredding and mapping long reads to reference error

      Hello,

      I have a reference and I have an additional genome sequenced via 10x that contains 150k contigs from 1k to 7.5Mb. I am attempting to align the second to the first with BBMap by using the flag maxlen=2000 as there are extensive rearrangements. This morning I found this error message. Can anyone tell me what might have happened? Thanks.
      Code:
      'BBMap failed with error code 1
      Executing align2.BBMapPacBio [maxlen=2000, maxindel=4000, ambiguous=random, k=15, saa=f, maxlen=6000, minratio=0.40, minscaf=100, startpad=10000, stoppad=10000, midpad=6000, ref=ref.fasta, out=bbmap.sam, in=reads1.fasta]
      
      BBMap version 37.64
      Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.400
      Choosing a site randomly for ambiguous mappings.
      Writing reference.
      Executing dna.FastaToChromArrays2 [ref.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=100, midpad=6000, startpad=10000, stoppad=10000, nodisk=false]
      
      Set genScaffoldInfo=true
      Writing chunk 1
      Writing chunk 2
      Writing chunk 3
      Writing chunk 4
      Writing chunk 5
      Writing chunk 6
      Writing chunk 7
      Set genome to 1
      
      Loaded Reference: 0.007 seconds.
      Loading index for chunk 1-7, build 1
      No index available; generating from reference genome: /Users/wilfordbrimley/Geneious Prime/temp/17/ref/index/1/chr1-3_index_k15_c2_b1.block
      No index available; generating from reference genome: /Users/wilfordbrimley/Geneious Prime/temp/17/ref/index/1/chr4-7_index_k15_c2_b1.block
      Indexing threads started for block 0-3
      Indexing threads started for block 4-7
      Indexing threads finished for block 0-3
      Indexing threads finished for block 4-7
      Generated Index: 249.271 seconds.
      Analyzed Index: 71.551 seconds.
      Started output stream: 0.096 seconds.
      Cleared Memory: 0.195 seconds.
      Processing reads in single-ended mode.
      Started read stream.
      Started 24 mapping threads.
      Exception in thread "Thread-29" java.lang.AssertionError: -30, -30
      89764, 2,0,149134394,149142098,0,00,971,475770,460506,0,389847,149134394~149139756~149140058~149142098,mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmSmmmmmmmmmmmSmmmSmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmSmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmSmmmmmmmmmmmmmmmSm
      ...
      GCATTCAGACAAGGGTAGCAATATCAATCAACTCTATCTGCTTACAGGAATTGAGCTAGCATAAAGTGCTAAACTCATACTATCAGGCATAACCTAGTCAGGCTATCCTACTTACTGTCTACTCAATATAGGGGTGAGCATTTGGACCGGTGGACCGGACCGGACCGGTGAACCGGACCGGACCGAACCGGATCGGACCGTTTTTTTGGACCCCCGGGCCGGTTCTCGGTTCTAAGATTTCTCGCCGGCCCGGTCTGGTTTTTTTCCCGGTCCGGTCCGGTTTTTACCGGGTTTAGAACCGGTTGGACATGATTAAGATTTTTGTGACGTTTTTTAAACGCACATAACTCATTTGTTTTAAGTTGGATTGACCTGCGGTTTTTCCAACATTTAGTTTACAATTTGTAGATGAATTTAGACTATAATTTTGTTTATTTTGGTATTATATTCACCGTGTTACAATCCTTCAAAGTAGACCTAACAAAGCTGAAAAAATTCAGTTTTTCAGCAACATTTTGTAAGCTGTGACGTTTTTTAAACATCCATAACTCATTCGTTTTAATTCGGATTGACCTCCGATTTTTTCCAACATTTAGTTTACAGTTTGTAGATGAGTTTAGACTATAATTTTGTTTATTTTGGTATTATATTCACTGAGCTACAATCCTTTAAAGTAGAGCTATCAAAGTTGAAAAAATTTAGCTTTTTAGCAAAATTTTATAATCTGTGACGTTTTTTAAACACCCATAACTCATTCGTTTTAAGTTGGATTGACCTACGGTTTTTTCCAACATTTAGTTTACAATTTGTAGATGAATTTAGACTATAATTTTGTTTATTTTGGTATTATATGCATTGAGTTACAATCCTTCAAAGTAGAGCTATCAAAGCTGAAATATTTCAGCTTTTTAGTATTATTTTATACTATAATCCCGAGTCCGCGTTTAGCACTTGTGGTAGAGTTGTAAGTGAGTATCAAACTAGTTTATCGACATTGATAGTTGAAGCTTTGTTATGCACCCAAGATTGGGTGATAAAGTTTACAAATCCGATAGTTGACAACGTTGGTGACATCTTGAACGATGATGATATAACTTTGGGTAAAAAAATTACAATCTTTATTCTTTTTTTATACAATTATTACTTTTTATTAAATACTAACAACATTTTCATTTGTGTTTATGTAGAGATTGCACAAGCTTTAAATATGTTACATATGGATGACAATGATATGGGGAAGAGGCCAATAGAAGATTAGATTATGAGTTTATGAAATTTGTTAGTTTGATTGATTTTAAAGTTTAAACTATGATTTTTGTTGTTACGGTAATACTTTTGTATGTTTGTCTTAAAATTGTTTAACTTTTGTTGGTGAACTTGATTTATATGTTAGTTGCTTTTATTTGGAAAATTAATTTTTGCAAATTGCAAGTTATAATATTATACATCAATGTAGATTAGTTAGGCATGAAACGGAAACATATTAATTTTGCAAGTTATAATATTATACATCAATGTAGATTACTTATTATACTCTATGCAATTAAAGTTGTACCGGTCCAAACGAGTCCAAAAACCGGAAAACCCGGACCTAGACCCGGACACGGACACGGACACGGACACGACTAATCCGGTCCGGTCCATGGTCCACAAAATTCTCCATAGCCGGTCCGGTCCGATTCGGTCCGGGCCTGTCCAAATGCTCACCCCTAACTTAATATCTAGCATGCAATTCTTATACAACTGAAACACATAAAGCAGGCACATAAGGCATCACTCCTAGATCCTTAGTCCTATTCTAGCATGCTGTTCTACAAAAGCTGATAAACATAACATAAACTTGTAAGGGTACTTTGGGGAATACTTACTTGAGCTCGGCCGGTCGCGCACATCACACACTTCGTTCTTTCTCGAAACTCTTTTCTTAGTTTTTTAGAAAACATTTTCTTTTTAGAAAATCTTTTAATCCCTTGATTTGAGTTCAGACACCCCCGAAGGTGTGTCCGAATCCCTCAAACCAGGGCTCTGATACCAACTTGTAACGACCCAAATTTCACGTTCAAAAATTTCGTTTTAAAATGTTACTTTAGAAAACATTAATAATTAAAACATTGTTTGATCAAACCATAACACAACGTAAACCATGTCTAAAAGCCAAATCACACAACCAATCAAACATCAGAGTATAAAACCCAAAGAATCTCG
      at align2.MultiStateAligner9PacBio.makeGref(MultiStateAligner9PacBio.java:1388)
      at align2.MultiStateAligner9PacBio.fillLimited(MultiStateAligner9PacBio.java:96)
      at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:565)
      at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:641)
      at align2.AbstractMapThread.genMatchStringForSite(AbstractMapThread.java:1006)
      at align2.AbstractMapThread.genMatchString(AbstractMapThread.java:902)
      at align2.BBMapThreadPacBio.processRead(BBMapThreadPacBio.java:563)
      at align2.AbstractMapThread.run(AbstractMapThread.java:502)
      Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
      
      **************************************************************************
      Warning! 2 mapping threads did not terminate normally.
      Check the error log; the output may be corrupt or incomplete.
      Please submit the full stderr output as a bug report, not just this message.
      **************************************************************************
      
      
      
      
      ------------------ Results ------------------
      
      Genome: 1
      Key Length: 15
      Max Indel: 4000
      Minimum Score Ratio: 0.4
      Mapping Mode: normal
      Reads Used: 581616 (2969452913 bases)
      
      Mapping: 177826.639 seconds.
      Reads/sec: 3.27
      kBases/sec: 16.70
      
      
      Read 1 data: pct reads num reads pct bases num bases
      
      mapped: 81.2921% 472808 78.5245% 2331747518
      unambiguous: 76.0459% 442295 75.7328% 2248849209
      ambiguous: 5.2462% 30513 2.7917% 82898309
      low-Q discards: 12.7684% 74263 15.0015% 445463533
      
      perfect best site: 4.9337% 28695 4.2346% 125743838
      semiperfect site: 5.0009% 29086 4.2700% 126794871
      
      Match Rate: NA NA 84.9205% 2173341879
      Error Rate: 87.8894% 442044 13.2182% 338289025
      Sub Rate: 84.8676% 426846 1.8940% 48472068
      Del Rate: 67.4695% 339341 8.8901% 227520440
      Ins Rate: 68.5198% 344624 2.4342% 62296517
      N Rate: 30.3789% 152792 1.8614% 47637054
      Exception in thread "main" java.lang.AssertionError:
      The number of reads out does not add up to the number of reads in.
      This may indicate that a mapping thread crashed.
      If you submit a bug report, include the entire console output, not just this error message.
      238640+234168+0+34543+74263 = 581614 != 581616
      at align2.AbstractMapper.printOutputStats(AbstractMapper.java:1892)
      at align2.AbstractMapper.printOutput(AbstractMapper.java:1022)
      at align2.BBMapPacBio.testSpeed(BBMapPacBio.java:462)
      at align2.BBMapPacBio.main(BBMapPacBio.java:37)
      '
      Last edited by Grammatonotus; 02-22-2019, 09:10 PM.

      Comment


      • A little bit more information: the DNA sequence in the error message maps to the middle of a contig about 2.8 Mb, starts 402,000 bases in. I tried running this both locally and on a cluster and got the same error message with almost exactly the same sequence in each-- 2045 bases in one, 2205 bases in the other, with ~99% overlap. Any insights appreciated.

        Comment


        • @grammtonotus: How much memory are you allocating for this job? With that large sized fragments you are likely going to need plenty.

          I also suggest that you look into using `minimap2` from Heng Li as an alternate option to BBMap.

          Comment


          • When I run it locally about 52 Gb (I ran out of system memory when I had it set to 58), on the cluster 439 Gb. I hope that's not the problem!

            Comment


            • a colleague suggests that the mapper may have come to a read that hits too many places in the reference. I have it set to place this at Random, would changing to First Best, None, or All be likely to give a different result?

              Also wondering if setting maxlen=4000 might change anything. I don't know how diverged the two are so kind of just guessing really. Thanks.

              Comment


              • Let us know if using more memory made a difference.

                You should really be using a different tool designed for long read alignments. Even LASTZ may be appropriate since you have query sequences as long as 7.5Mb.

                Comment


                • Hello Brian, it is just a question that pop up when using bbmap. For the outm, what type of flags do you use to both take pairs where both reads mappedand those where just one read of the pair mapped? Or how do you get read of both 77 and 141 flags?

                  Thank you and best

                  Comment


                  • I think you are looking for the following options that are accessible from "reformat.sh".

                    Code:
                    Sam and bam processing options:
                    
                    mappedonly=f            Toss unmapped reads.
                    unmappedonly=f          Toss mapped reads.
                    pairedonly=f            Toss reads that are not mapped as proper pairs.
                    unpairedonly=f          Toss reads that are mapped as proper pairs.
                    primaryonly=f           Toss secondary alignments.  Set this to true for sam to fastq conversion.
                    minmapq=-1              If non-negative, toss reads with mapq under this.
                    maxmapq=-1              If non-negative, toss reads with mapq over this.
                    requiredbits=0          (rbits) Toss sam lines with any of these flag bits unset.  Similar to samtools -f.
                    filterbits=0            (fbits) Toss sam lines with any of these flag bits set.  Similar to samtools -F.

                    Comment


                    • Fixing Mate not found error

                      Greetings,

                      I have a seemingly simple problem, that Im sure bbmap can help with, but Im unclear what steps must be taken for success.

                      I have several bams from paired end sequencing where some number of mates have been removed. The remaining read still says it is paired so every tool seems to throw an error. Samtools or picard fixmate does not correct it, at best they yell the read names in question.

                      I am trying to use reformat.sh to remove those mates that remain, but I also seem to get a java exception, even when just trying to convert to fastq.

                      Code:
                      [bwubb@node107 disambres]$ reformat.sh -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                      java -ea -Xmx5g -cp /home/bwubb/software/bbmap/current/ jgi.ReformatReads -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                      Executing jgi.ReformatReads [-Xmx5g, in=qsortA.bam, out1=reads1.fq, out2=reads2.fq, ow]
                      
                      Set INTERLEAVED to true
                      Found samtools 1.9
                      Input is being processed as paired
                      Exception in thread "Thread-2" java.lang.AssertionError: K00315:202:H577WBBXY:7:1101:1438:38767 13      -1      -       57328513        57328662        1000000100000000011     1       0  60       TCCAGAAAGAGAGTTCTTGAAAGAAAGAGGCGCTATCATTTTGACACAGATGGNAAGGGCTCGATTCACGATCAAAAAGGCTCCAAAAANAAAAAAAATTCTGTTTGTTNTNCTTTTTCCTCCNGATTCTAGTTTTTTTNTTATNTTTTG  AAFFFFFAJJJJJ7FFFJJ7JJ7FJJJJJFFJJJJJFJJJJJF-A-AAFFJFA!AF---AAJJ-AJF<FF-AAFAF<AAAJJJJFA-7A!--77-<A<JJ<A--7A<FJ!J!<FJJJJA<FF7!)7AJF---7AFJJFJ!JF--!7-7<F      .       30      C68m28Nm53      .
                      
                      K00315:202:H577WBBXY:7:1101:1438:38767  113     19      57328582        60      68S82M  1       176081441       0       CAAAANATAANAAAAAAACTAGAATCNGGAGGAAAAAGNANAACAAACAGAATTTTTTTTNTTTTTGGAGCCTTTTTGATCGTGAATCGAGCCCTTNCCATCTGTGTCAAAATGATAGCGCCTCTTTCTTTCAAGAACTCTCTTTCTGGA      F<7-7!--FJ!JFJJFA7---FJA7)!7FF<AJJJJF<!J!JF<A7--A<JJ<A<-77--!A7-AFJJJJAAA<FAFAA-FF<FJA-JJAA---FA!AFJFFAA-A-FJJJJJFJJJJJFFJJJJJF7JJ7JJFFF7JJJJJAFFFFFAA      NM:i:1  MD:Z:28C53      MC:Z:150M       AS:i:80 XS:i:21 SA:Z:1,176081357,+,81S62M7S,54,7;       RG:Z:FGC2033.7
                      flag=1110001
                      
                              at stream.SamReadInputStream.toReadList(SamReadInputStream.java:136)
                              at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
                              at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
                              at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:643)
                              at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)

                      I thought I could make read1.fq reads2.fq and use repair.sh Clearly it finds the two problem reads. I have tried other arguments with the reformat.sh, and a few other tools, but none want to correct the MATE_NOT_FOUND problem. It would seem impossible that I am the first person to run into this issue. Thank you very much for any advice/help.

                      -bwubb

                      Comment


                      • Greetings,

                        I have a seemingly simple problem, that Im sure bbmap can help with, but Im unclear what steps must be taken for success.

                        I have several bams from paired end sequencing where some number of mates have been removed. The remaining read still says it is paired so every tool seems to throw an error. Samtools or picard fixmate does not correct it, at best they yell the read names in question.

                        I am trying to use reformat.sh to remove those mates that remain, but I also seem to get a java exception, even when just trying to convert to fastq.

                        Code:
                        [bwubb@node107 disambres]$ reformat.sh -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                        java -ea -Xmx5g -cp /home/bwubb/software/bbmap/current/ jgi.ReformatReads -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                        Executing jgi.ReformatReads [-Xmx5g, in=qsortA.bam, out1=reads1.fq, out2=reads2.fq, ow]
                        
                        Set INTERLEAVED to true
                        Found samtools 1.9
                        Input is being processed as paired
                        Exception in thread "Thread-2" java.lang.AssertionError: K00315:202:H577WBBXY:7:1101:1438:38767 13      -1      -       57328513        57328662        1000000100000000011     1       0  60       TCCAGAAAGAGAGTTCTTGAAAGAAAGAGGCGCTATCATTTTGACACAGATGGNAAGGGCTCGATTCACGATCAAAAAGGCTCCAAAAANAAAAAAAATTCTGTTTGTTNTNCTTTTTCCTCCNGATTCTAGTTTTTTTNTTATNTTTTG  AAFFFFFAJJJJJ7FFFJJ7JJ7FJJJJJFFJJJJJFJJJJJF-A-AAFFJFA!AF---AAJJ-AJF<FF-AAFAF<AAAJJJJFA-7A!--77-<A<JJ<A--7A<FJ!J!<FJJJJA<FF7!)7AJF---7AFJJFJ!JF--!7-7<F      .       30      C68m28Nm53      .
                        
                        K00315:202:H577WBBXY:7:1101:1438:38767  113     19      57328582        60      68S82M  1       176081441       0       CAAAANATAANAAAAAAACTAGAATCNGGAGGAAAAAGNANAACAAACAGAATTTTTTTTNTTTTTGGAGCCTTTTTGATCGTGAATCGAGCCCTTNCCATCTGTGTCAAAATGATAGCGCCTCTTTCTTTCAAGAACTCTCTTTCTGGA      F<7-7!--FJ!JFJJFA7---FJA7)!7FF<AJJJJF<!J!JF<A7--A<JJ<A<-77--!A7-AFJJJJAAA<FAFAA-FF<FJA-JJAA---FA!AFJFFAA-A-FJJJJJFJJJJJFFJJJJJF7JJ7JJFFF7JJJJJAFFFFFAA      NM:i:1  MD:Z:28C53      MC:Z:150M       AS:i:80 XS:i:21 SA:Z:1,176081357,+,81S62M7S,54,7;       RG:Z:FGC2033.7
                        flag=1110001
                        
                                at stream.SamReadInputStream.toReadList(SamReadInputStream.java:136)
                                at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
                                at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
                                at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:643)
                                at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)

                        I thought I could make read1.fq reads2.fq and use repair.sh Clearly it finds the two problem reads. I have tried other arguments with the reformat.sh, and a few other tools, but none want to correct the MATE_NOT_FOUND problem. It would seem impossible that I am the first person to run into this issue. Thank you very much for any advice/help.

                        -bwubb

                        Comment


                        • Greetings,

                          I have a seemingly simple problem, that Im sure bbmap can help with, but Im unclear what steps must be taken for success.

                          I have several bams from paired end sequencing where some number of mates have been removed. The remaining read still says it is paired so every tool seems to throw an error. Samtools or picard fixmate does not correct it, at best they yell the read names in question.

                          I am trying to use reformat.sh to remove those mates that remain, but I also seem to get a java exception, even when just trying to convert to fastq.

                          Code:
                          [bwubb@node107 disambres]$ reformat.sh -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                          java -ea -Xmx5g -cp /home/bwubb/software/bbmap/current/ jgi.ReformatReads -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                          Executing jgi.ReformatReads [-Xmx5g, in=qsortA.bam, out1=reads1.fq, out2=reads2.fq, ow]
                          
                          Set INTERLEAVED to true
                          Found samtools 1.9
                          Input is being processed as paired
                          Exception in thread "Thread-2" java.lang.AssertionError: K00315:202:H577WBBXY:7:1101:1438:38767 13      -1      -       57328513        57328662        1000000100000000011     1       0  60       TCCAGAAAGAGAGTTCTTGAAAGAAAGAGGCGCTATCATTTTGACACAGATGGNAAGGGCTCGATTCACGATCAAAAAGGCTCCAAAAANAAAAAAAATTCTGTTTGTTNTNCTTTTTCCTCCNGATTCTAGTTTTTTTNTTATNTTTTG  AAFFFFFAJJJJJ7FFFJJ7JJ7FJJJJJFFJJJJJFJJJJJF-A-AAFFJFA!AF---AAJJ-AJF<FF-AAFAF<AAAJJJJFA-7A!--77-<A<JJ<A--7A<FJ!J!<FJJJJA<FF7!)7AJF---7AFJJFJ!JF--!7-7<F      .       30      C68m28Nm53      .
                          
                          K00315:202:H577WBBXY:7:1101:1438:38767  113     19      57328582        60      68S82M  1       176081441       0       CAAAANATAANAAAAAAACTAGAATCNGGAGGAAAAAGNANAACAAACAGAATTTTTTTTNTTTTTGGAGCCTTTTTGATCGTGAATCGAGCCCTTNCCATCTGTGTCAAAATGATAGCGCCTCTTTCTTTCAAGAACTCTCTTTCTGGA      F<7-7!--FJ!JFJJFA7---FJA7)!7FF<AJJJJF<!J!JF<A7--A<JJ<A<-77--!A7-AFJJJJAAA<FAFAA-FF<FJA-JJAA---FA!AFJFFAA-A-FJJJJJFJJJJJFFJJJJJF7JJ7JJFFF7JJJJJAFFFFFAA      NM:i:1  MD:Z:28C53      MC:Z:150M       AS:i:80 XS:i:21 SA:Z:1,176081357,+,81S62M7S,54,7;       RG:Z:FGC2033.7
                          flag=1110001
                          
                                  at stream.SamReadInputStream.toReadList(SamReadInputStream.java:136)
                                  at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
                                  at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
                                  at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:643)
                                  at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)

                          I thought I could make read1.fq reads2.fq and use repair.sh Clearly it finds the two problem reads. I have tried other arguments with the reformat.sh, and a few other tools, but none want to correct the MATE_NOT_FOUND problem. It would seem impossible that I am the first person to run into this issue. Thank you very much for any advice/help.

                          -bwubb

                          Comment


                          • Greetings,

                            I have a seemingly simple problem, that Im sure bbmap can help with, but Im unclear what steps must be taken for success.

                            I have several bams from paired end sequencing where some number of mates have been removed. The remaining read still says it is paired so every tool seems to throw an error. Samtools or picard fixmate does not correct it, at best they yell the read names in question.

                            I am trying to use reformat.sh to remove those mates that remain, but I also seem to get a java exception, even when just trying to convert to fastq.

                            Code:
                            [bwubb@node107 disambres]$ reformat.sh -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                            java -ea -Xmx5g -cp /home/bwubb/software/bbmap/current/ jgi.ReformatReads -Xmx5g in=qsortA.bam out1=reads1.fq out2=reads2.fq ow
                            Executing jgi.ReformatReads [-Xmx5g, in=qsortA.bam, out1=reads1.fq, out2=reads2.fq, ow]
                            
                            Set INTERLEAVED to true
                            Found samtools 1.9
                            Input is being processed as paired
                            Exception in thread "Thread-2" java.lang.AssertionError: K00315:202:H577WBBXY:7:1101:1438:38767 13      -1      -       57328513        57328662        1000000100000000011     1       0  60       TCCAGAAAGAGAGTTCTTGAAAGAAAGAGGCGCTATCATTTTGACACAGATGGNAAGGGCTCGATTCACGATCAAAAAGGCTCCAAAAANAAAAAAAATTCTGTTTGTTNTNCTTTTTCCTCCNGATTCTAGTTTTTTTNTTATNTTTTG  AAFFFFFAJJJJJ7FFFJJ7JJ7FJJJJJFFJJJJJFJJJJJF-A-AAFFJFA!AF---AAJJ-AJF<FF-AAFAF<AAAJJJJFA-7A!--77-<A<JJ<A--7A<FJ!J!<FJJJJA<FF7!)7AJF---7AFJJFJ!JF--!7-7<F      .       30      C68m28Nm53      .
                            
                            K00315:202:H577WBBXY:7:1101:1438:38767  113     19      57328582        60      68S82M  1       176081441       0       CAAAANATAANAAAAAAACTAGAATCNGGAGGAAAAAGNANAACAAACAGAATTTTTTTTNTTTTTGGAGCCTTTTTGATCGTGAATCGAGCCCTTNCCATCTGTGTCAAAATGATAGCGCCTCTTTCTTTCAAGAACTCTCTTTCTGGA      F<7-7!--FJ!JFJJFA7---FJA7)!7FF<AJJJJF<!J!JF<A7--A<JJ<A<-77--!A7-AFJJJJAAA<FAFAA-FF<FJA-JJAA---FA!AFJFFAA-A-FJJJJJFJJJJJFFJJJJJF7JJ7JJFFF7JJJJJAFFFFFAA      NM:i:1  MD:Z:28C53      MC:Z:150M       AS:i:80 XS:i:21 SA:Z:1,176081357,+,81S62M7S,54,7;       RG:Z:FGC2033.7
                            flag=1110001
                            
                                    at stream.SamReadInputStream.toReadList(SamReadInputStream.java:136)
                                    at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
                                    at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
                                    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:643)
                                    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)

                            I thought I could make read1.fq reads2.fq and use repair.sh Clearly it finds the two problem reads. I have tried other arguments with the reformat.sh, and a few other tools, but none want to correct the MATE_NOT_FOUND problem. It would seem impossible that I am the first person to run into this issue. Thank you very much for any advice/help.

                            -bwubb

                            EDIT: Sorry if this post shows up multiple times. I literally tried making it three times and nothing showed up for a full day. Tried a quick reply and it showed up immediately.

                            Comment


                            • @bwbubb: Option for reformat.sh that you are looking for is "pairedonly=f Toss reads that are not mapped as proper pairs". Set pairedonly=t and process your BAM files.

                              Comment


                              • Originally posted by GenoMax View Post
                                @bwbubb: Option for reformat.sh that you are looking for is "pairedonly=f Toss reads that are not mapped as proper pairs". Set pairedonly=t and process your BAM files.
                                EDIT:

                                Thank you @GenoMax. This actually does not work for for this case.

                                Code:
                                [bwubb@node061 disambres]$ reformat.sh pairedonly=t in=input.bam out=fix.bam
                                java -ea -Xmx200m -cp /home/bwubb/software/bbmap/current/ jgi.ReformatReads pairedonly=t in=input.bam out=fix.bam
                                Executing jgi.ReformatReads [pairedonly=t, in=input.bam, out=fix.bam]
                                
                                Found samtools 1.9
                                Input is being processed as unpaired
                                Input:                          13726088 reads                  2041390865 bases
                                Output:                         12583399 reads (91.68%)         1880228501 bases (92.11%)
                                
                                Time:                           349.142 seconds.
                                Reads Processed:      13726k    39.31k reads/sec
                                Bases Processed:       2041m    5.85m bases/sec
                                
                                [bwubb@node061 disambres]$ samtools index fix.bam
                                [bwubb@node061 disambres]$ java -jar ~/software/picard/2.20.2/picard.jar ValidateSamFile I=fix.bam MODE=SUMMARY
                                INFO    2019-06-13 08:00:16     ValidateSamFile
                                
                                ...
                                
                                ## HISTOGRAM    java.lang.String
                                Error Type      Count
                                ERROR:MATE_NOT_FOUND    2
                                
                                [Thu Jun 13 08:02:01 EDT 2019] picard.sam.ValidateSamFile done. Elapsed time: 1.74 minutes.
                                Runtime.totalMemory()=2075918336
                                To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
                                Not sure how things work internally, but from my understanding the "problem" reads think they are paired.


                                I toiled with this more yesterday and I believe I can use reformat.sh to make an interleaved fastq file, and then use repair.sh to make read1.fq and read2.fq and take it back to alignment. All attempts at correcting the bam directly seem to fail. I have yet to have total success, which in my case would be passing picardtools ValidateSamFile.

                                I am aligning with bwa mem -M, having some other issue in the validation that maybe is unrelated or will be fixed with the right reformat/repair arguments.
                                Last edited by bwubb; 06-13-2019, 07:36 AM.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM
                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 06:37 PM
                                0 responses
                                12 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 06:07 PM
                                0 responses
                                10 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-22-2024, 10:03 AM
                                0 responses
                                51 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-21-2024, 07:32 AM
                                0 responses
                                68 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X