Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • rkanwar
    Junior Member
    • Feb 2011
    • 8

    I am using version v34.08. Another pair of reads that failed are:

    R0266248:292:C5C0TACXX:4:1101:3508:3710 97 3 193355806 42 19=1X77= = 96255219 -97100684 AATATTCCCAAGAGGATCTTGGGAGATGATGACACGTTCTCCAGTTAAGGTAAGAACATAGGCCGTCTCAGTGAGGTTCCTTAGGAGAGTAACTGCT CCCFFFFFHHHHFHIGHIJIIJIIJJJJGIIJJJJJJJJJJJIHHIIGGJFHIIHIJJJGGIJJJJJHHFHHHDF@?AEECCCEDDC??>CCDECDD NM:i:1 AM:i:42 RG:Z:AS_N#1#4#AS_N_1
    R0266248:292:C5C0TACXX:4:1101:3508:3710 145 11 96255219 44 98= = 193355806 97100684 GTAGCCATGCAGAGTTCCCAGATTTCTTCCCCTTCAGCCCAGCTTCTGTGTCTTCCCTCTGTCCACTTGGCATCTTCCCTGTGAAGATCTGTTAAAAG DDCDDDDDDDDDDDDBDDEDDDDECDABFHHECAJJIJIJHHHDFIHFJJIIGHIJIIIIJJJIGHIJJHHE8JJIJIJIJJJJIHFGHHFFFFFCCC NM:i:0 AM:i:44 RG:Z:AS_N#1#4#AS_N_1
    I used the following command to run bbmap:

    bbmap_cmd="${bbmap} \
    in1=${duk1} \
    in2=${duk2} \
    ambiguous=toss \
    rgid=${ID} \
    rgsm=${SM} \
    rgpl=${PL} \
    rglb=${LB} \
    out=${bb_prefix}.sam \
    pairlen=1000 \
    threads=10 \
    k=11 \
    trimreaddescriptions=t \
    reads=1000 \
    slow=t"
    Thanks!


    Originally posted by Brian Bushnell View Post
    Sounds like a bug, and I agree, it seems as though the mate is mapped to a different (longer) contig. What version of BBMap are you using? Also, can you give me the complete command line and, if possible, the text of the second read? Thanks!

    Comment

    • michaellim
      Member
      • Dec 2014
      • 28

      Hey Guys,

      If I have Java 8, does that mean I can't use BBMap?

      Thank you. Merry Christmas

      Comment

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        Originally posted by michaellim View Post
        Hey Guys,

        If I have Java 8, does that mean I can't use BBMap?

        Thank you. Merry Christmas
        It works fine with Java 7 or 8 Java is backwards-compatible.

        Comment

        • rkanwar
          Junior Member
          • Feb 2011
          • 8

          Thanks for fixing the chimeric alignment thing [v34.30] ! I have another issue now while running GATK. It is now quitting using the following error:

          ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{test.bam} is malformed: Read R0266248:292:C5C0TACXX:4:2101:19666:82211 is missing the read group (RG) tag, which is required by the GATK.
          The SAM records for the corresponding read are:
          R0266248:292:C5C0TACXX:4:2101:19666:82211 73 1 6500339 44 98= = 6500339 0 GGTGCCACGCCCCTGTACCTGGCGTGCCAGGAGGGCCACCTGGAGGTGACCCAGTACCTGGTGCAGGAATGCGGCGCAGACCCGCACGCGCGCGCCCA @@@FFFFFHGHHHIBFHIJIGIJJIIIJJJGIIIIIIJJJJJIGHI@GHHHHHFBDFFE>B.;>CDDDDDDDDDDBDDBDDDDDBDDDDDDDDBDDDD RG:Z:AS_N#1#4#AS_N_1 AM:i:0 NM:i:0
          R0266248:292:C5C0TACXX:4:2101:19666:82211 133 1 6500339 0 * = 6500339 0 GAATTGAGGACATACAGCACAAAGAAGTTTCTGAGAATGCTTCTGTCTAGATTTAATATGAAGATAACCCGTTTCCAATGAAATCCTCAAAGCTATCC @CCFFFFFHHHHHIJJIJIJJIJJJJIHHIIJJIJHIGGJJIJJJIJJIJIJJJJIJJJGIIJJIJJJIJIHIJJJHHGEHHFFDEDDCE;ACEDCDD
          I am using paired end reads and it looks like for this particular read pair the read group info is missing for one of the ends. Can you please fix this small issue too ? Thanks a lot for your help !

          bye,
          Rahul

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            Rahul,

            Thanks again! This is now fixed in 34.33. I was skipping the generation of optional tags for unmapped reads, which caused this problem.

            -Brian

            Comment

            • lankage
              Member
              • Oct 2014
              • 20

              dedupe.sh question

              I recently have been experimenting with read clustering tools using pacbio data. When running dedupe.sh without any of the clustering parameters, I noticed it outputs significantly fewer reads than are input despite saying it found 0 duplicates, 0 contained sequences and 0 invalid entries. I went from a fastq file containing ~3000 reads to a file containing 130. Im wondering, what filtering is occurring behind the scenes here?

              dhowifi-19acbio4 mgraham$ wc -l 0002_Forward--0002_Reverse.fastq
              12460 0002_Forward--0002_Reverse.fastq

              dhowifi-19:shortread mgraham$ bbmap34_26/dedupe.sh in=pacbio4/0002_Forward--0002_Reverse.fastq -Xmx10g
              java -Djava.library.path=/Users/mgraham/Desktop/dev/shortread/bbmap34_26/jni/ -ea -Xmx10g -Xms10g -cp /Users/mgraham/Desktop/dev/shortread/bbmap34_26/current/ jgi.Dedupe in=pacbio4/0002_Forward--0002_Reverse.fastq -Xmx10g
              Executing jgi.Dedupe [in=pacbio4/0002_Forward--0002_Reverse.fastq, -Xmx10g]

              Initial:
              Memory: free=10129m, used=161m

              Found 0 duplicates.
              Finished exact matches. Time: 0.070 seconds.
              Memory: free=9807m, used=483m

              Found 0 contained sequences.
              Finished containment. Time: 0.036 seconds.
              Memory: free=9538m, used=752m

              Removed 0 invalid entries.
              Finished invalid removal. Time: 0.017 seconds.
              Memory: free=9538m, used=752m

              Input: 130 reads 125193 bases.
              Duplicates: 0 reads (0.00%) 0 bases (0.00%) 0 collisions.
              Containments: 0 reads (0.00%) 0 bases (0.00%) 1539 collisions.
              Result: 130 reads (100.00%) 125193 bases (100.00%)

              Time: 0.131 seconds.
              Reads Processed: 130 0.99k reads/sec
              Bases Processed: 125k 0.96m bases/sec

              Comment

              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                There should not be any filtering, so I don't know what the problem is. I imagine that all of my tools would perceive that file as containing 130 reads. Would it be possible for you to email me the file, or at least the first 2000 lines of it? It's possible that it is misformatted, or contains a control character in an unexpected place, so that the program quits reading it early.

                Comment

                • sdriscoll
                  I like code
                  • Sep 2009
                  • 436

                  Question about bbmapskimmer.sh. so it says in the readme this version is designed to find all mapping locations with alignment score above a certain threshold. So does that mean when you run it and say set minid=0.9 then you get all mappings at that score and above or do you need to set something else? I'm asking because the options appear to be identical to standard bbmap (i.e. with the ambig, maxsites2, secondary, maxsites options).

                  With all of these options identical to bbmap I don't understand how this version can output all mappings above a threshold.
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */

                  Comment

                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    The options are identical as much as possible; the code itself is different. bbmapskimmer.sh does not run BBMap, but a different program based on BBMap, but substantially different - BBMap gains a lot of speed by only looking for mapping sites similar to or better than the current best discovered site, and by limiting the minimum score in the dynamic programming alignment to a little below the best known score. These heuristics and early exits were removed from skimmer, so it is generally slower than BBMap. I tried to keep the interface the same for convenience.

                    Some else recently asked me roughly the same question via email, and I'll post my response here:

                    Maxsites controls the number of alignments printed for a read that maps to multiple locations, and has no function unless ambig=all. Maxsites2 controls the maximum number of potential mapping locations that are considered for alignment, which is computationally expensive; it's really a heuristic that affects speed versus accuracy. If there are more than "maxsites2" seed hits in the reference, the lowest-scoring ones will be discarded prior to alignment. The default is 800 - normally, there is no reason to change it unless you have a situation where you actually expect reads to have hundreds of correct mapping locations, and need all of them reported (like when aligning a primer sequence to a 16S database, for example). maxsites2 has an effect whether or not ambig=all.

                    With ambig=all, BBMap will only report alignments that are equal-scoring, or quite close, to the best alignment - so if the best alignment was perfect, and the second best had 4 substitutions, the second-best would NOT be reported. If you actually want every single alignment above some identity threshold, then you should use BBMapSkimmer (bbmapskimmer.sh) instead. The syntax is the same, but rather than looking for the best mapping location and then reporting it and anything else with a similar score, it is designed to report ALL alignments above a minimum quality.

                    For example:

                    bbmapskimmer.sh in=reads.fq out=mapped.sam ambig=all idtag minid=0.80 idfilter=0.9 maxsites=40

                    This will look for alignments with an approximate identity of at least 80%. Once it finds them, anything with an actual identity of under 90% will be removed (and the actual identity will be printed in the sam line). So the net result is that you will end up with everything that has at least 90% identity, up to 40 per read. The reason minid<idfilter is because minid is a heuristic based on approximate score, which acts before the true identity is known (which requires an expensive alignment), while idfilter is exact and acts AFTER alignment.

                    Comment

                    • sdriscoll
                      I like code
                      • Sep 2009
                      • 436

                      Nice. See I use tools like RSEM and eXpress and both of those tools like to get all possible alignments up to some allowed alignment threshold. That's sort of the default output of something like bowtie or bowtie2 with the -a flag (all alignments). So it sounds like bbmapskimmer.sh outputs what I'm looking for when it comes to running those expression quantification tools. Thanks.
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */

                      Comment

                      • lankage
                        Member
                        • Oct 2014
                        • 20

                        BBTools programs stop reading in fastq reads when encountering 0 length sequences

                        Brian, found the issue. Our PacBio data contained 0 length sequences that were causing dedupe/bbmap to quit reading in sequences. I removed these and everything runs fine.
                        Example:

                        @m141217_232404_42136_c100683242550000001823136602221530_s1_X0/45415/ccs 0.9 18

                        +

                        @m141217_232404_42136_c100683242550000001823136602221530_s1_X0/12786/ccs 0.99 27

                        Comment

                        • Brian Bushnell
                          Super Moderator
                          • Jan 2014
                          • 2709

                          OK, thanks for the report. I will modify it to process those correctly.

                          Comment

                          • sdriscoll
                            I like code
                            • Sep 2009
                            • 436

                            Hey Brian,

                            Considering the very nice multi-threaded set of tools you have thus far, would it be out of the question to add in a variant caller for SNP/INDEL calling? It seems that those things tend to NOT be multi-threaded.
                            /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                            Salk Institute for Biological Studies, La Jolla, CA, USA */

                            Comment

                            • Brian Bushnell
                              Super Moderator
                              • Jan 2014
                              • 2709

                              I wrote a variant caller, which is indeed multithreaded, but I don't think it works right now because of massive changes I've made to related code over the last couple years. But it would be useful in a lot of scenarios (including error-correcting assemblies) so I have been meaning to resurrect it.

                              I will do so if I have time; that's really the limiting reagent.

                              Comment

                              • sdriscoll
                                I like code
                                • Sep 2009
                                • 436

                                Cool. That would be a useful tool. There is a multithreaded caller in the Subread package however it is a little too barebones.
                                /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                                Salk Institute for Biological Studies, La Jolla, CA, USA */

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Pathogen Surveillance with Advanced Genomic Tools
                                  by seqadmin




                                  The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
                                  03-24-2025, 11:48 AM
                                • seqadmin
                                  New Genomics Tools and Methods Shared at AGBT 2025
                                  by seqadmin


                                  This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                  The Headliner
                                  The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                  03-03-2025, 01:39 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 10:17 AM
                                0 responses
                                7 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-20-2025, 05:03 AM
                                0 responses
                                49 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-19-2025, 07:27 AM
                                0 responses
                                59 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-18-2025, 12:50 PM
                                0 responses
                                50 views
                                0 reactions
                                Last Post seqadmin  
                                Working...