  • M4TTN
    Member
    • Jan 2014
    • 77

    #16
    Here is the histogram:



    So the aberrant inserts of approx 300bp are clearly caused by an error in how Bowtie classifies insert length.

    I think this might affect those inserts in the 280-300bp range whose adapters were not trimmed in the FASTQ. What do you think?

    If so, it would explain why there is a dip in apparent insert lengths in the 280-300bp range, and the anomalous peak at 300bp.

    Comment

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      #17
      Since you now have BBMap installed, you can easily check with BBDuk how many reads still have adapter contamination. I assume you have not done any trimming on these reads. Specify the appropriate adapter file when you trim. Standard Illumina adapter files are in "/path_to/bbmap-xx.xx/bbmap/resources/".
      Last edited by GenoMax; 02-05-2015, 06:11 AM.

      Comment

      • M4TTN
        Member
        • Jan 2014
        • 77

        #18
        Trimming was performed automatically by Illumina prior to FASTQ download, but I suspect there will be short tails left on a subset of fragments.

        Comment

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #19
          No harm in trying a pass through BBDuk to verify (especially if you want to get rid of those tails).

          Comment

          • M4TTN
            Member
            • Jan 2014
            • 77

            #20
            bbduk output:

            Initial:
            Memory: free=1039m, used=21m

            Added 16767 kmers; time: 0.225 seconds.
            Memory: free=1032m, used=28m

            Input is being processed as paired
            Started output streams: 0.010 seconds.
            Processing time: 14.997 seconds.

            Input: 2546118 reads 645049609 bases.
            KTrimmed: 2543559 reads (99.90%) 633504163 bases (98.21%)
            Result: 467188 reads (18.35%) 11545446 bases (1.79%)

            Time: 15.238 seconds.
            Reads Processed: 2546k 167.09k reads/sec
            Bases Processed: 645m 42.33m bases/sec

            Comment

            • M4TTN
              Member
              • Jan 2014
              • 77

              #21
              Hmmm... I don't think I have bbduk working correctly.

              Original FASTQ file:

              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTTTACATACTGTCGTTCATTATCCTCTTATCTTATCAAACCTTGCTTTTCATCTTTCTTTTTTTTTTTTTTCTTTTCTCTTTCTTTTCTCCTTTCTACCCTCTATTTTGTTTTCTTTTTTTCTTCAGTTTTTCATCTTTTATTATTCTCTTTTATCTCTGTTTGGATAGCGCCTGACTGAAGTTATAACTTCGGTCATGTTATCTGT
              +
              CBCCCFFFFE9FAFFG9@,C,C@,CC8FF<8CC,CC@F,CC,C,;C,C@,,,;6CE,6,<CC@,,;<CC@,<C,,+78@++7++6<C,<6<,,,<<C?,,5,,,,,<,,,9,,:9,,,9,,,::,:@???<=++8,88,,,,:<8,,,5,,,:,,:,,,5,,,::A,,575,,:8,,+,,,,3+38++,8>,,,,,,,,,,7>@=**66*,,77,@@>,,
              @M00561:19:000000000-ABAUW:1:1101:13213:1452 1:N:0:9
              CCGTATTACCTGCCGCATCATTGTGAGTTGAAGATACATGTGCGGTTGATTTTATCTGGCTAGGCTACGTATTTCTATTTTTTTTCTCCCTTTCTTTTCCTTTTTTCTTTTTTTTTTCTTTTTTCTTTTCTTCTTCTGTTTTTTTTTATTTTTTCTTTTCTTACTCTTTTACTCCAGTGCCTTCAGATGTCTTTTTCTTCGCATTTTCCATTCTTTTTTATTTTTCCTATTTCCTTTATTTCCTCTACCCATCGTTAATATCACTTTCGTTTACTTTCACGTTAGGTTATACCG
              +
              8@<A9EF-C@C<AC+@+B@,CF9C9,,CF,,,,,C,C,C,<,6,+BF+,ECEF,C@E,,;C,,,6,,;,,,,,,69,<,9,,8++:96,::,,6<CC,969,,9,+4,,,<9+++4+4<,,9:@4,<8<8<,,,,5,,<,8,:6++8,::B,,+5,:,:7>,,7,7@,,7,7@>3,,7,77,77,,,7,,7@@<DB>,,6***66@,42,66,6,6,,3*5++5+35++++++3++5+5+5+3+54+2+*+30**2**+**0*2::C***09*2*2*1***0*)))19*2)/*)
              @M00561:19:000000000-ABAUW:1:1101:15428:1464 1:N:0:9
              CACCACCTCTTTTCATGGTACCATTTGCACGCTCCAAACTTGCATAGTGACCTTTTTCGATTAATTGACCAAAGTCAACATTATAACCGTCCTTTTTTTCATCCCTCTTTTCCTCCTCTCTCTTCTCCTCCTCTCCCATCCCCCTATTCAACAGGCCATCTCGTTTTCCTTCTTCTTCTTTTTAACCAATATTTTCTTTTCTTTTCCTTCTTCTTTTTTCCTTTTCCTTTTTTTTTTTTCTCTCTTTTCTCTTCTTTTTTTCCCTCTTTTCTCCTTTTTTTCTTTCTTTTTTCTTCTTT
              +
              8-BB9FFGFFGGGG9F9-C<EE9FFFACAF,@@@@<89DFE9F9F,<C69CCFFEFFE+,<C,,<,,,<;,,,,<,,,9,,,,,,,<,+88@@C,,:++9,,<,:,9:,<<66994=,<:5:9C,<,5,4,94994,,,9:4+8++,,:,,,,,,4+,448:,:+6+:7,:8???@,7A<,,,,,,,,+,,,,,,,,,,,,33,,,,,,,,,,,,,,,+,,,,,,,,,,,,,*******,1++++++++++++++++++))+**+**0*0***0**/**)))********)))****-*
              @M00561:19:000000000-ABAUW:1:1101:12573:1467 1:N:0:9
              CAGCTTATCACCCCGGAATTGGTTTATCCGGAGATGGGGTCTTATGGCTGGAAGAGGCCAGCACCTTTTCTCCCTCCTTTTCTCTTCTGCCGGCCCTTTATATTCCACTCGTATTTTTTGTTTTCTTTCCCTTTCTTACTTTTAACCTCTTCTTGTCTCCTATGTGACCAGCCTCTATTTTTTATTATAATTTTGATAACGTTTGTCTGCTCTTTATCTCCTTCACTTCTTGTTACCTATTTTCTCTCTTCTTCGTGTTTTTAGTGCCTTGGTCTGCCGCAGCGGGCGTGCTTGTTGAC

              Cleaned FASTQ:


              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTT
              +
              CBCCCFFFFE9FAFF
              @M00561:19:000000000-ABAUW:1:1101:13609:1492 1:N:0:9
              TTGTAAAGCATCG
              +
              BC<CCCD<F9FF>
              @M00561:19:000000000-ABAUW:1:1101:17917:1554 1:N:0:9
              TGCTGGACCTGTG
              +
              6-AAB9EFGG8F,
              @M00561:19:000000000-ABAUW:1:1101:10142:1572 1:N:0:9
              TTACTGGCGTCCTTGCTTTCTCCTTC


              It appears to have truncated all my reads by a massive amount.

              Comment

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #22
                Yikes!

                Can you post your command line for BBDuk? You are trimming the original files, correct (not the merged one)?

                Comment

                • M4TTN
                  Member
                  • Jan 2014
                  • 77

                  #23
                  Sorry, my fault. I wrote:

                  k=28 k=12

                  instead of

                  k=28 mink=12
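A repeated key like this can be caught before launching a run. The following is a minimal Python sketch of such a pre-flight check. `parse_bbduk_args` is a hypothetical helper, not part of BBTools, and it assumes only that parameters are written as key=value tokens (a bare word such as tbo is treated here as a boolean flag). Which of two duplicate values BBDuk itself keeps is not checked here; the truncated output above suggests the k=12 value took effect.

```python
def parse_bbduk_args(tokens):
    """Parse key=value tokens and refuse any key that appears twice.

    Hypothetical helper for illustration; it does not call BBDuk.
    """
    parsed = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep:
            value = "t"  # assumption: a bare word is a boolean flag
        if key in parsed:
            raise ValueError(
                f"parameter {key!r} given twice: {parsed[key]!r} and {value!r}"
            )
        parsed[key] = value
    return parsed


# The intended parameters parse cleanly.
print(parse_bbduk_args(["k=28", "mink=12"]))

# The mistyped parameters are rejected before any reads are touched.
try:
    parse_bbduk_args(["k=28", "k=12"])
except ValueError as err:
    print("rejected:", err)
```

A check of this kind only guards against typos in the parameter list; it says nothing about whether the chosen values are sensible for the data.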

                  New Output:


                  Memory: free=1041m, used=19m

                  Added 126482 kmers; time: 0.091 seconds.
                  Memory: free=1032m, used=28m

                  Input is being processed as paired
                  Started output streams: 0.008 seconds.
                  Processing time: 69.585 seconds.

                  Input: 2546118 reads 645049609 bases.
                  KTrimmed: 11853 reads (0.47%) 597009 bases (0.09%)
                  Result: 2546034 reads (100.00%) 644452600 bases (99.91%)

                  Time: 69.691 seconds.
                  Reads Processed: 2546k 36.53k reads/sec
                  Bases Processed: 645m 9.26m bases/sec

                  Comment

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #24
                    Much better. So a few had some adapters left over.
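To put numbers on "a few", the summary lines of the second BBDuk run can be re-derived with a short script. This is only a sketch: the regular expression assumes the three-line layout shown in the log in this thread, and `parse_counts` and `summarize` are hypothetical helper names.

```python
import re

# Summary lines copied from the second BBDuk run in this thread.
LOG = """\
Input: 2546118 reads 645049609 bases.
KTrimmed: 11853 reads (0.47%) 597009 bases (0.09%)
Result: 2546034 reads (100.00%) 644452600 bases (99.91%)
"""

# Matches "<Label>: <N> reads [(x%)] <M> bases"
LINE = re.compile(r"\s*(\w+):\s+(\d+) reads(?: \([\d.]+%\))? (\d+) bases")


def parse_counts(log):
    """Return {label: (reads, bases)} for each summary line found."""
    counts = {}
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            counts[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return counts


def summarize(counts):
    """Derive percentages and differences from the parsed counts."""
    reads_in, bases_in = counts["Input"]
    reads_trim, bases_trim = counts["KTrimmed"]
    reads_out, bases_out = counts["Result"]
    return {
        "pct_reads_trimmed": 100 * reads_trim / reads_in,
        "pct_bases_trimmed": 100 * bases_trim / bases_in,
        "reads_missing_from_result": reads_in - reads_out,
        "bases_removed": bases_in - bases_out,
        "mean_bases_per_trimmed_read": bases_trim / reads_trim,
    }


for name, value in summarize(parse_counts(LOG)).items():
    print(name, round(value, 2))
```

With these figures, about 0.47% of reads were trimmed, 84 reads are missing from the result, and each trimmed read lost roughly 50 bases on average.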

                    Comment

                    • M4TTN
                      Member
                      • Jan 2014
                      • 77

                      #25
                      If I understand it correctly, adapter fragments shorter than 12 will still be left even after this cleaning process. Should I specify a smaller value for mink than 12 to deal with this?
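One way to think about the mink trade-off is to estimate how often the end of a read would match the start of an adapter purely by chance. The sketch below is a back-of-the-envelope model, not a description of BBDuk internals: it assumes uniformly random bases, exact matching, and independent adapter sequences, and `chance_tip_match` is a hypothetical name.

```python
def chance_tip_match(m, n_adapters=1):
    """Chance that the last m bases of a random read equal the first m
    bases of at least one of n_adapters adapter sequences.

    Assumes uniform random bases and exact matches only.
    """
    p_single = 0.25 ** m
    return 1.0 - (1.0 - p_single) ** n_adapters


READS = 2546118  # read count from the BBDuk log in this thread

for m in (12, 10, 8, 6, 4):
    p = chance_tip_match(m)
    print(f"mink={m:2d}  p={p:.2e}  expected chance trims={p * READS:,.1f}")
```

Under these assumptions the chance of a spurious trim grows by a factor of 4 for every base dropped from mink, so very small values trade a few bases of leftover adapter for a growing number of reads that lose genuine sequence. Real reads are not random, and any mismatch tolerance would raise these rates.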

                      Or perhaps there is a different tool for trimming based upon the degree of overlap between read pairs?

                      i.e. IF overlap is less than read length THEN truncate reads to overlap length.
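The rule above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how BBDuk or BBMerge implement overlap trimming: it assumes standard forward/reverse paired reads, searches candidate insert lengths from longest to shortest, and uses a simple mismatch fraction. The function names, the default thresholds, the example insert and the adapter stub are all made up for the example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def trim_by_overlap(read1, read2, min_insert=20, max_mismatch_rate=0.05):
    """Truncate both reads to the insert length when the insert is shorter
    than the reads; otherwise return them unchanged.

    If the insert has length L, the first L bases of read1 should equal the
    reverse complement of the first L bases of read2.
    """
    read_len = min(len(read1), len(read2))
    # Only insert lengths shorter than the reads imply adapter read-through.
    for insert_len in range(read_len - 1, min_insert - 1, -1):
        fwd = read1[:insert_len]
        rev = reverse_complement(read2[:insert_len])
        if mismatches(fwd, rev) <= int(max_mismatch_rate * insert_len):
            return read1[:insert_len], read2[:insert_len]
    return read1, read2


# Made-up 40 bp insert sequenced with 50 bp reads, so each read runs
# 10 bases into an illustrative adapter stub.
insert = "GATTACAGCCTGAAGTCTTGCAACGGTCATAGCTGTTCCA"
adapter_stub = "AGATCGGAAG"
r1 = insert + adapter_stub
r2 = reverse_complement(insert) + adapter_stub
print(trim_by_overlap(r1, r2))
```

Because the decision rests on the two reads agreeing with each other, this approach can remove adapter tails of any length, including those too short for a kmer match.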

                      Comment

                      • M4TTN
                        Member
                        • Jan 2014
                        • 77

                        #26
                        Actually, it looks like bbmerge can do this with the tbo flag.


                        I wonder if there is any benefit to processing our reads through this pipeline prior to aligning to the reference?

                        Comment

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #27
                          Having reads free of extraneous sequence is going to benefit all downstream analysis. So you should do that trimming. Brian had mentioned the tbo flag in his earlier post.

                          Comment

                          • M4TTN
                            Member
                            • Jan 2014
                            • 77

                            #28
                            In case anyone is interested: cleaning the 2x300bp FASTQ files with bbduk (parameters k=28 mink=12) produced two cleaned read files which, when realigned with Bowtie2 and then opened in SeqMonk, give a read distribution that looks like this:



                            So clearly the residual adapter sequences are causing some issues with Bowtie correctly calling insert length, something that is partially corrected by bbduk (but not fully, probably due to the mink=12 parameter).

                            If we have time tomorrow, we'll redo with the tbo flag.

                            Comment

                            • M4TTN
                              Member
                              • Jan 2014
                              • 77

                              #29
                              Originally posted by GenoMax
                              Having reads free of extraneous sequence is going to benefit all downstream analysis. So you should do that trimming. Brian had mentioned the tbo flag in his earlier post.
                              Oh dear. My apologies for not reading/understanding it all. Yes, thanks to Brian and GenoMax. It's been a tiring day, but thanks for all the help given to a complete novice!

                              Comment

                              • GenoMax
                                Senior Member
                                • Feb 2008
                                • 7142

                                #30
                                A tiring day perhaps, but I am glad it ended well.

                                Good luck with the rest of your analysis.

                                Comment
