
  • #61
    Can you provide the exact command you are using? Your reads do have TruSeq adapters in them, so the inserts may be smaller than you expected.

    Code:
    @M02344:9008:000000000-AJ1PF:1:1110:28244:16073 1:N:0:TGACCAAT+ATAGAGGC
    TCTGCCGTCATCGACTTCGAAGGTTCGAATCCTTCCCCTCTAACCACGGCCGAAATTCAATACCCGGATCAAGCTCAATTCGGGTCGAGGTCGGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGA[COLOR="Red"]GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATGTCGTATGCCGTCTTCTGCTTG[/COLOR]AAAAAAAATAAGTGGTGCGAAGAGAGCCTGTGGCCAACCTCATATGCGTGGAGATGTCTCG
    Last edited by GenoMax; 05-05-2016, 11:52 AM.



    • #62
      In this case it looks like BBMerge's output is correct... as GenoMax said, you have adapter sequences indicating short inserts. Specifically, read1's first 126 bases exactly match BBMerge's output, and subsequently there is:
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATGTCGTATGCCGTCTTCTGCTTG
      ...a known Illumina adapter sequence, followed by AAAAAA, which is common after the Illumina machine runs off the end of the adapter sequence and has no signal.

      No matter what you expect/design your insert size to be, shorter fragments will almost always be present.
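As a rough illustration of the point above, the insert length of a read with adapter read-through is simply the number of bases before the adapter starts. The sketch below uses a made-up toy read (not the read from this thread) and only the first 13 bases of the adapter quoted above; exact string matching is an assumption made for brevity, and real adapter detection tolerates mismatches.

```shell
# First 13 bases of the adapter sequence quoted in the post above.
adapter="AGATCGGAAGAGC"
# Hypothetical toy read: a 10 bp insert, then the adapter, then poly-A.
r1="ACGTACGTAC${adapter}AAAAAA"

# index() returns the 1-based start of the adapter (0 if absent), so the
# insert length is that position minus one, or the full read length if
# no adapter is found.
insert_len=$(awk -v r="$r1" -v a="$adapter" \
  'BEGIN { p = index(r, a); print (p ? p - 1 : length(r)) }')
echo "$insert_len"
```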



      • #63
        I am using:

        Code:
        bbmerge.sh in1=<read1> in2=<read2> out=<mergedreads> outu1=<unmerged1> outu2=<unmerged2> mismatches=0



        • #64
          Thank you both for your help! Do you have any suggestions on how to set optional parameters to ensure that my merged file only contains sequences of my intended/designed insert size?



          • #65
            You can postfilter it afterward:

            reformat.sh in=reads.fq out=filtered.fq minlength=202 maxlength=202

            But, bear in mind that you may be losing important data by doing so. For example, there could be a whole bunch of sequences that are 199bp long for some real biological reason (rather than problems with library prep). So, just be cautious.
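To make the caution above concrete, it can help to look at the length distribution of the merged reads before discarding anything. The following is a minimal sketch in plain awk, not part of BBTools; the function names are made up for illustration, and it assumes uncompressed FASTQ with exactly four lines per record. The toy input stands in for a real merged file.

```shell
# Keep only FASTQ records whose sequence length is within [min, max].
# Assumes plain 4-line records (no wrapped sequence lines).
filter_fastq_by_length() {
  awk -v min="$1" -v max="$2" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      n = length(seq)
      if (n >= min && n <= max) print header "\n" seq "\n" plus "\n" $0
    }'
}

# Tabulate read lengths (length, count) so that near-target sizes are
# visible before any filtering is applied.
length_histogram() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print len "\t" count[len] }' | sort -n
}

# Toy input: one 4 bp read and one 2 bp read.
toy=$(printf '@a\nACGT\n+\nIIII\n@b\nAC\n+\nII\n')
printf '%s\n' "$toy" | length_histogram
printf '%s\n' "$toy" | filter_fastq_by_length 4 4
```

On real data the same functions would read the merged file on standard input, for example `length_histogram < merged.fq`, where `merged.fq` is a placeholder name.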



            • #66
              Thank you!



              • #67
                Hello there! Is there a way we could generate a stats or log file with bbmerge?



                • #68
                  Originally posted by shimingt
                  Hello there! Is there a way we could generate a stats or log file with bbmerge?
                  Stats are automatically generated with each run of BBMerge. They look something like this:

                  Code:
                  Pairs:                  2879431
                  Joined:                 2052925         71.296%
                  Ambiguous:              810015          28.131%
                  No Solution:            16491           0.573%
                  Too Short:              0               0.000%
                  
                  Avg Insert:             396.7
                  Standard Deviation:     98.1
                  Mode:                   415
                  
                  Insert range:           35 - 591
                  90th percentile:        524
                  75th percentile:        469
                  50th percentile:        402
                  25th percentile:        332
                  10th percentile:        262
                  You can capture STDOUT/STDERR in a file with standard bash redirection (for example, > log.txt 2>&1) to get the log info.
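A small sketch of that redirection, and of pulling one number back out of the captured log. `mock_merge` is a hypothetical stand-in that prints two of the stats lines shown above on stderr; it is not BBMerge, and the temporary log path is arbitrary.

```shell
# Hypothetical stand-in for a tool that reports its statistics on stderr.
mock_merge() {
  echo "Pairs:                  2879431" >&2
  echo "Joined:                 2052925         71.296%" >&2
}

log=$(mktemp)

# Send stdout to the log first, then point stderr at the same place.
mock_merge > "$log" 2>&1

# Extract the percentage column from the "Joined:" line of the log.
joined_pct=$(awk '$1 == "Joined:" { print $3 }' "$log")
echo "$joined_pct"
```

The order matters: `2>&1` has to come after the `> file` redirection so that stderr follows stdout into the file.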



                  • #69
                    Log File

                    Hello,

                    I am a novice with the command line. I can't seem to get a log file.

                    This is my command:

                    for x in *_R1_001.fastq;do echo bbmerge.sh -Xmx20g in1=$x in2=${x%_R1_001.*}_R2_001.fastq out=bbmerge\/${x%_*-*_R1_001.*}_R1R2_bbmerge.fastq>bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1 outu1=outu1\/$x outu2=outu2\/${x%_R1_001.*}_R2_001.fastq >> bbmerge.sh;done

                    Am I doing something wrong here?

                    There are log files, but they are all empty.
                    Last edited by shimingt; 06-14-2016, 03:48 AM.



                    • #70
                      Try this (I am assuming that your variable expressions are correct). The log file and the redirect should be at the end of the command.

                      Code:
                      for x in *_R1_001.fastq;do your_bbmerge_command_along_with_options >bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1;done
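One way to check the variable expressions before running anything is to expand them against a single file name and print the results. The file name below is hypothetical (chosen to contain the hyphen that the `_*-*` pattern expects); substitute one of your own.

```shell
# Hypothetical file name in the style the loop appears to expect.
x="Sample1_ACGT-TGCA_L001_R1_001.fastq"

# "%pattern" strips the shortest suffix that matches the pattern.
r2="${x%_R1_001.*}_R2_001.fastq"
merged="bbmerge/${x%_*-*_R1_001.*}_R1R2_bbmerge.fastq"
log="bbmerge/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt"

echo "$r2"       # Sample1_ACGT-TGCA_L001_R2_001.fastq
echo "$merged"   # bbmerge/Sample1_R1R2_bbmerge.fastq
echo "$log"      # bbmerge/Sample1_bbmerge_log.txt
```

If a file name does not match the pattern (for example, no hyphen), the expansion returns the name unchanged, which is worth spotting in a dry run like this.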



                      • #71
                        Log file from bbmerge

                        Hello GenoMax,

                        Thanks for your reply.

                        I tried a command like this:

                        for x in *_R1_001.fastq;do echo bbmerge.sh -Xmx20g in1=$x in2=${x%_R1_001.*}_R2_001.fastq out=bbmerge\/${x%_*-*_R1_001.*}_R1R2_bbmerge.fastq outu1=outu1\/$x outu2=outu2\/${x%_R1_001.*}_R2_001.fastq > bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1 >> bbmerge.sh;done


                        The log files are generated according to the names, but the log files are empty.

                        Am I doing something wrong here?



                        • #72
                          Can you try

                          Code:
                          for x in *_R1_001.fastq;do bbmerge.sh -Xmx20g in1=$x in2=${x%_R1_001.*}_R2_001.fastq out=bbmerge\/${x%_*-*_R1_001.*}_R1R2_bbmerge.fastq outu1=outu1\/$x outu2=outu2\/${x%_R1_001.*}_R2_001.fastq > bbmerge\/${x%_*-*_L001_R1_001.*}_bbmerge_log.txt 2>&1;done
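For anyone wondering why the earlier attempts produced empty logs: those commands started with `echo`, so bbmerge.sh was only printed, never run, and stdout was redirected twice on the same command. The shell applies redirections left to right, so the trailing `>>` took over stdout and the log file was created but left empty. A minimal demonstration with throwaway files (all paths here are temporary and made up):

```shell
workdir=$(mktemp -d)

# Pattern from the earlier attempts: stdout is redirected twice.
# The final ">>" wins, so log.txt is created but nothing is written to it.
echo "stats would go here" > "$workdir/log.txt" 2>&1 >> "$workdir/commands.sh"
wc -c < "$workdir/log.txt"      # 0 bytes
cat "$workdir/commands.sh"      # the echoed text ended up here instead

# Pattern from the command above: one stdout redirection, then 2>&1.
echo "stats would go here" > "$workdir/log2.txt" 2>&1
cat "$workdir/log2.txt"
```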



                          • #73
                            Dear Genomax,

                            Thanks for your help!



                            • #74
                              Hello,

                              How does BBMerge behave when the reads contain repetitive regions at the right end?

                              My amplicons are variable in length and derive from STRs, meaning that they are like:
                              (non-repetitive flanking region) - (tandem repeats) - (non-repetitive flanking region).
                              If the amplicon is long enough, I could imagine a case where the paired reads overlap only in the repetitive region, so that multiple ways of merging are theoretically correct.
                              Until now, merging has worked fine. Can BBMerge ensure that merged reads are always consistent with the "real" amplicon sequence?

                              Sebastian



                              • #75
                                If reads overlap only in a repetitive region, they will not be merged. BBMerge looks at all the possible overlaps, and keeps track of the two top-scoring ones (based on length and match/mismatch ratio). If those two are close, the pair will be classified as "ambiguous" and not merged. However, that does not mean it will always be correct; say you have a tandem repeat of two copies, like this:

                                ARRB

                                ...where A and B are unique, and R is repeat. If the reads look like this:

                                1: ARR
                                2: RRB

                                ...then there are 2 good overlap frames, forming ARRB or ARRRB, so the merge will be rejected as ambiguous. But if the reads only span a single repeat each, like this:

                                1: AR
                                2: RB

                                ...then there will only be one apparently good overlap frame, and the reads will probably be merged incorrectly to form ARB. BBMerge's false-positive merge rate is extremely low, but it's not perfect. With shotgun data you can add the flag "rsem" to greatly reduce the chances of false-positive merges due to short tandem repeats, but that does not really work with amplicon data. You may want to simulate some data using your expected sequence to see what the actual behavior is.
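The ARRB example can be reproduced with a toy overlap finder. This is only a sketch of the idea of "overlap frames", not BBMerge's algorithm: it uses exact matching with no scoring, assumes read 2 has already been reverse-complemented into the same orientation as read 1, and the sequences chosen for A, R, and B are arbitrary.

```shell
# Print every overlap length k (longest first, at least min) for which the
# last k bases of read 1 equal the first k bases of read 2.
overlap_frames() {
  awk -v r1="$1" -v r2="$2" -v min="${3:-4}" 'BEGIN {
    max = (length(r1) < length(r2)) ? length(r1) : length(r2)
    for (k = max; k >= min; k--)
      if (substr(r1, length(r1) - k + 1) == substr(r2, 1, k)) print k
  }'
}

# Arbitrary toy sequences: unique flanks A and B, repeat unit R.
A="CCGTTGCC"
R="GATTACA"
B="TGGCATGG"

# Reads ARR and RRB: two frames (overlap of RR or of a single R),
# so a merger that compares the top candidates can call this ambiguous.
overlap_frames "${A}${R}${R}" "${R}${R}${B}"

# Reads AR and RB: a single frame, which merges to ARB even if the
# true amplicon was ARRB.
overlap_frames "${A}${R}" "${R}${B}"
```

Simulating reads from your own expected amplicons, as suggested above, is the more reliable check; this toy only shows why the single-repeat case looks unambiguous to any overlap-based merger.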
