Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #46
    Originally posted by pmiguel View Post
    Hi Brian,
    That is interesting. Any reason you chose to add only 1/4th the quality value of one of the pair? The old phrap method was to just add the quality values together if bases from different strands agreed.

    That seemed reasonable enough -- if chance of one basecall being wrong was 1 in 100 (QV 20) but the basecall from the other strand agreed and also was QV 20, then the chance they are both wrong seems like it should be 1 in 10,000 (QV 40). Right?

    --
    Phillip
    Hi Phillip,

    Mathematically, if you have two completely independent observations, with perfectly accurate quality scores, the correct formula would be to add the scores together if the bases agree, and subtract them if they differ. However, I don't really like that approach since the quality scores are not perfectly accurate and the observations are not completely independent. For one thing, if a cluster is close enough to another cluster that there is interference during read 1, there will also be interference during read 2, with a similar nonrandom bias. Or if a cluster is near the edge of a lane for read 1 and thus slightly out of focus, it will be for read 2 as well. So even if the quality scores are accurate, both are affected by a similar bias rather than unrelated random biases, and thus strictly adding them is not appropriate.

    Furthermore, a lot of downstream programs or analyses may be calibrated to assume that the dominant source of error is sequencing reading error, and ignore other error modes, as they tend to be smaller. Thus Q40 reads may yield Q40 variants. But when you have a Q80 read, the error mode will be dominated by other things like PCR errors or unwanted chemical reactions - there's no way merging two Q40 reads will yield bases with a 1/100,000,000 error rate, even if you can absolutely ensure that the overlap frame was correct, which you can't.

    There is probably a better way to derive the quality than the simple equation I am using, but I like it because it is simple and gives useful results that can generally be used by tools that are designed for raw Illumina quality scores. Deriving an equation that correctly models all factors, or does a substantially better job overall, while still being simple enough that a person can understand the relationship between input and output, would be extremely difficult, I think.

    Comment


    • #47
      Originally posted by Brian Bushnell View Post
      Hi Phillip,

      [...]

      There is probably a better way to derive the quality than the simple equation I am using, but I like it because it is simple and gives useful results that can generally be used by tools that are designed for raw Illumina quality scores. Deriving an equation that correctly models all factors, or does a substantially better job overall, while still being simple enough that a person can understand the relationship between input and output, would be extremely difficult, I think.
      Well, you've convinced me! Thanks for the response, Brian.

      --
      Phillip

      Comment


      • #48
        BBMerge now has a new flag - "outa" or "outadapter". This allows you to automatically detect the adapter sequence of reads with short insert sizes, in case you don't know what adapters were used. It works like this:

        bbmerge.sh in=reads.fq outa=adapters.fa reads=1m

        Of course, it will only work for paired reads! The output fasta file will look like this:

        >Read1_adapter
        GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
        >Read2_adapter
        GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG

        If you have multiplexed things with different barcodes in the adapters, the part with the barcode will show up as Ns, like this:

        GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

        Comment


        • #49
          How to use merged reads

          Brian,

          Cool tool (as always)! I'm loving the Geneious integration for these tools. A+ move on your part.

          I'm performing my first merges on some raw and trimmed datasets and I had a very basic question.

          After merging my PE data, I will be left with some merged reads, and some unmerged (original PE) reads, correct? For de novo assembly, would I simply concatenate the merged reads onto the bottom of the 'Left' unmerged reads, then run Trinity with L (PE + merged) and R (PE R only)?

          Best,
          Bob

          Comment


          • #50
            That's an excellent question. I've never used Trinity. A post in their forum claimed that it will ignore the extra reads if you concatenate them to the end of one of the paired files. However, their faq claims the opposite:
            If you have additional singletons, add them to the .fq file that they correspond to based on the sequencing method used (if they’re equivalent to the left.fq entries, add them there, etc).
            So it sounds like probably you should, indeed, concatenate them at the end of the left reads. You can easily test this by using a single pair of reads (one read in the left and one in the right file), then concatenating all the unmerged reads onto the left file, then assembling. If if finishes almost immediately and nothing assembles, then it's ignoring the merged reads

            Comment


            • #51
              Hi,
              I have been using this program to merge reads but have not been able to generate a file with only the unmerged reads, any ideas on how to do this? thanks

              Comment


              • #52
                For interleaved reads:

                bbmerge.sh in=reads.fq out=merged.fq outu=unmerged.fq

                For reads in two files:

                bbmerge.sh in1=read1.fq in2=read2.fq out=merged.fq outu1=unmerged1.fq outu2=unmerged2.fq

                Comment


                • #53
                  "Created read stream of 0 reads"

                  I am trying to use BBMerge with the two files I have for one sample of MiSeq data. I am getting this error however:

                  "Executing jgi.BBMerge [in1=1_S1_L001_R1_001.fastq, in2=1_S1_L001_R2_001.fastq, out=S1merged, reads, outu1=S1unmerged1, outu2=S1unmerged2]

                  BBMerge version 8.9
                  Writing mergable reads merged.
                  Unspecified format for output S1merged; defaulting to fastq.
                  Unspecified format for output S1unmerged1; defaulting to fastq.
                  Unspecified format for output S1unmerged2; defaulting to fastq.
                  Started output threads.
                  crisG: Warning - created a read stream for 0 reads.
                  Exception in thread "main" java.lang.AssertionError
                  at stream.ConcurrentGenericReadInputStream.<init>(ConcurrentGenericReadInputStream.java:129)
                  at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:129)
                  at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:51)
                  at jgi.BBMerge.runPhase(BBMerge.java:957)
                  at jgi.BBMerge.process(BBMerge.java:721)
                  at jgi.BBMerge.main(BBMerge.java:48)"

                  Why might it be saying there are 0 reads? I have opened it with FastQC and there are reads in the file.

                  Thanks

                  Comment


                  • #54
                    Sounds like the files are probably misformatted. Can you run "head 1_S1_L001_R1_001.fastq" and "head 1_S1_L001_R2_001.fastq" and post the output here?

                    Comment


                    • #55
                      Originally posted by Brian Bushnell View Post
                      Sounds like the files are probably misformatted. Can you run "head 1_S1_L001_R1_001.fastq" and "head 1_S1_L001_R2_001.fastq" and post the output here?
                      Code:
                      head 1_S1_L001_R1_001.fastq
                      @M01936:151:000000000-AL2M3:1:1101:16824:1851 1:N:0:1
                      TCGATCGCTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCACTAGAGATATCGCCCTTCGGGGCTCTCATCCCTTCTATCACACCCCTGTGCACTTTGGCCACCGCTTCATTGCGGTGTGTCTT
                      +
                      BCCACFCCFCCCGGGGGGGGGGGHGHHHGHGHHHGGGAE53CGHHHHFGHGGGHHHGHHHHHGGGGGGHHHHHFB3335FHHHGGGAEHFA11E/EGHHHHHHGHGFHHHHFHHFDEDFAFGHHHHHHFEGCFFCFGGGGHHHGFFGDDDDHHHH
                      @M01936:151:000000000-AL2M3:1:1101:18467:1911 1:N:0:1
                      TCGATCGCTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCACTAGAGATATCGCCCTTCGGGGCTCTCATCCCTTCTATCACACCCCTGTGCACTTTGGCCACCGCTTCATTGCGGTGTGTCTT
                      +
                      BCCDCFCCFDCDGGGGGGGGGGHHHHHHHHHHHHGGHHGHGHHHHHHHGHHGGHHHHHHHHHGFGGGGHHHHFHHHE35GHHHFGGGGHHEE1E/EHHHHHHFHHGHHHHHHHHHGGCGEGHHHHHHHHGHHHGHGGGGGHHHGHGGGGGGHHHH
                      @M01936:151:000000000-AL2M3:1:1101:14252:2010 1:N:0:1
                      TCGATCGCTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCACTAGAGATATCGCCCTTCGGGGCTCTCATCCCTTCTATCACACCCCTGTGCACTTTGGCCACCGCTTCATTGCGGTGTGTCTT
                      Code:
                      1_S1_L001_R2_001.fastq 
                      @M01936:151:000000000-AL2M3:1:1101:16824:1851 2:N:0:1
                      CGGACTTGATGTACGAGCTGCGTTCTTCATCGATGCGAGAACCTAGAGATCCGTTGTTGAAAGTTATTTTAGTTTATAAGTTTTTACATTCAATGACTTGTGTTTATAGGTATGGTATAGTAAAAAGACACACCGCAATGAAGCGGTGGCCAAAG
                      +
                      33>>ABFFFFFFGGGGGGGGGFGGGGHGGHGFBGHGGCGFGB2B3333DFFEGFH1AFF555BFHHHHHG53@BF5FGGHHHHHEHDGHHHHBHHFHGGHHFHHHBFHFFHHHHFGFHEGF?GFFGEFHGFFFECCEGDFFGHGG?FCDCFG/FG
                      @M01936:151:000000000-AL2M3:1:1101:18467:1911 2:N:0:1
                      CGGACTCGATGTACGAGCTGCGTTCTTCATCGATGCGAGAACCTAGAGATCCGTTGTTGAAAGTTATTTTAGTTTATAAGTTTTTACATTCAATGACTTGTGTTTATAGGTATGGTATAGTAAAAAGACACACCGCAATGAAGCGGTGGCCAAAG
                      +
                      3>>AABFCBFFFGGGGGGGGGGGGGGHHHHGHHHHGGGGGGGFHGH335GHHHFHCGHGHG5B5DHHHHHHFHFHGGHGHFHHHGHHHFFHHFHHHHHHHHFHHHHHHHGFGHHHHHHHHBGFHHGEGHHHHGFG?DGCHHHEHGCGGGFHHAGB
                      @M01936:151:000000000-AL2M3:1:1101:14252:2010 2:N:0:1
                      CGGACTTGATGTACGAGCTGCGTTCTTCATCGATGCGAGAACCTAGAGATCCGTTGTTGAAAGTTATTTTAGTTTATAAGTTTTTACATTCAATGACTTGTGTTTATAGGTATGGTATAGTAAAAAGACACACCGCAATGAAGCGGTGGCCAAAG
                      Last edited by Brian Bushnell; 01-18-2016, 09:47 AM. Reason: Formatting

                      Comment


                      • #56
                        Hmmm, that's very strange. They look fine to me. For some reason it thinks you set the max number of reads to zero. Can you try adding the flag "reads=-1"?

                        Comment


                        • #57
                          Originally posted by Brian Bushnell View Post
                          Hmmm, that's very strange. They look fine to me. For some reason it thinks you set the max number of reads to zero. Can you try adding the flag "reads=-1"?
                          Thank you very much, it worked fine with the flag added

                          Comment


                          • #58
                            What is the difference/ between ftr and ftr2?
                            Does ftr apply to the forward read and ftr2 to the reverse read?

                            Thanks in advance!

                            forcetrimright=0 (ftr) If nonzero, trim right bases of the read
                            after this position (exclusive, 0-based).
                            forcetrimright2=0 (ftr2) If positive, trim this many bases on the right end.

                            Comment


                            • #59
                              Hi Luc,

                              Both of them apply to both reads. Let's say that you have 100bp reads in and you want to trim them to 70bp by trimming the right end. You can do that in two ways - either "ftr2=30" which trims the last 30bp, or "ftr=69" which trims all bases after position 69 (zero-based). The reason both options are present is because they behave differently when reads have variable lengths.

                              Comment


                              • #60
                                Truncated merges

                                Hello,

                                I have noticed that some of my merged sequences, after using BBMerge, are truncated. My R1/R2 input sequences are 251 bp long (paired-end 250 bp sequencing). The overlap/insert, which I expect to be merged from the R1/R2 files, is 202 bp long. Although many of my merged sequences are the expected 202 bp long, a handful are much shorter.

                                The only optional parameter I have changed is setting mismatches=0, as I do not want any discrepancies in the overlap.

                                Here is an example:
                                Code:
                                Raw Data: R1 and R2 files (fastq.gz)
                                R1 File: 251 nucleotides
                                @M02344:9008:000000000-AJ1PF:1:1110:28244:16073 1:N:0:TGACCAAT+ATAGAGGC
                                TCTGCCGTCATCGACTTCGAAGGTTCGAATCCTTCCCCTCTAACCACGGCCGAAATTCAATACCCGGATCAAGCTCAATTCGGGTCGAGGTCGGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATGTCGTATGCCGTCTTCTGCTTGAAAAAAAATAAGTGGTGCGAAGAGAGCCTGTGGCCAACCTCATATGCGTGGAGATGTCTCG
                                
                                R2: 251 nucleotides
                                @M02344:9008:000000000-AJ1PF:1:1110:28244:16073 2:N:0:TGACCAAT+ATAGAGGC
                                CGGAGCCTATGGAAAAACGCCAGCAACGCGGCCCGACCTCGACCCGAATTGAGCTTGATCCGGGTATTGAATTTCGGCCGTGGTTAGAGGGGAAGGATTCGAACCTTCGAAGTCGATGACGGCAGAAGATCGGAAGAGCGTCGTGTAGGGAAGGAGTGGGCCTAGATGTGTGGGTCGCGGTGGTGGCCGGATCGATGAAGAAAGTCTGGTGCTGGTGTGTGTGACGCCAGGGCGTCAGCAGTGGTGATGTG
                                
                                ------------------------------------------------------------------------------------------------------------------------------------------------------
                                
                                BBMerge Output: 126 nucleotides
                                @M02344:9008:000000000-AJ1PF:1:1110:28244:16073 1:N:0:TGACCAAT+ATAGAGGC
                                TCTGCCGTCATCGACTTCGAAGGTTCGAATCCTTCCCCTCTAACCACGGCCGAAATTCAATACCCGGATCAAGCTCAATTCGGGTCGAGGTCGGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCG
                                
                                ------------------------------------------------------------------------------------------------------------------------------------------------------
                                Thank you for your help!

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Genetic Variation in Immunogenetics and Antibody Diversity
                                  by seqadmin



                                  The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
                                  11-06-2024, 07:24 PM
                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 11-08-2024, 11:09 AM
                                0 responses
                                35 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-08-2024, 06:13 AM
                                0 responses
                                28 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-01-2024, 06:09 AM
                                0 responses
                                32 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-30-2024, 05:31 AM
                                0 responses
                                23 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X