Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    Would it be possible to post an image from IGV, or some kind of text diagram? I can't quite visualize what you're describing.

    Comment

    • babine
      Junior Member
      • Dec 2011
      • 7

      Sure Brian, here it is! A screenshot from Geneious on an alignment obtained with BBMap as

      Executing align2.BBMap [in1=177291_Brachy_300_Brachypodium_distachyon_2x_R1.fastq, in2=177291_Brachy_300_Brachypodium_distachyon_2x_R2.fastq, out=Bdi_1H_AK250130_mapped_bis.sam,
      ref=.\resources\Brachypodium_distachyon_2x_AH_AK250130.fasta, K=9, build=3, maxindel=30, pairlen=600, minid=0.9, padding=20, overwrite=t]

      "A picture is worth a thousand words"

      Best
      Attached Files

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        @babine: What do you mean by "not-so-accurate" reference? Is the reference as depicted in the picture wrong (i.e. there are two ends that are joined that should not be together)? Note: I don't use geneious but I assume one of the lines at the top represents the reference (and the other a consensus?)

        Comment

        • babine
          Junior Member
          • Dec 2011
          • 7

          @Genomax: yes that's exactly what I assume from the read profile. And yes the reference is the sequence with yellow background just below the consensus

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            I wonder if you should manually put a break (stretch on N's in that position) to force the two ends apart. While that may resolve this "end" of the reads issue not sure if that is how things are in the real genome.

            Are you certain your reads (R1/R2) do not overlap? If they could then you could merge them first before alignment.

            Are the reads in the image corresponding pairs (R1/R2)?

            Comment

            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              I wonder if maybe a polishing program like Quiver could automatically fix the reference for you. That would be the ideal solution. The problem, of course, is that there are misassemblies or major structural variations with respect to the reference. Hmmm...

              I think there are 3 main options.

              1) Map with the "local" flag. This will soft-clip the part of the read that extends across the misassembly, so it will not result in spurious variations being called. That's certainly the easiest approach.
              2) Try polishing the assembly with something like Quiver, to automatically fix these things.
              3) Make a new assembly, if this genome differs substantially enough from the reference that this issue is common.

              Comment

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                @babine: Are you only "borrowing" the reference i.e. you do not have raw/original data for the "not-so-accurate" reference?

                Comment

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  Hi everyone,

                  I wanted to mention that there is now some additional documentation for BBTools in the form of guides/tutorials, in the /docs/ folder. Currently there are guides for BBDuk, BBMerge, Seal, Tadpole, Reformat, Dedupe, BBNorm, and Taxonomy, and I plan to add more in the near future. There's also an overview of the general usage of all BBTools (UsageGuide.txt) and a list of all the commonly-used tools with a brief description (ToolDescriptions.txt).

                  I hope these will be useful, and please let me know if anything is unclear or needs to be expanded, or there is a common use that the guides don't address.

                  -Brian

                  Comment

                  • JulesWinchester
                    Junior Member
                    • Dec 2015
                    • 2

                    Hello, I am working with genomic data belonging to mammals. I have been aligning raw reads to all the genes within an organism's chromosomes. While working with this data set, I have noticed some odd values when calculating coverage with BBTool's built-in pileup feature. I would really appreciate it if anyone knew how to interpret this.

                    Here is the issue:

                    Data being used:
                    -CDS (W/ Introns, not spliced) of genes extracted from a Chromosome Genbank file (found @ NCBI ftp://ftp.ncbi.nlm.nih.gov/genomes/P...odytes/CHR_01/).
                    -Raw reads collected from ENA (Adapter free WGS sequences)

                    BBMap Commandline used to calculate coverage:
                    bbmap.sh in1=Chimp_1.fastq in2=Chimp_2.fastq ref=ChimpChrom1.fa local=t nodisk covstats=Chrom1Stats.txt covhist=Chrom1Hist.txt

                    Results for the chimp. The results for all other mammals I've worked with are very similar.
                    Coverage - 43.97
                    Standard Deviation - 507.47

                    I proceeded to look at the statistical output produced by BBMap to see if the SD values were caused by specific gene sequences; there I saw that a lot of genes had median folds equal to 0, others that had surpassed 1,000, and even some with negative values (-1). Why does this happen, is this normal? If not, is there a way to fix this?


                    I have attached the stats file produced by BBMap on this post if anyone would like to see how the output looks like. I took the liberty of parsing the file based on median fold values (equal to or more than 100), only keeping the columns belonging to gene ID and Median Fold. On a follow-up post, I will add attach a text file containing the low median fold values (anything below 100,).
                    Note: The coverage and SD tendency remains, even when using a reference fasta containing only the longest isoform per gene (1 isoform per gene).
                    Attached Files
                    Last edited by JulesWinchester; 12-19-2015, 02:46 AM.

                    Comment

                    • JulesWinchester
                      Junior Member
                      • Dec 2015
                      • 2

                      Here are the files I promised. I also attached the whole statistical output, in case anyone wants to see.

                      Thanks in advance!
                      Attached Files

                      Comment

                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        @Brian: I am having trouble using dedupe.sh with paired end reads (using BBMap v.35.59).

                        Code:
                        Unknown parameter out1=sampleID_L001_R1_001.fastq.gz
                        
                             at jgi.Dedupe.<init>(Dedupe.java:383)
                        
                             at jgi.Dedupe.main(Dedupe.java:80)
                        command I am using is in the format you had posted

                        Code:
                        $ dedupe.sh in1=read1.fq in2=read2.fq out1=x1.fq out2=x2.fq ac=f
                        Single-end files work fine.

                        Comment

                        • Brian Bushnell
                          Super Moderator
                          • Jan 2014
                          • 2709

                          Thanks for finding that! Looks like a typo crept in; I'll fix that in the next release. In the mean time, you can use "out=" instead of "out1=".

                          Comment

                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            Originally posted by JulesWinchester View Post
                            Hello, I am working with genomic data belonging to mammals. I have been aligning raw reads to all the genes within an organism's chromosomes. While working with this data set, I have noticed some odd values when calculating coverage with BBTool's built-in pileup feature. I would really appreciate it if anyone knew how to interpret this.
                            I don't see anything odd in general - high variances are not really unusual. But the -1 values are definitely problematic and should not happen. I'll investigate and get back to you.

                            Comment

                            • GenoMax
                              Senior Member
                              • Feb 2008
                              • 7142

                              Originally posted by Brian Bushnell View Post
                              Thanks for finding that! Looks like a typo crept in; I'll fix that in the next release. In the mean time, you can use "out=" instead of "out1=".
                              For now I only want an estimation of duplicates for some HS4000 paired end data so I will just omit "out=". I assume a single "out=" will result in an interleaved file (for PE input) that would have to be resolved afterwards?

                              Comment

                              • Brian Bushnell
                                Super Moderator
                                • Jan 2014
                                • 2709

                                Originally posted by GenoMax View Post
                                For now I only want an estimation of duplicates for some HS4000 paired end data so I will just omit "out=". I assume a single "out=" will result in an interleaved file (for PE input) that would have to be resolved afterwards?
                                "out" is synonymous with "out1". So if the input is paired, "out=x.fq" or "out1=x.fq" will produce interleaved output; "out=x1.fq out2=x2.fq" or "out1=x1.fq out2=x2.fq" would both produce dual-file output (once Dedupe parses "out1" correctly).

                                Comment

                                Latest Articles

                                Collapse

                                • GATTACAT
                                  Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                  by GATTACAT
                                  Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
                                  07-01-2026, 11:43 AM
                                • SEQadmin2
                                  Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                  by SEQadmin2


                                  I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                                  Here are nine questions we think about, in roughly the order they matter, before...
                                  06-18-2026, 07:11 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, 07-02-2026, 11:08 AM
                                0 responses
                                12 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-30-2026, 05:37 AM
                                0 responses
                                14 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-26-2026, 11:10 AM
                                0 responses
                                20 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-17-2026, 06:09 AM
                                0 responses
                                54 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...