  • tniranj1
    Junior Member
    • Mar 2009
    • 4

    #91
    You're probably right... I'm using far too old an operating system. Hopefully I will have results-oriented questions for you in the future, rather than technical installation stuff. ;-)
    Thanks again,
    -TiN

    Comment

    • SillyPoint
      Member
      • May 2008
      • 39

      #92
      Ben, your answer to What Da Seq's question a couple of days ago about ambiguity characters was:

      A result of Bowtie's indexing strategy is that alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie, regardless of the alignment policy. This is true only for ambiguous characters in the reference; alignments involving ambiguous characters in the read are legal, subject to the alignment policy.
      Do I interpret "legal, subject to the alignment policy" to mean they are accepted, but counted as mismatches subject to the -n limit?

      Thanks,

      --SP

      Comment

      • Ben Langmead
        Senior Member
        • Sep 2008
        • 200

        #93
        Yes, that's correct. An ambiguous character in the read is "charged" as a mismatch, which can affect whether the alignment is legal according to the alignment policy.
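The rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not Bowtie's code: the function names and the `max_mismatches` parameter are hypothetical stand-ins for the alignment policy's limit, and the sketch ignores seed length and quality values.

```python
UNAMBIGUOUS = set("ACGT")

def count_mismatches(read, ref_window):
    """Return the mismatch count, or None if the reference window
    contains an ambiguous character (alignment considered invalid)."""
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    mismatches = 0
    for r, g in zip(read.upper(), ref_window.upper()):
        if g not in UNAMBIGUOUS:
            # Ambiguous reference character: invalid regardless of policy.
            return None
        if r not in UNAMBIGUOUS or r != g:
            # An ambiguous read character is charged as a mismatch.
            mismatches += 1
    return mismatches

def is_legal(read, ref_window, max_mismatches):
    """Apply a hypothetical mismatch limit to the count above."""
    m = count_mismatches(read, ref_window)
    return m is not None and m <= max_mismatches
```

With a limit of 1, a read with a single N can still align, while a read with two Ns cannot, and any window covering an N in the reference is rejected outright.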

        Comment

        • ieuanclay
          Member
          • Feb 2009
          • 27

          #94
          Hi Ben,

          Thanks for your advice earlier, post-processing of results is now working well!

          However, I have found that using --all isn't a problem until I specify --nostrata as well, which causes me to rapidly run out of memory (>32 GB) and get the std::bad_alloc error that I think a few people mentioned earlier. I had built indices with smaller -o, but even a high-offrate, small-footprint index has the same problem. Any suggestions (other than not using --nostrata...)?

          Thanks again,

          Ieuan

          Comment

          • Ben Langmead
            Senior Member
            • Sep 2008
            • 200

            #95
            Hi Ieuan,

            Can you tell me which Bowtie version/index/reads/arguments you're using? Also, could you give the same experiment a try with version 0.9.9.2 (just released)? I'll take a look.

            Thanks,
            Ben

            Comment

            • ieuanclay
              Member
              • Feb 2009
              • 27

              #96
              Hi Ben,

              I have seen this in 0.9.9 and 0.9.9.1 (x64), with m.musculus ncbi36 and 37 indices (offrates 2, 3, 4, 5). Args were -q --solexa-quals -a --unfq ... -p 2/3/6. Input was only ~1/2 Mb of fastq reads! Worrying, because the real input will be >2 Gb! In all cases the combination of -a and --nostrata seemed to be causing the problem, because with only -a the footprint was as expected.

              I will try 0.9.9.2 today and get back to you. I checked yesterday and noticed you had improved the --best behaviour (thanks!), so I'll try it with that too.

              Thanks again,

              Ieuan

              ## update ##
              0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!
              Last edited by ieuanclay; 04-07-2009, 06:32 AM. Reason: update

              Comment

              • thondeboer
                Member
                • Jan 2009
                • 24

                #97
                Hi Ben,

                Complete Genomics here....
                Have you tried to use our gapped read structure with Bowtie yet? As you may know, we have quite an unusual read structure, so most mapping software is not able to use it effectively and we have built our own. Our customers would probably want to use other mapping software as well, if only to compare our mapping to theirs...

                The data is available in the SRA under number SRA008092

                ftp://ftp.ncbi.nlm.nih.gov/sra/Submi...008/SRA008092/

                You can also get a sample data set which is part of the API we have released.



                We are considering changing to the SAM/BAM format as the export of our mapping data...Are you considering supporting SAM/BAM as an output format as well?

                Thanks!

                Thon
                Thon
                __________________________________
                Thon de Boer, Ph.D.
                Director of Product Management, Software
                Strand Life Sciences
                548 Market Street, Suite 82804
                San Francisco, CA 94104, USA
                [email protected]
                www.strandls.com
                Pioneers in Discovery Research Informatics
                _______________________________________

                Comment

                • Ben Langmead
                  Senior Member
                  • Sep 2008
                  • 200

                  #98
                  Hey Thon,

                  We haven't tried implementing gapped alignment yet, though tools like BWA and SOAP2 show it's doable in this framework. Can you describe the "unusual read structure"?

                  Yes, we would certainly like to support SAM/BAM output eventually. It's on the TODO list!

                  Thanks,
                  Ben

                  Comment

                  • thondeboer
                    Member
                    • Jan 2009
                    • 24

                    #99
                    Hi Ben,

                    You can read more on our read structure on our website and on this forum as well:

                    Sequencing technologies without a commercially released platform (Oxford Nanopore, Halcyon Molecular, etc.)




                    But basically we have a gapped read structure of 5 + 10 + 10 + 10 (times two) bases.
                    The first gap is "negative" that is, has overlap between the 5 and 10 base reads.
                    The other gaps are positive, that is, gaps in the more classical sense.

                    You won't know the negative gap value (it can vary from 1 to 3 overlaps) unless you map the data (or unless there is only one way to overlap) onto the reference genome.
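To illustrate the point about the negative gap, here is a minimal sketch that lists which overlap sizes are consistent between the 5-base and the adjacent 10-base read before any mapping is done. The function name and the example sequences are made up, and the sketch assumes error-free base calls and that the overlapping bases appear in both reads; it is not Complete Genomics' software.

```python
def candidate_overlaps(first, second, min_overlap=1, max_overlap=3):
    """Return every overlap size k for which the last k bases of `first`
    equal the first k bases of `second`."""
    return [
        k
        for k in range(min_overlap, max_overlap + 1)
        if first[-k:] == second[:k]
    ]

# Hypothetical 5-base and 10-base reads.
print(candidate_overlaps("ACGTA", "TACCGGTTAA"))  # exactly one consistent overlap
print(candidate_overlaps("AAAAA", "AAAAAAAAAA"))  # ambiguous: several overlaps fit
```

When the list has a single entry, the overlap is determined by the reads alone; when it has several, the reference is needed to decide, as described above.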

                    Good to hear you are in support of SAM/BAM. We are considering this as our export format as well...

                    Thon
                    Complete Genomics

                    Comment

                    • Ben Langmead
                      Senior Member
                      • Sep 2008
                      • 200

                      Hi Ieuan,

                      Originally posted by ieuanclay View Post
                      0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!
                      --best mode got an overhaul in 0.9.9.2 such that --best now conducts a best-first search, rather than a depth-first search with buffering and flushing of results, as before. My suspicion is that the old approach was, for some reads, buffering a huge number of results and exhausting memory. I'll take a harder look, though.
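As a toy illustration of why a best-first search does not need to buffer results, the sketch below uses a priority queue keyed on the mismatch count so far. It is not Bowtie's implementation: the function name, the list of equal-length reference strings and the `max_mismatches` limit are all assumptions made for the example.

```python
import heapq

def best_first_hits(read, references, max_mismatches):
    """Yield (mismatches, reference) pairs in nondecreasing mismatch order."""
    # Toy restriction: only references as long as the read are considered.
    candidates = [r for r in references if len(r) == len(read)]
    # Heap entries: (mismatches so far, bases compared so far, candidate index).
    heap = [(0, 0, i) for i in range(len(candidates))]
    heapq.heapify(heap)
    while heap:
        cost, pos, idx = heapq.heappop(heap)
        if pos == len(read):
            # Every state still on the heap costs at least this much,
            # so this hit can be reported immediately.
            yield cost, candidates[idx]
            continue
        new_cost = cost + (read[pos] != candidates[idx][pos])
        if new_cost <= max_mismatches:
            heapq.heappush(heap, (new_cost, pos + 1, idx))

hits = list(best_first_hits("ACGT", ["ACGA", "ACGT", "TTTT", "AGGT"], 1))
```

Because the cheapest partial alignment is always extended first, the first hit produced is a best one and the caller can stop there, whereas a depth-first search has to collect hits and sort or flush them afterwards.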

                      Thanks,
                      Ben

                      Comment

                      • dara
                        Member
                        • Apr 2009
                        • 10

                        BOWTIE_BUILD: Problems when using with large reference genomes?

                        Hi all,

                        I've been trying to run bowtie using the human_genomic.fa file from the blast db as the reference. When I attempt to use bowtie-build to turn this large file into an index, I keep getting an 'Error: could not open human_genomic.fa' message.
                        I tried creating a file with just the first 10000 lines of the human genome and that works fine. I thought bowtie could easily handle such big reference files. Has anyone else faced this issue? Any suggestions on how to overcome it?

                        Here's what I did: ./bowtie-build -f human_genomic.fa human_genom

                        thanks

                        Comment

                        • Ben Langmead
                          Senior Member
                          • Sep 2008
                          • 200

                          Hi dara,

                          How large is the human_genomic.fa file? Are you using 32-bit or 64-bit bowtie-build? I've not seen this before. Most versions of Linux and glibc can handle very large files with no problem.

                          I suspect that once you fix this problem, you'll run into the problem that Bowtie can only index reference sequences in chunks of about 3.6 Gbases or so. When you try to feed bowtie-build an input with too much sequence, it will say "Error: Reference sequence has more than 2^32-1 characters! Please divide the reference into batches or chunks of about 3.6 billion characters or less each and index each independently." This is because Bowtie uses 32-bit ints internally to refer to offsets in the index. We may fix this some day, but until then you'll have to work around this by indexing your reference in chunks.
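One way to plan such chunks is sketched below: read the length of each sequence from the FASTA file, then greedily group whole sequences into batches that stay under a base limit. The function names are hypothetical, the default limit simply reuses the "about 3.6 billion characters" figure from the error message, and a single sequence longer than the limit is reported as an error rather than split.

```python
def fasta_lengths(lines):
    """Yield (header, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length

def plan_batches(seq_lengths, max_bases=3_600_000_000):
    """Greedily group (name, length) pairs into batches of at most max_bases."""
    batches, current, total = [], [], 0
    for name, length in seq_lengths:
        if length > max_bases:
            raise ValueError(f"{name} alone exceeds {max_bases} bases")
        if current and total + length > max_bases:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += length
    if current:
        batches.append(current)
    return batches
```

For a real file, `fasta_lengths(open("human_genomic.fa"))` would feed `plan_batches`; each resulting batch would then be written to its own FASTA file and indexed independently.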

                          Ben

                          Comment

                          • dara
                            Member
                            • Apr 2009
                            • 10

                            Hi Ben,

                            Thank you for your response. The file is a human genome download from blast. It's about 8.3 GB in size, and I was using the default 32-bit version of bowtie-build. All right, I will try what you suggested: I will split the genome (by chromosome, maybe) and then feed those splits to bowtie-build.

                            I will let you know if that causes any issues.

                            Thanks

                            Comment

                            • dara
                              Member
                              • Apr 2009
                              • 10

                              Hi Ben,

                              Once the reference file has been split into chunks, do they have to be made into separate indexes? So, for example, if I've split the reference into chrom1, chrom2 and chrom3, would I need to do:

                              ./bowtie-build -f chrom1 indexchrom1
                              ./bowtie-build -f chrom2 indexchrom2
                              ./bowtie-build -f chrom3 indexchrom3

                              If I build separate indexes, how would I call all of them when mapping with my reads file?

                              Thanks for your help
                              Last edited by dara; 05-07-2009, 06:25 AM. Reason: name

                              Comment

                              • dara
                                Member
                                • Apr 2009
                                • 10

                                Also another question for you:

                                Any updates on plans for bowtie supporting gapped alignment?

                                thanks

                                Comment
