
  • #91
    You're probably right... I'm using far too old an operating system. Hopefully I will have results-oriented questions for you in the future, rather than technical installation stuff. ;-)
    Thanks again,
    -TiN

    Comment


    • #92
      Ben, your answer to What Da Seq's question a couple of days ago about ambiguity characters was:

      A result of Bowtie's indexing strategy is that alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie, regardless of the alignment policy. This is true only for ambiguous characters in the reference; alignments involving ambiguous characters in the read are legal, subject to the alignment policy.
      Do I interpret "legal, subject to the alignment policy" to mean they are accepted, but counted as mismatches subject to the -n limit?

      Thanks,

      --SP

      Comment


      • #93
        Yes, that's correct. An ambiguous character in the read is "charged" as a mismatch, which can affect whether the alignment is legal according to the alignment policy.
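To make the rule concrete, here is a minimal sketch in Python. It is not Bowtie's code: the function names are hypothetical, and it treats the mismatch limit as a simple cap over the whole read, which simplifies the real -n policy. It only illustrates the two cases described above: an ambiguous reference character invalidates the alignment, while an ambiguous read character is charged as one mismatch.

```python
UNAMBIGUOUS = set("ACGT")


def count_mismatches(read, ref_window):
    """Return the mismatch count, or None if the alignment is invalid.

    Assumption for illustration: read and ref_window are equal-length
    strings already placed against each other (no gaps).
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    read, ref_window = read.upper(), ref_window.upper()
    # Any ambiguous character in the reference makes the alignment invalid.
    if any(base not in UNAMBIGUOUS for base in ref_window):
        return None
    mismatches = 0
    for read_base, ref_base in zip(read, ref_window):
        # An ambiguous read character is charged as a mismatch.
        if read_base not in UNAMBIGUOUS or read_base != ref_base:
            mismatches += 1
    return mismatches


def is_legal(read, ref_window, max_mismatches):
    """True if the alignment is valid and within the mismatch limit."""
    n = count_mismatches(read, ref_window)
    return n is not None and n <= max_mismatches
```

For example, is_legal("ANNT", "ACGT", 2) is True under this sketch, but the same read fails with a limit of 1.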

        Comment


        • #94
          Hi Ben,

          Thanks for your advice earlier, post-processing of results is now working well!

          However, I have found that using --all isn't a problem until I also specify --nostrata, which causes me to rapidly run out of memory (>32 GB) and get the std::bad_alloc error that I think a few people mentioned earlier. I had built indices with smaller -o values, but even a small-footprint index with a high offrate has the same problem. Any suggestions (other than not using --nostrata...)?

          Thanks again,

          Ieuan

          Comment


          • #95
            Hi Ieuan,

            Can you tell me which Bowtie version/index/reads/arguments you're using? Also, could you give the same experiment a try with version 0.9.9.2 (just released)? I'll take a look.

            Thanks,
            Ben

            Comment


            • #96
              Hi Ben,

              I have seen this in 0.9.9 and 0.9.9.1 (x64), with M. musculus NCBI36 and 37 indices (offrates 2, 3, 4, 5). Args were -q --solexa-quals -a --unfq ... -p 2/3/6. Input was only ~1/2 Mb of FASTQ reads! Worrying, because the real input will be >2 Gb! In all cases the combination of -a and --nostrata seemed to be causing the problem, because with only -a the footprint was as expected.

              I will try 0.9.9.2 today and get back to you. I checked yesterday and noticed you had improved the --best behaviour (thanks!), so I'll try it with that too.

              Thanks again,

              Ieuan

              ## update ##
              0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!
              Last edited by ieuanclay; 04-07-2009, 06:32 AM. Reason: update

              Comment


              • #97
                Hi Ben,

                Complete Genomics here....
                Have you tried to use our gapped read structure with Bowtie yet? As you may know, we have quite an unusual read structure, so most mapping software is not able to use it effectively and we have built our own. Our customers would probably want to use other mapping software as well, though, if only to compare our mapping to theirs...

                The data is available in the SRA under number SRA008092

                ftp://ftp.ncbi.nlm.nih.gov/sra/Submi...008/SRA008092/

                You can also get a sample data set which is part of the API we have released.



                We are considering changing to the SAM/BAM format for the export of our mapping data... Are you considering supporting SAM/BAM as an output format as well?

                Thanks!

                Thon
                __________________________________
                Thon de Boer, Ph.D.
                Director of Product Management, Software
                Strand Life Sciences
                548 Market Street, Suite 82804
                San Francisco, CA 94104, USA
                [email protected]
                www.strandls.com
                Pioneers in Discovery Research Informatics
                _______________________________________

                Comment


                • #98
                  Hey Thon,

                  We haven't tried implementing gapped alignment yet, though tools like BWA and SOAP2 show it's doable in this framework. Can you describe the "unusual read structure"?

                  Yes, we would certainly like to support SAM/BAM output eventually. It's on the TODO list!

                  Thanks,
                  Ben

                  Comment


                  • #99
                    Hi Ben,

                    You can read more on our read structure on our website and on this forum as well:

                    Sequencing technologies without a commercially released platform (Oxford Nanopore, Halcyon Molecular, etc.)




                    But basically we have a gapped read structure of 5 + 10 + 10 + 10 (times two) bases.
                    The first gap is "negative" that is, has overlap between the 5 and 10 base reads.
                    The other gaps are positive, that is, gaps in the more classical sense.

                    You won't know the negative gap value (it can vary from 1 to 3 overlaps) unless you map the data (or unless there is only one way to overlap) onto the reference genome.
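One way to picture the negative gap is the small Python sketch below. It is an illustration only, not Complete Genomics' mapper: the function names are hypothetical, and it assumes the overlapping bases were read identically in both segments (no sequencing errors). It lists which overlap sizes from 1 to 3 are consistent between the 5-base and the first 10-base segment without looking at the reference; when more than one size is consistent, mapping is needed to decide.

```python
def candidate_overlaps(first5, next10, min_overlap=1, max_overlap=3):
    """Overlap sizes k where the last k bases of first5 equal the first k of next10."""
    return [
        k
        for k in range(min_overlap, max_overlap + 1)
        if first5[-k:] == next10[:k]
    ]


def merge(first5, next10, k):
    """Join the two segments, keeping the k overlapping bases only once."""
    return first5 + next10[k:]


# Only one way to overlap: the negative gap is known without mapping.
print(candidate_overlaps("ACGTA", "ACCCCCCCCC"))
# Several ways to overlap: the reference is needed to choose.
print(candidate_overlaps("AAAAA", "AAAAACCCCC"))
```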

                    Good to hear you are in support of SAM/BAM. We are considering this as our export format as well...

                    Thon
                    Complete Genomics

                    Comment


                    • Hi Ieuan,

                      Originally posted by ieuanclay:
                      0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!
                      --best mode got an overhaul in 0.9.9.2 such that --best now conducts a best-first search, rather than a depth-first search with buffering and flushing of results, as before. My suspicion is that the old approach was, for some reads, buffering a huge number of results and exhausting memory. I'll take a harder look, though.

                      Thanks,
                      Ben

                      Comment


                      • BOWTIE_BUILD: Problems when using with large reference genomes?

                        Hi all,

                        I've been trying to run Bowtie using the human_genomic.fa file from the BLAST database as the reference. When I attempt to use bowtie-build to index this large file, I keep getting an 'Error: could not open human_genomic.fa' message.
                        I tried creating a file with just the first 10000 lines of the human genome, and that works fine. I thought Bowtie could easily handle such big reference files. Has anyone else faced this issue? Any suggestions for how to overcome it?

                        Here's what I did: ./bowtie-build -f human_genomic.fa human_genom

                        thanks

                        Comment


                        • Hi dara,

                          How large is the human_genomic.fa file? Are you using 32-bit or 64-bit bowtie-build? I've not seen this before. Most versions of Linux and glibc can handle very large files with no problem.

                          I suspect that once you fix this problem, you'll run into the problem that Bowtie can only index reference sequences in chunks of about 3.6 Gbases or so. When you try to feed bowtie-build an input with too much sequence, it will say "Error: Reference sequence has more than 2^32-1 characters! Please divide the reference into batches or chunks of about 3.6 billion characters or less each and index each independently." This is because Bowtie uses 32-bit ints internally to refer to offsets in the index. We may fix this some day, but until then you'll have to work around this by indexing your reference in chunks.
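As a rough illustration of the chunking workaround, the Python sketch below groups FASTA records into batches whose total sequence length stays under a cap. The function names and the default cap of 3.6 billion characters (taken from the error message above) are assumptions for illustration; this is not part of Bowtie, and each resulting batch would still need to be written out and indexed separately.

```python
def fasta_lengths(lines):
    """Yield (name, sequence_length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def batch_records(records, max_chars=3_600_000_000):
    """Group record names into batches whose total length is at most max_chars."""
    batches, current, total = [], [], 0
    for name, length in records:
        if length > max_chars:
            raise ValueError(f"{name} alone exceeds the limit of {max_chars}")
        # Start a new batch when adding this record would exceed the cap.
        if current and total + length > max_chars:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += length
    if current:
        batches.append(current)
    return batches
```

A possible use, assuming the file opens correctly: `with open("human_genomic.fa") as fh: batches = batch_records(fasta_lengths(fh))`. Records are kept whole, so a single sequence is never split across two batches.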

                          Ben

                          Comment


                          • Hi Ben,

                            Thank you for your response. The file is a human genome download from BLAST. It's about 8.3 GB in size, and I was using the default 32-bit version of bowtie-build. All right, I will try what you suggested: I will split the genome (by chromosome, maybe) and then feed those splits to bowtie-build.

                            I will let you know if that causes any issues.

                            Thanks

                            Comment


                            • Hi Ben,

                              Once the reference file has been split into chunks, do they have to be made into separate indexes? So, for example, if I've split the reference into chrom1, chrom2 and chrom3, would I need to do:

                              ./bowtie-build -f chrom1 indexchrom1
                              ./bowtie-build -f chrom2 indexchrom2
                              ./bowtie-build -f chrom3 indexchrom3

                              If I build separate indexes, how would I call all of them when mapping with my reads file?

                              Thanks for your help
                              Last edited by dara; 05-07-2009, 06:25 AM. Reason: name

                              Comment


                              • Also another question for you:

                                Any updates on plans for bowtie supporting gapped alignment?

                                thanks

                                Comment
