  • Originally posted by sdvie
    I am surprised that bowtie seems to be so memory-intensive in your case, especially with relatively few reads. Did you use a particularly large genome?

    cheers,
    Sophia
    I am using the apple genome, which is about 750 Mb.

    Actually, I tried the mapping several times. At the beginning, the size of the output file kept increasing and the Bowtie command took only about 2 GB of RAM. After several minutes, the output stopped growing, but the RAM used by Bowtie rose steadily to about 30 GB and then froze there.

    Another interesting thing is that the output file size differs depending on the number of cores I specify: the output file is 2.66 GB with the option '-p 10', but 2.38 GB without it. That is a difference of about 3 million alignments. How can this happen?
    Last edited by harrike; 05-09-2011, 07:27 AM.


    • Try removing the command-line option -a. With it, you are going to report ALL possible alignments for all reads. If you have repetitive sequences, this could be causing the memory problem. I would set a maximum number of matches (with -k) to some high but useful value such as 10, 30 or 40. Try that.

      Jim


      • Originally posted by harrike
        I am trying to use Bowtie to map about 300,000 reads to my reference. I use the command: bowtie -a -v 0 -p 10 -t INDEX_FILE -f READS_FILE.fasta > RESULT_FILE --un unmapped.txt.
        Using output redirection via '>' seems a bit risky. What do you want to write to RESULT_FILE, or why are you using '>'? I would consider something more direct, like:

        Code:
        bowtie -a -v 0 -p 10 -t INDEX_FILE -f READS_FILE.fasta  --un unmapped.txt
        A few days ago, I used output redirection to capture the output of bowtie's --verbose flag, and in less than 8 hours 14 GB of disk space went to waste, compared with only 188 MB of results in the alignment file...


        • Hi. I wonder if anyone can help me, as I think bowtie (0.12.7) is misbehaving.

          I'm trying to map reads to a sequence with a short duplicated stretch. The problem is that a read that should clearly map to one repeat (i.e. it has some unique sequence flanking the repeat) sometimes maps to the wrong repeat instead.

          For instance, given the read pair

          Code:
          @HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1
          ATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTA
          +HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1
          ggggggggggggggggggggggegeeggegedgeegeegggggdgegdge
          
          @HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2
          GATATCCTGTTTGGCCCATATTCAGCTGTTCCATCTGTTCTTGGCCCTGA
          +HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2
          ggggggggggggggggggggggggggggggggggggggggggggggbgge
          if I run

          Code:
          bowtie -q --solexa1.3-quals -v 3 --minins 100 --maxins 450 --best -k 1 -t -p 8 index_name -1 testB1.fq -2 testB2.fq
          I get

          Code:
          HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1      +       seq_id      5188   ATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTA       HHHHHHHHHHHHHHHHHHHHHHFHFFHHFHFEHFFHFFHHHHHEHFHEHF      0
          HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2      -       seq_id      5458   TCAGGGCCAAGAACAGATGGAACAGCTGAATATGGGCCAAACAGGATATC       FHHCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH      0       40:G>A,43:T>C,46:A>G
          Those three mismatches should (?) make this alignment not show up, given that there is another site where this could align with no mismatches. Even stranger, if I run without the --best option

          Code:
          bowtie -q --solexa1.3-quals -v 3 --minins 100 --maxins 450 -k 1 -t -p 8 index_name -1 testB1.fq -2 testB2.fq
          I do get the "right" answer

          Code:
          HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1      +       seq_id      5188   ATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTA       HHHHHHHHHHHHHHHHHHHHHHFHFFHHFHFEHFFHFFHHHHHEHFHEHF      0
          HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2      -       seq_id      5533   TCAGGGCCAAGAACAGATGGAACAGCTGAATATGGGCCAAACAGGATATC       FHHCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH      0
          Anyway, I recognize this is probably a solved problem, but I'm having a tough time understanding what's going on, so if anybody could help me understand what's up, I'd be really grateful.
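One way to compare the two reports above is to parse the last column of bowtie's default output, which the manual describes as a comma-separated list of mismatch descriptors of the form offset:reference-base>read-base (empty when there are no mismatches). The following is a minimal Python sketch of that parsing, not bowtie's own code; the function name `parse_mismatches` is made up for illustration.

```python
def parse_mismatches(field):
    """Turn '40:G>A,43:T>C' into [(40, 'G', 'A'), (43, 'T', 'C')]."""
    result = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # an empty field means no mismatches were reported
        offset, change = item.split(":")
        ref_base, read_base = change.split(">")
        result.append((int(offset), ref_base, read_base))
    return result


# Mate /2 as reported with --best (offset 5458) and without it (offset 5533).
with_best = parse_mismatches("40:G>A,43:T>C,46:A>G")
without_best = parse_mismatches("")

print(len(with_best), len(without_best))
```

Counting descriptors this way makes it easy to check, for each read, whether the reported alignment really is the one with the fewest mismatches.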


          • Using "Eland" input format in Bowtie

            Like Bowtie !
            Last edited by medalofhonour; 07-19-2011, 09:17 AM.


            • Hi,
              I have followed the thread here but am still confused about which options to use.

              I have a FASTA file of Illumina small RNA reads that are clipped, filtered and cleaned, and collapsed into unique sequences ready for mapping.

              I want to map them onto the human genome but do not know what is optimal in my situation. I want to know how many times each read maps to the genome.
              So far I have used -a -v 0 and -a -v 1. My concern with -v 1 is that, if I allow 1 mismatch, a read can also map to a place that is not real. On the other hand, my concern with -v 0 is that only 30% of the unique sequences align.
              Last edited by vebaev; 08-10-2011, 02:59 PM.
              ------------
              SMART - bioinfo.uni-plovdiv.bg


              • I am new to this, but it seems to me that if you allow mismatches, you absolutely can get alignments that aren't real. You can also get alignments that aren't real if you don't allow mismatches!

                There are several sources of false-positive and false-negative alignments. The reference sequence you are aligning to is the consensus from probably many replicates of a particular lineage of organism. Your experimental sequences may come from a slightly different lineage of organism with a slightly different genome. If you do not allow mismatches, you will miss valid alignments that differ only by an expected polymorphic site.

                There are also several sources of error in the sequencing itself. If you're using an Illumina machine, there are at least four sources of error that may mis-call a base in the sequence. If you don't allow mismatches, those reads that have a sequencing error might not align to your genome at all.

                On the other hand, if you allow mismatches, your reads may align to several places on the genome, and how do you know which one is valid? There is really no good answer. You could do some further processing and only consider reads that land inside exons of known genes. Or maybe you want to allow mismatches but only use those reads that match a single place on the genome.

                In our experiment we are starting with the most conservative assumptions and slowly loosening the criteria as we gain more confidence in our methodology. So we only consider reads that match perfectly against the mm9 genome and that fall inside known exons with a coverage of at least 10 reads. We'll start to loosen the criteria and see how that affects our results.


                • hi cswarth
                  You are quite right!
                  My main concern is, for example, this case:
                  I want to annotate where in the genome two reads map. If I do not allow mismatches, the first read has one hit in an intron and the second does not align to the genome at all. With one mismatch allowed, the first read maps perfectly in the intron and also in an intergenic region with one mismatch, while the second read can now map to the genome in one place.
                  In the second scenario we are happy because the second read can align, but then how do I annotate the first read, whose number of hits has increased?

                  If you follow me, my point is that if I want to map more of the reads that cannot map with zero mismatches, I lose the "sensitivity" of the reads that are already mapped.

                  I hope you got it
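The worry described here (a perfectly matching read picking up extra one-mismatch hits once -v 1 is allowed) is the case that bowtie's --best --strata options are meant for: only alignments in the best mismatch stratum are reported for each read. The same idea can be sketched as a post-filter in Python. This is an illustration only, assuming the alignments have already been parsed into (read name, location, mismatch count) tuples; `best_stratum`, the read names and the locations are hypothetical.

```python
from collections import defaultdict


def best_stratum(alignments):
    """Keep, for every read, only the alignments with the fewest mismatches.

    alignments: iterable of (read_name, location, n_mismatches) tuples.
    """
    by_read = defaultdict(list)
    for read, location, n_mm in alignments:
        by_read[read].append((location, n_mm))

    kept = {}
    for read, hits in by_read.items():
        fewest = min(n_mm for _, n_mm in hits)
        kept[read] = [(loc, n_mm) for loc, n_mm in hits if n_mm == fewest]
    return kept


# Hypothetical hits with up to 1 mismatch allowed (as with -v 1).
hits = [
    ("read1", "intron", 0),      # perfect match
    ("read1", "intergenic", 1),  # extra hit gained by allowing a mismatch
    ("read2", "exon", 1),        # only aligns when a mismatch is allowed
]

for read, kept_hits in best_stratum(hits).items():
    print(read, kept_hits)
```

With this filter the first read keeps only its perfect intron hit, while the second read keeps its single one-mismatch hit, so allowing a mismatch rescues the second read without changing the annotation of the first.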
                  Last edited by vebaev; 08-10-2011, 04:06 PM.
                  ------------
                  SMART - bioinfo.uni-plovdiv.bg


                  • Hi again,
                    as I said before, I'm trying to map my cleaned reads to hg19.

                    If I use -a -v 0, my output is about 2 GB, and I see that many sequences with low read counts (like 1 or 2) align tens of thousands of times to the human genome?! It is messy...

                    I can use -k 100 -v 0, but if I want to know how many times a sequence maps to the genome, how can I be sure, given that I have artificially imposed a threshold?
                    Since I also want to annotate repeat-associated and other RNAs, how can I do that and avoid the mess described above?

                    Or is it better to discard these with -m 100?
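To see how many times each sequence was reported, bowtie's default output can be tallied by read name (the first tab-separated column). Below is a rough Python sketch; the helper names and the toy records are hypothetical. Note that with -k 100 the count is capped at 100, so a count equal to the cap only means "at least 100", whereas -m 100 suppresses all alignments for any read with more than 100 reportable alignments.

```python
from collections import Counter


def count_hits(lines):
    """Count reported alignments per read from bowtie's default output."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        read_name = line.split("\t")[0]
        counts[read_name] += 1
    return counts


def split_by_cap(counts, cap):
    """Separate reads below the -k cap from reads that reached it."""
    below = {r: n for r, n in counts.items() if n < cap}
    capped = {r: n for r, n in counts.items() if n >= cap}
    return below, capped


# Toy records: name, strand, reference, offset, sequence, qualities, other hits.
records = [
    "seqA\t+\tchr1\t100\tACGT\tIIII\t0",
    "seqB\t+\tchr1\t500\tTTTT\tIIII\t2",
    "seqB\t-\tchr2\t900\tTTTT\tIIII\t2",
    "seqB\t+\tchr3\t40\tTTTT\tIIII\t2",
]

counts = count_hits(records)
below, capped = split_by_cap(counts, cap=3)
print(below, capped)
```

Reads in the `capped` group could then be annotated separately (for example as repeat-associated) instead of being treated as having an exact hit count.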

                    Best
                    Last edited by vebaev; 08-11-2011, 09:45 AM.
                    ------------
                    SMART - bioinfo.uni-plovdiv.bg


                    • Additional Index information

                      Hi,

                      I am trying to analyse Bowtie with a view to running it on GPGPUs through CUDA. Besides the limited hardware resources, I have one big problem. It seems that Bowtie relies on structs that use C++ datatypes (please correct me if I'm wrong), but I need C-compatible datatypes to get them into device memory (the global memory of the graphics card) and to work with them there.
                      While walking through the code I noticed that the first bytes are used to store some extra information for the ebwt_params struct, but:

                      How do I get the BWT?
                      How is it stored? (I think either uint32 or uint64.)
                      How do I "read" the nucleotide values (0, 1, 2, 3) from that?

                      Is there any additional information available on how the index files are built? (Any files, slides, etc. are welcome.)

                      The plan:
                      read the index file with my own code, store it in C-compatible datatypes, copy them to the device and try to do an exact alignment on the GPU.

                      Thank you
                      mic


                      • Originally posted by [mic]
                        Hi,

                        I am trying to analyse Bowtie with a view to running it on GPGPUs through CUDA. Besides the limited hardware resources, I have one big problem. It seems that Bowtie relies on structs that use C++ datatypes (please correct me if I'm wrong), but I need C-compatible datatypes to get them into device memory (the global memory of the graphics card) and to work with them there.
                        While walking through the code I noticed that the first bytes are used to store some extra information for the ebwt_params struct, but:

                        How do I get the BWT?
                        How is it stored? (I think either uint32 or uint64.)
                        How do I "read" the nucleotide values (0, 1, 2, 3) from that?

                        Is there any additional information available on how the index files are built? (Any files, slides, etc. are welcome.)

                        The plan:
                        read the index file with my own code, store it in C-compatible datatypes, copy them to the device and try to do an exact alignment on the GPU.

                        Thank you
                        mic
                        If you are good at programming, you can check the source code of bowtie-build.
                        Xi Wang


                        • Originally posted by Xi Wang
                          If you are good at programming, you can check the source code of bowtie-build.
                          I did try, but the code is deeply nested, which makes it difficult for me to get the overall picture. I would be grateful if someone could help me.
                          Last edited by [mic]; 09-19-2011, 05:55 AM.


                          • Bowtie: short reads of different lengths

                            Hi,

                            I just tried out Bowtie today. May I ask whether Bowtie supports mapping short reads of different lengths? (E.g. for r1/#1 I have 96 bp, whereas for r1/#2 I have 86 bp.) The short reads were trimmed with separate software prior to alignment.

                            My bowtie version is 0.12.7

                            Thanks in advance!


                            • bowtie 0.12.7 & SOLiD PE reads

                              Hi all,
                              I have a problem with bowtie 0.12.7 and SOLiD mate-pair reads.
                              bowtie (-C -f -I 1000 -X 4000 --ff <ebwt> -1 F3.csfasta -2 R3.csfasta) maps 0.0%, while SOLiD's Bioscope maps about 70%.
                              The insert size is about 2500.
                              The colorspace index is OK: synthetic csfasta reads are mapped well by bowtie, and F3 or R3 on their own are mapped well.
                              What could be wrong? Is the problem with bowtie or with the mate-pair reads?

                              cheers
                              Last edited by belmax; 09-30-2011, 12:50 AM.


                              • bowtie -e (--maqerr) parameter

                                Hi all,

                                According to the bowtie manual and some posts I've read here, the -e/--maqerr <int> option indicates the maximum sum of quality scores allowed at the mismatched bases throughout the entire alignment and as such can control the total number of mismatches over the entire read length.

                                I understand that the higher this value, the more alignments I will obtain. But I still have trouble understanding the logic behind this parameter. For example, say I set -e 70 with --nomaqround.
                                A read of overall high quality (e.g. each of its bases has a Phred score of 38) with 3 mismatches to the reference sequence will be excluded from the alignment, since (38 * 3) > 70, while another read of overall poor quality (e.g. a Phred score of 10 for each of its bases) with 5 mismatches will be kept, since (10 * 5) < 70. But if we suppose that low-quality bases are more likely to be sequencing errors than true variations, I'd rather exclude the latter read and keep the former one... (no?)
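The arithmetic in this example can be written out as a small Python sketch. It assumes the rule is simply "the sum of Phred qualities at mismatched positions must not exceed the -e value", with the optional Maq-style rounding (nearest 10, saturating at 30) that the manual describes. How bowtie rounds values ending in 5 and the separate seed-mismatch limit (-n/-l) are not modelled here, and the function names are made up.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10, saturating at 30.

    Ties are rounded up here; that tie rule is an assumption.
    """
    return min(30, ((q + 5) // 10) * 10)


def passes_maqerr(mismatch_quals, max_err=70, maqround=True):
    """Return True if the summed mismatch qualities fit within max_err."""
    if maqround:
        mismatch_quals = [maq_round(q) for q in mismatch_quals]
    return sum(mismatch_quals) <= max_err


high_quality_read = [38, 38, 38]         # 3 mismatches at Q38
low_quality_read = [10, 10, 10, 10, 10]  # 5 mismatches at Q10

print(passes_maqerr(high_quality_read, maqround=False))  # 114 vs. 70
print(passes_maqerr(low_quality_read, maqround=False))   # 50 vs. 70
```

One common reading of this rule is that -e acts as a budget on how much base-call evidence may contradict the reference: a mismatch at a low-quality base is cheap because it is plausibly a sequencing error, whereas several high-confidence mismatches suggest the read does not belong at that location.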

                                If anyone could help me understand this parameter and its usage I would be very grateful.

                                Cheers
