  • Using PET files as SET files in bowtie

    Hello - thanks for bowtie - I like it and the output is handy for me to analyse.

    I have a bit of odd behavior to report that I can't figure out. I have lots of little contigs (100-1000 bp) that I am aligning against, and I have both single-end (SET) and paired-end (PET) read files.

    When I align the SET against the short contigs, everything works great. <example command follows>

    ./bowtie -f shortcontigs_index lane1.fa lane1vreference.map

    When I align both files for the PET data, it also runs fine, but my results are strongly biased towards pairs whose mates lie very close together, and many of the alignments are rejected because one mate of the pair sticks out past the end of a contig into 'space'...

    ./bowtie -f shortcontigs_index -1 lane1_1.fa -2 lane1_2.fa lane1vreference.map

    When I try to use one of the PET files as a singles file, bowtie runs for just a second, usually reporting that one of my reads is less than 2 base pairs long and then quits.

    ./bowtie -f shortcontigs_index lane1_1.fa lane1vreference.map

    Does bowtie somehow detect that the original file is a PET file and will not let me run it by itself?


    • more on using PET as SET files in bowtie

      Hi - I just stripped all of the >tags off the reads and used one of the PET pairs as a -r raw file and it works fine...

      so, I guess that bowtie is detecting that the data is supposed to be PET from the >tag info?


      • Hi Chuck,

        When running in unpaired mode, Bowtie doesn't try to detect whether a file is part of a pair or not. It simply treats it as a plain-old unpaired fasta file. Have you checked to see whether any of the mates really are 1-bp in that file? Are there any other peculiarities in how that file is formatted?

        If neither of those is the issue, could you let me borrow that file so I can try to diagnose it myself?
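A quick way to check for very short mates is sketched below. It is only a minimal illustration, not part of Bowtie: the function name `find_short_reads` and the 2-base threshold are assumptions chosen to match the "less than 2 base pairs" message described above.

```python
def find_short_reads(fasta_lines, min_len=2):
    """Return (name, length) for every FASTA record shorter than min_len."""
    short = []
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # blank lines contribute no bases
        if line.startswith(">"):
            # A new header closes the previous record, so check it now.
            if name is not None and length < min_len:
                short.append((name, length))
            name, length = line[1:], 0
        else:
            length += len(line)
    # The last record has no following header, so check it here.
    if name is not None and length < min_len:
        short.append((name, length))
    return short


# Hypothetical records standing in for a file such as lane1_1.fa.
example = [">read1", "ACGTACGT", ">read2", "A", ">read3", "", ">read4", "AC.GT"]
print(find_short_reads(example))
```

To scan a real file, pass an open file handle instead of the list, for example `find_short_reads(open("lane1_1.fa"))`.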

        Thanks,
        Ben


        • PET as SET

          Hi Ben,

          I've tried this for a number of different files and the result is always the same.

          Yes, there are reads that only have a single base but in PET mode, it skips them. There is a long list of errors as it rejects short reads but it does the alignment job.

          In singles mode, it seems to hit the first error and quit.

          Perhaps that is the difference? How it deals with the error?

          What's the best way to send them to you? I guess I could just take the first few thousand reads of each pair along with a reference? That should do it and avoid sending massive data files.

          Chuck


          • Hi Chuck,

            OK - so you do have 1-bp reads. That explains the error in unpaired mode. Given that, would you rather Bowtie rejected your 1-bp reads in paired-end mode (as it currently does in unpaired mode), or would you rather Bowtie accepted (but skipped) your 1-bp reads in unpaired mode? My feeling is that Bowtie should at least print a warning by default in both cases, since 1-bp reads are usually a sign that something went wrong upstream of the aligner. If there's a good reason why 1-bp reads should be tolerated, then maybe Bowtie should also provide a command-line option that suppresses the warning in cases where the user would like to tolerate it.

            Ben


            • Hello, thanks for bowtie
              I have a problem downloading the Bowtie index for the human genome from ftp://ftp.cbcb.umd.edu/pub/data/bowt...s_asm.ebwt.zip. I have no problem with smaller indexes such as g_gallus.ebwt.zip.
              Is it possible to split the file for downloading?


              • Originally posted by Ben Langmead
                For now, the way to do that is via options like -k/-a/--nostrata/-m. You can count the number of alignments from the output bowtie generates.



                Bowtie aligns the entire read with a certain number of mismatches.



                Bowtie's job is to find legal alignments subject to the constraints imposed by the alignment and reporting policies specified by the user (see manual for info about -k/-m/-a/--nostrata, etc). Any additional filtering you might want to perform will have to be done externally, say, in a script.



                No - you'll have to do vector trimming ahead of time.

                Hope that helps,
                Ben
                Thanks a lot for the replies.
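As an illustration of the external counting and filtering suggested in the quoted reply, here is a minimal sketch. It assumes the alignment output is tab-delimited with the read name in the first column and one line per reported alignment; check that assumption against the manual for your Bowtie version. The function name `count_alignments` and the example lines are hypothetical.

```python
from collections import Counter


def count_alignments(map_lines):
    """Count reported alignments per read name (assumed to be column 1)."""
    counts = Counter()
    for line in map_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        read_name = line.split("\t", 1)[0]
        counts[read_name] += 1
    return counts


# Made-up lines in the assumed layout, standing in for a .map file.
example = [
    "r1\t+\tcontig7\t12\tACGT\tIIII\t0\t",
    "r1\t-\tcontig9\t40\tACGT\tIIII\t0\t",
    "r2\t+\tcontig7\t3\tGGCA\tIIII\t0\t",
]
counts = count_alignments(example)
# Example of extra filtering done outside the aligner: keep uniquely placed reads.
unique_reads = [name for name, n in counts.items() if n == 1]
print(dict(counts), unique_reads)
```

The counts are only meaningful if the run used a reporting option such as -k or -a, so that more than one alignment per read can appear in the output.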


                • Hey Ben, another question. When I try to execute "/bowtie-0.9.9.3/bowtie e_coli reads/e_coli_1000.fq" on my Mac, I get a response like this: 'Warning: Could not open file "reads/e_coli_1000.fq" for reading'. What could be the reason for this? I downloaded "bowtie-0.9.9.3-bin-macos-10.5-i386.zip" and my Mac runs OS X 10.5.6 on Intel.

                  thanks in advance.


                  • PET as SET

                    Originally posted by Ben Langmead

                    Given that, would you rather Bowtie rejected your 1-bp reads in paired-end mode (as it currently does in unpaired mode), or would you rather Bowtie accepted (but skipped) your 1-bp reads in unpaired mode? My feeling is that Bowtie should at least print a warning by default in both cases, since 1-bp reads are usually a sign that something went wrong upstream of the aligner. If there's a good reason why 1-bp reads should be tolerated, then maybe Bowtie should also provide a command-line option that suppresses the warning in cases where the user would like to tolerate it.

                    Ben
                    Ben, thanks for the reply. I agree with you - no, there is no compelling reason that 1 bp reads should be accepted. They do not add anything to the alignment of these short reads but it would be useful if they were just skipped and a warning was printed. Currently, the alignment fails completely.
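Until the aligner handles this itself, the "skip and warn" behavior can be approximated with a pre-filter run before alignment. The sketch below is an assumption-laden illustration, not Bowtie code: the helper names `parse_fasta` and `filter_pairs` are hypothetical, and it assumes the two mate files list the pairs in the same order. A pair is dropped if either mate is too short, so the two files stay in sync.

```python
import sys


def parse_fasta(lines):
    """Yield (name, sequence) for each record in an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_pairs(mates1, mates2, min_len=2, warn=sys.stderr):
    """Keep only pairs in which both mates are at least min_len bases long."""
    kept1, kept2 = [], []
    for (n1, s1), (n2, s2) in zip(parse_fasta(mates1), parse_fasta(mates2)):
        if len(s1) < min_len or len(s2) < min_len:
            print(f"Warning: skipping pair {n1} / {n2}: "
                  f"a mate is shorter than {min_len} bp", file=warn)
            continue
        kept1.append((n1, s1))
        kept2.append((n2, s2))
    return kept1, kept2


# Hypothetical mates standing in for lane1_1.fa and lane1_2.fa.
m1 = [">p1/1", "ACGTAC", ">p2/1", "A", ">p3/1", "GGGTTT"]
m2 = [">p1/2", "TTGACA", ">p2/2", "CCCAAA", ">p3/2", "G"]
kept1, kept2 = filter_pairs(m1, m2)
print(kept1, kept2)
```

Either filtered list can then be written back out as FASTA and used on its own as a singles file.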

                    Oh, one more thing I forgot to mention: when I converted the PET files to 'raw' format, I also replaced every "." in the original .fa file with "N". This might also be the reason it worked, if bowtie counts an N as a base (just an unknown one) but treats "." as a missing position.
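The conversion described here can be sketched as follows. This is an illustration under stated assumptions rather than the exact script used: it assumes each read occupies a single sequence line, and the function name `fasta_to_raw` is hypothetical.

```python
def fasta_to_raw(fasta_lines):
    """Drop '>' header lines and replace '.' with 'N', one read per line."""
    raw = []
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # headers and blank lines are not part of raw format
        raw.append(line.replace(".", "N"))
    return raw


# Hypothetical reads containing '.' placeholders for uncalled bases.
example = [">r1", "ACG.T", ">r2", "....", ">r3", "GGCA"]
print(fasta_to_raw(example))
```

The resulting lines can be written to a file and passed to bowtie with the -r option mentioned earlier in the thread.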

                    Thanks again!

                    Chuck


                    • Originally posted by -daf-
                      Hello, thanks for bowtie
                      I have a problem downloading the Bowtie index for the human genome from ftp://ftp.cbcb.umd.edu/pub/data/bowt...s_asm.ebwt.zip. I have no problem with smaller indexes such as g_gallus.ebwt.zip.
                      Is it possible to split the file for downloading?
                      Sorry for the inconvenience, I have now succeeded using the Linux ftp command.


                      • Originally posted by -daf-
                        Sorry for the inconvenience, I have now succeeded using the Linux ftp command.
                        Hi daf,

                        I've heard that complaint from others as well. I think that the unzip programs on some platforms (e.g. Mac) cannot necessarily handle extracting archives larger than 2 GB. I went ahead and split each of the large archives into two. See the Bowtie page for changes.

                        Thanks,
                        Ben


                        • Originally posted by polsum
                          Hey Ben, another question. When I try to execute "/bowtie-0.9.9.3/bowtie e_coli reads/e_coli_1000.fq" on my Mac, I get a response like this: 'Warning: Could not open file "reads/e_coli_1000.fq" for reading'. What could be the reason for this? I downloaded "bowtie-0.9.9.3-bin-macos-10.5-i386.zip" and my Mac runs OS X 10.5.6 on Intel.

                          thanks in advance.
                          Hi polsum,

                          Does the "reads/e_coli_1000.fq" file exist, relative to your current working directory when you issue that command?
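One way to check this is to resolve the relative path against the current working directory and see what it points to. The snippet below is a small sketch; the helper name `describe_path` is hypothetical, and the path is the one from the command above.

```python
import os


def describe_path(relative_path):
    """Resolve a path against the current directory and test whether it is a file."""
    absolute = os.path.abspath(relative_path)
    return absolute, os.path.isfile(absolute)


absolute, exists = describe_path("reads/e_coli_1000.fq")
print("Current working directory:", os.getcwd())
print("Resolved path:", absolute)
print("Exists and is a regular file:", exists)
```

If the last line prints False, the command is probably being issued from a directory other than the one that contains the "reads" folder.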

                          Ben


                          • Why is Bowtie Fast?

                            I am very impressed with Bowtie!
                            It is mega-ultra-fast, and runs on my [windows] laptop!

                            Does anyone know why it is so fast compared with Eland and MAQ, which do exactly the same job?
                            These informatics 'tricks' are exactly what we need to handle such amounts of data.
                            I would like to apply the principles of bowtie to my own scripts, but have no idea what makes it so fast!

                            Any comments?
                            Thanks
                            Ines de Santiago
                            Last edited by inesdesantiago; 06-12-2009, 04:46 PM. Reason: typo


                            • Hi Ines,

                              The Bowtie paper has details about the algorithm. You can find more visual discussions in the slides linked to from the Bowtie website (see Other Documentation section in the right-hand sidebar).

                              Thanks,
                              Ben


                              • Bowtie BWT indexing

                                Thanks Ben!
                                I see that the BWT-based indexing of the reference genome is a great advantage. It allows Bowtie to do its searches with very small memory footprint. But does it mean that, because it uses less memory to index the reference genome, it will be faster? Is less memory == Fast Search?
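For readers who want to experiment with the idea, here is a toy sketch of exact matching against a BWT-based index (backward search on an FM-index). It is not Bowtie's implementation: it builds the suffix array naively, stores full occurrence tables, ignores mismatches, and all names (`bwt_index`, `count_exact`) are hypothetical. What it does show is that the search loop takes one step per base of the read, however long the reference is.

```python
def bwt_index(text):
    """Build a toy FM-index: the C table, occurrence counts, and BWT length."""
    text += "$"  # sentinel, assumed absent from text and sorting before A/C/G/T
    # Naive suffix array; fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to the sentinel
    alphabet = sorted(set(text))
    # c_table[c] = number of characters in text that sort before c
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return c_table, occ, len(bwt)


def count_exact(pattern, c_table, occ, n):
    """Count exact occurrences of pattern with one table lookup pair per base."""
    lo, hi = 0, n  # half-open range of suffix-array rows still matching
    for ch in reversed(pattern):  # backward search: last base first
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


# Made-up reference and reads, purely for illustration.
c_table, occ, n = bwt_index("ACGTACGTTACG")
for read in ("ACG", "CGT", "TT", "GGG"):
    print(read, count_exact(read, c_table, occ, n))
```

In this sketch the small index and the fast lookup are two separate properties of the same data structure; the per-base search cost comes from the backward-search loop, not from the memory footprint as such.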
                                Ines
                                Last edited by inesdesantiago; 06-13-2009, 07:26 AM. Reason: typo
