  • I think Bowtie is a nice tool for short-read alignment, but I ran into a problem with paired-end mapping. I simulated 75 bp Illumina reads and aligned them to the reference sequence, but only a few alignments (fewer than 10) were reported. Since 1300000 alignments are reported when I map the same reads without pairing, I suspect the paired-end run is going wrong.
    The command I used is "bowtie -p 8 -a -y -X 650 human -1 reads_1.fa -2 reads_2.fa output.map".

    Can anybody tell me what is the problem?
    Last edited by Wind; 08-03-2009, 03:35 AM.


    • Originally posted by Wind
      I think Bowtie is a nice tool for short-read alignment, but I ran into a problem with paired-end mapping. I simulated 75 bp Illumina reads and aligned them to the reference sequence, but only a few alignments (fewer than 10) were reported. Since 1300000 alignments are reported when I map the same reads without pairing, I suspect the paired-end run is going wrong.
      The command I used is "bowtie -p 8 -a -y -X 650 human -1 reads_1.fa -2 reads_2.fa output.map".
      This is probably due to the -I/--minins, -X/--maxins, and/or --fr/--rf/--ff options being set incorrectly. Please double-check the manual's description of those options and verify that your invocation matches the way you've simulated your reads. Also, make sure the simulated read files are formatted correctly, with all mates lining up properly.
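
A minimal sketch of the "mates lining up" check, written in Python. It assumes the simulated reads are FASTA files whose mate names differ only by a trailing /1 or /2; that naming convention and the function names are assumptions for illustration, not something Bowtie requires in this form.

```python
from itertools import zip_longest


def fasta_names(lines):
    """Yield the read name (first word after '>') from each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 so both mates share one base name (assumed convention)."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def first_mismatch(mate1_lines, mate2_lines):
    """Return (index, name1, name2) for the first pair that does not line up, else None."""
    pairs = zip_longest(fasta_names(mate1_lines), fasta_names(mate2_lines))
    for index, (name1, name2) in enumerate(pairs):
        if name1 is None or name2 is None:
            # One file contains more reads than the other.
            return (index, name1, name2)
        if strip_mate_suffix(name1) != strip_mate_suffix(name2):
            return (index, name1, name2)
    return None
```

Both arguments can be open file handles, for example `first_mismatch(open("reads_1.fa"), open("reads_2.fa"))`. A result of `None` means every mate pair appears in the same order in both files.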

      Thanks,
      Ben


      • Thanks

        Hi Ben,

        Thanks for your advice. The simulated data contained many 'N's, which may have interfered with paired-end mapping. I'll try other data sets. Thanks.


        • Ben, help me..

          Hi Ben,
          I have a question about Bowtie's alignment output.
          When I align short reads to a reference using Bowtie, can I get an output message for reads that do not match?

          I could not find an option that produces such a message.

          I want reads that fail to align to be reported as well, so that I can use that information (not aligned!).

          I will wait for your answer, Ben. Thank you so much.


          • Hi tianell,

            Originally posted by tianell
            When I align short reads to a reference using Bowtie, can I get an output message for reads that do not match?

            I could not find an option that produces such a message.

            I want reads that fail to align to be reported as well, so that I can use that information (not aligned!).
            Sorry, no, there is no option to print such a message. I'll add this as a feature request. In the meantime, it's quite easy to deduce that number either by using the --un/--max options (and then counting), or by subtracting the reported number from the number of input reads.
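
The subtraction approach can be sketched in Python as follows. This assumes Bowtie's default tab-separated output, in which the first field of each alignment line is the read name; the function name and the `total_reads` argument are hypothetical. Distinct read names are counted rather than lines, because a read can occupy several lines when multiple alignments per read are reported.

```python
def count_unaligned(total_reads, bowtie_output_lines):
    """Estimate the number of unaligned reads by subtraction.

    total_reads: number of reads in the input file (counted separately).
    bowtie_output_lines: lines of Bowtie's default output, where the
    first tab-separated field is assumed to be the read name.
    """
    aligned_names = set()
    for line in bowtie_output_lines:
        line = line.rstrip("\n")
        if line:
            aligned_names.add(line.split("\t", 1)[0])
    return total_reads - len(aligned_names)
```

For very large runs the set of names may not fit comfortably in memory; in that case, writing unaligned reads out with --un and counting them is likely the more practical route.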

            Thanks,
            Ben


            • Isn't there a feature to export unmapped reads to a file?

              I always run bowtie and export unmapped and repeats using

              --unfq unaligned.fastq --maxfa duplicates.fastq

              Comparing the sizes of both files with the size of your original file gives you an approximate idea of the percentage of unaligned and repeat reads.


              • I wanted to discuss a use-case:
                A collection of 172 million reads, ranging from 36 to 76 bases long, was mapped to a reference with bowtie.

                $ ./bowtie --best --un leftover -p 4 -t reference reads mapped
                $ grep -c '^@' leftover
                154828705
                $ wc -l mapped
                16269083 mapped

                The total of leftover and mapped is less than what we started with. Are the remaining reads mapping to multiple locations, and thus omitted in both these files?
                --
                bioinfosm


                • Hi bioinfosm,

                  Originally posted by bioinfosm
                  The total of leftover and mapped is less than what we started with. Are the remaining reads mapping to multiple locations, and thus omitted in both these files?
                  That shouldn't be the case. When only --un is used (as opposed to both --un and --max), both the unaligned reads and the reads with a number of alignments exceeding the -m limit will go into the --un file. But you're not using the -m option, so no reads should be suppressed due to multiple alignments.

                  How are you counting the number of reads in your input set? Note that grep -c '^@' isn't necessarily correct because quality strings can also start with @.
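
To illustrate the point about grep -c '^@', here is a small Python sketch. It assumes strict 4-line FASTQ records (no wrapped sequence or quality lines), which is an assumption about the input, and the function names are made up for this example.

```python
def count_fastq_records(lines):
    """Count reads as (number of non-empty lines) / 4, assuming 4-line FASTQ records."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; input may not be 4-line FASTQ")
    return n_lines // 4


def count_at_lines(lines):
    """Count what grep -c '^@' counts: every line that starts with '@'."""
    return sum(1 for line in lines if line.startswith("@"))


# One read whose quality string happens to start with '@'.
record = ["@read1\n", "ACGT\n", "+\n", "@III\n"]
print(count_fastq_records(record))  # 1
print(count_at_lines(record))       # 2
```

The single record is counted twice by the '@'-based count, so grep -c '^@' can overestimate the number of reads whenever quality strings begin with '@'.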

                  Thanks,
                  Ben


                  • thanks Ben.. the light bulb just flashed on me!
                    --
                    bioinfosm


                    • Question about RepeatMasked hg18 index

                      I'm doing RNA-Seq on human samples. In many instances I map with the -m1 -v2 --best criteria against the prebuilt hg18.asm index available on the download site. How does Bowtie handle N's in the indices? I am wondering whether it is possible to cut down the mapping time by building and mapping against a repeatmasked version of the genome.


                      • Originally posted by davisc
                        How does Bowtie handle N's in the indices? I am wondering whether it is possible to cut down the mapping time by building and mapping against a repeatmasked version of the genome.
                        When Bowtie indexes the reference, it elides non-A/C/G/T characters. So if you index a reference with stretches of Ns, Bowtie will never report an alignment spanning any of the stretches.
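
The practical consequence can be pictured with a short Python sketch. This is only an illustration of the behaviour described above, not Bowtie's actual indexing code; the function names are hypothetical, and upper-casing the sequence first is an assumption of the sketch.

```python
import re


def unambiguous_stretches(sequence):
    """Return half-open (start, end) intervals of consecutive A/C/G/T characters."""
    return [m.span() for m in re.finditer(r"[ACGT]+", sequence.upper())]


def alignment_can_fit(sequence, start, read_length):
    """True if [start, start + read_length) lies entirely inside one A/C/G/T stretch."""
    end = start + read_length
    return any(s <= start and end <= e for s, e in unambiguous_stretches(sequence))


reference = "ACGTNNNNACG"
print(unambiguous_stretches(reference))    # [(0, 4), (8, 11)]
print(alignment_can_fit(reference, 0, 4))  # True
print(alignment_can_fit(reference, 2, 4))  # False, the window would cross the Ns
```

Under this picture, a hard-masked (N-filled) repeat region simply removes candidate positions, so reads that would only have aligned there go unreported.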

                        And yes, mapping against the repeatmasked version of the genome (and omitting -m 1) ought to be noticeably faster.

                        Ben


                        • Indexing human genome?

                          Hi!

                          I'm building an index of the human genome locally and I was wondering how long this usually takes. It's been running for about 3 hours, and I'm just wondering what to expect. I'm on a dual-core Mac with 4 GB of RAM.

                          Thanks!
                          Lizzy


                          • Hi Lizzy,

                            I'd expect, oh, about 7-8 hours or so. Did it finish?

                            Thanks,
                            Ben


                            • I'm a newbie to Bowtie... tired of counting down the hours using MAQ.

                              Currently building an index using Bowtie. What is the difference between
                              h_sapiens_asm.ebwt.zip and
                              h_sapiens.ebwt.zip

                              Thanks

                              L


                              • Hi Layla,

                                h_sapiens indexes the NCBI human reference contigs and h_sapiens_asm indexes the NCBI human reference assembly. Take a look at the scripts/make_h_sapiens.sh and scripts/make_h_sapiens_asm.sh files distributed with Bowtie to see exactly what fasta files were indexed and how.

                                People often prefer the assembly because the coordinates output by bowtie are more immediately useful (e.g., they correspond to the hg18 coordinates in the Genome Browser).

                                Thanks,
                                Ben
