  • Paired-end Solexa data mapping with Bowtie

    Hello,

    I am mapping paired-end reads from two files per lane, with the following Bowtie command line:

    ./bowtie -t -v 2 -p 8 -m 1 --solexa-quals mm9 -1 filename1.fastq -2 filename2.fastq outputfilename.map

    The program processed 200 million reads in 9 hours, but only about 1% of them mapped (I expect ~80% of the reads to map for this experiment). It is very time-consuming to experiment with the Bowtie parameters on such large files, so I am asking for your help.

    Any ideas what is going wrong?

    Thank you!

  • #2
    If only 1% map, then I'm sure taking the first 100,000 reads would give you plenty of sample data with which to tune your parameters without running the entire dataset through.
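
A minimal sketch of the subsampling suggested above, assuming plain (unwrapped, uncompressed) FASTQ with exactly four lines per read. The function name `take_reads` and the output file names are hypothetical. The same number of reads has to be taken from the top of both files so that the pairs stay in step.

```shell
#!/bin/sh
# Copy the first N reads of a FASTQ file to a new file.
# Assumption: every record occupies exactly 4 lines (no wrapped sequences).
take_reads() {
    n_reads=$1
    in_fastq=$2
    out_fastq=$3
    head -n "$((n_reads * 4))" "$in_fastq" > "$out_fastq"
}

# Hypothetical usage with the file names from the original post:
#   take_reads 100000 filename1.fastq test1.fastq
#   take_reads 100000 filename2.fastq test2.fastq
```

The two truncated files can then be passed to `-1` and `-2` in place of the full files while tuning parameters.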

    • #3
      Still can't get them mapped. Using truncated files is a good idea: I created test files with just the first 1000 reads from each of the two paired-end files, so now I can experiment with the Bowtie parameters.

      The question is: which parameters should I change? I already tried switching between --fr, --rf and --ff, which did not help. What are the other possible options?
      Last edited by rebrendi; 11-09-2011, 06:16 AM.

      • #4
        I'm not sure of the reason why this has worked for me, but try switching the sequence in which you order the fastq files. So basically you'd put "-1 filename2.fastq -2 filename1.fastq"

        Good luck!

        • #5
          Originally posted by ERG View Post
          I'm not sure of the reason why this has worked for me, but try switching the sequence in which you order the fastq files. So basically you'd put "-1 filename2.fastq -2 filename1.fastq"

          Good luck!
          I tried this; it did not help.

          • #6
            Originally posted by rebrendi View Post
            Hello,

            I am mapping paired-end reads from two files per lane, with the following Bowtie command line:

            ./bowtie -t -v 2 -p 8 -m 1 --solexa-quals mm9 -1 filename1.fastq -2 filename2.fastq outputfilename.map

            The program processed 200 million reads in 9 hours, but only about 1% of them mapped (I expect ~80% of the reads to map for this experiment). It is very time-consuming to experiment with the Bowtie parameters on such large files, so I am asking for your help.

            Any ideas what is going wrong?

            Thank you!
            "--solexa-quals" indicates that the reads have Q-scores encoded by a version of the GA Pipeline prior to 1.3. This version is ancient (in NGS terms); are you sure about this? (Though this may be irrelevant, since you are aligning in -v mode, which nominally ignores Q-scores.)

            Increase -m to something > 1.

            • #7
              Can you map them as single end? Also try setting a larger value for -X depending on the insert size of the library.

              -X/--maxins <int> maximum insert size for paired-end alignment (default: 250)

              Chris

              • #8
                It is probably indeed a matter of using the wrong quality settings and/or the -X parameter. The alignment summary at the end will tell you whether most reads were removed by the -m 1 parameter, but it seems rather unrealistic that -m 1 alone would reduce alignments to 1%.

                Another reason for this behavior might be processing the paired-end files with adapter/quality trimmers which remove sequences altogether. Sequence files need to be of the exact same length (same number of lines) and sequences need to correspond perfectly to each other in file 1 and file 2. Otherwise you might just try to align sequences from anywhere in the genome as sequence pairs and only a tiny subset will produce valid alignments.
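
One way to check the correspondence described above is to compare the read IDs record by record. This is a sketch only, assuming four-line FASTQ records and that the ID is the first whitespace-delimited token of the header line, with an optional trailing /1 or /2. The function names `read_ids` and `count_mismatched_pairs` are hypothetical.

```shell
#!/bin/sh
# Print one read ID per record: the first token of each header line,
# with a trailing /1 or /2 (older Illumina naming) stripped.
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Print the number of records whose IDs differ between the two files.
# If one file is shorter, its missing records count as mismatches.
count_mismatched_pairs() {
    ids1=$(mktemp)
    ids2=$(mktemp)
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    # paste puts the i-th ID of each file on one tab-separated row
    paste "$ids1" "$ids2" | awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }'
    rm -f "$ids1" "$ids2"
}

# Hypothetical usage with the file names from the original post:
#   count_mismatched_pairs filename1.fastq filename2.fastq
```

A result of 0 means every record in file 1 has a same-named partner at the same position in file 2; anything else suggests the files went out of sync, for example after trimming.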

                • #9
                  Originally posted by kmcarr View Post
                  "--solexa-quals" indicates that the reads have Q-scores encoded by a version of the GA Pipeline prior to 1.3. This version is ancient (in NGS terms); are you sure about this? (Though this may be irrelevant, since you are aligning in -v mode, which nominally ignores Q-scores.)

                  Increase -m to something > 1.
                  I tried mapping without "--solexa-quals": same result.
                  I tried increasing -m to 3 and to 10. This increased the fraction of mapped reads to 2% and 4%, respectively. Still not much help.

                  • #10
                    Originally posted by cjp View Post
                    Can you map them as single end? Also try setting a larger value for -X depending on the insert size of the library.

                    -X/--maxins <int> maximum insert size for paired-end alignment (default: 250)

                    Chris
                    I tried changing -X/--maxins <int>: same result.
                    I tried mapping the two files independently in single-read mode: 75% and 71% of the reads mapped for the two files, respectively. So the data seem OK, but the paired-end mapping still does not work.

                    • #11
                      Originally posted by fkrueger View Post
                      Another reason for this behavior might be processing the paired-end files with adapter/quality trimmers which remove sequences altogether. Sequence files need to be of the exact same length (same number of lines) and sequences need to correspond perfectly to each other in file 1 and file 2. Otherwise you might just try to align sequences from anywhere in the genome as sequence pairs and only a tiny subset will produce valid alignments.
                      I have checked: the two files have exactly the same length.

                      • #12
                        Could you post the first say 20 lines of each file? Do the reads have similar names or belong to the same cluster?

                        • #13
                          Did you try other aligners such as BWA or Bowtie2? They are much better at pairing reads. Bowtie2 is easy to run and pretty quick too, but you'll need to reindex your genome.

                          example command:

                          bowtie2 -x /path/to/ref/hg19 -X 650 -p4 -1 r1.fq -2 r2.fq -S r12.bowtie2.sam

                          Chris

                          • #14
                            Originally posted by fkrueger View Post
                            Could you post the first say 20 lines of each file? Do the reads have similar names or belong to the same cluster?
                            Here are the first 4 lines of the first file:

                            @HWI-ST841:93099JACXX:8:1101:1134:1866 1:N:0:
                            NGGTAAGTGAGAAAATCCCCCAAAGGAGACCAAGACNCTGTTTCCTGATGC
                            +
                            #1:ABBDDFCBDBEHHHHIGIIGEGEECFFGEC?BH#00B?D?BDFFEHG>
                            @HWI-ST841:93099JACXX:8:1101:1117:1870 1:N:0:
                            NGACGCTGAGAGTTGTCATGCCTCGGTGNNNNNNNNNNNNNNNNNNNTGGC
                            +
                            #4:BBBDD?DDD+A@EIEIIIIIEFI;E#######################
                            @HWI-ST841:93099JACXX:8:1101:1196:1879 1:N:0:
                            NGAAGGTCAACTTGATCCTGATTCAACTTTGGTACCTGGTATCTGTCCAGA
                            +
                            #1=DFFFFHHHHHJIJJJJJJJJJIJJJJJJJIIJJJJJJIIIJJJJJJHI
                            @HWI-ST841:93099JACXX:8:1101:1236:1882 1:N:0:
                            NGGCAGGCAAGCTAACTGCTGCTGTGATGTTCAAGGCATGTGTTACCCATC

                            Here are the first 4 lines of the second file:

                            @HWI-ST841:93099JACXX:8:1101:1134:1866 2:N:0:
                            AGCATCTGCGTCTCTGTTACTATTTTTCAGAATGAGGGAGGAATGGGATGG
                            +
                            @@@FDDADH?D<<CF+<A,A4,:AFHG########################
                            @HWI-ST841:93099JACXX:8:1101:1117:1870 2:N:0:
                            AAGGGAGGAAGGTGTGTCACCAGCCTAAGTGAATGTGGACTGTGCTGTTTA
                            +
                            @?@FFBDDFFFHFHHIJBHIIGIDGH3:C?DGHDGGGIGEHGHGDGGFHG@
                            @HWI-ST841:93099JACXX:8:1101:1196:1879 2:N:0:
                            AGATCCTGAAGAAATCCAAAACACCATCAGATCCTTCTACAAAAGGCTATA
                            +
                            CCCFFFFFHHHHGJJJJJJJJJJJJJJJJJIJJJJJJJJJIJJIIIJJJJI
                            @HWI-ST841:93099JACXX:8:1101:1236:1882 2:N:0:
                            AGGAGGAAGAAAGATTATAAAAGCTTTACAAAAGGTTCCGCCGTTGGAAGC
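
Since the quality encoding was questioned earlier in the thread, here is a rough sketch for guessing it from the quality lines themselves. It relies on one fact: Solexa+64 and Phred+64 quality strings cannot contain characters below ASCII 59 (';'), so any lower character implies Phred+33. The function names are hypothetical, and four-line FASTQ records are assumed.

```shell
#!/bin/sh
# Print the smallest ASCII code found on the quality lines of a FASTQ file.
min_quality_code() {
    awk 'NR % 4 == 0' "$1" | tr -d '\r\n' | od -An -v -tu1 |
        awk '{ for (i = 1; i <= NF; i++) { v = $i + 0
                   if (!seen || v < min) { min = v; seen = 1 } } }
             END { if (seen) print min }'
}

# Print "phred33" if any quality character is below ASCII 59,
# "ambiguous" if none is (high-quality Phred+33 data also looks like this),
# or "unknown" for an empty file.
guess_quality_offset() {
    code=$(min_quality_code "$1")
    if [ -z "$code" ]; then
        echo "unknown"
    elif [ "$code" -lt 59 ]; then
        echo "phred33"
    else
        echo "ambiguous"
    fi
}

# Hypothetical usage with a file name from the original post:
#   guess_quality_offset filename1.fastq
```

The quality strings posted above contain '#' (ASCII 35), which puts them in the Phred+33 range. That is Bowtie's default encoding, so these reads should be run without --solexa-quals.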

                            • #15
                              Originally posted by cjp View Post
                              Did you try other aligners such as BWA or Bowtie2?
                              I tried Eland and had the same problems. I have not tried BWA or Bowtie2.
