  • The difference between sequence and quality might not matter for the error

    Originally posted by Ben Langmead
    Hi Rich,

    Another user just contacted me via email and described something similar. When I ran their reads through bowtie, I realized that part of the problem is that Bowtie is printing the wrong error message. In their case, the error message should have been something more like "Too many quality values for read..." because they had a fastq entry where the quality string was 2 characters longer than the sequence string. Do you notice any inconsistencies like that in your input?

    I'll fix the error-message bug.

    Thanks,
    Ben
    Hi All,
    I got the same error with Bowtie 0.12.3: "Reads file contained a pattern with more than 1024 quality values."
    My reads are 76 bp long, and the quality strings have the same length:
    ILLUMINA-1A5BF1 1 8 61 12450 2086 0 1 TGCTGCGCTGTGATTTCTCGCTGGCAGACTTGGGTTGGCTTTGCTGAGGGGACGTGAGACATTGTATCAGGGGCCA bbbbbbbbbbbbbbbbbbbbbbbbbbbcbcbbbbbbbbbbbbbbbbbbb`bbbIbbbb_bbbbabbb]bbbbbbbb 1

    After converting them to FASTQ format (76 bases, 76 quality values), like this:
    @ILLUMINA-1A5BF1:8:1:1303:18887#0/1
    TAGGAGGGTGACCTGAAGAGTGGAAGGAAGAGTCAGGAATACTCAGAAGAACCTGTGCATATAGGCCAGGCCCGAC
    +ILLUMINA-1A5BF1:8:1:1303:18887#0/1
    aaaa_aaaaaaaaaa]aaYaaaaaaaa_`a_a_a_aaXaa_a`[_aa_`N`aa_`]a]`aXHVV]a`^X]YQHYVa

    I still get the error.
    So I suspect that a length difference between sequence and quality is not what triggers this error.

    It'd be greatly appreciated if someone can help me.

    Cheers,

    KJ
    Last edited by mskonan; 03-11-2010, 06:32 PM.
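Ben's question about inconsistencies in the input can be checked mechanically. The following is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); the function names are hypothetical and not part of Bowtie. It lists every record whose sequence and quality strings differ in length:

```python
import sys
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines if line.strip())  # skip blank lines
    for header in it:
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed or truncated record at {header!r}")
        yield header[1:], seq, qual


def length_mismatches(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, len(seq), len(qual)) for records whose lengths differ."""
    for name, seq, qual in fastq_records(lines):
        if len(seq) != len(qual):
            yield name, len(seq), len(qual)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fastq:
        for name, n_seq, n_qual in length_mismatches(fastq):
            print(f"{name}\tsequence={n_seq}\tquality={n_qual}")
```

An empty report means every record has matching lengths, which would point away from the cause Ben found in the other user's file.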



    • Originally posted by mskonan
      Hi All,
      I got the same error "Reads file contained a pattern with more than 1024 quality values." with Bowtie 0.12.3
      Could it be that the program simply does not recognize the carriage-return character?
      Xi Wang
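One way to test the carriage-return idea is to look for '\r' characters in the reads file. This is a minimal Python sketch, assuming a plain-text reads file; the helper names are hypothetical, and whether CRLF line endings actually cause this Bowtie error is only the hypothesis raised above:

```python
import sys
from typing import Iterable, List


def lines_with_cr(lines: Iterable[bytes]) -> List[int]:
    """Return the 1-based numbers of lines that contain a carriage return."""
    return [n for n, line in enumerate(lines, start=1) if b"\r" in line]


def strip_cr(data: bytes) -> bytes:
    """Convert CRLF (and any lone CR) line endings to plain LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Binary mode, so Python does not silently translate '\r\n' to '\n'.
    with open(sys.argv[1], "rb") as fh:
        hits = lines_with_cr(fh)
    print(f"{len(hits)} line(s) contain a carriage return; first few: {hits[:5]}")
```

Treating a lone CR as a line break in `strip_cr` is a design choice that also covers old Mac-style files; quality strings never legitimately contain '\r'.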



      • Ben,

        I have just downloaded Bowtie and can't get it to run. A window opens for bowtie.exe, but then quickly closes down again. This occurs in both Ubuntu and Windows. I suspect I am missing something simple, but would appreciate your help.



        • Originally posted by mskonan
          Hi All,
          I got the same error "Reads file contained a pattern with more than 1024 quality values." with Bowtie 0.12.3
          mskonan,

          We found that our initial filtering step, meant to remove reads with multiple uncalled bases (denoted by "."), had missed some reads. It seems that reads with multiple uncalled bases are a problem for Bowtie and trigger the "Reads file contained a pattern with more than 1024 quality values" error. Once these reads were removed, the program worked fine with the same command.

          rich
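The filtering rich describes can be folded into the qseq-to-FASTQ conversion. This is a minimal Python sketch, assuming the 11-column qseq layout of the line quoted above (machine, run, lane, tile, x, y, index, read, sequence, quality, filter), the FASTQ header style shown above, and a hypothetical cut-off that treats two or more "." as "multiple" uncalled bases:

```python
import sys
from typing import Iterable, Iterator

MAX_UNCALLED = 1  # hypothetical threshold: reads with 2+ uncalled bases are dropped


def qseq_to_fastq(lines: Iterable[str], max_uncalled: int = MAX_UNCALLED) -> Iterator[str]:
    """Yield one four-line FASTQ record for each qseq line that passes the filter."""
    for line in lines:
        fields = line.split()  # qseq is tab-separated; no field contains whitespace
        if len(fields) != 11:
            continue  # skip blank or malformed lines
        machine, _run, lane, tile, x, y, index, read, seq, qual, _filter = fields
        if seq.count(".") > max_uncalled:
            continue  # too many uncalled bases: drop the read
        name = f"{machine}:{lane}:{tile}:{x}:{y}#{index}/{read}"
        yield f"@{name}\n{seq}\n+{name}\n{qual}\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as qseq:
        sys.stdout.writelines(qseq_to_fastq(qseq))
```

Quality strings are copied unchanged, so they keep whatever encoding the qseq file used.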



            • kevpar, you need to start Bowtie from a terminal.
              In Windows, press Windows-key+R, type "cmd" and press Enter; in Ubuntu, open a terminal via Programs > Accessories > Terminal.
              If Bowtie is installed in a directory on your PATH, you can simply type "bowtie" and press Enter; otherwise, first change to the directory where you installed Bowtie.

              Running Bowtie with no arguments prints a usage message describing the available options. It is easier to read the online Bowtie manual: http://bowtie-bio.sourceforge.net/manual.shtml



            • bowtie options

              Hi collective brains,

              I am confused by some of Bowtie's options. I read the manual carefully and also read most of the posts in this thread, but I am still confused.

              * What is the difference between -m and -M? From the manual, it seems to me that -M is equivalent to -m --best --strata?

              * What option should I use to filter out alignments of reads that map to repeats?

              Thanks,

              D.



              • Hi again,

                It looks like I missed some very important posts at the beginning of the thread, especially Heng Li's post quoted below:
                Originally posted by lh3
                2. My main concern about bowtie is actually related to column 7. I think by default (no --best), bowtie just outputs the first group of hits it meets. Users would not know whether it is the best or whether it is a repeat or not. I think (maybe wrong) this behaviour is only useful for screening for human contamination. With "--best", the user would know the output is the best hit, but whether it is a repeat is still unknown in some cases. I know the "unknown" cases should be rare, but it would be necessary to convince users that the rare cases would not affect accuracy. Only with "--best -k 2" may a user know whether it is a repeat or not, although he/she would not know the number of occurrences. I think "--best -k2" is the most desired behaviour and should become the default. Bowtie is fast enough. Slowing it down by a factor of 3 will still make most users quite happy (see also below). Also, quoting the speed under the default options would be unfair to others.
                I don't understand this at all. I thought --best -k 2 means to report at most the 2 best alignments, doesn't it? How does one know whether a read is a repeat by using this option?
                Originally posted by lh3
                5. Back to how the alignments are reported. I think the bwa behaviour is useful if people do not care too much about speed. Knowing the number of suboptimal hits would help us to decide which alignments are reliable. I know this is important to some (not all) SV detection algorithms. If you think the bwa behaviour is costly (possibly it is), I would recommend soap2's. Frequently, we may want to know the exact number of occurrences (no need to output the detailed alignments). I am sure having the soap2 behaviour would make bowtie more popular.
                Heng Li, do you mean bwa's default is doing this? Can you elaborate a little more?

                Also, does anybody know which Bowtie option I should use to achieve this behavior?

                Thanks,

                D.
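Heng Li's point about --best -k 2 can be made concrete: with -k 2 Bowtie reports at most two alignments per read, so a read with two output lines has at least one other hit. This is a minimal Python sketch, assuming single-end reads in Bowtie's default tab-separated output (read name in column 1, comma-separated mismatch descriptors in column 8). Treating an equal total number of mismatches as a tie is a simplification of how --best ranks alignments, and the helper names are hypothetical:

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List


def mismatch_counts(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each read name to the mismatch counts of its reported alignments."""
    per_read: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        descriptors = fields[7] if len(fields) > 7 else ""  # empty = no mismatches
        per_read[fields[0]].append(len([d for d in descriptors.split(",") if d]))
    return dict(per_read)


def classify(counts: List[int]) -> str:
    """Label a read from the mismatch counts of its (at most two) alignments."""
    if len(counts) < 2:
        return "unique"  # only within Bowtie's search limits
    best, second = sorted(counts)[:2]
    return "repeat" if best == second else "suboptimal-hit"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as hits:
        for name, counts in mismatch_counts(hits).items():
            print(name, classify(counts), sep="\t")
```

As Heng Li notes, this tells you whether a read is a repeat but not how many copies exist.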



                • Originally posted by dukevn
                  * What option I should use to filter out matches from repeated reads?
                  I think if you use -m 1 together with --max filename, only reads with exactly one alignment will be reported, and the file 'filename' given after --max will contain all the reads that were filtered out, i.e. reads with more than one alignment.

                  I also have a question. I believe Bowtie's default is to allow up to 2 mismatches in the first 28 bases. Are mismatches allowed after the 28th base? For example, if my reads are 36 bases long, there can be up to two mismatches in the first 28 bases, but how many mismatches are allowed in the remaining 8 bases?



                  • Originally posted by mattanswers
                    I think if you use -m 1 and then --max filename, you will select only sequences with 1 match and then in the file 'filename', specified after --max, will be all sequences that were filtered out, i.e. sequences with more than 1 match.
                    Yes, I thought of using -m 1 and filtering them out, but I have a feeling that doing so will throw away a lot of valuable information. I am not sure; maybe more experienced people have good advice about this.
                    Originally posted by mattanswers
                    I also had a question. I believe the default behavior of bowtie is 2 mismatches in the first 28 bases. Are mismatches allowed after the 28th base ? So, if my sequences are 36 bases, there can be up to two mismatches in the first 28 bases, but how many mismatches are allowed from 28 to 36 ?
                    Isn't that what the -l option (http://bowtie-bio.sourceforge.net/ma...wtie-options-l) is for? Why can't you try -l 36 -n 2 (or -v 2)?



                    • Isn't that what the -l option (http://bowtie-bio.sourceforge.net/ma...wtie-options-l) is for? Why can't you try -l 36 -n 2 (or -v 2)?

                      Thanks for your reply. I can use your suggestions to control mismatches, but to understand the results I already got with the default settings, I wanted to know how many mismatches, if any, are allowed after the 28th base.



                      • Originally posted by mattanswers
                        Thanks for your reply. I can use your suggestions to control mismatches, but I was interested in knowing for the sake of understanding my results using the default how many, if any, mismatches were allowed after the 28th base.
                        I don't think there is any option to control that. If you are in -n <int> mode, the limit of <int> mismatches applies only to the length specified by -l, and everything after that is ignored: Bowtie simply does not search/map those extra bases, so no mismatches are counted there.

                        If you use -v instead, the mismatch limit applies over the whole length of the read, and -l is ignored.

                        Cheers,

                        D.



                        • Originally posted by dukevn
                          I dont think there is any option to control that. If you are in -n <int> mode, then the <int> maximum mismatches will be applied for length specified by -l only, and everything after that will be ignored: bowtie simply does not search/map on those extra bases and hence there is no mismatches applied there.

                          Please note that there is another option:

                          -e/--maqerr <int>
                          Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
                          Xi Wang
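The -n, -l and -e descriptions above can be combined into a small model of when an alignment counts as valid. This is a simplified Python sketch paraphrasing the manual text, not Bowtie's code: it assumes 0-based mismatch positions and Phred quality values, rounds halves up (an assumption), and ignores how Bowtie actually searches. Under this model, mismatches after base 28 are allowed as long as the -e total is not exceeded:

```python
from typing import Sequence


def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 (halves up: an assumption), capped at 30."""
    return min(30, (q + 5) // 10 * 10)


def valid_in_n_mode(mismatch_pos: Sequence[int], quals: Sequence[int],
                    n: int = 2, l: int = 28, e: int = 70,
                    round_quals: bool = True) -> bool:
    """-n mode: at most n mismatches in the first l bases (the seed), and the summed
    quality at ALL mismatched positions, inside or outside the seed, at most e."""
    seed_mismatches = sum(1 for p in mismatch_pos if p < l)
    penalty = sum(maq_round(quals[p]) if round_quals else quals[p]
                  for p in mismatch_pos)
    return seed_mismatches <= n and penalty <= e


def valid_in_v_mode(mismatch_pos: Sequence[int], v: int) -> bool:
    """-v mode: at most v mismatches over the whole read; qualities and -l are ignored."""
    return len(mismatch_pos) <= v
```

For example, with default settings and high-quality bases, two mismatches at positions 29 and 33 of a 36-base read cost 30 + 30 = 60, which stays within the default -e of 70; a third high-quality mismatch anywhere would exceed it. Passing `round_quals=False` mimics --nomaqround.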



                          • Originally posted by dukevn
                            bowtie simply does not search/map on those extra bases and hence there is no mismatches applied there.
                            D.
                            So if I input 36-base reads, Bowtie only looks at the first 28 bases, and I might just as well have input 28-base reads?

                            I don't think that is the case. I have had reads where tiles 28 and 32 were inadvertently left out, and the alignment rate was only 17% because the sequence shifted (base 29 became 28, 30 became 29, and so on). However, when I use the -3/--trim3 option, the alignment rate increases gradually with each base trimmed, until with 6 bases trimmed from the 3' end I get the same alignment rate as when tiles 28 and 32 are present. With 6 bases trimmed, that would leave 30 bases, and room for two mismatches (28, 29).

                            So this suggests that Bowtie does look beyond the 28th base and uses those bases for alignment. But does it allow mismatches there? (I don't want to control the behavior; I just want to know what is going on.)



                            • best-first chunk memory problem

                              Ben,
                              I've tried to read all the posts, but I may have missed this answer if posted.

                              I'm having a problem running Bowtie on a mouse genome dataset.
                              A large number of reads produce the warning:

                              Warning: Exhausted best-first chunk memory for read .....

                              current command line: bowtie -S -p 1 --solexa1.3-quals --un unmapped.fq -m 10 --max maxmapped.fq -n 3 -X 600 /ccmb/CoreBA/Data/BowtieData/mm9 -1 ../s_7_1_sequence.txt -2 ../s_7_2_sequence.txt mm9_align.sam


                              version: bowtie --version
                              bowtie version 0.12.3
                              64-bit
                              Built on ccmb-comp1.umms.med.umich.edu
                              Tue Mar 2 12:33:36 EST 2010
                              Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
                              Options: -O3
                              Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}


                              Any suggestions would be helpful, as I feel I'm not getting the alignment rate I should be seeing with this data.

                              Thanks !

                              Jim



                              • Tutorial - Build a new index

                                Hi guys,

                                I'm new to SEQanswers and to alignment, so I started with Bowtie because it seems to have a good reputation, a good manual and a tutorial.

                                I am working through the tutorial and have reached the point where I am supposed to build a new index from the E. coli O157:H7 genome downloaded from NCBI. However, when I run the command I get this error:

                                could not open NC_002127.fna

                                I'm using Terminal on Mac OS X 10.5.8. Has anybody had the same problem? Am I not putting the NC_002127.fna file in the correct directory?

                                I would move on with the tutorial, but I need to build an index for the organism I work on.

                                Thanks in advance.
