
  • #16
    Originally posted by saberdsl View Post
    Hello nilshomer, can you tell me which masks work well with 35bp SOLiD reads?
    Use the recommended indexes for 25bp reads found in the manual. I have found that those work well with 35bp reads.


    • #17
      Originally posted by saberdsl View Post
      Hello nilshomer, can you tell me which masks work well with 35bp SOLiD reads?
      I have gotten good results with the indexes recommended in the BF book (section 7.1.2; yes, the ones for 50bp reads). Feel free to remove the indexes that use masks longer than your read length.

      Let us know how it goes.
      -drd
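
The advice above (drop any index whose mask is longer than the read) can be sketched as a simple filter. This is only an illustration: the mask strings below are made up for the example and are not the recommended BFAST masks, and it assumes that a mask's width is simply the length of its 0/1 string.

```python
from typing import List

# Hypothetical masks, invented for this example (widths 22, 30 and 42).
EXAMPLE_MASKS = [
    "1" * 22,
    "1111" + "0" * 22 + "1111",
    "11" + "0" * 38 + "11",
]


def masks_fitting_read(masks: List[str], read_length: int) -> List[str]:
    """Keep only the masks whose width does not exceed the read length."""
    return [mask for mask in masks if len(mask) <= read_length]


# For 35bp reads, the 42-wide mask is dropped and the other two are kept.
for mask in masks_fitting_read(EXAMPLE_MASKS, 35):
    print(len(mask), mask)
```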


      • #18
        Does anyone have recommended masks for mouse mm9 SOLiD 50bp reads?
        http://kevin-gattaca.blogspot.com/


        • #19
          Originally posted by KevinLam View Post
          Does anyone have recommended masks for mouse mm9 SOLiD 50bp reads?
          Use the recommended indexes for hg18 for 50bp reads.


          • #20
            I have a question regarding the "masks" in the index. What are the masks and why do we need them? Thanks for the response!


            • #21
              The supplementary Word doc is quite helpful in understanding BFAST.

              To quote from it, in response to "what":
              Instead of indexing the location of k-mer words in the genome, we generalize this concept to indexing the start positions of k-letter substrings that are obtained from a mask, which is slid along the reference genome at one base shifts to generate the index data. This is similar to spaced seeds introduced previously in homology search programs. For example, the letter selection mask suggested by the bit-pattern 0011001010, directly applied to the sequence "AAGATTACAG", selects the letter key "GAAA".
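
The mask example in the quote can be reproduced with a few lines of Python. This is a minimal sketch of the idea only, not BFAST's actual implementation; the function names `apply_mask` and `masked_keys` are made up for this illustration.

```python
from typing import Iterator, Tuple


def apply_mask(mask: str, sequence: str) -> str:
    """Return the key made of the letters of `sequence` that sit under the 1-bits of `mask`."""
    if len(sequence) < len(mask):
        raise ValueError("sequence is shorter than the mask")
    return "".join(base for bit, base in zip(mask, sequence) if bit == "1")


def masked_keys(mask: str, reference: str) -> Iterator[Tuple[int, str]]:
    """Slide the mask along the reference one base at a time, yielding (start, key)."""
    width = len(mask)
    for start in range(len(reference) - width + 1):
        yield start, apply_mask(mask, reference[start:start + width])


# The example from the quoted text: the 1-bits select positions 3, 4, 7 and 9.
print(apply_mask("0011001010", "AAGATTACAG"))  # GAAA
```

An index would then map each key to the list of start positions that produced it, so a read's key can be looked up directly.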

              In response to "why":
              It is a way of indexing the reference genome to speed up lookups.

              If you are asking why we need more than one:

              Greater accuracy is to be achieved by using multiple indexes based on different masks to define the index keys, but keeping the number of letters in the key, k, large for uniqueness. Avoid using shorter keys (reducing k) to obtain accuracy, which results in exponential growth in spurious candidate locations.
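
To get a feel for the "exponential growth in spurious candidate locations" mentioned in the quote, here is a rough back-of-the-envelope sketch. It assumes an idealized genome of independent, uniformly random bases, under which a given k-letter key is expected to occur by chance about G / 4^k times in G positions. Real genomes contain repeats, so treat the numbers as a lower bound on the trend rather than a prediction. The genome size used is a round figure of about 3 billion bases for human, chosen only for illustration.

```python
def expected_chance_hits(genome_size: int, key_letters: int) -> float:
    """Expected chance occurrences of one k-letter key in a uniform random genome."""
    return genome_size / 4 ** key_letters


# Each letter removed from the key multiplies the expected chance hits by 4.
GENOME_SIZE = 3_000_000_000  # illustrative round number, not an exact assembly size
for k in (22, 18, 14, 10):
    print(f"k={k}: about {expected_chance_hits(GENOME_SIZE, k):.3g} chance hits per key")
```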
              http://kevin-gattaca.blogspot.com/


              • #22
                Thank you, Kevin!

                I have one more question for the forum. Is there a way to decipher the 4th row of a fastq file? By 4th row, I mean the fastq version of the phred-like values found in a *.qual file. I would like to parse the 4th row but I don't understand what each ` or ! or ? means other than that it is some mysterious code for quality value digits. Thank you for your reply!


                • #23
                  Originally posted by elinor View Post
                  I have a question regarding the "masks" in the index. What are the masks and why do we need them? Thanks for the response!
                  At the risk of sounding cranky, I would encourage you to read the user manual, published paper, and/or supplemental material before asking this question. Nils does a nice job of explaining his program, which is not always the case for software developers, so take advantage of these resources.

                  -Harold


                  • #24
                    Originally posted by elinor View Post
                    Thank you, Kevin!

                    I have one more question for the forum. Is there a way to decipher the 4th row of a fastq file? By 4th row, I mean the fastq version of the phred-like values found in a *.qual file. I would like to parse the 4th row but I don't understand what each ` or ! or ? means other than that it is some mysterious code for quality value digits. Thank you for your reply!
                    Please search around for a few minutes before asking such questions. You can find their decoding in many threads on this site. Use the search function here, or in Google add the string "site:seqanswers.com" to your search. Wikipedia also has a FASTQ entry.
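
For readers landing here from a search, a minimal sketch of the decoding: each character on the quality line stands for one Phred score, obtained by subtracting an offset from the character's ASCII code. The offset is an assumption you have to check for your own data: 33 is the Sanger convention (where `!` means Q0), while some older Illumina pipelines used 64 (where a backtick means Q32). The function names below are made up for this example.

```python
from typing import List


def decode_qualities(quality_line: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to Phred scores using the given ASCII offset."""
    scores = [ord(char) - offset for char in quality_line.rstrip("\n")]
    if any(score < 0 for score in scores):
        raise ValueError("negative score: the offset is probably wrong for this file")
    return scores


def error_probability(phred_score: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-phred_score / 10)


print(decode_qualities("!?I"))             # [0, 30, 40] with the Phred+33 offset
print(decode_qualities("`h", offset=64))   # [32, 40] with the Phred+64 offset
print(error_probability(30))               # a 1 in 1000 chance the base call is wrong
```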


                    • #25
                      Nils - I've been reading through all your documentation and it's really great -- thanks! However, I'm struggling to figure out how to build indices for sacCer, genome size ~ 12Mb. It would seem like a key size of 16 would be about right. I've looked at btestindexes, which is what I think should be used to generate an appropriate index set. Your post says they should be generated with btestindexes using "recommended" settings but I can't figure out what those should be. Suggestions?


                      • #26
                        Originally posted by abattenhouse View Post
                        Nils - I've been reading through all your documentation and it's really great -- thanks! However, I'm struggling to figure out how to build indices for sacCer, genome size ~ 12Mb. It would seem like a key size of 16 would be about right. I've looked at btestindexes, which is what I think should be used to generate an appropriate index set. Your post says they should be generated with btestindexes using "recommended" settings but I can't figure out what those should be. Suggestions?
                        What read length(s) do you have? I would suggest using the recommended indexes in the manual as a first pass. If you don't like the results, you can then build custom indexes. The vast majority of users are satisfied with the recommended indexes.


                        • #27
                          Nils - These are 36 bp reads. I have both SOLiD and Illumina data. Should I use the "25 bp" SOLiD mask set from your SOM? Also, it would be nice to know how to use btestindexes so alternative index sets could be generated and compared. Thanks, Anna


                          • #28
                            Originally posted by abattenhouse View Post
                            Nils - These are 36 bp reads. I have both SOLiD and Illumina data. Should I use the "25 bp" SOLiD mask set from your SOM? Also, it would be nice to know how to use btestindexes so alternative index sets could be generated and compared. Thanks, Anna
                            Yes, use the 25bp indexes. I am not sure if there are other posts here where I go into detail about "btestindexes". If not, let me know and I will try to explain it.


                            • #29
                              Nils - I've just tried the 25bp SOLiD mask set and I'm getting a lot of false positives, as determined by alignments showing up in a deleted gene. These reads don't show up in a BWA alignment of the same data, so I think I need a set of BFAST masks with a larger key size. I'm pretty sure I've looked everywhere for more info on btestindexes with no luck (although I've been reading so much stuff the last few days that my head is about to explode). Thanks, Anna


                              • #30
                                Originally posted by abattenhouse View Post
                                Nils - I've just tried the 25bp SOLiD mask set and I'm getting a lot of false positives, as determined by alignments showing up in a deleted gene. These reads don't show up in a BWA alignment of the same data, so I think I need a set of BFAST masks with a larger key size. I'm pretty sure I've looked everywhere for more info on btestindexes with no luck (although I've been reading so much stuff the last few days that my head is about to explode). Thanks, Anna
                                Try the 50bp ones... if that doesn't map many reads, then I'll see what I can do.
