  • Bowtie and Color Space

    Hey, I tried the latest release of Bowtie, which now supports color space.

    I downloaded the dataset from the link below and got the results shown further down after slightly less than 4 hours of run time.

    (I study sequence patterns in Nucleosomal DNA.)


    ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX000/SRX000425/
    (both files)

    /Desktop/bowtie-0.12.0-beta1$ ./bowtie -C -q -a -m 1 c_elegans_ws200_c c_elegans_425.fastq c_elegans_425_positions.txt | awk -v OFS='\t' '{if($2 == "-") {$4 += (length($5)-1)} ; print $0}'

    # reads processed: 107422570
    # reads with at least one reported alignment: 35370418 (32.93%)
    # reads that failed to align: 66794981 (62.18%)
    # reads with alignments suppressed due to -m: 5257171 (4.89%)
    Reported 35370418 alignments to 1 output stream(s)

    I think the quality of SOLiD reads is not as good as that of Solexa reads (i.e. more mismatches). I used the default threshold of 2 mismatches.

    I'm going to try to compare with BFAST someday once I figure out how to make proper masks for C. elegans. Bowtie is nice for those who have no idea how these programs work and are not computer science majors. The manual is very easy to understand and one can simply download the indices for commonly used genomes from the website.

    -Clayton
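
The awk step in the command above can also be sketched in Python. This is only a minimal sketch, assuming Bowtie's default tab-delimited output with the strand in column 2, the 0-based offset in column 4, and the read sequence in column 5 (the same columns the awk script touches). The function name `shift_minus_strand` and the sample lines are hypothetical.

```python
def shift_minus_strand(line):
    """Move the reported offset of a minus-strand hit to the other end of the read.

    Mirrors the awk one-liner: if column 2 is "-", add (sequence length - 1)
    to column 4. Plus-strand lines are returned unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[1] == "-":
        fields[3] = str(int(fields[3]) + len(fields[4]) - 1)
    return "\t".join(fields)


# Hypothetical alignment lines, for illustration only.
sample = [
    "read_1\t+\tCHROMOSOME_I\t100\tACGTA\tIIIII",
    "read_2\t-\tCHROMOSOME_I\t100\tACGTA\tIIIII",
]
for hit in sample:
    print(shift_minus_strand(hit))
```

Unlike awk's default whitespace splitting, this splits on tabs only, so a read name containing spaces would not shift the columns.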

  • #2
    Originally posted by cutcopy11
    I think the quality of reads using Solid is not as good as Solexa (i.e. more mismatches). I used the default 2 mismatch threshold.
    I would set the number of mismatches to 10% of the read length, since the SOLiD error rate can be up to 10% and a SNP uses up two mismatches in color space.
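
To see why a SNP costs two mismatches in color space, here is a minimal sketch of the standard SOLiD di-base encoding, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The helper names `to_colors` and `color_mismatches` and the two example sequences are made up for illustration.

```python
# 2-bit base codes used by the SOLiD di-base (color) encoding.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a nucleotide sequence as a list of colors (one per adjacent base pair)."""
    codes = [BASE_CODE[base] for base in seq.upper()]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def color_mismatches(seq_a, seq_b):
    """Count positions where the color encodings of two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(to_colors(seq_a), to_colors(seq_b)))


reference = "ACGTACGT"
with_snp = "ACGAACGT"  # single base change (T -> A) at index 3

print(to_colors(reference))                   # [1, 3, 1, 3, 1, 3, 1]
print(to_colors(with_snp))                    # [1, 3, 2, 0, 1, 3, 1]
print(color_mismatches(reference, with_snp))  # 2
```

A base change in the interior of a read always alters both colors that involve that base, which is where the "two mismatches per SNP" figure comes from. A change at the very first or last base alters only one color.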


    • #3
      Can you tell me how to convert csfasta and qual files to csfastq? Thanks!


      • #4
        Exactly. You need to increase the mismatch tolerance. In color space a single SNP shows up as 2 adjacent mismatches, whereas 1 isolated mismatch is a sequencing error. Most people run 6 mismatches for a 50 bp tag, if they have the processing power.
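
The "2 mismatches for a SNP, 1 mismatch is an error" rule can be sketched as a simple classifier over color strings. This is a simplified heuristic for illustration, not a description of how Bowtie decodes color-space alignments. It relies on one property of the di-base encoding: when a single base changes, the XOR of the two colors around it stays the same. The function name `classify_color_mismatches` is hypothetical.

```python
def classify_color_mismatches(ref_colors, read_colors):
    """Split mismatching color positions into SNP-consistent pairs and isolated ones.

    Two adjacent mismatches are treated as SNP-consistent when the XOR of the
    two colors is the same in the reference and in the read. Everything else
    is reported as an isolated mismatch (a likely sequencing error).
    """
    mismatches = [
        i for i, (ref, read) in enumerate(zip(ref_colors, read_colors)) if ref != read
    ]
    snp_pairs, isolated = [], []
    i = 0
    while i < len(mismatches):
        pos = mismatches[i]
        has_neighbor = i + 1 < len(mismatches) and mismatches[i + 1] == pos + 1
        if has_neighbor and (ref_colors[pos] ^ ref_colors[pos + 1]) == (
            read_colors[pos] ^ read_colors[pos + 1]
        ):
            snp_pairs.append((pos, pos + 1))
            i += 2
        else:
            isolated.append(pos)
            i += 1
    return snp_pairs, isolated


ref = [1, 3, 1, 3, 1, 3, 1]
read = [1, 3, 2, 0, 1, 3, 0]  # SNP-like pair at 2-3, lone mismatch at 6
print(classify_color_mismatches(ref, read))  # ([(2, 3)], [6])
```

Note that a lone mismatch in the final color could also come from a SNP in the last base of the read, so "isolated" does not always mean "error".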


        • #5
          Originally posted by xuying
          Can you tell me how to convert csfasta and qual files to csfastq? Thanks!
          Hi Xuying,
          You are much more likely to get an answer if you start a new thread, especially when the current thread is unrelated to your question.

          --
          Phillip


          • #6
            Originally posted by xuying
            Can you tell me how to convert csfasta and qual files to csfastq? Thanks!
            I can't, but google can: http://www.google.com/search?rlz=1C1...les+to+csfastq
            --
            Senthil Palanisami


            • #7
              Originally posted by spenthil
              If it is going to be that way, then try

              For all those people who find it more convenient to bother you with their question rather than to Google it for themselves.


              • #8
                Paired reads

                Has anyone had success using Bowtie with color-space mate-pair input?


                • #9
                  Originally posted by snetmcom
                  exactly. you need to increase mismatch tolerance levels. it takes 2 mismatches for a single SNP, 1 mismatch is system error. Most people run 6 mismatches for a 50bp tag if you have the processing power.
                  But Bowtie only lets you set the number of mismatches up to 3 (-v). How do you get around that?


                  • #10
                    Originally posted by June
                    But, Bowtie only can set the mismatch up to 3 (-v). How could you fix it?
                    Bowtie allows up to 3 mismatches in the seed, so you don't need to set the -v parameter. You might need to increase -e, though, to allow high-quality mismatches for SNPs.

                    It is quite amazing to align 10 M reads in 15 min on a single node. It does seem strange, though, that --best is not the default.
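
To illustrate why -e matters for high-quality mismatches, here is a rough sketch of the kind of check it implies. It assumes the behaviour described in the Bowtie 1 manual for -e/--maqerr: the sum of quality values at all mismatched read positions must not exceed the limit (default 70), with qualities rounded to the nearest 10 and capped at 30 unless --nomaqround is given. The function name, the handling of values ending in 5, and the example qualities are assumptions, and how this interacts with -C is not modelled here.

```python
def within_quality_budget(qualities, mismatch_positions, max_total=70, maq_round=True):
    """Return True if the summed quality at mismatched positions is within max_total.

    qualities: list of integer Phred-style quality values, one per read position.
    mismatch_positions: indices into qualities where the read disagrees with the reference.
    maq_round: round each quality to the nearest 10 and cap it at 30
               (rounding halves upward is an assumption of this sketch).
    """
    total = 0
    for pos in mismatch_positions:
        quality = qualities[pos]
        if maq_round:
            quality = min(30, (quality + 5) // 10 * 10)
        total += quality
    return total <= max_total


quals = [30] * 50  # hypothetical 50-position read with uniformly high quality

print(within_quality_budget(quals, [3, 4]))                     # True  (60 <= 70)
print(within_quality_budget(quals, [3, 4, 20]))                 # False (90 > 70)
print(within_quality_budget(quals, [3, 4, 20], max_total=100))  # True  (90 <= 100)
```

Under these assumptions, three high-quality mismatches already exceed the default limit, so a SNP (two color mismatches) plus one sequencing error would be rejected unless -e is raised.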
