  • doxologist
    Member
    • Jan 2009
    • 96

    Third Party Software for Colorspace data?

    Hi:

    Besides the ABI software, what software are people using for colorspace data? I heard some people are working with MAQ (is this true?). Can Bowtie work with colorspace data?

    Thanks
  • snetmcom
    Senior Member
    • Oct 2008
    • 159

    #2


    There are some others, but I don't think they are using CS correctly.

    Comment

    • doxologist
      Member
      • Jan 2009
      • 96

      #3
      Thanks for your response. Have you had experience with NextGENe? It doesn't seem that many people are using it. I'm trying it as well, but it seems to translate CS data to fasta before matching. Doesn't this lose the ability to match reads that contain mismatches (since a CS mismatch changes all the subsequent nucleotides)?

      Comment

      • Roald
        Director at CLC bio
        • Aug 2008
        • 26

        #4
        Disclaimer: I work at CLC bio

        We have just included native color space assembly in our NGS cell software


        You can grab a white paper with benchmarks at http://clcbio.com/index.php?id=1368

        Cheers

        Roald

        Comment

        • doxologist
          Member
          • Jan 2009
          • 96

          #5
          I am now trying NextGENe and it seems that it translates colorspace data to fasta first before the analysis. Is this correct? If so, it seems that there is much potential for error (0-2) and that this is not a method recommended by ABI. Does this seem right?

          Comment

          • doxologist
            Member
            • Jan 2009
            • 96

            #6
            Thanks Roald. I'll take a look at CLC bio.

            Comment

            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              Originally posted by doxologist
              I am now trying NextGENe and it seems that it translates colorspace data to fasta first before the analysis. Is this correct? If so, it seems that there is much potential for error (0-2) and that this is not a method recommended by ABI. Does this seem right?
              I am not familiar with NextGENe, but if they are indeed translating to base-space instead of doing their work within color-space then, yes, there is a great potential for error. Unlike traditional sequencing technologies, where a single miscall only affects that particular base, on the SOLiD a miscall will affect all downstream bases. Also, by not working in color-space one is missing a large strength of the SOLiD: great SNP calling.

              Comment

              • Roald
                Director at CLC bio
                • Aug 2008
                • 26

                #8
                You are both absolutely right that a huge amount of information is lost by aligning SOLiD data in sequence space, rather than in color space.
                The benchmarks we have made (see http://clcbio.com/index.php?id=1368 ) showed that the number of aligned reads increases by over 80% when reads are aligned in color space rather than in sequence space. This example is for reads of length 35, and the tendency will only increase as reads get longer.

                Comment

                • doxologist
                  Member
                  • Jan 2009
                  • 96

                  #9
                  Hmm... great.. thanks for the info. Perhaps it is already addressed... how does CLC Bio compare with Zoom and BFAST?

                  Comment

                  • Mr Mutundes
                    Member
                    • Jan 2009
                    • 17

                    #10
                    Allow me to ask what may be a dumb question...

                    If I "double encode" (to use the ABI term) both my reads and my reference sequence (so that colors are represented by ACGTs), then why can't I use bowtie, blat, blastall or whatever alignment program I like and expect success? Sure there would be some post-alignment work involved in distinguishing biological variants from sequencing errors but I don't see why the alignment itself wouldn't be valid and useful.

                    Thanks

                    Comment

                    • ECO
                      --Site Admin--
                      • Oct 2007
                      • 1360

                      #11
                      Originally posted by Mr Mutundes
                      Allow me to ask what may be a dumb question...

                      If I "double encode" (to use the ABI term) both my reads and my reference sequence (so that colors are represented by ACGTs), then why can't I use bowtie, blat, blastall or whatever alignment program I like and expect success? Sure there would be some post-alignment work involved in distinguishing biological variants from sequencing errors but I don't see why the alignment itself wouldn't be valid and useful.

                      Thanks
                      Hey! Your answer is in Post #7 above!

                      Comment

                      • Mr Mutundes
                        Member
                        • Jan 2009
                        • 17

                        #12
                        no no no! "Double encoding" doesn't put you in base space!

                        Let me put the question again: a sequence of colors is often represented by digits, but can just as easily be represented by the characters ACGT (somewhere in the AB corona lite stuff this is referred to as "double encoding"). If I have a query sequence and a target sequence both encoded this way, then because they both "look like" nucleotide sequences they are acceptable as input to standard nucleotide alignment programs. But what is being aligned are two color sequences, not two base sequences. So if there is a color sequencing error, the alignment will NOT be perturbed as it would be in an alignment done in "base space". (I think...) So why can't we use traditional alignment programs?
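For readers who have not seen double encoding written out, here is a minimal Python sketch of the idea in this post. It assumes the usual SOLiD two-base encoding (with A, C, G, T numbered 0 to 3, the color between two adjacent bases is the XOR of their numbers) and the 0/1/2/3 to A/C/G/T letter mapping. The reference and read are made up for illustration, and the function names are not from any ABI tool.

```python
# Assumptions: bases A,C,G,T are numbered 0..3, a color is the XOR of two
# adjacent base numbers, and double encoding writes colors 0,1,2,3 as A,C,G,T.
BASES = "ACGT"

def bases_to_colors(seq):
    """Encode a base-space sequence as the colors between adjacent bases."""
    codes = [BASES.index(b) for b in seq]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

def double_encode(colors):
    """Rewrite color digits 0-3 as the letters A, C, G, T."""
    return "".join(BASES[int(c)] for c in colors)

def double_encode_read(cs_read):
    """Drop the primer base and the first color, then double encode the rest."""
    return double_encode(cs_read[2:])

reference = "CATGGACTA"  # hypothetical base-space reference
read = "T010212"         # hypothetical color-space read (primer base + colors)

ref_de = double_encode(bases_to_colors(reference))
read_de = double_encode_read(read)

# A plain substring search stands in for a "traditional" exact-match aligner.
print(ref_de, read_de, ref_de.find(read_de))
```

With an error-free read on the forward strand, the double-encoded read is found in the double-encoded reference, which is the case the question describes. The sketch says nothing yet about sequencing errors, variants, or the opposite strand.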

                        Happy to be corrected!

                        Comment

                        • westerman
                          Rick Westerman
                          • Jun 2008
                          • 1104

                          #13
                          There are at least three problems, Mr Mutundes, with using double-encoded sequences with traditional alignment programs.

                          (1) As I mentioned above, a single color (or double-encoded) change at the start of a read makes the read decode to an entirely different base sequence.

                          (2) Related to the above, opposite strands do not match. Thus you have to tell your traditional program to align to one strand at a time.

                          (3) Traditional programs expect a SNP to be a single base change. Sequencing errors are also a single base. However, in color space (and thus in double-encoded space) a SNP shows up as sequential (adjacent) color changes, while an error is a single color change.

                          In summary the problem is not double-encoding per se -- as you point out it should not matter if the alphabet 0, 1, 2, 3 or the alphabet A, C, G, T is used. Rather the problem is that traditional programs do not know how to cope with the power and weakness of color-space.

                          Sitting down in front of a chalkboard with another person does a lot for the 'ah-ha!' discovery moment. Since I can not do that with you I will instead use my next couple of messages as a way to convey the above ideas. I assume that you know how color-space encoding is done by the sequencer. Also for ease of typing I will use runs of 7 bases instead of the normal 25 or 35 or (eventually) more.
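Point (3) can be checked with a few lines of Python. This is only a sketch under the usual two-base encoding assumption (bases A, C, G, T numbered 0 to 3, color = XOR of adjacent base numbers). The reference, the SNP, and the position of the miscalled color are all hypothetical.

```python
BASES = "ACGT"

def bases_to_colors(seq):
    """Colors between adjacent bases, as a list of ints 0..3."""
    codes = [BASES.index(b) for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]

def diff_positions(x, y):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(x, y)) if a != b]

reference = "GATTACA"  # hypothetical reference
variant = "GATCACA"    # same sequence with a SNP (T -> C) at base index 3

ref_colors = bases_to_colors(reference)
snp_colors = bases_to_colors(variant)

# A sequencing error: one color miscalled, underlying bases unchanged.
err_colors = list(ref_colors)
err_colors[4] = (err_colors[4] + 1) % 4

print("SNP changes colors at:  ", diff_positions(ref_colors, snp_colors))
print("Error changes colors at:", diff_positions(ref_colors, err_colors))
```

The SNP touches the two colors on either side of the changed base, while the miscall touches one color. A program that scores every mismatch alike cannot tell these two cases apart.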

                          Comment

                          • westerman
                            Rick Westerman
                            • Jun 2008
                            • 1104

                            #14
                            Single change causes big problems.

                            If I have two reads in color space

                            (1 CS) T3232032
                            (2 CS) T1232032

                            which decode to these bases in base space

                            (1 BS) AGCTTAG
                            (2 BS) GATCCGA

                            And in double-encoded space without primer trimming:

                            (1 DEN) TTGTGATG
                            (2 DEN) TCGTGATG

                            Or in the more proper primer trimmed double-encoding (since the primer means something different than the double-encoding; e.g., the 'T' primer is actually a 'T' and not a substitute for the number '3'):

                            (1 DET) GTGATG
                            (2 DET) GTGATG

                            So now you take the double-encoded trimmed (DET) reads and put them into a traditional assembler. Congratulations, you have now assembled AGCTTAG and GATCCGA together!

                            Even if you take the double-encoded non-trimmed reads and put them through a traditional assembler then you end up with the same incorrect assembly since 7 of the 8 double-encoded bases align. Note that this percentage is even more against you if you are using 25- or 35-base reads. If you insist that your assembler make exact matches (8 of 8 in this case) then you never get adjacent overlaps and thus no contigs.
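The two reads in this post can be run through a short Python sketch. It assumes the usual two-base encoding (bases A, C, G, T numbered 0 to 3, each color XORed onto the previous base to get the next one) and that primer trimming drops the primer base together with the first color. The function names are made up for illustration.

```python
BASES = "ACGT"

def decode(cs_read):
    """Color-space read (primer base + color digits) -> base-space string."""
    prev = BASES.index(cs_read[0])
    out = []
    for c in cs_read[1:]:
        prev ^= int(c)          # each color moves us from one base to the next
        out.append(BASES[prev])
    return "".join(out)

def double_encode_trimmed(cs_read):
    """Drop primer base and first color, write remaining colors as letters."""
    return "".join(BASES[int(c)] for c in cs_read[2:])

read1, read2 = "T3232032", "T1232032"

bs1, bs2 = decode(read1), decode(read2)
mismatches = sum(a != b for a, b in zip(bs1, bs2))
same_det = double_encode_trimmed(read1) == double_encode_trimmed(read2)

print(bs1, bs2, "base mismatches:", mismatches)
print("trimmed double-encoded reads identical:", same_det)
```

The two reads differ in a single color, yet their decoded bases disagree at all 7 positions, while the trimmed double-encoded strings are identical.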

                            Comment

                            • westerman
                              Rick Westerman
                              • Jun 2008
                              • 1104

                              #15
                              Opposite strand reads do not align

                              I am using a repetitive sequence here but the same idea is true for non-repeat areas.

                              In color space there are two reads:

                              (CS 1) T0000000
                              (CS 2) T3000000

                              These represent in base space:

                              (BS 1) TTTTTTT
                              (BS 2) AAAAAAA

                              If these are reads on opposite strands then they should align. So let's convert them into double-encoding and put them through a traditional alignment program.

                              (DET 1) AAAAAA
                              (DET 2) AAAAAA

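                              Oops! The two double-encoded reads are identical, so a traditional program that reverse-complements one of them to test the opposite strand ends up comparing AAAAAA with TTTTTT. It is going to be hard to find any alignment that way!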
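The strand problem can be reproduced with a small Python sketch. It assumes the usual two-base encoding (bases A, C, G, T numbered 0 to 3, color = XOR of adjacent base numbers) and uses a naive reverse complement as a stand-in for what a strand-aware base-space aligner would do. The second sequence, GATTACA, is a made-up example.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Ordinary base-space reverse complement."""
    return seq.translate(COMPLEMENT)[::-1]

def bases_to_colors(seq):
    codes = [BASES.index(b) for b in seq]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

def double_encode(colors):
    return "".join(BASES[int(c)] for c in colors)

forward = "TTTTTTT"
reverse = revcomp(forward)  # the opposite-strand read, AAAAAAA

de_fwd = double_encode(bases_to_colors(forward))
de_rev = double_encode(bases_to_colors(reverse))

# A base-space aligner would reverse-complement the letters when it tries the
# opposite strand, which is the wrong operation on double-encoded colors.
naive = revcomp(de_rev)
print(de_fwd, de_rev, naive, naive == de_fwd)

# In color space the opposite strand is just the reversed color string.
seq = "GATTACA"
print(bases_to_colors(revcomp(seq)) == bases_to_colors(seq)[::-1])
```

Under these assumptions, the colors of a reverse-complemented sequence are the original colors in reverse order, with no complementing. That is why a program has to be told to treat double-encoded input one strand at a time.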

                              Comment
