  • sgupta
    Junior Member
    • Nov 2008
    • 6

    ABI Color Space to Bases

    Hi,

    I am trying to convert color space sequences generated by the ABI SOLiD sequencer to actual bases using the following color space data "matrix":

    AA=0
    AC=1
    AG=2
    AT=3
    CC=0
    CA=1
    CT=2
    CG=3
    GG=0
    GT=1
    GA=2
    GC=3
    TT=0
    TG=1
    TC=2
    TA=3

    So, this
    >44_35_267_F3
    T20220213203000111000122223221121222

    gets converted to

    >44_35_267_F3
    CCTCCTGCTTAAAACACCCCAGAGATCTGTCAGAG
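The conversion above can be reproduced with a few lines of Python. This is only a minimal sketch, not the script offered later in the thread: the function name `decode_colorspace` is hypothetical, and it relies on the observation that the 16-entry matrix is equivalent to XOR on a 2-bit base code (A=0, C=1, G=2, T=3). It assumes the first character of the read is the primer base, which is not part of the decoded output.

```python
# Sketch of color space -> base space decoding (names are illustrative).
# With A=0, C=1, G=2, T=3, the matrix in the post reduces to:
#   next_base = previous_base XOR color
BASES = "ACGT"

def decode_colorspace(read: str) -> str:
    """Decode a csfasta read (primer base followed by color digits)."""
    code = BASES.index(read[0])   # primer base, not emitted
    out = []
    for ch in read[1:]:
        color = int(ch)           # raises ValueError on '.', a missing color call
        code ^= color
        out.append(BASES[code])
    return "".join(out)

print(decode_colorspace("T20220213203000111000122223221121222"))
```

Run on the read from the post, this prints the same 35 bases shown above, which suggests the matrix and the conversion itself are fine.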

    I want to do this so that I can use alignment programs that cannot work with ABI color space data. But I think I am doing something wrong, because my alignment rates are below 5% on published data (mouse genome, allowing up to 2 mismatches).

    Any insights would be really appreciated.

    I may just go ahead and use MAQ to do this in color space, but I am not sure why my current approach does not work. I am very new to SOLiD data, so I may be missing some piece of information here.

    Thanks in advance.
  • lgoff
    Member
    • Feb 2008
    • 82

    #2
    SOLiD Alignment

    I have found that it is much better to do as much of the analysis as you can in colorspace before making the transition to DNA space. We are currently using the SHRiMP (U Toronto) alignment algorithm for fast and accurate alignment in colorspace. Even so, 5% seems pretty low for DNA-space alignments of SOLiD data.


    • lgoff
      Member
      • Feb 2008
      • 82

      #3
      Script available

      To answer your original question, just send me an email and I will provide you with a Python script that converts .csfasta to .fasta as needed.

      Loyal
      lgoff(at)broad.mit.edu


      • Chipper
        Senior Member
        • Mar 2008
        • 323

        #4
        Lgoff,
        what advantage do you see with SHRiMP compared to the ABI tools? Isn't it said to be very slow?

        sgupta,
        Direct conversion is not possible for reads that contain any sequencing error, since a single color error changes all following bases in base space. Your conversion looks correct, though; it is just very common for a read to have at least one color space error. SOCS and ZOOM! are supposed to do colorspace alignments, so they are perhaps worth a try.
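To make the error propagation concrete, here is a small sketch that corrupts a single color call in the read from the first post and counts the resulting base-space mismatches. The helper names (`decode`, `with_color_error`) and the choice of position 10 are illustrative assumptions, and the decoder uses the XOR shortcut (A=0, C=1, G=2, T=3) rather than the full lookup matrix.

```python
# Sketch: one wrong color call changes every base decoded after it.
BASES = "ACGT"

def decode(read):
    """Decode primer base + color digits into bases (XOR form of the matrix)."""
    code = BASES.index(read[0])
    out = []
    for ch in read[1:]:
        code ^= int(ch)
        out.append(BASES[code])
    return "".join(out)

def with_color_error(read, pos, wrong_color):
    """Copy of `read` with the color at 0-based index `pos` replaced."""
    colors = list(read[1:])
    colors[pos] = str(wrong_color)
    return read[0] + "".join(colors)

read = "T20220213203000111000122223221121222"
true_bases = decode(read)
bad_bases = decode(with_color_error(read, 10, 0))  # the true color there is 3

# Positions where the two base-space translations disagree.
mismatches = [i for i, (a, b) in enumerate(zip(true_bases, bad_bases)) if a != b]
print(len(mismatches), "of", len(true_bases), "bases differ")
```

A single color error at index 10 leaves the first 10 bases intact and changes all 25 bases after it, so a base-space aligner allowing 2 mismatches would reject the read, while a color space aligner would see only one mismatch.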


        • lgoff
          Member
          • Feb 2008
          • 82

          #5
          SOLiD

          Originally, I was very put off by the SOLiD pipeline. It was initially very closed, and there wasn't much I could do outside of the genome resequencing for which it was originally designed. The matching is relatively fast with SOLiD, but I do like the k-mer + Smith-Waterman approach of SHRiMP. The SOLiD pipeline has since become much more robust, but when we received our original machine, with the original cluster, it was underpowered for anything human. We had to develop our own pipeline for the specific applications we were using SOLiD for (smRNAs at the time). So we went with SHRiMP, and I have stuck with it since. Since we are lucky enough to be able to parallelize everything very nicely, speed is not much of an issue for us. I haven't tried the SOLiD pipeline in the past few months. Am I missing any dramatic improvements?


          • ECO
            --Site Admin--
            • Oct 2007
            • 1360

            #6
            Hi Loyal,

            Can you share your strategy for parallel processing with SHRiMP?


            • Torst
              Senior Member
              • Apr 2008
              • 275

              #7
              Originally posted by ECO
              Can you share your strategy for parallel processing with SHRiMP?
              Parallelizing SHRiMP is as simple as splitting your input fasta/fastq file into smaller ones, running SHRiMP on each, then merging the hit output files.
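The splitting step can be sketched in Python as follows. This is an assumption-laden illustration, not Nesoni's or SHRiMP's own code: the function names, the `chunk_0000.fa` naming scheme, and the chunk size are all hypothetical, it handles FASTA/csfasta only (not fastq), and it skips the `#` comment lines that csfasta files often start with. Running the aligner on each chunk and merging the outputs is left to whatever job scheduler you use.

```python
from itertools import islice

def read_records(lines):
    """Yield (header, sequence) pairs from FASTA/csfasta lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):   # blank or csfasta comment line
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def chunked(records, size):
    """Yield lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def split_fasta(path, size, prefix="chunk"):
    """Write `path` as prefix_0000.fa, prefix_0001.fa, ...; return the names."""
    names = []
    with open(path) as handle:
        for i, chunk in enumerate(chunked(read_records(handle), size)):
            name = f"{prefix}_{i:04d}.fa"
            with open(name, "w") as out:
                for header, seq in chunk:
                    out.write(f"{header}\n{seq}\n")
            names.append(name)
    return names
```

Because every read is aligned independently, the chunks can be processed in any order and on any number of nodes.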

              Nesoni (open source) does this automatically for you: http://www.vicbioinformatics.com/software.nesoni.shtml


              • nilshomer
                Nils Homer
                • Nov 2008
                • 1283

                #8
                Originally posted by sgupta
                I am trying to convert color space sequences generated by ABI SOLiD sequencer to actual bases [...]
                You can also do the conversion directly on our web server:
                http://genome.ucla.edu/bfast-server/. Click on the left tab that says CS2NT/NT2CS and enjoy!

                I would recommend aligning in color space since one color error will cause all bases after the color error to be translated incorrectly. Many great color space aware mapping tools exist.
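For completeness, the reverse direction (NT2CS) can be sketched in the same way. This is not the web server's code; `encode_colorspace` is a hypothetical name, the default primer base `T` is an assumption taken from the F3 read in the first post, and the color between two bases is computed as the XOR of their 2-bit codes (A=0, C=1, G=2, T=3), which agrees with the matrix quoted there.

```python
# Sketch of base space -> color space encoding (NT2CS direction).
BASES = "ACGT"

def encode_colorspace(bases: str, primer: str = "T") -> str:
    """Encode bases as primer base + one color digit per base."""
    prev = BASES.index(primer)
    colors = []
    for b in bases:
        cur = BASES.index(b)           # raises ValueError on N or other symbols
        colors.append(str(prev ^ cur))  # color for the (previous, current) pair
        prev = cur
    return primer + "".join(colors)

print(encode_colorspace("CCTCCTGCTTAAAACACCCCAGAGATCTGTCAGAG"))
```

Encoding the bases from the first post gives back the original color space read, so the two directions round-trip as long as the read contains no color errors.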

                Nils
