  • doxologist
    Member
    • Jan 2009
    • 96

    csfasta --> fasta conversion

    I have a quick, trivial question:
    what's the fastest/easiest way to "decode" or convert csfasta to fasta? I'm just doing this for a handful at a time for code-checking.

    thanks in advance.
  • lgoff
    Member
    • Feb 2008
    • 82

    #2
    Comparing?

    Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.

    • Rao
      Member
      • Oct 2008
      • 36

      #3
      You mean converting color-space sequence to base-space sequence?

      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.

        • doxologist
          Member
          • Jan 2009
          • 96

          #5
          Originally posted by lgoff
          Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.
          just for trivial conversion... decode

          • jsun529
            Member
            • Apr 2009
            • 52

            #6
            Originally posted by westerman
            The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.
            I get an error message when I run that code:
            ImportError: No module named agapython.util.Dibase

            Where do I get the module? I ran the code on both Linux (Ubuntu) and a Mac terminal, and neither works.

            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              The module should come with Corona Lite. I suspect that your Corona Lite environment is not set up properly. From the README:

              3) Configure your environment *

              For csh/tcsh:
              % setenv CORONAROOT <INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.csh

              For sh/ksh/bash:
              % export CORONAROOT=<INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.sh

              * Remember to update your shell's init script (.cshrc, .bashrc,
              etc.) for future sessions with Corona Lite.

              • roedel
                Junior Member
                • Jun 2009
                • 2

                #8
                csfasta -> fasta

                When I tried to register at ABI to download the CORONA-lite program, I did not receive a confirmation. Then I used the colour scheme given in



                to write a Perl script that does the conversion. As far as I understand, the first base in the csfasta is part of the adaptor sequence and should therefore be omitted from the fasta. Setting the shift parameter to 1 does this (0 would repeat the first base).

                ./csfasta2fasta.pl seqence.csfasta 1 > output.fasta

                If anyone could tell me if this does approximately the same as the CORONA-lite conversion script, I would be happy.
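A minimal Python sketch of the decoding step described in this post, for readers who cannot open the attachment. It assumes the standard SOLiD two-base color code and is not roedel's Perl script or the Corona Lite implementation. The names `decode_colorspace`, `TRANSITION` and the `shift` argument are hypothetical and only mimic the behaviour described above.

```python
BASES = "ACGT"

# TRANSITION[(previous_base, color)] -> next_base.
# With A=0, C=1, G=2, T=3 the color is the XOR of the two base indices,
# which reproduces the usual SOLiD color table.
TRANSITION = {
    (prev, str(color)): BASES[BASES.index(prev) ^ color]
    for prev in BASES
    for color in range(4)
}


def decode_colorspace(read, shift=1):
    """Decode one color-space read such as 'G1131' into base space.

    shift=1 drops the leading adaptor base, shift=0 keeps it
    (an assumption about what the shift parameter means).
    Any call outside 0-3, for example '.', turns the rest of the read into 'N'.
    """
    prev = read[0].upper()
    decoded = [prev]
    for color in read[1:]:
        # Unknown (base, color) pairs fall through to 'N' and stay 'N'.
        prev = TRANSITION.get((prev, color), "N")
        decoded.append(prev)
    return "".join(decoded)[shift:]


print(decode_colorspace("G1131103113113", shift=0))  # GTGCACCGTGCACG
print(decode_colorspace("G1131103113113", shift=1))  # TGCACCGTGCACG
```

The test read is the example westerman gives in post #11, so the `shift=0` output can be checked against his base-space sequence.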
                Attached Files

                Comment

                • chiuchengliu
                  Junior Member
                  • Apr 2009
                  • 1

                  #9
                  Originally posted by roedel
                  Your script works well except for an extra ">\n" in the output file.

                  PS: translating color space to base space loses the independence of adjacent color calls. For example, one miscalled color in the middle of a read will corrupt every base after it.
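The error propagation mentioned in the PS can be seen in a small Python sketch. It assumes the standard two-base color code (color = XOR of the two base indices with A=0, C=1, G=2, T=3); `decode` is a hypothetical helper written for this illustration, not part of any script in this thread.

```python
BASES = "ACGT"


def decode(read):
    """Decode a color-space read, keeping the leading base."""
    index = BASES.index(read[0])
    out = [read[0]]
    for color in read[1:]:
        index ^= int(color)  # each color flips the running base index
        out.append(BASES[index])
    return "".join(out)


good = "G1131103113113"
# Simulate a single miscalled color: string position 7 changes from 3 to 0.
bad = good[:7] + "0" + good[8:]

good_bases = decode(good)
bad_bases = decode(bad)
mismatches = [i for i, (a, b) in enumerate(zip(good_bases, bad_bases)) if a != b]

print(good_bases)   # GTGCACCGTGCACG
print(bad_bases)    # GTGCACCCACGTGC
print(mismatches)   # [7, 8, 9, 10, 11, 12, 13]
```

One wrong color changes every base from that position to the end of the read, which is why mapping in color space is usually preferred over converting first.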

                  • yoyoq
                    Junior Member
                    • Jul 2009
                    • 9

                    #10
                    thank you for that tool,

                    what the hell is double encoded fasta?

                    • westerman
                      Rick Westerman
                      • Jun 2008
                      • 1104

                      #11
                      'Double-encoded' is where a color-space file is encoded as ACGT. Said ACGT is not base space but a way to encode the 0123 of color-space into something that non color-space aware programs can use.

                      As an example, given the base-space sequence:

                      GTGCACCGTGCACG

                      This encodes into color-space:

                      G1131103113113

                      And can be double-encoded into:

                      GCCTCCATCCTCCT

                      Double-encoding is simple: 0 goes to 'A', 1 to 'C', 2 to 'G' and 3 to 'T'. As I mentioned, it is simply a way to make color-space into ACGT. I call it an abomination since it means nothing biologically useful yet looks like a biological sequence. It can lead to all sorts of false results if one does not realize what one is dealing with.
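The example above can be reproduced with a short Python sketch. It assumes the standard two-base color code and the 0->A, 1->C, 2->G, 3->T mapping stated in this post; `encode_colorspace` and `double_encode` are hypothetical names, and the sketch handles only plain A/C/G/T input.

```python
BASES = "ACGT"


def encode_colorspace(seq):
    """Base space -> color space, keeping the first base as the anchor."""
    seq = seq.upper()
    # Each color is the XOR of the indices of two neighbouring bases.
    colors = [str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


# Double-encoding: relabel the colors 0123 as the letters ACGT.
DOUBLE = str.maketrans("0123", "ACGT")


def double_encode(cs_read):
    """Color space -> 'double-encoded' letters; the leading base is untouched."""
    return cs_read[0] + cs_read[1:].translate(DOUBLE)


cs = encode_colorspace("GTGCACCGTGCACG")
print(cs)                 # G1131103113113
print(double_encode(cs))  # GCCTCCATCCTCCT
```

Note that the last line looks like DNA but shares almost nothing with the original sequence, which is the source of the confusion described above.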

                      • yoyoq
                        Junior Member
                        • Jul 2009
                        • 9

                        #12
                        thanks

                        thanks,
                        yes i can confirm that it leads to biological confusion.

                        • yoyoq
                          Junior Member
                          • Jul 2009
                          • 9

                          #13
                          slight mod to conversion perl script

                          modified the conversion to avoid making that huge hash.
                          i was hitting memory limits the old way.
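The attached script is not reproduced here, so the following is only a guess at the general idea: a Python sketch that converts a csfasta stream one line at a time and computes each base arithmetically instead of looking it up in a large hash. `decode_read` and `stream_convert` are hypothetical names, and skipping `#` header lines is an assumption about the input file.

```python
import io

BASES = "ACGT"


def decode_read(read, shift=1):
    """Decode one read without any lookup table; unknown calls become 'N'."""
    index = BASES.find(read[0].upper())  # -1 if the first base is not A/C/G/T
    out = [BASES[index] if index >= 0 else "N"]
    for color in read[1:]:
        if index >= 0 and color in "0123":
            index ^= int(color)
            out.append(BASES[index])
        else:
            index = -1  # once a call is unknown, the rest cannot be decoded
            out.append("N")
    return "".join(out)[shift:]


def stream_convert(src, dst, shift=1):
    """Read csfasta lines from src and write fasta lines to dst, one at a time."""
    for line in src:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        if line.startswith(">"):
            dst.write(line + "\n")
        else:
            dst.write(decode_read(line, shift) + "\n")


# Small in-memory demonstration; real use would pass open file handles.
demo_in = io.StringIO("# comment\n>r1\nG1131103113113\n")
demo_out = io.StringIO()
stream_convert(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because only one line is held at a time, memory use stays flat regardless of how many reads the file contains.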
                          Attached Files

                          Comment

                          • mbreese
                            Junior Member
                            • Sep 2009
                            • 5

                            #14
                            The included colorspace -> basespace mapping is missing a few entries. Basically anything that includes a '4' or '.' is an N.

                            (Python format)
                            # Maps (previous base + color call) to the next base.
                            # Any pair containing '4', '.' or 'N' decodes to 'N'.
                            __colorspace = {
                                'A0': 'A',
                                'A1': 'C',
                                'A2': 'G',
                                'A3': 'T',
                                'A4': 'N',
                                'A.': 'N',
                                'C0': 'C',
                                'C1': 'A',
                                'C2': 'T',
                                'C3': 'G',
                                'C4': 'N',
                                'C.': 'N',
                                'G0': 'G',
                                'G1': 'T',
                                'G2': 'A',
                                'G3': 'C',
                                'G4': 'N',
                                'G.': 'N',
                                'T0': 'T',
                                'T1': 'G',
                                'T2': 'C',
                                'T3': 'A',
                                'T4': 'N',
                                'T.': 'N',
                                'N0': 'N',
                                'N1': 'N',
                                'N2': 'N',
                                'N3': 'N',
                                'N4': 'N',
                                'N.': 'N',
                            }

                            • westerman
                              Rick Westerman
                              • Jun 2008
                              • 1104

                              #15
                              Actually you are also missing '5' and '6'. Also, what about base-space codes that are neither A, C, G, T nor N (e.g., R, Y, etc.)? Using a table like the above, which is what the ABI-provided encodeFasta.py program uses, is a poor way of handling the conversion IMHO. Unless you want to force every non-0,1,2,3 color to a 4 and every non-A,C,G,T base to an N.
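The last sentence can be sketched in Python: normalize the read first, then decode arithmetically, so that no table has to enumerate every possible symbol. This is a sketch under the assumption that forcing unknowns to 4 and N is acceptable; `normalize_read` and `decode_normalized` are hypothetical names and not part of encodeFasta.py.

```python
BASES = "ACGT"


def normalize_read(read):
    """Force the leading base to A/C/G/T/N and every color call to 0-4."""
    first = read[0].upper()
    first = first if first in BASES else "N"  # R, Y, etc. become N
    colors = "".join(c if c in "0123" else "4" for c in read[1:])  # '.', 5, 6 become 4
    return first + colors


def decode_normalized(read):
    """Decode a read after normalization; the leading base is kept."""
    read = normalize_read(read)
    index = BASES.find(read[0])  # -1 when the leading base is N
    out = [read[0]]
    for color in read[1:]:
        if index < 0 or color == "4":
            index = -1  # everything after an unknown call is unknown
            out.append("N")
        else:
            index ^= int(color)
            out.append(BASES[index])
    return "".join(out)


print(normalize_read("R12.56"))             # N12444
print(decode_normalized("G1131103113113"))  # GTGCACCGTGCACG
print(decode_normalized("A015"))            # AACN
```

Normalizing up front keeps the decoder to a handful of cases, at the cost of discarding whatever information the ambiguity codes carried.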
