
  • #16
    I actually just pulled the table from ABI. The attached perl scripts didn't handle N's at all.

    I'm very new to this, and haven't run across 5 or 6 in our data. What do they stand for? (Ambiguity codes?)



    • #17
      There are 3 transitions: N to N, N to known (ACGT), and known to N. These transitions can be represented by 3 different color-space numbers, in this case '4', '5', and '6'. Off the top of my head I do not remember which is which. Also, only some of the ABI programs actually work with this concept. encodeFasta.py, which the ABI SNP-calling manual says to use, does not handle any of the cases. It makes me wonder at times if ABI even uses their own programs on any real-life data. :-(



      • #18
        I was sitting here avoiding work -- I have an intractable problem, ugh! -- wondering where I had seen that 4,5,6 color-space encoding. So I looked it up. The 'dna_subroutines.pm' (Perl library, obviously) has the following, which is used by the 'convert_to_dibase' subroutine. At least some of the 26 programs in the 'bin' directory use the 'dna_subroutines.pm' module, although I am not certain if any use the convert_to_dibase routine. None seem to do so directly. None of the Python routines use the 4,5,6 color-space encoding.

        So ... using a '4' is probably good enough.

        $color{AN} = 4;
        $color{CN} = 4;
        $color{GN} = 4;
        $color{TN} = 4;
        $color{NA} = 5;
        $color{NC} = 5;
        $color{NG} = 5;
        $color{NT} = 5;
        $color{NN} = 6;
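
For anyone who wants to try the table above outside of Perl, here is a minimal Python sketch of the same idea. The names `KNOWN`, `dibase_color` and `encode` are made up for illustration and are not part of any ABI package; the 0-3 colors are the standard di-base codes and the 4/5/6 values follow the hash entries quoted above.

```python
# Standard di-base colors for pairs of known bases.
KNOWN = {
    "AA": 0, "CC": 0, "GG": 0, "TT": 0,
    "AC": 1, "CA": 1, "GT": 1, "TG": 1,
    "AG": 2, "GA": 2, "CT": 2, "TC": 2,
    "AT": 3, "TA": 3, "CG": 3, "GC": 3,
}

def dibase_color(first, second):
    """Return the color for one pair of adjacent bases (hypothetical helper)."""
    if first == "N" and second == "N":
        return 6          # N to N
    if second == "N":
        return 4          # known to N
    if first == "N":
        return 5          # N to known
    return KNOWN[first + second]

def encode(bases):
    """Encode base space to color space, keeping the first base as the leading letter."""
    bases = bases.upper()
    colors = [str(dibase_color(a, b)) for a, b in zip(bases, bases[1:])]
    return bases[0] + "".join(colors)

print(encode("ACGNNT"))  # -> A13465
```

If you follow the "a '4' is probably good enough" shortcut instead, the three special branches collapse into a single one that returns 4.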



        • #19
          "N" in basespace

          How come NA, NC, NT, NG all have the same colorspace code '5'?
          Does this mean that once you have an N for a given base you never know what the next base is? You don't know if it is A, G, C, or T ...
          Right?
          Ines
          Last edited by inesdesantiago; 10-05-2009, 06:58 AM. Reason: typo



          • #20
            Originally posted by inesdesantiago
            How come NA, NC, NT, NG all have the same code '5'?
            Does this mean that once you have an N for a given base you never know what the next base is? You don't know if it is A, G, C, or T ...
            Right?
            Ines
            That is only partially correct, but to a first approximation it is correct. You certainly cannot properly decode from colorspace (CS) into basespace (BS) if there are 4s, 5s, or 6s in the CS. However, this does not keep you from using the information in matching.

            [Note: CS reads off of the sequencer will have a simple period (.) when there is an unknown and 0 through 3 for known ... 4,5,6s are only used when computationally processing BS->CS->BS translations]

            Let's go for an example.

            Say we have a (poor) reference sequence that in BS is:

            TCACGNGTCAAC

            Translating this into CS so that it can be mapped:

            T21134412101

            Computationally if we tried to convert this CS back to BS we would get:

            TCACGNNNNNNN
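
The reason everything after the first unknown turns into N can be seen in a small Python sketch of the decoding step. This is only an illustration, not the ABI code; `decode` is a made-up name, and the trick of XOR-ing base indices (A=0, C=1, G=2, T=3) is just a compact way of writing the standard di-base table.

```python
BASES = "ACGT"

def decode(cs):
    """Decode a color-space string (leading base + colors) into base space.

    Each base is derived from the previous base and the current color, so
    once one base is unknown every later base comes out as N.
    """
    current = cs[0]
    out = [current]
    for color in cs[1:]:
        if current in BASES and color in "0123":
            # With A=0, C=1, G=2, T=3 the next base is previous XOR color.
            current = BASES[BASES.index(current) ^ int(color)]
        else:
            current = "N"
        out.append(current)
    return "".join(out)

print(decode("T21134412101"))  # -> TCACGNNNNNNN
print(decode("T21130012101"))  # -> TCACGGGTCAAC (no unknown colors, fully decodable)
```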

            On the other hand, if we had an actual CS read from the sequencer such as:

            T21130012101

            We can certainly map, allowing for mismatches, that actual read to our reference. If we had enough reads coming off the sequencer that were all the same as the above (or, better, had slightly different start points and also overlapped the region in question), then we could say with confidence that while our reference sequence indicated an 'N' in the position, our actual sequenced organism has a 'G'.

            Note that you can get into trouble with the above if your reads could potentially map to other parts of your organism and those parts are not part of your reference. This is a major reason for wanting different start sites and long reads. So tread with care.
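
To make the matching idea concrete, here is a rough Python sketch that compares the colors of the reference and the read from the example above (leading base stripped). It is not how any real mapper scores alignments; `compare_colors` is a hypothetical helper that simply separates true color mismatches from positions where one side is unknown.

```python
def compare_colors(reference, read):
    """Compare two equal-length color strings position by position.

    Returns (mismatches, unknown): 0-based positions where both colors are
    0-3 but differ, and positions where either side is not a 0-3 color.
    """
    mismatches, unknown = [], []
    for i, (ref_color, read_color) in enumerate(zip(reference, read)):
        if ref_color not in "0123" or read_color not in "0123":
            unknown.append(i)
        elif ref_color != read_color:
            mismatches.append(i)
    return mismatches, unknown

reference = "21134412101"  # colors of the reference, 4 marks the unknown transitions
read = "21130012101"       # colors of the read off the sequencer

print(compare_colors(reference, read))  # -> ([], [4, 5])
```

Every known color agrees, and the only differing positions are the two transitions around the reference 'N', which is what lets the read fill in that base.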



            • #21
              Thanks for the reply

              I have a collection of reads that are 35 nuc long.
              In all of them there is a '.' in the same position, so when I translate from
              colorspace to basespace all of my reads become only 23 nucleotides long plus a tail of 12 N's:

              TCGAATGACTGTGACGTGCAGTCNNNNNNNNNNNN

              This is happening to all reads in the file. Maybe something went wrong with the sequencing?

              For mapping purposes, do you think that it's better to use the 23nuc reads than the ones with the N's? I guess if I use the reads with so many N's they can actually map to wrong positions.
              Is this right?

              Thank you
              Ines



              • #22
                Dear westerman,
                Thanks for the reply

                I have a collection of reads that are 35 nuc long.
                In all of them there is a '.' in the same position, so when I translate from
                colorspace to basespace all of my reads become only 23 nucleotides long plus a tail of 12 N's:

                TCGAATGACTGTGACGTGCAGTCNNNNNNNNNNNN

                This is happening to all reads in the file. Maybe something went wrong with the sequencing?

                For mapping purposes, do you think that it's better to use the 23nuc reads than the ones with the N's? I guess if I use the reads with so many N's they can actually map to wrong positions.
                Is this right?

                Thank you
                Ines



                • #23
                  Using the 23nuc reads would be good. Even better is to do your mapping in colorspace without doing translation first. That way sequencer errors should be taken care of.
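
If you do go the shorter-read route, one simple way to get there while staying in colorspace is to cut each read at its first '.'. A minimal Python sketch, where `trim_at_first_dot` is a made-up name and the read is assumed to be a leading base followed by colors:

```python
def trim_at_first_dot(cs_read):
    """Keep the leading base and the colors that come before the first '.'."""
    dot = cs_read.find(".")
    return cs_read if dot == -1 else cs_read[:dot]

print(trim_at_first_dot("T0123.0123"))  # -> T0123
print(trim_at_first_dot("T0123"))       # -> T0123 (no dot, read is unchanged)
```

The result is still a colorspace read, so it can go straight into a colorspace mapper. If you keep a matching quality file, it would need to be shortened the same way.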



                  • #24
                    That's a good idea, I hadn't thought about mapping using colorspace...
                    Too bad bowtie doesn't map with colorspace yet...
                    Regards,
                    Ines



                    • #25
                      Originally posted by inesdesantiago
                      That's a good idea, I hadn't thought about mapping using colorspace...
                      Too bad bowtie doesn't map with colorspace yet...
                      Regards,
                      Ines
                      Try bwa (in colorspace) with a seed length of <=22, or better yet a program that allows masking of the position with dots (I think mapreads can do it, maybe others can as well).



                      • #26
                        You can mask using mapreads via the '-p' parameter. Usually this is done via the matching_large_genomes_cmap_save_script.pl program, although other SOLiD routines also call mapreads.

                        E.g., try '-p 1111111111111111111100000000000000000000' or whatever fits your tag length and desired pattern.

                        mapreads will still try to map the full-length tag and thus will have problems when the masked part seemingly overhangs the ends. That is, mapreads does not chop off the masked part to make a shorter read but rather keeps the read full length.
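
Typing long runs of 1s and 0s by hand is error-prone, so here is a small Python sketch for building such a pattern string. `mask_pattern` is a hypothetical helper, not part of mapreads, and it assumes (as the example pattern above suggests, though I have not confirmed it in the documentation) that '1' marks a position to use and '0' a masked position.

```python
def mask_pattern(tag_length, keep):
    """Return `keep` leading 1s padded with 0s up to `tag_length` characters.

    Assumption: '1' = position is used, '0' = position is masked.
    """
    if not 0 <= keep <= tag_length:
        raise ValueError("keep must be between 0 and tag_length")
    return "1" * keep + "0" * (tag_length - keep)

# A 35-long tag where only the first 23 positions should count.
print(mask_pattern(35, 23))  # -> 11111111111111111111111000000000000
```

The output is what you would pass after '-p'; adjust both numbers to your own tag length and desired pattern.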



                        • #27
                          Originally posted by westerman
                          The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.
                          Hi all,

                          Does anyone have the link or zipped file for the ABI 'corona lite'?

                          Many thanks.



                          • #28
                            Try this for Corona-Lite; I couldn't seem to find it on Life Tech's site:
                            http://skip.ucsc.edu/phage_contigs/hartzog_phage/tools/



                            • #29
                              Originally posted by idonaldson
                              Try this for Corona-Lite; I couldn't seem to find it on Life Tech's site:
                              http://skip.ucsc.edu/phage_contigs/hartzog_phage/tools/
                              Thanks. I just checked "corona_lite_v4.0r2.0.tgz", is it the latest version?



                              • #30
                                Originally posted by gladexp
                                Thanks. I just checked "corona_lite_v4.0r2.0.tgz", is it the latest version?
                                Probably. Corona lite is rather old. We've gone through Bioscope and are now using LifeScope since CL was released.
