  • SOLiD SAGE v3 Analysis Software

    Hello !

    I would like to ask two questions about the SOLiD SAGE software version 1.06.

    1)
    I used the data analysis tool to map SOLiD SAGE sequences onto a reference sequence. The run generated six output files, whereas the software guide describes only two:

    solidXYZ_F3.csfasta.27.0.output.tab
    solidXYZ_F3.csfasta.27.0.results.tab
    solidXYZ_F3.csfasta.27.1.output.tab
    solidXYZ_F3.csfasta.27.1.results.tab
    solidXYZ_F3.csfasta.27.2.output.tab
    solidXYZ_F3.csfasta.27.2.results.tab

    I have tried to figure out how these files relate to each other, but have not found a reasonable explanation yet.
    Does anyone have any idea why there are six output files instead of two?

    2)
    In addition, I would like to know how the SAGE software converts csfasta reads (color-space reads of 35 bp) into 27 bp tags starting with the bases CATG.

    I am thankful for any suggestions.

    Best regards!

  • #2
    First of all, thank you very much for choosing SOLiD SAGE. I'm involved in the software development.

    The six files you got are actually three pairs: one pair for mapping with 0 color-space mismatches allowed, one for 1 mismatch allowed, and one for 2 mismatches allowed. In other words:

    Pair for 0 mismatch:
    solid0365_20090529_2_cynoM_F3.csfasta.27.0.output.tab
    solid0365_20090529_2_cynoM_F3.csfasta.27.0.results.tab

    Pair for 1 mismatch:
    solid0365_20090529_2_cynoM_F3.csfasta.27.1.output.tab
    solid0365_20090529_2_cynoM_F3.csfasta.27.1.results.tab

    Pair for 2 mismatches:
    solid0365_20090529_2_cynoM_F3.csfasta.27.2.output.tab
    solid0365_20090529_2_cynoM_F3.csfasta.27.2.results.tab

    The number 27 means that the tag length was set to 27 bp. The two files mentioned in the manual correspond to the following files:
    solid0365_20090529_2_cynoM_F3.csfasta.27.2.output.tab
    solid0365_20090529_2_cynoM_F3.csfasta.27.2.results.tab

    Hope this helps.
    Regards,

    -Tony
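
    The naming scheme described above can be sketched as a small parser that groups the files by tag length and mismatch count. The pattern is inferred only from the file names shown in this thread, and `group_by_mismatch` is a hypothetical helper, not part of the SAGE software.

```python
import re
from collections import defaultdict

# Assumed pattern: <run>.csfasta.<tag length>.<mismatches>.<output|results>.tab
PATTERN = re.compile(
    r"^(?P<run>.+\.csfasta)\.(?P<tag_len>\d+)\.(?P<mismatches>\d+)"
    r"\.(?P<kind>output|results)\.tab$"
)


def group_by_mismatch(filenames):
    """Return {(tag_length, mismatches): {"output": name, "results": name}}."""
    groups = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        key = (int(match["tag_len"]), int(match["mismatches"]))
        groups[key][match["kind"]] = name
    return dict(groups)


def final_pair(filenames):
    """Pick the pair with the highest mismatch allowance (the one the manual refers to)."""
    groups = group_by_mismatch(filenames)
    return groups[max(groups)] if groups else {}
```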


    • #3
      Tony, how do SAGE results compare to WT, i.e. when should one choose SAGE?


      • #4
        Originally posted by Chipper:
        "Tony, how do SAGE results compare to WT, i.e. when should one choose SAGE?"

        Whole-transcriptome sequencing (WT) is great when you need the identity and entire sequence of every transcript variant, mutation and polymorphism, whereas SAGE is most practical when you need deeper, comprehensive digital expression profiles of multiple samples for screening purposes, using only a small fraction (~1/50) of the cost and resources of a single WT run.

        If you need to quantify the general genome-wide expression differences of multiple samples - why do you need to sequence the entire length of GAPDH more than 10,000 times? Just use the 27bp SAGE tag for a faster, cheaper and deeper profile.

        For example: use SAGE if you are studying expression changes in dose response, time course or grouped populations where you need good deep digital expression profiles for many samples.

        Hope this helps.


        • #5
          Dear all,
          just a cautionary note: there is a considerable difference between the original SAGE protocol and the SOLiD SAGE protocol. In the original protocol, generating and sequencing ditags ensured that quantification was free of amplification bias: duplicate ditags made up of the same two tags most probably derive from amplification of a single ditag, so they could be eliminated. This safeguard is lost in the SOLiD protocol, because only a single tag is sequenced and you never know whether it derives from amplification or not. The only protocol I know of that prevents amplification bias is SuperTag Digital Gene Expression (STDGE) profiling.
          All the best
          peter


          • #6
            SOLiD SAGE Analysis software settings

            I'm confused about how to choose the tag length and the number of mismatches. I assume the appropriate tag length is the one that gives the largest number of matches between tags and reads. But to judge both parameters this way, I would have to analyze the results for every combination of their values, i.e. tag lengths of 26-28 bp and 0-3 mismatches. Is there a shorter way to decide on these values? What tag length and number of mismatches are generally chosen?


            • #7
              With SOLiD SAGE analysis tool version 1.09, we always chose a tag length of 27 bp (out of the three possibilities: 26, 27 and 28 bp) and allowed up to 2 mismatches. The number of mapped reads was very low. The latest version of the SOLiD SAGE analysis tool (version 1.10) lets you choose from a larger range of tag lengths. ABI recommends 21-23 bp; we selected 22 bp. Still allowing up to 2 mismatches, the number of mapped reads increased notably, as one would expect. It seems that 22 bp is sufficiently long to map reads uniquely onto the transcriptome and identify the genes.


              • #8
                Originally posted by DNAjunk:
                "With SOLiD SAGE analysis tool version 1.09, we always chose a tag length of 27 bp (out of the three possibilities: 26, 27 and 28 bp) and allowed up to 2 mismatches. The number of mapped reads was very low. The latest version of the SOLiD SAGE analysis tool (version 1.10) lets you choose from a larger range of tag lengths. ABI recommends 21-23 bp; we selected 22 bp. Still allowing up to 2 mismatches, the number of mapped reads increased notably, as one would expect. It seems that 22 bp is sufficiently long to map reads uniquely onto the transcriptome and identify the genes."

                Hi DNAjunk, where did you find "SOLiD SAGE analysis tool (version 1.10)"? The version I can find on the SOLiD website is solid.sage.v106. Regards


                • #9
                  Hi!
                  Thanks for your request.
                  I received version 1.10 some weeks ago from an ABI representative in Europe...
                  I do not know why the website is not up to date.


                  • #10
                    Thanks for the info!


                    • #11
                      Originally posted by DNAjunk:
                      "Hi! Thanks for your request. I received version 1.10 some weeks ago from an ABI representative in Europe... I do not know why the website is not up to date."

                      Hi DNAjunk,

                      Thanks for the info. I've got version 1.10 (corrected version) from ABI and have already run it. However, many tags with counts in the thousands that the SAGE software reports do not appear in the mapping output .ma file. Did you run it successfully?

                      Thanks!


                      • #12
                        Hi RockSolid

                        I am a bit confused now, because the output files I get with version 1.10 are results.tab, match.lines and output.tab; I do not get a *.ma file.

                        results.tab contains the tag sequence and the ID of the read mapped to it:
                        Tag_Seq                  GI_num   GI_Pos   Read_ID            Mismatch
                        TTATGTGGACCATTTTCTCAGA   57129    485      >559_18_1051_F3    1

                        output.tab contains the tags and gene as well as count
                        Tag Count GI Description
                        AAAAAAAAAAAAAAGACTACTA 8 GI93426 >gid|93426|ref|SYCE1_001143764 SYCE1_001143764.? Derived from: Homo sapiens synaptonemal complex central element protein 1 (SYCE1), transcript variant 4, mRNA.

                        match.lines contains the ids of the mapped reads:
                        >559_18_1051_F3,-4242439.1

                        Could you give me some more details about the output you get?


                        • #13
                          Thanks for the reply, DNAjunk.

                          I got "output.tab" and "results.tab" from the SAGE v110.

                          In the output.tab, there are lots of tags which can not be traced in the reference genome:

                          For example, the following is one of the records in the output.tab:

                          GGACGATGAGACCGACCTCGGA 3133 GIGI003281 >gi|GI003281|ref|13780108-13780851

                          I couldn't find the above tag "GGACGATGAGACCGACCTCGGA" in the reference genome (C. elegans).

                          Meanwhile, I found it difficult to get the NCBI RefSeq for C. elegans from the NCBI website. Could you please let me know where you found your NCBI RefSeq? Thanks very much for your time.


                          • #14
                            1) I forgot to mention that I have modified SAGE v110 so that the match.lines file is not removed when the mapping program completes. I need match.lines for further analysis.

                            2) I have checked my output.tab files and couldn't find any tags that were present in output.tab but not in the reference mRNA. Usually, I use SAGE v110 to map reads onto a reference mRNA. When I map reads (including SAGE reads) to a genome, I use the BioScope mapping pipeline.

                            3) Normally, we map to RefSeq human mRNA (NCBI, current rel. 41 of 9-May-2010) or to mRNA derived from an in-house assembly of a monkey genome. We have not analyzed C. elegans SAGE data yet. Maybe the problem with the wrong tags in output.tab is related to the C. elegans reference you were using. We have C. elegans references from the Sanger Institute (genome and CDS). I blasted the tag from above, GGACGATGAGACCGACCTCGGA, and it maps 100% to caeel x3j62 elongation factor 1. Maybe you could use the Sanger database: http://www.sanger.ac.uk/resources/databases/ ?
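
                            As a quick cross-check before running BLAST, a tag can be searched for on both strands of a reference FASTA. This is only a minimal sketch with hypothetical helper names; it finds exact matches only, so tags that were mapped with color-space mismatches may be missed.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_tag(tag, fasta_lines):
    """Return (record name, strand, 0-based position) for the first exact hit per strand."""
    tag = tag.upper()
    queries = (("+", tag), ("-", reverse_complement(tag)))
    hits = []
    for name, seq in read_fasta(fasta_lines):
        seq = seq.upper()
        for strand, query in queries:
            pos = seq.find(query)
            if pos != -1:
                hits.append((name, strand, pos))
    return hits
```

                            Typical use would be `find_tag("GGACGATGAGACCGACCTCGGA", open(path))`, where `path` points to whichever reference FASTA was given to the SAGE tool.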


                            • #15
                              How to find the percentage of reads mapped to the reference?

                              How do I find out the percentage of reads mapped to the reference with the SOLiD SAGE Analysis tool? The mapping result lists each distinct tag and its count in the reference. But a tag is only part of a read sequence, so the same tag can come from more than one read, and the percentage of mapped reads cannot be calculated from the list of tags alone. Where can I get the percentage of reads mapped to the reference?

                              I will be grateful if anyone can help me out on this.
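
                              One possible approach, sketched here under assumptions rather than taken from the tool's documentation: count the distinct read IDs in results.tab and divide by the number of reads in the input csfasta file. The sketch assumes results.tab is tab-separated with the header shown in post #12 (`Tag_Seq`, `GI_num`, `GI_Pos`, `Read_ID`, `Mismatch`) and that every read in the csfasta file has one header line starting with `>`. The function names are hypothetical.

```python
import csv


def mapped_read_ids(results_lines):
    """Collect distinct read IDs from results.tab lines (assumed tab-separated with header)."""
    reader = csv.DictReader(results_lines, delimiter="\t")
    return {row["Read_ID"] for row in reader}


def count_reads(csfasta_lines):
    """Count reads in a csfasta file by counting header lines; '#' comment lines are ignored."""
    return sum(1 for line in csfasta_lines if line.startswith(">"))


def percent_mapped(results_lines, csfasta_lines):
    """Percentage of input reads that appear at least once in results.tab."""
    total = count_reads(csfasta_lines)
    if total == 0:
        return 0.0
    return 100.0 * len(mapped_read_ids(results_lines)) / total
```

                              Using a set of read IDs means a read that maps to several reference entries is counted only once.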
