  • Corona Lite 4.0 Pairing Pipeline problem

    Hello to all from Russia! We have completed sequencing and are now assembling our data against a reference genome. We mapped our reads, but after that we ran into some trouble with the Pairing Pipeline. We use Corona Lite 4.0 r 2.0. We did not run the "sorted" and "split" steps, because we use the Corona Lite 4.0 version. We ran pairing_by_group.pl, and after that Corona created a folder with scripts. We tried to run these scripts with bash (for example, PAIR_0.sh), but the program said that it could not find the "scratch folder". Could you please tell me whether I can run these scripts with bash, without PBS?

  • #2
    You can run the corona-lite generated scripts without a queuing system. I have done it many times. If the program says it can't find your 'scratch folder' ... well, that means you do not have a scratch folder. You can assign one via the '--scratch=' command line parameter. Type in '--help' to see the possible parameters.
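As a rough illustration of running the generated scripts one after another without a queuing system, here is a minimal Python sketch. The function name, the directory arguments, and the `PAIR_*.sh` glob pattern are assumptions made for this example. The sketch also assumes that the scratch folder only needs to exist before the scripts start, and that the same path was given via '--scratch=' when the scripts were generated.

```python
import subprocess
from pathlib import Path


def run_scripts_sequentially(script_dir, scratch_dir, pattern="PAIR_*.sh", interpreter=("bash",)):
    """Run each generated script in turn and return (script name, exit code) pairs."""
    # Assumption: creating the scratch folder up front is enough to avoid
    # the "scratch folder not found" message.
    Path(scratch_dir).mkdir(parents=True, exist_ok=True)

    results = []
    # Scripts are taken in plain name order; whether order matters is not known here.
    for script in sorted(Path(script_dir).glob(pattern)):
        # Each script is started directly, with no PBS/qsub involved.
        completed = subprocess.run([*interpreter, str(script)])
        results.append((script.name, completed.returncode))
        if completed.returncode != 0:
            break  # stop at the first failure so the error is easy to find
    return results
```

A call such as `run_scripts_sequentially("pairing_scripts", "/tmp/corona_scratch")` would use placeholder paths; substitute the folder that Corona Lite created and your own scratch location.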


    • #3
      Thank you! I should add the '--scratch=' option to pairing_by_group.pl, shouldn't I? I think I have another problem: I started pairing_by_group.pl and Corona Lite asked for some variables such as f3_length and r3_length (see the attached file). I cannot find these variables for the pairing_by_group.pl script in the manual...
      Attached Files
      Last edited by nedoluzhko; 09-23-2009, 11:20 PM. Reason: miss


      • #4
        The f3_length, etc. parameters are optional. The message you see is merely informational. The manual does not explain them but if you do a 'pairing_by_group.pl --help' then all of the parameters are explained.

        One example:

        -f3l, --f3_length <arg> Match length of F3 tag. Use if match length is different than the tag length


        I do note that you are getting a lot of permission errors. You should correct these.


        • #5
          Thank you very much! Could you please tell me whether there is a more detailed manual for Corona Lite? I don't understand much about this program because I am a beginner in bioinformatics... This pdf (http://solidsoftwaretools.com/gf/dow...ion_v4.2.1.pdf) doesn't give me much information.


          • #6
            Corona Lite documentation is in a variety of places and it is often frustrating to find everything. The pdf that you quote is the manual for SNP discovery. There is also a 42-page guide titled "SOLiD Data Analysis Pipeline". I am not sure where I got my copy -- perhaps from ABI instead of solidsoftwaretools? -- but it is a useful resource. If I get the chance I will look for it.


            • #7
              Dear Westerman! Many thanks to you! Could you please answer several questions? I have the statistics from mapping and I don't understand some of the terms. Are beads the same as reads? Please look at the attached file: what is "number 1"? Is "number 2", the divergence from the reference sequence, SNPs or sequencing errors? Is 83% of the reference genome not covered (see number 5)? What are numbers 3 and 4?
              Attached Files


              • #8
                #1) refers to how the beads were mapped. 19.7% of them had zero colorspace mismatches that were next to any other colorspace mismatch. These beads may have other mismatches, but none of them are adjacent to each other. Or the beads may not have had any mismatches at all. #2 below covers this in more detail.

                0.51% of the beads had mismatches that were adjacent to other mismatches; however, these mismatches are considered to be 'valid' (and thus probably SNPs). As you may know, of the 16 possible dual mismatch combinations only 4 are considered to be valid transitions.
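To make the idea of a 'valid' adjacent mismatch more concrete, here is a small Python sketch. It assumes the usual SOLiD two-base encoding, in which the bases get two-bit codes (A=0, C=1, G=2, T=3) and each color is the XOR of two neighbouring bases. Under that assumption a single base substitution changes two adjacent colors but leaves their XOR unchanged, so 4 of the 16 possible color pairs (including the unchanged pair) are consistent with the reference. The function names are made up for this example, and Corona Lite's own validity rules may be more detailed than this.

```python
from itertools import product

# Assumed two-bit base codes; a color is the XOR of two neighbouring bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_colors(bases):
    """Translate a base sequence into its list of color calls (0-3)."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]


def is_consistent_with_snp(ref_pair, read_pair):
    """True if two adjacent colors keep the same XOR as the reference pair."""
    return (ref_pair[0] ^ ref_pair[1]) == (read_pair[0] ^ read_pair[1])


def consistent_pairs(ref_pair):
    """List the read color pairs (out of 16) that keep the reference pair's XOR."""
    return [p for p in product(range(4), repeat=2) if is_consistent_with_snp(ref_pair, p)]


ref = encode_colors("ACG")   # [1, 3]
snp = encode_colors("ATG")   # [3, 1], the middle base changed from C to T
print(is_consistent_with_snp(ref, snp))   # True
print(len(consistent_pairs(ref)))          # 4
```

Two adjacent color changes that break the XOR cannot come from one base substitution, which is why such beads are more likely to be sequencing errors.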


                #2) This sort of repeats #1 but in more detail. We can tell there were 16.7M beads with an 'error' (e.g., a mismatch) and 14.7M of these had a single mismatch. By inference we can then say of the 27.4M beads with zero adjacent mismatches (from #1) 10.7M of them had zero mismatches. (27.4M minus 14.7M).

                We then look at the beads with adjacent errors and see that 701K of them had valid adjacent errors (mismatches) -- these are likely to be SNPs -- while 286K of them had invalid errors; these latter beads are considered 'errors of sequencing' and are discarded and not used.

                ---> To actually see the SNPs you need to continue the pipeline and run the SNP calling portion. The above only gives you a quick indication of how the beads are mapping.

                #3 and #4) There were 26M points (bases) on your reference where the first 'base' of a bead was placed (or mapped). On average, 2.46 beads were placed (mapped) to each of these 26M points.

                --> It looks like you have a ~2 GBase reference sequence and 139M beads. Since you have a much larger reference than the number of beads, in an ideal situation each bead should be able to be placed down in its own unique point (base). Of course real life is never that ideal but, still, it seems like you have managed to amplify your starting material in such a way that too few points (bases) are covered and that those which are covered have too many beads covering them. The sequencing itself seems to be reasonable (46% of the beads matching is a bit low but I have seen even lower) and the number of sequencing errors is very low.
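The rounded figures above can be tied together with a little arithmetic. The Python sketch below uses only numbers quoted in this thread (26M start points, 2.46 beads per point, 139M beads, a reference of roughly 2 GBase). The function and key names are made up for the example, and the results are only as good as the rounding.

```python
def summarize_placement(start_points, beads_per_point, total_beads, reference_bases):
    """Derive rough coverage figures from the rounded numbers in the mapping report."""
    mapped_beads = start_points * beads_per_point
    return {
        "mapped_beads": mapped_beads,
        "fraction_of_beads_mapped": mapped_beads / total_beads,
        "fraction_of_reference_with_a_start": start_points / reference_bases,
    }


# Rounded figures quoted in the thread; the ~2 GBase reference size is itself an estimate.
stats = summarize_placement(
    start_points=26e6,
    beads_per_point=2.46,
    total_beads=139e6,
    reference_bases=2e9,
)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

This gives about 64M placed beads, which is roughly 46% of the 139M beads, with bead starts on only about 1.3% of the reference positions.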
