
  • How to calculate RAD-Seq digestion sites?

    Hi, everyone!
    I'm new here and need your help.
    I have paired-end RAD-Seq data, and I want to calculate how many digestion sites have been covered. How can I do that?
    Thanks for your help!
    Last edited by fanwei; 08-19-2013, 06:34 PM.

  • #2
    Hi fanwei,

    You could run Stacks (http://creskolab.uoregon.edu/stacks/) for general RAD-Seq analysis, including how many RAD loci you have sequenced. If you have a reference genome, and are wondering how many of the in silico cut sites are present in your data, you could create a "RAD reference" of the cut sites + 100 bp adjacent DNA and align your reads against that.
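A minimal sketch of the "RAD reference" idea above, in plain Python. It assumes the TCGA recognition site mentioned later in the thread and takes the 100 bp of adjacent DNA on both sides of each site. The function names `find_cut_sites` and `rad_reference` are made up for illustration and are not part of Stacks or any other tool.

```python
import re


def find_cut_sites(seq, motif="TCGA"):
    """Return 0-based start positions of every occurrence of the motif."""
    # A lookahead keeps overlapping matches, which matters for some motifs.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def rad_reference(seq, motif="TCGA", flank=100):
    """Yield (name, fragment) pairs: each cut site plus `flank` bp per side.

    Fragments are clipped at the ends of the sequence.
    """
    for pos in find_cut_sites(seq, motif):
        start = max(0, pos - flank)
        end = min(len(seq), pos + len(motif) + flank)
        yield f"site_{pos}", seq[start:end]
```

Each pair can be written out as a FASTA record and the reads aligned against that file. The number of records that receive aligned reads is then the number of in silico cut sites covered by the data.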
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



    • #3
      Originally posted by SNPsaurus View Post
      Hi fanwei,

      You could run Stacks (http://creskolab.uoregon.edu/stacks/) for general RAD-Seq analysis, including how many RAD loci you have sequenced. If you have a reference genome, and are wondering how many of the in silico cut sites are present in your data, you could create a "RAD reference" of the cut sites + 100 bp adjacent DNA and align your reads against that.
      Thank you!
      Yes, I have a reference genome. I have finished mapping with bwa and used GATK for SNP calling. Approximately 6300 SNPs per sample were found. But when I try to find SNPs that are specific to one of two samples, very few turn up (fewer than 10). It seems there is little overlap between the samples. My sequencing depth is 3~4X.
      Can Stacks deal with this situation?
      Thanks for your help!



      • #4
        If the coverage is low, you probably aren't getting enough depth to call SNPs at most loci. At 3-4X, you won't even pick up most heterozygous SNPs. If the goal is to find SNPs specific to a particular sample, you need to sequence to a high depth, feel confident that you don't have missing data, and then compare.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



        • #5
          Originally posted by SNPsaurus View Post
          If the coverage is low, you probably aren't getting enough depth to call SNPs at most loci. At 3-4X, you won't even pick up most heterozygous SNPs. If the goal is to find SNPs specific to a particular sample, you need to sequence to a high depth, feel confident that you don't have missing data, and then compare.
          Yes. Because heterozygous SNPs are genetically unstable, my goal is to find homozygous SNPs. Do you think the coverage is too low?
          The sequencing was done by a company, and they chose TaqαI (TCGA) to digest the genomic DNA. Now I'm wondering whether that is reasonable, because there are so many digestion sites in the genome.



          • #6
            Was that for RAD or ddRAD or GBS, do you know? If it is RAD-Seq, then digesting with a 4-cutter enzyme will produce short fragments resistant to shearing, making library creation very inefficient. For any of the methods, a frequent cutter like that will produce 3-5 million tags for a moderate-sized genome of 500 Mb. So it is not surprising you have low coverage, unless they sequenced just 2 samples per HiSeq lane.
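The tag estimate above can be roughly reproduced with a back-of-the-envelope calculation. This sketch assumes a random genome with equal base frequencies (so a 4 bp site occurs about once every 4^4 = 256 bp) and two tags per cut site, one on each side. Both are simplifying assumptions, and `expected_tags` is a hypothetical helper name.

```python
def expected_tags(genome_bp, site_len=4, tags_per_site=2):
    """Rough expected number of RAD tags for a restriction site of given length.

    Assumes equal base frequencies and a random sequence, so the site
    occurs on average once every 4 ** site_len bases.
    """
    expected_sites = genome_bp / 4 ** site_len
    return expected_sites * tags_per_site


print(expected_tags(500_000_000))  # about 3.9 million tags for 500 Mb
```

Real genomes deviate from this because of base composition and repeats, so it should be read only as an order-of-magnitude check.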

            I'm guessing they only sequenced a portion of the possible cut sites, and so you ended up with a semi-random set of tags in one sample versus the other, with little overlap between them. If it was ddRAD or GBS, you also have to worry if they were not careful in the size distribution selection, since then one sample may end up with a bigger size range of fragments and a different set of loci selected.

            Why was it paired-end sequenced? Tell me a little about the species, etc.

            If a locus is sequenced at 3X, and it is diploid, then 25% of the time you'll only sequence one chromosome or the other, missing the heterozygosity. So you will often think it is homozygous for one allele in one sample and homozygous for the other allele in the other sample, when it is really heterozygous in both.
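The 25% figure follows from a simple calculation: with n reads at a heterozygous locus, all n come from the same chromosome with probability 2 × (1/2)^n. A small sketch, assuming each read is drawn independently and with equal chance from either allele and ignoring sequencing error (`p_miss_het` is a name chosen here for illustration):

```python
def p_miss_het(depth):
    """Probability that every read at a heterozygous diploid locus
    comes from the same allele, so the site looks homozygous."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Two ways to see only one allele: all reads from A, or all from B.
    return 2 * 0.5 ** depth


for depth in (3, 4, 10):
    print(depth, p_miss_het(depth))
```

At 3X this gives 0.25 and at 4X 0.125, while at 10X it drops to about 0.002.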
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



            • #7
              Originally posted by SNPsaurus View Post
              Was that for RAD or ddRAD or GBS, do you know? If it is RAD-Seq, then digesting with a 4-cutter enzyme will produce short fragments resistant to shearing, making library creation very inefficient. For any of the methods, a frequent cutter like that will produce 3-5 million tags for a moderate-sized genome of 500 Mb. So it is not surprising you have low coverage, unless they sequenced just 2 samples per HiSeq lane.

              I'm guessing they only sequenced a portion of the possible cut sites, and so you ended up with a semi-random set of tags in one sample versus the other, with little overlap between them. If it was ddRAD or GBS, you also have to worry if they were not careful in the size distribution selection, since then one sample may end up with a bigger size range of fragments and a different set of loci selected.

              Why was it paired-end sequenced? Tell me a little about the species, etc.

              If a locus is sequenced at 3X, and it is diploid, then 25% of the time you'll only sequence one chromosome or the other, missing the heterozygosity. So you will often think it is homozygous for one allele in one sample and homozygous for the other allele in the other sample, when it is really heterozygous in both.
              Thank you very much! Sorry for the incomplete information, and I quite agree with you!
              The species is rice, which is diploid, with a genome of about 400 Mb. We chose the paired-end RAD-sequencing method. As previously mentioned, the sequencing depth is 3~4X and the coverage is 8%. My goal is to find SNPs specific to each sample.
              Can you give me some suggestions?



              • #8
                If you got the amount of sequencing expected, then the experiment was designed poorly, since that amount of sequencing is guaranteed to give a bad outcome. If I understand you correctly, only 8% of the sites are sequenced in a sample. The chance of a site having reads in both samples is then 0.08 × 0.08 = 0.0064, so less than 1% of the sites will be sequenced in both samples. The low sequencing coverage of 3X at those sites also guarantees that many SNPs will be miscalled.

                So, I don't see any way to rescue this experiment other than lots more sequencing. But it would probably be better to start over with a good design, unfortunately.
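The overlap estimate in this post can be written as a one-line calculation. The sketch assumes each sample covers a random, independent 8% subset of the sites, which is the same assumption behind multiplying the two fractions. `p_shared` is a hypothetical helper name.

```python
def p_shared(fraction_covered, n_samples=2):
    """Probability that a given site is covered in all samples,
    assuming each sample covers sites independently at random."""
    return fraction_covered ** n_samples


print(round(p_shared(0.08), 4))  # chance a site has reads in both samples
```

With more samples the shared fraction shrinks further, which is why low per-sample coverage makes between-sample comparisons so sparse.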
                Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



                • #9
                  Originally posted by SNPsaurus View Post
                  If you got the amount of sequencing expected, then the experiment was designed poorly, since that amount of sequencing is guaranteed to give a bad outcome. If I understand you correctly, only 8% of the sites are sequenced in a sample. The chance of a site having reads in both samples is then 0.08 × 0.08 = 0.0064, so less than 1% of the sites will be sequenced in both samples. The low sequencing coverage of 3X at those sites also guarantees that many SNPs will be miscalled.

                  So, I don't see any way to rescue this experiment other than lots more sequencing. But it would probably be better to start over with a good design, unfortunately.
                  Thank you very much! I'll redesign my work.



                  • #10
                    Originally posted by fanwei View Post
                    Thank you very much! Sorry for the incomplete information, and I quite agree with you!
                    The species is rice, which is diploid, with a genome of about 400 Mb. We chose the paired-end RAD-sequencing method. As previously mentioned, the sequencing depth is 3~4X and the coverage is 8%. My goal is to find SNPs specific to each sample.
                    Can you give me some suggestions?
                    You should probably sequence a number of samples in each variety to assay the full genetic diversity of each. If you are looking for SNPs specific to a sample, it is easy to be misled when looking at a small number of individuals.

                    Not knowing enough about your system, a typical approach would be to sequence around 100,000 loci at moderate depth (5X) for a large number of individuals (here at SNPsaurus we work in 96-well plate units). You'll get high-quality genotype calls for homozygous alleles, and can multiplex 190 individuals in a lane.
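As a rough check on the design numbers above, the reads needed per lane are loci × depth × individuals. This sketch assumes perfectly even coverage across loci and individuals, which real libraries do not achieve, so it is a lower bound rather than a plan. `reads_needed` is a name invented for this example.

```python
def reads_needed(n_loci, depth, n_individuals):
    """Minimum reads required if coverage were perfectly uniform."""
    return n_loci * depth * n_individuals


# 100,000 loci at 5X for 190 individuals, as in the post above.
print(reads_needed(100_000, 5, 190))
```

That works out to 95 million reads for one lane of 190 individuals under the even-coverage assumption.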
                    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



                    • #11
                      Originally posted by SNPsaurus View Post
                      Hi fanwei,

                      You could run Stacks (http://creskolab.uoregon.edu/stacks/) for general RAD-Seq analysis, including how many RAD loci you have sequenced. If you have a reference genome, and are wondering how many of the in silico cut sites are present in your data, you could create a "RAD reference" of the cut sites + 100 bp adjacent DNA and align your reads against that.
                      Hi, I'm trying to run Stacks. I have read the manual downloaded from the website, but I still run into problems; it seems complex. Are you familiar with it? Could you kindly help me run Stacks?



                      • #12
                        Sorry, we use our own analysis software for nextRAD. There is a user community at https://groups.google.com/forum/#!forum/stacks-users that might be able to help.
                        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



                        • #13
                          Originally posted by SNPsaurus View Post
                          Sorry, we use our own analysis software for nextRAD. There is a user community at https://groups.google.com/forum/#!forum/stacks-users that might be able to help.
                          You are very nice! Thank you!
