  • DpnII, NlaIII and Solexa tag library

    Hello,

    A Solexa tag library can be made with DpnII or NlaIII. Would DpnII be better than NlaIII? It can be stored at -20C and seems much more stable than NlaIII. Can a one-bp difference in the tag make a big difference?

    I am stressed about choosing a kit to make a Solexa expression tag library. Can I ask your opinion? Thanks!

    Pingh

  • #2
    Hi Pingh,

    It's funny, I just posted this somewhere else right before I found this post. I have only worked with NlaIII, but I've done a small project for someone who used DpnII and noticed that this enzyme seems to cut much less frequently than NlaIII, at least in some genomes. The organism in question is the poplar tree Populus trichocarpa. While generating a pre-annotated tag set for the genome, I found that NlaIII would have made a far larger tag set than DpnII. This difference is also well beyond a question of tag length alone, as you can see:

    DpnII @ 16bp: 706,506 (unique tag sequences)
    DpnII @ 17bp: 1,390,024
    NlaIII @ 16bp: 1,214,509
    NlaIII @ 17bp: 2,392,656

    These are unique sequences, not actual genomic coverage. But repeat tags only vary between 5.5-6.5%. I don't know if you can get 17bp reads out of DpnII, but I'd still go with NlaIII here. I am not a molecular biologist though, and cannot tell you about kinetics or stability or reliability of one enzyme over another.
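
A count like the one above can be sketched in a few lines of Python. This is only an illustration, not the script that produced the numbers in this post. The function names are hypothetical, and it assumes that a tag is the `tag_len` bases immediately downstream of each recognition site, taken on both strands, with no annotation step.

```python
from typing import Set

# Complement table for upper-case DNA; N is kept as N.
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def tags_on_strand(seq: str, site: str, tag_len: int) -> Set[str]:
    """Collect the tag_len bases that follow each occurrence of site.

    Assumption: the tag excludes the recognition site itself.
    """
    tags = set()
    start = seq.find(site)
    while start != -1:
        tag_start = start + len(site)
        tag = seq[tag_start:tag_start + tag_len]
        # Skip tags truncated by the sequence end and tags containing N.
        if len(tag) == tag_len and "N" not in tag:
            tags.add(tag)
        start = seq.find(site, start + 1)
    return tags


def unique_tags(seq: str, site: str, tag_len: int) -> Set[str]:
    """Unique tag sequences found on either strand of seq."""
    seq = seq.upper()
    return tags_on_strand(seq, site, tag_len) | tags_on_strand(revcomp(seq), site, tag_len)


# Toy sequence with one GATC (DpnII) site and one CATG (NlaIII) site.
toy = "TTGATCAAAACCCATGGGTT"
dpnii_tags = unique_tags(toy, "GATC", 4)
nlaiii_tags = unique_tags(toy, "CATG", 4)
```

On a real genome one would call `unique_tags` per chromosome with `tag_len` set to 16 or 17 and merge the resulting sets before counting.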

    Also, this is only one organism, but next week I'll have another seven organisms compared in this way. I've posted this at the solexa group at Google, but I'll post what I find here next week, as well.

    Ariel Paulson
    Programmer Analyst
    Microarray Group
    Stowers Institute for Medical Research



    • #3
      Thanks Ariel. I will order the NlaIII and try. Let's see how many tags I can get. It is maize.



      • #4
        Maize NlaIII tags

        Pingh,

        If you haven't ordered already, let me bring in the maize genome and test it. I've surveyed 10 organisms so far and did find a couple with slight biases towards DpnII. But most had far more NlaIII sites, sometimes double.

        Ariel



        • #5
          Ariel do you have a script that can quickly generate a tag library?



          • #6
            I do. It comes in two components: one to generate a tag library + annotation set, the other to automate annotation of Eland output. The only hitch is, the genome must be available as an Ensembl-style MySQL DB. And you need a *lot* of RAM (well, that depends on the organism). I'm developing a GFF-based version, but that will be another month at least.

            I've just finished rebuilding the first half and am waiting on our pipeline man to re-run Eland using the new tag tables. I have to make sure the second half still works properly, before I can really say I have scripts that work!

            I've tested the tag-table generator on 11 organisms so far (2 more are being tested as I write this). Things look good, but it's way too much data to be totally certain of. The earlier version has been tested in depth on chicken and mouse, and I couldn't find any errors. Are you interested in alpha-testing?



            • #7
              Maize NlaIII tags

              Pingh,

              I am finding about a 3:2 bias in favor of NlaIII sites, using maizesequence.org's zea_mays_core_48, BAC database. There are some oddities with the build, though. For one, genes are annotated to contigs, not chromosomes, so I can't tell you which enzyme produces more gene-associated tags yet (though it's probably NlaIII). Also, the tag/site ratio is extremely low. Since GATC and CATG are both palindromes, you could get (in theory) 2 tags for every 1 restriction site, so your maximum tag/site ratio would be 200%. Real ratios that I've seen vary between 130%-190%, because some tags are repeats. But Zea mays is down around 50%?? This means lots and lots of repeats, and I'm finding that over 1/3 of unique tag sequences are indeed repetitive (regardless of enzyme), which is very high.
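
The tag/site ratio described above can be written down as a small helper. This is a minimal sketch with hypothetical function names. It assumes sites are counted once on the forward strand, which is enough for palindromic sites such as GATC and CATG, so that two tags per site gives the 200% ceiling.

```python
def count_sites(seq: str, site: str) -> int:
    """Count occurrences of a recognition site on the forward strand."""
    seq = seq.upper()
    n = 0
    start = seq.find(site)
    while start != -1:
        n += 1
        start = seq.find(site, start + 1)
    return n


def tag_site_ratio(n_unique_tags: int, n_sites: int) -> float:
    """Unique tags per restriction site, expressed as a percentage.

    A palindromic site can yield one tag per strand, so the
    theoretical maximum is 200%.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return 100.0 * n_unique_tags / n_sites
```

With made-up counts, `tag_site_ratio(200, 100)` gives the 200% ceiling, while `tag_site_ratio(50, 100)` gives the 50% level mentioned for Zea mays.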

              I'm not totally sure yet if this is an informatics artifact, but I've never found this problem in other genomes. I'm curious, so I'll re-analyze the genome at the contig level instead of the chromosomal level, and see what happens. This could take a few days, since there are over 300k contigs. But either way, NlaIII has better coverage than DpnII; it just means that mapping reads back to the genome will be an interesting task.

              Ariel



              • #8
                NlaIII and DpnII

                I analyzed the "tag-omes" of thirteen model organisms to see how DpnII would fare versus NlaIII, in terms of the number of unique tags it could generate and how many sites could be found. The results were rather surprising to me, but maybe they aren't surprising to someone else (and I want to hear from them!).

                From this small survey, it looks like higher eukaryotes have a strong bias against DpnII sites, while lower eukaryotes do not. Plants vary. In only two cases did DpnII come out ahead in any way: Anopheles, where DpnII generates 0.6% more unique tags than NlaIII, and C. elegans, which has 1.3% more DpnII sites, but still loses to NlaIII because NlaIII generates slightly more unique tags.

                File can be found at: http://research.stowers-institute.or..._release_1.xls

                Any observations?



                • #9
                  Thanks Ariel, I ordered NlaIII. Let's see how many tags I can get.
                  Can I ask, have you tried a 5% TBE gel to purify PCR product? The protocol suggests a 6% Novex TBE PAGE gel, 1.0 mm, 10 well, but we have only a Bio-rad Criterion Cell and 5% TBE precast gels. Do you think that will be OK? How much ethidium bromide do you think is good for a Dark Reader transilluminator? Many thanks!



                  • #10
                    I think the higher number of NlaIII sites is due to the need to avoid CpG dinucleotides in organisms where methylation of CpGs occurs.
                    NlaIII uses the CATG sequence, so CpGs are not a problem for these sites regardless of the nucleotides that precede or follow the sequence.
                    However, DpnII has GATC, so if it is preceded by a C or followed by a G you will have a CpG site. I did a quick check in my organism of interest (bovine) and found a dramatic under-representation of C nucleotides in the preceding position and a similar under-representation of G nucleotides in the succeeding position.
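
A quick check of this kind can be sketched as follows. It is not the code used for the bovine check. The function name is hypothetical, and it only tallies the base immediately before and after each GATC on the forward strand.

```python
from collections import Counter
from typing import Tuple


def flank_counts(seq: str, site: str = "GATC") -> Tuple[Counter, Counter]:
    """Tally the base just before and just after each occurrence of site.

    A C before GATC or a G after it creates a CpG dinucleotide, so
    CpG avoidance should show up as a deficit of C in the first
    counter and of G in the second.
    """
    seq = seq.upper()
    before: Counter = Counter()
    after: Counter = Counter()
    start = seq.find(site)
    while start != -1:
        end = start + len(site)
        if start > 0:
            before[seq[start - 1]] += 1
        if end < len(seq):
            after[seq[end]] += 1
        start = seq.find(site, start + 1)
    return before, after


# Toy sequence with three GATC sites; only the middle one is flanked by CpGs.
before, after = flank_counts("AGATCTCGATCGTGATCA")
```

Comparing the two counters against the genome-wide base frequencies would show whether C and G are depleted at the flanking positions.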



                    • #11
                      Pmcget -- That is a very interesting observation! Makes sense to me. Actually I was going to test Bos taurus in the next set of genomes, too.


                      Pinghli -- let me forward that question to my molecular bio people!



                      • #12
                        pmcget, that's a nice insight and I can buy it as a major factor, especially since the effect was most strongly observed in vertebrates. But it seems to me that CG suppression would result in DpnII sites being 75% of NlaIII, assuming no significant GC bias in the genome. Since the Homo sapiens genome is only 42% GC, the difference due to CG suppression would be even less. Ariel's numbers show a 50% reduction for vertebrates. I'll admit that I have no clue what other factors may be at play, and that my preceding analysis may be off base as well. Do you think there might be something in addition to CG suppression involved?
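
One way to play with this kind of estimate is a toy model. The sketch below is an assumption of this write-up, not an analysis from the thread: it treats the two flanking bases as independent and scales every CpG-creating flank by an observed/expected CpG ratio. The function and parameter names are hypothetical.

```python
def expected_dpnii_fraction(p_c: float = 0.25,
                            p_g: float = 0.25,
                            cpg_obs_exp: float = 0.0) -> float:
    """Expected GATC frequency relative to CATG in a toy CpG-suppression model.

    p_c: background frequency of C just before the site.
    p_g: background frequency of G just after the site.
    cpg_obs_exp: observed/expected CpG ratio
                 (0 = CpG never tolerated, 1 = no suppression).
    The two flanks are assumed to be independent.
    """
    keep_left = 1.0 - p_c * (1.0 - cpg_obs_exp)
    keep_right = 1.0 - p_g * (1.0 - cpg_obs_exp)
    return keep_left * keep_right


uniform_full = expected_dpnii_fraction()            # uniform bases, full suppression
human_like = expected_dpnii_fraction(0.21, 0.21)    # 42% GC, full suppression
```

In this model a single flank with uniform base composition keeps 75% of sites, and both flanks together keep 9/16, about 56%. With 42% GC (21% C, 21% G) the figure is about 62%, and partial suppression moves it back toward 100%.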

                        O.K., I just paused to do some Googling and found something interesting. First I must acknowledge that I am sort of responsible for kicking off this whole thing, it was my data set of poplar gene expression data which Ariel was helping me with. So, I just found a paper (http://www.pnas.org/cgi/reprint/0401641101v1.pdf) examining poplar ESTs in which they found a significant CG suppression in codon positions 2-3 in poplar. By contrast Arabidopsis shows very little bias against CG in its codons. This seems to agree with Ariel's site frequency observations in poplar vs. Arabidopsis.



                        • #13
                          Originally posted by pinghli:
                          Can I ask, have you tried a 5% TBE gel to purify PCR product? The protocol suggests a 6% Novex TBE PAGE gel, 1.0 mm, 10 well, but we have only a Bio-rad Criterion Cell and 5% TBE precast gels. Do you think that will be OK? How much ethidium bromide do you think is good for a Dark Reader transilluminator? Many thanks!
                          Pingh,

                          This was the answer I got from our mol bio department:

                          {
                          I have only used the 6% Novex TBE PAGE gel. My question would be whether this person is using a 5% TBE PAGE gel. Polyacrylamide and agarose are different and separate DNA differently. If it is not, I would look into buying the Novex gels. (I err on the side of caution with Illumina protocols until I've had time and samples to try different things.)
                          The thicker the gel, the longer it will take to run. With a 5% gel this person will need to watch their gel and maybe not run it as long as the protocol says. 200V is also a high voltage, so just make sure the gel doesn't get too hot and melt. The protocol does not say how much EtBr to use, so I believe I used 4 µl in about 50 ml of buffer and let it sit for 2-3 min. I do not let the gel soak in the EtBr/TBE mixture very long.
                          This has always given me bright bands.
                          }

                          Ariel



                          • #14
                            Thanks Ariel.



                            • #15
                              Tag table software now available

                              I've had several requests to make our tag table creation software publicly available, so now it is.



                              This is designed for use with Illumina's GEX protocol. There are two halves. The first script uses an Ensembl database to produce tag tables which are similar to Illumina's proof-of-concept tables for human and mouse. However, it does not follow their classification rubric; we developed our own. The second script consolidates and annotates the aligned data; right now only Eland alignments are supported, specifically the s_*_eland_result.txt format.
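
The general idea of the second half, consolidating reads per tag and attaching an annotation, can be illustrated with a short Python sketch. This does not reproduce the released Perl scripts, the Eland file format, or the classification rubric. The function name, the in-memory inputs, and the "unannotated" label are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def consolidate(reads: Iterable[str],
                tag_table: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Count reads per tag sequence and look up each tag's annotation.

    reads: tag sequences, one per aligned read (hypothetical input).
    tag_table: maps a tag sequence to an annotation string (hypothetical).
    Returns (tag, count, annotation) rows, most abundant tags first.
    """
    counts = Counter(read.upper() for read in reads)
    rows = [(tag, n, tag_table.get(tag, "unannotated"))
            for tag, n in counts.items()]
    # Sort by descending count, then by tag for a stable order.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


rows = consolidate(["acgt", "ACGT", "TTTT"], {"ACGT": "geneA"})
```

A real pipeline would read the tags from the aligner output and load the tag table from the first script's results rather than from a literal dictionary.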

                              It's written in pretty transparent Perl, and I haven't bothered with use strict or -w or anything (yet). You're welcome to hack around with it. If people like it, I'll develop it further. There are already some developments planned, which are listed on the site. The site also goes into detail regarding the internal operations of the software and assumptions about genomic features. There's more to post there, so I'll be working on it over the next couple of months.

                              Thank you again Kevin Carr for helping me test/improve this software!

                              Please direct any feedback to apa[at]stowers-institute[dot]org.

                              Ariel Paulson
                              Programmer Analyst
                              Microarray Group
                              Stowers Institute for Medical Research
