  • De novo assembly of variants using Cortex

    Hi All

    Just a quick note to say that our software and statistics for de novo assembly of variants from individuals and from populations are now published in Nature Genetics

    "De novo assembly and genotyping of variants using colored de Bruijn graphs",
    Iqbal, Caccamo, Turner, Flicek, McVean (doi:10.1038/ng.1028)

    This link will work for a bit



    Software is available here


    which includes a link for how to join our mailing list.

    Some highlights for you

    1. Cortex allows you to do de novo assembly of multiple samples simultaneously in order to call variants WITHOUT having to build or use a consensus reference. It's a very memory-efficient de Bruijn assembler (for example, if you're assembling bacteria you can assemble thousands simultaneously on a standard 32 GB server, or if you're doing humans you can do 10 on a server with 256 GB of RAM).

    2. The paper gives a bunch of example cases.
    - Variant calling in a high coverage human and comparing results with 1000 Genomes calls by validating both sets using fully sequenced fosmids.
    - Variant calling in a population of chimps not using the reference at all
    - HLA typing
    - assembling 164 humans from the 1000 Genomes project into a 4 colour graph (Europe, Africa, Asia, and the reference) and then pulling out novel sequence and estimating population frequency

    3. We provide a mathematical model (validated with simulations and on human and chimp data) that allows you to predict discovery power given experimental parameters (read length, depth), informatic parameters (kmer size), biological parameters (repeat content), and the length of the variant you want to call. This allows you to design your experiment based on your goals (e.g. high sensitivity SNP calling and high specificity SV calling would have different designs).
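To give a feel for how read length, depth and kmer size interact in highlight 3, here is a minimal sketch. It is not the model from the paper, which also accounts for repeat content and variant length. It only uses the standard effective kmer depth and assumes error-free reads with Poisson-distributed coverage; the function names are made up for illustration.

```python
import math


def kmer_depth(read_depth, read_length, k):
    """Expected number of reads that contain a given k-mer in full.

    A read of length L contains L - k + 1 k-mers, so per-base depth D
    shrinks to D * (L - k + 1) / L at the k-mer level.
    """
    if k > read_length:
        return 0.0
    return read_depth * (read_length - k + 1) / read_length


def prob_kmer_seen(read_depth, read_length, k):
    """Probability a k-mer is sampled at least once (Poisson assumption)."""
    return 1.0 - math.exp(-kmer_depth(read_depth, read_length, k))


# Illustrative numbers only: 30x depth, 100 bp reads, three kmer sizes.
for k in (21, 31, 55):
    print(k, kmer_depth(30, 100, k), prob_kmer_seen(30, 100, k))
```

Larger kmers resolve more repeats but lower the effective depth, which is one reason the choice of kmer size is part of the experimental design.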

    Anyway - if you are analysing a species which has no reference or has a bad reference, or if you are analysing a population of bacteria some of which you think are highly diverged from the reference, or if you are interested in getting an unbiased view of variation in a species, you might be interested in giving it a try. I have used it successfully on human, chimp, Plasmodium and S. aureus so far.
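For anyone new to the multi-sample graph idea in highlight 1, here is a toy sketch of a coloured de Bruijn graph. It is not Cortex's implementation or data structure: it ignores reverse complements, sequencing errors and memory efficiency, and all names and sequences are made up for illustration. Each kmer is simply tagged with the set of samples (colours) it occurs in, so kmers private to one sample point at candidate variants without any reference alignment.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_graph(samples, k):
    """Map each k-mer to the set of sample names (colours) containing it."""
    graph = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                graph[kmer].add(name)
    return graph


def private_kmers(graph, colour):
    """Return, sorted, the k-mers seen in the given colour and no other."""
    return sorted(km for km, cols in graph.items() if cols == {colour})


# Hypothetical toy data: sample1 differs from ref at a single base.
samples = {
    "ref": ["ACGTACGGA"],
    "sample1": ["ACGTTCGGA"],
}
graph = build_colored_graph(samples, k=4)
print(private_kmers(graph, "sample1"))
# prints ['CGTT', 'GTTC', 'TCGG', 'TTCG']
```

The shared kmers ACGT and CGGA flank the two private paths, which is the "bubble" shape a variant makes in the graph.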

    Happy New Year and best wishes!

    Zam Iqbal

  • #2
    Cortex performance on an extensive aneuploidy organism

    Dear Zam,
    Some organisms, like Leishmania, have extensive aneuploidy, where the copy number of the 36 chromosomes is highly variable. Do I need any particular parameter tuning in CORTEX for such an organism? A high quality draft genome exists for the parasite I am working on, but there are still many gaps, so it seems CORTEX is an ideal tool.
    Thanks
    Hideo



    • #3
      Hi Hideo

      >Some organism like Leishmania have extensive aneuploidy where copy number of 36 chromosomes is highly variable. Do I need particular parameter tuning in CORTEX for such an organism?

      Cortex doesn't have an automated way of dealing with one chromosome with variable ploidy. Can I ask - have you sequenced one isolate, or do you think you have a mix of "individuals" each with a different copy number for that chromosome?


      >The high quality draft genome exists for a parasite I am working on, but there are still many gaps so it seems CORTEX is an ideal tool.

      Sounds like a very interesting problem. Can you explain a bit more about the design of your experiment? Do you have one sample or many? Is the sample the same one from which the reference was built? Are you interested in diversity in general (and you want your results not to be confused by this chromosome), or are you specifically interested in the copy number of this chromosome? I can give better advice when I understand what it is you want to do.

      best regards

      Zam



      • #4
        QUOTE=Zam

        >Cortex doesn't have an automated way of dealing with one chromosome with variable ploidy. Can I ask - have you sequenced one isolate, or do you think you have a mix of "individuals" each with a different copy number for that chromosome?

        We have sequenced many cloned parasites as well as additional isolates (not cloned). (The previous results, based on a smaller sample set, are given in http://genome.cshlp.org/content/21/12/2143.short) In some sense our project is similar to the malaria project of the Kwiatkowski group, which you may be familiar with. We have already called SNPs and checked for CNVs, looking at the depth, insert size, length depth distribution, single read distribution and so on. We do not see any extensive CNVs, but we suspect more CNVs are hidden in the gapped regions, which by definition contain many variants.
        That is why I am excited about Cortex, which can do population-level CNV detection. The parasite is a single-celled protozoan that reproduces mostly clonally, and the genome is well conserved at the base and synteny level, except for ploidy. But fake recombination signals can arise due to quick ploidy changes. (Ploidy status can change within 2 weeks.)

        >Sounds like a very interesting problem. Can you explain a bit more about the design of your experiment?
        >Do you have one sample or many?
        Well over 100.
        >Is the sample the same one from which the reference was built?
        Yes, one sample is the reference parasite. Our institute maintains the bug.
        And we do have many samples grown under various stress conditions.

        >Are you interested in diversity in general (and you want not to have results confused by this chromosome), or are you specifically interested in copy number of this chromosome?

        We are interested in ploidy, as well as CNVs not affected by ploidy, and SNPs.
        Practically everything. We also have parallel projects involving more samples and metabolomic analysis of selected clonal samples.

        When an organism regularly has extensive aneuploidy, many conventional population genetics methods break down. Many programs carry a disclaimer saying that they are for diploids, since many theories assume a stable chromosome number. But ploidy can be so unstable and variable that aneuploidy may not have a strong impact on the heterogeneity of SNPs. [I.e., stable multi-copy chromosomes can accumulate more heterozygous SNPs. But if ploidy is unstable, then the prevalence of heterozygous SNPs is dictated by the most recent diploid or even monosomic chromosome state.]
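As a small illustration of why copy number matters for heterozygous SNPs, the sketch below lists the allele fractions one would expect in the reads of a clonal sample when a SNP sits on i of the n copies of a chromosome. This is simple arithmetic under the assumption of a clonal sample and unbiased sequencing, not a method from Cortex; the function name is made up.

```python
from fractions import Fraction


def expected_alt_fractions(copy_number):
    """Expected alt-allele read fractions for a heterozygous site.

    With n chromosome copies, the alt allele can sit on 1..n-1 of them,
    giving fractions i/n. A monosomic chromosome (n = 1) has none.
    """
    return [Fraction(i, copy_number) for i in range(1, copy_number)]


for n in (1, 2, 3, 4):
    print(n, [str(f) for f in expected_alt_fractions(n)])
```

A monosomic chromosome cannot carry a heterozygous site at all, so a recent passage through monosomy removes heterozygous SNPs regardless of the current copy number.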

        Do you have any take on how we should deal with SNP diversity when there is extensive aneuploidy, for example how association studies and LD are affected by ploidy changes?

        In any case, I am going to test Cortex now... If you do not mind taking a look at our data, I will provide you with more information. I think you will be surprised to see the extent of ploidy change.
        best,
        Hideo

        hi1@sanger ac uk



        • #5
          Sounds good - why don't we take this offline - I'll email you.
          Zam



          • #6
            Good day, Zam!

            I have a question - are there any manuals for installing and running cortex_con?

            I would appreciate your help!

            Anna



            • #7
              Hi Anna
              I don't think there is such a manual, but the right people to ask are Ricardo and Richard at TGAC:
              [email protected]
              [email protected]
              Email them directly, and tell them I sent you.
              cheers

              Zam



              • #8
                Thank you!!

                Anna
