  • zmartine
    Junior Member
    • Nov 2009
    • 4

    Assisted de novo genome assembly? Create new consensus mapping reads to reference?

    Greetings.

    My issue:

    I need the SNP-difference(s) between clone 1 and clone 2 of a haploid eukaryote. I do not have an assembled genome of this species for mapping, but there is one that is closely related with 100% synteny (I'll call it X, here).

    I have Illumina 50 bp paired-end reads. I tried mapping clone 1 and clone 2 to X separately, extracting the SNPs, discarding those shared by both sets (the intersection), and keeping only the clone-specific remainder. But there are too many SNPs, about 1 per 100 bp, and the sequence quality differs problematically between the samples (the clone 1 sample is gorgeous; clone 2 is a little iffy but has higher coverage). As a result, the SNP list for clone 2 is more than twice as long as the one for clone 1, yet the spurious SNPs have > 100x coverage and Q scores over 100 in many cases.

    (Pipeline = BWA -> sampe -> SAMtools mpileup -> BCFtools vcfutils.pl -> SNPs)
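
A minimal sketch of the set comparison described above, assuming each clone's calls are available as tab-separated VCF lines. The function names, the `min_qual` parameter, and the choice to key a SNP on (chromosome, position, alternate base) are illustrative assumptions, not part of the original pipeline.

```python
from typing import Iterable, Set, Tuple

# A SNP is keyed on (chromosome, 1-based position, alternate base).
Snp = Tuple[str, int, str]
BASES = {"A", "C", "G", "T"}


def parse_vcf_snps(lines: Iterable[str], min_qual: float = 0.0) -> Set[Snp]:
    """Collect single-base substitutions from VCF text lines."""
    snps: Set[Snp] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue  # skip indels, multi-allelic and no-call records
        if qual != "." and float(qual) < min_qual:
            continue  # optional quality filter (hypothetical threshold)
        snps.add((chrom, int(pos), alt.upper()))
    return snps


def clone_specific(snps1: Set[Snp], snps2: Set[Snp]) -> Tuple[Set[Snp], Set[Snp]]:
    """Drop SNPs shared by both clones; return what is unique to each."""
    shared = snps1 & snps2
    return snps1 - shared, snps2 - shared
```

A site that appears in only one list is a candidate, not a confirmed difference: it may simply have gone uncalled in the other clone, so it is worth checking the other clone's pileup at that position.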

    I'm thinking of assembling one clone to X, exporting that as a new sequence, and then mapping clone 2 to it, cutting out the middle man. I am the only one in my immediate area working on this, and I am just a mapping monkey: I don't know much about how to assemble a new genome and use it as a reference. Any advice on how to do this, or on alternative ways to solve the Find-the-SNP-between-Clone-1-and-Clone-2 Problem, is much appreciated.
    Last edited by zmartine; 02-07-2012, 09:00 AM.
  • nickloman
    Senior Member
    • Jul 2009
    • 355

    #2
    This is a bit tricky but I would say your most likely options are:

    1) do a true de novo assembly of clone 1, exclude the repetitive contigs using coverage depth as a guide and call SNPs by mapping clone 2 against those contigs - you could use Velvet, SOAPdenovo etc. for the assembly, and your regular aligner for mapping

    2) do a reference-guided de novo assembly of clone 1 and then map clone 2 against that - you could use CLC Bio or MIRA for that

    3) do a true de novo assembly of clone 1 with the same method as 1), scaffold the contigs against your reference using something like BAMBUS and map SNPs back

    Have a look at http://www.molecularevolution.org/re...owtie_activity for a simple tutorial on Velvet.
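
The coverage-based filter in option 1 could be sketched as follows, assuming Velvet-style contig names of the form `NODE_<id>_length_<n>_cov_<x>`. The function names and the `max_ratio` cutoff of 1.5 are illustrative assumptions; a sensible cutoff depends on the coverage distribution of the actual assembly.

```python
import re
from statistics import median
from typing import List, Optional

# Velvet writes k-mer coverage into the contig name, e.g. NODE_7_length_812_cov_31.4
COV_RE = re.compile(r"_cov_([0-9.]+)")


def contig_coverage(name: str) -> Optional[float]:
    """Return the coverage encoded in a contig name, or None if absent."""
    match = COV_RE.search(name)
    return float(match.group(1)) if match else None


def single_copy_contigs(names: List[str], max_ratio: float = 1.5) -> List[str]:
    """Keep contigs whose coverage is at most max_ratio times the median.

    Contigs well above the typical depth are likely collapsed repeats.
    """
    covs = {}
    for name in names:
        cov = contig_coverage(name)
        if cov is not None:
            covs[name] = cov
    if not covs:
        return []
    typical = median(covs.values())
    return [n for n in names if n in covs and covs[n] <= max_ratio * typical]
```

The median here is taken over contigs, not weighted by length, so many short high-coverage contigs could skew it; a length-weighted median may be a better choice for fragmented assemblies.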

    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      If you want a non-de novo method, you could try aligning the higher-quality data to your reference, making a new reference by correcting it for the SNPs you found, and then realigning to that corrected reference. Hopefully, you will find that the number of new SNPs is drastically lower. You can try iterating that one more time, then align sample 2 to your corrected reference.
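
The "correct the reference" step can be sketched like this for a single sequence held in memory. `apply_snps` is a hypothetical helper, not a function from any of the tools mentioned in this thread, and it handles substitutions only; indels would shift coordinates and need separate treatment.

```python
from typing import Iterable, Tuple


def apply_snps(reference: str, snps: Iterable[Tuple[int, str, str]]) -> str:
    """Return a copy of reference with each SNP substituted in.

    snps holds (1-based position, expected reference base, alternate base).
    A mismatch between the expected and actual reference base raises an
    error, which guards against applying calls to the wrong sequence.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        found = seq[pos - 1]
        if found.upper() != ref.upper():
            raise ValueError(
                f"position {pos}: expected {ref!r} in reference, found {found!r}"
            )
        seq[pos - 1] = alt
    return "".join(seq)
```

For example, `apply_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "A")])` gives `"ATGTACGA"`. Realigning the same reads to the corrected sequence and repeating the call-and-correct cycle is the iteration described above.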

      • Zam
        Member
        • Apr 2010
        • 51

        #4
        I'm beginning to feel a bit self-conscious about the fact that I seem to do so much self-promotion on SEQanswers recently. Apologies to those of you bored of hearing from me; I'll be brief. Martine - there is a de novo assembly variant caller, called Cortex, designed for this kind of question, which works on haploid and diploid organisms.
        See

        and the paper here:

        It will assemble and look for differences directly, and spit out variants in flank-allele1-allele2-flank format, which if you want can then be turned into VCF with respect to your X outgroup/related genome, or VCF with respect to a consensus.

        If, as seems to be the case, you have a tonne of coverage, you can tell it to only pay attention to high quality reads.

        • nickloman
          Senior Member
          • Jul 2009
          • 355

          #5
          That sounds cool Zam, I definitely will be giving it a try for my own projects.

          • zmartine
            Junior Member
            • Nov 2009
            • 4

            #6
            Zam, this looks like just what I was searching for!

            • MQ-BCBB
              Member
              • May 2009
              • 25

              #7
              error

              Originally posted by nickloman:
              That sounds cool Zam, I definitely will be giving it a try for my own projects.
              Hello Zam, I am working with Martine on getting Cortex to run. I ran the following command and received the error below:

              command:
              cortex_var_31_c1 --pe_list PyN67C_R1.fastq,PyN67C_R2.fastq --format FASTQ --quality_score_threshold 5 --remove_pcr_duplicates --remove_seq_errors --dump_binary PyN67C_R1-R2.ctx --kmer_size 21 --max_read_len 50

              error:
              Start loading @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0: and @HWI-ST183:319:c02a2acxx:4:1101:1017:2087 2:Y:0:
              cannot open file:@HWI-ST183:319:c02a2acxx:4:1101:1017:2087 1:Y:0:

              • Zam
                Member
                • Apr 2010
                • 51

                #8
                Hi!
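                The --pe_list option wants a pair of filelists, i.e. two text files that each list FASTQ files.
                You have given it a pair of FASTQ files directly, so Cortex tried to
                open the first read header as if it were a filename. The manual goes through
                some explicit examples. It is also worth doing the first two examples
                in the demo directory.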
                Cheers
                Zam
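
A minimal sketch of the fix, reusing only the flags and FASTQ names from the command in post #7. It assumes a filelist is a plain text file with one FASTQ path per line, which should be checked against the Cortex manual; the helper names and the filelist names are hypothetical.

```python
from pathlib import Path
from typing import List


def write_filelist(list_path: str, fastq_paths: List[str]) -> str:
    """Write one FASTQ path per line (assumed filelist format)."""
    Path(list_path).write_text("".join(f"{p}\n" for p in fastq_paths))
    return str(list_path)


def build_cortex_command(left_list: str, right_list: str, out_binary: str) -> List[str]:
    """Rebuild the command from post #7, passing filelists to --pe_list."""
    return [
        "cortex_var_31_c1",
        "--pe_list", f"{left_list},{right_list}",  # two filelists, not two FASTQ files
        "--format", "FASTQ",
        "--quality_score_threshold", "5",
        "--remove_pcr_duplicates",
        "--remove_seq_errors",
        "--dump_binary", out_binary,
        "--kmer_size", "21",
        "--max_read_len", "50",
    ]
```

For instance, writing `PyN67C_R1.fastq` into a list file for the first mates and `PyN67C_R2.fastq` into one for the second mates, then passing those two list files to `build_cortex_command`, yields an argument list that can be handed to `subprocess.run`.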

                • jkbonfield
                  Senior Member
                  • Jul 2008
                  • 146

                  #9
                  You could also look into ICORN (http://icorn.sourceforge.net/), which iteratively corrects the reference so that more reads map and more SNPs can be found. It should work well on closely related species, but I'm not sure how far it can push things.
