  • #16
    Novoalign update?

    Hi Colin.
    I have been working with Novoalign a bit and am finding it useful in picking up indels and SNPs missed by other aligners. I am wondering if it can also pick up structural aberrations that I have missed using other approaches. Is there an update on the timelines for the following features, mentioned in the documentation:

    "novostruct Uses paired end alignments to identify locations where the individual being sequenced is structurally different to the
    reference sequences. This could be inter sequence variations such as large insertions, deletions and inversions or inter sequence variations.

    Jul'08

    novoasm Using results from novoalign and novopair calls SNPs and short indels.
    ACE format output is provided for viewing of alignments.

    Aug '08

    novodensity Read density analysis for copy number, expression level and, peak detection.

    Aug '08"

    ?

    Thanks,

    Ryan
    Last edited by myrna; 08-11-2008, 12:39 PM.

    Comment


    • #17
      Hey Myrna,

      If you're interested in knowing more about what we're doing with SNP/Assembly, see http://www.novocraft.com/wiki/tiki-v...desc&forumId=1

      Comment


      • #18
        Novocraft and Maq

        Thanks for the link, this was just what I needed. I will give the Novoalign->Eland->Maq conversion a try. What do you see as the largest problem/concern caused by the loss of mapping scores in doing this conversion? Do you think there would be some way to scale the Novoalign scores to Maq's mapping quality scale such that you could include them?

        Comment


        • #19
          This is an area we're trying to perfect at the moment. Novoalign mapping quality scores are designed to be as close to Maq mapping qualities as we can get them. Scaling may therefore not be necessary if we can show that low novoalign mapping qualities correspond to low Maq mapping qualities, and vice versa.
          The .map file is the key here because it contains this information, and we're losing it by using the Eland format. It's therefore crucial for us to go from novoalign's text format to the Maq format while keeping all that useful information.
          The good news is that because we're mapping more reads with novoalign, more SNPs get called. We hope to have this format conversion with quality scores ready by next week.
          Perhaps you could send me a private message and I can provide some charts showing how these mapping qualities compare between novoalign and Maq?

          Comment


          • #20
            novoalign2maq

            I would think that using the export file format as an intermediate (instead of the eland format) would allow you to get around the base (and mapping) quality issue. Heng Li, have you (or anyone else) attempted to convert novo* outputs into native Maq alignment files?

            Comment


            • #21
              Hey Myrna,
              It's ready to try out. Pls see

              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

              Comment


              • #22
                Hi Myrna,
                We've added a function to maq that converts native novo... report formats into maq map format. The source code is available in our forum. This conversion maintains the quality values and also converts gapped alignments, which is not possible if the conversion is done from the Eland report format.
                With this conversion you can use maq to do the assemblies and call SNPs and Indels. You can even use maq indelpe on single end reads aligned by novoalign and then converted to maq.
                Our plans for our own assembly, SNP caller etc are running a bit behind.
                Cheers, Colin

                Comment


                • #23
                  Multithreading now supported in NovoCraft Aligners

                  Multithreading has been added to novoalign and novopaired. The results look really good.

                  We ran some tests on our new multithreaded version to evaluate alignment performance on a small set of 200K Illumina reads versus the Human Genome NCBI36. The 200x36x37-071207_EAS51_0064-s_2_1.fastq and 200x36x37-071207_EAS51_0064-s_2_2.fastq FASTQ-formatted files were downloaded from the ftp://ftp.ncbi.nih.gov/pub/TraceDB/S...A000271/fastq/ FTP site. The first 200,000 reads in these files were used.
                  A Linux server with eight 2.33 GHz CPU cores and 32 GB RAM was used. Time was taken from the elapsed-time figure in the novopaired/novoalign output reports, read with UNIX tail.


                  CPU usage was monitored, and we found that using 8/8 cores didn't improve performance much over using 7/8 cores.

                  There appears to be a significant gain in performance with the multithreaded versions of novopaired and novoalign (figure 1).



                  Table 1: Performance of multithreaded novoalign and novopaired on 200,000 Illumina reads searched against the NCBI36 Human Genome





                  Columns 4 and 5 give the time taken as a percentage of the time taken with 1 CPU, so 4 cores take 1/4 of the single-CPU time and 7 cores take 14.8% (table 1). Each alignment process consumed at most 16.1 GB (52% of RAM).
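
For anyone wanting to turn those percentages into speedup figures, here is a minimal sketch. It only uses the two numbers quoted above (25% for 4 cores, 14.8% for 7 cores); the function name is made up for illustration and is not part of any Novocraft tool.

```python
def speedup_and_efficiency(pct_of_single_core_time, cores):
    """Convert 'percent of single-core elapsed time' into speedup and efficiency.

    speedup    = single-core time / multi-core time
    efficiency = speedup / number of cores (1.0 would be perfect linear scaling)
    """
    speedup = 100.0 / pct_of_single_core_time
    efficiency = speedup / cores
    return speedup, efficiency


# The two data points quoted in the post: (cores, % of single-core time).
for cores, pct in [(4, 25.0), (7, 14.8)]:
    s, e = speedup_and_efficiency(pct, cores)
    print(f"{cores} cores: {s:.2f}x speedup, {e:.1%} parallel efficiency")
```

On those figures, 7 cores works out to roughly a 6.8x speedup, so scaling stays close to linear up to 7 cores.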

                  Comment


                  • #24
                    I finally got to use novoalign and novo2maq to make SNP calls. It seems the depth of coverage I see on SNP calls from novoalign-aligned data is much lower than that from MAQ, almost 1/3.

                    Why would that be?

                    Of the 4 million reads in the lane, novoalign mapped only 1.6 million (all default params)
                    --
                    bioinfosm

                    Comment


                    • #25
                      Bioinfosm, that's interesting. I'd expect that you would find more high mapping quality reads with novoalign, and that this would improve the depth. If it's doing the opposite, then it is something we'll need to look at.

                      If you've run the same data with MAQ then I assume you're using fastq-formatted reads.
                      I'm interested to see what the `maq mapstat` output is for the novoalign and maq .map files.
                      Something else to look at is whether novo2maq converted the headers correctly. This is easily checked with maq mapview.

                      Could you perhaps send me a tail of the novoalign output and the version as well?


                      Originally posted by bioinfosm:
                      I finally got to use novoalign and novo2maq to make SNP calls. It seems the depth of coverage I see on SNP calls from novoalign-aligned data is much lower than that from MAQ, almost 1/3.

                      Why would that be?

                      Of the 4 million reads in the lane, novoalign mapped only 1.6 million (all default params)

                      Comment


                      • #26
                        Hi Bioinfosm,
                        Further to Zee's request, could you include a head of the novoalign output as well as the tail?

                        Can you email directly to support at novocraft dot com

                        Thanks, Colin

                        Comment


                        • #27
                          Thanks for the response. There was something with the headers which I noticed and correcting that gave me a lot more reads mapped by novoalign compared to maq. However, the qualities of some of them are pretty low, along with lots of flags when looking at the mapstat output.

                          I will email that data to support for further analysis...
                          btw, what's your homopolymer filter?
                          --
                          bioinfosm

                          Comment


                          • #28
                            The homopolymer filter picks up reads that are all A's or all C's etc., i.e. the same base called in every position in the read. Some Illumina read files have a significant percentage of these. They can be caused by dust on the slide or by the camera picking up the edge of a lane.
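
To make that description concrete, here is a rough sketch of the strictest reading of it: a read is flagged only when every position holds the same base. This is an illustration of the idea, not novoalign's actual implementation, which may well use a score or threshold rather than an exact match; the function name and the example reads are made up.

```python
def is_homopolymer(read: str) -> bool:
    """Return True if every base call in the read is the same letter."""
    read = read.strip().upper()
    return len(read) > 0 and len(set(read)) == 1


# Made-up example reads, not taken from any real lane.
reads = ["AAAAAAAA", "ACGTACGT", "CCCCCCCC", "AAAAAAAT"]

kept = [r for r in reads if not is_homopolymer(r)]
dropped = [r for r in reads if is_homopolymer(r)]

print("kept:", kept)
print("dropped:", dropped)
```

Note that a read like AAAAAAAT passes this strict test, which is one reason a real filter might be more tolerant than an exact all-same-base check.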

                            Comment


                            • #29
                              With regard to the flag values, the novo2maq module was incorrectly setting paired-end flags on single-end reads. I've posted an updated source file in our support forum at www.novocraft.com
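
One way to see which flag values a converted .map file carries is to tally the flag column of `maq mapview` text output. The sketch below assumes the output is tab-separated with the paired flag in the sixth column (read name, chromosome, position, strand, insert size, paired flag, ...), which is how I understand the Maq manual; please check that against your Maq version. The function name and sample lines are made up, and the sample lines are trimmed to the first seven fields.

```python
from collections import Counter


def tally_paired_flags(mapview_lines, flag_column=5):
    """Count how often each paired-flag value occurs in mapview text output.

    flag_column is the zero-based index of the flag field; 5 (the sixth
    column) is an assumption about the mapview layout, not a verified fact.
    """
    counts = Counter()
    for line in mapview_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= flag_column:
            continue  # skip blank or truncated lines
        counts[int(fields[flag_column])] += 1
    return counts


# Made-up lines, trimmed to: name, chr, pos, strand, insert size, flag, mapq
sample = [
    "read1\tchr1\t100\t+\t0\t0\t60",
    "read2\tchr1\t250\t-\t0\t0\t45",
    "read3\tchr2\t900\t+\t0\t130\t12",
]

for flag, n in sorted(tally_paired_flags(sample).items()):
    print(f"flag {flag}: {n} reads")
```

In practice you would feed it the lines of a file produced by `maq mapview`; a single-end run showing mostly paired-end flag values would point to the conversion problem described above.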

                              Comment


                              • #30
                                Flags

                                Oh no! I was just reveling in the fact that novo2maq did set flags as paired in single end data. This has allowed me to run indelpe and find some very convincing indels. Not sure how many of them are real, but looking at the coverage a lot are convincing by eye. Without the ability to run indelpe, many of these sites are mistakenly called SNPs. Is there still an option to pull the indels from a novoalign output? I suppose as long as flag 130 is still set it should work fine. I understand the rationale that Maq only trusts indels from paired data (and only does gapped alignment when reads are anchored by a mate), but I would like to get Colin's opinion about whether we can trust indels from single end reads (and if so, what mapping quality thresholds?)

                                Thanks,

                                Ryan
                                Last edited by myrna; 09-13-2008, 07:02 AM.

                                Comment
