  • CASAVA, Pipeline 1.3

    I've just looked through the just-released CASAVA manual. Whilst it would seem to have some new tools for visualising/calling SNPs and RNAseq, it seems totally dependent on ELAND alignments.

    We haven't used ELAND since we started using read lengths of 45 bp+. We didn't find it very good for reads >32 bp.

    Am I missing something here?

    david

  • #2
    Are you committed to CASAVA? If not, can you tell me what applications you are interested in?


    • #3
      Couldn't you use something like Bowtie, which yields a similar kind of output, and bend it into ELAND format?


      • #4
        I thought CASAVA was an Illumina product, as is Eland. I don't think you're missing anything - of course they want you to use their products end to end. (= On the other hand, even the WTSS SNP & Exon expression software I wrote handles more than one input format, so I think it's just Illumina trying to bring people back into the Eland fold.

        Frankly, there are so many SNP callers out there that, until I see some solid reason to switch to CASAVA (and back to Eland), it's not even on my radar.
        The more you know, the more you know you don't know. —Aristotle


        • #5
          I agree with you

          but being quick, one-stop and vendor-supported, and offering visualization support for investigators, are some reasons in its favor. Umm, maybe, as I have not had the chance to look yet.
          --
          bioinfosm


          • #6
            Well, for people just getting into the game, I'm sure it'll be easy to set up and get running.

            That's how Microsoft managed to get 95% of the internet-using population using Internet Explorer for a while.... (-;
            The more you know, the more you know you don't know. —Aristotle


            • #7
              qseq.txt format

              Originally posted by dvh
              I've just looked through the just-released CASAVA manual. Whilst it would seem to have some new tools for visualising/calling SNPs and RNAseq, it seems totally dependent on ELAND alignments.

              We haven't used ELAND since we started using read lengths of 45 bp+. We didn't find it very good for reads >32 bp.

              Am I missing something here?

              david
              Also, seq.txt and prb.txt are now "optional" Bustard output, the default being qseq.txt, but there is not much info on this format in the pipeline manual. As we haven't updated the software yet, does anyone have some new qseq.txt files to play with, or information on the quality scores used?
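
For anyone wanting to experiment with qseq.txt before upgrading, here is a minimal Python sketch of a reader and a FASTQ converter. The 11-column tab-separated layout used here (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag) is not described anywhere in this thread; it is an assumption based on how the format is commonly documented, so please check it against your own files. The names `QseqRecord`, `parse_qseq_line` and `to_fastq` are made up for illustration. The quality offset of 64 ('@') is the one confirmed later in the thread.

```python
from typing import NamedTuple


class QseqRecord(NamedTuple):
    """One qseq.txt line (field layout is an assumption, see the note above)."""
    machine: str
    run: str
    lane: str
    tile: str
    x: str
    y: str
    index: str
    read_number: str
    sequence: str       # uncalled bases appear as '.'
    quality: str        # one ASCII character per base, offset 64 ('@')
    passed_filter: bool


def parse_qseq_line(line: str) -> QseqRecord:
    """Split one tab-separated qseq line into a QseqRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    return QseqRecord(*fields[:10], passed_filter=(fields[10] == "1"))


def to_fastq(rec: QseqRecord) -> str:
    """Render a record as a four-line FASTQ entry with offset-33 qualities."""
    name = (f"{rec.machine}_{rec.run}:{rec.lane}:{rec.tile}:"
            f"{rec.x}:{rec.y}#{rec.index}/{rec.read_number}")
    seq = rec.sequence.replace(".", "N")
    # Shift each quality character from offset 64 down to offset 33.
    qual = "".join(chr(ord(c) - 64 + 33) for c in rec.quality)
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Usage would be along the lines of `to_fastq(parse_qseq_line(line))` for each line of a file, optionally skipping records where `passed_filter` is false.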


              • #8
                For RNAseq there are systems such as ERANGE and FindFeatures (Vancouver SR package).
                ERANGE seems quite limited to specific genomes and I'm working with certain genomes that have no reference sequence.
                I have not tried FindFeatures.

                It would be good to have a generic system to do tag counting in samples given a set of known exon positions and mapping results from alignment to whole genome, mRNA and exon junctions.
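
The kind of generic tag counter described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not an existing tool: exons are given as hypothetical `(chrom, start, end, exon_id)` tuples with 0-based, half-open coordinates, exons on the same chromosome do not overlap, and each mapped read is reduced to a `(chrom, pos)` start position. Reads spanning exon junctions would need extra handling that is not shown.

```python
from bisect import bisect_right
from collections import Counter


def count_tags(exons, reads):
    """Count reads whose start position falls inside a known exon.

    exons: iterable of (chrom, start, end, exon_id), 0-based half-open,
           assumed non-overlapping within a chromosome.
    reads: iterable of (chrom, pos) mapped start positions.
    """
    # Group exons by chromosome and sort them by start coordinate.
    by_chrom = {}
    for chrom, start, end, exon_id in exons:
        by_chrom.setdefault(chrom, []).append((start, end, exon_id))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _, _ in intervals]

    counts = Counter()
    for chrom, pos in reads:
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue  # read maps to a sequence with no annotated exons
        # Find the last exon starting at or before pos, then check its end.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            counts[intervals[i][2]] += 1
    return counts
```

Because the lookup is a binary search per read, this scales to large read sets as long as the exon table fits in memory.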


                • #9
                  FindFeatures is a fairly simple program. I don't think anyone outside of the BC Genome Science Centre is using it - although if anyone has the urge to try it, I'm more than happy to provide support.

                  Anthony
                  The more you know, the more you know you don't know. —Aristotle


                  • #10
                    Originally posted by apfejes
                    Well, for people just getting into the game, I'm sure it'll be easy to set up and get running.

                    That's how Microsoft managed to get 95% of the internet-using population using Internet Explorer for a while.... (-;
                    Hi apfejes

                    Disclaimer: I work at Illumina and am one of the developers of CASAVA, but these are my personal opinions.

                    As I see it, the beauty of sequencing data is that once you've got it into As, Cs, Gs and Ts it becomes a 'commodity item'. I think trying to compete with the combined brainpower of the entire sequencing community by trying to 'lock users in' beyond that stage would be extremely tough, and it's not clear to me that we would gain much by doing so.

                    CASAVA is meant more to make it easier to process datasets on 'human genome resequencing' scales. A human genome at, say, 30x sequence coverage presents logistical issues beyond those associated with, say, a ChIP-seq dataset of a couple of Gbases (and I in no way wish to trivialize those; I know this is already a dauntingly large dataset in many ways). Now that we are not so far away from "1 run (from whatever platform) = 1 genome", we don't want these issues to stand in the way of the science. Ideally, algorithm developers would be able to concentrate on algorithms and not file formats and so forth.

                    The idea is that 'under the hood' CASAVA handles the necessary sorting, binning and filtering of reads. SNP callers and other downstream applications then access the alignment data they need by making function calls to a library.
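
To make the sort/bin/filter idea concrete, here is a small Python sketch. It is purely illustrative and is not CASAVA's actual library or API: the `Alignment` type, the `bin_alignments` and `fetch` functions, the 1 Mb bin size and the `min_score` threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    chrom: str
    pos: int    # 0-based start position of the read on the reference
    score: int  # alignment score (hypothetical filtering criterion)
    seq: str


BIN_SIZE = 1_000_000  # hypothetical bin width in bases


def bin_alignments(alignments, min_score=10, bin_size=BIN_SIZE):
    """Filter out low-scoring reads, bin the rest by region, sort each bin."""
    bins = defaultdict(list)
    for aln in alignments:
        if aln.score < min_score:
            continue  # filtering step
        bins[(aln.chrom, aln.pos // bin_size)].append(aln)  # binning step
    for reads in bins.values():
        reads.sort(key=lambda a: a.pos)  # sorting step
    return dict(bins)


def fetch(bins, chrom, start, end, bin_size=BIN_SIZE):
    """Return alignments starting in [start, end), touching only relevant bins."""
    hits = []
    for b in range(start // bin_size, (end - 1) // bin_size + 1):
        hits.extend(a for a in bins.get((chrom, b), []) if start <= a.pos < end)
    return hits
```

A downstream tool such as a SNP caller would then only ever call something like `fetch(bins, "chr1", 0, 50_000)` and never deal with file formats or read ordering itself.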

                    The software evolved from the code we used for our Yoruba genome analysis and can be used as a standalone genome analysis tool. The currently released version only includes the SNP calling module but internally we have modules for e.g. short indel and structural variant detection that we are looking to move towards release. CASAVA is also used as a backend to provide input data for the Genome Studio software we are releasing.

                    I would actually be very happy if people were to use CASAVA to process MAQ and/or Bowtie data, and I imagine it would be quite straightforward to write a parser; lack of time is the only reason we haven't looked at this ourselves.

                    Cheers

                    Tony


                    • #11
                      Hi Tony,

                      Thanks for the reply - I hadn't meant to imply that Illumina was working towards some grand evil plan to take over the sequence analysis space, as Microsoft has done in the past with the Windows desktop - only that Illumina is providing a tool the way that Microsoft did, where it will now be easier to use the one that comes with the tool "out of the box" than to move on to something else. (And that's not necessarily a bad thing.)

                      As for it not having parsers because you haven't had time to write them, I certainly understand the phenomenon - I've run into it several times myself. If the software were open source, or the source code were publicly available, others might be willing to contribute those missing parts, which would be an option for allowing other aligners to be used. (I suspect that's not in Illumina's best interest, however, so I'm not really expecting to see that.)

                      In any case, I think the major issue I have is that I have only heard much about CASAVA second hand in meetings and otherwise, so I'm likely missing key information. Perhaps you can point us to some literature on the web that would be able to fill in the missing pieces for the rest of us. I'd certainly appreciate reading more than just marketing pieces - which I haven't yet come across. Is there something I've missed out there?

                      Anthony
                      The more you know, the more you know you don't know. —Aristotle


                      • #12
                        Hi apfejes

                        Thanks for the reply; you make several good points. At the moment the software is available on the same basis as our existing 'analysis pipeline' software package - i.e. instrument owners can download it free, including access to the source code. Unfortunately (much as I might like to) it's not for me to comment on whether our policy on that might change in the future.

                        We've presented posters on it at a couple of conferences recently and there's a sizeable manual that comes with it. As it's a new venture I think we're adopting somewhat of a softly softly approach to releasing it - some people will try it whether you publicize it or not, and that gives us feedback that we can add to the ideas we already have about how it can evolve to best meet users' needs. I think you're right though that a tech note aimed at the kind of folks who read this board would be a good idea.

                        We're not really proprietary about which aligners or other tools people use - it's their data after all. Personally I see things moving towards more of a decoupling between alignment tools and downstream tools (SNP callers and so forth) that use alignments. I think the SAMtools project is a very positive step in that direction; it seems to me it has many of the same aims as CASAVA.

                        Cheers

                        Tony


                        • #13
                          Hi Tony,
                          I've been given a couple of qseq.txt files to align for clients, and the format looks pretty simple except for the quality values. I'm seeing a lot of B's in the quality string, and it looks like this is the lowest quality value. In earlier _sequence.txt files, quality values were in the form -10*log10(p/(1-p)) + '@', and codes went as low as ';'.
                          These qseq.txt files look like you may be using the Phred-type -10*log10(p) + '@'. Any chance you could enlighten us?

                          Thanks, Colin
                          Last edited by sparks; 02-23-2009, 07:15 PM. Reason: formula correction
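
For reference, the two scales mentioned above can be converted into each other. With p the estimated error probability, the older Solexa-style score is -10*log10(p/(1-p)) and the Phred score is -10*log10(p), which gives Q_phred = 10*log10(10^(Q_solexa/10) + 1). A small Python sketch (function names are just illustrative):

```python
import math


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scale score, -10*log10(p/(1-p)), to Phred, -10*log10(p)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Inverse conversion; undefined at q_phred == 0 (p == 1)."""
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


# ';' is ASCII 59, so with the '@' (64) offset it encodes a score of -5,
# the lowest code seen in the older files.
lowest_old_score = ord(";") - ord("@")
```

The two scales agree closely for high-quality bases and only diverge noticeably at the low end.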


                          • #14
                            Hi Colin

                            You have it spot on, they are now in Phred format. Just to state it fully for the benefit of others: ASCII='@'+10*log10(1/p), p being the estimated probability of the base being in error. This change was made as of Pipeline 1.3.

                            Cheers

                            Tony
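
The formula above can be sketched in a few lines of Python. The function names are made up for illustration, and rounding to the nearest integer is an assumption; the post does not say how the pipeline itself rounds.

```python
import math

OFFSET = ord("@")  # 64, the ASCII offset given in the formula above


def error_prob_to_char(p: float) -> str:
    """Encode an error probability as '@' + 10*log10(1/p).

    Rounding to the nearest integer is an assumption, not stated in the thread.
    """
    q = round(10 * math.log10(1 / p))
    return chr(OFFSET + q)


def char_to_phred(c: str) -> int:
    """Decode one quality character to its integer Phred score."""
    return ord(c) - OFFSET


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score back to an error probability."""
    return 10 ** (-q / 10)
```

For example, an error probability of 0.01 is Phred 20, which is stored as the character with ASCII code 84 ('T').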


                            • #15
                              Hi Tony,
                              Thanks for that. I'm sure you are right, though some Illumina documentation being sent out with export files still talks about -5 being a valid quality value, so you guys should check your documentation.
                              I've also noticed in the qseq files I have that the lowest code is a B, which translates to a Phred score of 2. This happens even for bases called as '.'. If Perr were 0.75 then Phred would be about 1.25, so it looks like you round up to 2. This might be of interest to people who are using qualities in alignment and in SNP calling. I did like the previous Solexa scale, as it gave a finer resolution for higher Perr values.

                              Thanks again, Colin
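
A quick way to see the resolution difference is to tabulate both scales for high error probabilities. The Python sketch below does that (function names are illustrative only) and also includes a small helper for finding the lowest Phred code in a quality string, assuming the '@' offset. At Perr = 0.75 the Phred score is about 1.25 while the Solexa-scale score is about -4.77; between Perr = 0.5 and 0.75 the Phred scale moves by less than 2 units and the Solexa scale by almost 5.

```python
import math


def phred(p: float) -> float:
    """Phred scale: -10*log10(p)."""
    return -10 * math.log10(p)


def solexa(p: float) -> float:
    """Older Solexa scale: -10*log10(p/(1-p))."""
    return -10 * math.log10(p / (1 - p))


def min_phred(quality_string: str) -> int:
    """Lowest score in a quality string, assuming the '@' (64) offset."""
    return min(ord(c) - 64 for c in quality_string)


if __name__ == "__main__":
    print(" Perr   Phred  Solexa")
    for p in (0.75, 0.6, 0.5, 0.4, 0.25, 0.1):
        print(f"{p:5.2f}  {phred(p):6.2f}  {solexa(p):6.2f}")
```

Running `min_phred` over the quality column of a qseq file is an easy check for whether 'B' (score 2) really is the floor in a given data set.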
