**rs705** · 01-28-2009, 06:41 AM

Are you committed to CASAVA? If not, can you tell me what applications you are interested in?

**swbarnes2** · 01-28-2009, 12:23 PM

Couldn't you use something like Bowtie, which yields a similar kind of output, and bend it into ELAND format?

**apfejes** · 01-28-2009, 01:51 PM

I thought CASAVA was an Illumina product, as is Eland. I don't think you're missing anything - of course they want you to use their products end to end. (= On the other hand, even the WTSS SNP & Exon expression software I wrote handles more than one input format, so I think it's just Illumina trying to bring people back into the Eland fold.

Frankly, there are so many SNP callers out there that until I see some solid reason to switch to CASAVA (and back to Eland), it's not even on my radar.

**bioinfosm** · 01-29-2009, 10:51 AM

I agree with you

but - being quick, one-stop and vendor-supported, with visualization support for investigators, are some reasons in its favor, umm maybe, as I have not had the chance to look yet

**apfejes** · 01-29-2009, 10:55 AM

Well, for people just getting into the game, I'm sure it'll be easy to set up and get running.

That's how Microsoft managed to get 95% of the internet-using population onto Internet Explorer for a while.... (-;

**GRT** · 02-01-2009, 08:40 AM

qseq.txt format

Originally posted by dvh

I've just looked through the just released CASAVA manual. Whilst it would seem to have some new tools for visualising/calling SNPs and RNAseq, it seems totally dependent on ELAND alignments.

We haven't used ELAND since we started using read lengths of 45bp+. We didn't find it very good for >32bp.

Am I missing something here?

david

Also, seq.txt & prb.txt are now "optional" Bustard output, the default being qseq.txt, but there is not much info on this format in the pipeline manual. As we haven't updated the software yet, does anyone have some new qseq.txt files to play with, or information on the q scores used?

**zee** · 02-01-2009, 05:44 PM

For RNAseq there are systems such as ERANGE and FindFeatures (Vancouver SR package).
ERANGE seems quite limited to specific genomes and I'm working with certain genomes that have no reference sequence.
I have not tried FindFeatures.

It would be good to have a generic system to do tag counting in samples given a set of known exon positions and mapping results from alignment to whole genome, mRNA and exon junctions.
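The kind of generic tag counting described above could be sketched roughly as follows. This is only an illustration under stated assumptions: exons are given as `(chrom, start, end, name)` tuples with 1-based inclusive coordinates that do not overlap within a chromosome, each alignment is reduced to a `(chrom, position)` pair, and the names `build_index` and `count_tags` are made up for this example rather than taken from ERANGE, FindFeatures or any other package.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(exons):
    """Group exons by chromosome and sort them by start position.

    exons: iterable of (chrom, start, end, name), 1-based inclusive,
    assumed not to overlap within a chromosome.
    """
    index = defaultdict(list)
    for chrom, start, end, name in exons:
        index[chrom].append((start, end, name))
    for intervals in index.values():
        intervals.sort()
    return index


def count_tags(index, alignments):
    """Count alignments whose position falls inside a known exon.

    alignments: iterable of (chrom, position) pairs.
    Returns a dict mapping exon name to tag count (exons with no tags are absent).
    """
    # Pre-compute the sorted start positions once per chromosome for bisect.
    starts = {chrom: [s for s, _, _ in ivs] for chrom, ivs in index.items()}
    counts = defaultdict(int)
    for chrom, pos in alignments:
        if chrom not in index:
            continue
        # Rightmost exon whose start is <= pos, if any.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0:
            _, end, name = index[chrom][i]
            if pos <= end:
                counts[name] += 1
    return dict(counts)


exons = [("chr1", 100, 200, "e1"), ("chr1", 300, 400, "e2"), ("chr2", 50, 80, "e3")]
reads = [("chr1", 150), ("chr1", 200), ("chr1", 250), ("chr1", 300), ("chr2", 49)]
tag_counts = count_tags(build_index(exons), reads)
```

Reads mapped to mRNA or exon-junction references would first need their coordinates translated back to the genome (or counted against a separate annotation), which this sketch does not attempt.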

**apfejes** · 02-01-2009, 07:25 PM

FindFeatures is a fairly simple program. I don't think anyone outside of the BC Genome Science Centre is using it - although if anyone has the urge to try it, I'm more than happy to provide support.

Anthony

**coxtonyj** · 02-09-2009, 03:56 AM

Originally posted by apfejes

Well, for people just getting into the game, I'm sure it'll be easy to set up and get running.

That's how Microsoft managed to get 95% of the internet-using population onto Internet Explorer for a while.... (-;

Hi apfejes

Disclaimer: I work at Illumina and am one of the developers of CASAVA, but these are my personal opinions.

As I see it, the beauty of sequencing data is that once you've got it into As, Cs, Gs and Ts it becomes a 'commodity item'. I think trying to compete with the combined brainpower of the entire sequencing community by trying to 'lock users in' beyond that stage would be extremely tough, and it's not clear to me that we would gain much by doing so.

CASAVA is more meant to make it easier to process datasets on 'human genome resequencing' scales. A human genome at, say, 30x sequence coverage presents logistical issues beyond those associated with, say, a ChipSeq dataset of a couple of Gbases (and I in no way wish to trivialize those; I know this is already a dauntingly large dataset in many ways). Now that we are not so far away from "1 run (from whatever platform) = 1 genome", we don't want these issues to stand in the way of the science. Ideally, algorithm developers would be able to concentrate on algorithms and not file formats and so forth.

The idea is that 'under the hood' CASAVA handles the necessary sorting, binning and filtering of reads. SNP callers and other downstream applications then access the alignment data they need by making function calls to a library.

The software evolved from the code we used for our Yoruba genome analysis and can be used as a standalone genome analysis tool. The currently released version only includes the SNP calling module but internally we have modules for e.g. short indel and structural variant detection that we are looking to move towards release. CASAVA is also used as a backend to provide input data for the Genome Studio software we are releasing.

I would actually be very happy if people were to use CASAVA to process MAQ and/or BowTie data, and I imagine it would be quite straightforward to write a parser; lack of time is the only reason we haven't looked at this ourselves.

Cheers

Tony

**apfejes** · 02-09-2009, 11:47 AM

Hi Tony,

Thanks for the reply - I hadn't meant to imply that Illumina was working towards some grand evil plan to take over the sequence analysis space, as Microsoft has done in the past with the Windows desktop - only that Illumina is providing a tool the way that Microsoft did, where it will now be easier to use the one that comes with the tool "out of the box" than to move on to something else. (And that's not necessarily a bad thing.)

As for it not having parsers because you haven't had time to write them, I certainly understand the phenomenon - I've run into it several times myself. If the software were open source, or the source code were publicly available, others might be willing to contribute those missing parts, which would be an option for allowing other aligners to be used. (I suspect that's not in Illumina's best interest, however, so I'm not really expecting to see that.)

In any case, I think the major issue I have is that I have only heard about CASAVA second hand, in meetings and otherwise, so I'm likely missing key information. Perhaps you can point us to some literature on the web that would be able to fill in the missing pieces for the rest of us. I'd certainly appreciate reading more than just marketing pieces - which I haven't yet come across. Is there something I've missed out there?

Anthony

**coxtonyj** · 02-13-2009, 05:29 AM

Hi apfejes

Thanks for the reply, you make several good points. At the moment the software is available on the same basis as our existing 'analysis pipeline' software package - i.e. instrument owners can download it free, including access to the source code. Unfortunately (much as I might like to) it's not for me to comment on whether our policy on that might change in the future.

We've presented posters on it at a couple of conferences recently and there's a sizeable manual that comes with it. As it's a new venture I think we're adopting somewhat of a softly softly approach to releasing it - some people will try it whether you publicize it or not, and that gives us feedback that we can add to the ideas we already have about how it can evolve to best meet users' needs. I think you're right though that a tech note aimed at the kind of folks who read this board would be a good idea.

We're not really proprietary about which aligners or other tools people use - it's their data after all. Personally I see things moving towards more of a decoupling between alignment tools and the downstream tools (SNP callers and so forth) that use alignments. I think the SAMTools project is a very positive step in that direction; it seems to me it has many of the same aims as CASAVA.

Cheers

Tony

**sparks** · 02-23-2009, 07:15 PM

Hi Tony,
I've been given a couple of qseq.txt files to align for clients and the format looks pretty simple except for the quality values. I'm seeing a lot of B's in the quality string and it looks like this is the lowest quality value. In earlier _sequence.txt files, quality values were in the form log(p/(1-p)) + '@' and codes went as low as ';'.
These qseq.txt files look like you may be using Phred-type log(p) + '@'. Any chance you could enlighten us?

Thanks, Colin
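For readers comparing the two encodings Colin mentions, here is a minimal Python sketch. It assumes the usual definitions, which the post abbreviates: the older Solexa score is -10*log10(p/(1-p)), the Phred score is -10*log10(p), and both are stored as a character offset from '@' (ASCII 64). The function names are invented for this illustration.

```python
import math

OFFSET = ord('@')  # 64, the character offset used by both encodings


def solexa_char_to_error_prob(ch):
    """Error probability implied by an old-style (Solexa-scaled) quality character."""
    q = ord(ch) - OFFSET
    odds = 10 ** (-q / 10)      # p / (1 - p)
    return odds / (1 + odds)    # solve the odds for p


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


lowest_old_code = ord(';') - OFFSET            # the ';' Colin saw corresponds to -5
p_at_lowest = solexa_char_to_error_prob(';')   # roughly 0.76
phred_at_lowest = solexa_to_phred(lowest_old_code)
```

The two scales agree closely for high-quality bases and only diverge when the error probability is large, which is why the difference mostly shows up at the low end of the quality string.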

**coxtonyj** · 02-24-2009, 01:25 AM

Hi Colin

You have it spot on, they are now in Phred format. Just to state it fully for the benefit of others: ASCII='@'+10*log10(1/p), p being the estimated probability of the base being in error. This change was made as of Pipeline 1.3.

Cheers

Tony
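Tony's formula, ASCII = '@' + 10*log10(1/p), can be sketched in Python as below. Rounding to the nearest integer is an assumption made here (the thread does not say how the pipeline rounds), and the function names are made up for illustration.

```python
import math


def phred_char(p):
    """Encode an error probability p (0 < p <= 1) as a Pipeline 1.3 style quality character.

    Assumption: the score is rounded to the nearest integer before encoding.
    """
    q = round(10 * math.log10(1 / p))
    return chr(ord('@') + q)


def decode_quality_string(qual):
    """Decode a qseq.txt quality string into a list of integer Phred scores."""
    return [ord(c) - ord('@') for c in qual]


scores = decode_quality_string("BTh")
```

Under this encoding a 1% error probability becomes 'T' (Phred 20) and a 0.01% error probability becomes 'h' (Phred 40).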

**sparks** · 02-24-2009, 07:26 AM

Hi Tony,
Thanks for that. I'm sure you are right, though some Illumina documentation being sent out with export files still talks about -5 being a valid quality value, so you guys should check your documentation.
I've also noticed in the qseq files I have that the lowest code is a B, which translates to a Phred score of 2. This happens even for bases called as '.'. If Perr was 0.75 then Phred would be 1.24, so it looks like you round up to 2. This might be of interest to people who are using qualities in alignment and in SNP calling. I did like the previous Solexa scale as it gave a finer resolution for higher Perr values.

Thanks again, Colin
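Colin's point about resolution at high error probabilities can be checked numerically. The sketch below assumes the standard definitions of the two scales (Phred = -10*log10(p), Solexa = -10*log10(p/(1-p))); the helper `spread` is a made-up name that measures how many score units each scale devotes to a range of error probabilities.

```python
import math


def phred(p):
    """Phred-scaled quality for error probability p."""
    return -10 * math.log10(p)


def solexa(p):
    """Solexa-scaled quality for error probability p."""
    return -10 * math.log10(p / (1 - p))


def spread(scale, lo=0.25, hi=0.75):
    """Score units a scale spends on error probabilities between lo and hi."""
    return scale(lo) - scale(hi)


# Print both scores side by side for a few poor-quality bases.
for p in (0.75, 0.6, 0.5, 0.4, 0.25):
    print(f"Perr={p:<5} phred={phred(p):6.2f} solexa={solexa(p):6.2f}")

lowest_qseq_score = ord('B') - ord('@')  # the 'B' floor seen in qseq.txt files
```

Between Perr = 0.25 and Perr = 0.75 the Solexa scale covers twice as many score units as the Phred scale, so after rounding to integers it distinguishes more levels among poor-quality bases.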

CASAVA, Pipeline 1.3
