SEQanswers

01-20-2011, 11:44 AM   #21
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by selen View Post
Semyon,

Within the sample folder, the name of each fastq file provides the sample, index, lane and read information. What about the last three digits (001, 002, ...)? Do they represent repeated analyses of the same data?

No, the last digits represent the splitting up of a large file into smaller files. Certain cluster environments will be unhappy if individual files go beyond a certain size. There will be a configurable parameter to control the maximum number of entries in any one fastq file.
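The chunking described above can be sketched as follows. This is a minimal illustration, not the CASAVA implementation: the hypothetical `split_fastq` helper, its `max_records` parameter and the `_001`, `_002` naming are assumptions modeled on the file names discussed in this thread, and compression and writing to disk are left out.

```python
from typing import Iterable, List, Tuple

# One FASTQ record: (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def split_fastq(records: Iterable[Record], prefix: str,
                max_records: int) -> List[Tuple[str, List[Record]]]:
    """Split records into chunks of at most `max_records` entries.

    Chunks are numbered from 001, mirroring the _001, _002 suffixes in the
    CASAVA 1.8 file names (hypothetical helper, not part of CASAVA).
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    chunks: List[Tuple[str, List[Record]]] = []
    current: List[Record] = []
    for rec in records:
        current.append(rec)
        if len(current) == max_records:
            chunks.append((f"{prefix}_{len(chunks) + 1:03d}.fastq", current))
            current = []
    if current:  # leftover records form a final, smaller chunk
        chunks.append((f"{prefix}_{len(chunks) + 1:03d}.fastq", current))
    return chunks


reads = [(f"@read{i}", "ACGT", "+", "IIII") for i in range(5)]
for name, chunk in split_fastq(reads, "Sample1_ACAGTG_L001_R1", max_records=2):
    print(name, len(chunk))
```

With five reads and a limit of two, this yields three chunks of sizes 2, 2 and 1.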

Thanks,
Semyon
01-20-2011, 02:15 PM   #22
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by Auction View Post
Semyon

It's great to hear the news, and I'm very concerned about the speed of the bcl converter in CASAVA 1.8. How many hours do we need to get compressed FASTQ for a typical HiSeq 2000 run (with and without multiplexing)? And what about its parallelization support? Thanks.

Ying

Hi Ying,

There will be parallelization support. Exact timings are difficult to provide because they depend on the compute environment. It is also important to know whether you are CPU or I/O bound. We even see large variation depending on cluster utilization. As a very rough approximation, I would estimate 20 CPU hours for a standard HiSeq run, so maybe 2.5 hours if you use 8 CPUs.
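The 2.5-hour figure is total CPU time divided by the number of CPUs, which only holds if the work scales perfectly. A minimal sketch of that arithmetic, with a hypothetical `efficiency` factor (an assumption, not a CASAVA setting) to reflect I/O-bound jobs or a busy cluster:

```python
def estimated_wall_hours(cpu_hours: float, n_cpus: int,
                         efficiency: float = 1.0) -> float:
    """Wall-clock estimate = CPU hours / (CPUs * parallel efficiency).

    efficiency = 1.0 assumes perfect scaling; I/O-bound jobs or a busy
    cluster would push it lower (hypothetical parameter).
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return cpu_hours / (n_cpus * efficiency)


print(estimated_wall_hours(20, 8))       # 2.5
print(estimated_wall_hours(20, 8, 0.5))  # 5.0
```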

thanks,
Semyon
01-21-2011, 07:57 AM   #23
Auction
Member
Location: california
Join Date: Jul 2009
Posts: 24

Quote:
Originally Posted by skruglyak View Post
Hi Ying,

There will be parallelization support. Exact timings are difficult to provide because they depend on the compute environment. It is also important to know whether you are CPU or I/O bound. We even see large variation depending on cluster utilization. As a very rough approximation, I would estimate 20 CPU hours for a standard HiSeq run, so maybe 2.5 hours if you use 8 CPUs.

thanks,
Semyon
Semyon

Thank you for the information. How much additional time (as a percentage) will demultiplexing add in the new version? In addition, in CASAVA 1.7 "BAM output from RNA-Seq builds does not contain information on how alignments span exons. Such reads are represented by a separate BAM record for each partial exon alignment." This makes it difficult to visualize such reads in IGV. Will CASAVA 1.8 support CIGAR strings like "27M140N3M", as in TopHat, so that users can easily detect reads that span exons? Thanks.

Ying
01-21-2011, 10:14 AM   #24
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by kmcarr View Post
I have a concern about #2. Currently the illumina2srf tool uses the qseq files as input to generate the .srf files, which are required for submission of NGS sequencing data to the NCBI or EBI SRAs. Will it still be possible to generate qseqs, or would it be possible for the CASAVA team to work with the developers of the Sequence Read Archive (SRA) toolkit to allow it to work directly from the .bcl files?

We will be working with the archives to make sure that the submission process is smooth. I will post more details as they become available. The current thinking is that BAM files will become the submission format and there will no longer be a need to go from qseq to srf. Submission directly from .bcl would not work for a variety of reasons.

Thanks,
Semyon
01-21-2011, 01:50 PM   #25
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by Auction View Post
Semyon

Thank you for the information. How much additional time (as a percentage) will demultiplexing add in the new version? In addition, in CASAVA 1.7 "BAM output from RNA-Seq builds does not contain information on how alignments span exons. Such reads are represented by a separate BAM record for each partial exon alignment." This makes it difficult to visualize such reads in IGV. Will CASAVA 1.8 support CIGAR strings like "27M140N3M", as in TopHat, so that users can easily detect reads that span exons? Thanks.

Ying

Hi Ying,
I believe that demultiplexing time will be "in the noise", but I will post once we have better numbers. Regarding RNA-Seq, the new version should meet your needs. BAM files produced in CASAVA 1.8 RNA-Seq builds use the CIGAR skip character ("N") to represent intron-spanning reads, as in TopHat's SAM output. These files can be visualized in the Broad IGV without modification.
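To illustrate how the `N` operator lets a viewer rebuild exon blocks from a single BAM record, here is a minimal CIGAR-parsing sketch. The `aligned_blocks` and `spans_intron` helpers are hypothetical: they take the read's leftmost aligned reference position, return half-open reference intervals, and split a block only at `N` (deletions `D` stay inside a block).

```python
import re
from typing import List, Tuple

# Validates a whole CIGAR string and extracts (length, operator) pairs.
CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return half-open reference intervals covered by the alignment.

    An 'N' (skipped region, e.g. an intron) closes the current block and
    starts a new one after the gap; 'D' deletions stay inside a block.
    """
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    blocks: List[Tuple[int, int]] = []
    ref = pos          # current reference coordinate
    block_start = pos  # start of the block being built
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            if ref > block_start:
                blocks.append((block_start, ref))
            ref += length
            block_start = ref
        elif op in "M=XD":
            ref += length
        # I, S, H and P do not consume reference positions.
    if ref > block_start:
        blocks.append((block_start, ref))
    return blocks


def spans_intron(cigar: str) -> bool:
    """True if the alignment contains at least one skipped region."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


print(aligned_blocks(1000, "27M140N3M"))  # [(1000, 1027), (1167, 1170)]
```

A read with CIGAR `27M140N3M` is thus one record covering two exon blocks separated by a 140-base intron, rather than two separate partial-exon records.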

Thanks,
Semyon
01-22-2011, 12:18 AM   #26
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

Quote:
Originally Posted by skruglyak View Post
[...]
2. The converter will produce compressed FASTQ files rather than qseq files.
[...]
Does this mean that I can finally use CASAVA 1.8 with just my fastq files, without having the qseqs and some other files? People sometimes have no access to the original qseq files and, as a consequence, were not able to use CASAVA (correct me if I am wrong).

cheers,
Sven
01-24-2011, 08:30 AM   #27
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by sklages View Post
Does this mean that I can finally use CASAVA 1.8 with just my fastq files, without having the qseqs and some other files? People sometimes have no access to the original qseq files and, as a consequence, were not able to use CASAVA (correct me if I am wrong).

cheers,
Sven
Hi Sven,

You certainly will not need qseq files. If you have nothing but fastq, I guess you could use ELAND in stand-alone mode, but you would be missing the statistics. To really use CASAVA 1.8, you would also need the fastq files to be in a simple directory structure described in the document and you would need some config files. Of course, if you just start with our new bcl converter (to be distributed with 1.8), the directory structure, the fastq, and the config files will all be generated.

Thanks,
Semyon
01-24-2011, 10:08 PM   #28
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

Quote:
Originally Posted by skruglyak View Post
Hi Sven,

You certainly will not need qseq files. If you have nothing but fastq, I guess you could use ELAND in stand-alone mode, but you would be missing the statistics. To really use CASAVA 1.8, you would also need the fastq files to be in a simple directory structure described in the document and you would need some config files. Of course, if you just start with our new bcl converter (to be distributed with 1.8), the directory structure, the fastq, and the config files will all be generated.

Thanks,
Semyon
Hi Semyon,

For new GAII/HiSeq 2000 runs I will surely use the software as intended, but, as usual, other people read about CASAVA's capabilities/performance and want their datasets to be mapped and, very importantly, variant-called again. If the datasets are "old", we no longer keep them online; all that is left in that case is the users' FASTQ files. That's why I am asking. The FASTQ files themselves should be enough for mapping and SNP calling!?

cheers,
Sven
01-25-2011, 05:01 AM   #29
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,542

Quote:
Originally Posted by sklages View Post
Hi Semyon,

For new GAII/HiSeq 2000 runs I will surely use the software as intended, but, as usual, other people read about CASAVA's capabilities/performance and want their datasets to be mapped and, very importantly, variant-called again. If the datasets are "old", we no longer keep them online; all that is left in that case is the users' FASTQ files. That's why I am asking. The FASTQ files themselves should be enough for mapping and SNP calling!?

cheers,
Sven
If CASAVA 1.8 will take plain FASTQ for input for mapping and SNP calling (to be confirmed), I would expect you'd have to convert your old Solexa/Illumina 1.3+ FASTQ files into Sanger FASTQ files first.
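The conversion differs for the two older encodings. Illumina 1.3+ files already hold Phred scores, just with ASCII offset 64 instead of Sanger's 33, so a constant shift suffices; pre-1.3 Solexa files hold log-odds scores (also offset 64) that need a nonlinear mapping to Phred. A minimal pure-Python sketch follows; the function names are hypothetical. In practice Biopython's `Bio.SeqIO.convert` with the `"fastq-illumina"` or `"fastq-solexa"` input format and `"fastq-sanger"` output format does this for whole files.

```python
import math


def illumina13_to_sanger(qual: str) -> str:
    """Illumina 1.3+ (Phred, ASCII offset 64) -> Sanger (Phred, ASCII offset 33).

    The scores are already Phred, so only the offset changes (64 - 33 = 31).
    """
    if any(ord(c) < 64 for c in qual):
        raise ValueError("character below '@' is not valid Illumina 1.3+ quality")
    return "".join(chr(ord(c) - 31) for c in qual)


def solexa_to_sanger(qual: str) -> str:
    """Old Solexa (log-odds scores, ASCII offset 64) -> Sanger (Phred, offset 33).

    Solexa scores are not Phred scores, so each one is mapped with
    Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1) and rounded.
    """
    out = []
    for c in qual:
        q_solexa = ord(c) - 64
        if q_solexa < -5:
            raise ValueError("Solexa scores below -5 are not expected")
        q_phred = 10 * math.log10(10 ** (q_solexa / 10) + 1)
        out.append(chr(round(q_phred) + 33))
    return "".join(out)


print(illumina13_to_sanger("h@"))  # I!
print(solexa_to_sanger(";@h"))     # "$I
```

The two scales agree at high quality (Solexa 40 maps to Phred 40) but diverge at the low end (Solexa 0 maps to Phred 3), which is why a plain offset shift is not enough for Solexa files.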
01-25-2011, 05:32 AM   #30
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

Quote:
Originally Posted by maubp View Post
If CASAVA 1.8 will take plain FASTQ for input for mapping and SNP calling (to be confirmed), I would expect you'd have to convert your old Solexa/Illumina 1.3+ FASTQ files into Sanger FASTQ files first.
Well, that's OK, just (another) conversion. If I needed some run-specific files, that would make things more complicated (again).
01-25-2011, 07:33 AM   #31
jeny
Member
Location: france
Join Date: Mar 2010
Posts: 16

In my view these are good changes, and I am happy to learn about this new version.
But why doesn't Illumina inform its users about these changes (simply by email)? Information is poorly communicated, and I hope this will change too.
01-25-2011, 07:40 AM   #32
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,542

Quote:
Originally Posted by jeny View Post
In my view these are good changes, and I am happy to learn about this new version.
But why doesn't Illumina inform its users about these changes (simply by email)?
I'm sure they will be telling registered customers directly, but that may take a while to reach end users.
Quote:
Originally Posted by jeny View Post
Information is poorly communicated, and I hope this will change too.
As an outsider, I think things are getting better: posting this announcement here on seqanswers.com is a positive step. It's a good way to reach a lot of developers and end users.
01-25-2011, 04:38 PM   #33
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by sklages View Post
Hi Semyon,

For new GAII/HiSeq 2000 runs I will surely use the software as intended, but, as usual, other people read about CASAVA's capabilities/performance and want their datasets to be mapped and, very importantly, variant-called again. If the datasets are "old", we no longer keep them online; all that is left in that case is the users' FASTQ files. That's why I am asking. The FASTQ files themselves should be enough for mapping and SNP calling!?

cheers,
Sven
I see your point, but there are still some run-specific files that are needed. Specifically, the Bustard summary file (BustardSummary.xml) is used to generate the alignment statistics. A config file is needed to provide the alignment parameters, but that one is easy to make. I understand the desire to go straight from FASTQ for older data sets and will try to think of an easy way to do this.

Thanks,
Semyon
02-05-2011, 10:36 PM   #34
rajasereddy
Junior Member
Location: India
Join Date: Oct 2010
Posts: 7
nice

It is really nice that Illumina releases updates regularly.

Last edited by rajasereddy; 02-05-2011 at 10:42 PM.
02-05-2011, 10:37 PM   #35
rajasereddy
Junior Member
Location: India
Join Date: Oct 2010
Posts: 7
BustardSummary.xml

Dear skruglyak
After the upgrade, has the path of the BustardSummary.xml file changed? Since upgrading we are unable to locate the file, although we are able to get all the data. What could be the reason?

Thanks in advance
rajase
02-06-2011, 07:53 PM   #36
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by rajasereddy View Post
Dear skruglyak
After the upgrade, has the path of the BustardSummary.xml file changed? Since upgrading we are unable to locate the file, although we are able to get all the data. What could be the reason?

Thanks in advance
rajase
Hi Rajase,

Some file locations will be different due to the new directory structure. I am a little confused by your post because we have not yet released the software, so I am not sure how you could have upgraded. Feel free to send me a message directly with your version numbers and details of the situation, and I will try to help.

thanks,

Semyon
02-21-2011, 11:23 PM   #37
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

.. well, that raises one simple question: when will 1.8 be released?

cheers, Sven
02-22-2011, 08:13 AM   #38
skruglyak
Member
Location: San Diego
Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by sklages View Post
.. well, that raises one simple question: when will 1.8 be released?

cheers, Sven

Hi Sven,

We plan to release to several early access sites next week. We will use the release in our services group, gather feedback from the early access sites, and do the wide release once that feedback is collected and addressed.

Thanks,
Semyon
02-23-2011, 05:57 AM   #39
aparna
Member
Location: USA
Join Date: Feb 2009
Posts: 15

Hi Semyon,

Are you changing anything in the demultiplexing process? The directories created during demultiplexing are pretty confusing to deal with.
Thx.
02-23-2011, 06:34 AM   #40
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

Quote:
Originally Posted by aparna View Post
Hi Semyon,
Are you changing anything in the demultiplexing process? The directories created during demultiplexing are pretty confusing to deal with.
Thx.
That's perfectly true! Good point ...
Well, OK, a little script can compile the data for you, but it's still unnecessarily confusing :-)
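A sketch of such a "little script", assuming CASAVA 1.8-style file names of the form `<sample>_<index>_L<lane>_R<read>_<chunk>.fastq.gz` (with `NoIndex` for non-multiplexed lanes). Both the naming pattern and the hypothetical `group_chunks` helper are assumptions, so check them against your own output directories. It takes bare file names (apply `os.path.basename` to full paths first) and groups the chunk files belonging to the same sample, index, lane and read, in chunk order.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed CASAVA 1.8-style name: <sample>_<index>_L<lane>_R<read>_<chunk>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN]+|NoIndex)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d)_(?P<chunk>\d{3})\.fastq\.gz$"
)

Key = Tuple[str, str, int, int]  # (sample, index, lane, read)


def group_chunks(filenames: Iterable[str]) -> Dict[Key, List[str]]:
    """Group chunked FASTQ file names by sample/index/lane/read, in chunk order."""
    groups: Dict[Key, List[Tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        m = FASTQ_NAME.match(name)
        if m is None:
            continue  # not a chunked FASTQ file name; ignore it
        key = (m["sample"], m["index"], int(m["lane"]), int(m["read"]))
        groups[key].append((int(m["chunk"]), name))
    return {key: [name for _, name in sorted(chunks)]
            for key, chunks in groups.items()}


files = [
    "Sample1_ACAGTG_L001_R1_002.fastq.gz",
    "Sample1_ACAGTG_L001_R1_001.fastq.gz",
    "Sample1_ACAGTG_L001_R2_001.fastq.gz",
    "SampleSheet.csv",
]
for key, names in group_chunks(files).items():
    print(key, names)
```

Because concatenated gzip files are themselves valid gzip streams, each group could then be merged with a plain `cat` in the listed order.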

Sven

Tags
casava, illumina, secondary analysis

All times are GMT -8.