
**lewewoo** · 11-11-2011, 09:29 PM

Thanks guys,
This is our first time converting .bcl to fastq and we don't have much experience, but we plan to improve our computing capacity soon. Here is a short update on what we have and what I am doing; maybe someone has better ideas about this process.
We struggled a lot with installing and running CASAVA before getting this far. We installed CentOS Linux using Parallels on a Mac Pro. The Mac Pro is pretty powerful: 2 x 2.93 GHz 6-core Intel Xeon, 24 GB 1333 MHz DDR3, and a 5 TB drive; however, the virtual machine only allows us to use 8 cores and 8 GB of memory. We have data from 209 cycles across 8 lanes from a HiSeq machine, and the computer has been running for about 40 hours. We don't want to stop it and hope it can finish the work by next Monday. After this run, we would like to do the following things to improve our computing capacity; please have a look and give some advice on their feasibility:

1) Wipe Mac OS and install CentOS. Big concerns here: has anybody been successful with this? Can you share some experience?
2) If we purchase a stronger server, any suggestions?
3) To improve I/O speed, should we connect the hard drives with Fibre Channel?

thanks for any suggestions and comments!

**GenoMax** · 11-14-2011, 07:09 AM

If this is a one-time analysis then it is best to be patient and let the analysis take its course. Hopefully you will not hit a limit (likely culprit will be the RAM available in your virtual machine) somewhere along the way.

If you are planning to do this regularly then you could consider doing one or more of the following.

Originally posted by lewewoo

Thanks guys,

1) Wipe Mac OS and install CentOS. Big concerns here: has anybody been successful with this? Can you share some experience?
2) If we purchase a stronger server, any suggestions?
3) To improve I/O speed, should we connect the hard drives with Fibre Channel?

thanks for any suggestions and comments!

**lewewoo** · 11-14-2011, 03:31 PM

ultimate solution: super power servers!

**lewewoo** · 11-14-2011, 07:43 PM

CASAVA extracted the fastq files; however, there is a folder called:
Undetermined_indices
It also contains fastq files. Basically, how many reads should end up in this folder? How big should this folder be? Is there any quality control for this folder?
thanks!
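
One way to answer the "how many reads, how big" question for a specific run is to count the records directly. The following is a minimal sketch, not part of CASAVA: the function names are hypothetical, and it assumes standard 4-line FASTQ records stored as plain `.fastq` or gzip-compressed `.fastq.gz` files somewhere under the given folder.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in one FASTQ file, assuming exactly 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def summarize_fastq_dir(root):
    """Map each FASTQ file under root to (read count, file size in bytes)."""
    summary = {}
    for path in sorted(Path(root).rglob("*.fastq*")):
        if path.is_file():
            summary[str(path)] = (count_fastq_reads(path), path.stat().st_size)
    return summary


if __name__ == "__main__":
    # "Undetermined_indices" is the folder name from this thread; adjust the
    # path to wherever it sits in your own output directory.
    for name, (reads, size) in summarize_fastq_dir("Undetermined_indices").items():
        print(f"{name}\t{reads} reads\t{size} bytes")
```

Running the same summary on the per-sample project folders gives the totals needed to see what fraction of the run was left undetermined.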

**Prosuite** · 11-14-2011, 08:32 PM

Here are some scenarios mentioned in the CASAVA manual regarding Undetermined_indices:
-- In addition to generating FASTQ files, CASAVA uses a user-created sample sheet to divide the run output in projects and samples, and stores these in separate directories. If no sample sheet is provided, all samples will be put in the Undetermined_Indices directory by lane, and not demultiplexed.
-- The Undetermined_indices directory contains the reads with an unresolved or erroneous index.
-- If the majority of reads end up in the 'Undetermined_indices' folder, check the --use-bases-mask parameter syntax and the length of the index in the sample sheet. It may be that you need to set the --use-bases-mask option to the length of the index in the sample sheet + the character 'n' to account for phasing. Note that you will not be able to see which indices have been placed in the 'Undetermined_indices' folder.
-- Unless otherwise specified in the sample sheet, samples without index will end up in the project folder Undetermined_indices, and in a sample folder named after the lane (e.g. Sample_lane1).
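
To make the "length of the index + the character 'n'" advice concrete, here is a small sketch that assembles a mask string of the form `Y<read1>,I<index>n,Y<read2>`. The helper name is hypothetical, and the 101 + 7 + 101 split of the 209 cycles is only an assumption (the thread gives the total, not the split), so check the result against the run's actual read structure and the CASAVA manual before using it.

```python
def build_bases_mask(read1_cycles, index_cycles, read2_cycles, index_length):
    """Build a --use-bases-mask style string.

    index_cycles is how many cycles were sequenced for the index read;
    index_length is the index length given in the sample sheet. Any extra
    index cycles are masked with 'n', as the manual excerpt suggests.
    """
    if index_length > index_cycles:
        raise ValueError("index in sample sheet is longer than the index read")
    index_part = f"I{index_length}" + "n" * (index_cycles - index_length)
    parts = [f"Y{read1_cycles}", index_part]
    if read2_cycles:  # single-read runs have no second read
        parts.append(f"Y{read2_cycles}")
    return ",".join(parts)


if __name__ == "__main__":
    # Assumed layout: 101 + 7 + 101 = 209 cycles with a 6-base index.
    print(build_bases_mask(101, 7, 101, 6))
```

The point of the helper is only that the number after `I` should match the index length in the sample sheet, with any remaining index cycles masked out.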

**lewewoo** · 11-14-2011, 10:00 PM

Thanks for the advice! I read about this in the Illumina manual, and it would also be great if someone could share in-the-field experience with this...

Fortunately, the majority of the reads were assigned; however, I noticed that all the R2 reads have bad quality: per-base GC content is inconsistent with theoretical predictions, and per-base N content is beyond the warning level. The QC was done with FastQC, and as they say, N content may be caused by base calling. Since our base calling was done by RTA, I am wondering whether running OLB could improve it? Or was this low quality due to the detection step of R2?

Note: R2 means the second read of the paired-end reads; the sequencing order is said to be R1, index, R2...

Thanks for sharing any information and experience!
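
For anyone wanting to check the per-base N content outside FastQC, the calculation can be sketched as below. This is an illustration of the general idea, not FastQC's own code; the function names are hypothetical and the input is assumed to be standard 4-line FASTQ.

```python
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip()


def per_base_n_percent(sequences):
    """Return the percentage of N calls at each read position (0-based)."""
    n_counts = Counter()
    totals = Counter()
    for seq in sequences:
        for pos, base in enumerate(seq):
            totals[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return [100.0 * n_counts[pos] / totals[pos] for pos in range(len(totals))]


if __name__ == "__main__":
    # Tiny in-memory example; for real data, open the R2 FASTQ file instead.
    print(per_base_n_percent(["ACGN", "NCGT"]))
```

Comparing the R1 and R2 profiles position by position shows whether the N calls are spread along the read or concentrated in particular cycles.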

**GenoMax** · 11-15-2011, 04:35 AM

Unless you have the ".cif" files available for this run, you are not going to be able to run OLB. Do you have a specific reason to run OLB (excluding certain tiles or lanes, using a specific lane as a control for base calling)? Otherwise there is likely to be no added benefit.

Originally posted by lewewoo

since our base callings are RTA, I am thinking if we do OLB maybe can improve it? or this low quality was due to the detection step of R2?

Thanks for sharing any information and experience!

**lewewoo** · 11-15-2011, 07:10 AM

So does that mean this poor quality is not caused by RTA? I have this concern because the Illumina manual says RTA may cause some errors.
Yes, I have all the .cif files, cycles, lanes... everything...
I will investigate more the data quality today...
Thanks!

**GenoMax** · 11-16-2011, 04:39 AM

You should consider contacting Illumina tech support if you think there is a specific problem with read 2 from this run. They should be able to set up a remote connection to the machine that generated this data and look into this directly.

Are the basecall plots normal looking for read 2?

Originally posted by lewewoo

So does that mean this poor quality is not caused by RTA? I have this concern because the Illumina manual says RTA may cause some errors.
Yes, I have all the .cif files, cycles, lanes... everything...
I will investigate more the data quality today...
Thanks!

**weasteam** · 01-02-2012, 09:58 AM

Originally posted by lewewoo

1) Wipe Mac OS and install CentOS. Big concerns here: has anybody been successful with this? Can you share some experience?
2) If we purchase a stronger server, any suggestions?
3) To improve I/O speed, should we connect the hard drives with Fibre Channel?

thanks for any suggestions and comments!

I installed Ubuntu on my Mac, and it works very well.
