
  • #16
    Thanks guys,
    This is our first time converting .bcl to FASTQ, so we do not have much experience, but we plan to improve our computing capacity soon. Here is a short update on what we have and what I am doing; maybe someone has better ideas about this process.
    We struggled a lot with installing and running CASAVA before getting it to work. We installed Linux CentOS using Parallels on a Mac Pro. The Mac Pro is pretty powerful: 2 x 2.93 GHz 6-Core Intel Xeon, 24 GB 1333 MHz DDR3, and a 5 TB drive; however, the virtual machine only lets us use 8 cores and 8 GB of memory. We have data from 209 cycles across 8 lanes from a HiSeq machine, and the computer has been running for about 40 hours. We don't want to stop it and hope it can finish the work by next Monday. After this run, we would like to do the following to improve our computing capacity; please have a look and comment on feasibility:

    1) Wipe Mac OS and install CentOS. This is a big concern; has anybody done this successfully and can share some experience?
    2) If we purchase a stronger server, any suggestions?
    3) To improve I/O speed, should we connect the hard drives over Fibre Channel?

    Thanks for any suggestions and comments!
    Last edited by lewewoo; 11-11-2011, 09:32 PM.



    • #17
      If this is a one-time analysis then it is best to be patient and let the analysis take its course. Hopefully you will not hit a limit (likely culprit will be the RAM available in your virtual machine) somewhere along the way.

      If you are planning to do this regularly then you could consider doing one or more of the following.


      Originally posted by lewewoo
      Thanks guys,

      1) Wipe Mac OS and install CentOS. This is a big concern; has anybody done this successfully and can share some experience?
      2) If we purchase a stronger server, any suggestions?
      3) To improve I/O speed, should we connect the hard drives over Fibre Channel?

      Thanks for any suggestions and comments!



      • #18
        Ultimate solution: super-powered servers!



        • #19
          CASAVA extracted the FASTQ files; however, there is also a folder called:
          Undetermined_indices
          It contains FASTQ files as well. How many reads should I expect in this folder, and how large should it be? Is there any quality control for this folder?
          Thanks!
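
One way to answer the read-count question directly is to count the records in each FASTQ file under the folder. Below is a minimal Python sketch, not a CASAVA feature: it assumes standard four-line FASTQ records and files named `*.fastq` or `*.fastq.gz`. The function names `count_fastq_reads` and `count_reads_in_dir` are made up for illustration.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in one FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


def count_reads_in_dir(directory):
    """Return {relative file path: read count} for every FASTQ file under directory."""
    root = Path(directory)
    counts = {}
    for path in sorted(root.rglob("*.fastq*")):
        if path.is_file():
            counts[str(path.relative_to(root))] = count_fastq_reads(path)
    return counts
```

Calling `count_reads_in_dir("Undetermined_indices")` and comparing the total with the same count for the project folders gives a rough idea of what fraction of reads could not be assigned to an index.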



          • #20
            Here are some scenarios mentioned in the CASAVA manual regarding Undetermined_indices:
            --In addition to generating FASTQ files, CASAVA uses a user-created sample sheet to
            divide the run output in projects and samples, and stores these in separate directories. If
            no sample sheet is provided, all samples will be put in the Undetermined_Indices
            directory by lane, and not demultiplexed.
            --The Undetermined_indices directory contains the reads with an unresolved or
            erroneous index.
            --If the majority of reads end up in the 'Undetermined_indices' folder, check
            the --use-bases-mask parameter syntax and the length of the index in the
            sample sheet. It may be that you need to set the --use-bases-mask option to
            the length of the index in the sample sheet + the character 'n' to account for
            phasing. Note that you will not be able to see which indices have been placed
            in the 'Undetermined_indices' folder
            --Unless otherwise specified in the sample sheet, samples without index will end up in
            the project folder Undetermined_indices, and in a sample folder named after the lane
            (e.g. Sample_lane1).
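
To make the --use-bases-mask advice above concrete, here is a small Python sketch that assembles a mask string from the read lengths and the index length in the sample sheet. The helper `build_bases_mask` is hypothetical, and the mask syntax it emits (Y = use, I = index, n = ignore, with a number repeating the preceding character) is how I understand the CASAVA convention, so please check it against the manual for your version. The 101 + 7 + 101 split of a 209-cycle run in the usage note is also only an assumption.

```python
def build_bases_mask(read1_cycles, index_cycles, read2_cycles, index_length):
    """Build a mask such as 'Y101,I6n,Y101'.

    index_cycles is the number of cycles sequenced for the index read;
    index_length is the index length given in the sample sheet.
    Extra index cycles are masked with 'n'. Pass read2_cycles=0 for single reads.
    """
    if index_length > index_cycles:
        raise ValueError("index in sample sheet is longer than the index read")
    parts = ["Y%d" % read1_cycles]
    # Use the sample-sheet index length, then ignore any remaining index cycles.
    parts.append("I%d" % index_length + "n" * (index_cycles - index_length))
    if read2_cycles:
        parts.append("Y%d" % read2_cycles)
    return ",".join(parts)
```

For example, `build_bases_mask(101, 7, 101, 6)` returns `Y101,I6n,Y101`, which matches the "length of the index + the character 'n'" pattern described in the manual excerpt.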



            • #21
              Thanks for the advice! I read about this in the Illumina manual, and it would also be great if someone could share field experience with this...

              Fortunately, the majority of the reads were assigned; however, I noticed that all the R2 reads have poor quality: the per-base GC content is inconsistent with theoretical predictions, and the per-base N content is beyond the warning level. The QC was done with FastQC, and according to its documentation the N content may be caused by base calling. Since our base calls come from RTA, I wonder whether running OLB could improve them, or whether the low quality is due to the detection step of R2?

              Note: R2 means the second read of the paired-end reads; the sequencing order is said to be R1, index, R2.

              Thanks for sharing any information and experience!
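
For anyone who wants to check the per-base N content outside FastQC, a rough Python sketch of the calculation is below. It is not FastQC's own code; the function names are made up for illustration, and the 5% default threshold reflects the warning level described in the FastQC documentation as far as I know.

```python
from itertools import islice


def read_fastq_sequences(handle):
    """Yield the sequence line of each record, assuming 4-line FASTQ records."""
    for line in islice(handle, 1, None, 4):
        yield line.strip()


def per_base_n_percent(sequences):
    """Return the percentage of N calls at each read position (index 0 = first base)."""
    totals, n_counts = [], []
    for seq in sequences:
        # Grow the counters when a longer read is seen.
        while len(totals) < len(seq):
            totals.append(0)
            n_counts.append(0)
        for i, base in enumerate(seq):
            totals[i] += 1
            if base in "Nn":
                n_counts[i] += 1
    return [100.0 * n / t for n, t in zip(n_counts, totals)]


def positions_over(percentages, threshold=5.0):
    """Return 1-based positions whose N percentage exceeds the threshold."""
    return [i + 1 for i, p in enumerate(percentages) if p > threshold]
```

Running this separately on the R1 and R2 files shows whether the N calls are concentrated at particular cycles of R2 or spread across the whole read.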



              • #22
                Unless you have the ".cif" files available for this run, you are not going to be able to run OLB. Do you have a specific reason to run OLB (excluding certain tiles or lanes, or using a specific lane as a control for base calling)? Otherwise there is likely to be no added benefit.

                Originally posted by lewewoo

                Since our base calls come from RTA, I wonder whether running OLB could improve them, or whether the low quality is due to the detection step of R2?

                Thanks for sharing any information and experience!



                • #23
                  So that means this poor quality is not caused by RTA? I have this concern because the Illumina manual says RTA may cause some errors.
                  Yes, I have all the .cif files for every cycle and lane... everything...
                  I will investigate the data quality further today...
                  Thanks!



                  • #24
                    You should consider contacting Illumina tech support if you think there is a specific problem with read 2 from this run. They should be able to set up a remote connection to the machine that generated this data and look into this directly.

                    Are the basecall plots normal looking for read 2?

                    Originally posted by lewewoo
                    So that means this poor quality is not caused by RTA? I have this concern because the Illumina manual says RTA may cause some errors.
                    Yes, I have all the .cif files for every cycle and lane... everything...
                    I will investigate the data quality further today...
                    Thanks!



                    • #25
                      Originally posted by lewewoo

                      1) Wipe Mac OS and install CentOS. This is a big concern; has anybody done this successfully and can share some experience?
                      2) If we purchase a stronger server, any suggestions?
                      3) To improve I/O speed, should we connect the hard drives over Fibre Channel?

                      Thanks for any suggestions and comments!
                      I installed Ubuntu on my Mac, and it works very well.
