
  • #16
    Thanks guys,
    This is our first time converting .bcl to fastq, so we don't have much experience yet, but we plan to improve our computing capacity soon. Here is a quick update on what we have and what I am doing; maybe someone has better ideas about this process.
    We struggled a lot with installing and running CASAVA before we got it working. We installed Linux (CentOS) under Parallels on a Mac Pro. The Mac Pro is pretty powerful: 2 x 2.93 GHz 6-core Intel Xeon, 24 GB 1333 MHz DDR3, and a 5 TB drive; however, the virtual machine only lets us use 8 cores and 8 GB of memory. We have data from 209 cycles across 8 lanes from a HiSeq machine, and the computer has been running for about 40 hours. We don't want to stop it and hope it can finish the work by next Monday. After this run, we would like to do the following to improve our computing capacity; please have a look and give some advice on feasibility:

    1) Wipe Mac OS and install CentOS. This is a big concern: has anybody done this successfully and can share some experience?
    2) If we purchase a more powerful server, any suggestions?
    3) To improve I/O speed, should we connect the hard drives over Fibre Channel?

    Thanks for any suggestions and comments!
    Last edited by lewewoo; 11-11-2011, 09:32 PM.



    • #17
      If this is a one-time analysis then it is best to be patient and let the analysis take its course. Hopefully you will not hit a limit somewhere along the way (the likely culprit would be the RAM available in your virtual machine).

      If you are planning to do this regularly then you could consider doing one or more of the following.


      Originally posted by lewewoo
      Thanks guys,


      1) Wipe Mac OS and install CentOS. This is a big concern: has anybody done this successfully and can share some experience?
      2) If we purchase a more powerful server, any suggestions?
      3) To improve I/O speed, should we connect the hard drives over Fibre Channel?

      Thanks for any suggestions and comments!



      • #18
        ultimate solution: super power servers!



        • #19
          CASAVA extracted the fastq files; however, there is also a folder called
          Undetermined_indices
          which contains fastq files as well. Roughly how many reads should end up in this folder? How big should it be? Is there any quality control for this folder?
          Thanks!
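
The "how many reads, how big" part can be checked directly on any run. Below is a minimal Python sketch, not something from the CASAVA manual: it assumes standard 4-line FASTQ records (no wrapped sequence lines) and files ending in .fastq or .fastq.gz, and the function names are made up for illustration.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in one FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: gzipped files are recognised by their .gz suffix.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def summarize_fastq_dir(directory):
    """Map each FASTQ file under `directory` to (read count, size in bytes).

    Keys are paths relative to `directory`, so files with the same name in
    different sample folders do not overwrite each other.
    """
    root = Path(directory)
    summary = {}
    for path in sorted(root.rglob("*.fastq*")):
        key = str(path.relative_to(root))
        summary[key] = (count_fastq_reads(path), path.stat().st_size)
    return summary


# Example call (the directory name is whatever your run produced):
# for name, (reads, size) in summarize_fastq_dir("Undetermined_indices").items():
#     print(name, reads, size)
```

Running the same function on the project folders gives the totals needed to express the undetermined reads as a fraction of the whole run.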



          • #20
            Here are some scenarios mentioned in the CASAVA manual regarding Undetermined_indices:
            --In addition to generating FASTQ files, CASAVA uses a user-created sample sheet to
            divide the run output in projects and samples, and stores these in separate directories. If
            no sample sheet is provided, all samples will be put in the Undetermined_Indices
            directory by lane, and not demultiplexed.
            --The Undetermined_indices directory contains the reads with an unresolved or
            erroneous index.
            --If the majority of reads end up in the 'Undetermined_indices' folder, check
            the --use-bases-mask parameter syntax and the length of the index in the
            sample sheet. It may be that you need to set the --use-bases-mask option to
            the length of the index in the sample sheet + the character 'n' to account for
            phasing. Note that you will not be able to see which indices have been placed
            in the 'Undetermined_indices' folder
            --Unless otherwise specified in the sample sheet, samples without index will end up in
            the project folder Undetermined_indices, and in a sample folder named after the lane
            (e.g. Sample_lane1).
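
To make the "index length + the character 'n'" advice concrete, here is a small Python sketch that builds the mask string. It is only an illustration: the function name is hypothetical, and the 101 + 7 + 101 split of the 209 cycles is an assumption, since the thread never states the read layout. Please check the exact mask syntax against the manual for your CASAVA version.

```python
def build_bases_mask(read1_cycles, index_cycles, read2_cycles, index_length):
    """Build a --use-bases-mask style string such as 'Y101,I6n,Y101'.

    index_cycles is the number of cycles sequenced for the index read;
    index_length is the length of the index in the sample sheet. Any extra
    index cycles are masked with 'n' (hypothetical helper, not a CASAVA API).
    """
    if not 0 < index_length <= index_cycles:
        raise ValueError("index_length must be between 1 and index_cycles")
    masked = index_cycles - index_length
    parts = [f"Y{read1_cycles}", f"I{index_length}" + "n" * masked]
    if read2_cycles:  # single-read runs have no second read
        parts.append(f"Y{read2_cycles}")
    return ",".join(parts)


# Assumed layout for a 209-cycle run: 101 + 7 + 101, with a 6-base index.
mask = build_bases_mask(101, 7, 101, 6)
print(mask)
```

If the index in the sample sheet is 6 bases but 7 index cycles were sequenced, the last index cycle is the one masked with 'n'.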



            • #21
              Thanks for the advice! I read about this in the Illumina manual, and it would also be great if someone could share hands-on experience with this...

              Fortunately, the majority of the reads were assigned to an index. However, I noticed that all the R2 reads have bad quality: the per-base GC content is inconsistent with the theoretical prediction, and the per-base N content is beyond the warning level. The QC was done with FastQC and, as they say, the N content may be caused by base calling. Since our base calls come from RTA, I am wondering whether running OLB might improve them? Or is this low quality due to the detection step of R2?

              Note: R2 means the second read of the paired-end reads; the sequencing order is said to be R1, index, R2.
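
For a quick cross-check of the per-base N content outside FastQC, something like the following Python sketch could be run on the R1 and R2 files separately. It is a rough stand-in under stated assumptions (4-line FASTQ records, .gz suffix for gzipped files, a made-up function name), not a reimplementation of the FastQC module.

```python
import gzip
from collections import Counter


def per_base_n_fraction(fastq_path):
    """Return a list with the fraction of N calls at each read position."""
    # Assumption: gzipped files are recognised by their .gz suffix.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    n_counts = Counter()  # position -> number of N calls
    totals = Counter()    # position -> number of reads covering it
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 1:  # the sequence is the 2nd line of each record
                continue
            for pos, base in enumerate(line.strip()):
                totals[pos] += 1
                if base in "Nn":
                    n_counts[pos] += 1
    # Every read covers positions 0..len-1, so the keys of totals are contiguous.
    return [n_counts[pos] / totals[pos] for pos in range(len(totals))]
```

Comparing the two lists position by position shows whether the N calls are spread along R2 or concentrated in a few cycles.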

              Thanks for sharing any information and experience!



              • #22
                Unless you have the ".cif" files available for this run you are not going to be able to run OLB. Do you have a specific reason to run OLB (excluding certain tiles or lanes, or using a specific lane as a control for base calling)? Otherwise there is likely to be no added benefit.

                Originally posted by lewewoo

                Since our base calls come from RTA, I am wondering whether running OLB might improve them? Or is this low quality due to the detection step of R2?


                Thanks for sharing any information and experience!



                • #23
                  So does that mean this poor quality is not caused by RTA? I have this concern because the Illumina manual says RTA may cause some errors.
                  Yes, I have all the .cif files for every cycle and lane... everything...
                  I will investigate the data quality further today...
                  Thanks!



                  • #24
                    You should consider contacting Illumina tech support if you think there is a specific problem with read 2 from this run. They should be able to set up a remote connection to the machine that generated this data and look into this directly.

                    Are the basecall plots normal looking for read 2?

                    Originally posted by lewewoo
                    So does that mean this poor quality is not caused by RTA? I have this concern because the Illumina manual says RTA may cause some errors.
                    Yes, I have all the .cif files for every cycle and lane... everything...
                    I will investigate the data quality further today...
                    Thanks!



                    • #25
                      Originally posted by lewewoo

                      1) Wipe Mac OS and install CentOS. This is a big concern: has anybody done this successfully and can share some experience?
                      2) If we purchase a more powerful server, any suggestions?
                      3) To improve I/O speed, should we connect the hard drives over Fibre Channel?

                      Thanks for any suggestions and comments!

                      I installed Ubuntu on my Mac, and it works very well.

