  • Obtaining cluster densities from a Hiseq2500 data set

    Hi,
    I'm writing a wrapper for demultiplexing with bcl2fastq2 Conversion Software v2.17. After demultiplexing, I would like to collect various run statistics in a file.
    It is fairly easy to get raw cluster counts, PF cluster counts, etc. However, I'm having trouble finding a file that contains the cluster density.
    I know it is stored in binary format in the InterOp folder and can be viewed with the Sequencing Analysis Viewer, but I would like to collect it in a single file for later use.

    Any ideas?

  • #2
    This can be used for parsing the InterOp folder files: https://bitbucket.org/invitae/illuminate

    Comment


    • #3
      Oh, that's excellent. I was wondering just yesterday if something existed to parse from the InterOp binaries. Thanks!

      Comment


      • #4
        I wrote the very first tiny part of that, and people at my company finished it! I still can't believe ILMN doesn't provide anything...grrrr

        Comment


        • #5
          Nice, our sequencing folks were asking me about programmatically storing stuff from those files just yesterday. Now I don't have to reinvent the wheel!

          Comment


          • #6
            Not directly related to original post but with the patterned flowcells, cluster number/density becomes irrelevant (since that is a fixed number). %PF is the thing to watch and numbers are in the demultiplex report coming from bcl2fastq v.2.17.x.
            Last edited by GenoMax; 01-14-2016, 02:17 PM.

            Comment


            • #7
              First of all, thanks to GenoMax for leading me to the illuminate program. It provides just what I needed. From inside the run folder I ran the command "illuminate --tile ." and got a quick summary:
              TILE METRICS
              ------------
              Mean Cluster Density: 829082
              Mean PF Cluster Density: 497376
              Total Clusters: 305632923
              Total PF Clusters: 183352987
              Percentage of Clusters PF: 59.991242
              Aligned to PhiX: 0.000014
              Read - PHASING / PRE-PHASING:
              1 - 0.001078 / 0.000119
              2 - 0.000000 / 0.000000
              3 - 0.000955 / 0.000337

              However, I needed the density per lane.
              Adding --csv to the command ("illuminate --tile --csv . > tileinfo.csv") let me write the information for each tile to a CSV file. While searching for other parsers I found the R package savR, which is where I learned what the different metric codes mean:
              100 Cluster Density
              101 PF Cluster Density
              102 Number of clusters
              103 Number of PF clusters
              400 Control lane

              From there it was fairly simple to filter the lines by code, sum the values for each lane, and compute the average cluster density per lane. I checked, and I got the same numbers as shown in the Summary tab of the Sequencing Analysis Viewer :-)
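
The per-lane averaging described in this post can be sketched in Python roughly as follows. This is a minimal sketch, not the poster's actual script: the column names `lane`, `tile`, `code`, and `value` are assumptions about the CSV header written by `illuminate --tile --csv`, so check them against your own `tileinfo.csv`. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Metric code for cluster density, taken from the savR code list in the post.
CLUSTER_DENSITY = 100


def mean_by_lane(csv_file, code=CLUSTER_DENSITY):
    """Return {lane: mean value} over all tile rows whose metric code matches.

    Assumes the CSV has 'lane', 'code' and 'value' columns (hypothetical
    names; adjust them to match the real header).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Codes and lanes may be written as "100" or "100.0", so go via float.
        if int(float(row["code"])) != code:
            continue
        lane = int(float(row["lane"]))
        totals[lane] += float(row["value"])
        counts[lane] += 1
    return {lane: totals[lane] / counts[lane] for lane in sorted(totals)}


# Made-up sample data: two tiles per lane, with density (100) and
# PF density (101) rows mixed together.
sample = io.StringIO(
    "lane,tile,code,value\n"
    "1,1101,100,800000\n"
    "1,1101,101,480000\n"
    "1,1102,100,820000\n"
    "2,1101,100,900000\n"
    "2,1102,100,860000\n"
    "2,1102,101,510000\n"
)

densities = mean_by_lane(sample)
for lane, density in densities.items():
    print(f"lane {lane}: mean cluster density {density:.0f}")
```

For a real run, the same function could be called on the exported file, for example `with open("tileinfo.csv", newline="") as fh: mean_by_lane(fh)`. Passing `code=101` would give the mean PF cluster density per lane instead.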

              Comment


              • #8
                I am using Illuminate, and it's awesome, but it does not support the files from NextSeq... Does anyone have any recommendations?

                Comment


                • #9
                  Illumina makes a C++ library available for parsing the contents of the InterOp folder: https://github.com/Illumina/interop That should be compatible with all extant Illumina sequencers.

                  Comment


                  • #10
                    Originally posted by angsm
                    I am using Illuminate, and it's awesome, but it does not support the files from NextSeq... Does anyone have any recommendations?
                    In my experience it works just fine on NextSeq runs. The intensity metrics are a little funny due to the 2-color chemistry, but other than that there shouldn't be any problems.

                    Comment


                    • #11
                      Ohhh. I had not tried the Python library itself, and it works! I was using the command-line tool, and that version was older.

                      Thanks!

                      Comment
