  • bio_informatics
    Senior Member
    • Nov 2013
    • 182

    Why and how: PhiX spike-in?

    Hi SEQanswers team and forum members,

    Pardon me if this seems to be a broad and naive question.

    I do not use sequencing kits, nor do I run the quality checks on the data that comes straight off the sequencer. I deal with the data after all these checks, so I remain oblivious to what goes on behind the scenes before and during sequencing.

    I keep seeing the term PhiX, along with its concentration, in 16S rRNA papers.

    I found these URLs:





    Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)


    The concentration varies with the sequencing platform and with what you're sequencing. For low-diversity samples, sometimes ~10% is suggested, sometimes ~50%.

    My questions:
    - I'd like a basic (read: superficial) understanding of why PhiX is important.
    How does an external element help produce better-quality data?

    - Doesn't it contaminate the primers, barcodes, and indices?
    Bioinformaticscally calm
  • HESmith
    Senior Member
    • Oct 2009
    • 512

    #2
    PhiX serves multiple functions:

    1) technical control for clustering reaction (spiking in a known amount of phiX should yield a known number of phiX clusters).
    2) technical control for sequencing accuracy (on-the-fly alignment to the phiX reference is used to calculate the sequencing error rate).
    3) introduction of sequencing diversity in low-complexity libraries (diversity is needed to discriminate clusters and create signal thresholds for base-calling). As the software has improved, the recommended amount of phiX spike-in has decreased.

    The phiX library does not contain an index; these reads are assigned to the Undetermined_indices directory.
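To make point 2 concrete, here is a minimal Python sketch of how an error rate can be derived from reads aligned to a known reference. It is only an illustration: the `(start, sequence)` input format, the function name `mismatch_rate`, and the toy reference are assumptions of mine (the reference is not real phiX sequence), and the instrument's own on-the-fly calculation is certainly more involved.

```python
def mismatch_rate(aligned_reads, reference):
    """Return the percentage of mismatched bases across ungapped alignments.

    aligned_reads: iterable of (start, sequence) pairs, where start is the
    0-based position of the read on the reference (hypothetical input format).
    """
    mismatches = 0
    total = 0
    for start, seq in aligned_reads:
        ref_slice = reference[start:start + len(seq)]
        # Compare base by base; zip stops at the shorter of the two strings.
        for read_base, ref_base in zip(seq, ref_slice):
            total += 1
            if read_base != ref_base:
                mismatches += 1
    return 100.0 * mismatches / total if total else 0.0


# Toy reference and reads, made up for illustration (not real phiX sequence).
reference = "GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGG"
reads = [
    (0, "GAGTTTTATC"),   # matches the reference
    (10, "GCTTCCATGA"),  # matches the reference
    (20, "CGCAGAAGTA"),  # last base differs from the reference
    (30, "AACACTTTCG"),  # matches the reference
]
error_rate = mismatch_rate(reads, reference)
print(f"error rate = {error_rate:.2f}%")
```

Because the spike-in sequence is known in advance, every mismatch can be attributed to the sequencing run rather than to real biological variation, which is what makes a control library useful here.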


    • bio_informatics
      Senior Member
      • Nov 2013
      • 182

      #3
      Hi HESmith,

      Thanks for the valuable points and your reply.

      Would introducing a higher percentage (~30-50%) of PhiX for amplicons on the MiSeq platform cause bias?
      Bioinformaticscally calm


      • microgirl123
        Senior Member
        • Jun 2012
        • 199

        #4
        Older basecalling software on the MiSeq used to require a 50% phiX spike for low-diversity (amplicon) samples. More recently, the software has been updated and only requires a 5-10% spike.
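A rough way to see what "low diversity" means is to tabulate the base composition at each cycle. The Python sketch below is a toy illustration only: the sequences are made up, the "diverse" reads are a stand-in for a spike-in rather than real phiX, and the function name `base_fractions_per_cycle` is my own. Real base-calling works on image intensities, not on already-called bases.

```python
from collections import Counter


def base_fractions_per_cycle(reads):
    """For each cycle (read position), return the fraction of A, C, G and T."""
    n_cycles = min(len(r) for r in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle] for r in reads)
        total = sum(counts.values())
        fractions.append({b: counts.get(b, 0) / total for b in "ACGT"})
    return fractions


# Hypothetical amplicon library: every cluster starts with the same sequence.
amplicon = ["ACGTAC"] * 90
# Hypothetical diverse reads standing in for a 10% spike-in.
diverse = ["CATGCA", "GTACGT", "TGCATG", "ACGGTT", "CAATGC"] * 2

amplicon_only = base_fractions_per_cycle(amplicon)
with_spike = base_fractions_per_cycle(amplicon + diverse)

print("cycle 1, amplicon only:", amplicon_only[0])
print("cycle 1, with spike-in:", with_spike[0])
```

In the amplicon-only case a single base accounts for every cluster at each cycle, while the spiked run shows a small signal for the other bases as well. That extra variety is the "diversity" the spike-in contributes.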


        • bio_informatics
          Senior Member
          • Nov 2013
          • 182

          #5
          Hi microgirl,
          Thanks for your reply.
          Bioinformaticscally calm


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Originally posted by bio_informatics
            Hi HESmith,

            Thanks for the valuable points and your reply.

            Would introducing a higher percentage (~30-50%) of PhiX for amplicons on the MiSeq platform cause bias?
            Why do you think that would happen? phiX does compete for spots on the flowcell (but it should not out-compete specific amplicons) so you do end up losing some sequencing capacity.
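The capacity cost is simple arithmetic, sketched below in Python. The assumptions are mine: the spiked-in fraction is taken to equal the fraction of clusters that end up being phiX, and the total of 15 million passing-filter reads is a hypothetical round number, not a specification.

```python
def reads_left_for_library(total_pf_reads, phix_fraction):
    """Estimate the reads left for the library after a phiX spike-in.

    Assumes the phiX fraction of clusters equals the spiked-in fraction,
    which is a simplification.
    """
    if not 0.0 <= phix_fraction <= 1.0:
        raise ValueError("phix_fraction must be between 0 and 1")
    return round(total_pf_reads * (1.0 - phix_fraction))


total_pf_reads = 15_000_000  # hypothetical run output, for illustration only
for pct in (1, 5, 10, 50):
    remaining = reads_left_for_library(total_pf_reads, pct / 100)
    print(f"{pct:>2}% phiX -> {remaining:,} reads left for the library")
```

So a 50% spike halves the reads available for the samples, while a 5-10% spike costs comparatively little.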


            • bio_informatics
              Senior Member
              • Nov 2013
              • 182

              #7
              Hi Genomax,
              Thanks for your reply.

              I'm reading posts related to PhiX and piecing things together from here and there, so the question is probably incorrect or naive.

              I read:
              "Depending on what MiSeq software version was used for the sequencing, an inadequate amount of PhiX combined with too high a cluster density will result in an excessive number of sequencing errors leading to a large number of unique sequences."

              I worded my question in terms of PhiX concentration, but what I really meant to ask about was cluster density.
              Again, I don't know much about these terms; I'm reading around for my own knowledge, hence these questions.

              Originally posted by GenoMax
              Why do you think that would happen? phiX does compete for spots on the flowcell (but it should not out-compete specific amplicons) so you do end up losing some sequencing capacity.
              Last edited by bio_informatics; 06-03-2015, 10:28 AM.
              Bioinformaticscally calm


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                As @microgirl123 pointed out above, there was a period in the early MiSeq days when there were problems sequencing low-complexity libraries. Most of these issues have been mitigated over time via updates to the MiSeq Control Software (MCS).


                • bio_informatics
                  Senior Member
                  • Nov 2013
                  • 182

                  #9
                  Hi Genomax,

                  That clears many doubts.
                  Merci!
                  Bioinformaticscally calm


                  • kerryp
                    Member
                    • Apr 2015
                    • 17

                    #10
                    Are my phiX QC measures OK?

                    I've just used phiX (v3) for the first time on a MiSeq run. I was using the 500-cycle reagent kit (running 241*2 cycles) with a 1% PhiX spike-in.

                    According to my run summary:

                    aligned = 1.3%
                    error rate (read 1) = 1.66 (+/- 0.13)
                    error rate (read 4) = 2.19 (+/- 0.14)

                    NB for both read 1 and read 2, error rate doubles between cycles 35 - 100

                    >=Q30 was 83.7% overall

                    Would these phiX measures fall within acceptable parameters, or should I be concerned? I can't find any clear guidance online as to what they actually mean.

                    Thanks in advance for any help!
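For orientation, Q-scores and error rates can be put on the same scale with the standard Phred relation Q = -10 * log10(p). The Python sketch below converts in both directions. It assumes the error rates in the run summary above are percentages, and it is only a scale comparison: as far as I understand, the Q30 figure comes from predicted per-base qualities across all clusters, whereas the phiX error rate is measured empirically from aligned phiX reads, so the two are not expected to agree exactly.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert a per-base error probability (0 < p <= 1) to a Phred score."""
    return -10 * math.log10(p)


# Q30 corresponds to a 1-in-1000 chance that a base call is wrong.
print(f"Q30 -> error probability {phred_to_error_prob(30):.4f}")

# Error rates quoted in the post above, assumed to be percentages.
for label, pct in (("read 1", 1.66), ("read 4", 2.19)):
    q = error_prob_to_phred(pct / 100)
    print(f"{label}: {pct}% error -> roughly Q{q:.1f}")
```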


                    • Jessica_L
                      Senior Member
                      • Feb 2010
                      • 117

                      #11
                      The error rate sounds a little high to me, but that is dependent on other factors, including the cluster density for the run. Can you post the run summary chart and the intensity plot from the data by cycle section (IVC plot)?


                      • kerryp
                        Member
                        • Apr 2015
                        • 17

                        #12
                        thanks jessica_L
                        Here's the charts screen - is this what you were looking for?
                        Attached Files


                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          The run looks to be within Illumina spec. Have you had a chance to look at the data? Some of the libraries may have short inserts; a typical manifestation of this is Q-scores dropping towards the end of the reads. I assume the run completed without any problems.


                          • kerryp
                            Member
                            • Apr 2015
                            • 17

                            #14
                            I haven't looked at the data beyond running it through SPAdes (where there seemed to be decently few contigs). But when I ran the library on the Bioanalyzer, the average fragment size for all samples was within the 600-700 bp range.
                            The run finished without any problems.
                            Glad to hear it may be within Illumina spec. Where could I find this information (recommended values/thresholds for QC measures)?
                            Thanks


                            • GenoMax
                              Senior Member
                              • Feb 2008
                              • 7142

                              #15
                              As you do more runs you will start to develop a feel for what looks good and what does not. There aren't hard and fast thresholds (it is more of a continuous gradient) for what constitutes a good or bad run. If you are using the MiSeq in your own lab then you will have more control over samples/libraries. This is not possible when the instrument is being used in a core.

                              In general if you find a significant deviation (in terms of % Pass Filter, Q-scores) beyond the published spec (link below) then you stand a good chance of getting free replacement reagents from Illumina (once Illumina tech support determines that the problem was instrument/reagent related).

                              You can find the published performance specifications for the MiSeq here: http://www.illumina.com/systems/mise...fications.html
