  • Low Diversity library ( 14 Ts) on HiSeq2000

    Hello all,
    I know that a similar thread was initiated by Simcom and drew a lot of replies, but I have a slightly different problem. We are sequencing libraries based on polyA pulled-down RNA to determine alternative polyadenylation. The way the libraries were constructed, Read 1 on the HiSeq starts with 6 Ns (which serve as unique molecular identifiers, or UMIs), followed by 14 Ts and then the transcriptomic sequence.
    Five indexed libraries were pooled. After consultation with several sources, including Illumina tech support, it was decided that we would "generate matrix and normalize" to a different lane with good diversity, load at 8 pM, and not spike in PhiX or other controls at a high concentration (since that would defeat the purpose of getting more reads from the HiSeq than from the GAIIx, and apparently posed no advantage). The library has a mean size of 250 bp.
    The initial results from this lane are back: there were ~800K/mm² clusters and 220 million reads, but only 30 million passed the "Chastity filter", which is calculated from the first 20 bp or so. The graphics indicate that after 20 bp, mixed signals from all bases appear (after what is clearly a T run). Everyone seems to agree that the low PF rate is likely due to the T run, but nobody is sure what to do next with the data. Clearly it would be ideal to use more than just the 30 million reads that passed filter. I was wondering about the following:
    1. Any general ideas about what to do with this data set post-run? I have seen other discussions about effectively analysing data with low-diversity initial bases using non-in-house algorithms.
    2. If this is related to the T run, what is the likely reason for the poor quality even though the lane was normalized to a different (and successful) lane?
    3. Are the image files from the HiSeq stored as TIFFs that could be analyzed with next-phred or some other alternative base caller?
    4. Could the "deferred cluster calling" described in Felix Krueger's PLoS ONE paper and in his forum posts early last year, or a similar protocol, be helpful in this scenario?
    I appreciate all help and apologize for any naive statements/assumptions above.

  • #2
    1) If the PHRED scores of the UMI and post-T segments look good, you could rerun the initial scripts from CASAVA to output all of the reads (passed and failed) as FASTQs while masking the T segment and demultiplexing on the UMI.
    2) Not sure, but if it's the T segment (as suspected) then the PHRED scores for those cycles are probably much lower than the flanking segments.
    3) and 4) The images are not saved, so deferred basecalling is not an option at this point.

    • #3
      Thank you, we will try those options first. I was under the impression that PF is calculated from the "chastity scores" of the first 12-20 bases or so. Does that directly correlate with the PHRED score for the base, or is it a separate metric?

      • #4
        Chastity is a separate metric from the PHRED score, and the chastity filter is evaluated over the first 25 cycles. Per-cycle PHRED scores can be visualized with Illumina's HCS or SAV software.
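
To make the distinction concrete, here is a minimal sketch of the two metrics as they are commonly described in Illumina's documentation: chastity is the brightest channel intensity divided by the sum of the two brightest, and a cluster passes filter if at most one of the first 25 cycles has chastity below 0.6. The 0.6 cutoff and the one-failure allowance come from that general description, not from this thread, and the function names and toy intensities are purely illustrative. Real intensities are corrected for crosstalk and phasing first, which this sketch ignores.

```python
import math

CHASTITY_THRESHOLD = 0.6   # commonly documented cutoff (assumption, not from the thread)
FILTER_CYCLES = 25         # cycles inspected by the filter, as stated in the post above
MAX_FAILING_CYCLES = 1     # at most one low-chastity cycle is tolerated (assumption)


def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cluster_intensities):
    """cluster_intensities holds one (A, C, G, T) tuple per cycle."""
    failing = sum(
        1
        for cycle in cluster_intensities[:FILTER_CYCLES]
        if chastity(cycle) < CHASTITY_THRESHOLD
    )
    return failing <= MAX_FAILING_CYCLES


def phred(error_probability):
    """PHRED quality: Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_probability)


# Toy data: one dominant channel per cycle vs. two channels of similar strength.
clean = [(10, 20, 15, 900)] * 25
mixed = [(400, 30, 20, 450)] * 25
print(passes_filter(clean), passes_filter(mixed))
print(round(phred(0.001)))
```

The point of the sketch is that chastity is a signal-purity ratio used to keep or discard a whole cluster, while PHRED is a per-base error estimate, so a read can fail the chastity filter in the early cycles and still carry well-scored bases later on.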

        • #5
          So as an update, Illumina and our NGS core both say they cannot rerun the scripts while masking the Ts (bp 7 to 20). We do have the CIF files saved, and I am guessing that using a third-party base caller would be the next logical step. There seem to be several available, but would one have an advantage over another (say, those that need no training sets, like AYB, naiveBayesCall or OnlineCall, vs. Ibis)? And should the Ts be masked when using these base callers? I remain optimistic that the data set is usable, given that the intensity files "looked good" according to the Illumina technical person himself, but the base calling is almost certainly being thrown off by the T stretch.

          • #6
            If you have a subset of bases which are causing a problem another option is to rerun the bcl conversion specifying --no-eamss. I'd also tell it to export QC filtered sequences as well (don't have the 1.8.1 manual to hand so can't remember the exact option to specify for this). You might find that the qualities of the poly-T stretch are poor, but that they recover once the low complexity sequence is over. Turning off EAMSS will allow the qualities to come back up again and you might return to usable sequence.

            • #7
              Originally posted by simonandrews
              If you have a subset of bases which are causing a problem another option is to rerun the bcl conversion specifying --no-eamss. I'd also tell it to export QC filtered sequences as well (don't have the 1.8.1 manual to hand so can't remember the exact option to specify for this).
              The option referenced by Simon is "--with-failed-reads", which includes reads that fail the filter in the output files.
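
Once failed reads are written out, they can still be told apart from passing reads. A minimal sketch, assuming the CASAVA 1.8 FASTQ header layout, in which the second space-separated field is `<read>:<is filtered>:<control number>:<index>` and `Y` marks a read that failed the filter. The instrument, flowcell and index values in the example are made up, and the helper names are illustrative.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def failed_filter(header):
    """True when the filter flag in the header comment is 'Y' (read was filtered)."""
    comment = header.split(" ", 1)[1]
    return comment.split(":")[1] == "Y"


# Two made-up records: the first passed the filter, the second did not.
example = io.StringIO(
    "@INST:1:FC:1:1101:1000:2000 1:N:0:ACGTAC\n"
    "ACGTAC\n+\nIIIIII\n"
    "@INST:1:FC:1:1101:1001:2001 1:Y:0:ACGTAC\n"
    "TTTTTT\n+\n######\n"
)
counts = {"passed": 0, "failed": 0}
for header, _seq, _qual in read_fastq(example):
    counts["failed" if failed_filter(header) else "passed"] += 1
print(counts)
```

Keeping the flag around makes it possible to analyse the rescued reads separately and check whether they behave like the reads that passed.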

              • #8
                You can combine the recommendations of Simon and Genomax with the flag --use-bases-mask I6n14Y* to mask the Ts and demultiplex on the first six bases.
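
If the conversion cannot be rerun with a mask, the same read layout can be applied afterwards to FASTQ records. A minimal sketch, assuming the structure described in the first post (6 UMI bases, then 14 Ts, then the transcript sequence). It works on already-converted reads and is not a substitute for the conversion step; the names and the example read are hypothetical.

```python
UMI_LEN = 6       # leading random bases used as the molecular identifier
POLY_T_LEN = 14   # T stretch that follows the UMI


def split_read(sequence, quality):
    """Return (umi, insert_sequence, insert_quality) for one Read 1."""
    start = UMI_LEN + POLY_T_LEN
    if len(sequence) < start or len(sequence) != len(quality):
        raise ValueError("read shorter than UMI + T stretch, or length mismatch")
    return sequence[:UMI_LEN], sequence[start:], quality[start:]


def t_fraction(sequence):
    """Fraction of T calls in the expected T stretch, as a sanity check."""
    stretch = sequence[UMI_LEN:UMI_LEN + POLY_T_LEN]
    return stretch.count("T") / POLY_T_LEN


# Made-up read: 6-base UMI, 14 Ts, then 14 bases of insert.
seq = "ACGTAG" + "T" * 14 + "GATTACAGATTACA"
qual = "I" * len(seq)
umi, insert, insert_qual = split_read(seq, qual)
print(umi, insert, t_fraction(seq))
```

The `t_fraction` check is there because base calls inside the T run may be unreliable in this data set; reads where the stretch is mostly non-T could be flagged for a closer look instead of being trusted blindly.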
