  • Problem with low-diversity barcodes on Hi-Seq

    Hi everyone,

    I am fairly new to NGS but I will try my best to explain my problem. My group recently sent out ddRAD-Seq libraries for sequencing on a Hi-Seq 2000 platform. Yesterday, we were contacted and told that after running 3 lanes of our samples, the platform's pass filter returned 0 usable reads, despite the cluster density being normal. They also told us another group had samples on the same flow cell and got good results, meaning the error probably stems from our library preparation.

    The only thing we can think of that would cause this is a slight deviation we recently implemented in our protocol. Without realizing it, we gave all of our samples indexes of the same nucleotide length, each followed by the same restriction enzyme cut site. Basically, nucleotides 6 to 10 in the first read are identical in all of our samples. We are concerned that this might be causing our problems, as the platform is known to have issues with low-diversity sequences, especially when they are at the start of the read.
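
    A minimal sketch of how this kind of low diversity could be checked on any reads that are recovered. The barcodes and the `TGCAG` cut-site remnant below are hypothetical placeholders, not the sequences used in these libraries, and the 90% threshold is an arbitrary assumption.

```python
from collections import Counter


def per_cycle_composition(reads, n_cycles):
    """Return, for each of the first n_cycles, the fraction of A/C/G/T calls."""
    composition = []
    for cycle in range(n_cycles):
        # Count the base each read shows at this cycle; skip reads that are too short.
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts.values())
        if total == 0:
            composition.append({})
        else:
            composition.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return composition


def low_diversity_cycles(reads, n_cycles, threshold=0.9):
    """Return the 1-based cycles where a single base reaches `threshold` of reads."""
    flagged = []
    for cycle, fractions in enumerate(per_cycle_composition(reads, n_cycles), start=1):
        if fractions and max(fractions.values()) >= threshold:
            flagged.append(cycle)
    return flagged


# Hypothetical reads: a 5-nt barcode, the same cut-site remnant, then insert sequence.
example_reads = [
    "ACGTA" + "TGCAG" + "TTAGC",
    "CGTAC" + "TGCAG" + "GATCA",
    "GTACG" + "TGCAG" + "CCGTT",
    "TACGT" + "TGCAG" + "AGCAT",
]

print(low_diversity_cycles(example_reads, 15))  # cycles 6 to 10 are flagged
```

    With same-length barcodes the shared cut site lands on the same cycles in every read, which is exactly the pattern described above.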

    I have read various threads about the use of high concentrations of PhiX to spike libraries, but people seem to have had varying success with this method. It is unfortunately too late for us to redo our libraries as well, and we do not have any other samples to submit that they could spike our DNA with.

    I have never used this platform myself, but I remember reading somewhere that it is possible to perform chemistry-only cycles (with no imaging) on the Hi-Seq. Since we do not need nucleotides 6-10 in our reads (as they are all identical), I was wondering if it would be possible to program the machine to perform chemistry-only cycles during cycles 6-10. My thinking was that we would end up with enough nucleotide diversity in the reads, and hopefully everything would go through the pass filter.

    Is this a feasible solution? Perhaps this is not even a possibility, or maybe this would create a different problem.

    Thanks in advance!

  • #2
    Do you know if the lab had used phiX in your lanes? What was the cluster density for your libraries? One could underload these libraries to see if you can get data on a new run.

    As for this run, it may be possible to run bcl2fastq and ask it to include failed reads in output. This may or may not be an option in your case. You could ask the lab.
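
    If the lab can regenerate FASTQ files with the failed reads included, the two classes can be told apart by the filter flag in each read header. A rough sketch, assuming CASAVA 1.8+ style headers where the comment field is `<read>:<is_filtered>:<control>:<index>` and `Y` means the read did not pass filter. The instrument and flow cell names in the example are made up.

```python
def failed_filter(header):
    """Return True if a CASAVA 1.8+ style FASTQ header marks the read as filtered (non-PF)."""
    parts = header.rstrip().split(" ")
    if len(parts) < 2:
        raise ValueError(f"no comment field in header: {header!r}")
    fields = parts[1].split(":")
    if len(fields) < 2 or fields[1] not in ("Y", "N"):
        raise ValueError(f"unexpected filter flag in header: {header!r}")
    return fields[1] == "Y"


def count_pass_fail(fastq_lines):
    """Count (passed, failed) reads from an iterable of FASTQ lines (4 lines per record)."""
    passed = failed = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # first line of each record is the header
            if failed_filter(line):
                failed += 1
            else:
                passed += 1
    return passed, failed


# Hypothetical records for illustration only.
example_fastq = [
    "@HWI-ST0000:1:FLOWCELL:1:1101:1000:2000 1:N:0:ATCACG", "ACGT", "+", "IIII",
    "@HWI-ST0000:1:FLOWCELL:1:1101:1001:2001 1:Y:0:ATCACG", "ACGT", "+", "####",
    "@HWI-ST0000:1:FLOWCELL:1:1101:1002:2002 1:Y:0:ATCACG", "TGCA", "+", "####",
]

print(count_pass_fail(example_fastq))  # (passed, failed)
```

    For a real file the same function could be fed an open file handle, e.g. one from `gzip.open(path, "rt")`.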

    Comment


    • #3
      Looks like the colony detection (which is done on the first cycles, do not remember how many) failed...

      That's why, when people in my department want to make their own indexes, I always tell them to include 4 to 5 Ns at the 5' end... Then it works (at least for the few projects we did with homemade libraries and indexes).

      Comment


      • #4
        Originally posted by GenoMax
        Do you know if the lab had used phiX in your lanes? What was the cluster density for your libraries? One could underload these libraries to see if you can get data on a new run.

        As for this run, it may be possible to run bcl2fastq and ask it to include failed reads in output. This may or may not be an option in your case. You could ask the lab.
        We are still waiting on them to give us more details. They did not say what the cluster density was, nor do we know how much PhiX (if any) they added to our samples. We were contemplating asking them to redo these lanes with a higher concentration of PhiX and a lower loading density, but we would like to have some confidence in this before we try it, since it will cost us extra to redo these lanes.

        I am not too familiar with bcl2fastq but I will definitely look into it. One thing I forgot to mention, though, is that our libraries were also prepared by adding 4 different P2 PCR indices for multiplexing (we had 20 different P1 ligation barcodes combined with 4 P2 PCR indices so that we could multiplex a maximum of 80 samples per lane). However, I'm worried that using only 4 P2 indices might not have been enough to create high diversity in the index reads either. Do you think bcl2fastq will still be able to process our data in this case?

        Thanks for the suggestions!

        Comment


        • #5
          Originally posted by SylvainL
          Looks like the colony detection (which is done on the first cycles, do not remember how many) failed...

          That's why, when people in my department want to make their own indexes, I always tell them to include 4 to 5 Ns at the 5' end... Then it works (at least for the few projects we did with homemade libraries and indexes).
          In the past, we used to combine barcodes of different lengths, which offset the start of our reads and so created more nucleotide diversity. Due to some issues we had last time using STACKS to de-multiplex, we decided to only use barcodes of the same length this time. It was certainly an oversight on our part and a mistake we will not be repeating. For now, however, we are just trying to figure out a way to salvage our runs already in progress.

          Comment


          • #6
            Without knowing how diverse your indexes are and how long they are (are both only inline?), it is hard to comment on complexity. We don't know exactly at what step the sequencing/filtering failed. This type of debugging requires access to the original data. I am not sure we can help you beyond this point.

            Hopefully the facility has consulted with Illumina tech support to see if any additional information can be gleaned from the failures.
            Last edited by GenoMax; 02-10-2016, 07:34 AM.

            Comment


            • #7
              We still haven't been able to find out at what step the filtering failed. The only information I was able to find out is that they used 10% PhiX in our runs, so I'm doubtful that increasing this amount will yield anything productive. We are going to contact Illumina and have them work with the facility. We are also looking into the possibility of recovering the data with bcl2fastq. Thanks again for the advice!

              Comment


              • #8
                Is it possible to reuse your libraries as input for a new library prep, using degenerate primers this time? You will lose read length equivalent to your initial primers, but this can be compensated for by longer runs...

                The main problem with doing so is that you will add an extra PCR step (but with very few cycles, just enough to change the beginning of your reads...)

                Comment


                • #9
                  SylvainL,

                  I am wondering if you are proposing the incorporation of additional flow cell and sequencing primer sites that would be separated from the previous sites by a variable number of N bases to introduce diversity? If so, I wonder what the effect of site duplication would be on colony formation and sequencing primer annealing. Figure 1 in protocol_S1 of Peterson et al [journals.plos.org/plosone/article?id=10.1371/journal.pone.0037135#s5] has a good diagram of a sequencing-ready ddRAD library in case it is helpful for this question.

                  thanks.

                  Comment


                  • #10
                    If the facility was able to recover phiX reads from these lanes then something could be wrong with the libraries that @Jean-Rene submitted. Got to keep this on the table as a possibility.

                    @Jean: Can you clarify what happened to the phiX from your sample lanes? Did that sequence look ok?

                    Comment


                    • #11
                      Hi again all,

                      I was also wondering if anyone has used Bareback http://www.bioinformatics.bbsrc.ac.u...ects/bareback/ from Krueger et al 2012 http://www.plosone.org/article/fetch...esentation=PDF or another program to help deal with nucleotide diversity issues? If so, do you know if Bareback is compatible with current Illumina HiSeq file outputs?

                      Thanks again.
                      Last edited by GenoMax; 02-11-2016, 06:29 AM. Reason: Fixed URL

                      Comment


                      • #12
                        Originally posted by ATϟGC
                        Hi again all,

                        I was also wondering if anyone has used Bareback [http://www.bioinformatics.bbsrc.ac.u...cts/bareback/] from Krueger et al 2012 [http://www.plosone.org/article/fetch...sentation=PDF] or another program to help deal with nucleotide diversity issues? If so, do you know if Bareback is compatible with current Illumina HiSeq file outputs?

                        Thanks again.
                        This may no longer be an option. The "movie" data from HiSeq 2500 is in tens of terabytes so it is impractical to store it.

                        Comment


                        • #13
                          ATGC,

                          Yes, I was wondering if using the existing libraries as template for a new library prep could be feasible... I actually have no idea. I guess it should work. 10 cycles should be enough to really enrich the new products incorporating new indexes (with Ns incorporated between the adapter and the new index...). Since the previous indexes may be sequenced as well, it should even be possible not to add new indexes... simply one more step in the pipeline....

                          I wonder if anyone has already tried....

                          Comment


                          • #14
                            Hi GenoMax,

                            I'll speak for Jean-Rene here as I am working with him and he is currently busy in the lab. It appears that there is not anyone in the office this week at our core facility that can give us any files or a more detailed description of what happened so we will have to wait until early next week to look at the phiX, cluster density etc and get their opinion as to what the most likely root cause of the problem is.

                            Hi SylvainL,

                            Thanks for your idea. We cannot afford the extra read length, so if sequence diversity (at either the P1 inline-barcode end or the P2 Illumina-index end) appears to be the most likely cause, we will likely try another run with something like the following:

                            In order to introduce diversity at both the P1 (inline barcode/restriction site) and P2 (index) ends, we will re-prepare the failed samples with barcodes of length 5 and 4 in order to offset the restriction site. All the samples on this failed run had barcodes of length 6, leading to no offsetting of the restriction site. We already have the barcode adapters synthesized and annealed, so incorporating Ns would add a bit of cost and time and may not be necessary. I expect that offsetting will be sufficient. We would pair these shorter barcodes with multiple unique indexes so that we can avoid some of the de-multiplexing losses from the STACKS program that we experienced on a previous successful ddRAD run. We could then add these newly prepared libraries at ~30-50% to increase the sequence diversity while recovering the sequence data. The previous successful run had increased diversity from the use of 4-, 5-, and 6-nucleotide barcodes and was run on a lane with 50% mRNA libraries carrying 12 or more different indexes, but the library preparation protocol was otherwise identical.
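
                            To illustrate why offsetting should help, here is a toy sketch comparing fixed-length barcodes with a 4/5/6-nt mix. The barcodes, the `TGCAG` cut-site remnant, and the synthetic inserts are all hypothetical placeholders rather than our actual adapter sequences, and real libraries will not be this evenly balanced.

```python
CUT_SITE = "TGCAG"  # hypothetical cut-site remnant, for illustration only


def build_reads(barcodes, reads_per_barcode=4, insert_len=10):
    """Build toy reads: barcode + cut site + a deterministic, varied insert."""
    reads = []
    for barcode in barcodes:
        for i in range(reads_per_barcode):
            # Rotate through ACGT so that inserts differ between reads.
            insert = "".join("ACGT"[(i + j) % 4] for j in range(insert_len))
            reads.append(barcode + CUT_SITE + insert)
    return reads


def monobase_cycles(reads, n_cycles):
    """Return the 1-based cycles at which every read shows the same base."""
    return [c + 1 for c in range(n_cycles) if len({read[c] for read in reads}) == 1]


# All barcodes 6 nt long: the cut site sits at cycles 7-11 in every read.
fixed = build_reads(["ACGTAC", "CGTACG", "GTACGT", "TACGTA"])

# Barcodes of 4, 5 and 6 nt: the cut site is staggered across reads.
staggered = build_reads(["ACGT", "CGTAC", "GTACGT", "TACGA"])

print(monobase_cycles(fixed, 15))      # [7, 8, 9, 10, 11]
print(monobase_cycles(staggered, 15))  # []
```

                            In this toy case the staggered set has no cycle where all reads agree, because the cut site starts at cycle 5, 6 or 7 depending on barcode length.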

                            Thank you all for your speedy help. We realise how difficult it is to help troubleshooting with such limited information.

                            Comment


                            • #15
                              I know everyone is just trying to solve the problem here, but Illumina claims to no longer have an issue with low diversity sequence. See this pdf:



                              So it is not okay for them to claim that the diversity was too low in this sample if it is being sequenced on a HiSeq using HCS 2.2.38 or later.

                              Note, however, that in the .pdf the samples are clustered at a low density. So it may be necessary to back off the clustering density by quite a bit -- 30-50%.

                              --
                              Phillip

                              Comment
