  • HiSeq 3000 output FastQC parameters are bad, should I ask for resequencing?

    Recently we sent a batch of 24 samples of total RNA to a company for sequencing on a HiSeq 3000 platform. Both our own and the company's quality checks on the samples showed good RNA integrity and very good purity, so they went ahead with library preparation and sequencing. After the wait we got a hard drive with the data. However, after running FastQC, some of the plots show problems that I'm hesitant to attribute to the samples, since they look more like an issue with the flowcell and/or the library preparation. The samples were sequenced for a downstream de novo transcriptome assembly with Trinity, but I'm not sure if the sequences as they are now will be good enough for that, even after trimming off the adapter. I've attached the FastQC reports of two representative samples (for both forward and reverse reads, simplified file names). They were sequenced in different flow cells, since the samples were split across flow cells to get enough reads per sample when multiplexing. The most alarming things:

    - Poor per-tile quality. Some regions of the flow cells seem like they failed in the later cycles, and they are localized, as if something had failed in one particular spot and not a generalized problem. You can also see this impacting the quality per sequence plot, where there's a hump in the sample with the worse per-tile quality.

    - Adapter content. In some samples adapter starts showing up at around cycle 100, which to me suggests the fragmentation was a bit too aggressive and small fragments were used in the library preparation.

    We paid quite a bit of money to have this sequenced, and it doesn't feel like the run was up to standard. Should we go back and ask the company to re-do this?
    Attached Files

  • #2
    Hi pecanton,

    Is this a subset of the data? How many reads did you get in total?
    There was indeed an issue with the flowcell: localized low-quality regions are causing the low-quality data. Illumina would likely replace the reagents in this case.
    The insert size is a more difficult question. By default, RNA-seq libraries always contain a majority of short fragments. It depends on what you discussed with them beforehand.


    • #3
      Thank you for replying.

      The FastQC reports I showed are not subsets, they are with all the reads for those samples. On average we got around 30-33 million reads for each sample (some up to 43 million). For the entire set of 24 samples we have around 860 million reads.

      On the matter of fragment size, I do know Illumina libraries have smaller fragments. We asked for 2 X 150 bp sequencing. The fact that the adapter shows up in a detectable percentage of the reads to me means that in the library preparation the size selection of fragments to attach the adapters included pieces of much less than 150 bp, otherwise there wouldn't be a read through into the adapter sequence in the last cycles.
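The read-through reasoning above can be written out as simple arithmetic. This is only an illustrative sketch: the function names are hypothetical, and it assumes the read starts at the first base of the insert and continues straight into the adapter once the insert ends.

```python
def adapter_bases(insert_len, read_len=150):
    """Number of cycles at the 3' end of the read that fall in adapter sequence."""
    return max(0, read_len - insert_len)


def first_adapter_cycle(insert_len, read_len=150):
    """1-based cycle at which adapter first appears, or None if there is no read-through."""
    if insert_len >= read_len:
        return None
    return insert_len + 1


# A 100 bp insert sequenced with 150 cycles: adapter begins right after cycle 100.
print(first_adapter_cycle(100), adapter_bases(100))
# A 250 bp insert is longer than the read, so no adapter is seen.
print(first_adapter_cycle(250), adapter_bases(250))
```

Under these assumptions, adapter appearing at around cycle 100 corresponds to inserts of roughly 100 bp, well below the 150 bp read length.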


      • #4
        Originally posted by pecanton
        .....
        On the matter of fragment size, I do know Illumina libraries have smaller fragments. We asked for 2 X 150 bp sequencing. The fact that the adapter shows up in a detectable percentage of the reads to me means that in the library preparation the size selection of fragments to attach the adapters included pieces of much less than 150 bp, otherwise there wouldn't be a read through into the adapter sequence in the last cycles.
        Yes, this is certainly correct. However, RNA-seq libraries generated with most protocols have a strong bias towards smaller fragments, in contrast to genomic libraries. Please see the attached examples from Illumina and NEB documentation. Shortening the fragmentation time mostly results in a more prominent tail of long fragments while retaining a majority of short fragments. Thus, moving the insert sizes to 250 bp and above requires severe size selection, which will be accompanied by some loss of library complexity. We do indeed carry out such size selections for de novo transcriptome assembly purposes, but I believe we are the exception and most places will not do it. Since one throws away the majority of the library with the size selection, this warrants a discussion in my eyes.
        Attached Files
        Last edited by luc; 07-31-2018, 04:46 PM.


        • #5
          I've already contacted the company that did the sequencing. However, in the worst case scenario, how can I proceed to do assembly with these reads? Should I filter all reads coming from the bad tiles or let the assembler evaluate the quality of the base in the read?


          • #6
            I would certainly filter the reads based on average quality scores (not losing more than 15% of the reads) and do some very gentle quality trimming from the 3' end.
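A rough Python sketch of that suggestion follows: filter on mean read quality, then trim low-quality bases from the 3' end, and report the fraction of reads lost so it can be checked against the 15% budget. The thresholds (`min_mean=20`, `min_q=5`, `min_len=25`) and all function names are placeholders chosen for illustration, not values recommended in this thread, and Phred+33 encoding is assumed.

```python
from statistics import mean


def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq, qual, min_q=5):
    """Gentle trimming: drop trailing bases whose quality is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_reads(records, min_mean=20.0, min_q=5, min_len=25):
    """Keep reads whose mean quality is >= min_mean, then trim their 3' ends.

    records is an iterable of (name, seq, qual) tuples.
    Returns (kept_records, fraction_of_reads_discarded).
    """
    kept, total = [], 0
    for name, seq, qual in records:
        total += 1
        scores = phred_scores(qual)
        if not scores or mean(scores) < min_mean:
            continue  # whole read is poor: discard
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) < min_len:
            continue  # too short after trimming
        kept.append((name, seq, qual))
    lost = (total - len(kept)) / total if total else 0.0
    return kept, lost


example = [
    ("r1", "ACGT", "IIII"),  # all Q40
    ("r2", "ACGT", "II##"),  # Q40, Q40, Q2, Q2: mean 21, tail gets trimmed
    ("r3", "ACGT", "####"),  # all Q2: discarded
]
kept, lost = filter_reads(example, min_len=1)
print(kept, round(lost, 2))
```

If the reported fraction exceeds 0.15, the thresholds can be relaxed. For paired-end data the two mates have to be kept or dropped together so the files stay in sync, which this single-end sketch does not handle.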


            • #7
              I did adapter trimming and a soft quality trimming with Trim_Galore. However, I'm still uneasy about feeding the sequences from the bad tiles into the assembler. You know, trash in, trash out. Is there any tool you would recommend to remove them? Should that be done before or after trimming?


              • #8
                You can use "filterbytile.sh" from BBMap suite.
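To make the idea of tile-based filtering concrete, here is a minimal Python sketch that groups reads by tile using the read header and flags tiles whose mean quality falls well below the rest. It is a simplified stand-in and not how filterbytile.sh works internally. It assumes Illumina headers of the form `@instrument:run:flowcell:lane:tile:x:y`, and the function names and the `max_drop` threshold are hypothetical.

```python
from collections import defaultdict


def tile_key(header):
    """Extract (lane, tile) from an Illumina-style header line."""
    fields = header.lstrip("@").split(" ")[0].split(":")
    return fields[3], fields[4]


def per_tile_mean_quality(records, offset=33):
    """Mean Phred score per (lane, tile) over all bases of all reads on that tile."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of scores, number of bases]
    for header, _seq, qual in records:
        key = tile_key(header)
        totals[key][0] += sum(ord(c) - offset for c in qual)
        totals[key][1] += len(qual)
    return {k: s / n for k, (s, n) in totals.items() if n}


def flag_bad_tiles(tile_means, max_drop=2.0):
    """Tiles whose mean quality is more than max_drop below the unweighted mean of all tile means."""
    if not tile_means:
        return set()
    overall = sum(tile_means.values()) / len(tile_means)
    return {k for k, m in tile_means.items() if overall - m > max_drop}


reads = [
    ("@M1:7:FC01:1:1101:1000:2000 1:N:0:ACGT", "ACGT", "IIII"),
    ("@M1:7:FC01:1:1102:1000:2000 1:N:0:ACGT", "ACGT", "####"),
    ("@M1:7:FC01:1:1103:1000:2000 1:N:0:ACGT", "ACGT", "IIII"),
]
bad = flag_bad_tiles(per_tile_mean_quality(reads))
clean = [r for r in reads if tile_key(r[0]) not in bad]
print(bad, len(clean))
```

A whole-tile cutoff like this is coarse. It discards good reads from tiles that only failed in the later cycles, so it is better suited to inspecting which tiles are affected than to production filtering.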

                Has the sequencing provider said anything about the possibility that there was a hardware/software problem with this run? If there was, they should re-run the samples for you at no charge. Generally, Illumina provides free reagent replacements to providers when they have a maintenance contract on the sequencer (which most will).
                Last edited by GenoMax; 08-02-2018, 07:03 AM.


                • #9
                  Thank you so much! Yes, that looks like it could do the work, and it seems we have the suite already set up in our University's cluster. I'll have to play around with the parameters, I am not too sure how strict to be given the per-tile plots I am getting.

                  I called the sequencing facility yesterday, the operations team is going over my inquiry, but they haven't gotten back. I'll get in touch again today. It is a big company, so they should have those quality assurances in place for what is clearly a technical problem on their part. On my part it is more about the time it will take to get that data (if they redo it), as we are already a little behind schedule.
                  Last edited by pecanton; 08-02-2018, 07:12 AM.


                  • #10
                    As you appropriately said above:

                    You know, trash in, trash out.
                    That needs to take priority.


                    • #11
                      I did try to use Filter by Tile, even with the aggressive parameters they suggest, and although reduced, I still had a number of bad tiles carrying over. Fortunately, after some back and forth, the company will be resequencing the samples. I'll use the reads I have to start optimizing parameters with Trinity. I've done bioinformatics before, but have never done an assembly with a dataset this big, so I'll have to read around a bit.

                      Thank you all for your answers!


                      • #12
                        Glad to hear they are doing the right thing and will re-sequence. Only thing you are out of is time.
