SEQanswers

Old 07-30-2018, 11:07 AM   #1
pecanton
Member
 
Location: Florida, USA

Join Date: Jun 2011
Posts: 13
Default HiSeq 3000 output FastQC parameters are bad, should I ask for resequencing?

Recently we sent a batch of 24 total RNA samples to a company for sequencing on a HiSeq 3000 platform. Both our own and the company's quality checks showed good RNA integrity and very good purity, so they went ahead with library preparation and sequencing. After the wait we got a hard drive with the data. However, after running FastQC, some of the plots show problems that I'm hesitant to attribute to the samples, since they look more like issues with the flow cell and/or library preparation. The samples were sequenced for a downstream de novo transcriptome assembly with Trinity, but I'm not sure the sequences as they are now will be good enough for that, even after trimming off the adapter. I've attached the FastQC reports of two representative samples (both forward and reverse reads, simplified file names). They were sequenced on different flow cells, since the samples were split across flow cells to get enough reads per sample when multiplexing. The most alarming things:

- Poor per-tile quality. Some regions of the flow cells seem like they failed in the later cycles, and they are localized, as if something had failed in one particular spot and not a generalized problem. You can also see this impacting the quality per sequence plot, where there's a hump in the sample with the worse per-tile quality.

- Adapter content. In some samples adapter starts showing up at around cycle 100, which to me suggests the fragmentation was a bit too aggressive and small fragments were used in the library preparation.

We paid quite a bit of money to have this sequenced, and it doesn't feel like the run was up to standard. Should we go back and ask the company to re-do this?
Attached Files
File Type: pdf crn_R2.pdf (294.9 KB, 24 views)
File Type: pdf crn_R1.pdf (295.1 KB, 15 views)
File Type: pdf bean_R2.pdf (290.6 KB, 7 views)
File Type: pdf bean_R1.pdf (290.2 KB, 8 views)
Old 07-30-2018, 07:53 PM   #2
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 337
Default

Hi pecanton,

Is this a subset of the data? How many reads did you get in total?
There was indeed an issue with the flowcell: localized low-quality regions are causing the low-quality data. Illumina would likely replace the reagents in this case.
The insert size is a more difficult question. By default, RNA-seq libraries always contain a majority of short fragments. It depends on what you discussed with them beforehand.
Old 07-31-2018, 05:41 AM   #3
pecanton
Member
 
Location: Florida, USA

Join Date: Jun 2011
Posts: 13
Default

Thank you for replying.

The FastQC reports I showed are not subsets, they are with all the reads for those samples. On average we got around 30-33 million reads for each sample (some up to 43 million). For the entire set of 24 samples we have around 860 million reads.

On the matter of fragment size, I do know Illumina libraries have smaller fragments. We asked for 2 X 150 bp sequencing. The fact that the adapter shows up in a detectable percentage of the reads to me means that in the library preparation the size selection of fragments to attach the adapters included pieces of much less than 150 bp, otherwise there wouldn't be a read through into the adapter sequence in the last cycles.
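The read-through argument above can be put into numbers. This is only a minimal sketch, assuming the read starts at the first base of the insert and that everything past the end of the insert is adapter; the function names are made up for illustration.

```python
def adapter_bases(insert_len, read_len=150):
    """Number of cycles at the 3' end of a read that sequence adapter instead of insert."""
    return max(0, read_len - insert_len)


def first_adapter_cycle(insert_len, read_len=150):
    """1-based cycle where adapter starts, or None if the read never leaves the insert."""
    if insert_len >= read_len:
        return None
    return insert_len + 1


if __name__ == "__main__":
    # Illustrative insert sizes only; these are not measured from the run discussed here.
    for insert_len in (100, 150, 250):
        print(insert_len, adapter_bases(insert_len), first_adapter_cycle(insert_len))
```

Under these assumptions, adapter appearing from around cycle 100 of a 150 bp read points to inserts of roughly 100 bp in that fraction of the library.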
Old 07-31-2018, 03:39 PM   #4
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 337
Default

Quote:
Originally Posted by pecanton View Post
.....
On the matter of fragment size, I do know Illumina libraries have smaller fragments. We asked for 2 X 150 bp sequencing. The fact that the adapter shows up in a detectable percentage of the reads to me means that in the library preparation the size selection of fragments to attach the adapters included pieces of much less than 150 bp, otherwise there wouldn't be a read through into the adapter sequence in the last cycles.
Yes, this is certainly correct. However, RNA-seq libraries generated with most protocols have a strong bias towards smaller fragments - in contrast to genomic libraries. Please see the attached examples from Illumina and NEB. Shortening the fragmentation time mostly results in a more prominent tail of long fragments while retaining a majority of short fragments. Thus, moving the insert sizes to 250 and above requires severe size selection measures that will be accompanied by some loss of library complexity. We do indeed carry out such size selections for de novo transcriptome assembly purposes, but I believe we are the exception and most places will not do it. Since one throws away the majority of the library with the size selection, this warrants a discussion in my eyes.
Attached Images
File Type: jpg NEB-RNA-seq-example.jpg (27.5 KB, 5 views)
File Type: png IlluminaRNA-seq-example.png (69.5 KB, 5 views)

Last edited by luc; 07-31-2018 at 04:46 PM.
Old 08-01-2018, 11:21 AM   #5
pecanton
Member
 
Location: Florida, USA

Join Date: Jun 2011
Posts: 13
Default

I've already contacted the company that did the sequencing. However, in the worst case scenario, how can I proceed to do assembly with these reads? Should I filter all reads coming from the bad tiles or let the assembler evaluate the quality of the base in the read?
Old 08-01-2018, 11:55 AM   #6
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 337
Default

I would certainly filter the reads based on average quality scores (not losing more than 15% of the reads) and do some very gentle quality trimming from the 3' end.
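One way to read that advice is as a mean-quality cutoff with a ceiling on how many reads it may remove. The sketch below is an illustration only, assuming Phred+33 quality strings; the default cutoff of 20 and the helper names are assumptions, and only the 15% limit comes from the suggestion above.

```python
import math


def mean_phred(quality, offset=33):
    """Arithmetic mean of the Phred scores encoded in a FASTQ quality string."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def capped_cutoff(mean_quals, wanted_cutoff=20.0, max_loss=0.15):
    """Largest cutoff <= wanted_cutoff that drops at most max_loss of the reads."""
    ranked = sorted(mean_quals)
    allowed = int(len(ranked) * max_loss)  # number of reads we may drop
    # Reads strictly below ranked[allowed] number at most `allowed`.
    loss_cap = ranked[allowed] if allowed < len(ranked) else math.inf
    return min(wanted_cutoff, loss_cap)


def filter_by_mean_quality(qualities, wanted_cutoff=20.0, max_loss=0.15):
    """Return the cutoff actually used and the quality strings that pass it."""
    means = [mean_phred(q) for q in qualities]
    cutoff = capped_cutoff(means, wanted_cutoff, max_loss)
    kept = [q for q, m in zip(qualities, means) if m >= cutoff]
    return cutoff, kept


if __name__ == "__main__":
    # Toy data: 18 reads at Q40 ('I') and 2 reads at Q2 ('#').
    toy = ["IIII"] * 18 + ["####"] * 2
    cutoff, kept = filter_by_mean_quality(toy)
    print(cutoff, len(kept))
```

If too many reads fall below the wanted cutoff, the cutoff is lowered rather than the loss limit exceeded, which is one possible interpretation of "not losing more than 15%".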
Old 08-02-2018, 06:40 AM   #7
pecanton
Member
 
Location: Florida, USA

Join Date: Jun 2011
Posts: 13
Default

I did adapter trimming and a soft quality trimming with Trim_Galore. However, I'm still uneasy about feeding the sequences from the bad tiles into the assembler. You know, trash in, trash out. Is there any tool you would recommend to remove them? Should that be done before or after trimming?
Old 08-02-2018, 07:01 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780
Default

You can use "filterbytile.sh" from BBMap suite.
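Before filtering, it can help to see which tiles are actually affected. The following is a rough diagnostic sketch, not a description of how filterbytile.sh works internally: it assumes Casava 1.8+ style read headers (instrument:run:flowcell:lane:tile:x:y) and Phred+33 qualities, and the function names, the 2-point margin, and the example header values are all made up for illustration.

```python
from collections import defaultdict


def tile_of(header):
    """Return 'lane:tile' from a header like @inst:run:flowcell:lane:tile:x:y ..."""
    fields = header.lstrip("@").split(" ")[0].split(":")
    return f"{fields[3]}:{fields[4]}"


def per_tile_mean_quality(records, offset=33):
    """records: iterable of (header, sequence, quality). Returns {lane:tile: mean Phred}."""
    totals = defaultdict(lambda: [0, 0])  # tile -> [sum of Phred scores, base count]
    for header, _seq, qual in records:
        entry = totals[tile_of(header)]
        entry[0] += sum(ord(c) - offset for c in qual)
        entry[1] += len(qual)
    return {tile: s / n for tile, (s, n) in totals.items()}


def flag_tiles(tile_means, margin=2.0):
    """Tiles whose mean quality is more than `margin` below the mean over all tiles."""
    overall = sum(tile_means.values()) / len(tile_means)
    return sorted(t for t, m in tile_means.items() if m < overall - margin)


if __name__ == "__main__":
    # Hypothetical reads; instrument, run, and flowcell IDs are placeholders.
    toy = [
        ("@K00000:1:HXXXXBBXX:1:1101:1000:2000 1:N:0:ATCACG", "ACGT", "IIII"),
        ("@K00000:1:HXXXXBBXX:1:1102:1000:2000 1:N:0:ATCACG", "ACGT", "####"),
        ("@K00000:1:HXXXXBBXX:1:1103:1000:2000 1:N:0:ATCACG", "ACGT", "IIII"),
    ]
    means = per_tile_mean_quality(toy)
    print(means)
    print(flag_tiles(means))
```

Reads whose headers map to a flagged tile could then be dropped from both mates of a pair, which is the same idea FastQC's per-tile plot is hinting at.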

Has the sequence provider said anything about the possibility that there was a hardware/software problem with this run? If there was, then they should re-run the samples for you at no charge. Generally Illumina provides free reagent replacements to providers when they have a maintenance contract on the sequencer (which most will).

Last edited by GenoMax; 08-02-2018 at 07:03 AM.
Old 08-02-2018, 07:08 AM   #9
pecanton
Member
 
Location: Florida, USA

Join Date: Jun 2011
Posts: 13
Default

Thank you so much! Yes, that looks like it could do the job, and it seems we already have the suite set up on our University's cluster. I'll have to play around with the parameters; I am not too sure how strict to be given the per-tile plots I am getting.

I called the sequencing facility yesterday, the operations team is going over my inquiry, but they haven't gotten back. I'll get in touch again today. It is a big company, so they should have those quality assurances in place for what is clearly a technical problem on their part. On my part it is more about the time it will take to get that data (if they redo it), as we are already a little behind schedule.

Last edited by pecanton; 08-02-2018 at 07:12 AM.
Old 08-02-2018, 07:14 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780
Default

As you appropriately said above:

Quote:
You know, trash in, trash out.
That needs to take priority.
Old 08-07-2018, 06:04 AM   #11
pecanton
Member
 
Location: Florida, USA

Join Date: Jun 2011
Posts: 13
Default

I did try to use Filter by Tile, even with the aggressive parameters they suggest, and although the number was reduced, I still had some bad tiles carrying over. Fortunately, after some back and forth, the company will be resequencing the samples. I'll use the reads I have to start optimizing parameters with Trinity. I've done bioinformatics before, but have never done assembly with a dataset this big, so I'll have to read around a bit.

Thank you all for your answers!
Old 08-07-2018, 06:19 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780
Default

Glad to hear they are doing the right thing and will re-sequence. The only thing you are out is time.