
  • How to Evaluate the Illumina Data Quality?

    Hello everyone!

    I'm a new guy here, and this is my first post.

    I'm doing de novo assembly with Illumina sequencing data (from GAIIx v1.3, paired-end reads, 150bp x 2). The raw data I got from the sequencing company does not look very good (shown in the FastQC figure).

    I wonder if Illumina has a criterion, or if there are other commonly used rules, for evaluating the quality of GAIIx sequencing data.

    If the data quality is bad according to such a criterion, I will ask the company to sequence our genome again.

    Thank you very much!
    Last edited by Godevil; 04-03-2011, 04:54 PM. Reason: Re-upload the figure

  • #2
    Your image seems to have been removed from baidu so I can't see your plot so it's hard to comment specifically.

    150bp is still very long for an Illumina read. I don't think I've yet seen an Illumina run whose quality stayed high to 150bp; with the latest chemistry we're just starting to see runs of ~120bp with high quality to the end. That's not to say that the last 30bp of a 150bp run aren't useful - but you shouldn't be surprised to see the quality decline.


    • #3
      Thank you for your reply. I've re-uploaded the FastQC figure.
      I wonder if there is a commonly used criterion to evaluate Illumina data quality.


      • #4
        Originally posted by Godevil View Post
        Thank you for your reply. I've re-uploaded the FastQC figure.
        I wonder if there is a commonly used criterion to evaluate Illumina data quality.
        I still can't see your figure.

        There's no common set of standards available to evaluate any kind of sequence data. The cutoffs used in FastQC are simply derived from looking at lots of different runs and setting a cutoff where I'd have wanted to have a closer look. There will be variations between different machines, library types and even chemistry versions which make setting universal cutoffs difficult. We're always open to suggestions for ways to improve things.


        • #5
          Originally posted by simonandrews View Post
          I still can't see your figure.
          I tried again to use another album on the Internet. I hope you can see this figure now.

          I'm a new guy here. I don't know if I can upload a figure as an attachment in SEQanswers. Figure URLs from internet albums are always unstable. Which album do you usually use?

          Figure's URL: [links not preserved]


          • #6
            OK, I can see it now. It actually doesn't look bad at all. Your quality looks OK up to about 125bp and only drops right at the end. 150bp runs have only been supported relatively recently and it's not surprising to see a drop in the last few bases. We've only just started to see data which looked to be high quality past 100bp, and your data runs well past that.

            This actually looks like very nice data from a very ambitious run. I'd be very happy with my sequencing provider if they could routinely produce data this good.


            • #7
              Originally posted by simonandrews View Post
              It actually doesn't look bad at all.
              Thank you for your reply
              The figure we discussed is the "per base sequence quality" plot, which FastQC evaluates as not good enough.

              The following figure shows the summary results from FastQC.


              The software tells me that the sequencing result is not so good, so I wonder if there is a common criterion for evaluating sequencing data quality.

              I also believe it would be OK if I only used the first 100bp of the reads. However, my problem is that our lab paid extra money for the 150bp reads; longer reads are more expensive than 100bp reads. So I wonder if this sequencing data is OK, or whether we should ask the company to do the sequencing again.
              Last edited by Godevil; 03-30-2011, 11:30 PM. Reason: re-upload figure


              • #8
                For the quality plot FastQC is warning you that the quality at the end of the read is poor - which it is. What I was trying to do was to put this into context which was that you were running the longest supported read length which Illumina provide. At this length we'd not be surprised to see quality values falling at the end of the read since you're really pushing the limits of the chemistry.

                So I suppose the message is that at the end the quality is poor - but it's probably as good as you were ever going to get (at least for now).

                I can't really comment on the other warnings/failures without seeing the plots which go with them.


                • #9
                  Originally posted by simonandrews View Post
                  I can't really comment on the other warnings/failures without seeing the plots which go with them.
                  OK, I will show all the figures which have a red "x" or yellow "!" marker in the FastQC report. Maybe you can give me some advice.

                  Thank you very much!










                  • #10
                    OK, so the Kmers and the Base plots suggest that in a significant proportion of your library you're reading through your insert and into the adapter on the other end. You'll therefore want to use an adapter trimmer to remove this sequence.

                    You have some low-level sequence duplication, which could be the result of saturating your library, or could be PCR amplification bias. If you're assembling a small genome then saturation seems more likely; if it's a large genome then it might be PCR bias. You might want to remove duplicates before assembling, since they'll just slow down the assembly process and won't add any new information to your library.

                    So to summarise, if I were you I'd:

                    1) Trim all of my sequences back to about 125bp to remove the poor quality sequence from the end.

                    2) Run the remaining sequence through an adapter trimmer to remove any remaining adapter (you could do this first)

                    3) Remove duplicate sequences - not because your library is heavily duplicated, but it's probably not a bad idea before any assembly.
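
The three steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a replacement for a real trimmer: reads are assumed to be already parsed into (name, sequence, quality) tuples, the adapter is matched exactly (no mismatches allowed), and all function names and the `min_overlap` parameter are made up for this example. The adapter is removed before the fixed-length cut so that a partial adapter at the 3' end is still recognisable.

```python
def trim_to_length(seq, qual, max_len=125):
    """Cut a read and its quality string back to at most max_len bases."""
    return seq[:max_len], qual[:max_len]


def trim_adapter(seq, qual, adapter, min_overlap=5):
    """Remove a 3' adapter using exact matching only (a simplification).

    Cuts at the first full occurrence of the adapter, or at a prefix of the
    adapter (at least min_overlap bases long) that runs off the 3' end.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], qual[:pos]
    # Look for a partial adapter at the very end of the read, longest first.
    for overlap in range(min(len(adapter), len(seq)) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            cut = len(seq) - overlap
            return seq[:cut], qual[:cut]
    return seq, qual


def remove_exact_duplicates(reads):
    """Keep the first read seen for each distinct sequence, preserving order."""
    seen = set()
    unique = []
    for name, seq, qual in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((name, seq, qual))
    return unique


def clean_reads(reads, adapter, max_len=125):
    """Adapter-trim, length-trim, drop empty reads, then deduplicate."""
    cleaned = []
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        seq, qual = trim_to_length(seq, qual, max_len)
        if seq:
            cleaned.append((name, seq, qual))
    return remove_exact_duplicates(cleaned)


# Example adapter value only; substitute the adapter used for your library.
ADAPTER = "AGATCGGAAGAGC"
reads = [
    ("r1", "ACGTTGCA" + ADAPTER + "TTT", "I" * 24),  # full adapter inside read
    ("r2", "ACGTTGCA", "I" * 8),                     # no adapter
    ("r3", "ACGTTGCAAGATCGG", "I" * 15),             # partial adapter at 3' end
]
result = clean_reads(reads, ADAPTER)
```

For paired-end data the same idea would have to be applied to both mates together so that pairs stay in sync; that is left out here to keep the sketch short.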

                    Hope this helps

                    Simon.


                    • #11
                      Originally posted by simonandrews View Post
                      1) Trim all of my sequences back to about 125bp to remove the poor quality sequence from the end.

                      2) Run the remaining sequence through an adapter trimmer to remove any remaining adapter (you could do this first)
                      It's really useful! Thank you very much!

                      Actually, I've already found that ~10% of reads have adapter sequence at their 3' ends. I've decided to remove the adapter sequences and perform quality filtering with the "Cutadapt" software (it can trim the low-quality ends of reads using the same rule as BWA, and then remove the adapter sequences).
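
For readers unfamiliar with the BWA-style rule mentioned here, a minimal Python sketch of the idea follows. It walks in from the 3' end, accumulates (cutoff - quality) at each base, and cuts at the position where that running sum was largest. The function names are hypothetical, and the Phred+64 offset is an assumption based on the v1.3 pipeline mentioned in the first post; in practice the tool does this for you.

```python
def decode_phred(qual_string, offset=64):
    """Convert a FASTQ quality string to integer scores (offset is assumed)."""
    return [ord(c) - offset for c in qual_string]


def bwa_quality_trim_index(quals, cutoff):
    """Return the index at which to cut the 3' end of a read.

    quals  -- list of integer Phred scores, 5' to 3'
    cutoff -- quality threshold

    Bases from the returned index onwards are discarded.
    """
    running = 0
    best = 0
    cut = len(quals)  # default: keep the whole read
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            # Enough good bases seen that extending further cannot help.
            break
        if running > best:
            best = running
            cut = i
    return cut


quals = [30, 30, 30, 20, 10, 5, 2]
cut = bwa_quality_trim_index(quals, cutoff=20)
kept = quals[:cut]
```

Unlike a fixed cut at a given cycle, this keeps the full length of reads whose quality stays above the cutoff to the end.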


                      Originally posted by simonandrews View Post
                      3) Remove duplicate sequences - not because your library is heavily duplicated, but it's probably not a bad idea before any assembly.
                      But, I don't know which software is the best for me to remove duplicate sequences. Do you have some advice?


                      Your reply is so helpful!


                      • #12
                        Originally posted by Godevil View Post
                        But, I don't know which software is the best for me to remove duplicate sequences. Do you have some advice?
                        I've not used it myself, but the Picard MarkDuplicates program does this.


                        • #13
                          Take a look at PRINSEQ as well. I would suggest that you use the standalone version if you have large files.




                          • #14
                            Originally posted by Godevil View Post
                            OK, I will show all the figures which have a red "x" or yellow "!" marker in the FastQC report. Maybe you can give me some advice.
                            End of read quality: Since you have shortish inserts with long reads, and you see adapters, many reads will need to be trimmed. But I wouldn't advise just cutting each read at a fixed point (unless you have oodles of data to spare) - even if only 25% of the reads are usable out to 150bp, those good quality long reads are gold-dust in de novo projects. I'd recommend adapter trimming, N base / B quality trimming, followed by a sliding window quality filter across a few bases.
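
As a rough illustration of the last two steps, here is a minimal Python sketch of trailing N / B trimming followed by a sliding-window quality cut. Everything specific in it is an assumption for the example: the function names, the window size of 4, the mean-quality threshold of 20, and the Phred+64 encoding in which a run of trailing 'B' quality characters is treated as a low-quality tail.

```python
def trim_trailing_n_and_b(seq, qual):
    """Strip 3' positions where the base is 'N' or the quality char is 'B'."""
    end = len(seq)
    while end > 0 and (seq[end - 1] == "N" or qual[end - 1] == "B"):
        end -= 1
    return seq[:end], qual[:end]


def sliding_window_trim(seq, qual, window=4, min_mean=20, offset=64):
    """Cut the read at the start of the first window whose mean quality
    drops below min_mean. Reads shorter than the window are left unchanged.
    """
    scores = [ord(c) - offset for c in qual]
    for start in range(0, len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], qual[:start]
    return seq, qual


# 'h' is Q40 and 'J' is Q10 under the assumed Phred+64 encoding.
step1 = trim_trailing_n_and_b("ACGTACGTNN", "hhhhhhhhBB")
step2 = sliding_window_trim("ACGTACGT", "hhhhJJJJ")
```

Because the cut point depends on each read's own qualities, reads that stay good out to 150bp are kept at full length, which is the point being made above against a fixed cut.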

                            Duplication: Some of this is likely adapter-only reads, or even just the adapter parts of otherwise usable reads. It could also be biological, a contaminant, mitochondria or chloroplast sequence. And even if some reads are identical, this can happen due to common repeats within the target genome without being 'wrong'. Assemblers often rely on such coverage hints to help find repeats.

                            For scaffolding however, especially with mate pair data, remove duplicates, or many copies of a single (possibly wrong) link can be used to link contigs in error.


                            • #15
                              *BUMP*

                              I'm using FastQC as well (great program - thanks!) and I have noticed that I have significant contaminations of TruSeq adapters. Is there a program that will trim these without me having to specify the adapter sequences themselves? I.e. a program that uses the same sort of information as FastQC to automatically identify and trim them?

                              Thanks - and sorry for the bump!
