  • Why is the base quality score so different between FastQC and FastX?

    Dear Sequencing Community,

    I am new to this, so please be gentle.

    The ChIP-seq was done by a company on an Illumina sequencer: ~60 Mb per sample, 4 samples, and only 17-22% could be mapped to the mouse genome. There is some known salmon sperm contamination, but there should still be enough reads for the sample.

    I checked the base quality scores using FastQC and was shocked:

    http://www.freeimagehosting.net/t4wx8

    But the company replied that FastQC only samples the first ~200,000 reads. I do know that this is done for, e.g., base overrepresentation, but I thought the base quality score was calculated from all the reads.

    So I checked with FastX from Galaxy:

    http://www.freeimagehosting.net/713i5

    Why do the base quality score outputs from FastQC and FastX differ so much? If FastQC only samples a fraction of the reads, wouldn't it become useless for bigger runs?

    Thanks for your help.
    Last edited by TheStudent; 06-05-2012, 02:10 AM.

  • #2
    Did you add the -Q 33 option to your fastx command? Illumina encoding changed from a -64 to a -33 encoding and fastx defaults to -64.
    Did FastQC make the right choice when it used 1.9 encoding? I've never seen it make a mistake like that however.
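
To illustrate the offset question raised above, here is a minimal sketch of how the same FASTQ quality string decodes under the two ASCII offsets (33 and 64). The function name `decode_quality` and the example quality string are made up for illustration; they are not part of FastQC or the FASTX-Toolkit.

```python
def decode_quality(qual, offset):
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual]


# Hypothetical quality string, not taken from the poster's data.
example = "IIHG#"

# Read as offset 33, 'I' decodes to Q40 and '#' to Q2.
print(decode_quality(example, 33))

# Read as offset 64, the same characters give much lower (even negative) scores.
print(decode_quality(example, 64))
```

A wrong offset shifts every score by the same amount (31), so it would move a quality plot up or down rather than change its shape.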

    • #3
      Originally posted by pbluescript
      Did you add the -Q 33 option to your fastx command? Illumina encoding changed from a -64 to a -33 encoding and fastx defaults to -64.
      Did FastQC make the right choice when it used 1.9 encoding? I've never seen it make a mistake like that however.
      It doesn't seem like an encoding problem; in that case the plots would be shifted, not qualitatively different as they are. Look at the qualities manually for a few reads: they should all show bad quality at position 20 if FastQC is right.
      Did you try to see if you get the same result with a subset of your file? Is the file very big, so that it could overflow some counters?
      AFAIK FastQC should use all the reads for the quality plot.
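
As a rough way to do the manual check suggested above, the following sketch computes the mean quality at each read position over every record it is given. It assumes plain four-line FASTQ records and an ASCII offset of 33; `mean_quality_by_position` is a hypothetical helper, not how FastQC or FastX compute their plots.

```python
from itertools import islice


def mean_quality_by_position(lines, offset=33):
    """Return the mean Phred score at each read position.

    `lines` is any iterable of FASTQ lines (assumed: 4 lines per record).
    """
    sums, counts = [], []
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        qual = record[3].rstrip("\n")
        for pos, ch in enumerate(qual):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Two made-up records; the second has low quality in the middle.
demo = ["@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "I##I"]
print(mean_quality_by_position(demo))
```

Passing it an open file handle (for example `open("reads.fastq")`, where the file name is a placeholder) uses all reads, and wrapping the handle in `islice(handle, 4 * n)` restricts it to the first `n` reads for comparison.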

      • #4
        Are there actually quality scores above 40 in your file? That should indicate which one is more likely to be correct.
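
One way to answer that question is to scan the quality lines and report the lowest and highest score under an assumed offset. This is only a sketch: `quality_score_range` is a hypothetical helper, and it assumes four-line FASTQ records.

```python
def quality_score_range(lines, offset=33):
    """Return (lowest, highest) Phred score seen on the quality lines."""
    lowest, highest = None, None
    for index, line in enumerate(lines):
        if index % 4 != 3:
            continue  # only every 4th line holds quality characters
        qual = line.rstrip("\n")
        if not qual:
            continue
        scores = [ord(ch) - offset for ch in qual]
        lowest = min(scores) if lowest is None else min(lowest, min(scores))
        highest = max(scores) if highest is None else max(highest, max(scores))
    return lowest, highest


# Made-up records for illustration only.
demo_records = ["@r1", "ACGT", "+", "II#I",
                "@r2", "ACGT", "+", "HGFE"]
print(quality_score_range(demo_records, offset=33))
print(quality_score_range(demo_records, offset=64))
```

Negative scores under one offset, or scores far above 40 under the other, suggest that the assumed offset is the wrong one for that file.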

        • #5
          That does look odd, and I can't immediately think how you'd end up with biases which were that different. Were there any other oddities in the sample, such as lots of 'N' calls? Does FastX do any filtering of the bases or reads it uses for the quality plot? (FastQC doesn't, unless run with the --casava option.) As others have said, it's not going to be an offset detection problem; there's something else going on.

          The very high qualities in the FastX results seem a bit odd; most sequencers don't produce reads of Q40. The drop and recovery in the FastQC results doesn't look like a technical effect, but then I can't see how that wouldn't show up in the FastX result.

          If you can put the FastQ file you were using somewhere I can see it, I'd be happy to take a look to see which of the results seems to match up.

          • #6
            @ pbluescript
            The FastX was run after a conversion to Sanger Format.

            @ arvid
            "AFAIK FastQC should use all the reads for the quality plot."
            Thank you, that is very helpful.

            @simonandrews
            Thank you for your kind offer.

            After a lot of checks I think I figured out what is wrong: apparently, because of some bug or a malformed FASTQ file, FastQC only took the first ~5,000 reads. Those were really bad due to some technical issues.

            Thank you all for your thoughts and help.

            • #7
              The only thing I know of which would allow FastQC to read only a subset of a file without throwing an error is if the file were composed of multiple gzipped sections. In this case the Java parser used in FastQC stops after the first compressed block; however, I included a workaround for this in the last release of the program (v0.10.1), so this shouldn't happen any more.

              Can you let me know if this still happens with the latest release? If it does I'd like to look into this some more to try to figure out what went wrong.

              Thanks

              Simon.
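
The multi-section gzip case described above can be checked outside FastQC. The sketch below counts the gzip members in a byte string and shows what a reader that stops after the first member would see. The helper names are hypothetical, and this uses Python's standard `zlib` and `gzip` modules, not FastQC's Java parser.

```python
import gzip
import zlib


def count_gzip_members(data):
    """Count the concatenated gzip members in `data` (bytes)."""
    members = 0
    while data:
        decoder = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
        decoder.decompress(data)
        if not decoder.eof:
            raise ValueError("truncated or invalid gzip member")
        members += 1
        data = decoder.unused_data  # bytes that follow the finished member
    return members


def first_member_only(data):
    """Decompress only the first gzip member, ignoring anything after it."""
    decoder = zlib.decompressobj(wbits=31)
    return decoder.decompress(data)


# Two made-up FASTQ records, each compressed separately and then concatenated.
record_1 = b"@r1\nACGT\n+\nIIII\n"
record_2 = b"@r2\nACGT\n+\n####\n"
multi = gzip.compress(record_1) + gzip.compress(record_2)

print(count_gzip_members(multi))           # number of members in the file
print(first_member_only(multi).decode())   # what a first-member-only reader sees
print(gzip.decompress(multi).decode())     # all members decompressed
```

For a real file, reading it with `open(path, "rb").read()` (with `path` as a placeholder) and passing the bytes to `count_gzip_members` indicates whether it consists of more than one member; this loads the whole file into memory, so it is only practical for files that fit.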

              • #8
                Hi Simon,

                Thank you very much for the very quick reply. I tried it with the latest version, v0.10.1, and it still did the same. I will send you the file in a PM.

                Thank you.
