  • FASTQC Interpretation

    Hi all (first post here)!

    I am performing my first RNA-seq analysis (prokaryote, 50 bp SE) from an Illumina TruSeq library that was sequenced on a HiSeq 2000. FastQC is showing great read quality, but I have a few concerns that I am having difficulty interpreting.

    Should I be concerned about these kmer charts? There are about 30 sharply overrepresented sequences for any given sample. I have attached one sample's duplication levels graphic— it is representative of the others.

    Thank you in advance for any thoughts you can offer; I will report back if I have a moment of clarity.


    Attached files: Kmer D1.png, Kmer WT2.png, Overrepresented.png, Duplication levels.png

  • #2
    On the positive side, the overrepresented sequences are not adapters (no hits).

    Have you tried a trimming program (BBDuk from BBMap, or Trimmomatic) to see if the majority of reads survive?

    You should go forward with the analysis and see how the alignments look.
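
One way to check whether the majority of reads survive is to count records before and after trimming. A minimal Python sketch, assuming plain four-line FASTQ records; the function names and the toy reads are made up for illustration:

```python
import io

def count_fastq_reads(handle):
    """Count records in an uncompressed FASTQ stream (4 lines per record)."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; is this plain FASTQ?")
    return n_lines // 4

def surviving_fraction(raw_handle, trimmed_handle):
    """Fraction of raw reads still present after trimming."""
    raw = count_fastq_reads(raw_handle)
    trimmed = count_fastq_reads(trimmed_handle)
    return trimmed / raw if raw else 0.0

# Toy data standing in for real files (hypothetical reads).
raw = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n")
trimmed = io.StringIO("@r1\nACGT\n+\nIIII\n")
print(f"{surviving_fraction(raw, trimmed):.0%} of reads survived")
```

For real data, pass handles from `open(path)` or, for gzipped files, `gzip.open(path, "rt")`.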

    • #3
      I am just about to perform a trimmomatic run on the fastq files, so we'll see.

      Looking at one of the overrepresented sequences, I found it to be from ssrA (a 10S RNA), so I'm assuming the issue has something to do with bias in the steps leading up to and during library prep.
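
For anyone wanting to repeat this kind of lookup, the most frequent read sequences can be pulled straight from the FASTQ and then searched (for example by BLAST). A rough Python sketch, assuming four-line FASTQ records and exact-match counting; this is not how FastQC builds its own table, and the names are hypothetical:

```python
import io
from collections import Counter

def read_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def top_sequences(handle, n=5):
    """Return (sequence, count, fraction of all reads) for the n most common reads."""
    counts = Counter(read_sequences(handle))
    total = sum(counts.values())
    return [(seq, c, c / total) for seq, c in counts.most_common(n)]

# Toy example with made-up reads: one sequence appears twice.
fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTTT\n+\nIIII\n"
)
for seq, count, frac in top_sequences(fastq):
    print(seq, count, f"{frac:.1%}")
```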

      • #4
        Had you done anything to enrich mRNA/remove non-coding RNA?

        • #5
          Yes. The Ribo-Zero kit was used, and our electropherogram afterwards indicated about 3% rRNA in any given sample.

          Separately, and I'm a little embarrassed to ask, but am I supposed to trim Illumina multiplexing barcodes prior to mapping my reads? I'm almost positive the answer is yes, but the distinction between Illumina adaptor and multiplex barcode seems muddled in the threads I have read.
          Last edited by Jossef; 02-13-2015, 06:44 PM.

          • #6
            Illumina barcodes are read independently and are never part of the sequence. You will see the tag read sequence in each fastq read ID; it was used for the demultiplexing.

            Here is a video from Illumina that illustrates this: https://www.youtube.com/watch?v=womKfikWlxM

            The only thing you need to worry about is possible adapter contamination (especially if your inserts are smaller than you thought they were).
            Last edited by GenoMax; 02-13-2015, 06:56 PM.
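
To see the tag read in the read IDs, the header line can be split into its fields. A small Python sketch, assuming Casava 1.8+ style headers (name, a space, then `read:filtered:control:index`); older pipelines lay the header out differently, and the example header below is invented:

```python
def parse_illumina_header(header):
    """Split a Casava 1.8+ style FASTQ header into a few useful fields.

    Assumed layout: @<name> <read>:<is filtered>:<control number>:<index>
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    name, _, comment = header[1:].strip().partition(" ")
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError("header does not look like Casava 1.8+ format")
    read, filtered, _control, index = fields
    return {
        "name": name,
        "read": int(read),            # 1 or 2 (mate number)
        "filtered": filtered == "Y",  # Y means the read failed the chastity filter
        "index": index,               # barcode used for demultiplexing
    }

# Hypothetical header for illustration only.
info = parse_illumina_header("@HWI-ST1234:100:C0ABCACXX:1:1101:1234:2000 1:N:0:ACGTAC")
print(info["index"])
```

The barcode sits in the header only, so the sequence line needs no barcode trimming.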

            • #7
              Ah, I see where I was getting confused— I had been reading the FASTQ file incorrectly. A silly oversight on my part, but thanks.

              • #8
                Jossef - I am having the same problem. Could you please explain what exactly was going wrong (you said you had been reading the FASTQ file incorrectly), and what the solution was?
                Many thanks!

                • #9
                  @Julia_S: I think Jossef was only referring to not correctly interpreting the fastq headers. Not a problem with reading the fastq file itself.

                  Have you done any trimming/adapter scans on your data? Can you post images of what the problem looks like in your case?
                  Last edited by GenoMax; 03-24-2015, 08:54 AM.

                  • #10
                    FASTQC shows no adapter content and no overrepresented sequences; per base sequence content is also ok (except the first few bases).
                    I have 24 samples of human paired-end RNA-seq, and for all of them, the kmer pictures look similar to the ones attached.
                    I am a newbie and completely at a loss, so any help would be really appreciated!

                    • #11
                      I am going to suggest that you go ahead with trimming of data and further downstream analysis. You can re-check data post-trimming with FastQC to see if the k-mer over-representation goes away. Remember to use a paired-end aware trimming program (bbduk from BBMap suite, trimmomatic, cutadapt).
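
What "paired-end aware" means in practice: when one mate is discarded, its partner has to leave the paired output too, or the two files fall out of sync. A toy Python sketch of that rule, assuming records are `(name, seq, qual)` tuples that have already been trimmed; `filter_pairs` and `min_len` are hypothetical names, not any tool's API:

```python
def filter_pairs(pairs, min_len=36):
    """Keep a pair only if both mates are at least min_len bases long.

    pairs: iterable of ((name1, seq1, qual1), (name2, seq2, qual2)).
    Dropping both mates together keeps the R1 and R2 outputs in the same order.
    """
    kept = []
    for mate1, mate2 in pairs:
        if len(mate1[1]) >= min_len and len(mate2[1]) >= min_len:
            kept.append((mate1, mate2))
    return kept

# Made-up reads: the second pair has a mate that is too short after trimming.
pairs = [
    (("a/1", "ACGTACGT", "IIIIIIII"), ("a/2", "TTGCATGC", "IIIIIIII")),
    (("b/1", "ACG", "III"), ("b/2", "TTGCATGC", "IIIIIIII")),
]
print([m1[0] for m1, _ in filter_pairs(pairs, min_len=5)])
```

Real trimmers typically do more than this sketch; Trimmomatic, for instance, can write the surviving mate of a broken pair to a separate unpaired file.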

                      If you are worried about the data, take a few sequences and spot-check them by BLAST at NCBI to make sure that the data aligns well to the human genome.
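
A quick way to pull a few reads for such a spot check is to sample them at random and write them out as FASTA, which can be pasted into the NCBI BLAST web form. A minimal Python sketch using reservoir sampling, assuming four-line FASTQ records; the function names, `k`, and `seed` are illustrative choices:

```python
import io
import random

def sample_reads(handle, k=5, seed=0):
    """Reservoir-sample k (name, sequence) pairs from a 4-line FASTQ stream."""
    rng = random.Random(seed)
    reservoir = []
    n = 0
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        if not header.strip():
            continue                   # tolerate stray blank lines
        seq = handle.readline().strip()
        handle.readline()              # '+' separator line
        handle.readline()              # quality line
        n += 1
        record = (header.strip()[1:].split()[0], seq)
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = rng.randrange(n)       # each read ends up kept with probability k/n
            if j < k:
                reservoir[j] = record
    return reservoir

def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)

# Made-up reads for illustration.
fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n")
print(to_fasta(sample_reads(fastq, k=2)))
```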

                      • #12
                        Originally posted by GenoMax
                        I am going to suggest that you go ahead with trimming of data and further downstream analysis. You can re-check data post-trimming with FastQC to see if the k-mer over-representation goes away. Remember to use a paired-end aware trimming program (bbduk from BBMap suite, trimmomatic, cutadapt).

                        If you are worried about the data, take a few sequences and spot-check them by BLAST at NCBI to make sure that the data aligns well to the human genome.
                        @GenoMax: Thank you! The k-mer overrepresentation is not generally at the start or end of the reads, so I would guess trimming is unlikely to affect it.
                        Speaking of trimming (and again sorry if this is a stupid question, this is the first time I am analysing RNAseq data) - I have no adapter contamination and the quality of all the bases is in the green area. In that case, I would have thought no (additional) trimming is necessary?

                        • #13
                          If you don't have adapter contamination, then a pass through the trimming program will leave the data intact, but if you do have some, then you want that part removed anyway.
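
As a toy illustration of why an adapter pass is harmless on clean data, here is a Python sketch of 3' adapter clipping. It assumes exact matching against the common start of the Illumina TruSeq adapter (`AGATCGGAAGAGC`); real trimmers also handle mismatches and partial adapters at the read end, and `trim_adapter` is a hypothetical name:

```python
ADAPTER = "AGATCGGAAGAGC"  # common start of the Illumina TruSeq adapter

def trim_adapter(seq, qual, adapter=ADAPTER):
    """Clip the read at the first exact adapter match; leave other reads untouched."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual               # no adapter found: read is returned intact
    return seq[:pos], qual[:pos]       # keep only the insert before the adapter

# Made-up reads: one clean, one with read-through into the adapter.
clean = ("ACGTACGTACGT", "IIIIIIIIIIII")
dirty = ("ACGTAC" + ADAPTER, "I" * (6 + len(ADAPTER)))
print(trim_adapter(*clean))
print(trim_adapter(*dirty))
```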

                          You are perhaps right that trimming may not change the k-mer result, but the main thing you want to know is how well your data maps. One could have perfect data (great Q scores, no k-mer enrichment), but if it does not map well then it is not useful.

                          BTW: the k-mer module in FastQC only tracks 2% of the total data.
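
To get a feel for what a positional k-mer check on a 2% subsample looks like, here is a rough Python sketch. It is not FastQC's actual implementation: taking every 50th read, the value of `k`, and the simple observed/expected ratio are all assumptions made for illustration.

```python
from collections import defaultdict

def positional_kmer_counts(seqs, k=7, every=50):
    """Count k-mers by start position, looking at only every `every`-th read (2% when every=50)."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, seq in enumerate(seqs):
        if i % every != 0:
            continue                   # skip reads outside the subsample
        for pos in range(len(seq) - k + 1):
            counts[seq[pos:pos + k]][pos] += 1
    return counts

def max_obs_exp(pos_counts, n_positions):
    """Return (position, observed/expected) at the k-mer's most enriched position.

    Expected assumes the k-mer is spread evenly over all start positions.
    """
    total = sum(pos_counts.values())
    expected = total / n_positions
    best_pos = max(pos_counts, key=pos_counts.get)
    return best_pos, pos_counts[best_pos] / expected

# Made-up reads: "AAA" always starts at position 0.
reads = ["AAACGT", "AAATTT"]
counts = positional_kmer_counts(reads, k=3, every=1)
print(max_obs_exp(counts["AAA"], n_positions=4))
```

A k-mer with a high ratio at one position is the kind of signal the Kmer plots show; whether it matters still comes down to how well the reads map.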
