  • Fastqc results small RNA run

    Hi,

    I got a data set of small RNAs. They did polysome profiling followed by sequencing of the regions covered by the ribosomes.

    Unfortunately, the FastQC results are not what I expected.
    The problem is that I am not sure how to interpret the data, or what to say about its quality other than good/bad.

    I would be happy to get any advice on what went wrong or what can be done better.

    Is it a problem with the library creation, the preparation method, or something else?

    I have attached the images I found most worrying.

    [Attached image: per_base_sequence_content.jpg]

    [Attached image: per_base_gc_content.jpg]

    [Attached image: per_sequence_gc_content.jpg]

    The overall quality of the sequences is quite good, as you can see from the per_base_quality image.

    [Attached image: per_base_quality.jpg]

    Another problem I have is the overrepresented sequences. A single sequence accounts for over 33% of the reads in my library. Then there are some more overrepresented sequences, but at much lower abundance (7% downwards). The k-mer content also shows strange behavior.

    [Attached image: kmer_profiles.jpg]

    I would be grateful for any suggestions for improvement or possible explanations for these results.

    Thanks for any help

    Assa

  • #2
    All of the unusual profiles are the result of the overrepresented sequence in your library. Having the same sequence make up 33% of the library will affect the overall base composition, Kmer composition and overall GC content.

    As you said, the quality looks OK so there's no technical problem with the sequencing. The duplication level plot will tell you whether your problem is a small number of isolated sequences, or a generally high level of duplication in your library.

    What you do about this will largely depend on what the overrepresented sequence is. If it's a small RNA then it just means your original sample is really biased, but if it's something like an adapter or primer then you may be able to improve your sample prep to get rid of it in future runs.
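
To make the point about a single dominant sequence concrete, here is a minimal Python sketch. It is not part of FastQC; the function names and the toy reads are made up for illustration. It counts identical reads in a FASTQ stream, reports the fraction taken by the most common one, and compares the overall GC fraction with and without that read.

```python
from collections import Counter
from io import StringIO


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def top_sequences(seqs, n=5):
    """Return (sequence, count, fraction of all reads) for the n most common reads."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return [(s, c, c / total) for s, c in counts.most_common(n)]


def gc_fraction(seqs):
    """Overall GC fraction across all bases in the given reads."""
    bases = "".join(seqs)
    return (bases.count("G") + bases.count("C")) / len(bases)


# Toy data (invented for this example): one GC-rich read repeated 4 times out of 10.
toy_reads = ["GGGGCCCCGG"] * 4 + [
    "ATATATATAT", "ACGTACGTAC", "TTTTAAAATT",
    "ACGTTGCAAC", "TATACGATAT", "AATTCCGGAA",
]
toy_fastq = "".join(
    f"@read{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(toy_reads)
)

seqs = list(read_fastq_sequences(StringIO(toy_fastq)))
top = top_sequences(seqs, n=1)
dominant = top[0][0]

gc_all = gc_fraction(seqs)                                  # with the dominant read
gc_rest = gc_fraction([s for s in seqs if s != dominant])   # without it

print(f"dominant read: {dominant} ({top[0][2]:.0%} of reads)")
print(f"GC with dominant read: {gc_all:.2f}, without: {gc_rest:.2f}")
```

On a real run you would pass an open FASTQ file instead of the `StringIO` object. Once you have the dominant sequence, comparing it against your adapter and primer sequences is one way to tell which of the two cases above applies.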

    • #3
      Simon, I think you are the developer of FastQC?

      It would be awesome to have sample "good" FastQC plots for the regular applications (DNA re-sequencing, RNA-seq, ChIP-seq, miRNA-seq, etc.) just to get a good idea for comparison, and your expert comments would definitely help as well!
      --
      bioinfosm

      • #4
        This is actually something we've been looking into: setting up a repository with example datasets from different techniques and platforms, along with QC reports and annotations of any known problems which were found. We're still trying to figure out the practicalities of hosting this, though...

        • #5
          FastQC on my small RNA sequences identifies several overrepresented sequences. This might be because of adapter sequences. I trimmed the adapter ('ACTA') using the command
          >fastx_clipper -C -v -i SRR519779.fastq -Q 33 -a ACTA -o SRR519779_trimmed.fastq
          The output of this is:
          Clipping Adapter: ACTA
          Min. Length: 5
          Clipped reads - discarded.
          Input: 4484151 reads.
          Output: 4440775 reads.
          discarded 0 too-short reads.
          discarded 0 adapter-only reads.
          discarded 0 clipped reads.
          discarded 43376 N reads.

          The trimming seems to have had no effect: FastQC shows similar results on the trimmed sequences.
          Am I doing something wrong? Please advise.
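
One quick check on the numbers in that summary, as a minimal Python sketch. The summary text is copied from the post; the parsing regexes are ad hoc for this example and are not a fastx_clipper feature. It confirms that the difference between input and output reads equals the number of reads discarded for containing N, and that the tool reported zero clipped reads.

```python
import re

# Summary text as quoted in the post above.
summary = (
    "Clipping Adapter: ACTA Min. Length: 5 Clipped reads - discarded. "
    "Input: 4484151 reads. Output: 4440775 reads. "
    "discarded 0 too-short reads. discarded 0 adapter-only reads. "
    "discarded 0 clipped reads. discarded 43376 N reads."
)

reads_in = int(re.search(r"Input: (\d+) reads", summary).group(1))
reads_out = int(re.search(r"Output: (\d+) reads", summary).group(1))

# Map each discard reason (e.g. "clipped", "N") to its reported count.
discarded = {
    reason: int(count)
    for count, reason in re.findall(r"discarded (\d+) (\S+) reads", summary)
}

removed = reads_in - reads_out
print(f"reads removed: {removed}")
print(f"discard counts: {discarded}")
print(f"all removed reads were N reads: {removed == discarded['N']}")
```

By the tool's own report, then, every removed read was removed for containing N and no read was counted as clipped, which is consistent with FastQC looking the same before and after. The summary does not say why no adapter matches were found, so that part remains an open question.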
