SEQanswers

Old 12-03-2010, 06:41 AM   #1
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Fastqc results small RNA run

Hi,

I got a data set of small RNAs. They did polysome profiling followed by sequencing of the regions covered by the ribosomes.

Unfortunately the results from Fastqc are not as expected.
The problem is, I am not exactly sure how to interpret the data or what to say about its quality other than good/bad.

I would be happy to get any advice as to what went wrong or what can be done better.

Is it a problem with the library creation, the method of preparation, or something else?

I have attached the images I found most troubling.

per_base_sequence_content.jpg

per_base_gc_content.jpg

per_sequence_gc_content.jpg

The overall quality of the sequences is quite good, as you can see from the per_base_quality image.

per_base_quality.jpg

Another problem I have is the overrepresented sequences. A single sequence accounts for over 33% of the reads in my library. Then there are some more overrepresented sequences, but at much lower frequencies (7% downwards). The kmer content also shows strange behavior.

kmer_profiles.jpg

I will be grateful for any suggestions for improvement or possible explanations for these results.

Thanks for any help

Assa
Old 12-04-2010, 01:46 AM   #2
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

All of the unusual profiles are the result of the overrepresented sequence in your library. Having the same sequence make up 33% of the library will affect the overall base composition, Kmer composition and overall GC content.
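The effect described above can be reproduced with a small toy example. The sketch below uses made-up reads (nothing here comes from the actual data set) and a hypothetical helper, `per_base_composition`, to show how one sequence making up a third of a library pulls the per-position base fractions away from an otherwise balanced background.

```python
from collections import Counter

def per_base_composition(reads):
    """Return, for each read position, the fraction of A/C/G/T/N seen there."""
    if not reads:
        return []
    longest = max(len(r) for r in reads)
    composition = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute.
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())
        composition.append({base: counts[base] / total for base in "ACGTN"})
    return composition

# Toy library (made-up sequences): the background alone is balanced at every position.
dominant = "TTTTGGGG"
background = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG"] * 5
# Add the dominant read so that it makes up 10 of 30 reads (one third).
library = [dominant] * 10 + background

balanced = per_base_composition(background)
skewed = per_base_composition(library)
print("T fraction at first position, background only:", balanced[0]["T"])
print("T fraction at first position, with dominant read:", skewed[0]["T"])
```

The same reasoning applies to the GC and Kmer plots: any per-library summary is a weighted average, and a sequence carrying a third of the weight will dominate it.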

As you said, the quality looks OK so there's no technical problem with the sequencing. The duplication level plot will tell you whether your problem is a small number of isolated sequences, or a generally high level of duplication in your library.

What you do about this will largely depend on what the overrepresented sequence is. If it's a small RNA then it just means your original sample is really biased, but if it's something like an adapter or primer then you may be able to improve your sample prep to get rid of it in future runs.
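As a rough way to find out what the overrepresented sequence is, one could tally the reads directly and compare the most common one against the adapter used in the prep. The sketch below assumes a plain 4-line FASTQ (no wrapped sequence lines); the function names, the demo reads and `HYPOTHETICAL_ADAPTER` are all invented for illustration and should be replaced with the real adapter from the library kit.

```python
from collections import Counter
import io

def read_sequences(handle):
    """Yield the sequence line of each record, assuming plain 4-line FASTQ."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def top_sequences(seqs, n=5):
    """Return (sequence, count, fraction of all reads) for the n most common reads."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return [(seq, count, count / total) for seq, count in counts.most_common(n)]

def contains_adapter_start(seq, adapter, min_len=8):
    """True if seq contains at least the first min_len bases of the adapter."""
    return adapter[:min_len] in seq

# Made-up adapter and reads, for demonstration only.
HYPOTHETICAL_ADAPTER = "AGCTTGCAAGGT"
demo_reads = [
    "GGAGCTTGCAAGGT", "ACGTACGTACGTAC", "GGAGCTTGCAAGGT",
    "TTGACCATGACAGT", "CATGCATGGTACCA", "TGCATTGACCAGTA",
]
demo_fastq = "".join(
    f"@read{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(demo_reads)
)

top = top_sequences(read_sequences(io.StringIO(demo_fastq)), n=1)
seq, count, fraction = top[0]
print(seq, count, round(fraction, 2))
print("looks adapter-derived:", contains_adapter_start(seq, HYPOTHETICAL_ADAPTER))
```

For a real file, `io.StringIO(demo_fastq)` would be replaced by an open file handle. An exact-substring test like this is deliberately crude; a sequence that fails it may still be adapter-derived with mismatches, so searching the top sequence against a database is a sensible follow-up.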
Old 12-04-2010, 09:31 PM   #3
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Simon, I think you are the developer of FastQC?

It would be awesome to have examples of good FastQC plots for the regular applications (DNA re-sequencing, RNA-seq, ChIP-seq, miRNA-seq, etc.) just to have a baseline for comparison, and your expert comments would definitely help as well!
__________________
--
bioinfosm
Old 12-05-2010, 02:04 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

This is actually something we've been looking into: setting up a repository with example datasets from different techniques and platforms, along with QC reports and annotations of any known problems which were found. We're still trying to figure out the practicalities of hosting this though...
Old 10-24-2013, 10:21 AM   #5
debjit_ray
Junior Member
 
Location: USA

Join Date: Oct 2013
Posts: 2

FastQC on my small RNA sequences identifies several overrepresented sequences. This might be because of adapter sequences. I trimmed the adapter ('ACTA') using the command
>fastx_clipper -C -v -i SRR519779.fastq -Q 33 -a ACTA -o SRR519779_trimmed.fastq
The output for this is:
Clipping Adapter: ACTA
Min. Length: 5
Clipped reads - discarded.
Input: 4484151 reads.
Output: 4440775 reads.
discarded 0 too-short reads.
discarded 0 adapter-only reads.
discarded 0 clipped reads.
discarded 43376 N reads.

The trimming seems to have had no effect: FastQC shows similar results on the trimmed sequences.
Am I doing something wrong? Please advise.
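
One quick sanity check, independent of fastx_clipper, is to count how many reads contain the adapter string at all and where it first appears. The sketch below is only an exact-substring search with a hypothetical helper name (`adapter_hit_summary`) and made-up demo reads; it does not reproduce the matching that fastx_clipper performs, so the numbers are a rough guide rather than a prediction of the clipper's output.

```python
from collections import Counter

def adapter_hit_summary(seqs, adapter):
    """Count reads containing the adapter as an exact substring.

    Also records the 0-based position of the first hit in each matching read.
    """
    total = 0
    hits = 0
    first_positions = Counter()
    for seq in seqs:
        total += 1
        pos = seq.find(adapter)
        if pos != -1:
            hits += 1
            first_positions[pos] += 1
    fraction = hits / total if total else 0.0
    return {
        "total": total,
        "with_adapter": hits,
        "fraction": fraction,
        "first_positions": first_positions,
    }

# Made-up reads for demonstration; real input would be the FASTQ sequence lines.
demo = ["GGGGACTATTTT", "ACTAGGGGCCCC", "GGGGCCCCTTTT", "TTTTACTAACTA"]
summary = adapter_hit_summary(demo, "ACTA")
print(summary["with_adapter"], "of", summary["total"], "reads contain the adapter")
print("first-hit positions:", dict(summary["first_positions"]))
```

If the string turns up in many reads at scattered positions, that is worth a second look, since a 4-base sequence is short enough to occur by chance; comparing the count with the "discarded 0 clipped reads" line in the log would show whether the clipper is matching it at all.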

Tags
fastqc, illumina, ribosome profiling

All times are GMT -8.