Hi guys,
I am very new to NGS data processing.
First, a small introduction to my current situation:
- Samples were the small RNA fraction (<200 nt, mirVana miRNA Isolation Kit).
- Library and template preparation, as well as the sequencing protocol, were performed by an external company. The company used the Ion Torrent software to trim adapters and filter reads by quality.
- RNA sample quality was RIN >8 (assessed by analysing the remaining RNA fraction, >200 nt).
- Original output quality was assessed with FastQC.
- This was followed by another FastQC analysis after filtering sequences by size (16-27 nt) and QV=17.
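To make the size and quality filtering step concrete, here is a minimal sketch of what such a filter does. It is not the tool that was actually used: it assumes 4-line FASTQ records, Phred+33 quality encoding, and that "QV=17" means a mean read quality of at least 17. The helper names (`parse_fastq`, `mean_phred`, `filter_reads`) and the demo reads are hypothetical.

```python
import io

def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def filter_reads(records, min_len=16, max_len=27, min_qv=17):
    """Keep reads whose length is within [min_len, max_len] and mean quality >= min_qv."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len and mean_phred(qual) >= min_qv:
            yield header, seq, qual

# Toy input: one good read, one too short, one with low quality.
seq22 = "TGAGGTAGTAGGTTGTATAGTT"
demo = io.StringIO(
    "@read1\n" + seq22 + "\n+\n" + "I" * 22 + "\n"
    "@read2\nACGT\n+\nIIII\n"
    "@read3\n" + seq22 + "\n+\n" + "#" * 22 + "\n"
)
kept = list(filter_reads(parse_fastq(demo)))
print([header for header, _, _ in kept])
```

Only `@read1` passes both criteria here; `@read2` fails on length and `@read3` on quality.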
OK, my problem is the following:
- The analyses showed good quality scores; however, they indicated an apparent contamination of the library. The FastQC report shows:
[PASS] Basic Statistics
[PASS] Per base sequence quality
[PASS] Per sequence quality scores
[FAIL] Per base sequence content
[FAIL] Per base GC content
[WARNING] Per sequence GC content
[PASS] Per base N content
[WARNING] Sequence Length Distribution
[FAIL] Sequence Duplication Levels
[FAIL] Overrepresented sequences
[FAIL] Kmer Content
I have attached the files.
Should I do a new run of my sample?
Is it possible that the library/template preparation was performed incorrectly?
Is there a chance that the overrepresented reads are actually biologically significant?
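One way to look into this would be to list the most abundant sequences and compare them by hand against known small RNAs. A minimal sketch, assuming the reads are already loaded as plain strings; `top_sequences` is a hypothetical helper and the reads below are toy data, not my actual results.

```python
from collections import Counter

def top_sequences(seqs, n=3):
    """Return the n most frequent sequences as (sequence, count, percent of all reads)."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return [(s, c, 100.0 * c / total) for s, c in counts.most_common(n)]

# Toy data: 10 reads dominated by two sequences.
reads = (
    ["TGAGGTAGTAGGTTGTATAGTT"] * 6
    + ["TAGCTTATCAGACTGATGTTGA"] * 3
    + ["ACGTACGTACGTACGTAC"]
)

for seq, count, pct in top_sequences(reads):
    print(f"{seq}\t{count}\t{pct:.1f}%")
```

The sequences at the top of such a table could then be searched against a small RNA database to see whether they are known miRNAs or something unexpected such as adapter or rRNA fragments.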
If I could get some feedback on this, it would help a lot.
Thanks