SEQanswers




02-28-2017, 07:31 PM   #1
windnature03
Junior Member
 
Location: Hong Kong

Join Date: Nov 2014
Posts: 3
Strange per base sequence content from FastQC report

Hi all,

I am new to bioinformatics and have run into some problems analysing my data (human RNA-seq, paired-end, 101 cycles). The per base sequence content and k-mer content plots look quite different from what I expected.
Here are my questions (please refer to the attached images):

1. There is a sudden rise in %A at around 50 bp. Could this be adapter contamination? The adapter content stays low across the whole read length, though.

2. What could cause the A/T imbalance?

3. What could cause the peaks at around 40-49 bp in the k-mer content plot?

4. Why does the base quality drop after 50 bp?
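The per base sequence content plot these questions refer to is, roughly, the percentage of each base at each read position across all reads. A minimal Python sketch of that tabulation follows, assuming an uncompressed FASTQ with four lines per record and skipping N calls (FastQC's exact handling may differ); the function names are illustrative and not part of FastQC.

```python
from collections import Counter
from typing import Dict, Iterator, List


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each record in an uncompressed 4-line FASTQ."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # line 2 of every record holds the bases
                yield line.strip()


def per_base_content(reads: List[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T at each read position (0-based).

    Non-ACGT calls such as N are skipped, so each position's percentages
    are computed over A/C/G/T calls only.
    """
    max_len = max((len(r) for r in reads), default=0)
    table = []
    for pos in range(max_len):
        counts = Counter(r[pos] for r in reads if pos < len(r) and r[pos] in "ACGT")
        total = sum(counts.values())
        table.append({b: 100.0 * counts[b] / total if total else 0.0 for b in "ACGT"})
    return table


if __name__ == "__main__":
    toy_reads = ["ACGTAAAA", "TGCAAAAA", "GATCAAAA"]  # made-up reads
    for pos, row in enumerate(per_base_content(toy_reads), start=1):
        print(pos, {b: round(v, 1) for b, v in row.items()})
```

On real data you would pass something like `list(read_fastq_sequences("sample_R1.fastq"))` (hypothetical file name); a position where %A jumps in the resulting table corresponds to the rise described in question 1.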

Can anyone give me some clues on these questions? They have been puzzling me for a week.

Thank you
Attached Images
Adapter content.jpeg
GC content.jpeg
K mer content.jpeg
per sequence base quality.jpeg
per sequence content.jpeg
02-28-2017, 08:37 PM   #2
wdecoster
Member
 
Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 97

For question 1, the increase in %A is not too bad. I think it can be explained by some reads running into the polyA tail, which you might want to trim off.
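If polyA tails are the cause, the simplest possible trimmer removes a trailing run of A's from each read. This sketch assumes exact runs (no sequencing errors inside the tail) and uses a hypothetical minimum run length of 10; dedicated trimmers tolerate mismatches and are preferable on real data.

```python
from typing import Tuple

# Hypothetical threshold: shorter trailing A runs are left alone, since
# a few A's at the 3' end occur by chance.
MIN_POLYA = 10


def trim_polya(seq: str, qual: str, min_len: int = MIN_POLYA) -> Tuple[str, str]:
    """Remove a trailing run of at least `min_len` A's from a read.

    The quality string is cut at the same position so the two stay in sync.
    """
    tail = len(seq) - len(seq.rstrip("A"))
    if tail >= min_len:
        cut = len(seq) - tail
        return seq[:cut], qual[:cut]
    return seq, qual


if __name__ == "__main__":
    read = "ACGTTGCC" + "A" * 12  # made-up read with a 12 bp polyA tail
    print(trim_polya(read, "I" * len(read)))
```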
02-28-2017, 09:20 PM   #3
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,232

Do you know which kit was used for library prep? Some kits use adapters that differ from the ones FastQC can detect.
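To check for an adapter that FastQC's built-in list might not cover, you can scan the reads for the adapter sequence yourself. The sketch below follows the idea behind FastQC's adapter content plot, a cumulative percentage of reads in which the adapter has been seen by each position. It assumes the 12-base Illumina universal adapter prefix AGATCGGAAGAG (shared by TruSeq adapters); replace it with your kit's adapter if that differs. Only exact, full-length matches are counted.

```python
from typing import List

# Assumed adapter: Illumina universal / TruSeq prefix. Swap in your kit's sequence.
ADAPTER_PREFIX = "AGATCGGAAGAG"


def cumulative_adapter_content(reads: List[str], adapter: str = ADAPTER_PREFIX) -> List[float]:
    """Percentage of reads whose first exact adapter match starts at or before each position."""
    if not reads:
        return []
    read_len = max(len(r) for r in reads)
    first_hits = [0] * read_len
    for r in reads:
        pos = r.find(adapter)  # -1 if the full adapter is absent
        if pos != -1:
            first_hits[pos] += 1
    result, running = [], 0
    for count in first_hits:
        running += count
        result.append(100.0 * running / len(reads))
    return result


if __name__ == "__main__":
    toy = ["ACGT" + ADAPTER_PREFIX, "T" * 16]  # made-up reads, 16 bp each
    print(cumulative_adapter_content(toy))
```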
03-01-2017, 12:26 AM   #4
windnature03
Junior Member
 
Location: Hong Kong

Join Date: Nov 2014
Posts: 3

Hi nucacidhunter and wdecoster,

TruSeq RNA Library Prep Kit v2 was used. The link below shows the overrepresented adapter sequences:

http://imgur.com/a/QI2JV


This is the Bioanalyzer result; the peak lies at around 250-300 bp. Subtracting the length of the adapter (60 bp), the insert should be around 120-130 bp. In my opinion, it is therefore unlikely that adapter sequence is being read by cycle 50.

http://imgur.com/a/LGJvb
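The reasoning about fragment size can be written as a small calculation: the insert is the fragment length minus the total adapter length, and a read only runs into adapter once the cycle number exceeds the insert length. The adapter length is left as a parameter because the total amount added per fragment depends on the kit and should be checked; the 101-cycle default matches the run described in the first post.

```python
from typing import Optional


def readthrough_cycle(fragment_bp: int, adapter_bp: int, read_length: int = 101) -> Optional[int]:
    """First sequencing cycle that reads adapter instead of insert, or None.

    insert = fragment length - total adapter length; a read of `read_length`
    cycles reaches the adapter only if the insert is shorter than the read.
    """
    insert_bp = fragment_bp - adapter_bp
    if insert_bp <= 0:
        raise ValueError("adapter length must be smaller than fragment length")
    return insert_bp + 1 if insert_bp < read_length else None


if __name__ == "__main__":
    for fragment in (110, 250, 300):
        print(fragment, readthrough_cycle(fragment, adapter_bp=60))
```

Note that a Bioanalyzer peak describes the bulk of the library, so a minority of shorter fragments can still read through even when the peak itself would not.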


Thank you

Last edited by windnature03; 03-01-2017 at 12:51 AM. Reason: correcting image link
03-01-2017, 02:50 AM   #5
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,232

1 and 2: One possible explanation is 3' bias caused by low-quality (degraded) input RNA, which increases the representation of polyA sequence.

3: The sequences TATGCCG and CGTATGC are over-represented k-mers that overlap by TATGC. You might check whether they come from a particular highly expressed gene, or from spike-in RNA if any was used.
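Checking whether two over-represented k-mers come from one longer sequence is easy to sketch: find the longest suffix of one k-mer that equals a prefix of the other and join them. For TATGCCG and CGTATGC this gives the 9-mer CGTATGCCG, which you could then count in the reads, pull out matching reads, and compare them against highly expressed transcripts or spike-in sequences. The minimum overlap of 3 is an arbitrary choice, and only the forward strand is searched.

```python
from typing import Iterable, Optional


def merge_overlap(a: str, b: str, min_overlap: int = 3) -> Optional[str]:
    """Join a and b if a suffix of a equals a prefix of b (longest overlap wins)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


def count_reads_with(reads: Iterable[str], motif: str) -> int:
    """Number of reads containing `motif` exactly (forward strand only)."""
    return sum(1 for r in reads if motif in r)


if __name__ == "__main__":
    merged = merge_overlap("CGTATGC", "TATGCCG")
    print(merged)  # CGTATGCCG
    toy_reads = ["AACGTATGCCGTT", "ACGTACGTACGT"]  # made-up reads
    print(count_reads_with(toy_reads, merged))
```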

4: This does not seem to be library related. You could ask the sequencing centre for an explanation; they can look at other lanes on the same flow cell to see whether the sequencing reagents or the sequencer had any issues.
03-01-2017, 03:28 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

Quote:
Originally Posted by windnature03

4. Why does the base quality drop after 50 bp?

It is possible that the inserts in your library are smaller than you expected. This generally causes adapter read-through, which results in drops in Q-score.

Have you scanned/trimmed this data for the presence of adapters? I recommend trying bbduk.sh from the BBMap suite for that purpose; there are threads on SEQanswers that explain how to use bbduk. You can also use bbmerge.sh, or bbmap.sh if you have a reference genome, from the same suite to estimate your library insert size.
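As a rough illustration of how read overlap reveals insert size (this is not bbmerge's algorithm, which also deals with sequencing errors, base qualities and adapter read-through): reverse-complement read 2, find the longest suffix of read 1 that matches a prefix of it, and the insert is the combined length minus the overlap. This toy version assumes error-free reads and an insert at least as long as one read but short enough for the reads to overlap; the minimum overlap of 10 is a hypothetical setting.

```python
import random
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def estimate_insert_size(r1: str, r2: str, min_overlap: int = 10) -> Optional[int]:
    """Insert length implied by the longest exact overlap of R1 with revcomp(R2).

    Returns None when the reads do not overlap by at least `min_overlap` bases.
    """
    r2_rc = revcomp(r2)
    for k in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            return len(r1) + len(r2_rc) - k
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    insert = "".join(rng.choice("ACGT") for _ in range(150))  # simulated 150 bp insert
    r1 = insert[:101]            # read 1: first 101 bases of the insert
    r2 = revcomp(insert[-101:])  # read 2: last 101 bases, opposite strand
    print(estimate_insert_size(r1, r2))
```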

All times are GMT -8.