SEQanswers

Old 08-19-2011, 02:48 PM   #1
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Need help with FastQC results. Thanks!!

Hi All,

My data are paired-end RNA-seq reads from an Illumina GA II. There are four data files: 5_1, 5_2, 6_1 and 6_2. 5_1 and 5_2 are the paired-end reads from lane 5 of the flowcell, and 6_1 and 6_2 are the pair from lane 6. When I run FastQC on the raw data, the results within each pair (5_1 vs 5_2; 6_1 vs 6_2) look similar.

But after quality trimming and adapter trimming (I used the same adapters for every file in this step), the FastQC results of 5_1 and 5_2 look different, especially in the "Per Base Sequence Content" and "Overrepresented Kmers" modules. 6_1 and 6_2 have the same problem. Because these are paired-end reads and were trimmed in the same way, I would expect the FastQC results within a pair to still look similar. Could anybody offer a reasonable explanation? Any reply will be greatly appreciated.

Last edited by byou678; 08-22-2011 at 06:51 AM.
Old 08-19-2011, 11:45 PM   #2
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Without seeing your actual data it's really difficult to make any sensible suggestions as to what might be different. If you're seeing differences in the sequence content plots then these will bias the results of the Kmer plots. If you could post 2 of your sequence content plots which look different we might be able to offer more concrete suggestions.
Old 08-22-2011, 07:13 AM   #3
byou678

Any other ideas? Thanks in advance.
Old 08-23-2011, 05:41 AM   #4
byou678

[IMG]C:\Users\whittier.2\Desktop[/IMG]

Last edited by byou678; 08-23-2011 at 05:44 AM.
Old 08-23-2011, 05:42 AM   #5
byou678

Old 08-23-2011, 05:47 AM   #6
simonandrews

The links you tried to post point to a file on your desktop which we can't see. You either need to put the images on a public facing webserver, or add them as attachments to your post. To add an attachment go into the 'advanced' posting options and click on the paper clip icon.
Old 08-23-2011, 06:06 AM   #7
byou678

Thanks simonandrews!

The "Per Base Sequence Content" plot for 5_1 is in the attachment.

Attached Images
File Type: png per_base_sequence_content.png (37.4 KB, 177 views)
Old 08-23-2011, 06:09 AM   #8
byou678

The "Per Base Sequence Content" plot for 5_2 is in the attachment below. Please take a look, compare it with 5_1, and give me some ideas. Thanks!
Attached Images
File Type: png per_base_sequence_content.png (35.7 KB, 75 views)
Old 08-23-2011, 06:13 AM   #9
byou678

The Picture of "Overrepresented Kmers" of 5_1 is in the attachment.
Attached Images
File Type: png kmer_profiles.png (96.8 KB, 113 views)
Old 08-23-2011, 06:15 AM   #10
byou678

The Picture of "Overrepresented Kmers" of 5_2 is in the attachment.

All four pictures above are FastQC results after quality trimming and adapter trimming.
Attached Images
File Type: png Screen shot 2011-08-23 at 10.40.51 AM.png (89.9 KB, 78 views)

Last edited by byou678; 08-23-2011 at 06:42 AM.
Old 08-23-2011, 07:02 AM   #11
simonandrews

It looks like in the 5_1 sample you are reading through your diverse insert sequence into some kind of adapter. You can actually read part of the adapter sequence from the Kmer plot - it starts with GAGCGGTCA. I've had a quick look but I couldn't match that to any of the standard Illumina adapter sequences (it does occur in pBT-6 if that rings any bells), but you should check to see if it matches any of the primers or vectors you've used during your library construction.

If you can figure out the source of the sequence and you're happy the rest of your library is OK then you can rerun the adapter trimmer with this new sequence to remove the additional sequence and your libraries should look more like each other again.
Old 08-23-2011, 07:38 AM   #12
byou678

Thanks again.

The adapter sequences are as below:
5' TACACTCTTTCCCTACACGACGCTCTTCCGATCT
5' AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
5' GACGGCATACGAGCTCTTCCGATCT
5' AGATCGGAAGAGCTCGTATGCCGTC
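
Whether the GAGCGGTCA sequence read off the Kmer plot earlier in this thread occurs in any of these four adapters, in either orientation, can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it does exact substring matching only, so a sequence with a mismatch or a sequencing error would not be reported.

```python
# Adapter sequences copied from the list above.
ADAPTERS = [
    "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "GACGGCATACGAGCTCTTCCGATCT",
    "AGATCGGAAGAGCTCGTATGCCGTC",
]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_kmer_hits(kmer, adapters):
    """List (adapter_index, orientation) for every exact occurrence of kmer."""
    hits = []
    rc = reverse_complement(kmer)
    for index, adapter in enumerate(adapters):
        if kmer in adapter:
            hits.append((index, "forward"))
        if rc in adapter:
            hits.append((index, "reverse-complement"))
    return hits


if __name__ == "__main__":
    print(find_kmer_hits("GAGCGGTCA", ADAPTERS))
```

With the four adapters above the call returns an empty list, which fits the observation that the sequence is not one of the adapters being trimmed.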

I use "Trim.pl" for quality trimming; its options are shown below.

Options:
--type <num> 0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0
--qual-threshold <num> quality threshold for trimming, default 20
--length-threshold <num> length threshold for trimming, default 20
--qual-type <num> 0=sanger qualities, 1=illumina qualities pipeline>=1.3, 2=illumina qualities pipeline<1.3. Default 0.
--pair1 <paired end input filename> fastq, paired end file. Must have same number of records as pair2. Required.
--pair2 <paired end input filename> fastq, paired end file. Must have same number of records as pair1. Required.
--outpair1 <paired end output file> Required.
--outpair2 <paired end output file> Required.
--single <single end output file> Required.


I chose the following options and values for quality trimming: --type 2 (windowed adaptive trimming), --qual-type 1 (illumina qualities pipeline>=1.3), and the default values for --qual-threshold <num> and --length-threshold <num>.
I put the corresponding input file names after --pair1 and --pair2, and named the output files with --outpair1, --outpair2 and --single.

I chose "cutadapt" for adapter trimming. I ran the following command in Terminal on the 5.1 file (the 5.1 file after quality trimming); the output is saved in the file 5.1_adaptortrim.fastq.
$ cutadapt -b TACACTCTTTCCCTACACGACGCTCTTCCGATCT -b AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -b GACGGCATACGAGCTCTTCCGATCT -b AGATCGGAAGAGCTCGTATGCCGTC 5.1_trimmed.fastq > 5.1_adaptortrim.fastq
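
The same kind of cutadapt call can also be assembled from a list of adapters, which makes it easy to append a further sequence later. The sketch below is an illustration, not a tested pipeline: the helper names are hypothetical, it uses only cutadapt's -b option and the write-to-standard-output behaviour seen in the command above, and running it assumes cutadapt is installed and on the PATH.

```python
import subprocess

# The four adapters from the post above.
ADAPTERS = [
    "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "GACGGCATACGAGCTCTTCCGATCT",
    "AGATCGGAAGAGCTCGTATGCCGTC",
]


def build_cutadapt_command(adapters, input_fastq):
    """Return the argument list: one '-b SEQ' pair per adapter, then the input."""
    command = ["cutadapt"]
    for adapter in adapters:
        command += ["-b", adapter]
    command.append(input_fastq)
    return command


def run_cutadapt(adapters, input_fastq, output_fastq):
    """Run cutadapt and redirect its standard output into output_fastq."""
    with open(output_fastq, "w") as out:
        subprocess.run(build_cutadapt_command(adapters, input_fastq),
                       stdout=out, check=True)
```

For example, `run_cutadapt(ADAPTERS + [extra], "5.1_trimmed.fastq", "5.1_adaptortrim.fastq")` would add one more sequence to the run, where `extra` is a placeholder for a sequence that still has to be determined.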

Any more suggestions? Many thanks!!



Old 08-23-2011, 07:49 AM   #13
simonandrews

So the problem is that you have a sequence in your library which isn't one of the adapters you passed to cutadapt. I can't immediately see where it's come from, but since cutadapt didn't know about it, it didn't remove it, and your trimmed library is still biased. I'd suspect that if you looked at the size distribution of your two libraries after trimming you'll see that one has been trimmed significantly more than the other.

You need to figure out as much of this mystery sequence as you can (either by finding the sequence in one of your primers or by looking at some of your sequences and seeing where the common sequence at the end stops). You can then pass this as an extra sequence to cutadapt which can remove it from your library.
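
One rough way to do the second of these (looking at the reads to see how far the common sequence extends) is to collect every read containing the known start of the sequence and build a per-position majority consensus from that point on. The sketch below is a simplified illustration under stated assumptions: the function names and thresholds are invented here, the FASTQ file is assumed to use plain four-line records, and the seed is matched exactly.

```python
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each record in a four-line-per-record FASTQ."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip()


def extend_seed_consensus(reads, seed, min_fraction=0.8, min_reads=2):
    """Extend seed to the right for as long as the reads containing it agree.

    Extension stops when fewer than min_reads reads cover a position or when
    the most common base accounts for less than min_fraction of them.
    """
    suffixes = [read[read.index(seed):] for read in reads if seed in read]
    consensus = []
    position = 0
    while True:
        bases = Counter(s[position] for s in suffixes if len(s) > position)
        total = sum(bases.values())
        if total < min_reads:
            break
        base, count = bases.most_common(1)[0]
        if count / total < min_fraction:
            break
        consensus.append(base)
        position += 1
    return "".join(consensus)
```

It could be called as `extend_seed_consensus(read_fastq_sequences(open("5.1_adaptortrim.fastq")), "GAGCGGTCA")`. Because the sequence usually runs to the end of the read, the consensus tends to stop where coverage runs out, so the result is a lower bound on the real sequence and should still be checked against the primers used.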
Old 08-23-2011, 08:43 AM   #14
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

byou678: Have you tried aligning the reads to your reference? Since this is an RNA-seq sample are we seeing just the standard "random primer effect" at the beginning of the read?
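
The effect referred to here shows up as uneven base composition in the first few positions of the reads. A quick way to look for it outside FastQC is to tabulate the fraction of each base per position. This is a minimal sketch with an invented function name; it takes the read sequences as plain strings, and bases other than A, C, G and T (such as N) count towards the total but are not reported.

```python
from collections import Counter


def per_base_content(reads, n_positions=15):
    """Return one dict per position giving the fraction of A, C, G and T."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for position, base in enumerate(read[:n_positions]):
            counts[position][base] += 1
    table = []
    for counter in counts:
        total = sum(counter.values())
        table.append({base: (counter[base] / total if total else 0.0)
                      for base in "ACGT"})
    return table
```

If the fractions are skewed only in roughly the first dozen positions and then level off, that points to priming bias at the start of the read rather than to adapter read-through at the end.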
Old 08-23-2011, 09:42 AM   #15
byou678

Yes, I aligned them using BWA. Could you explain your second question in more detail? Thanks for your reply.



Old 08-23-2011, 10:23 AM   #16
byou678

You are right, simonandrews. The size distributions of the two libraries after adapter trimming are significantly different. The "Basic Statistics" modules also show the difference, as below. Attached is the "Sequence Length Distribution" plot for 5_1.

Measure          Value
Filename         5.1_adaptortrim.fastq
File type        Conventional base calls
Encoding         Illumina 1.5
Total Sequences  33183607
Sequence length  8-76
%GC              44

Measure          Value
Filename         5.2_adaptortrim.fastq
File type        Conventional base calls
Encoding         Illumina 1.5
Total Sequences  33183607
Sequence length  0-76
%GC              42
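
The length ranges above (8-76 for 5.1, 0-76 for 5.2) can be broken down further by counting read lengths directly from the trimmed FASTQ files. The sketch below is an illustration only: the function names and the cutoff of 20 are choices made here, and the files are assumed to use plain four-line FASTQ records.

```python
from collections import Counter


def length_histogram(fastq_handle):
    """Count read lengths from the sequence line of each four-line record."""
    histogram = Counter()
    for index, line in enumerate(fastq_handle):
        if index % 4 == 1:
            histogram[len(line.rstrip("\r\n"))] += 1
    return histogram


def summarize_lengths(histogram, short_cutoff=20):
    """Return read count, min, max, mean length and reads below short_cutoff."""
    total = sum(histogram.values())
    if total == 0:
        return None
    bases = sum(length * count for length, count in histogram.items())
    short = sum(count for length, count in histogram.items()
                if length < short_cutoff)
    return {
        "reads": total,
        "min": min(histogram),
        "max": max(histogram),
        "mean": bases / total,
        "empty": histogram.get(0, 0),
        "short": short,
    }
```

Running `summarize_lengths(length_histogram(open(path)))` on each of the two files gives numbers that can be compared side by side. The "empty" entry counts reads trimmed down to length 0, which the 0-76 range reported for 5.2 indicates are present.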

Attached Images
File Type: png sequence_length_distribution.png (24.3 KB, 37 views)
Old 08-23-2011, 10:36 AM   #17
byou678

Thanks a lot, simonandrews. We can see that 5_2 was trimmed more than 5_1. Does that mean 5_1 contains the mystery (or contaminating) sequence, which was not removed during adapter trimming, and 5_2 does not contain it? If so, for 5_1 we need to identify that sequence and add it to the cutadapt command to remove it. Also, how could I explain the cause and the solution in simple words to my boss? I look forward to any kind response.

Attached is the "Sequence Length Distribution" plot for 5_2.


Attached Images
File Type: png sequence_length_distribution.png (23.8 KB, 20 views)

Last edited by byou678; 08-23-2011 at 12:47 PM.
Old 08-23-2011, 11:29 AM   #18
GenoMax

See the discussion here: http://bioinfo-core.org/index.php/9t...8_October_2010 (4th figure specifically).

This paper may be useful: http://nar.oxfordjournals.org/content/38/12/e131.full

Are your alignments with bwa looking ok?

Old 08-23-2011, 01:53 PM   #19
byou678

Thanks for the info you offered. The last step of the bwa alignment is not going smoothly: it has taken much longer than expected and is still running now.

Could you take another look at the posts above? Any more ideas will be greatly appreciated!
