#1
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Hi all,

My data are paired-end RNA-seq reads from an Illumina GA II. There are four data files: 5_1, 5_2, 6_1 and 6_2. 5_1 and 5_2 are the paired-end reads from lane 5 of the flowcell, and 6_1 and 6_2 are the pair from lane 6. On the raw data, the FastQC results within each pair (5_1 vs 5_2; 6_1 vs 6_2) look similar. But after quality trimming and adapter trimming (using the same adapters in this step), the FastQC results for 5_1 and 5_2 look different, especially in the "Per Base Sequence Content" and "Overrepresented Kmers" modules. 6_1 and 6_2 show the same problem. Since these are paired-end reads and were trimmed in the same way, I would expect the FastQC results to still look similar.

Could anybody offer a reasonable explanation? Any reply will be greatly appreciated.

Last edited by byou678; 08-22-2011 at 07:51 AM.

#2
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
Without seeing your actual data it's really difficult to make any sensible suggestions as to what might be different. If you're seeing differences in the sequence content plots then these will bias the results of the Kmer plots. If you could post 2 of your sequence content plots which look different we might be able to offer more concrete suggestions.

#3
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Any other ideas? Thanks in advance.

#4
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
[IMG]C:\Users\whittier.2\Desktop[/IMG]
Last edited by byou678; 08-23-2011 at 06:44 AM.

#5
Member
Location: Maryland Join Date: Aug 2011
Posts: 52

#6
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
The links you tried to post point to a file on your desktop which we can't see. You either need to put the images on a public facing webserver, or add them as attachments to your post. To add an attachment go into the 'advanced' posting options and click on the paper clip icon.

#7
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Thanks, simonandrews!
The "Per Base Sequence Content" plot for 5_1 is in the attachment.

#8
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
The "Per Base Sequence Content" plot for 5_2 is in the following attachment. Please take a look, compare it with 5_1, and give me some ideas. Thanks!

#9
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
The "Overrepresented Kmers" plot for 5_1 is in the attachment.

#10
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
The "Overrepresented Kmers" plot for 5_2 is in the attachment.
All four pictures above are FastQC results after quality trimming and adapter trimming.

Last edited by byou678; 08-23-2011 at 07:42 AM.

#11
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
It looks like in the 5_1 sample you are reading through your diverse insert sequence into some kind of adapter. You can actually read part of the adapter sequence from the Kmer plot - it starts with GAGCGGTCA. I've had a quick look but I couldn't match that to any of the standard Illumina adapter sequences (it does occur in pBT-6 if that rings any bells), but you should check to see if it matches any of the primers or vectors you've used during your library construction.
If you can figure out the source of the sequence and you're happy the rest of your library is OK then you can rerun the adapter trimmer with this new sequence to remove the additional sequence and your libraries should look more like each other again.

#12
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Thanks again.
The adapter sequences are as below:

  5' TACACTCTTTCCCTACACGACGCTCTTCCGATCT
  5' AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
  5' GACGGCATACGAGCTCTTCCGATCT
  5' AGATCGGAAGAGCTCGTATGCCGTC

I use "Trim.pl" for quality trimming. Its options are shown below:

  Options:
  --type <num>  0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0
  --qual-threshold <num>  quality threshold for trimming, default 20
  --length-threshold <num>  length threshold for trimming, default 20
  --qual-type <num>  0=sanger qualities, 1=illumina qualities pipeline>=1.3, 2=illumina qualities pipeline<1.3. Default 0.
  --pair1 <paired end input filename>  fastq, paired end file. Must have same number of records as pair2. Required.
  --pair2 <paired end input filename>  fastq, paired end file. Must have same number of records as pair1. Required.
  --outpair1 <paired end output file>  Required.
  --outpair2 <paired end output file>  Required.
  --single <single end output file>  Required.

I chose the options and values as follows: --type 2 (windowed adaptive trimming); --qual-type 1 (illumina qualities pipeline>=1.3); and the default values for --qual-threshold <num> and --length-threshold <num>. I put the corresponding sequence file names after --pair1 and --pair2, and named the output files for --outpair1, --outpair2 and --single.

For adapter trimming I chose "cutadapt". I ran the following command in the Terminal on the 5.1 file (the 5.1 file after quality trimming). The output is saved in the file "5.1_adaptortrim.fastq".

  $ cutadapt -b TACACTCTTTCCCTACACGACGCTCTTCCGATCT -b AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -b GACGGCATACGAGCTCTTCCGATCT -b AGATCGGAAGAGCTCGTATGCCGTC 5.1_trimmed.fastq > 5.1_adaptortrim.fastq

Any more suggestions? Many thanks!!
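
One quick way to check whether the GAGCGGTCA sequence read off the Kmer plot in post #11 belongs to any of the four adapters listed in this post is to search each adapter on both strands. The following is only a minimal sketch in Python; the helper names `reverse_complement` and `adapters_containing` are made up for this illustration and are not part of cutadapt or FastQC.

```python
# Adapter sequences exactly as posted in this thread.
ADAPTERS = [
    "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "GACGGCATACGAGCTCTTCCGATCT",
    "AGATCGGAAGAGCTCGTATGCCGTC",
]

# Start of the unknown sequence, as read from the Kmer plot in post #11.
MYSTERY = "GAGCGGTCA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapters_containing(query, adapters):
    """List the adapters that contain `query` on either strand."""
    hits = []
    for adapter in adapters:
        if query in adapter or query in reverse_complement(adapter):
            hits.append(adapter)
    return hits


matches = adapters_containing(MYSTERY, ADAPTERS)
print(matches or "no adapter contains the query on either strand")
```

For these four adapters the search finds nothing, which agrees with the observation in post #11 that the sequence does not look like one of the adapters used for trimming. The same check also shows that the second adapter is simply the reverse complement of the first, and the fourth of the third.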

#13
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
So the problem is that you have a sequence in your library which isn't one of the adapters you passed to cutadapt. I can't immediately see where it's come from, but since cutadapt didn't know about it, it didn't remove it, and your trimmed library is still biased. I'd suspect that if you look at the size distribution of your two libraries after trimming you'll see that one has been trimmed significantly more than the other.
You need to figure out as much of this mystery sequence as you can (either by finding the sequence in one of your primers or by looking at some of your sequences and seeing where the common sequence at the end stops). You can then pass this as an extra sequence to cutadapt which can remove it from your library.
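
The second approach described above (looking at the reads and seeing where the common sequence stops) could be sketched roughly as follows. This is a hypothetical helper, not an existing tool: the function name `extend_seed` and the `min_fraction` and `min_reads` thresholds are assumptions chosen for illustration, and the toy reads are invented, not data from this thread.

```python
from collections import Counter


def extend_seed(reads, seed, min_fraction=0.8, min_reads=3):
    """Extend `seed` to the right using the reads that contain it.

    At each offset past the seed, take the most common base among the
    reads that reach that offset. Stop once fewer than `min_reads` reads
    are left or the top base falls below `min_fraction` of them.
    """
    # Bases that follow the first occurrence of the seed in each read.
    tails = [r[r.index(seed) + len(seed):] for r in reads if seed in r]
    extension = []
    offset = 0
    while True:
        column = [t[offset] for t in tails if len(t) > offset]
        if len(column) < min_reads:
            break
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) < min_fraction:
            break
        extension.append(base)
        offset += 1
    return seed + "".join(extension)


# Invented toy reads: three share a seed followed by the same bases.
toy_reads = ["ACGTGATTACATTGA", "TTGATTACATTGC", "GATTACATTG", "CCCCCCC"]
print(extend_seed(toy_reads, "GATTACA"))
```

On real data the reads would come from the trimmed FASTQ file and the seed would be the part of the sequence already read off the Kmer plot. The point where the consensus breaks down is a reasonable guess for where the shared sequence ends, and the extended sequence is what would be passed to cutadapt as an extra adapter.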

#14
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
byou678: Have you tried aligning the reads to your reference? Since this is an RNA-seq sample are we seeing just the standard "random primer effect" at the beginning of the read?

#15
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Yes, I aligned them using BWA. Could you explain your second question in more detail? Thanks for your reply.

#16
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
You are right, simonandrews. The size distributions of the two libraries after adapter trimming are significantly different. The "Basic Statistics" also show the difference, as below. Attached is the "Sequence Length Distribution" plot for 5_1.

Measure | Value
Filename | 5.1_adaptortrim.fastq
File type | Conventional base calls
Encoding | Illumina 1.5
Total Sequences | 33183607
Sequence length | 8-76
%GC | 44

Measure | Value
Filename | 5.2_adaptortrim.fastq
File type | Conventional base calls
Encoding | Illumina 1.5
Total Sequences | 33183607
Sequence length | 0-76
%GC | 42
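
The "Sequence length" ranges in these statistics can also be tallied directly from the FASTQ files, which makes it easy to count how many reads were trimmed down to nothing (the 5.2 file reports a minimum length of 0). The snippet below is a minimal sketch that assumes plain four-line FASTQ records with no line wrapping and at least one read; `read_lengths` and `summarise` are illustrative names, and the three-read input is invented.

```python
from collections import Counter
import io


def read_lengths(handle):
    """Count read lengths in a four-line-per-record FASTQ stream."""
    lengths = Counter()
    for line_number, line in enumerate(handle):
        # The sequence is the second line of every four-line record.
        if line_number % 4 == 1:
            lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


def summarise(lengths):
    """Return (total reads, shortest, longest, number of empty reads)."""
    total = sum(lengths.values())
    return total, min(lengths), max(lengths), lengths.get(0, 0)


# Invented three-read example; the second read was trimmed to length 0.
toy_fastq = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\n\n+\n\n"
    "@r3\nACG\n+\nIII\n"
)
toy_lengths = read_lengths(toy_fastq)
print(summarise(toy_lengths))
```

For a real file, the `io.StringIO` object would be replaced by something like `open("5.2_adaptortrim.fastq")`. Comparing the two resulting counters for 5.1 and 5.2 gives the same picture as the FastQC length distribution plots, in numbers.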

#17
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Thanks a lot, simonandrews. We can see that 5_2 was trimmed more than 5_1. Does that mean 5_1 contains the mystery (or contaminating) sequence, which was not removed during adapter trimming, and 5_2 does not contain it? If so, for 5_1 we need to identify that sequence and add it to the cutadapt command to remove it. In addition, how could I explain the cause and the solution in simple words to my boss? I look forward to any kind response.

Attached is the "Sequence Length Distribution" plot for 5_2.

Last edited by byou678; 08-23-2011 at 01:47 PM.

#18
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
See the discussion here: http://bioinfo-core.org/index.php/9t...8_October_2010 (4th figure specifically).
This paper may be useful: http://nar.oxfordjournals.org/content/38/12/e131.full
Are your alignments with bwa looking ok?

#19
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Thanks for the info you offered. The last step of the bwa alignment isn't running smoothly: it has taken much longer than expected and is still running now.
Could you take another look at the posts above? Any more ideas will be greatly appreciated!

Tags: fastqc, rnaseq