SEQanswers

Old 02-08-2012, 02:09 PM   #1
mgg
Member
 
Location: London, UK

Join Date: Nov 2011
Posts: 12
FastQC, kmer content, per base sequence content: is this good enough

Hi,

I'd appreciate some advice on processing some Illumina libraries

Initial FastQC runs showed the data as not great. I've used cutadapt to trim off adapters and FastQC shows improvements to all libraries.

One remains of concern, because it still retains kmer and other issues (I've attached files for kmer content & per base sequence content for both the original and the processed data)

My question is simple: is this good enough? (my next step is assembly with velvet) Does this data need some further processing before Velvet? If so, with what? I've considered trimming off the first 10nuc to remove the anomalous per_base_sequence_content trace, but that would do little for the persistent kmers.

If this were your data, what would you do before velvet assembly?

thanks
mgg

for the record my cutadapt commands are below

Code:
# trim reads/2
cutadapt -b AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --minimum-length=10 --overlap= --quality-base=64 --quality-cutoff= --match-read-wildcards infile_2.fq -o processed/outfile_2.fq --wildcard-file=processed/outfile_2.fq.wildcard

# trim reads/1
cutadapt -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC --minimum-length=10 --overlap= --quality-base=64 --quality-cutoff= --match-read-wildcards infile_1.fq -o processed/outfile_1.fq --wildcard-file=processed/outfile_1.fq.wildcard
Attached Images
File Type: png raw.kmer_profiles.png (82.7 KB, 442 views)
File Type: png raw.per_base_sequence_content.png (30.5 KB, 280 views)
File Type: png processed-kmer_profiles.png (41.2 KB, 310 views)
File Type: png processed-per_base_sequence_content.png (20.9 KB, 265 views)
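
The `--quality-base=64` option in the cutadapt commands above tells cutadapt that the quality strings are encoded as Phred+64 (as in older Illumina pipelines) rather than Phred+33. As a rough illustration of what that offset means, here is a minimal Python sketch that decodes a quality string into Phred scores. The function name `phred_scores` is made up for this example and is not part of cutadapt or FastQC.

```python
def phred_scores(quality, offset=64):
    """Decode a FASTQ quality string into a list of Phred scores.

    `offset` is the ASCII offset of the encoding: 64 for the older
    Illumina Phred+64 format, 33 for Sanger / current Illumina output.
    """
    scores = [ord(ch) - offset for ch in quality]
    # A negative score usually means the wrong offset was assumed.
    if any(score < 0 for score in scores):
        raise ValueError("negative Phred score: is the offset correct?")
    return scores


if __name__ == "__main__":
    # 'h' is ASCII 104, so under Phred+64 it decodes to 40.
    print(phred_scores("hhhB"))
```

If decoding with offset 64 raises the error, the file is probably Phred+33 and the `--quality-base` setting would need to change accordingly.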
Old 08-25-2012, 07:09 PM   #2
minoru_harvest
Junior Member
 
Location: China

Join Date: Aug 2012
Posts: 5

Yeah, I have the same question.
I have a graph very similar to your processed-per_base_sequence_content plot.
Old 08-25-2012, 09:33 PM   #3
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Looks like you have some base pair bias issues going on from bases 1-10 in your reads. You should trim those off.
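
For anyone wanting to try that trimming without extra tools, here is a minimal Python sketch. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function name `trim_five_prime` and the defaults `n=10` and `min_length=10` are illustrative choices, not something from cutadapt or FastQC.

```python
def trim_five_prime(lines, n=10, min_length=10):
    """Remove the first `n` bases (and qualities) from each FASTQ record.

    `lines` is any iterable of FASTQ lines, e.g. an open file handle.
    Assumes strict 4-line records. Records shorter than `min_length`
    after trimming are dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times to read 4 lines per record.
    for header, seq, sep, qual in zip(stripped, stripped, stripped, stripped):
        seq, qual = seq[n:], qual[n:]
        if len(seq) >= min_length:
            yield header, seq, sep, qual


if __name__ == "__main__":
    example = ["@read1\n", "ACGTACGTACGTACGTACGTAC\n", "+\n", "I" * 22 + "\n"]
    for record in trim_five_prime(example):
        print("\n".join(record))
```

Note that dropping short reads from one file of a pair independently will desynchronise the mates, so for paired-end data the two files would need to be filtered together.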
Old 11-04-2013, 12:32 PM   #4
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 237

Hello everybody,

I'm reviving this thread because it fits my question well: I would like your opinion on my RNA-Seq data (paired-end, 100 bp) generated on an Illumina HiSeq 2000.
I attached the "Per Base Sequence Quality" and "Kmer Content" plots for 3 examples. In the first one, the library was prepared using the polyA method. The next 2 examples were prepared by ribodepletion. I would like to know whether my data are "good enough" despite these last 2 profiles, and whether there is an explanation for this increase in A/T sequence along the read.

I have the feeling from these examples and some others that the "Kmer Content" profile depends on the library preparation (ribodepletion vs polyA), the run (samples from the same run show a similar profile) and the sample itself (I observed similar profiles for the same sample sequenced on 2 different runs). Is this true?

Thank you,
Jane
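
One way to put numbers on an "increase of A/T along the read" is to compute the base composition at each read position directly. The sketch below is a simplified stand-in for what a per-base sequence content plot shows, not FastQC's own code; the function name `per_base_content` is hypothetical.

```python
from collections import Counter


def per_base_content(sequences):
    """Return one dict per read position mapping base -> fraction of reads.

    Reads may differ in length; each position is normalised by the
    number of reads that actually reach it.
    """
    counts = []  # counts[i] tallies the bases observed at position i
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    profile = []
    for position_counts in counts:
        total = sum(position_counts.values())
        profile.append({b: c / total for b, c in position_counts.items()})
    return profile


if __name__ == "__main__":
    for pos, fractions in enumerate(per_base_content(["ACGA", "ATGT", "ACTA"])):
        at = fractions.get("A", 0.0) + fractions.get("T", 0.0)
        print(pos + 1, round(at, 2))
```

Comparing the A+T fraction per position between the polyA and ribodepletion libraries would show whether the trend is specific to one preparation.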
Old 11-04-2013, 12:57 PM   #5
fahmida
Member
 
Location: Australia

Join Date: Aug 2010
Posts: 54

Quote:
Originally Posted by mgg View Post
My question is simple: is this good enough? (my next step is assembly with velvet) Does this data need some further processing before Velvet? If so, with what? I've considered trimming off the first 10nuc to remove the anomalous per_base_sequence_content trace, but that would do little for the persistent kmers.

Are these reads from mate pair libraries? You may also want to check the read duplication levels in that case.
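
As a quick way to look at duplication outside FastQC, here is a minimal Python sketch that counts exact duplicate read sequences. It is a plain exact-match tally, not the estimate FastQC reports, and it holds every distinct sequence in memory, so it is only meant for a subsample. The name `duplication_summary` is invented for this example.

```python
from collections import Counter


def duplication_summary(sequences):
    """Summarise exact-duplicate reads in an iterable of sequences."""
    counts = Counter(sequences)
    total = sum(counts.values())
    distinct = len(counts)
    return {
        "total": total,
        "distinct": distinct,
        # Fraction of reads that are extra copies of an already-seen sequence.
        "duplicate_fraction": (total - distinct) / total if total else 0.0,
        "most_common": counts.most_common(3),
    }


if __name__ == "__main__":
    print(duplication_summary(["ACGT", "ACGT", "TTTT", "GGGG"]))
```

The most common sequences are worth checking by eye, since highly repeated reads are often adapters or other artefacts.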
Old 11-05-2013, 10:44 AM   #6
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 237

I'm coming back to my previous question because I still have doubts about the quality of my data. Any feedback would be appreciated.
Old 11-05-2013, 10:47 AM   #7
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

I didn't see the attachment. But from what you describe, it sounds ok.
Old 11-05-2013, 11:03 AM   #8
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 237

Quote:
Originally Posted by Wallysb01 View Post
I didn't see the attachment. But from what you describe, it sounds ok.
Oops, I forgot to attach the file!
Attached Files
File Type: pdf FastQC_plots.pdf (430.1 KB, 307 views)
Old 11-06-2013, 12:41 PM   #9
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 237

Quote:
Originally Posted by Jane M View Post
Oops, I forgot to attach the file!
Any comments on the attachment?
Old 11-06-2013, 01:01 PM   #10
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Looks good enough for mapping. You might want to check whether you have some adapter contamination in the first one. I've often found that weird, sudden spikes of particular kmers come from adapters.
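
A crude first check for adapter contamination is to count how many reads contain the start of a known adapter. The Python sketch below does exact substring matching only, so it ignores mismatches and partial adapters at the read end, unlike a real trimmer such as cutadapt. The function name `adapter_hit_fraction` and the 12-base seed length are arbitrary choices for illustration; the adapter sequence in the usage example is the one from the cutadapt command in post #1.

```python
def adapter_hit_fraction(sequences, adapter, seed_length=12):
    """Fraction of reads containing the first `seed_length` adapter bases.

    Exact matching only: reads with sequencing errors in the adapter,
    or with fewer than `seed_length` adapter bases at the 3' end, are missed.
    """
    seed = adapter[:seed_length].upper()
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if seed in seq.upper():
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # from post #1
    reads = ["ACGTAGATCGGAAGAGCTT", "ACGTACGTACGTACGT"]
    print(adapter_hit_fraction(reads, adapter))
```

If the fraction is clearly above zero and the over-represented kmers in the FastQC plot overlap the adapter sequence, adapter read-through is a likely explanation for the spikes.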
Old 11-06-2013, 10:45 PM   #11
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 237

Thank you Wallysb01.

Isn't it surprising to see an increase of AAAAA and TTTTT all along the read? It should be constant, right, like in the first case?
Why is there such a difference between polyA and ribodepletion?

Do the "normal/good" profiles of these 2 methods always differ?
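
To check whether AAAAA or TTTTT really becomes more frequent towards the end of the reads, one could count where a given kmer starts in each read. Here is a minimal Python sketch of that raw count. It is not the enrichment statistic FastQC plots, only the underlying positional tally, and the name `kmer_position_profile` is hypothetical.

```python
from collections import Counter


def kmer_position_profile(sequences, kmer):
    """Count how often `kmer` starts at each 0-based read position."""
    profile = Counter()
    for seq in sequences:
        start = seq.find(kmer)
        while start != -1:
            profile[start] += 1
            # Advance by one base so overlapping occurrences are counted too.
            start = seq.find(kmer, start + 1)
    return dict(sorted(profile.items()))


if __name__ == "__main__":
    reads = ["ACGTAAAAAC", "CCAAAAAAGT", "GGGGGAAAAA"]
    print(kmer_position_profile(reads, "AAAAA"))
```

Running this for AAAAA and TTTTT on a polyA library and a ribodepletion library would show whether the rise along the read is present in the raw counts for both.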
All times are GMT -8.