SEQanswers

Old 11-11-2014, 03:12 AM   #1
standonn
Member
 
Location: UK

Join Date: Nov 2014
Posts: 14
FastQC: 2 peak per sequence GC content

Dear all,

I have some genomic paired-end data from a nematode. I ran FastQC to get an overview of the data.

I was surprised to see a "Per sequence GC content" graph with 2 peaks (see image attached).
I ran Trimmomatic, but the per sequence GC content graph remained the same.
Do you know why I get this profile?

Best,
Sophie
Attached Images
File Type: png Screen Shot 2014-11-11 at 11.05.46.png (70.5 KB, 65 views)
Old 11-11-2014, 03:59 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,992

It *may* be indicative of contamination from an unrelated species/source. Have you tried to analyze the data? Is this a simple WGS experiment?
Old 11-12-2014, 01:49 AM   #3
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

What should the normal GC content be? 41%? Is there anything within the genome that could have the other GC content?

I once had 2 peaks in some samples as well.
It was a low-GC bacterium (30%). The second peak (50%) turned out to come entirely from the rRNA operons within this bacterium. Our guess was that the GC bias of the adapter ligation somehow kicked in and ruined the dataset. The supplier doesn't know what happened.
I'm not sure whether that could be the case here, because I don't know if you have biological differences within the DNA in your sample, but it is probably worth checking.
Old 11-12-2014, 04:51 AM   #4
standonn
Member
 
Location: UK

Join Date: Nov 2014
Posts: 14

Hello GenoMax and bastianwur,

Thanks a lot for your answers.

We don't know what the GC content is for this species. We think it is around 35-40%, as in other worms.

After talking to the people in my lab, the second peak around 70% could very much be due to a bacterium present in the gut of the worm.

Otherwise, the strain used is inbred, but I believe it still presents biological differences. I wouldn't say that would explain the 2nd peak though.

Do you think it is still possible to do a genome assembly on this data?

Anyhow, thanks for your answers,
Sophie
Old 11-12-2014, 05:03 AM   #5
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

We normally assemble metagenomes and metatranscriptomes here, and haven't encountered many problems with the different species.
One of my colleagues has a paper in submission where they investigated that and got very few false assemblies.
-> assembling 2 totally different organisms from this dataset shouldn't be a problem.
You might have to do some QA though, to ensure that everything gets correctly assigned/separated.
Old 11-12-2014, 12:50 PM   #6
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 504

Hi Sophie,

We observed a similar bimodal distribution from C. elegans samples contaminated with Streptomyces (and the relative height of the high-GC peak varied with the degree of contamination). You could BLAST a sampling of the GC-rich reads and see if they match any known species.
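
A minimal sketch of that sampling step, in case it helps. It assumes an uncompressed FASTQ file with four lines per record, and the 0.6 GC cutoff, the sample size of 100 and the function names are all just illustrative choices (not anything FastQC or BLAST defines). Pick a cutoff that sits between the two peaks in your own plot.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read_id, sequence)


def read_fastq(path: str) -> Iterator[Record]:
    """Yield (read_id, sequence) from an uncompressed 4-line-per-record FASTQ."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line (not needed here)
            yield header[1:].split()[0], seq


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among called bases (A, C, G, T); N is ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def sample_gc_rich(records: Iterable[Record], min_gc: float = 0.6,
                   n: int = 100, seed: int = 0) -> List[Record]:
    """Keep reads with GC >= min_gc, then draw up to n of them at random."""
    rich = [(rid, seq) for rid, seq in records if gc_fraction(seq) >= min_gc]
    if len(rich) <= n:
        return rich
    return random.Random(seed).sample(rich, n)


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text, ready to paste into a BLAST search."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

Something like `print(to_fasta(sample_gc_rich(read_fastq("reads_1.fastq"))))` would then give a FASTA sample to BLAST (the file name is hypothetical). Note that this holds all GC-rich reads in memory before sampling, which is fine for a quick check but not ideal for a very large run.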
Old 11-12-2014, 05:14 PM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,992

If you know what that bacterium (present in the gut) is (and if a genome is available for that species or a close relative) you could try to separate your reads into two pools before trying assembly.

You can do that easily with BBSplit.
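
As a rough sketch, a BBSplit call for paired-end reads could be put together like this. The file names are hypothetical, and the parameters (in1, in2, ref, basename, outu1, outu2) are written as I understand them from the BBSplit usage text, so please check the help output of bbsplit.sh for your installed version, in particular for how paired output files are named. With only the bacterial genome given as reference, the unmapped reads would be the candidate nematode pool.

```python
import shlex
from typing import List


def bbsplit_command(reads1: str, reads2: str, references: List[str],
                    out_pattern: str = "split_%.fq",
                    unmapped1: str = "unmapped_1.fq",
                    unmapped2: str = "unmapped_2.fq") -> List[str]:
    """Build (but do not run) a bbsplit.sh command line for paired-end reads.

    All file names are placeholders. '%' in out_pattern is assumed to be
    replaced by the reference name, as described in the BBSplit usage text.
    """
    return [
        "bbsplit.sh",
        f"in1={reads1}",
        f"in2={reads2}",
        "ref=" + ",".join(references),  # comma-separated reference FASTA files
        f"basename={out_pattern}",      # one output per reference
        f"outu1={unmapped1}",           # reads matching none of the references
        f"outu2={unmapped2}",
    ]


# Hypothetical inputs: the gut bacterium (or a close relative) as the reference.
cmd = bbsplit_command("reads_1.fq", "reads_2.fq", ["gut_bacterium.fa"])
print(shlex.join(cmd))  # requires Python 3.8+
```

The command is only printed here so it can be inspected first; it could be passed to `subprocess.run(cmd, check=True)` once the parameters have been confirmed.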
Old 11-21-2014, 10:32 AM   #8
standonn
Member
 
Location: UK

Join Date: Nov 2014
Posts: 14

Dear all,

Sorry for the late reply.
Thanks a lot for your answers! They were much appreciated.

Unfortunately, I don't know the gut bacterium of this nematode. But I'll try what HESmith suggested and, if it turns out to be sequenced, I'll do what GenoMax suggested.

To GenoMax: thanks for telling me about BBSplit! I didn't know about that tool.

To bastianwur: Your message made me very happy! It is very good to know that there shouldn't be problems assembling this peculiar data. Good luck with the publication!

Cheers,
Sophie
Tags
fastqc, gc content, genomic data
