
**Brian Bushnell** · 05-28-2014, 02:53 PM

Have you BLASTed the libraries to look for contamination?

Also, could that shoulder simply be poly-A sequences? You might try trimming poly-A tails and then rerunning FastQC.
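For anyone wanting to try the poly-A idea quickly, here is a minimal sketch in plain Python. The function name `trim_poly_a` and the `min_run=10` threshold are assumptions made for illustration, not something from this thread, and it only removes an exact run of A's at the 3' end. Dedicated trimmers also tolerate sequencing errors inside the tail, so treat this as a rough check rather than a replacement for them.

```python
def trim_poly_a(seq, qual, min_run=10):
    """Strip a trailing run of A's from a read if the run is long enough.

    seq     -- read sequence (upper-case bases assumed)
    qual    -- quality string of the same length as seq
    min_run -- hypothetical threshold: shorter A runs are left alone
    """
    trimmed = seq.rstrip("A")
    run_length = len(seq) - len(trimmed)
    if run_length < min_run:
        # Too short to call a poly-A tail; return the read untouched.
        return seq, qual
    # Keep the quality string in step with the trimmed sequence.
    return trimmed, qual[:len(trimmed)]


if __name__ == "__main__":
    read = "ACGTTGCA" + "A" * 15
    print(trim_poly_a(read, "I" * len(read)))
```

Rerunning FastQC on reads trimmed this way would show whether the shoulder moves or shrinks.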

**kazi1** · 05-28-2014, 04:22 PM

Although there was some TruSeq adapter contamination in this sample initially (2% of reads), dumping the low quality reads and trimming off the 5' ends got rid of it. There weren't any overrepresented sequences or enriched k-mers whatsoever (like poly-A sequences) after QC. I'm not sure what else I would be BLASTing in the libraries.

However, after doing a bit of brainstorming, I might have a hypothesis for what the shoulder is. A lot of insect species (including D. melanogaster and A. aegypti) can be infected by a bacterium called Wolbachia (especially common in laboratory stocks). I checked the GC content for the A. aegypti transcriptome (which is the sample I posted here) and it's about 50%, which corresponds to the main peak of the graphs I posted. The GC content of the Wolbachia genome is ~35%, which would match the second peak/shoulder. If this is the case, I'd find a bunch of Wolbachia-specific genes when I assemble the transcriptome. I could potentially mask out the Wolbachia contamination later when I start performing expression counting.

(But I'm not quite to that point in my analysis yet, so I'll let you know what happens and post back here when I do. I'm going to be pretty amused if it turns out all 3 laboratories have a massive Wolbachia problem...)
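The peak comparison described above (main peak near 50% GC, shoulder near 35%) can be checked outside FastQC with a few lines of Python. This is a minimal sketch, not FastQC's own method: the helper names `gc_percent`, `read_fastq_seqs` and `gc_histogram`, the 5% bin width, and the file name `reads.fastq` are all assumptions for illustration. It assumes a plain four-line-per-record FASTQ.

```python
from collections import Counter


def gc_percent(seq):
    """Return GC% of a read, ignoring N and other non-ACGT symbols."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None  # nothing but N's; no meaningful GC value
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def read_fastq_seqs(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def gc_histogram(seqs, bin_width=5):
    """Count reads per GC% bin; keys are the lower edge of each bin."""
    hist = Counter()
    for seq in seqs:
        gc = gc_percent(seq)
        if gc is None:
            continue
        # Fold exactly 100% GC into the top bin.
        bin_start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        hist[bin_start] += 1
    return hist


if __name__ == "__main__":
    # "reads.fastq" is a placeholder path.
    with open("reads.fastq") as handle:
        hist = gc_histogram(read_fastq_seqs(handle))
    for bin_start in sorted(hist):
        print(f"{bin_start:3d}-{bin_start + 5:3d}%  {hist[bin_start]}")
```

If the shoulder really sits around 35%, the bins on either side of that value should stand out from the tail of the main peak.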

**Brian Bushnell** · 05-28-2014, 04:57 PM

Right, BLAST would potentially let you know if it was bacterial, and if so what species, so you can better filter it.

**pmiguel** · 05-29-2014, 06:42 AM

Hi Kazi1,
Seems like you are presuming by default that a transcriptome GC% profile should not be bimodal. Why would you presume that? I mean other than FastQC giving you a big red "X" next to "Per sequence GC content".

--
Phillip

**kazi1** · 05-29-2014, 11:10 AM

It's true, I've made the assumption that it shouldn't be bimodal simply on the basis of the "big red X" in FastQC. I haven't done that much bioinformatics work before, so I've been working through and trying to figure out what each of the QC flags means. I have 3 red flags from FastQC right now: "per base sequence content" (from the random hexamer priming), "sequence duplication" (from the high level of coverage), and the "per sequence GC content". The "per sequence GC content" is the only one I can't explain.

I know that FastQC is optimized for genomic DNA reads, so perhaps it's just sending up that flag unnecessarily when dealing with RNA-Seq data? It'd be great if that's just the way transcriptomic data looks normally. I just wanted some second opinions (from people with more experience with FastQC/RNA-Seq).

**GenoMax** · 05-29-2014, 11:52 AM

A big red "X" in FastQC is not an immediate indication that the data have failed that check completely. Since you expect an unrelated species (Wolbachia) to coexist in the sample, a strange GC distribution would be acceptable for your data.

**pmiguel** · 05-30-2014, 06:45 AM

Even in genomic DNA libraries I occasionally see bimodal (or trimodal) distributions of GC% in that plot. Although contamination (or infestation) of the sample with another species is possible, I see no reason to presume it is the case.

Still no harm pulling out a few thousand representative reads from the two peaks and blasting them to see if you get lots of best hits to a different phylum or kingdom. But it could be a waste of time and might even lead you to throwing out data that actually should be kept.

The big red "X" issue is one that plagues us occasionally. You just need to take it in stride. It is just a program. You don't want to turn off your brain when using it.
--
Phillip
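The suggestion above of pulling reads from the two peaks and BLASTing them could be set up with something like the sketch below. It is only an illustration under stated assumptions: the names `sample_by_gc` and `to_fasta` are made up here, and the 42% split point is simply a guess at a value between the ~35% shoulder and the ~50% main peak mentioned earlier in the thread, so it should be adjusted to the actual plot. The script only writes FASTA text; running BLAST on the output is a separate step.

```python
import random


def gc_percent(seq):
    """Return GC% of a read, ignoring N and other non-ACGT symbols."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def sample_by_gc(records, split=42.0, n=1000, seed=0):
    """Split (name, seq) records at a GC% threshold and sample each side.

    split -- assumed boundary between the two peaks (hypothetical value)
    n     -- maximum number of reads to keep per group
    seed  -- fixed seed so the same reads are drawn on every run
    """
    groups = {"low": [], "high": []}
    for name, seq in records:
        gc = gc_percent(seq)
        if gc is None:
            continue  # skip reads with no called bases
        groups["low" if gc < split else "high"].append((name, seq))
    rng = random.Random(seed)
    return {key: rng.sample(reads, min(n, len(reads)))
            for key, reads in groups.items()}


def to_fasta(records):
    """Format (name, seq) records as FASTA text for use as a BLAST query."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    demo = [("read1", "ATATATATGC"), ("read2", "GCGCGCATGC")]
    picked = sample_by_gc(demo, n=5)
    print(to_fasta(picked["low"]), end="")
    print(to_fasta(picked["high"]), end="")
```

Comparing the best hits for the "low" and "high" groups would show whether the low-GC reads mostly come from a different phylum or kingdom, which is the test being proposed.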

**kazi1** · 06-02-2014, 03:02 PM

Ok good to keep in mind! Thanks to all for your advice!

**mikep** · 06-02-2014, 11:20 PM

Originally posted by kazi1:

I'm going to be pretty amused if it turns out all 3 laboratories have a massive Wolbachia problem...)

A lot of insects have Wolbachia sequences integrated into their chromosomes, so this might not be contamination/infection.


[FASTQC] Biases in GC whole sequence content
