SEQanswers

Old 09-16-2015, 03:18 PM   #1
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85
Genome assembly - 2 similar samples - one good, one bad

Hi, currently doing genome assemblies on 2 very similar samples.

- 1 assembled brilliantly very quickly.
- 1 is highly fragmented

Any reason why one sample should behave so differently from the other? Both were sequenced on the same platform (Illumina HiSeq), have the same heterozygosity and repeat content, and went through the same assembly methodology. Both were screened for contaminants, collected together and adapter trimmed, and the FastQC results are very similar for both.

Thoughts:
- adapters in the middle of reads?
- could a virus have inserted itself?

Any comments welcomed.
Old 09-18-2015, 10:49 AM   #2
Smurali
Junior Member
 
Location: Houston

Join Date: Mar 2013
Posts: 4

Hmm.. interesting.

Some thoughts:
1. Was one more inbred than the other perhaps?
2. Did you check the insert sizes of the libraries? Perhaps the mate-pair library for the poorly assembled sample wasn't as good as the other one.
3. Also, I have seen adapters in the middle of reads. You can quickly check for this if you know the adapter sequence.
4. If there were a virus, I'd expect its k-mer coverage to be high enough for the assembler (at least the de Bruijn graph-based ones) to screen it out.
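
The quick adapter check mentioned in point 3 could look roughly like the sketch below. It is only an illustration: the adapter string is an assumption (a commonly used Illumina adapter prefix, which may not match the kit used here), and the function names and the `min_tail` threshold are made up for this example.

```python
from collections import Counter

# Assumed adapter prefix; substitute the sequence from your own library prep kit.
ADAPTER = "AGATCGGAAGAGC"

def internal_adapter_positions(reads, adapter=ADAPTER, min_tail=10):
    """Count reads, per start offset, where the adapter is followed by at least min_tail bases."""
    positions = Counter()
    for seq in reads:
        idx = seq.find(adapter)
        # An adapter running into the 3' end is ordinary read-through;
        # one followed by more sequence sits in the middle of the read.
        if idx != -1 and len(seq) - (idx + len(adapter)) >= min_tail:
            positions[idx] += 1
    return positions

def fastq_sequences(path):
    """Yield the sequence line of each 4-line record in an uncompressed FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()
```

Running `internal_adapter_positions(fastq_sequences("sample.fastq"))` on each sample (the file name is a placeholder) and comparing the counts between the good and the bad library would show whether internal adapters are more common in one of them.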
Old 09-20-2015, 03:18 PM   #3
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85

Thanks Smurali.

These are field samples, not inbred. Everything is pointing towards a virus being integrated into the chromosome. Will do some more work on this and if I find out anything useful, will add another post.
Old 09-21-2015, 08:16 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by Elsie
Thoughts:
- adapters in the middle of reads?
- could a virus have inserted itself?

Any comments welcomed.
Even if there were adapters in the middle of reads, they still would have been trimmed. And I don't see why a virus would cause a poor assembly, unless it randomly inserted itself into a different place in every cell. If it inserted itself once, then the cell replicated, you'd still get a good assembly.

It sounds more like cancer to me (depending on the organism), or degraded DNA. Have you looked at the insert size distribution and actual error rates of mapped reads (as opposed to just the quality scores)? Also, what is the read length, target insert size, and specific Illumina platform (e.g. HS2500) and run mode, and what kind of organism is it? Diploid or haploid? ...etc.
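
For anyone wanting to run the two checks above (insert size distribution and the actual error rate of mapped reads), here is a minimal sketch that works on plain SAM text lines. It assumes the aligner writes the optional `NM` (edit distance) tag, which not every aligner does by default, and it approximates the error rate as edit distance divided by read length. The function name is made up for this example.

```python
import statistics

def summarize_sam(lines):
    """Return (median insert size, edits per aligned base) from SAM text lines."""
    insert_sizes = []
    edit_total = 0
    base_total = 0
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        tlen = int(fields[8])
        # Count each properly paired fragment once, via the mate with positive TLEN
        if flag & 0x2 and tlen > 0:
            insert_sizes.append(tlen)
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                edit_total += int(tag[5:])
                base_total += len(fields[9])
                break
    median = statistics.median(insert_sizes) if insert_sizes else None
    rate = edit_total / base_total if base_total else None
    return median, rate
```

Comparing the two numbers between the good and the bad sample, mapped against the same reference (for example the good assembly), is one way to see whether the libraries really are as similar as FastQC suggests.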

Last edited by Brian Bushnell; 09-21-2015 at 08:19 AM.
Old 09-21-2015, 01:13 PM   #5
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85

Thanks for the comments Brian.
100 bp PE; the samples belong to the order Hymenoptera. Current evidence is pointing towards polydnaviruses.
Old 09-22-2015, 10:41 AM   #6
Smurali
Junior Member
 
Location: Houston

Join Date: Mar 2013
Posts: 4

Wow, this is certainly interesting.
I can only think of something external that somehow passed through your contamination screening and made it into the sequencing, so the virus looks quite plausible here.
When we sequenced and assembled a bunch of arthropods before, the final assembly sometimes did have a lot of contamination from Homo sapiens (in blood feeders), plants and viruses, so this is expected.
However, I am still intrigued by the fact that it is causing the assembly to be so highly fragmented. Are you going to try and assemble after removing the reads belonging to the virus, Elsie?
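
One simple way to try the read-removal idea is a k-mer filter: drop every read pair in which either mate shares a k-mer with a known viral sequence. The sketch below is a toy, in-memory version under stated assumptions; the function names are made up for this example, the viral reference sequences have to come from somewhere else, and for real data a dedicated filtering tool would be far more practical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def drop_viral_pairs(pairs, viral_seqs, k=31):
    """Keep read pairs where neither mate shares a k-mer with the viral sequences."""
    viral = set()
    for seq in viral_seqs:
        # Index both strands, since reads can come from either one
        viral |= kmers(seq, k)
        viral |= kmers(revcomp(seq), k)
    kept = []
    for r1, r2 in pairs:
        if kmers(r1, k).isdisjoint(viral) and kmers(r2, k).isdisjoint(viral):
            kept.append((r1, r2))
    return kept
```

Note that if the virus is integrated into the chromosome, removing its reads also removes the sequence bridging the insertion sites, so this would not necessarily make the assembly less fragmented.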
Tags
genome assembly
