SEQanswers

Old 11-26-2013, 06:46 AM   #1
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41
How to solve a large repeat

Hello
I am having trouble resolving a large repeat that is present in plant chloroplasts, and I wonder what the easiest or best strategy would be.

The problem is that the repeat seems to be around 20 kb long. It is easy to spot, since my SAM file shows 2-3 times higher coverage for that region.

For chloroplasts this large repeat is known, but I don't know how to tackle the problem.
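
The elevated coverage described in this post can be checked systematically. What follows is only a minimal Python sketch, not a tested pipeline: it assumes the alignments are available as plain SAM text lines against a single reference sequence, and it flags windows whose mean depth is well above the median. The function names, the 1 kb window and the 1.8-fold threshold are illustrative assumptions, not values from this thread.

```python
import re
from statistics import median

# CIGAR operations; M, D, N, = and X consume reference bases.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by one alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def window_coverage(sam_lines, ref_length, window=1000):
    """Approximate mean depth per window (single reference assumed)."""
    n_windows = (ref_length + window - 1) // window
    bases = [0] * n_windows
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 4 or cigar == "*" or pos < 1:  # unmapped read
            continue
        start = pos - 1  # convert to 0-based
        end = min(start + reference_span(cigar), ref_length)
        w = start // window
        while start < end:  # split the aligned bases across windows
            w_end = min((w + 1) * window, end)
            bases[w] += w_end - start
            start = w_end
            w += 1
    return [b / min(window, ref_length - i * window) for i, b in enumerate(bases)]


def elevated_windows(depths, fold=1.8):
    """Indices of windows with depth at least `fold` times the median."""
    base = median(depths)
    return [i for i, d in enumerate(depths) if base > 0 and d >= fold * base]
```

A run of adjacent flagged windows spanning roughly 20 kb would be consistent with a collapsed repeat, although uneven coverage from other causes can produce the same signal.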
Old 11-26-2013, 01:55 PM   #2
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119

Have a look at any paper that assembles whole chloroplast genomes; the methods section should describe how it was done. From memory, I think this paper uses a particular assembly strategy to build whole chloroplast genomes in Eucalyptus (I haven't been through it completely, but it is worth a read): http://www.ncbi.nlm.nih.gov/pubmed/23876290

I know this won't sound helpful, but have you considered not using the inverted repeat region? If you're using the assembly for phylogenetic analysis, it might be worth using only the non-repeat regions. Many people do this.
Old 11-26-2013, 11:49 PM   #3
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41

Thanks jimmybee for the quick response!

Indeed, I am also wondering whether it even makes sense to try to finish it.
The problem stems partly from our client, who wanted me to deliver the final version. Following your suggestion I took another look and analyzed the repeat of a very close reference: there is 0% sequence difference!
I think I will give it one last try today, and otherwise tell them that it likely won't be possible without designing specific primers for that region and checking again.
Old 11-27-2013, 05:38 AM   #4
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by ebioman
I think I will give it one last try today, and otherwise tell them that it likely won't be possible without designing specific primers for that region and checking again.
The more expensive option is to add PacBio data - these reads now should be able to span the repeat. See this paper: http://www.biomedcentral.com/1471-2164/14/670
Old 11-27-2013, 11:08 PM   #5
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41

@flxlex you will laugh, but that is actually what I am working with.

The only problem is that so far I have only got the ALLORA module running for the chloroplast. I sequenced an entire genome and only filtered for chloroplast reads in a second step, so I ended up with FASTA files, which are unsuitable for the HGAP module. ALLORA was not sensitive enough to resolve the repeat by itself, and although I tried many parameters, it always mapped all my long reads onto that repeat. I think one problem is the high error rate of PacBio combined with the low divergence between these regions (if there is any at all). On top of that, this region is 20 kb long, and my average reads are "only" 2 kb.

Luckily, I also have Illumina reads, so I will try to take all the reads that map there and reassemble them with very strict settings. Maybe that will help.

Side note: for other regions of the chloroplast that Illumina had trouble resolving, PacBio did great!
Old 11-29-2013, 02:40 AM   #6
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Yes, depending on the version of the PacBio chemistry, and on how well the library preparation and sequencing went, the number of reads that will actually span your very long repeat will vary...
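
As a rough back-of-the-envelope check of this point, the sketch below estimates how many reads would be expected to cross a repeat completely. It assumes that read start positions are uniformly distributed on a circular genome and that a read needs some unique flanking sequence on each side of the repeat (the hypothetical `anchor` parameter). The function name, the 150 kb genome length and the 500 bp anchor are illustrative assumptions, not values from this thread.

```python
def expected_spanning_reads(read_lengths, genome_length, repeat_length, anchor=500):
    """Expected number of reads that fully cross one repeat copy.

    A read of length L crosses the repeat plus `anchor` bases on each
    side only if L >= repeat_length + 2 * anchor. On a circular genome
    with uniformly random starts, L - needed + 1 of the genome_length
    possible start positions achieve this.
    """
    needed = repeat_length + 2 * anchor
    valid_starts = sum(max(0, n - needed + 1) for n in read_lengths)
    return valid_starts / genome_length


# 1000 reads of 2 kb against a 20 kb repeat: none can span it.
short_reads = [2000] * 1000
print(expected_spanning_reads(short_reads, 150_000, 20_000))

# The same data set with a handful of hypothetical 25 kb reads added.
with_long_reads = short_reads + [25_000] * 10
print(expected_spanning_reads(with_long_reads, 150_000, 20_000))
```

With 2 kb reads the estimate is zero regardless of depth, which matches the observation that the long reads alone could not resolve a 20 kb repeat.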
Old 11-29-2013, 09:31 AM   #7
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

If your Illumina reads are paired-end, you might be able to get away with looking at how they map to your contig edges, and from there resolve which repeat goes where. This assumes that your pair distance is large enough to span any missing sequence in your contigs and has a tight size distribution, but it can give you a decent start on at least placing the repeats in the correct place. Then you might be able to map your PB reads back to figure out the missing bits and polish it up with the Illumina reads?
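
The contig-edge idea above could be sketched roughly as follows. This is a minimal Python illustration under several assumptions: the paired-end alignments are available as plain SAM text lines, contig lengths are known, and `max_dist` (a hypothetical parameter) is set to about the library insert size. It counts read pairs whose mates land near the ends of two different contigs; the contig names in the usage example are made up.

```python
from collections import Counter


def contig_end(pos, length, max_dist):
    """Classify a 1-based position as near the 'start' or 'end' of a contig."""
    if pos <= max_dist:
        return "start"
    if pos > length - max_dist:
        return "end"
    return None  # too far from either edge to be informative


def count_end_links(sam_lines, contig_lengths, max_dist=500):
    """Count read pairs that connect the ends of two different contigs."""
    links = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, pos = int(f[1]), f[2], int(f[3])
        rnext, pnext = f[6], int(f[7])
        # Use the first mate only so each pair is counted once; skip
        # unmapped reads or mates and secondary/supplementary records.
        if not flag & 0x40 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if rnext in ("=", "*") or rnext == rname:
            continue  # mate is on the same contig or missing
        if rname not in contig_lengths or rnext not in contig_lengths:
            continue
        a = contig_end(pos, contig_lengths[rname], max_dist)
        b = contig_end(pnext, contig_lengths[rnext], max_dist)
        if a and b:
            links[tuple(sorted([(rname, a), (rnext, b)]))] += 1
    return links


# Usage with made-up contig names and a single linking pair.
lengths = {"LSC": 80000, "IR": 20000}
sam = ["p1\t65\tLSC\t79800\t60\t100M\tIR\t100\t0\t*\t*"]
for (end_a, end_b), n in count_end_links(sam, lengths).items():
    print(end_a, end_b, n)
```

Contigs shorter than `max_dist` are always classified as "start" here, so very short contigs would need separate handling. A collapsed repeat contig would be expected to show links from each of its ends to more than one neighbour.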