SEQanswers

Old 10-09-2013, 12:03 PM   #1
jbono
Junior Member
 
Location: Colorado

Join Date: May 2013
Posts: 4
Filtering out transcripts from non target organism

Hi,

I am assembling a transcriptome for a Drosophila species without a reference genome (my species diverged from its closest relative with a sequenced genome about 15 mya). I used Trinity for the assembly, which constructed over 65K components (which I assume are roughly equivalent to genes). I'm guessing that a lot of the sequences are from non-target species (e.g. bacteria, yeasts, cactus), as larvae were taken directly from their food source. Is there an easy way to identify and get rid of the bulk of the transcripts that come from non-target species (e.g. using BLAST or something else)? All Trinity transcripts are currently in FASTA format. I'm not particularly savvy with bioinformatics, so I'm not sure if there is an easy pipeline I could use. Thanks!
Old 10-09-2013, 03:47 PM   #2
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149

You might have assembled chimeric transcripts by using all the reads from different sources. This is a bit like a metagenomic assembly, so you might want to read some papers on handling this sort of data.

I would set up a 'contaminant' database containing all your non-target species, use a short-read mapper (bowtie, bwa) to align your reads against it, keep only the reads that did not align, and rerun Trinity with those reads.
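
A minimal sketch of the read-filtering step, written in Python. It assumes the mapper's output is available as uncompressed SAM text and that FASTQ read names match the SAM QNAME field exactly (no /1 or /2 suffixes); the function names and file names are hypothetical and not part of bowtie, bwa or Trinity. A read is kept only if none of its alignment records is mapped (SAM flag bit 0x4 marks an unmapped record).

```python
def unaligned_read_names(sam_lines):
    """Return names of reads with no mapped record in the SAM input."""
    seen, aligned = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & 0x4:  # bit 0x4 set means "segment unmapped"
            aligned.add(name)
    return seen - aligned


def filter_fastq(fastq_in, fastq_out, keep_names):
    """Copy 4-line FASTQ records whose read name is in keep_names.

    Returns the number of records written.
    """
    kept = 0
    while True:
        record = [fastq_in.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        name = record[0][1:].split()[0]  # drop '@' and any description
        if name in keep_names:
            fastq_out.writelines(record)
            kept += 1
    return kept


# Hypothetical usage (file names are placeholders):
#   with open("reads_vs_contaminants.sam") as sam:
#       clean = unaligned_read_names(sam)
#   with open("reads.fastq") as fq_in, open("reads.clean.fastq", "w") as fq_out:
#       filter_fastq(fq_in, fq_out, clean)
```

Holding every read name in memory is fine for a sketch but may not scale to a full sequencing run; it is worth checking whether your mapper can write the unaligned reads to a separate file itself.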

Otherwise, as you mention, you could make a BLAST database of your non-target sequences, align your current assembly to it, and keep only the transcripts that did not align. I'm not aware of any pipeline that would automate this. You'll need to take all the component IDs that did align to this non-target database and subtract them from your original FASTA file. This would best be done on the command line, using bash or perl.
The problem with this method is that some of your target sequences might also align to the non-target database, so you'll need to decide on some thresholds.
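
The subtraction step could be sketched roughly as follows, in Python rather than bash or perl. It assumes BLAST was run with tabular output (-outfmt 6, default columns, so percent identity is column 3 and e-value is column 11) and that the FASTA header IDs match the BLAST query IDs. The function names and the threshold defaults are placeholders for illustration, not recommended values; the thresholds are exactly the part you would need to tune so that genuine fly transcripts are not thrown out.

```python
def contaminant_ids(blast_tab, max_evalue=1e-10, min_identity=90.0):
    """Collect query IDs with at least one hit passing both thresholds.

    Expects BLAST tabular lines (-outfmt 6, default column order).
    """
    hits = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        pident, evalue = float(cols[2]), float(cols[10])
        if evalue <= max_evalue and pident >= min_identity:
            hits.add(cols[0])
    return hits


def filter_fasta(fasta_in, fasta_out, drop_ids):
    """Write every FASTA record whose ID is NOT in drop_ids.

    Returns the number of records written.
    """
    keep = True
    kept = 0
    for line in fasta_in:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]  # ID is the first word of the header
            keep = seq_id not in drop_ids
            if keep:
                kept += 1
        if keep:
            fasta_out.write(line)
    return kept


# Hypothetical usage (file names are placeholders):
#   with open("trinity_vs_contaminants.tsv") as tab:
#       bad = contaminant_ids(tab)
#   with open("Trinity.fasta") as fa_in, open("Trinity.clean.fasta", "w") as fa_out:
#       filter_fasta(fa_in, fa_out, bad)
```

This drops individual transcripts. To drop whole components instead, you would first reduce each ID to its component prefix, which depends on how your Trinity version names its sequences.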
Old 10-09-2013, 04:14 PM   #3
jbono
Junior Member
 
Location: Colorado

Join Date: May 2013
Posts: 4

Thanks for the quick response!
I think the difficulty is that I have no idea what the non-target organisms are, so I don't think I could easily set up a database (it could be anything in rotting cactus). I assume this is a typical issue with de novo assemblies, but I haven't been able to find much information on how people are dealing with it, though I am continuing to look. My main goal is to look at differential expression, but I was hoping to create a transcriptome that is mostly free of contaminants before mapping reads back to it.
Old 10-09-2013, 04:25 PM   #4
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149

How close is your target species to D. mel? Alternatively, could you align your reads to it with relaxed parameters, and use those that aligned to do a de novo assembly?

It could also be that the contamination is at a minimal level. You could pick a few possible non-target organisms, see what % of reads map to each, and decide if this is an acceptable level. Plus, if your contaminant sequences are quite different from your Drosophila (bacteria vs plant vs fly), the assembler can still do a good job of distinguishing and assembling the sequences. I.e. you might not even need to worry about it too much.
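
One way to get those percentages, sketched in Python under a few assumptions: the reads were mapped against a combined reference of candidate contaminants, the output is uncompressed SAM text, and each reference sequence name identifies its organism (the names "ecoli" and "yeast" below are made up). The function name is hypothetical. Secondary and supplementary records (SAM flag bits 0x100 and 0x800) are skipped so that each read is counted once.

```python
from collections import Counter


def percent_mapped_by_reference(sam_lines):
    """Return {reference name: % of primary records mapped to it}."""
    per_ref = Counter()
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        flag, ref = int(fields[1]), fields[2]
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means "segment unmapped"
            per_ref[ref] += 1
    if total == 0:
        return {}
    return {ref: 100.0 * n / total for ref, n in per_ref.items()}


# Hypothetical usage (file name is a placeholder):
#   with open("reads_vs_candidates.sam") as sam:
#       for ref, pct in sorted(percent_mapped_by_reference(sam).items()):
#           print(f"{ref}\t{pct:.2f}%")
```

With paired-end data each mate is its own record, so the figures are percentages of records rather than of fragments.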