SEQanswers

Old 08-10-2011, 09:03 AM   #1
aner
Member
 
Location: italy

Join Date: Mar 2011
Posts: 11
How to identify a gene targeted by an external sequence in NGS data?

Dear all
I am working on a transcriptome project (Illumina GAIIx, paired-end).
I have 2 samples of Arabidopsis thaliana. An external sequence from a different genome has been inserted into their sequences (transfection should be the word). The reason for this choice is that it should let us identify the gene (unknown in advance) that this sequence links to, so that we can easily look at that gene's expression.
My problem is that I need to identify which gene the sequence links to.
Is there any software that can do this?

I did the classical RNA-seq steps:
- QC
- Alignment (Tophat - Bowtie)
- DEG (Cufflinks/Cuffdiff)

I cannot search for the external sequence in the aligned samples, since the external sequence does not align to the samples' reference genome.

Thank you!!
Best
Andrea
Old 08-10-2011, 11:15 AM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

I'm not entirely clear on what the insertion is. Is it a T-DNA from Agrobacterium? That seems the most likely candidate since you are working with Arabidopsis.

The other problem is, do you know whether or not the insert is actually in a transcribed region?

If the sequence of the insert is known, say if it is a T-DNA, then there are methods like TAIL-PCR that can be used to identify the sequences flanking the insert.

Last edited by chadn737; 08-10-2011 at 11:17 AM.
Old 08-10-2011, 02:34 PM   #3
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

I would probably just create a database from your possible insertion material & search it with your favorite short read search tool (e.g. BWA, Bowtie). That will identify any reads carrying the inserted DNA. If you can find a read pair (or do you not have paired reads?) with one read in the insertion and one in an Arabidopsis message, then you are home.

There are some tools out there specifically to look for these, but I forget the names.
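
The read-pair idea above can be sketched in a few lines of Python. This is only an illustration, assuming the reads were already aligned against the insert sequence alone and the result is available as SAM text. The reference name `GUS_insert`, the read names and all coordinates are made-up toy values. A pair in which exactly one mate maps to the insert is a candidate for spanning the junction.

```python
def reads_hitting_insert(sam_lines):
    """Map read-pair name -> set of mate numbers (1 or 2) that aligned to the insert."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):           # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                     # FLAG bit 0x4: this read is unmapped
            continue
        mate = 1 if flag & 0x40 else 2     # FLAG bit 0x40: first read of the pair
        hits.setdefault(name, set()).add(mate)
    return hits


def half_mapped_pairs(hits):
    """Pairs where only one mate hit the insert: candidates for spanning the junction."""
    return sorted(name for name, mates in hits.items() if len(mates) == 1)


# Toy SAM records (hypothetical names and positions, sequences omitted with '*').
example_sam = [
    "@SQ\tSN:GUS_insert\tLN:1800",
    "pairA\t73\tGUS_insert\t100\t37\t50M\t=\t100\t0\t*\t*",   # mate 1 mapped
    "pairA\t133\tGUS_insert\t100\t0\t*\t=\t100\t0\t*\t*",     # mate 2 unmapped
    "pairB\t99\tGUS_insert\t300\t37\t50M\t=\t450\t200\t*\t*",   # both mates mapped
    "pairB\t147\tGUS_insert\t450\t37\t50M\t=\t300\t-200\t*\t*",
    "pairC\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",                   # neither mate mapped
    "pairC\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

candidates = half_mapped_pairs(reads_hitting_insert(example_sam))
print(candidates)
```

The unmapped mates of the candidate pairs are the ones worth checking against the Arabidopsis alignment.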
Old 08-18-2011, 10:59 PM   #4
aner
Member
 
Location: italy

Join Date: Mar 2011
Posts: 11

Thank you all for your messages!

Quote:
Is it a T-DNA from Agrobacterium?
Quote:
The other problem is, do you know whether or not the insert is actually in a transcribed region?
to chadn737: the insertion is GUS and the method is a gene trap. This time we did not label a specific gene, but let GUS attach to any available gene.

Quote:
There are some tools out there specifically to look for these, but I forget the names.
Quote:
(...) just create a database from your possible insertion material & search it (...)
to krobison: your suggestion is really nice; if you remember the names of those tools, let me know ;-) Do you know how to create a database of reads? How can I call BWA/Bowtie to search for this insertion? Is it like BLAST?

Last edited by aner; 08-18-2011 at 11:03 PM.
Old 08-28-2011, 05:52 AM   #5
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

By database of possible insertion material I simply mean a FASTA format file with the possible insertions. The documentation for BWA, Bowtie & BLAST will instruct you how to index that file for searching (alas, each requires its own indexing).
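
As a rough illustration of the "FASTA file plus one index per tool" point, here is a small Python sketch. The sequence is a placeholder, not the real GUS sequence, and the file names are hypothetical. The three commands are the standard indexing entry points (`bwa index`, `bowtie-build`, and `makeblastdb` from BLAST+); they are only printed here, not executed, and each tool's documentation should be checked for the options that fit your version.

```python
import os
import tempfile


def write_fasta(path, records, width=60):
    """Write {name: sequence} to a FASTA file, wrapping sequence lines at `width`."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width].upper() + "\n")


def index_commands(fasta_path, prefix):
    """Return the indexing command for each tool; nothing is run here."""
    return {
        "bwa": ["bwa", "index", fasta_path],
        "bowtie": ["bowtie-build", fasta_path, prefix],
        "blast": ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl"],
    }


# Placeholder sequence only: replace with the real insert sequence(s).
records = {"insert_placeholder": "ACGT" * 40}
workdir = tempfile.mkdtemp()
fasta_path = os.path.join(workdir, "insert.fa")
write_fasta(fasta_path, records)

for tool, cmd in index_commands(fasta_path, os.path.join(workdir, "insert")).items():
    print(tool, "->", " ".join(cmd))
```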

BTW, a paper on finding such large insertions is

Identifying insertion mutations by whole-genome sequencing
Old 08-30-2011, 01:58 AM   #6
aner
Member
 
Location: italy

Join Date: Mar 2011
Posts: 11

Quote:
By database of possible insertion material I simply mean a FASTA format file with the possible insertions. The documentation for BWA, Bowtie & BLAST will instruct you how to index that file for searching (alas, each requires its own indexing).
BTW, a paper on finding such large insertions is
Identifying insertion mutations by whole-genome sequencing
Thank you very much. I will test this method and let you know the results.
Old 08-30-2011, 10:58 AM   #7
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

As Keith suggested, it's straightforward to make a BWA-formatted reference sequence (for the external sequence) and see which reads map to it. I hope you did paired reads and that the mate of each mapping read maps to the Arabidopsis genome, so you can ascertain where the sequence links on the genome.
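
A hedged sketch of that last step in Python: given the reads that mapped to the external sequence, look up where their mates landed in the Arabidopsis alignment and count hits per genomic window. The read names, chromosome names and positions are invented for illustration, and `insert_hits` is assumed to come from a separate alignment against the external sequence.

```python
from collections import Counter


def mate_loci(insert_hits, genome_sam_lines, window=1000):
    """Count genome-mapped mates of insert-mapped reads per (chromosome, window start).

    insert_hits: set of (pair_name, mate_number) for reads that mapped to the insert.
    """
    loci = Counter()
    for line in genome_sam_lines:
        if line.startswith("@"):               # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        chrom, pos = fields[2], int(fields[3])
        if flag & 0x4:                         # this read did not map to the genome
            continue
        mate = 1 if flag & 0x40 else 2
        other = 3 - mate                       # the mate number of its partner
        if (name, other) in insert_hits:       # partner was found in the insert
            window_start = (pos - 1) // window * window + 1
            loci[(chrom, window_start)] += 1
    return loci


# Toy inputs (hypothetical values).
insert_hits = {("pairA", 1), ("pairD", 1), ("pairE", 2)}
genome_sam = [
    "@SQ\tSN:Chr1\tLN:100000",
    "pairA\t137\tChr1\t5230\t50\t50M\t*\t0\t0\t*\t*",
    "pairD\t137\tChr1\t5410\t50\t50M\t*\t0\t0\t*\t*",
    "pairE\t73\tChr3\t88020\t50\t50M\t*\t0\t0\t*\t*",
    "pairF\t137\tChr2\t700\t50\t50M\t*\t0\t0\t*\t*",   # partner not in the insert
]

loci = mate_loci(insert_hits, genome_sam)
for (chrom, start), count in loci.most_common():
    print(chrom, start, count)
```

Windows that collect several mates can then be compared with the gene annotation to see which gene the insertion sits in or next to.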
__________________
--
bioinfosm
Old 08-31-2011, 12:17 AM   #8
aner
Member
 
Location: italy

Join Date: Mar 2011
Posts: 11

Quote:
I hope you did paired reads
Thank you bioinfosm. Yes, I have paired reads; I will see soon whether the pairs map to the Arabidopsis genome. ...I hope to be lucky...!
Old 08-31-2011, 09:24 AM   #9
dcfargo
Member
 
Location: Chapel Hill

Join Date: Aug 2008
Posts: 22

You might try building a Bowtie index that includes the external sequence as an additional 'chromosome' and then running TopHat Fusion.

http://tophat-fusion.sourceforge.net...ion-poster.pdf
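
The "additional chromosome" idea amounts to appending the external sequence to the genome FASTA before building the index. A minimal Python sketch, assuming both references are available as FASTA text and using made-up record names and sequences; building the Bowtie index and running TopHat Fusion on the combined file are not shown here.

```python
def fasta_names(fasta_text):
    """Return the record names (first word of each '>' header line)."""
    return [line[1:].split()[0]
            for line in fasta_text.splitlines()
            if line.startswith(">") and len(line) > 1]


def combine_references(genome_fasta, insert_fasta):
    """Append the insert records to the genome FASTA as extra 'chromosomes'."""
    clash = set(fasta_names(genome_fasta)) & set(fasta_names(insert_fasta))
    if clash:
        # Duplicate names would make alignments to the insert ambiguous.
        raise ValueError(f"duplicate sequence names: {sorted(clash)}")
    return genome_fasta.rstrip("\n") + "\n" + insert_fasta.rstrip("\n") + "\n"


# Toy FASTA text (hypothetical names, placeholder sequences).
genome = ">Chr1\nACGTACGT\n>Chr2\nTTGGCCAA\n"
insert = ">external_insert\nGGCCGGCC\n"

combined = combine_references(genome, insert)
print(fasta_names(combined))
```

Giving the insert its own distinct record name makes it easy to spot any read or pair that connects it to an Arabidopsis chromosome in the resulting alignments.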
Old 08-31-2011, 11:20 PM   #10
aner
Member
 
Location: italy

Join Date: Mar 2011
Posts: 11

Quote:
You might try building a Bowtie index that includes the external sequence as an additional 'chromosome' and then running TopHat Fusion.
Thank you dcfargo. I read the poster and found the related article (still a draft so far). I will also try this approach!