SEQanswers

Old 11-05-2011, 04:15 PM   #1
ynwh
Member
 
Location: MD

Join Date: Jul 2011
Posts: 16
How to assemble a gene from NGS data

I have a known gene from Arabidopsis thaliana and Illumina reads from a newly sequenced plant genome. I need to find its orthologous gene in the new genome. In particular, I want to know whether the orthologous gene can be fully assembled from the reads using the Arabidopsis gene as a reference, and how to do this.

I'm new to NGS analysis. Can anyone give me some advice? Many thanks.
Old 11-08-2011, 05:30 AM   #2
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

If I understand you correctly, you need to know whether your new sequence data contains an orthologue of your known Arabidopsis gene. If so, just do a reciprocal BLAST to get a first impression. That means BLAST your reads against the sequence of the Arabidopsis gene and vice versa. If you get (significant) hits in both directions, people tend to say they are orthologues.
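
To make the "hits in both directions" check concrete, here is a minimal sketch rather than a definitive implementation. It assumes BLAST has already been run twice with tabular output (-outfmt 6, default 12 columns) and that the sequences compared are contigs or genes rather than raw reads, because a best-hit comparison is hard to interpret for single short reads. The function names, the sequence IDs and the 1e-5 E-value cutoff are illustrative assumptions.

```python
def best_hits(tabular_text, max_evalue=1e-5):
    """Map each query ID to its best subject ID by bit score.

    Expects BLAST tabular lines with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed lines
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # not significant under the assumed cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_text, reverse_text, max_evalue=1e-5):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_text, max_evalue)
    reverse = best_hits(reverse_text, max_evalue)
    return sorted(
        (query, subject)
        for query, subject in forward.items()
        if reverse.get(subject) == query
    )
```

Here forward_text would hold the hits of the Arabidopsis gene against the new sequences, and reverse_text the hits of those sequences against Arabidopsis. A pair that survives is only a candidate orthologue, in line with the "people tend to say" caveat above.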

Hope that helps...
Old 11-08-2011, 05:33 AM   #3
ynwh
Member
 
Location: MD

Join Date: Jul 2011
Posts: 16

Many thanks, sphil. In fact, I want to get the structure of the orthologous gene. That is, I want to fully reconstruct the gene from the reads.
Old 11-08-2011, 06:48 AM   #4
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

What do you mean by "the structure"? If you really 'only' want to know whether your reads can cover the Arabidopsis gene, you could also BLAST just 'one way'. The reciprocal search is only essential for orthologue detection.
Old 11-08-2011, 07:18 AM   #5
ynwh
Member
 
Location: MD

Join Date: Jul 2011
Posts: 16

"The structure" means that I want to know the sequence of the gene in the new genome, from ATG to TGA, including all exons and introns.
Old 11-08-2011, 07:42 AM   #6
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

Hey,

Now I've got it.

The problem is that BLAST will show you which sequences in the new genome are similar to the Arabidopsis one, BUT BLAST isn't able to (re)construct intron-exon structures. For that you can use BLAT or TopHat. However, these are NGS tools designed for hundreds of thousands of reads. It will work, but it looks like overkill to me. Maybe someone else here knows a different tool. Nevertheless, BLAT or TopHat will do the job!


best

Philip
Old 11-08-2011, 07:51 AM   #7
ynwh
Member
 
Location: MD

Join Date: Jul 2011
Posts: 16

Many thanks Philip.
Old 11-08-2011, 08:30 AM   #8
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

No problem, you are welcome! Feel free to ask again if there is need!
Old 11-08-2011, 08:53 AM   #9
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

I would also suggest TopHat or BLAT, using just the sequence of the Arabidopsis gene as your reference, which should speed it up.
Old 11-08-2011, 10:20 AM   #10
ynwh
Member
 
Location: MD

Join Date: Jul 2011
Posts: 16

One more question. I'm interested in TEs in genes. The Arabidopsis gene does not contain a TE, but its orthologous gene in the newly sequenced genome may have a lineage-specific insertion. In that case, will using the Arabidopsis gene as the reference cause the TE to be missed? Does anyone have an idea how to solve this?
Old 11-08-2011, 11:54 PM   #11
arvid
Senior Member
 
Location: Berlin

Join Date: Jul 2011
Posts: 156

Did you try to de novo assemble the reads from your new plant? Then you could try to align the Arabidopsis gene to your contigs (with BLAST) and see whether you get a simple alignment or inserted stretches.
It isn't clear from your posts whether you sequenced genomic or cDNA fragments. Which is it?
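
As a rough illustration of telling a "simple alignment" from "inserted stretches", the sketch below takes the coordinates of the BLAST hits (HSPs) between the Arabidopsis gene and one contig and reports contig intervals that sit between two hits that are adjacent on the gene. It is a sketch under stated assumptions: the Hsp type, the function name and both thresholds are hypothetical, coordinates are 1-based, and all hits are assumed to be on the plus strand of the contig.

```python
from typing import List, NamedTuple, Tuple


class Hsp(NamedTuple):
    q_start: int  # start on the Arabidopsis gene (query)
    q_end: int    # end on the Arabidopsis gene
    s_start: int  # start on the contig (subject), plus strand assumed
    s_end: int    # end on the contig


def inserted_stretches(
    hsps: List[Hsp], min_insert: int = 50, max_query_gap: int = 20
) -> List[Tuple[int, int]]:
    """Return contig intervals lying between hits that are adjacent on the gene."""
    ordered = sorted(hsps, key=lambda h: h.q_start)
    found = []
    for left, right in zip(ordered, ordered[1:]):
        query_gap = right.q_start - left.q_end - 1
        contig_gap = right.s_start - left.s_end - 1
        # The gene is (nearly) continuous here, but the contig carries extra sequence.
        if abs(query_gap) <= max_query_gap and contig_gap - query_gap >= min_insert:
            found.append((left.s_end + 1, right.s_start - 1))
    return found
```

An interval reported this way is only a candidate insertion. A longer intron in the new species would look the same, so the extra sequence still has to be checked against a TE library or by eye.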

Last edited by arvid; 11-09-2011 at 06:13 AM. Reason: typo
Old 11-09-2011, 04:51 AM   #12
ynwh
Member
 
Location: MD

Join Date: Jul 2011
Posts: 16

What I have are genomic DNA reads, and the coverage is only ~10X. This low coverage may not be enough for de novo assembly.

I'm trying two strategies:
(1) first align the reads to the Arabidopsis gene, then assemble them (this may miss the TE)
(2) first assemble the reads, then compare the contigs to the Arabidopsis gene (this requires high-coverage reads)
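
For strategy (1), one way to reduce the risk of losing the TE is to pull out not only the reads that hit the Arabidopsis gene but also their mates, since a mate can land inside an insertion that the reference lacks. The following is a minimal sketch under assumptions that are mine, not from the data described above: the reads are paired-end, mates are named with a trailing /1 and /2, the hits come from BLAST tabular output (12 default columns, read ID in the first column), and the function names and E-value cutoff are made up for illustration.

```python
def hit_read_ids(tabular_text, max_evalue=1e-5):
    """Collect IDs of reads (first column) with a significant hit to the gene."""
    ids = set()
    for line in tabular_text.splitlines():
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue
        if float(fields[10]) <= max_evalue:  # column 11 is the E-value
            ids.add(fields[0])
    return ids


def mate_base(read_id):
    """Strip a trailing /1 or /2 so both mates share one key (assumed naming)."""
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def recruit_reads(fastq_lines, wanted_ids):
    """Return FASTQ lines for reads whose own ID or mate ID is in wanted_ids."""
    wanted = {mate_base(read_id) for read_id in wanted_ids}
    kept, record = [], []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:  # header, sequence, plus line, qualities
            name = record[0][1:].split()[0]
            if mate_base(name) in wanted:
                kept.extend(record)
            record = []
    return kept
```

The recruited reads can then be handed to whichever assembler is being used. Repeating the search with the resulting contigs as the new reference would extend further into sequence that Arabidopsis does not have.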
Old 11-09-2011, 06:19 AM   #13
arvid
Senior Member
 
Location: Berlin

Join Date: Jul 2011
Posts: 156

Yes, with 10X coverage it can be difficult to get big enough contigs for strategy 2.

If you use an aligner that allows indels, you should be able to tell whether the structure changed slightly. Otherwise, the shape of the coverage plot (sharp drop-offs) might help you determine the positions where there are bigger differences.
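
The "sharp drop-offs" idea can also be checked numerically instead of by eye. The sketch below assumes you already have one depth value per reference position (for example exported from your aligner's per-base depth report) and flags positions where the mean depth in the windows to the left and right differs strongly. The function name, the window size and the min_change threshold are illustrative assumptions and would need tuning for ~10X data.

```python
def coverage_dropoffs(depth, window=5, min_change=5.0):
    """Return 0-based positions where coverage changes sharply.

    depth: list of per-base read depths along the reference gene.
    A position is a candidate when the mean depth of the `window` bases
    before it and the `window` bases from it onward differ by at least
    `min_change`. Adjacent candidates are merged and the position with
    the largest change in each run is reported.
    """
    breakpoints = []
    run = []  # (change, position) pairs for the current run of candidates
    for i in range(window, len(depth) - window + 1):
        left = sum(depth[i - window:i]) / window
        right = sum(depth[i:i + window]) / window
        change = abs(left - right)
        if change >= min_change:
            run.append((change, i))
        elif run:
            breakpoints.append(max(run)[1])
            run = []
    if run:
        breakpoints.append(max(run)[1])
    return breakpoints
```

Both edges of a gap in coverage are reported, so a region with no aligned reads yields two positions. Those are the places to look for reads that align only partially, and they are reasonable anchors for the primer design mentioned in this post.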

Anyway, if you're mainly interested in this one gene, the alignment strategy alone should give you enough sequence to design primers, clone the genomic copy of the gene and sequence it traditionally...
Old 01-03-2012, 07:33 AM   #14
bbsinfo
Member
 
Location: Mars

Join Date: Apr 2011
Posts: 19

Hello guys, I think this thread answered some of my questions. I am also relatively new to NGS. I have used it to sequence the organelle genomes of some algae. From reading the thread, I gather that one can use BLAT or TopHat to find introns in a gene.
So, for example, if I have an assembled chloroplast genome, can I load the whole genome sequence into BLAT or TopHat and ask the program to find introns?

I am also very confused about how to describe the secondary structure of the tRNAs in these genomes. Do you use MFOLD, or is there better software?

Has anyone here ever annotated rRNAs on an organelle genome? I am not sure how to find the exact start and end points of rRNAs. I know that protein-coding genes normally start with an ATG codon and end with a stop codon, but I am not sure whether there are any such codons to look for in rRNAs.


Thanks. Please help.