
  • How to assemble gene from NGS data

    I have a known gene from Arabidopsis thaliana and Illumina reads from a newly sequenced plant genome. I need to find its orthologous gene in the new genome. In particular, I want to know whether the orthologous gene can be fully assembled from the reads using the Arabidopsis gene as a reference, and if so, how to do this.

    I'm new to NGS analysis. Can anyone give me some advice? Many thanks.

  • #2
    If I understand you correctly, you need to know whether your new sequence data contain an orthologue of your known Arabidopsis gene. If so, just do a reciprocal BLAST to get a first impression. That means BLAST your reads against the sequence of the Arabidopsis gene and vice versa. If you get (significant) hits in both directions, people tend to say they are orthologues.

    Hope that helps...
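
A minimal sketch of the reciprocal check described above, assuming both searches were saved in BLAST tabular format (`-outfmt 6`), where columns 1 and 2 are the query and subject IDs and column 12 is the bit score. The function names and the example IDs are hypothetical, and "best hit" here simply means the highest bit score per query.

```python
def best_hits(tabular_lines):
    """Map each query ID to its highest-bit-score (subject ID, bit score)."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(forward_lines, reverse_lines):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(forward_lines)
    reverse = best_hits(reverse_lines)
    pairs = []
    for query, (subject, _score) in forward.items():
        if subject in reverse and reverse[subject][0] == query:
            pairs.append((query, subject))
    return sorted(pairs)


# Hypothetical example rows (IDs and numbers are made up for illustration).
forward = [
    "ath_gene\tcontig_7\t85.0\t900\t100\t5\t1\t900\t200\t1100\t1e-150\t800",
    "ath_gene\tcontig_3\t70.0\t300\t80\t4\t1\t300\t50\t350\t1e-20\t120",
]
reverse = [
    "contig_7\tath_gene\t85.0\t900\t100\t5\t200\t1100\t1\t900\t1e-150\t800",
    "contig_3\tother_gene\t72.0\t300\t75\t4\t50\t350\t1\t300\t1e-25\t140",
]
print(reciprocal_best_hits(forward, reverse))
```

For the reverse direction to be meaningful, the second search would normally be run against the full Arabidopsis gene set rather than the single gene, so that a better-matching paralogue can win.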



    • #3
      Many thanks sphil. In fact, I want to get the structure of the orthologous gene. That is, fully reconstruct the gene from reads.



      • #4
        What do you mean by "the structure"? If you really 'only' want to know whether your reads can cover the Arabidopsis gene, you could also BLAST just 'one way'. The reciprocal search is only essential for orthologue detection.



        • #5
          "The structure" means that I want to know the sequence of the gene in the new genome, from ATG to TGA, including all exons and introns.



          • #6
            Hey,

            Now I get it.


            The problem is that BLAST will show you which sequences in the new genome are similar to the Arabidopsis one, BUT BLAST isn't able to (re)construct intron-exon structures. For that you can use BLAT or TopHat. However, these are NGS tools designed for hundreds of thousands of reads. They will work, but it looks to me like some kind of overkill. Maybe someone else here knows a different tool. Nevertheless, BLAT or TopHat will do the job!


            best

            Philip



            • #7
              Many thanks Philip.



              • #8
                No problem, you are welcome! Feel free to ask again if there is need!



                • #9
                  I would also suggest TopHat or BLAT, using just the sequence of the Arabidopsis gene as your reference, which should speed things up.



                  • #10
                    One more question. I'm interested in TEs in genes. The Arabidopsis gene does not contain a TE, but its orthologous gene in the newly sequenced genome may have some lineage-specific insertions. In that case, will using the Arabidopsis gene as the reference cause the TE to be missed? Does anyone have an idea how to solve this?



                    • #11
                      Did you try to de novo assemble the reads from your new plant? Then you could try to align the Arabidopsis gene to your contigs (with BLAST) and see whether you get a simple alignment or inserted stretches.
                      It isn't clear from your posts whether you sequenced genomic or cDNA fragments. Which is the case?
                      Last edited by arvid; 11-09-2011, 07:13 AM. Reason: typo
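
One way to look for the "inserted stretches" mentioned above, once the gene has been aligned to a contig, is to compare neighbouring alignment blocks (HSPs): if two blocks are adjacent on the gene but far apart on the contig, the contig carries extra sequence between them. The sketch below is only an illustration. The `Hsp` tuple, the function name, and the `min_extra` threshold are hypothetical, and it assumes all HSPs come from the plus strand of a single contig with 1-based inclusive coordinates.

```python
from typing import List, NamedTuple, Tuple


class Hsp(NamedTuple):
    """One alignment block: gene (query) and contig (subject) coordinates."""
    qstart: int
    qend: int
    sstart: int
    send: int


def candidate_insertions(hsps: List[Hsp], min_extra: int = 50) -> List[Tuple[int, int, int]]:
    """Return (contig_start, contig_end, extra_bp) for each candidate insertion.

    Assumption: plus-strand HSPs on one contig. extra_bp is how much longer
    the gap is on the contig than on the gene between two neighbouring HSPs.
    """
    ordered = sorted(hsps, key=lambda h: h.qstart)
    found = []
    for left, right in zip(ordered, ordered[1:]):
        query_gap = right.qstart - left.qend - 1
        contig_gap = right.sstart - left.send - 1
        extra = contig_gap - query_gap
        if extra >= min_extra:
            found.append((left.send + 1, right.sstart - 1, extra))
    return found


# Hypothetical coordinates: the two blocks touch on the gene but are
# separated by 5000 bp on the contig.
hsps = [Hsp(1, 500, 1000, 1499), Hsp(501, 1200, 6500, 7199)]
print(candidate_insertions(hsps))
```

A reported interval is only a candidate: the extra sequence could be a TE, but it could equally be some other lineage-specific insertion, so the stretch itself still has to be inspected.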



                      • #12
                        What I have are genomic DNA reads, and the coverage is only ~10X. This low-coverage data may not be enough for de novo assembly.

                        I'm trying two strategies:
                        (1) first align the reads to the Arabidopsis gene, then assemble them (this may miss the TE)
                        (2) first assemble the reads, then compare the contigs to the Arabidopsis gene (this requires high read coverage)



                        • #13
                          Yes, with 10X coverage it can be difficult to get big enough contigs for strategy 2.

                          If you use an aligner that allows indels, you should be able to tell whether the structure changed slightly. Otherwise, the shape of the coverage plot (sharp drop-offs) might help you to determine the positions where you have bigger differences.

                          Anyway, if you're mainly interested in this one gene, the alignment strategy alone should give you enough data to design primers for cloning the genomic sequence of that gene and sequencing it traditionally...
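
The coverage-plot idea above can be sketched in a few lines. This is a rough illustration, not a tested pipeline: it assumes you already have the (start, end) reference coordinates of each aligned read (1-based, inclusive) and that the aligner clips reads at a breakpoint, so depth falls abruptly between neighbouring bases. The function names and the `min_depth` and `max_ratio` thresholds are hypothetical.

```python
def per_base_depth(ref_length, alignments):
    """Return a list of read depths for reference positions 1..ref_length."""
    diff = [0] * (ref_length + 2)  # difference array, index = 1-based position
    for start, end in alignments:
        start = max(1, start)
        end = min(ref_length, end)
        if start > end:
            continue  # alignment lies entirely outside the reference
        diff[start] += 1
        diff[end + 1] -= 1
    depth = []
    running = 0
    for pos in range(1, ref_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def sharp_dropoffs(depth, min_depth=5, max_ratio=0.2):
    """Return 1-based positions of the first base after each sharp drop.

    A drop is flagged when depth is at least min_depth at one base and
    at most max_ratio times that value at the next base.
    """
    positions = []
    for i in range(1, len(depth)):
        before, after = depth[i - 1], depth[i]
        if before >= min_depth and after <= before * max_ratio:
            positions.append(i + 1)
    return positions


# Hypothetical example: six reads cover bases 1-50, six cover 60-100,
# and nothing aligns to bases 51-59.
alignments = [(1, 50)] * 6 + [(60, 100)] * 6
depth = per_base_depth(100, alignments)
print(sharp_dropoffs(depth))
```

With an end-to-end aligner the depth tends to taper off gradually towards a breakpoint rather than fall in one step, so comparing window averages instead of adjacent bases may be a better fit there.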



                          • #14
                            Hello everyone, I think this thread answered some of my questions. I am also relatively new to NGS. I have used this technique to sequence the organelle genomes of some algae. From reading the thread, I gather that one can use BLAT or TopHat to find introns in a gene.
                            So, for example, if I have a chloroplast genome assembled, can I load the whole genome sequence into BLAT or TopHat and ask the program to find introns?

                            I am also very confused about how to describe the secondary structure of tRNAs in these genomes. Do you use MFOLD, or is there better software?

                            I am not sure whether any of you have ever annotated rRNAs on an organelle genome. How exactly do I find the start and end points of rRNAs? I know that protein-coding genes normally start with an ATG codon and end with a stop codon, but I am not sure whether there are any comparable signals to look for in rRNAs.

                            Thanks. Please help.
