  • De-Novo Transcript assembly for RNA-Seq

    Hi all,

    I have RNA-Seq data and have run TopHat and Cufflinks on it. My ultimate goal is to make a transcript FASTA file from the Cufflinks assembly. I have the GTF output now and am stuck at that point. How should I continue from here?

    Thank you in advance.
    Deepak

  • #2
    You should check out bedtools to go from a set of coordinates, like those in a GTF file, to a set of FASTA sequences.


    • #3
      Hi,
      I did not understand what you mean by bedtools. Can you please name some of them?
      Thank you.
      Deepak


      • #4
        Sure. Bedtools is a program for comparing and manipulating genomic coordinates in various ways.



        It works with GTF files as well, and the fastaFromBed command takes a set of coordinates and extracts the corresponding sequences from a FASTA file. In your case, you'd use the GTF file from Cufflinks and the genome FASTA file you used for mapping with TopHat.

        Although, if you are really trying to do a de novo assembly, a program like Trinity or Oases might be better depending on your organism, genome size and complexity, read depth, etc.
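
To make the coordinate-to-sequence step concrete, here is a minimal Python sketch of the idea, not how bedtools itself is implemented. It assumes a standard 9-column, tab-separated GTF with a transcript_id attribute, and a genome small enough to hold in a dict; the function names are made up for illustration. It joins the exons of each transcript and reverse-complements minus-strand transcripts.

```python
import re
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_transcripts(gtf_lines, genome):
    """Return {transcript_id: spliced sequence} built from GTF exon records.

    genome is a dict mapping sequence names to sequence strings.
    """
    exons = defaultdict(list)
    strands = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF coordinates are 1-based and inclusive on both ends.
        exons[tid].append((fields[0], int(fields[3]), int(fields[4])))
        strands[tid] = fields[6]

    sequences = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        seq = "".join(genome[name][start - 1:end] for name, start, end in parts)
        if strands[tid] == "-":
            seq = reverse_complement(seq)
        sequences[tid] = seq
    return sequences
```

The `start - 1:end` slice is the important detail: it converts 1-based inclusive GTF coordinates into Python's 0-based half-open slices.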


        • #5
          Thanks a lot.
          I will go through Oases and the link you mentioned,
          and post again if I have any further doubts.


          • #6
            Hi, I tried using bedtools fastaFromBed to make transcripts from the GTF file, but when I give it the genome FASTA file, it gives an error:
            index file supercontigs.fa.fai not found, generating...
            ERROR: mismatched line lengths at line 11214 within sequence Contig200
            File not suitable for fasta index generation.
            Please help with this.
            Thank you.
            Deepak


            • #7
              You should post this on the Bedtools discussion group here:


              • #8
                The Cufflinks package has a very good binary called "gffread" to extract transcript sequences. The most common command would be
                "gffread YOURFILE.gtf -g GENOME.fa -s CHROM.size -w YOURFILE.fa"

                Here the CHROM.size file simply lists each chromosome name and its size in bp (tab-separated), e.g.
                chr1 2345671
                chr2 6765516

                YOURFILE.fa is your output file containing the sequences of the transcripts. Take a look at the gffread options for further help ("gffread --help").
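
If you do not already have a CHROM.size file, a minimal Python sketch like the one below can build one from the genome FASTA. It assumes the sequence name is the first word of each header line; the function names and file paths are placeholders, not part of gffread.

```python
def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Assumption: the sequence name is the first word of the header.
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def write_chrom_sizes(fasta_path, out_path):
    """Write a tab-separated 'name<TAB>length' file for a FASTA file."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for name, length in fasta_lengths(fasta):
            out.write(f"{name}\t{length}\n")
```

For example, `write_chrom_sizes("GENOME.fa", "CHROM.size")` would produce the file passed to `-s` in the command above.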


                • #9
                  Hi swaraj,
                  Thank you for the info.
                  I did as directed, but now I get this error:

                  No fasta index found for smed_contigs.fa. Rebuilding, please wait..
                  Error: sequence lines in a FASTA record must have the same length!
                  Can anyone please help me address this?
                  Thank you in advance.


                  • #10
                    It is a problem with the formatting of your FASTA file. You should stick to using the genome FASTA file downloaded from UCSC for your organism. If the genome is not available there, reformat your FASTA file so that every line within a sequence has the same number of bases (only the last line of a sequence may be shorter). The problem arises when you have a situation like:

                    >SeqA
                    ATTTCAGGGG
                    ATTCGGCGGGATT
                    AGGGCTCTCT
                    >SeqB
                    ATTTCGGAATT
                    ATTCCGGATAG
                    ATTGCTCC

                    Try using BioPerl's SeqIO to reformat your file.
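
If BioPerl is not at hand, the same reformatting can be sketched in plain Python. This is an illustrative alternative, not the BioPerl approach itself: it joins each record's sequence lines and rewraps them at one fixed width, so only the last line of a record can be shorter. The function names and the default width of 60 are arbitrary choices.

```python
def wrap(seq, width):
    """Yield successive fixed-width pieces of seq."""
    for i in range(0, len(seq), width):
        yield seq[i:i + width]


def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with every sequence rewrapped at a fixed width."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header
                yield from wrap("".join(chunks), width)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header
        yield from wrap("".join(chunks), width)


def rewrap_file(in_path, out_path, width=60):
    """Rewrap in_path into out_path (paths are placeholders)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in rewrap_fasta(src, width):
            dst.write(line + "\n")
```

After rewrapping, delete any stale .fai index next to the old file so the tool rebuilds it from the corrected FASTA.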


                    • #11
                      I have transcripts from Trinity for human data, and I also have transcripts from TopHat and Cufflinks for the same data. I need to find the novel transcripts among these. How should I proceed? Can someone help me?


                      • #12
                        Try the cuffcompare utility from the Cufflinks package to compare your transcript GTF file against a reference GTF of known transcripts. cuffcompare produces a tracking file that can be parsed to identify the novel transcripts. Look into the cuffcompare documentation for more details.
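
Parsing the tracking file could look roughly like the Python sketch below. It assumes the tab-separated layout described in the Cufflinks manual (transfrag ID, locus ID, reference ID, class code, then one column per input file) and treats class code "u" (unknown, intergenic) as novel. Which codes count as novel is a judgment call: "j" (potentially novel isoform) is another common candidate. The function name is made up for illustration.

```python
def novel_transfrags(tracking_lines, novel_codes=("u",)):
    """Return (transfrag_id, locus_id, class_code) for rows with a novel class code."""
    hits = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        transfrag_id, locus_id, _reference, class_code = fields[:4]
        if class_code in novel_codes:
            hits.append((transfrag_id, locus_id, class_code))
    return hits
```

For example, calling `novel_transfrags(open("cuffcmp.tracking"), novel_codes=("u", "j"))` would also keep potential new isoforms of known genes; the file name here is a placeholder for whatever output prefix you gave cuffcompare.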
