  • oxydeepu
    Member
    • Jul 2011
    • 41

    De novo transcript assembly for RNA-Seq

    Hi all,

    I have RNA-Seq data and have run TopHat and Cufflinks on it. My ultimate goal is to make a transcript FASTA file from the Cufflinks assembly. I now have the GTF output but am stuck at that point. How should I continue from here?

    Thank you in advance.
    Deepak
  • pbluescript
    Senior Member
    • Nov 2009
    • 224

    #2
    You should check out bedtools for going from a set of coordinates, such as those in a GTF file, to a set of FASTA sequences.


    • oxydeepu
      Member
      • Jul 2011
      • 41

      #3
      Hi,
      I don't understand what you mean by bedtools. Could you please name some of them?
      Thank you,
      Deepak


      • pbluescript
        Senior Member
        • Nov 2009
        • 224

        #4
        Sure. Bedtools is a program for comparing and manipulating genomic coordinates in various ways.



        It works with GTF files as well, and the fastaFromBed command takes a set of coordinates and extracts the corresponding sequences from a FASTA file. In your case, you'd use the GTF file from Cufflinks and the genome FASTA file you used for mapping with TopHat.

        That said, if you really want a de novo assembly, a program like Trinity or Oases might be better, depending on your organism, genome size and complexity, read depth, etc.
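One caveat on the fastaFromBed route: by default it extracts each interval on its own, so on a GTF it returns one sequence per exon line rather than one per transcript. The Python sketch below (a hypothetical illustration, not bedtools code) shows the extra splicing step needed for a transcript FASTA: exon lines are grouped by their transcript_id attribute, sorted by position, concatenated, and reverse-complemented on the minus strand. It assumes 1-based inclusive GTF coordinates, exon features carrying a transcript_id attribute (as in Cufflinks output), and a genome already loaded into a dict of chromosome name to sequence.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Complement table for A/C/G/T/N in both cases; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def transcripts_from_gtf(gtf_lines: Iterable[str], genome: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: splice exon sequences into one sequence per transcript_id."""
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    where = {}                 # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip "transcript" lines and malformed rows
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        where[tid] = (fields[0], fields[6])

    spliced = {}
    for tid, blocks in exons.items():
        chrom, strand = where[tid]
        # GTF is 1-based and inclusive, so exon [s, e] is the Python slice [s-1:e].
        seq = "".join(genome[chrom][s - 1:e] for s, e in sorted(blocks))
        spliced[tid] = reverse_complement(seq) if strand == "-" else seq
    return spliced


# Toy data (made up for illustration).
genome = {"chr1": "AAAACCCCGGGGTTTT"}
gtf = [
    'chr1\tCufflinks\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tCufflinks\texon\t9\t12\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tCufflinks\texon\t5\t8\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]
print(transcripts_from_gtf(gtf, genome))  # {'t1': 'AAAAGGGG', 't2': 'GGGG'}
```

Holding the whole genome in memory is the simple choice here; for a large genome an indexed reader would be more practical.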


        • oxydeepu
          Member
          • Jul 2011
          • 41

          #5
          Thanks a lot!
          I will go through Oases and the link you mentioned,
          and will post again if I have any further questions.


          • oxydeepu
            Member
            • Jul 2011
            • 41

            #6
            Hi, I tried using bedtools fastaFromBed to make transcripts from the GTF file, but when I give it the genome FASTA file, it gives this error:
            index file supercontigs.fa.fai not found, generating...
            ERROR: mismatched line lengths at line 11214 within sequence Contig200
            File not suitable for fasta index generation.
            Please help with this.
            Thank you,
            Deepak
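Errors like this come from building the FASTA index (.fai). As far as I know, the indexers require that, within each record, every sequence line except the last has the same length and the last line is no longer than the others. The Python sketch below is a hypothetical helper (not part of bedtools) that applies that rule and reports the record name and 1-based file line number of the first offending line. Trailing blank lines are ignored, while a blank line inside a record is flagged; how strictly a given indexer treats blank lines is an assumption.

```python
from typing import Iterable, List, Optional, Tuple


def _check_record(name: str, rows: List[Tuple[int, int]]) -> Optional[Tuple[str, int]]:
    """rows holds (file line number, line length) for one record's sequence lines."""
    while rows and rows[-1][1] == 0:
        rows = rows[:-1]  # ignore trailing blank lines
    if len(rows) < 2:
        return None
    width = rows[0][1]
    for lineno, length in rows[1:-1]:
        if length != width:  # every non-final line must match the first line
            return (name, lineno)
    lineno, length = rows[-1]
    if length > width:  # the final line may be shorter, but never longer
        return (name, lineno)
    return None


def first_bad_line(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (sequence name, file line number) of the first line breaking the
    fixed-width rule, or None if every record passes."""
    name, rows = "", []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            bad = _check_record(name, rows)
            if bad:
                return bad
            header = line[1:].split()
            name, rows = (header[0] if header else ""), []
        else:
            rows.append((lineno, len(line)))  # blank lines count as length 0
    return _check_record(name, rows)


# Toy record (made up): line 3 is longer than line 2.
print(first_bad_line([">Contig1", "ACGTACGT", "ACGTACGTAC", "ACG"]))  # ('Contig1', 3)
```

To check a real file, pass an open file handle, e.g. `first_bad_line(open("supercontigs.fa"))`.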


            • pbluescript
              Senior Member
              • Nov 2009
              • 224

              #7
              You should post this on the Bedtools discussion group here:


              • swaraj
                Member
                • Feb 2012
                • 50

                #8
                The Cufflinks package has a very good binary called "gffread" for extracting transcript sequences. The most common command would be
                "gffread YOURFILE.gtf -g GENOME.fa -s CHROM.size -w YOURFILE.fa"

                Here the CHROM.size file simply lists each chromosome name and its size in bp, tab-separated, e.g.
                chr1 2345671
                chr2 6765516

                YOURFILE.fa is the output file containing the transcript sequences. Have a look at the gffread options ("gffread --help") for further help.
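If you don't already have a CHROM.size file, it can be computed from the genome FASTA itself. The sketch below is a hypothetical Python helper that sums each record's sequence line lengths and formats the result as name<TAB>length lines; it assumes the sequence name is the first whitespace-delimited word of the header. (The first two columns of a samtools .fai index hold the same information, but building that index is exactly what fails on a file with uneven line lengths.)

```python
from typing import Dict, Iterable


def chrom_sizes(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word of the header) to its length in bp."""
    sizes: Dict[str, int] = {}
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)  # line wrapping does not matter here
    return sizes


def to_size_file(sizes: Dict[str, int]) -> str:
    """Format as tab-separated 'name<TAB>length' lines."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes.items())


# Toy input (made up); prints 'chr1<TAB>6' and 'chr2<TAB>8' on separate lines.
print(to_size_file(chrom_sizes([">chr1 assembled", "ACGT", "AC", ">chr2", "ACGTACGT"])), end="")
```

For a real genome: `open("CHROM.size", "w").write(to_size_file(chrom_sizes(open("GENOME.fa"))))`, using the placeholder names from the command above.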


                • oxydeepu
                  Member
                  • Jul 2011
                  • 41

                  #9
                  Hi swaraj,
                  Thank you for the info.
                  I did as you suggested, but now I get this error:

                  No fasta index found for smed_contigs.fa. Rebuilding, please wait..
                  Error: sequence lines in a FASTA record must have the same length!
                  Can anyone please help me address this?
                  Thank you in advance.


                  • swaraj
                    Member
                    • Feb 2012
                    • 50

                    #10
                    It is a problem with the formatting of your FASTA file. You should stick to the genome FASTA file downloaded from UCSC for your organism. If the genome is not available there, reformat your FASTA file so that every line within each sequence has the same number of bases. The problem arises when you have a situation like this:

                    >SeqA
                    ATTTCAGGGG
                    ATTCGGCGGGATT
                    AGGGCTCTCT
                    >SeqB
                    ATTTCGGAATT
                    ATTCCGGATAG
                    ATTGCTCC

                    Try using BioPerl's SeqIO to reformat your file.
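For anyone without BioPerl, here is the same reformatting idea as a minimal Python sketch (a swapped-in substitute, not SeqIO itself): concatenate each record's sequence lines and re-wrap them at a fixed width, so every line except the last in a record has the same length. The default width of 60 is an arbitrary choice, blank lines are dropped, and any sequence lines appearing before the first header are discarded.

```python
from typing import Iterable, List


def _emit(header: str, seq: str, width: int, out: List[str]) -> None:
    out.append(header)
    out.extend(seq[i:i + width] for i in range(0, len(seq), width))


def rewrap_fasta(lines: Iterable[str], width: int = 60) -> List[str]:
    """Re-wrap every record so all sequence lines except the last are `width` long."""
    out: List[str] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            if header is not None:
                _emit(header, "".join(chunks), width, out)
            header, chunks = line, []  # also discards lines seen before any header
        else:
            chunks.append(line)
    if header is not None:
        _emit(header, "".join(chunks), width, out)
    return out


# The ragged example from the post, re-wrapped at 10 bases per line.
ragged = [">SeqA", "ATTTCAGGGG", "ATTCGGCGGGATT", "AGGGCTCTCT",
          ">SeqB", "ATTTCGGAATT", "ATTCCGGATAG", "ATTGCTCC"]
for line in rewrap_fasta(ragged, width=10):
    print(line)
```

To fix a whole file, read it with `open(...)`, pass the handle to `rewrap_fasta`, and write the returned lines out joined with newlines.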


                    • Jayu
                      Member
                      • Mar 2011
                      • 14

                      #11
                      I have transcripts from Trinity for human data, and I also have transcripts from TopHat and Cufflinks for the same data. I need to find the novel transcripts among these. How can I proceed? Can someone help me?


                      • swaraj
                        Member
                        • Feb 2012
                        • 50

                        #12
                        Try using the cuffcompare utility from the Cufflinks package to compare your transcripts GTF file against a GTF of known protein-coding transcripts. cuffcompare produces a tracking file that can be parsed to identify the novel transcripts. Look into the cuffcompare documentation for more details.
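A minimal Python sketch of that parsing step, assuming the usual .tracking layout: tab-separated columns of transfrag ID, locus ID, reference gene|transcript (or "-"), class code, then one column per input GTF. Which class codes count as "novel" is your call; the hypothetical default here keeps "u" (unknown, intergenic) and "j" (potentially novel isoform sharing at least one splice junction with a reference transcript), which you should check against the cuffcompare documentation.

```python
from typing import Iterable, List, Tuple


def novel_transfrags(tracking_lines: Iterable[str],
                     novel_codes: Tuple[str, ...] = ("u", "j")) -> List[Tuple[str, str, str]]:
    """Return (transfrag ID, locus ID, class code) for rows whose class code is in novel_codes."""
    hits = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        transfrag_id, locus_id, _ref_match, class_code = fields[:4]
        if class_code in novel_codes:
            hits.append((transfrag_id, locus_id, class_code))
    return hits


# Made-up rows in the assumed .tracking layout (the last column is abbreviated).
rows = [
    "TCONS_00000001\tXLOC_000001\tGENE1|NM_0001\t=\tq1:...",
    "TCONS_00000002\tXLOC_000002\t-\tu\tq1:...",
    "TCONS_00000003\tXLOC_000001\tGENE1|NM_0001\tj\tq1:...",
]
for hit in novel_transfrags(rows):
    print(hit)
```

The matching IDs can then be used to pull the corresponding transcripts out of your GTF or FASTA.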
