  • Unable to find ORF for fasta file

    Hi,

    I have a list of IDs for which I want to extract the corresponding nucleotide sequences from a transcriptome.fasta file. I loaded both my IDs and the transcriptome into R and extracted the matching nucleotide sequences. I then prepended the FASTA header symbol (>) to each ID and exported the result to a file.

    When I uploaded the file to predict ORFs, nothing was predicted. I used FrameDP and getORF from EMBOSS. FrameDP prints a pepdb.fa file without any translated sequences, while getORF from EMBOSS throws an error saying that all the sequences have zero length. I don't know where the problem lies.

    Could anyone help me figure this out?

    Here is the R code I used to extract the sequences and write them to a file.

    Code:
    library(seqinr)
    ids=as.character(read.delim("path/to/ids/file.txt"))
    dd=read.fasta("path/to/transcriptome.fasta",seqtype="DNA",as.string=T)
    fasta_seq=unlist(dd[names(dd) %in% ids])
    names(fasta_seq)=paste(">",names(fasta_seq),sep="")
    write.table(fasta_seq,file=paste(dir,"/",name,sep=""))
    Here is the output generated by R. Initially it contained quotation marks (""), which I later removed by replacing them.

    "fasta_seq"
    ">dd_smedV4_1188_0_1" "atttgttccattcataaataaaagtagacggctgaaacagtatataaagctataaaaaattcaaacgtatcactgaaataaaatgatatcatgcagattttgttttcaagtaatctttggattccttttagtattgttccactcagatctagtaatctcgagatattttttgcctccagcagactggacaaattccaatgtttttaacaaaagagacaaacctgcaaacggagtgtcgaatgaattgtcgagagaggtgttcaattgtttacagttttgtgccgaatgtagctatgcgtatggtccgtattttaatgtcttcaaatgcggtagagcgtgtagcagcggtgtcatcaacaacaacaaatccaaggagtgtaagtcaaacataatttaagagagctcgtcgttggagcgagatattttgaggtgtccgcttttcgtgaataaattt"
    ">dd_smedV4_120_0_1" "tgattgaatggctgcaattatatttcaagtaatttcaattaatatcctaaatgggaaaattagtcagaaaattcgattacattatgaaattcaattattatgagtcctcagtaaaatcatttttattgcccagttatgctataaatacagtcccgacaatcaatattcagtcaaccatgaaattcttaattttagccagtattgcctgtataattctgatgcttactttcgaagcacgatcagatagtccaactggtagccaatcgacttctaccgcttcatcaggcacctcagctagttcacgcaatactgccggttcacgtaatactgccaatccaagtaatgctgctagttcaaacaatactgctagttcaagcaatgctgctagttcaagcaatggtgccagttcaactgcaagtactgaatcgaataacgctggggaaggtgaagatgataattaagaaaataaagaaacatgacaaagataaaaataaaaataaacgttgaaaaaaaaaaaaaaaaaaaaa"
    ">dd_smedV4_12111_0_1" "aatttatatattaaattgaattaaacgtttaatttttatcaattttattaagttatcaaatataagtattttataaacacgagaaaatatgatttttattttcaaggatattacatttaaatttttgttggttttattgtcatcgctctattgtttttcgtcgacaatttggatcaatgatccgtctgacgaatcagaaatctgtccaaatgggtgtcatgtatgttgtctagttagttcgtttgtactctatcagtcgtacg"
    Last edited by dena.dinesh; 02-18-2015, 04:10 AM.

  • #2
    Could you post a few lines from the FASTA file you produced?

    • #3
      Originally posted by sarvidsson:
      Could you post a few lines from the FASTA file you produced?
      Hi,
      I have added a few lines of my FASTA file to my post. Please take a look.

      Best
      dena

      • #4
        Posting the final file (after all replacements, cleanup etc.) would be easier to debug (e.g. as an attachment).

        Make sure that the first line ("fasta_seq") is removed and that there is a linebreak between the ID and the sequence (difficult to tell whether this is the case). Additionally, some tools expect fixed-length sequence lines - you can use the "fold" command line utility to fix that.
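The checks described above can be sketched in R. This is a minimal sketch and not part of the original thread: `check_fasta` is a hypothetical helper name, and it assumes a plain-text file in which every record starts with a `>` header line. If an ID and its sequence end up on the same line, the whole line is read as a header and the record has no sequence, which may be what the "zero length" error from getORF refers to.

```r
# Hypothetical helper: report basic structural problems in a FASTA file.
check_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]       # ignore blank lines
  is_header <- startsWith(lines, ">")
  problems <- character(0)

  # The file must begin with a header, not with a stray line such as "fasta_seq".
  if (length(lines) == 0 || !is_header[1]) {
    problems <- c(problems, "first line is not a '>' header")
  }

  # A header followed directly by another header (or by the end of the file)
  # is a record without any sequence.
  next_is_header <- c(is_header[-1], TRUE)
  empty <- is_header & next_is_header
  if (any(empty)) {
    problems <- c(problems, paste("no sequence after header:", lines[empty]))
  }
  problems
}

# Toy example (made-up data): ID and sequence share one line.
demo <- tempfile(fileext = ".fasta")
writeLines(c(">id1 acgtacgt", ">id2 ttgattga"), demo)
print(check_fasta(demo))
```

An empty result means neither of the two problems was found; it does not guarantee that a given tool will accept the file.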

        • #5
          You don't want to use write.table(). Well, you can, but then you'd need sep="\n" and quote=F. A better method would be the write.fasta() command.
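A minimal sketch of the write.table() route, using made-up toy sequences rather than the real transcriptome. Besides sep="\n" and quote=F, it assumes col.names=FALSE is also wanted, so that no column-header line is written at the top of the file.

```r
# Toy named character vector standing in for the extracted sequences (made-up data).
fasta_seq <- c(seq1 = "atttgttcc", seq2 = "tgattgaat")
names(fasta_seq) <- paste0(">", names(fasta_seq))

out <- tempfile(fileext = ".fasta")

# Row names (the ">id" headers) and values (the sequences) are separated by a
# newline, nothing is quoted, and the column-header line is suppressed.
write.table(fasta_seq, file = out, sep = "\n", quote = FALSE, col.names = FALSE)

written <- readLines(out)
print(written)
```

Each sequence still ends up on a single line with this approach, so tools that expect wrapped lines would need a further folding step.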

          • #6
            Hi Ryan,

            Thanks for your comment. I tried write.fasta for the file, but it prints only the first sequence, with all characters on a single line. It is not printing the other sequences. I think the file must be in a different format. I have attached the file for your reference. Kindly guide me.

            • #7
              Originally posted by sarvidsson:
              Could you post a few lines from the FASTA file you produced?
              I have attached the file that was generated by the above R command for your reference.

              • #8
                That file should be OK (it is proper FASTA). As I previously said, some tools like to have folded sequence lines (just run the Unix command fold on it).

                I ran your file on the frameDP web resource from INRA (https://iant.toulouse.inra.fr/FrameDP/), and that worked fine.

                • #9
                  The problem is that you mucked up the output of read.fasta. Your code should be something like:
                  Code:
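                  library(seqinr)
                  # Read the IDs as a character vector (assumes one ID per line, no header row)
                  ids = as.character(read.delim("path/to/ids/file.txt", header=FALSE)[[1]])
                  dd = read.fasta("path/to/transcriptome.fasta", seqtype="DNA", as.string=T)
                  # Keep only the records whose names appear in the ID list
                  dd = dd[names(dd) %in% ids]
                  # dir and name hold the output directory and file name, defined elsewhere
                  write.fasta(dd, names(dd), file=paste(dir,"/",name,sep=""))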
                  There's no need to muck around with prepending ">" to the names.

                  • #10
                    Thanks Ryan. It worked, but when I set nbchar=70 it doesn't seem to work; instead it prints the entire sequence on a single line. Thanks once again for your help.
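A likely explanation, offered as a sketch rather than a confirmed diagnosis: the sequences were read with as.string=T, so each one is a single string, and write.fasta() wraps lines at nbchar only when it is told that the input is a string through its own as.string argument (default FALSE). The example below uses a made-up 200-base toy sequence and a temporary output file.

```r
library(seqinr)

# Toy input (made-up data): one 200-base sequence stored as a single string,
# the same shape that read.fasta(..., as.string = TRUE) returns.
seqs <- list(seq1 = paste(rep("acgt", 50), collapse = ""))

out <- tempfile(fileext = ".fasta")

# as.string = TRUE tells write.fasta() to split each string into characters
# before wrapping, so nbchar takes effect.
write.fasta(seqs, names(seqs), file.out = out, nbchar = 70, as.string = TRUE)

out_lines <- readLines(out)
print(nchar(out_lines[-1]))   # lengths of the sequence lines
```

Alternatively, the file can be written unwrapped and folded afterwards with the Unix fold utility, as suggested earlier in the thread.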

                    • #11
                      Originally posted by sarvidsson:
                      That file should be OK (it is proper FASTA). As I previously said, some tools like to have folded sequence lines (just run the Unix command fold on it).

                      I ran your file on the frameDP web resource from INRA (https://iant.toulouse.inra.fr/FrameDP/), and that worked fine.
                      Thank you very much. It worked.
