Hi,
I have a list of IDs for which I want to extract the corresponding nucleotide sequences from a transcriptome.fasta file. I loaded both my IDs and the transcriptome into R and extracted the matching nucleotide sequences from transcriptome.fasta. I then prepended the FASTA header symbol (>) to every ID and exported the result to a file.
When I uploaded the file to predict ORFs, nothing was predicted. I tried both FrameDP and getorf from EMBOSS: FrameDP writes a pepdb.fa file that contains no translated sequences, and getorf reports that all the sequences have zero length. I don't know where the problem lies.
Could anyone help me figure this out?
Here is the output generated by R. Initially the exported file contained double quotes (""), which I later removed with find-and-replace:
"fasta_seq"
">dd_smedV4_1188_0_1" "atttgttccattcataaataaaagtagacggctgaaacagtatataaagctataaaaaattcaaacgtatcactgaaataaaatgatatcatgcagattttgttttcaagtaatctttggattccttttagtattgttccactcagatctagtaatctcgagatattttttgcctccagcagactggacaaattccaatgtttttaacaaaagagacaaacctgcaaacggagtgtcgaatgaattgtcgagagaggtgttcaattgtttacagttttgtgccgaatgtagctatgcgtatggtccgtattttaatgtcttcaaatgcggtagagcgtgtagcagcggtgtcatcaacaacaacaaatccaaggagtgtaagtcaaacataatttaagagagctcgtcgttggagcgagatattttgaggtgtccgcttttcgtgaataaattt"
">dd_smedV4_120_0_1" "tgattgaatggctgcaattatatttcaagtaatttcaattaatatcctaaatgggaaaattagtcagaaaattcgattacattatgaaattcaattattatgagtcctcagtaaaatcatttttattgcccagttatgctataaatacagtcccgacaatcaatattcagtcaaccatgaaattcttaattttagccagtattgcctgtataattctgatgcttactttcgaagcacgatcagatagtccaactggtagccaatcgacttctaccgcttcatcaggcacctcagctagttcacgcaatactgccggttcacgtaatactgccaatccaagtaatgctgctagttcaaacaatactgctagttcaagcaatgctgctagttcaagcaatggtgccagttcaactgcaagtactgaatcgaataacgctggggaaggtgaagatgataattaagaaaataaagaaacatgacaaagataaaaataaaaataaacgttgaaaaaaaaaaaaaaaaaaaaa"
">dd_smedV4_12111_0_1" "aatttatatattaaattgaattaaacgtttaatttttatcaattttattaagttatcaaatataagtattttataaacacgagaaaatatgatttttattttcaaggatattacatttaaatttttgttggttttattgtcatcgctctattgtttttcgtcgacaatttggatcaatgatccgtctgacgaatcagaaatctgtccaaatgggtgtcatgtatgttgtctagttagttcgtttgtactctatcagtcgtacg"
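A likely cause, judging from the output above (a hedged diagnosis, not checked against FrameDP itself): each record sits on a single line of the form >id sequence. FASTA readers such as getorf treat the whole line that starts with > as the header (ID plus description) and read the sequence from the lines that follow it, so every record here ends up with an empty sequence, which would match the zero-length error. The first line, "fasta_seq", is not part of any FASTA record either. The base-R sketch below uses a hypothetical helper, fasta_lengths(), written only for illustration, to mimic that parsing rule on toy sequences:

```r
# Hypothetical helper, for illustration only: parse FASTA lines the way most
# tools do. A line starting with ">" opens a record (first word = ID); every
# following line that is not a header is appended to that record's sequence.
fasta_lengths <- function(lines) {
  lens <- integer(0)
  current <- NA_character_
  for (ln in lines) {
    if (startsWith(ln, ">")) {
      current <- sub("^>(\\S+).*$", "\\1", ln)  # ID = first word after ">"
      lens[current] <- 0L                       # new record, empty so far
    } else if (!is.na(current)) {
      # sequence line: count residues, ignoring whitespace
      lens[current] <- lens[current] + nchar(gsub("\\s", "", ln))
    }
    # lines before the first ">" (e.g. "fasta_seq") are ignored
  }
  lens
}

# One record per line, as in the output above once the quotes are removed
bad <- c("fasta_seq", ">seq1 ATGAAATAG", ">seq2 ATGCCCTAA")
print(fasta_lengths(bad))   # seq1 = 0, seq2 = 0

# Header and sequence on separate lines, as FASTA expects
good <- c(">seq1", "ATGAAATAG", ">seq2", "ATGCCC", "TAA")
print(fasta_lengths(good))  # seq1 = 9, seq2 = 9
```

The sequences are invented; the point is only that the same residues give zero-length records when they share a line with the header.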
Here is the R code I used to extract the sequences and write them to a file:
Code:
library(seqinr)
ids=as.character(read.delim("path/to/ids/file.txt"))
dd=read.fasta("path/to/transcriptome.fasta",seqtype="DNA",as.string=T)
fasta_seq=unlist(dd[names(dd) %in% ids])
names(fasta_seq)=paste(">",names(fasta_seq),sep="")
write.table(fasta_seq,file=paste(dir,"/",name,sep=""))
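write.table() writes each vector element as one row (name, then value), which produces the single-line >id sequence layout. One way to avoid that is to write each record as a header line followed by a sequence line. The sketch below uses only base R; write_fasta_lines() is a hypothetical helper, and because it adds the > itself it should be given the vector before the names(fasta_seq)=paste(">",...) step. seqinr also has its own write.fasta() for this (sequences stored as strings need as.string = TRUE; see ?write.fasta for the exact arguments).

```r
# Hypothetical helper (not part of seqinr): write a named character vector
# as FASTA, one header line followed by one sequence line per record.
write_fasta_lines <- function(seqs, path) {
  stopifnot(is.character(seqs), !is.null(names(seqs)))
  # ">" is added here, so the names must not already start with ">"
  records <- paste0(">", names(seqs), "\n", seqs)
  writeLines(records, path)  # writeLines ends every element with "\n"
}

# Made-up sequences standing in for unlist(dd[names(dd) %in% ids])
fasta_seq <- c(seq1 = "atgaaatag", seq2 = "atgccctaa")
out <- tempfile(fileext = ".fasta")
write_fasta_lines(fasta_seq, out)
cat(readLines(out), sep = "\n")
# >seq1
# atgaaatag
# >seq2
# atgccctaa
```

Separately, read.delim() treats the first line of the ID file as a column header by default, and as.character() applied to a data frame returns a deparsed column rather than a vector of IDs, so readLines("path/to/ids/file.txt") may be a safer way to load a file with one ID per line.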