SEQanswers

Old 02-18-2015, 04:00 AM   #1
dena.dinesh
Member
 
Location: Europe

Join Date: Feb 2013
Posts: 58
Default Unable to find ORF for fasta file

Hi,

I have a list of IDs for which I want to extract the corresponding nucleotide sequences from a transcriptome.fasta file. I loaded both my IDs and the transcriptome into R and extracted the nucleotide sequences from the transcriptome.fasta file. I then prepended the FASTA symbol (>) to all IDs and exported the result to a file.

When I uploaded the file to predict ORFs, nothing was predicted. I used FrameDP and getorf from EMBOSS. FrameDP prints a pepdb.fa file without any translated sequences, while getorf throws an error saying that all the sequences have zero length. I don't know where the problem lies.

Could anyone help me figure this out?

Here is the R code I used to extract the sequences and write them to a file.

Code:
library(seqinr)
ids=as.character(read.delim("path/to/ids/file.txt"))
dd=read.fasta("path/to/transcriptome.fasta",seqtype="DNA",as.string=T)
fasta_seq=unlist(dd[names(dd) %in% ids])
names(fasta_seq)=paste(">",names(fasta_seq),sep="")
write.table(fasta_seq,file=paste(dir,"/",name,sep=""))
Here is the output generated by R. Initially it contained quotation marks (""), which I later removed with a find-and-replace.

"fasta_seq"
">dd_smedV4_1188_0_1" "atttgttccattcataaataaaagtagacggctgaaacagtatataaagctataaaaaattcaaacgtatcactgaaataaaatgatatcatgcagattttgttttcaagtaatctttggattccttttagtattgttccactcagatctagtaatctcgagatattttttgcctccagcagactggacaaattccaatgtttttaacaaaagagacaaacctgcaaacggagtgtcgaatgaattgtcgagagaggtgttcaattgtttacagttttgtgccgaatgtagctatgcgtatggtccgtattttaatgtcttcaaatgcggtagagcgtgtagcagcggtgtcatcaacaacaacaaatccaaggagtgtaagtcaaacataatttaagagagctcgtcgttggagcgagatattttgaggtgtccgcttttcgtgaataaattt"
">dd_smedV4_120_0_1" "tgattgaatggctgcaattatatttcaagtaatttcaattaatatcctaaatgggaaaattagtcagaaaattcgattacattatgaaattcaattattatgagtcctcagtaaaatcatttttattgcccagttatgctataaatacagtcccgacaatcaatattcagtcaaccatgaaattcttaattttagccagtattgcctgtataattctgatgcttactttcgaagcacgatcagatagtccaactggtagccaatcgacttctaccgcttcatcaggcacctcagctagttcacgcaatactgccggttcacgtaatactgccaatccaagtaatgctgctagttcaaacaatactgctagttcaagcaatgctgctagttcaagcaatggtgccagttcaactgcaagtactgaatcgaataacgctggggaaggtgaagatgataattaagaaaataaagaaacatgacaaagataaaaataaaaataaacgttgaaaaaaaaaaaaaaaaaaaaa"
">dd_smedV4_12111_0_1" "aatttatatattaaattgaattaaacgtttaatttttatcaattttattaagttatcaaatataagtattttataaacacgagaaaatatgatttttattttcaaggatattacatttaaatttttgttggttttattgtcatcgctctattgtttttcgtcgacaatttggatcaatgatccgtctgacgaatcagaaatctgtccaaatgggtgtcatgtatgttgtctagttagttcgtttgtactctatcagtcgtacg"

Last edited by dena.dinesh; 02-18-2015 at 04:10 AM.
Old 02-18-2015, 04:06 AM   #2
sarvidsson
Senior Member
 
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137
Default

Could you post a few lines from the FASTA file you produced?
Old 02-18-2015, 04:11 AM   #3
dena.dinesh
Member
 
Location: Europe

Join Date: Feb 2013
Posts: 58
Default

Quote:
Originally Posted by sarvidsson View Post
Could you post a few lines from the FASTA file you produced?
Hi,
I have added a few lines of my FASTA file to the first post. Please take a look.

Best
dena
Old 02-18-2015, 04:19 AM   #4
sarvidsson
Senior Member
 
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137
Default

Posting the final file (after all replacements, cleanup etc.) would be easier to debug (e.g. as an attachment).

Make sure that the first line ("fasta_seq") is removed and that there is a linebreak between the ID and the sequence (difficult to tell whether this is the case). Additionally, some tools expect fixed-length sequence lines - you can use the "fold" command line utility to fix that.
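
Since getorf complains about zero-length sequences, one quick check is to count how many sequence characters sit under each header. Below is a minimal base-R sketch; fasta_summary is a hypothetical helper name (not part of seqinr or EMBOSS), and it assumes that header lines start with ">" and that every other non-empty line is sequence.

```r
# Hypothetical helper: report the sequence length found under each FASTA header.
# Assumes header lines start with ">" and all other non-empty lines are sequence.
fasta_summary <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)            # which record each line belongs to
  n_records <- sum(is_header)
  # Sum sequence characters per record; records without sequence lines give NA.
  len <- tapply(nchar(lines[!is_header]),
                factor(record[!is_header], levels = seq_len(n_records)),
                sum)
  len <- as.integer(len)
  len[is.na(len)] <- 0L
  data.frame(id = sub("^>", "", lines[is_header]), length = len,
             stringsAsFactors = FALSE)
}

# Example with a throwaway file: the second record has no sequence.
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "acgtac", "gtac", ">seq2", ">seq3", "ttga"), tmp)
res <- fasta_summary(tmp)
print(res[res$length == 0, ])   # records a tool would see as zero length
```

If the ID and the sequence ended up on the same line, a check like this should report zero-length records, or no records at all when the line no longer starts with ">".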
Old 02-18-2015, 04:36 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You don't want to use write.table(). Well, you can, but then you'd need sep="\n" and quote=F. A better method would be the write.fasta() command.
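
To illustrate the layout the file needs (ID on one line, sequence on the next), here is a minimal base-R sketch that uses writeLines() instead of write.table(). The toy sequences and the helper name write_fasta_lines are made up for the example; with seqinr loaded, write.fasta() is the more convenient route.

```r
# Toy named character vector standing in for the extracted sequences.
seqs <- c(seq1 = "atttgttccattcat", seq2 = "tgattgaatggctgc")

# Hypothetical helper: one ">" header line, then one sequence line, per record.
write_fasta_lines <- function(seqs, path) {
  headers <- paste0(">", names(seqs))
  # rbind() stacks headers over sequences; c() then reads column by column,
  # which interleaves them as header, sequence, header, sequence, ...
  writeLines(c(rbind(headers, unname(seqs))), path)
}

out <- tempfile(fileext = ".fasta")
write_fasta_lines(seqs, out)
written <- readLines(out)
print(written)
```

Because writeLines() never quotes its input and ends every element with a newline, no quotation marks need to be cleaned up afterwards.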
Old 02-18-2015, 06:43 AM   #6
dena.dinesh
Member
 
Location: Europe

Join Date: Feb 2013
Posts: 58
Default

Hi Ryan,

Thanks for your comment. I tried write.fasta() on the file, but it prints only the first sequence, with all characters on a single line. It does not print the other sequences. I think the file must be in a different format. I have attached the file for your reference. Kindly guide me.
Attached Files
File Type: txt fasta_sequences.txt (13.2 KB, 3 views)
Old 02-18-2015, 06:44 AM   #7
dena.dinesh
Member
 
Location: Europe

Join Date: Feb 2013
Posts: 58
Default

Quote:
Originally Posted by sarvidsson View Post
Could you post a few lines from the FASTA file you produced?
I have attached the file that was generated by the R command above for your reference.
Attached Files
File Type: txt fasta_sequences.txt (13.2 KB, 1 views)
Old 02-18-2015, 06:51 AM   #8
sarvidsson
Senior Member
 
Location: Berlin, Germany

Join Date: Jan 2015
Posts: 137
Default

That file should be OK (it is proper FASTA). As I previously said, some tools like to have folded sequence lines (just run the Unix command fold on it).

I ran your file on the frameDP web resource from INRA (https://iant.toulouse.inra.fr/FrameDP/), and that worked fine.
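
If you would rather do the folding in R than with fold, something along these lines should work. This is only a sketch: fold_sequence is a hypothetical helper, and the width of 60 is just a common choice, not something I know frameDP or getorf to require.

```r
# Hypothetical helper: split one sequence string into chunks of at most `width`.
fold_sequence <- function(s, width = 60) {
  n <- nchar(s)
  if (n == 0) return(character(0))       # nothing to fold
  starts <- seq(1, n, by = width)        # first position of each chunk
  substring(s, starts, pmin(starts + width - 1, n))
}

s <- paste(rep("acgt", 40), collapse = "")   # toy sequence of 160 bases
folded <- fold_sequence(s, width = 60)
print(nchar(folded))                         # length of each folded line
```

Each element of the result can then be written as its own line under the record's header.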
Old 02-18-2015, 06:56 AM   #9
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

The problem is that you mucked up the output of read.fasta. Your code should be something like:
Code:
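library(seqinr)
# Read the IDs as a plain character vector (assumes one ID per line in the
# first column and no header row)
ids = as.character(read.delim("path/to/ids/file.txt", header=FALSE)[[1]])
dd = read.fasta("path/to/transcriptome.fasta", seqtype="DNA", as.string=TRUE)
# Keep only the requested sequences; dd stays a list, as write.fasta() expects
dd = dd[names(dd) %in% ids]
# write.fasta() adds the ">" itself; dir and name are your output directory and file name
write.fasta(dd, names(dd), file.out=paste(dir, "/", name, sep=""))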
There's no need to muck around with prepending ">" to the names.
Old 02-19-2015, 01:40 AM   #10
dena.dinesh
Member
 
Location: Europe

Join Date: Feb 2013
Posts: 58
Default

Thanks Ryan. It worked, but when I set nbchar=70 it doesn't seem to have any effect; it still prints the entire sequence on a single line. Thanks once again for your help.
Old 02-19-2015, 01:41 AM   #11
dena.dinesh
Member
 
Location: Europe

Join Date: Feb 2013
Posts: 58
Default

Quote:
Originally Posted by sarvidsson View Post
That file should be OK (it is proper FASTA). As I previously said, some tools like to have folded sequence lines (just run the Unix command fold on it).

I ran your file on the frameDP web resource from INRA (https://iant.toulouse.inra.fr/FrameDP/), and that worked fine.
Thank you very much. It worked.
Tags
dna sequence, emboss; fastq file format, orf, translation
