**Bruins** · 12-17-2010, 03:08 AM

Hi,

I noticed this post because someone in my office had a similar problem (miRNAkey on biolinux, basically Ubuntu, refused to open fasta files created by CLC on Windows).

So the problem might be with the end-of-line character used, with the text encoding, or with the way CLC writes a fasta file and the way formatdb reads it. In theory, dos2unix should take care of the first point. You could also open the file in gedit (or another Linux text editor). I know that TextWrangler on my OS X has an option to show 'hidden characters'; I don't know whether gedit does. You could also try opening the file in gedit and then saving it explicitly in the correct encoding (UTF-8, I think).
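One way to check the end-of-line point without an editor is to count the line-ending bytes directly. The sketch below is only an illustration in Python, not part of dos2unix, CLC or formatdb; the function names `count_line_endings` and `to_unix_newlines` and the sample record are made up for the example.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF), and bare CR line endings."""
    crlf = data.count(b"\r\n")
    # Bare LF / CR are the ones that are not part of a CRLF pair.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    sample = b">contig1\r\nACGT\r\nTTGA\r\n"  # hypothetical Windows-style FASTA
    print(count_line_endings(sample))
    print(to_unix_newlines(sample))
```

When applying this to a real file, it should be read in binary mode (`open(path, "rb")`), because Python's text mode translates line endings on read and would hide the difference.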

In the end, the problem with miRNAkey had nothing to do with newlines or encoding. It expected each sequence to be on a single line (which makes sense, they're miRNAs). CLC, on the other hand, used a more 'standard' way of writing fasta files: it adds a newline after a fixed number of bases (75? 60? dunno). However, I doubt that formatdb has a problem with this. Check the docs to be safe.
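To make the difference between the two layouts concrete, here is a minimal Python sketch that joins wrapped sequence lines so that each record has its sequence on one line. It is an illustration under simple assumptions (headers start with `>`, everything else is sequence), not what miRNAkey or CLC do internally; `unwrap_fasta` and the sample records are hypothetical.

```python
def unwrap_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")  # tolerate Windows or Unix line endings
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    # Emit the last record, which has no following header to close it.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    wrapped = [">mir-1\n", "ACGU\n", "UGCA\n", ">mir-2\n", "GGCC\n"]  # made-up records
    for header, seq in unwrap_fasta(wrapped):
        print(header)
        print(seq)
```

Because the function takes any iterable of lines, an open file handle can be passed in directly, so a large file is processed one line at a time.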

Another problem this person faced was the way the files were copied. He used VMware instead of a dual boot, basically running Linux inside Windows. When he copied the files from Windows to Linux, he couldn't open them. When he copied them to a USB stick, mounted the stick in Linux and then copied them, he was fine.

Hope that helps,
cheers

**mfursov** · 12-17-2010, 04:15 AM

Originally posted by Wiseone

Does anyone have a solution whereby I can make my fasta file from windows open usable for a blast in linux?
Thanks!

Could you share your FASTA file with us? The problem looks interesting to me (I'm one of the UGENE developers), and I think that by investigating it I will be able both to test the tool and to help you solve your problem.

**RDW** · 12-20-2010, 06:27 AM

Just to rule out any issues that aren't OS-specific, is the file compatible with the Windows version of formatdb?

ftp://ftp.ncbi.nlm.nih.gov/blast/exe...elease/LATEST/

**Wiseone** · 12-23-2010, 01:47 PM

So, everything is now working. Bruins was correct. By opening the fasta file in gedit and saving it with Linux line endings, I was able to use the file.

**boetsie** · 12-23-2010, 02:24 PM

I see you fixed your problem; however, I don't think you want to open a huge file in an editor if you have a lot of contigs. I also had this problem before and used a Perl script to solve it. I converted my file using the substitution s/\r\n/\n/. Here is a Perl script to convert your contig file:

Code:

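use strict;
use warnings;

# Usage: perl script.pl contigs.fa > contigs_unix.fa
my $contigfile = $ARGV[0];

open(my $in, '<', $contigfile) or die "Can't open $contigfile -- fatal\n";
my ($seq, $prevhead) = ('', '');
while (<$in>) {
  s/\r?\n\z//;   # strip a Windows (\r\n) or Unix (\n) line ending
  if (/^>/) {
    # New header: print the previous record, if there is one
    print "$prevhead\n$seq\n" if $seq ne '';
    $prevhead = $_;
    $seq = '';
  } else {
    $seq .= $_;  # join the sequence lines of a record onto one line
  }
}
# Print the last record, which has no following header
print "$prevhead\n$seq\n" if $seq ne '';
close $in;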

Boetsie.

**skycreative** · 01-02-2011, 04:23 AM

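It is simple, and it would be more useful with an output step that wraps each sequence at 50 bases per line:

Code:

# Use this in place of the print of "$prevhead\n$seq\n" in boetsie's script
my $txt = '';
for (my $i = 0; $i * 50 < length($seq); $i++) {
  $txt .= substr($seq, $i * 50, 50) . "\n";
}
# $prevhead is the header that belongs to $seq at the point where a record is printed
print $prevhead, "\n";
print $txt;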

File Conversion / Usage with Windows and linux