![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
generating iit file for GSNAP | efoss | Bioinformatics | 1 | 07-03-2012 11:08 AM |
generating a .ma file | cjose | SOLiD | 0 | 08-08-2011 07:47 AM |
Looking process to convert gff3 format into ace format or sam format | andylai | Bioinformatics | 1 | 05-17-2011 03:09 AM |
SSAHA2 / SMALT and Fastq quality format | Kerensa | Bioinformatics | 0 | 05-12-2011 05:42 AM |
Fastq to sam issues | Wallysb01 | Bioinformatics | 1 | 02-02-2011 11:55 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: Montevideo, Urguay Join Date: Dec 2009
Posts: 7
|
![]()
Hi
I'm trying to map transcriptome data on a refference genome and generate the output in SAM format. The problem: when I run it, says: Code:
SSAHA2: Reading hashtable reference into memory. SSAHA2: Hashtable size : 0.001007 GB SSAHA2: Number of sequences : 1 SSAHA2: K-mer length : 13 SSAHA2: Skip step : 3 SSAHA2: Max. sequence length: 755119 position_table - Query probably contains unusual characters! : Success ![]() What does this mean? and How can I fix this? Thanks P.D.: I'm running the 2.5 version. |
![]() |
![]() |
![]() |
#2 |
Junior Member
Location: Montevideo, Urguay Join Date: Dec 2009
Posts: 7
|
![]()
More data:
When I run using normal output, it stops while analizing reads like this: Code:
@F35ERS101ARI6V GTAATAATGATGATATTAGTAGTAGTAGTAGTAGTAGTAGTGCTAGTAGTAGTAGTAGTAGTAGTAGTAGTAGTGCGAGTAGTAGTAGCACTAATAGTAGAAGTAAAGGAAAAAGTAAAA + HIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCB;9---,,63,,,, And the line corresponding to that read in the .qual file that I used to build the .fastq file is: Code:
>F35ERS101ARI6V 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 34 34 33 26 24 12 12 12 11 11 21 18 11 11 11 11 The script I used to generate the fastq file is this: Code:
#!/usr/bin/perl -w use strict; use Bio::SeqIO; use Bio::Seq::Quality; use Getopt::Long; die "pass a fasta and a fasta-quality file\n" unless @ARGV; my ($seq_infile,$qual_infile) = (scalar @ARGV == 1) ?($ARGV[0], "$ARGV[0].qual") : @ARGV; ## Create input objects for both a seq (fasta) and qual file my $in_seq_obj = Bio::SeqIO->new( -file => $seq_infile, -format => 'fasta', ); my $in_qual_obj = Bio::SeqIO->new( -file => $qual_infile, -format => 'qual', ); my $out_fastq_obj = Bio::SeqIO->new( -format => 'fastq' ); while (1){ ## create objects for both a seq and its associated qual my $seq_obj = $in_seq_obj->next_seq || last; my $qual_obj = $in_qual_obj->next_seq; die "foo!\n" unless $seq_obj->id eq $qual_obj->id; ## Here we use seq and qual object methods feed info for new BSQ ## object. my $bsq_obj = Bio::Seq::Quality-> new( -id => $seq_obj->id, -seq => $seq_obj->seq, -qual => $qual_obj->qual, ); ## and print it out. $out_fastq_obj->write_fastq($bsq_obj); } If it's the bioperl script, is there another format converter arround? because I don't want to discard reads if there is another way to fix this...HELP!!! |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: Cambridge, UK Join Date: Jul 2008
Posts: 146
|
![]()
The quality line in fastq looks fine - with maximum character of "I" indicating Q40, as the .qual file states.
I can't see anything obviously wrong with it. Maybe Ssaha2 thinks it doesn't look like a standard DNA sequence due to the almost complete lack of C, but that would be harsh. |
![]() |
![]() |
![]() |
#4 |
Junior Member
Location: Montevideo, Urguay Join Date: Dec 2009
Posts: 7
|
![]()
Thanks for the answer.
After I saw your post, I used the reads in fasta format and the program ran smoothly so it seems the sequences themselves are not the problem... Another question: Does SSAHA2 consider quality data while calculating the alignments scores or anything that could alter the results? I ask this because if not, I colud use the results obtained with the reads in fasta format an put the quality data in the .sam file... |
![]() |
![]() |
![]() |
Thread Tools | |
|
|