Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issues generating sam file format in ssaha2?

    Hi
    I'm trying to map transcriptome data on a refference genome and generate the output in SAM format. The problem: when I run it, says:
    Code:
    SSAHA2: Reading hashtable reference into memory.
    SSAHA2: Hashtable size      : 0.001007 GB
    SSAHA2: Number of sequences : 1
    SSAHA2: K-mer length        : 13
    SSAHA2: Skip step           : 3
    SSAHA2: Max. sequence length: 755119
    position_table - Query probably contains unusual characters!
    : Success
    and the sam file is empty.
    What does this mean? and How can I fix this?

    Thanks

    P.D.: I'm running the 2.5 version.

  • #2
    More data:
    When I run using normal output, it stops while analizing reads like this:
    Code:
    @F35ERS101ARI6V
    GTAATAATGATGATATTAGTAGTAGTAGTAGTAGTAGTAGTGCTAGTAGTAGTAGTAGTAGTAGTAGTAGTAGTGCGAGTAGTAGTAGCACTAATAGTAGAAGTAAAGGAAAAAGTAAAA
    +
    HIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCB;9---,,63,,,,
    I don't know to much about fastq format, but comparing with reads that don't cause problems, the last line looks quite suspicious (related to de quality data, rigth).

    And the line corresponding to that read in the .qual file that I used to build the .fastq file is:
    Code:
    >F35ERS101ARI6V
    39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 34 34 33 26 24 12 12 12 11 11 21 18 11 11 11 11

    The script I used to generate the fastq file is this:
    Code:
    #!/usr/bin/perl -w
     
    use strict;
     
    use Bio::SeqIO;
    use Bio::Seq::Quality;
     
    use Getopt::Long;
     
    die "pass a fasta and a fasta-quality file\n"
      unless @ARGV;
     
     
    my ($seq_infile,$qual_infile)
      = (scalar @ARGV == 1) ?($ARGV[0], "$ARGV[0].qual") : @ARGV;
     
    ## Create input objects for both a seq (fasta) and qual file
     
    my $in_seq_obj =
      Bio::SeqIO->new( -file   => $seq_infile,
    		   -format => 'fasta',
    		 );
     
    my $in_qual_obj =
      Bio::SeqIO->new( -file   => $qual_infile,
    		   -format => 'qual',
    		 );
     
    my $out_fastq_obj =
      Bio::SeqIO->new( -format => 'fastq'
    		 );
     
    while (1){
      ## create objects for both a seq and its associated qual
      my $seq_obj  = $in_seq_obj->next_seq || last;
      my $qual_obj = $in_qual_obj->next_seq;
     
      die "foo!\n"
        unless
          $seq_obj->id eq
    	$qual_obj->id;
     
      ## Here we use seq and qual object methods feed info for new BSQ
      ## object.
      my $bsq_obj =
        Bio::Seq::Quality->
    	new( -id   => $seq_obj->id,
    	     -seq  => $seq_obj->seq,
    	     -qual => $qual_obj->qual,
    	   );
     
      ## and print it out.
      $out_fastq_obj->write_fastq($bsq_obj);
    }
    So, what's the problem: my reads? the bioperl script? both?
    If it's the bioperl script, is there another format converter arround? because I don't want to discard reads if there is another way to fix this...HELP!!!

    Comment


    • #3
      The quality line in fastq looks fine - with maximum character of "I" indicating Q40, as the .qual file states.

      I can't see anything obviously wrong with it. Maybe Ssaha2 thinks it doesn't look like a standard DNA sequence due to the almost complete lack of C, but that would be harsh.

      Comment


      • #4
        Thanks for the answer.
        After I saw your post, I used the reads in fasta format and the program ran smoothly so it seems the sequences themselves are not the problem...

        Another question: Does SSAHA2 consider quality data while calculating the alignments scores or anything that could alter the results? I ask this because if not, I colud use the results obtained with the reads in fasta format an put the quality data in the .sam file...

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Current Approaches to Protein Sequencing
          by seqadmin


          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
          04-04-2024, 04:25 PM
        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 04-11-2024, 12:08 PM
        0 responses
        25 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 10:19 PM
        0 responses
        27 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 09:21 AM
        0 responses
        24 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-04-2024, 09:00 AM
        0 responses
        52 views
        0 likes
        Last Post seqadmin  
        Working...
        X