  • Fasta formatting issues

    I am trying to align a multi-FASTA file containing various regions of interest from a reference genome to a contig assembled from PacBio reads generated from a related strain, just to get a picture of which regions are present and the degree of mismatch/rearrangement.

    I've been trying to use nucmer, but I am getting a strange error. I've posted it to the mummer-help mailing list and am still awaiting a response:
    "postnuc: tigrinc.cc:337: int Read_String(FILE*, char*&, long int&, char*, int): Assertion `Len > 0 && Line [Len - 1] == '\n'' failed."

    The syntax of the error message made me suspect there was a formatting issue with my sequences, but I could find no empty sequences, and no missing newline characters.
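One way to double-check the "no empty sequences, no missing newlines" conclusion is to inspect the raw bytes rather than eyeball the file. The sketch below is only an illustration: the function name `check_fasta` and the particular checks (missing final newline, carriage returns, headers with no sequence) are assumptions about what might trip a strict parser, not a confirmed diagnosis of the postnuc assertion.

```python
def check_fasta(data: bytes) -> list[str]:
    """Return a list of human-readable problems found in raw FASTA bytes.

    Hypothetical helper: the checks are guesses at common formatting
    pitfalls, not the actual rules enforced by nucmer or blastn.
    """
    if not data:
        return ["file is empty"]

    problems = []
    if not data.endswith(b"\n"):
        problems.append("last line has no trailing newline")
    if b"\r" in data:
        problems.append("carriage returns present (non-Unix line endings)")

    header = None   # name of the record currently being read
    seq_len = 0     # number of sequence characters seen for that record
    for lineno, raw in enumerate(data.splitlines(), start=1):
        line = raw.strip()
        if line.startswith(b">"):
            # A new header closes the previous record; flag it if it was empty.
            if header is not None and seq_len == 0:
                problems.append(f"record {header!r} has no sequence")
            header = line[1:].decode("ascii", "replace")
            seq_len = 0
        elif line:
            if header is None:
                problems.append(f"line {lineno}: sequence data before first header")
            seq_len += len(line)

    # The final record has no following header, so check it separately.
    if header is not None and seq_len == 0:
        problems.append(f"record {header!r} has no sequence")
    return problems
```

Usage would be along the lines of `check_fasta(open("features.fasta", "rb").read())`, where the file name is a placeholder. Reading in binary mode matters here, because text mode would silently normalise the line endings that the check is looking for.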

    I switched to just trying to run the alignment via blastn with the align 2 sequences option, but received the following error:
    "Message: Message: NCBI C++ Exception:# "local_db_adapter.cpp", line 123: Error: ncbi::blast::s_CheckForBlastSeqSrcErrors() - NCBI C++ Exception:# "blast_setup.hpp", line 190: Error: Sequence contains no data##"

    Once again it appears to be a formatting error in the fasta files, but I cannot find any empty sequences.

    Looking into it, I thought it might be an issue with the line length in the sequences. I've previously run nucmer without issues on 85 kbp sequences stored on a single line in the fasta file, but I figured I would try it.

    So I transformed the fasta files to have only 80 characters per line in the sequences (the maximum per line that I read somewhere, though this does not seem to be a standard rule).

    These transformed files gave the exact same error messages in both nucmer and blastn.
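For reference, the 80-characters-per-line transformation described above can be sketched as follows. This is a minimal illustration, not the script actually used for the uploaded files; the function name `wrap_fasta` is hypothetical. It also normalises line endings and always ends the output with a newline.

```python
def wrap_fasta(text: str, width: int = 80) -> str:
    """Rewrap every sequence in FASTA text to at most `width` characters per line."""
    out = []   # finished output lines
    seq = []   # sequence fragments collected for the current record

    def flush() -> None:
        # Join the collected fragments and emit them in fixed-width chunks.
        joined = "".join(seq)
        for i in range(0, len(joined), width):
            out.append(joined[i:i + width])
        seq.clear()

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()            # finish the previous record first
            out.append(line)   # headers are kept on one line, unwrapped
        elif line:
            seq.append(line)   # blank lines are dropped
    flush()

    # Terminate the final line so the file always ends with a newline.
    return "\n".join(out) + "\n" if out else ""
```

Because `splitlines()` and `strip()` discard `\r` as well as `\n`, the output uses Unix line endings regardless of how the input was saved.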

    I must be missing something, and if these error messages are anything to go by it should be something obvious but I just can't seem to find it.

    Couldn't attach the files due to size, so I've uploaded them here:
    Features, 80 char per line: http://pastebin.com/E1MyBstr
    Features: http://pastebin.com/rfsiiyKJ
    Contig (view as Raw): http://pastebin.com/tyVsEP8g
    Contig, 80 char per line: http://pastebin.com/DmZBCtiv


    Any help would be vastly appreciated!

    P.S. I thought I could use bwa to do this but apparently that is just for aligning fastq reads to a reference?

  • #2
    Not answering your question directly ...

    I remember from a past thread that someone had suggested using "mauve" (http://gel.ahabs.wisc.edu/mauve/) for this type of analysis.



    • #3
      After looking into this, Mauve really might be just the right tool for the job since it can apparently export sets of positionally orthologous features (genes, CDS, tRNA, and so on), thus getting me all the features in which I was interested.

      Thanks for the advice!
