  • Fasta formatting issues

    I am trying to align a multi-FASTA file containing various regions of interest from a reference genome to a contig assembled from PacBio reads from a related strain, just to get a picture of which regions are present and the degree of mismatch/rearrangement.

    I've been trying to use nucmer, but I am getting a strange error. I've posted it to the mummer-help mailing list and am still awaiting a response:
    "postnuc: tigrinc.cc:337: int Read_String(FILE*, char*&, long int&, char*, int): Assertion `Len > 0 && Line [Len - 1] == '\n'' failed."

    The syntax of the error message made me suspect there was a formatting issue with my sequences, but I could find no empty sequences, and no missing newline characters.

    I switched to just trying to run the alignment via blastn with the align 2 sequences option, but received the following error:
    "Message: Message: NCBI C++ Exception:# "local_db_adapter.cpp", line 123: Error: ncbi::blast::s_CheckForBlastSeqSrcErrors() - NCBI C++ Exception:# "blast_setup.hpp", line 190: Error: Sequence contains no data##"

    Once again it appears to be a formatting error in the fasta files, but I cannot find any empty sequences.
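
    For anyone wanting to repeat that kind of check, here is a minimal Python sketch. It is not a confirmed diagnosis of either error, only a scan for things that often upset FASTA parsers: records with no sequence, blank lines, carriage returns (Windows-style line endings), and a missing newline at the end of the file. The function name `scan_fasta` and the file name in the usage note are hypothetical.

```python
from typing import Iterable, List


def scan_fasta(lines: Iterable[bytes]) -> List[str]:
    """Return descriptions of suspicious lines in a FASTA file read in binary mode.

    The checks are assumptions about common formatting problems,
    not a list of what nucmer or blastn actually reject.
    """
    problems: List[str] = []
    header = None   # name of the record currently being read
    seq_len = 0     # number of sequence characters seen for that record
    last = None     # last raw line, used for the trailing-newline check

    for n, raw in enumerate(lines, start=1):
        last = raw
        if b"\r" in raw:
            problems.append(f"line {n}: carriage return (non-Unix line ending)")
        text = raw.rstrip(b"\r\n")
        if text.startswith(b">"):
            # A new header closes the previous record.
            if header is not None and seq_len == 0:
                problems.append(f"record {header!r}: no sequence data")
            header = text[1:].decode(errors="replace")
            seq_len = 0
        elif text.strip() == b"":
            problems.append(f"line {n}: blank line")
        else:
            if header is None:
                problems.append(f"line {n}: sequence data before any header")
            seq_len += len(text.strip())

    if header is not None and seq_len == 0:
        problems.append(f"record {header!r}: no sequence data")
    if last is not None and not last.endswith(b"\n"):
        problems.append("final line has no trailing newline")
    return problems
```

    Usage would look something like `with open("features.fasta", "rb") as fh: print(scan_fasta(fh))`; an empty list means none of these particular problems were found.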

    Looking into it, I thought it might be an issue with the line length of the sequences. I've previously run nucmer without problems on 85 kbp sequences written as a single line in the fasta file, but I figured I would try it.

    So I transformed the fasta files to have at most 80 characters per sequence line (a maximum I read somewhere, though it does not seem to be a standard rule).
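
    A rough sketch of that re-wrapping step in Python, assuming the input is an ordinary FASTA file with `>` header lines. The function name `wrap_fasta` is hypothetical, and this is not necessarily how the uploaded files were produced.

```python
from typing import Iterable, Iterator, List


def wrap_fasta(lines: Iterable[str], width: int = 80) -> Iterator[str]:
    """Yield FASTA lines (without newlines) with sequences re-wrapped to `width`."""
    seq_parts: List[str] = []

    def flush() -> Iterator[str]:
        # Join the pieces collected for the current record and re-split them.
        seq = "".join(seq_parts)
        seq_parts.clear()
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are dropped
        if line.startswith(">"):
            yield from flush()  # emit the previous record's sequence first
            yield line
        else:
            seq_parts.append(line)
    yield from flush()  # emit the last record's sequence
```

    Writing each yielded line followed by `"\n"` also guarantees that the output file ends with a newline.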

    These transformed files gave exactly the same error messages in both nucmer and blastn.

    I must be missing something, and if these error messages are anything to go by, it should be something obvious, but I just can't seem to find it.

    Couldn't attach the files due to size, so I've uploaded them here:
    Features, 80 char per line: http://pastebin.com/E1MyBstr
    Features: http://pastebin.com/rfsiiyKJ
    Contig (view as Raw): http://pastebin.com/tyVsEP8g
    Contig, 80 char per line: http://pastebin.com/DmZBCtiv


    Any help would be vastly appreciated!

    P.S. I thought I could use bwa to do this but apparently that is just for aligning fastq reads to a reference?

  • #2
    Not answering your question directly ...

    I remember from a past thread that someone had suggested using "mauve" (http://gel.ahabs.wisc.edu/mauve/) for this type of analysis.

    • #3
      After looking into this, Mauve really might be just the right tool for the job since it can apparently export sets of positionally orthologous features (genes, CDS, tRNA, and so on), thus getting me all the features in which I was interested.

      Thanks for the advice!
