  • mira3 output

    Hi, I am a first-year PhD student trying my hand at bioinformatics. I assembled about 190,000 454 reads with mira3, and I need some help going through the output.

    The assembly info file indicates that I have 21,769 contigs, but the result FASTA file contains around 24,000 sequences. I am assuming that sequences whose header names contain '_c' (for example, >myProject_c1203) are contigs that were assembled. What are the sequences whose header names contain '_s' or '_lrc' (for example, >myproject_lrc2938)?

    Does mira3 discard sequences that it thinks are of low quality or too short? I noticed that the number of reads assembled is 119,794 (2,508 singlets), whereas the number of reads I fed into mira3 was around 190,000.

  • #2
    I think "c" means a contig, "s" means singleton, and "lrc" is also a contig but where there were problems with repeats. Try Google:
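
One way to check that reading of the header names against the numbers in the question is to tally the FASTA headers by their type tag. If "s" really does mean singleton, the contigs plus singlets reported in the question (21,769 + 2,508 = 24,277) would roughly account for the ~24,000 sequences. The sketch below is only an illustration: it assumes headers shaped like `>project_c1203`, treats the letters between the last underscore and the trailing number as the type tag, and uses a hypothetical file name. None of this is taken from the MIRA documentation.

```python
import re
from collections import Counter

# Assumed header shape (from the examples in this thread, not from MIRA docs):
#   >myProject_c1203, >myProject_s17, >myproject_lrc2938
# The type tag is the run of letters between the last underscore and the number.
HEADER_RE = re.compile(r"^>\S*_([A-Za-z]+)\d+\b")


def count_header_types(lines):
    """Count FASTA records per type tag; unrecognised headers go to 'other'."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = HEADER_RE.match(line)
        counts[match.group(1) if match else "other"] += 1
    return counts


def count_fasta_file(path):
    """Convenience wrapper that reads headers from a FASTA file on disk."""
    with open(path) as handle:
        return count_header_types(handle)


if __name__ == "__main__":
    # Hypothetical file name; substitute the FASTA file your assembly produced.
    for tag, n in sorted(count_fasta_file("myProject_out.fasta").items()):
        print(tag, n)
```

If the "c" count matches the contig total in the assembly info file and the "s" count matches the singlet total, the interpretation above is probably right for your run; a large "other" bucket would mean the headers do not follow the assumed pattern.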

    • #3
      Hi Damiankao! I would strongly recommend signing up for the mira_talk mailing list ([email protected]) as well. The community is very friendly, and Mira's author, Bastien Chevreux, answers questions every day.
