
  • Importing SAM Files in Megan

    Hello there,

    For most of you my problem is probably trivial, but somehow I'm still struggling with it. So here is the problem.

    I have the data from some Illumina sequencing runs: 7 or 8 samples which contain between 20 and 30 million reads. After clipping and merging, I want to blastX these raw reads against viral nr because I'm mainly interested in viruses. In a perfect world I would work with Diamond, because it is so much faster than blast for aligning short reads against a reference sequence. But Diamond is also my worst nightmare when trying to import the output into MEGAN. When I import the output from a normal blastX it works perfectly: I import the blast output, the reads and the GI mapping file and get a nice tree. But with the Diamond output none of the reads can be assigned to any taxon.

    Here is my Diamond command:

    ./diamond blastx -d viralprot -q ~/D4439_megan_analyse/D4439merged.fa --sam ~/diamond_output/diamondout.sam

    As an output I get a SAM file and when I try to import this into MEGAN the terror begins (it's also not working with the blast output from diamond).

    Quote from the manual:

    MEGAN can now parse files in SAM format [10]. Note, however, that SAM files usually do not contain the names of the taxa associated with the reference sequences and so one must supply suitable mapping files that map identifiers used for the reference sequences to NCBI taxa, KEGG, COG and/or SEED identifiers, see below.

    I get that I must provide a mapping file which tells MEGAN which identifiers belong to which NCBI taxa. But I have no idea which file I have to load. Am I just not getting the point here? Where can I find a suitable file? It would be very nice if someone could help me with this. I have been struggling with it for a long time, and working with Diamond would be such an improvement.

    With kind regards and thanks in advance,
    Julian

  • #2
    The MEGAN manual says:

    At startup, MEGAN automatically loads a copy of the complete NCBI and then displays the taxonomy as a rooted tree. The taxonomy is stored in an NCBI tree file and an NCBI mapping file, which are supplied with the program.
    Mapping files are available on the download page: http://ab.inf.uni-tuebingen.de/data/...d/welcome.html

    Have you tried the files there?



    • #3
      So far I think I have tried all combinations without success.
      When you choose to import something from blast, the program asks you for the alignment file, which in my case is the SAM output from Diamond, and for a file containing the reads with which I did the blast. When you click on the Taxonomy tab you have to choose between the options you can see in the picture.

      When I use the blast output it works very well: I select the GI Map option and load the GI mapping file from the website you mentioned. But for the other two options, the synonyms and RefSeq mapping files, I'm not sure what to use. If I use the ncbi.map file for either of these, it does not work either.

      The Diamond manual states:

      Reads imported into MEGAN lack taxonomic or functional assignment. MEGAN requires mapping files which need to be downloaded separately at the MEGAN website and configured to be used.

      But the only mapping file for taxonomy that I see there is the GI file, which I'm already using. So there must be something else.
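
One way to narrow this down might be to check which identifiers the Diamond SAM file actually uses for the reference sequences, since those are what a mapping file has to match. Below is a minimal Python sketch, assuming the standard SAM layout (tab-separated fields, header lines starting with `@`, reference name in the third column). The function names and the example lines are made up for illustration, and the `gi|...` pattern is only an assumption about what the database headers might look like.

```python
import re
from collections import Counter

# Assumed header style: NCBI-like names such as "gi|12345|ref|YP_000001.1|"
GI_PATTERN = re.compile(r"gi\|(\d+)")


def reference_names(sam_lines):
    """Count reference names (SAM column 3, RNAME) over alignment lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@"
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # "*" in column 3 means the read has no reference (unaligned)
        if len(fields) < 3 or fields[2] == "*":
            continue
        counts[fields[2]] += 1
    return counts


def gi_numbers(names):
    """Map each reference name to the GI number it contains, or None."""
    result = {}
    for name in names:
        match = GI_PATTERN.search(name)
        result[name] = int(match.group(1)) if match else None
    return result


# Made-up example lines, not taken from a real Diamond run
example = [
    "@HD\tVN:1.5\tSO:query",
    "read1\t0\tgi|12345|ref|YP_000001.1|\t1\t255\t30M\t*\t0\t0\t*\t*",
    "read2\t0\tgi|12345|ref|YP_000001.1|\t7\t255\t28M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
```

To run it on a real file, pass an open file handle instead of the list, for example `with open(path) as handle: counts = reference_names(handle)`. If `gi_numbers` returns `None` for most names, the reference names carry no GI number, and a GI mapping file would then have nothing to match against.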