  • Help with custom database of a genera and count of hits (NGS, 454)

    Hello biologists and bioinformaticians around the world!

    I'm having serious trouble accomplishing a task.

    I need to run a megablast search with metagenomic reads (454) to determine the number of hits and to extract the reads that matched the reference sequences of a specific genus.

    After that, I need to compare the results with Bowtie2.

    The first goal is to choose between megablast and Bowtie2 based on the number of hits.

    The second goal is to remove the reads that belong to this genus from the original set of reads, in order to analyze the community with and without this genus.
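The two goals above can be sketched in Python. This is only a minimal sketch under stated assumptions: the hits are in tabular BLAST output (for example BLAST+ `-outfmt 6`), where the first tab-separated column is the query (read) ID, and the reads are in FASTA format. The function names `read_hit_ids`, `parse_fasta` and `split_reads` are hypothetical helpers, not part of BLAST or Bowtie2.

```python
def read_hit_ids(tabular_lines):
    """Collect the IDs of reads that have at least one hit.

    Assumes tabular BLAST output: column 1 is the query (read) ID.
    A read with several hits is counted once.
    """
    ids = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        ids.add(line.split("\t")[0])
    return ids


def parse_fasta(lines):
    """Yield (header, sequence) pairs; blank lines are ignored."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_reads(fasta_lines, hit_ids):
    """Split reads into (matched, unmatched) lists of (header, sequence)."""
    matched, unmatched = [], []
    for header, seq in parse_fasta(fasta_lines):
        # The read ID is the first whitespace-delimited token of the header.
        read_id = header.split()[0] if header else ""
        (matched if read_id in hit_ids else unmatched).append((header, seq))
    return matched, unmatched


if __name__ == "__main__":
    # Tiny made-up example; real input would come from files.
    hits = read_hit_ids(["read1\tref\t99.0", "read1\tref2\t98.0"])
    reads = [">read1", "ACGT", ">read2", "GGCC"]
    matched, unmatched = split_reads(reads, hits)
    print(len(hits), "reads with hits;", len(unmatched), "reads left")
```

The number of reads with hits is `len(hit_ids)`, and the `unmatched` list is the read set without the genus. For Bowtie2, the equivalent ID set would have to be collected from its alignment output instead.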

    First of all, I tried to build my custom database for megablast.

    I downloaded the FASTA sequences and the list of GIs for my genus from NCBI.

    The next step was to run formatdb for custom databases. My commands were:

    1. formatdb -i nt -o T -p F ---> format the original database with parsing active --> it's OK

    2. formatdb -F microcystis_gis.txt -B mic_gis.gi --> format my GI list to binary format --> it's OK

    3. formatdb -i nt -p F -L mic_gi -F mic_gis.gi -t my_mic_db --> format the database, generating an alias for my GI list

    Here I had no success; the error was: [formatdb] FATAL ERROR: Unable to find mic_gis.gi

    4. formatdb -i microcystis_fasta.fasta -p F -n mic_db --> no success again

    Then I tried fastacmd:

    5. fastacmd -d nt -p F -i microcystis_gis.txt -D 1 -o microcystis.fasta ---> after a long processing time, the result was a file with all the sequences; the GI filtering was not applied.
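As an alternative to step 5, the GI filtering can be done directly on a FASTA file whose headers start with `gi|<number>|`, as in the header quoted in this post. The following is a minimal sketch, not a replacement for fastacmd: `load_gi_list` and `filter_fasta_by_gi` are hypothetical helper names, and the GI list is assumed to contain one GI number per line.

```python
import re

# Matches headers of the form ">gi|469475955|gb|KC166868.1| ..."
# (the leading ">" is stripped before matching).
GI_PATTERN = re.compile(r"^gi\|(\d+)\|")


def load_gi_list(lines):
    """Return the set of GI numbers (as strings), one per non-blank line."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta_by_gi(fasta_lines, wanted_gis):
    """Yield only the lines of records whose GI is in wanted_gis.

    Blank lines are dropped along the way.
    """
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = GI_PATTERN.match(line[1:])
            keep = bool(match) and match.group(1) in wanted_gis
        if keep and line.strip():
            yield line


if __name__ == "__main__":
    gis = load_gi_list(["469475955\n"])
    fasta = [
        ">gi|469475955|gb|KC166868.1| example record\n",
        "GTTACC\n",
        "\n",
        ">gi|222|gb|X1| another record\n",
        "AAAA\n",
    ]
    for out_line in filter_fasta_by_gi(fasta, gis):
        print(out_line)
```

This streams through the file once and keeps the GI list in memory as a set, so membership checks stay fast even for a large FASTA file.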

    Then I tried makeblastdb:

    6. makeblastdb -in microcystis_fasta.fasta -dbtype 'nucl' –parse_seqids -title microcystis_db

    and the error was: Error: Too many positional arguments (1), the offending value: –parse_seqids
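Note that the offending value in this error starts with an en dash (U+2013) rather than an ASCII hyphen, which would explain why it is read as a positional argument instead of an option. This often happens when a command is copied from a word processor or a web page. A minimal sketch for spotting such characters in a command line follows; `find_suspicious_args` is a hypothetical helper, and the list of look-alike characters is an assumption that may not be complete.

```python
import shlex

# Characters that look like "-" but are not the ASCII hyphen-minus.
DASH_LOOKALIKES = {
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
}


def find_suspicious_args(command):
    """Return (argument, description) pairs for arguments that start
    with a dash look-alike instead of an ASCII hyphen."""
    problems = []
    for arg in shlex.split(command):
        if arg and arg[0] in DASH_LOOKALIKES:
            problems.append((arg, DASH_LOOKALIKES[arg[0]]))
    return problems


if __name__ == "__main__":
    # \u2013 below is the en dash that appears in the command from step 6.
    cmd = ("makeblastdb -in microcystis_fasta.fasta -dbtype 'nucl' "
           "\u2013parse_seqids -title microcystis_db")
    for arg, kind in find_suspicious_args(cmd):
        print(f"{arg!r} starts with an {kind}; retype it with '-'")
```

Retyping the option by hand with a plain `-` is usually the simplest fix.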

    I think my problem may be the format of my input files. Maybe the header of my microcystis_fasta.fasta isn't correct. Here it is:

    >gi|469475955|gb|KC166868.1| Shewanella putrefaciens strain DCH-5 16S ribosomal RNA gene, partial sequence
    GTTACCTACAGAAGAAGGACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTCCGAGCGTTA
    ATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTGTTAAGCGAGATGTGAAAGCCCTGGGCTC......

    The file has an empty line between consecutive sequences.

    My GI list and my FASTA file are the ones downloaded from NCBI, so I don't understand how they can be in the wrong format. Maybe it is necessary to change these files in some way? I can do this with a Perl script, but I was wondering if you could suggest something more effective first.
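The clean-up mentioned above can also be sketched in Python. This is a minimal sketch, assuming the only changes wanted are removing blank lines and setting aside records that have a header but no sequence. Whether that is what the tools are actually complaining about is not confirmed here. `clean_fasta` is a hypothetical helper name.

```python
def clean_fasta(lines):
    """Remove blank lines and separate out records without sequence.

    Returns (cleaned_lines, empty_headers):
      cleaned_lines  - header and sequence lines of non-empty records
      empty_headers  - headers of records that had no sequence lines
    """
    records = []  # list of (header, [sequence lines])
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines between records
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)

    cleaned, empty_headers = [], []
    for header, seq_lines in records:
        if seq_lines:
            cleaned.append(header)
            cleaned.extend(seq_lines)
        else:
            empty_headers.append(header)
    return cleaned, empty_headers


if __name__ == "__main__":
    example = [">a\n", "ACGT\n", "\n", ">b\n", "\n", ">c\n", "GGTT\n"]
    cleaned, empty = clean_fasta(example)
    print("\n".join(cleaned))
    print("records without sequence:", empty)
```

Writing `cleaned` to a new file and keeping the original untouched makes it easy to compare how each tool behaves with both versions.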

    Building my database for Bowtie2 worked, but with this warning:

    Warning: Encountered empty reference sequence

    But, as usual, Bowtie2 ran fine.

    I think this warning is an indication that something is wrong with the format of my reference FASTA file.

    Thank you everyone, Brazilian greetings!!

  • #2
    Hi Marlaux,

    I am having a similar issue. I am trying to align synthetic sequences to a library of sequences. I made a FASTA file and, just like you, I have a space between every two sequences. When I build the index, I get the message 'Warning: encountered empty reference sequence'. But Bowtie still works when I use this index to map to my reference.

    I was wondering if you figured out what this error message means, and what the solution to this is. Any input from you is appreciated.

    Thanks in advance!


    • #3
      Hello Anna, take a look here:



      Good luck!
