  • Problems with blastx and thousands of sequences

    Hi all, I'm running blastx on my server. It has worked every time so far, but now, with a large file of sequences, it is doing something strange.

    The command is the following:

    nohup ../ncbi-blast-2.2.27+/bin/blastx -db ../bin/data/uniprot_kb_2012_06.fasta -query 42000seq.fa -evalue 0.05 -max_target_seqs 5 -outfmt 5 -num_threads 10 -out ouBlastxXML &

    where uniprot_kb_2012... is a database containing all the proteins taken from NCBI, and
    42000seq.fa is a file containing 42 thousand sequences in FASTA format.

    The output I want is in XML...

    I ran it one week ago and obtained a completely empty file!

    Now with nohup it's writing this:

    Selenocysteine (U) at position 73 replaced by X
    Selenocysteine (U) at position 40 replaced by X
    Selenocysteine (U) at position 52 replaced by X
    Selenocysteine (U) at position 48 replaced by X
    Selenocysteine (U) at position 37 replaced by X
    Selenocysteine (U) at position 40 replaced by X
    Selenocysteine (U) at position 40 replaced by X

    ...and other similar lines...

    and the XML file is still empty...

    What's happening?

    The same command on a query of ten sequences works fine.

    Does anyone know where I might have gone wrong?

    Bye and thanks
    Angelo

  • #2
    I don't know what is wrong, but your best bet is to run BLAST on small groups of sequences and then put all of the XML files together.
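One way to put the per-batch XML files together can be sketched as follows. This is a minimal sketch, not a definitive tool: it assumes each file is a standard BLAST XML (-outfmt 5) document whose root contains a `BlastOutput_iterations` element with one `Iteration` child per query. The function name `merge_blast_xml` and the file names in the usage note are hypothetical. Note that ElementTree does not preserve the DOCTYPE declaration, and the header fields of the merged file come from the first input only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable


def merge_blast_xml(xml_paths: Iterable[Path], merged_path: Path) -> int:
    """Append the Iteration elements of every file to the first one.

    Returns the total number of iterations (queries) in the merged file.
    """
    merged_tree = None
    merged_iterations = None
    count = 0
    for path in xml_paths:
        tree = ET.parse(str(path))
        iterations = tree.getroot().find("BlastOutput_iterations")
        if iterations is None:
            raise ValueError(f"{path} has no BlastOutput_iterations element")
        new_items = list(iterations)
        if merged_tree is None:
            # The first file becomes the skeleton of the merged document.
            merged_tree, merged_iterations = tree, iterations
        else:
            merged_iterations.extend(new_items)
        # Renumber so that iteration numbers stay unique and sequential.
        for iteration in new_items:
            count += 1
            number = iteration.find("Iteration_iter-num")
            if number is not None:
                number.text = str(count)
    if merged_tree is None:
        raise ValueError("no input files given")
    merged_tree.write(str(merged_path), encoding="utf-8", xml_declaration=True)
    return count
```

A call such as `merge_blast_xml(sorted(Path("results").glob("batch_*.xml")), Path("merged.xml"))` would then combine the batches in file-name order, assuming that directory layout.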

    • #3
      It seems BLAST+ is a bit silly with the XML output and doesn't write it out incrementally.

      I personally split the input into batches of 1000 queries (works well for spreading the work over a cluster).
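The batching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `batch_NNNN.fa` naming scheme, and the output directory are hypothetical, and the input is assumed to be a plain FASTA file in which every record starts with a `>` header line.

```python
from pathlib import Path
from typing import Iterator, List


def read_fasta_records(path: Path) -> Iterator[str]:
    """Yield each FASTA record (header plus sequence lines) as one string."""
    record: List[str] = []
    with path.open() as handle:
        for line in handle:
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            if line.strip():
                # Make sure every kept line ends with a newline.
                record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def _write_batch(records: List[str], out_dir: Path, index: int) -> Path:
    """Write one batch of records to a numbered file and return its path."""
    path = out_dir / f"batch_{index:04d}.fa"
    path.write_text("".join(records))
    return path


def split_fasta(query: Path, out_dir: Path, batch_size: int = 1000) -> List[Path]:
    """Split a FASTA file into files of at most batch_size records each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    batch_paths: List[Path] = []
    batch: List[str] = []
    for record in read_fasta_records(query):
        batch.append(record)
        if len(batch) == batch_size:
            batch_paths.append(_write_batch(batch, out_dir, len(batch_paths)))
            batch = []
    if batch:  # leftover records that did not fill a whole batch
        batch_paths.append(_write_batch(batch, out_dir, len(batch_paths)))
    return batch_paths
```

Each resulting file could then be passed to the same blastx command, with `-query` pointing at the batch file and a distinct `-out` name per batch.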

      • #4
        Could the reason be that I use the XML output to extract information?
        Last edited by angeloulivieri; 09-28-2012, 12:24 AM.

        • #5
          Originally posted by angeloulivieri
          Could the reason be that I use the XML output to extract information?
          If you use the text or tabular output, you should see results written to the file while BLAST+ is running.
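Because the tabular output grows while the search runs, progress can be checked by reading the partial file. The sketch below assumes the default `-outfmt 6` column layout, in which the first tab-separated field of every line is the query ID. The function name and the file name in the usage note are hypothetical. Queries without any hit produce no line in tabular output, so the count is a lower bound on the number of queries processed.

```python
from pathlib import Path
from typing import Set


def queries_with_hits(tabular_path: Path) -> Set[str]:
    """Return the query IDs that already have at least one complete hit line."""
    seen: Set[str] = set()
    with tabular_path.open() as handle:
        for line in handle:
            if not line.endswith("\n"):
                # BLAST may still be writing this line; ignore the fragment.
                break
            if not line.strip() or line.startswith("#"):
                # Skip blank lines and the comment lines of -outfmt 7.
                continue
            seen.add(line.split("\t", 1)[0])
    return seen
```

For example, `len(queries_with_hits(Path("blastx_hits.tsv")))` reports how many distinct queries have hits so far.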

          • #6
            So only with these options, 6, 7 and 8?
            With the other formats, will my output be complete only at the end of the computation?

            • #7
              I've only noticed a problem of delayed output with the XML output format.

              • #8
                OK. The first time I ran blastx with these 40 thousand sequences it didn't give me anything, which seemed very strange to me. Now I'm trying a non-XML output format, because I only need to use some BioPerl functions to look at the results.

                Thanks

                • #9
                  Originally posted by angeloulivieri

                  Selenocysteine (U) at position 73 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Selenocysteine (U) at position 52 replaced by X
                  Selenocysteine (U) at position 48 replaced by X
                  Selenocysteine (U) at position 37 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Hi,
                  Just for information: according to http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml, these messages appear because
                  "For protein code, U is replaced by X first before the search since it is not specified in any scoring matrices."
                  So there is nothing wrong here. I don't know about the rest...

                  Best
                  Dario
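The substitution described in the quoted help text can be illustrated with a small sketch. It is not BLAST's own code: the function name is hypothetical, and the 1-based position numbering is an assumption, since the thread does not say how BLAST counts positions.

```python
from typing import List, Tuple


def mask_selenocysteine(protein: str) -> Tuple[str, List[int]]:
    """Replace every U (selenocysteine) with X and report where it occurred.

    Positions are 1-based, which is an assumption made for this sketch.
    """
    positions = [
        index
        for index, residue in enumerate(protein, start=1)
        if residue in ("U", "u")
    ]
    masked = protein.replace("U", "X").replace("u", "x")
    return masked, positions


masked, positions = mask_selenocysteine("MKUAU")
for position in positions:
    # Same wording as the messages quoted in the first post.
    print(f"Selenocysteine (U) at position {position} replaced by X")
```

The masked sequence differs from the original only at the reported positions, which is why the messages are informational and not errors.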

                  • #10
                    Thank you Dario
