  • retrieving <Hit_def> information from XML output

    Greetings, I am trying to retrieve the putative taxonomic identifications of 16S/18S rRNA genes from a HiSeq Illumina run using BLASTN (from the BLAST+ package). Thus far I have been relying on Biopython to parse the data. I'm able to retrieve the e-values for each query, alignment lengths and such for all of the hits using commands like the ones below.

    ###############################################
    >>> from Bio.Blast import NCBIXML
    >>> blast = NCBIXML.parse(open('16SxmlResults', 'rU'))
    >>> for record in blast:
    ...     print record.alignments[0].hsps[0].score
    ###############################################

    The above prints the score of the first high-scoring pair (HSP) of the top hit for each query to standard output.
    However, the piece of information I can't seem to access is located in the <Hit_def> element. It looks like this:

    <Hit_def>JR951091.270.2233 Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;mitochondria;Pisum sativum (pea)

    I have looked into the Biopython Bio.Blast.Record documentation, as well as the tutorial, and can't find any way to retrieve this information. I have also tried using ElementTree to parse the data. This works, but I'm having a hard time "looping" through the whole file (there are ~4000 entries).

    If anyone has any suggestions or can provide some guidance, I would sincerely appreciate it. Thanks,

    -Tony

  • #2
    You want the alignment's hit_def attribute, e.g.

    Code:
    from Bio.Blast import NCBIXML
    blast = NCBIXML.parse(open('16SxmlResults', 'rU'))
    for record in blast:
        for align in record.alignments:
            for hsp in align.hsps:
                print hsp.score, align.hit_def
    Tip: Explore dir(x) and help(x) at the Python prompt where x is an unfamiliar class.
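
For readers on Python 3, or without Biopython installed, the same loop can be sketched with the standard library's xml.etree.ElementTree, which the original question mentions. This is only a minimal sketch: it assumes the usual BLAST+ XML element names (Iteration, Iteration_query-def, Hit, Hit_def, Hsp, Hsp_bit-score), and the function name iter_hit_defs is made up for illustration.

```python
import xml.etree.ElementTree as ET


def iter_hit_defs(xml_source):
    """Yield (query_def, hit_def, bit_score) for every HSP in a BLAST XML file.

    xml_source may be a file name or an open binary file object.
    Element names are assumed to follow the standard BLAST+ XML layout.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        # Only act once a whole query (one <Iteration>) has been read.
        if elem.tag != "Iteration":
            continue
        query_def = elem.findtext("Iteration_query-def")
        for hit in elem.iter("Hit"):
            hit_def = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                yield query_def, hit_def, float(hsp.findtext("Hsp_bit-score"))
        elem.clear()  # release the finished query so memory use stays flat
```

Looping over iter_hit_defs('16SxmlResults') and printing each tuple should then cover all ~4000 queries without loading the whole file at once.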


    • #3
      I am very new to Python. The code above just prints the results; could you please tell me how to save them to a file (.csv or .txt)?

      Thanks


      • #4
        Easy way: When you run BLAST+ rather than asking for XML output with
        Code:
        -outfmt 5
        ask for tabular output with
        Code:
        -outfmt 6
        (or ask for CSV if you prefer).

        Hard way: Convert the BLAST XML into tabular format using a script like https://github.com/peterjc/galaxy_bl..._to_tabular.py
        Last edited by maubp; 12-09-2014, 08:41 AM. Reason: formatting
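
As a rough illustration of the "hard way", the sketch below writes one tab-separated row per HSP using only the Python 3 standard library. It is not the linked script and does not reproduce the -outfmt 6 column layout; the function name blast_xml_to_tsv and the four chosen columns are assumptions made for this example, as is the standard BLAST+ XML element naming.

```python
import csv
import xml.etree.ElementTree as ET


def blast_xml_to_tsv(xml_path, out_path):
    """Write query, hit_def, e-value and bit score for each HSP to a TSV file.

    Returns the number of data rows written (header excluded).
    """
    n_rows = 0
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["query", "hit_def", "evalue", "bit_score"])
        for _event, iteration in ET.iterparse(xml_path, events=("end",)):
            if iteration.tag != "Iteration":
                continue
            query = iteration.findtext("Iteration_query-def")
            for hit in iteration.iter("Hit"):
                hit_def = hit.findtext("Hit_def")
                for hsp in hit.iter("Hsp"):
                    writer.writerow([
                        query,
                        hit_def,
                        hsp.findtext("Hsp_evalue"),
                        hsp.findtext("Hsp_bit-score"),
                    ])
                    n_rows += 1
            iteration.clear()  # drop the finished query from memory
    return n_rows
```

Calling blast_xml_to_tsv('16SxmlResults', 'hits.tsv') (the output name is arbitrary) gives a file that opens directly in a spreadsheet; passing delimiter="," to csv.writer instead would produce CSV.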


        • #5
          Originally posted by maubp
          You want the alignment's hit_def attribute, e.g.

          Code:
          from Bio.Blast import NCBIXML
          blast = NCBIXML.parse(open('16SxmlResults', 'rU'))
          for record in blast:
              for align in record.alignments:
                  for hsp in align.hsps:
                      print hsp.score, align.hit_def
          Tip: Explore dir(x) and help(x) at the Python prompt where x is an unfamiliar class.
          What does 'rU' refer to? A second input file?


          • #6
            Originally posted by bernardo_bello
            What does 'rU' refer to? A second input file?
            No, it is the file mode. 'r' opens the file for reading. As for 'U', the Python 2 documentation for open() says:

            Python is usually built with universal newlines support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program.
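
On Python 3 this translation is already the default for text mode, and the 'U' flag itself was deprecated and then removed in Python 3.11, so plain 'r' gives the behaviour described above. A small sketch that shows the effect (the temporary file and its contents are made up for the demonstration):

```python
import os
import tempfile

# Write raw bytes containing all three line-ending conventions.
path = os.path.join(tempfile.mkdtemp(), "mixed_newlines.txt")
with open(path, "wb") as handle:
    handle.write(b"unix\nold_mac\rwindows\r\nend")

# Text mode with the default newline=None translates every ending to '\n'.
with open(path, "r") as handle:
    translated = handle.read()

# newline="" switches translation off, so the original endings survive.
with open(path, "r", newline="") as handle:
    untranslated = handle.read()

print(repr(translated))
print(repr(untranslated))
```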
