  • Help getting fastq_filter.py working on command line?

    I want to use Galaxy's fastq_filter tool on the command line.

    Basically, I already know which inputs fastq_filter.py requires, but I'm not sure how to generate two of them.

    After reading the Python and XML files, you learn that the tool expects to be run with a line something like this:
    Code:
    fastq_filter.py $input_file $fastq_filter_file $output_file $output_file.files_path '${input_file.extension[len( 'fastq' ):]}'
    • $input_file
    • $fastq_filter_file: I don't know how to make this.
    • $output_file
    • $output_file.files_path: I don't know what this is or how to avoid it.
    • ${input_file.extension[len( 'fastq' ):]}: This seems to be a check on the input file type? Not going to worry about this for now.
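    For what it's worth, the slice in that last argument can be checked in plain Python. This is only a sketch: it assumes the extension is a Galaxy datatype name such as fastqsanger, fastqillumina or fastqsolexa, and the helper name fastq_variant is made up for illustration. The slice drops the leading 'fastq' and keeps whatever follows, so it looks more like a way of passing the quality-score variant than a type check.

```python
PREFIX = "fastq"


def fastq_variant(extension):
    """Return what is left of a datatype extension after the 'fastq' prefix."""
    # Same expression as in the command line: extension[len('fastq'):]
    return extension[len(PREFIX):]


# Example extensions are assumptions about what Galaxy would pass in.
variants = {
    ext: fastq_variant(ext)
    for ext in ("fastq", "fastqsanger", "fastqillumina", "fastqsolexa")
}
print(variants)
```

    Note that a plain 'fastq' extension leaves an empty string, so the final argument can be empty.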


    The fastq_filter.ply is interesting. It contains something like this:
    Code:
    def fastq_read_pass_filter( fastq_read ):
         def mean( score_list ):
             return float( sum( score_list ) ) / float( len( score_list ) )
         if len( fastq_read ) < $min_size:
             return False
         if $max_size > 0 and len( fastq_read ) > $max_size:
             return False
         num_deviates = $max_num_deviants
         qual_scores = fastq_read.get_decimal_quality_scores()
         for qual_score in qual_scores:
             if qual_score < $min_quality or ( $max_quality > 0 and qual_score > $max_quality ):
                 if num_deviates == 0:
                     return False
                 else:
                     num_deviates -= 1
     #if not $paired_end:
         qual_scores_split = [ qual_scores ]
     #else:
         qual_scores_split = [ qual_scores[ 0:int( len( qual_scores ) / 2 ) ], qual_scores[ int( len( qual_scores ) / 2 ): ] ]
     #end if
     #for $fastq_filter in $fastq_filters:
         for split_scores in qual_scores_split:
             left_column_offset = $fastq_filter[ 'offset_type' ][ 'left_column_offset' ]
             right_column_offset = $fastq_filter[ 'offset_type' ][ 'right_column_offset' ]
     #if $fastq_filter[ 'offset_type' ]['base_offset_type'] == 'offsets_percent':
             left_column_offset = int( round( float( left_column_offset ) / 100.0 * float( len( split_scores ) ) ) )
             right_column_offset = int( round( float( right_column_offset ) / 100.0 * float( len( split_scores ) ) ) )
     #end if
             if right_column_offset > 0:
                 split_scores = split_scores[ left_column_offset:-right_column_offset]
             else:
                 split_scores = split_scores[ left_column_offset:]
             if split_scores: ##if a read doesn't have enough columns, it passes by default
                 if not ( ${fastq_filter[ 'score_operation' ]}( split_scores ) $fastq_filter[ 'score_comparison' ] $fastq_filter[ 'score' ]  ):
                     return False
     #end for
         return True
    Is that Python? Is this how the XML turns user input into a filter script? Someone suggested I use the Galaxy API for this, but that might be just as much work to set up as getting this script to run. I'm not opposed to it, but I want the easy way out, because I think this is the last Galaxy tool I have to run in my analysis before I move on to other things.

    Any help and assistance would be appreciated.
    Last edited by hlyates; 03-27-2015, 06:12 AM. Reason: Added tags
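
    The offset handling in the quoted code can be traced by hand. The sketch below copies the arithmetic from the snippet into an ordinary function; the name trim_window, its percent flag and the sample scores are made up for illustration and are not part of the tool.

```python
def trim_window(split_scores, left, right, percent=False):
    """Apply left/right column offsets the way the quoted snippet does."""
    if percent:
        # Percent offsets become column counts relative to the read length.
        left = int(round(float(left) / 100.0 * float(len(split_scores))))
        right = int(round(float(right) / 100.0 * float(len(split_scores))))
    if right > 0:
        return split_scores[left:-right]
    # A right offset of 0 needs its own branch: [left:-0] would be empty.
    return split_scores[left:]


scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
by_columns = trim_window(scores, 2, 3)                  # drop 2 left, 3 right
by_percent = trim_window(scores, 10, 20, percent=True)  # 10% left, 20% right
no_right_trim = trim_window(scores, 1, 0)               # drop 1 left only
```

    If the trimmed window comes back empty, the snippet skips the comparison, which is why a read without enough columns passes by default.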

  • #2
    The development repository is here:
    galaxyproject/tools-devteam (a set of Galaxy tools mostly written by the Galaxy Team)


    Correction: the code you quoted is from the <configfile> section of the XML; it is written in Cheetah, a Python-like templating language.
    Last edited by maubp; 03-29-2015, 09:20 AM. Reason: correction
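
    To illustrate what the template turns into, here is a rough sketch that fills in the directive-free part of the quoted snippet and runs the result. It makes several assumptions. Python's standard string.Template stands in for Cheetah and only handles plain $name placeholders, not the #if / #for directives. The parameter values are invented. StubRead is a hypothetical stand-in for the read object. The sketch also assumes that fastq_filter.py loads the rendered file as Python and calls fastq_read_pass_filter on each read.

```python
from string import Template

# Directive-free part of the quoted template. string.Template is a stand-in
# for Cheetah here and understands plain $name placeholders only.
TEMPLATE = Template('''\
def fastq_read_pass_filter( fastq_read ):
    if len( fastq_read ) < $min_size:
        return False
    if $max_size > 0 and len( fastq_read ) > $max_size:
        return False
    num_deviates = $max_num_deviants
    qual_scores = fastq_read.get_decimal_quality_scores()
    for qual_score in qual_scores:
        if qual_score < $min_quality or ( $max_quality > 0 and qual_score > $max_quality ):
            if num_deviates == 0:
                return False
            else:
                num_deviates -= 1
    return True
''')

# Hypothetical values, as a user might enter them in the tool form.
params = dict(min_size=4, max_size=0, min_quality=20, max_quality=0,
              max_num_deviants=1)
rendered = TEMPLATE.substitute(params)


class StubRead:
    """Hypothetical stand-in for the read object the real script passes in."""

    def __init__(self, scores):
        self.scores = scores

    def __len__(self):
        return len(self.scores)

    def get_decimal_quality_scores(self):
        return self.scores


namespace = {}
exec(rendered, namespace)  # the rendered text is ordinary Python source
read_passes = namespace["fastq_read_pass_filter"]

good = read_passes(StubRead([30, 30, 10, 30]))  # one low score, one allowed
bad = read_passes(StubRead([30, 10, 10, 30]))   # two low scores, one allowed
short = read_passes(StubRead([30, 30]))         # shorter than min_size
print(good, bad, short)
```

    If those assumptions hold, a hand-written file of this shape might serve as the $fastq_filter_file argument, but that would need checking against the actual script.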
