Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to change Bowtie settings under TopHat

    Dear all

    After setting bowtie and tophat under a path I ran tophat on 20MB fastq reads.
    And I found only few aligned reads and junctions in output. When I checked logs the results made me suspicious.

    After that
    when i ran just default bowtie command, 90% of my reads are failed to align. Does any one know why ?

    If I need to change any parameters in bowtie where and how could I change them so that tophat recognizes them every time when i ran tophat?

    My reads are 35bp single end illumina-solexa fastq reads (hg18). It would be great if you provide any parameters if I need to change in bowtie especially for these kinda reads.

    Thanx in advance
    Last edited by repinementer; 08-09-2010, 01:36 AM.

  • #2
    You may not be able to do it as TopHat needs Bowtie to run in a specific way. The only way to crack is to re-compile TopHat, changing the way it calls Bowtie, and you're at the risk of crashing TopHat.

    Comment


    • #3
      Actually, the 'tophat' command itself is a python script that could potentially be edited if you wanted to hack at that. Save a copy first in case a change makes it stop working.

      Comment


      • #4
        Here is the "call bowtie" part in tophat python script. From what I can see (by a quick search), only -k, -m, -v (or -n) are set (see the red part below), which means only output the reads that have mismatch number <= a specific number (set by -n or -v) and total number of qualified reads <= maxhit (set by -k and -m, in tophat it's -g/--max-multihits). So, by default, tophat call bowtie as "bowtie -v N -k 20 -m 20". You can change it if you want to use parameters like "-a --best --strata" (be aware that -a and -k are mutually exclusive).


        # Call bowtie
        def bowtie(params,
        bwt_idx_prefix,
        reads_list,
        reads_format,
        num_mismatches,
        mapped_reads,
        unmapped_reads,
        extra_output = "",
        t_mapping = False,
        multihits_out = None): #only --prefilter-multihits should activate this parameter for the initial prefilter search
        start_time = datetime.now()
        bwt_idx_name = bwt_idx_prefix.split('/')[-1]
        reads_file=reads_list[0]
        readfile_basename=getFileBaseName(reads_file)
        if t_mapping:
        print >> sys.stderr, "[%s] Mapping %s against transcriptome %s with Bowtie %s" % (start_time.strftime("%c"),
        readfile_basename, bwt_idx_name, extra_output)
        else:
        print >> sys.stderr, "[%s] Mapping %s against %s with Bowtie %s" % (start_time.strftime("%c"),
        readfile_basename, bwt_idx_name, extra_output)
        bwt_log = open(logging_dir + 'bowtie.'+readfile_basename+'.fixmap.log', "w")
        #bwt_mapped=mapped_reads
        unmapped_reads_out=unmapped_reads
        if use_BWT_FIFO:
        global unmapped_reads_fifo
        if os.path.exists(unmapped_reads_fifo):
        os.remove(unmapped_reads_fifo)
        try:
        os.mkfifo(unmapped_reads_fifo)
        except OSError, o:
        die(fail_str+"Error at mkfifo("+unmapped_reads_fifo+'). '+str(o))
        unmapped_reads_out=unmapped_reads_fifo

        # Launch Bowtie
        try:
        bowtie_cmd = [bowtie_path]

        if reads_format == "fastq":
        bowtie_cmd += ["-q"]
        elif reads_format == "fasta":
        bowtie_cmd += ["-f"]

        if params.read_params.color:

        bowtie_cmd += ["-C", "--col-keepends"]
        unzip_cmd=None
        if use_zpacker:
        unzip_cmd=[ params.system_params.zipper ]
        unzip_cmd.extend(params.system_params.zipper_opts)
        zip_cmd=unzip_cmd[:]
        unzip_cmd+=['-cd']
        zip_cmd+=['-c']

        fifo_pid=None
        if unmapped_reads:
        bowtie_cmd += ["--un", unmapped_reads_out]

        if use_BWT_FIFO:
        print >> run_log, ' '.join(zip_cmd)+' < '+ unmapped_reads_fifo + ' > '+ unmapped_reads + ' & '
        fifo_pid=os.fork()
        if fifo_pid==0:
        def on_sig_exit(sig, func=None):
        os._exit(os.EX_OK)
        signal.signal(signal.SIGTERM, on_sig_exit)
        subprocess.call(zip_cmd,
        stdin=open(unmapped_reads_fifo, "r"),
        stdout=open(unmapped_reads, "wb"))
        os._exit(os.EX_OK)

        fix_map_cmd = [prog_path('fix_map_ordering')]

        max_hits = params.max_hits
        if t_mapping:
        max_hits = params.t_max_hits
        if params.bowtie_alignment_option == "-n" and num_mismatches>3:
        num_mismatches=3
        bowtie_cmd += [params.bowtie_alignment_option, str(num_mismatches),
        "-p", str(params.system_params.num_cpus),
        "-k", str(max_hits),
        "-m", str(max_hits)]

        if multihits_out:
        bowtie_cmd += ["--max", multihits_out]
        else:
        bowtie_cmd += ["--max", "/dev/null"]


        bowtie_cmd += [ bwt_idx_prefix ]
        bowtie_proc=None
        shellcmd=""
        unzip_proc=None
        zip_proc=None
        if multihits_out:
        #special prefilter bowtie run: we use prep_reads on the fly
        #in order to get multi-mapped reads to exclude later
        prep_cmd = prep_reads_cmd(params, ",".join(reads_list))
        preplog_fname=logging_dir + "prep_reads_prefilter.log"
        prepfilter_log = open(preplog_fname,"w")
        unzip_proc = subprocess.Popen(prep_cmd,
        stdout=subprocess.PIPE,
        stderr=prepfilter_log)
        shellcmd=' '.join(prep_cmd) + "|"
        else:
        z_input=use_zpacker and reads_file.endswith(".z")
        if z_input:
        unzip_proc = subprocess.Popen(unzip_cmd,
        stdin=open(reads_file, "rb"),
        stdout=subprocess.PIPE)
        shellcmd=' '.join(unzip_cmd) + "< " +reads_file +"|"
        else:
        #must be uncompressed fastq input (e.g. unmapped reads from a previous run)
        bowtie_cmd += [reads_file]
        bowtie_proc = subprocess.Popen(bowtie_cmd,
        stdout=subprocess.PIPE,
        stderr=bwt_log)
        if unzip_proc:
        #input is compressed OR prep_reads is used as a filter
        bowtie_cmd += ['-']
        bowtie_proc = subprocess.Popen(bowtie_cmd,
        stdin=unzip_proc.stdout,
        stdout=subprocess.PIPE,
        stderr=bwt_log)
        unzip_proc.stdout.close() # see http://bugs.python.org/issue7678

        fix_map_cmd += ['-']
        shellcmd += ' '.join(bowtie_cmd) + '|' + ' '.join(fix_map_cmd)
        if use_zpacker:
        shellcmd += "|"+ ' '.join(zip_cmd)
        fix_order_proc = subprocess.Popen(fix_map_cmd,
        stdin=bowtie_proc.stdout,

        stdout=subprocess.PIPE)
        zip_proc = subprocess.Popen(zip_cmd,
        preexec_fn=subprocess_setup,
        stdin=fix_order_proc.stdout,
        stdout=open(mapped_reads, "wb"))
        fix_order_proc.stdout.close()
        else:
        fix_order_proc = subprocess.Popen(fix_map_cmd,
        preexec_fn=subprocess_setup,
        stdin=bowtie_proc.stdout,
        stdout=open(mapped_reads, "w"))
        bowtie_proc.stdout.close()
        shellcmd += " > " + mapped_reads
        print >> run_log, shellcmd
        if use_zpacker:
        zip_proc.communicate()
        else:
        fix_order_proc.communicate()
        if use_BWT_FIFO:
        if fifo_pid and not os.path.exists(unmapped_reads):
        try:
        os.kill(fifo_pid, signal.SIGTERM)
        except:
        pass
        except OSError, o:
        die(fail_str+"Error: "+str(o))

        # Success
        #finish_time = datetime.now()
        #duration = finish_time - start_time
        #print >> sys.stderr, "\t\t\t[%s elapsed]" % formatTD(duration)
        if use_BWT_FIFO:
        try:
        os.remove(unmapped_reads_fifo)
        except:
        pass
        if multihits_out and not os.path.exists(multihits_out):
        open(multihits_out, "w").close()
        return (mapped_reads, unmapped_reads)
        Last edited by sterding; 04-04-2012, 07:52 AM.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM
        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 06:37 PM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, Yesterday, 06:07 PM
        0 responses
        9 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-22-2024, 10:03 AM
        0 responses
        50 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-21-2024, 07:32 AM
        0 responses
        67 views
        0 likes
        Last Post seqadmin  
        Working...
        X