  • bowtie 80 mer-long mapping

    Hi,

    I am trying to map relatively long reads (80-base single-end reads, FASTQ format) to hg19 using bowtie.
    Can anybody tell me which parameters should be modified for this kind of mapping?
    I used:
    bowtie -p 4 --best --strata -m 1 --sam /index_hg19 -q 80_mer_read.fastq output.sam

    I may have forgotten some important parameters that need to be changed.

    Thanks,

  • #2
    80 bp is not especially long, I would say.
    Which instrument was used to generate the data?
    Did you run a quality check, e.g. FastQC?

    • #3
      Thanks volks,
      This is Illumina GAIIx data.
      I just want to know whether I need to add or change any parameters based on the read length.
      Can I use the same command line for mapping 80 bp reads that I used for 35 bp reads?
      Maybe I'm wrong, but I feel some parameters should change with the read length.

      • #4
        Hi kenosaki,

        No need to adjust bowtie parameters for read length.

        Douglas

        • #5
          I've found a big improvement from trimming the reads when aligning some of the longer Illumina reads. Here's a zsh function that I've used for clipping everything at the right end of a read that falls below a certain quality.


          [ord.awk

          # ord.awk --- do ord and chr
          # taken from the gawk texinfo manual
          # therefore, this may be covered by the GNU Free Documentation License
          # the GFDL still allows commercial redistribution, however

          # Global identifiers:
          # _ord_: numerical values indexed by characters
          # _ord_init: function to initialize _ord_
          BEGIN { _ord_init() }

          function _ord_init(    low, high, i, t)
          {
              low = sprintf("%c", 7) # BEL is ascii 7
              if (low == "\a") {     # regular ascii
                  low = 0
                  high = 127
              } else if (sprintf("%c", 128 + 7) == "\a") {
                  # ascii, mark parity
                  low = 128
                  high = 255
              } else {               # ebcdic(!)
                  low = 0
                  high = 255
              }

              for (i = low; i <= high; i++) {
                  t = sprintf("%c", i)
                  _ord_[t] = i
              }
          }

          function ord(str,    c)
          {
              # only first character is of interest
              c = substr(str, 1, 1)
              return _ord_[c]
          }

          function chr(c)
          {
              # force c to be numeric by adding 0
              return sprintf("%c", c + 0)
          }

          #### test code ####
          # BEGIN    \
          # {
          #     for (;;) {
          #         printf("enter a character: ")
          #         if (getline var <= 0)
          #             break
          #         printf("ord(%s) = %d\n", var, ord(var))
          #     }
          # }

          ]//

          [trimReadsRaw
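          #! /usr/bin/zsh

          # the 'raw' version of this doesn't subtract 33 from the (raw)
          # qualities, so $1 must be given as a raw ASCII code
          # also, it only trims the back end part of the read (not the front)
          # cuts off everything from the end less than $1
          # reads FASTQ on stdin, writes trimmed FASTQ on stdout
          # needs GNU awk (for --source) and ord.awk in the current directory
          function trimReadsRaw() {
              local thresh=$1
              awk -f ord.awk -v thresh="$thresh" \
                  --source '{
                      name = $0                # header line
                      getline; read = $0       # sequence
                      getline; strand = $0     # separator line
                      getline; qual = $0       # quality string
                      len = length(qual)
                      start = 1; minEnd = start + 20; end = 0
                      # scan from the right end for the first base at or above thresh
                      for (i = len; i >= minEnd; i--) {
                          if (ord(substr(qual, i, 1)) >= thresh + 0) { end = i; break }
                      }
                      # skip reads that would be shorter than 21 bases after trimming
                      if ((end - start) < 20) { next }
                      print name
                      print substr(read, start, end - start + 1)
                      print strand
                      print substr(qual, start, end - start + 1)
                  }' --
          }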

          ]//trimReadsRaw
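
A hedged usage sketch for the function above: because the 'raw' version compares the ASCII code of each quality character, the threshold has to include the quality-encoding offset. The helper name `raw_threshold` and the file names are hypothetical and not part of the original post. Whether the offset is 33 or 64 depends on the pipeline version that produced the GAIIx data, so the value used in the commented call is an assumption to check against your own files.

```shell
# Hypothetical helper: convert a Phred score into the raw ASCII threshold
# that trimReadsRaw expects (the function does not subtract the offset itself).
raw_threshold() {
    phred=$1    # desired minimum Phred quality, e.g. 20
    offset=$2   # 33 for Sanger-style FASTQ, 64 for older Illumina FASTQ
    echo $((phred + offset))
}

raw_threshold 20 33   # prints 53
raw_threshold 20 64   # prints 84

# Assumed usage (placeholder file names; trimReadsRaw reads FASTQ on stdin):
# trimReadsRaw "$(raw_threshold 20 64)" < 80_mer_read.fastq > 80_mer_read.trimmed.fastq
```

Passing the wrong offset shifts the effective cutoff by 31 quality units, so it is worth confirming the encoding (for example from the FastQC report) before trimming.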

          • #6
            If you have genomic data I would use another aligner, because bowtie can't deal with indels. BWA is a good example, as is Novoalign.

            For transcriptome data you could try adjusting the following settings.

            -n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
            -e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
            -l/--seedlen <int> seed length for -n (default: 28)

            e.g. -n 3 -e 100 -l 40
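
As a sketch of where these options go, the example settings above can be added to the command from the first post. The index path and file names are the ones from that post and act as placeholders here; the variable names are hypothetical. This only illustrates option placement and is not a tuned recommendation for 80 bp reads.

```shell
# Seed settings suggested above; the remaining options, index path, and file
# names are copied from the command in the first post (placeholders here).
seed_opts="-n 3 -e 100 -l 40"
bowtie_cmd="bowtie -p 4 --best --strata -m 1 $seed_opts --sam /index_hg19 -q 80_mer_read.fastq output.sam"

# Print the assembled command for inspection before running it.
echo "$bowtie_cmd"

# To run it (assumes bowtie 1 is on PATH and the index and reads exist):
# eval "$bowtie_cmd"
```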
