  • Help with Alignment using Bowtie

    Hi everyone,

    I am extremely new to bioinformatics; in fact, I'm only a rising sophomore. I have very little knowledge of how to perform alignments and, ultimately, call peaks on my ChIP-seq data. I downloaded an .sra file from NCBI and used the SRA Toolkit to convert it to FASTQ. I then checked the file with FastQC, which reported that it was of very good quality. My next step is alignment, but I don't know how to go about it. I experimented in the terminal on my computer to see how I could open my FASTQ file with Bowtie, but I don't really understand it. I'll paste what I have so far.

    sakeths-mbp:desktop sakethjayanthi$ ls -la
    total 8537728
    drwx------+ 8 sakethjayanthi staff 272 Jul 15 09:11 .
    drwxr-xr-x+ 26 sakethjayanthi staff 884 Jul 11 13:52 ..
    -rw-r--r--@ 1 sakethjayanthi staff 6148 Jul 15 09:12 .DS_Store
    -rw-r--r-- 1 sakethjayanthi staff 0 Jul 23 2013 .localized
    -rw-r--r--@ 1 sakethjayanthi staff 3968605302 Jun 19 10:55 SRR1171521.fastq
    -rw-r--r--@ 1 sakethjayanthi staff 402698553 Jun 16 10:37 SRR1171521.sra
    drwxr-xr-x@ 22 sakethjayanthi staff 748 Jun 27 13:44 bowtie-1.0.1
    drwxr-xr-x@ 20 sakethjayanthi staff 680 Jul 15 09:12 bowtie2-2.1.0
    sakeths-mbp:desktop sakethjayanthi$ bowtie2
    No index, query, or output file specified!
    Bowtie 2 version 2.1.0 by Ben Langmead ([email protected], www.cs.jhu.edu/~langmea)
    Usage:
    bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

    <bt2-idx> Index filename prefix (minus trailing .X.bt2).
    NOTE: Bowtie 1 and Bowtie 2 indexes are not compatible.
    <m1> Files with #1 mates, paired with files in <m2>.
    Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
    <m2> Files with #2 mates, paired with files in <m1>.
    Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
    <r> Files with unpaired reads.
    Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
    <sam> File for SAM output (default: stdout)

    <m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
    specified many times. E.g. '-U file1.fq,file2.fq -U file3.fq'.

    Options (defaults in parentheses):

    Input:
    -q query input files are FASTQ .fq/.fastq (default)
    --qseq query input files are in Illumina's qseq format
    -f query input files are (multi-)FASTA .fa/.mfa
    -r query input files are raw one-sequence-per-line
    -c <m1>, <m2>, <r> are sequences themselves, not files
    -s/--skip <int> skip the first <int> reads/pairs in the input (none)
    -u/--upto <int> stop after first <int> reads/pairs (no limit)
    -5/--trim5 <int> trim <int> bases from 5'/left end of reads (0)
    -3/--trim3 <int> trim <int> bases from 3'/right end of reads (0)
    --phred33 qualities are Phred+33 (default)
    --phred64 qualities are Phred+64
    --int-quals qualities encoded as space-delimited integers

    Presets: Same as:
    For --end-to-end:
    --very-fast -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
    --fast -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
    --sensitive -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)
    --very-sensitive -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

    For --local:
    --very-fast-local -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
    --fast-local -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
    --sensitive-local -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)
    --very-sensitive-local -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

    Alignment:
    -N <int> max # mismatches in seed alignment; can be 0 or 1 (0)
    -L <int> length of seed substrings; must be >3, <32 (22)
    -i <func> interval between seed substrings w/r/t read len (S,1,1.15)
    --n-ceil <func> func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)
    --dpad <int> include <int> extra ref chars on sides of DP table (15)
    --gbar <int> disallow gaps within <int> nucs of read extremes (4)
    --ignore-quals treat all quality values as 30 on Phred scale (off)
    --nofw do not align forward (original) version of read (off)
    --norc do not align reverse-complement version of read (off)

    --end-to-end entire read must align; no clipping (on)
    OR
    --local local alignment; ends might be soft clipped (off)

    Scoring:
    --ma <int> match bonus (0 for --end-to-end, 2 for --local)
    --mp <int> max penalty for mismatch; lower qual = lower penalty (6)
    --np <int> penalty for non-A/C/G/Ts in read/ref (1)
    --rdg <int>,<int> read gap open, extend penalties (5,3)
    --rfg <int>,<int> reference gap open, extend penalties (5,3)
    --score-min <func> min acceptable alignment score w/r/t read length
    (G,20,8 for local, L,-0.6,-0.6 for end-to-end)

    Reporting:
    (default) look for multiple alignments, report best, with MAPQ
    OR
    -k <int> report up to <int> alns per read; MAPQ not meaningful
    OR
    -a/--all report all alignments; very slow, MAPQ not meaningful

    Effort:
    -D <int> give up extending after <int> failed extends in a row (15)
    -R <int> for reads w/ repetitive seeds, try <int> sets of seeds (2)

    Paired-end:
    -I/--minins <int> minimum fragment length (0)
    -X/--maxins <int> maximum fragment length (500)
    --fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (--fr)
    --no-mixed suppress unpaired alignments for paired reads
    --no-discordant suppress discordant alignments for paired reads
    --no-dovetail not concordant when mates extend past each other
    --no-contain not concordant when one mate alignment contains other
    --no-overlap not concordant when mates overlap at all

    Output:
    -t/--time print wall-clock time taken by search phases
    --un <path> write unpaired reads that didn't align to <path>
    --al <path> write unpaired reads that aligned at least once to <path>
    --un-conc <path> write pairs that didn't align concordantly to <path>
    --al-conc <path> write pairs that aligned concordantly at least once to <path>
    (Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
    --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
    --quiet print nothing to stderr except serious errors
    --met-file <path> send metrics to file at <path> (off)
    --met-stderr send metrics to stderr (off)
    --met <int> report internal counters & metrics every <int> secs (1)
    --no-head supppress header lines, i.e. lines starting with @
    --no-sq supppress @SQ header lines
    --rg-id <text> set read group id, reflected in @RG line and RG:Z: opt field
    --rg <text> add <text> ("lab:value") to @RG line of SAM header.
    Note: @RG line only printed when --rg-id is set.
    --omit-sec-seq put '*' in SEQ and QUAL fields for secondary alignments.

    Performance:
    -o/--offrate <int> override offrate of index; must be >= index's offrate
    -p/--threads <int> number of alignment threads to launch (1)
    --reorder force SAM output order to match order of input reads
    --mm use memory-mapped I/O for index; many 'bowtie's can share

    Other:
    --qc-filter filter out reads that are bad according to QSEQ filter
    --seed <int> seed for random number generator (0)
    --non-deterministic seed rand. gen. arbitrarily instead of using read attributes
    --version print version information and quit
    -h/--help print this usage message
    bowtie2-align exited with value 1
    sakeths-mbp:desktop sakethjayanthi$ ls -la
    total 8537728
    drwx------+ 8 sakethjayanthi staff 272 Jul 15 09:11 .
    drwxr-xr-x+ 26 sakethjayanthi staff 884 Jul 11 13:52 ..
    -rw-r--r--@ 1 sakethjayanthi staff 6148 Jul 15 09:12 .DS_Store
    -rw-r--r-- 1 sakethjayanthi staff 0 Jul 23 2013 .localized
    -rw-r--r--@ 1 sakethjayanthi staff 3968605302 Jun 19 10:55 SRR1171521.fastq
    -rw-r--r--@ 1 sakethjayanthi staff 402698553 Jun 16 10:37 SRR1171521.sra
    drwxr-xr-x@ 22 sakethjayanthi staff 748 Jun 27 13:44 bowtie-1.0.1
    drwxr-xr-x@ 20 sakethjayanthi staff 680 Jul 15 09:12 bowtie2-2.1.0


    How can I align my file? Thank you so much!

  • #2
    Before starting with your sample file, it may be worthwhile to follow the tutorial located here: http://bowtie-bio.sourceforge.net/bo...-phage-example

    You are going to have to find the relevant genome indexes. The best place to get premade ones is the iGenomes site. The download is large, but you get the whole package: sequence, annotation, and indexes. http://support.illumina.com/sequenci...e/igenome.ilmn
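
    As a rough sketch of what the alignment step looks like once an index is on disk: the snippet below assembles a single-end bowtie2 call from the -x, -U, -S and -p options listed in the usage text in the first post. The index location, output name and thread count are placeholders, not values from this thread, and it assumes the reads are single-end because only one FASTQ file is present. It only prints the command unless RUN=1 is set.

    ```shell
    # Hypothetical paths: adjust to wherever the index and reads actually live.
    INDEX_PREFIX="${INDEX_PREFIX:-$HOME/genomes/human/Bowtie2Index/genome}"  # index prefix, minus trailing .X.bt2
    READS="${READS:-SRR1171521.fastq}"     # FASTQ file from the first post
    OUT_SAM="${OUT_SAM:-SRR1171521.sam}"   # assumed output name
    THREADS="${THREADS:-2}"                # assumed thread count

    # Print a command; execute it only when RUN=1.
    run_or_show() {
        echo "+ $*"
        if [ "${RUN:-0}" = "1" ]; then
            "$@"
        fi
    }

    # -x index prefix, -U unpaired reads, -S SAM output, -p alignment threads
    run_or_show bowtie2 -p "$THREADS" -x "$INDEX_PREFIX" -U "$READS" -S "$OUT_SAM"
    ```

    Run it as is to check the printed command, then again with RUN=1 once the paths point at real files.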

    • #3
      I've already read through the tutorial, but it was extremely confusing. I couldn't fully understand how to run an alignment. The tutorial talked a lot about the different things bowtie can do, but I just need to get my data aligned.

      • #4
        Reading through the tutorial is not sufficient, since you have said that you are new to bioinformatics. You need to actually follow the steps in the tutorial hands-on to understand the process.

        Fair warning: "just getting the data aligned" is not the right way to go about this. We can give you command lines for running the alignments, but if you don't understand what is happening, how can you be sure that the result you get is appropriate? If you still want to do that, you need to tell us which genome you want to align to.
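
        One basic sanity check, once a SAM file exists, is to look at how many reads actually aligned. The sketch below counts records by testing bit 0x4 of the SAM FLAG column (set when a read is unmapped). The function name is made up for illustration, and the count equals the number of reads only if each read has a single record, which is the default reporting mode for unpaired reads.

        ```shell
        # Summarise mapped vs. unmapped records in a SAM file.
        # FLAG is column 2; bit 0x4 set means the read did not align.
        sam_mapping_summary() {
            awk -F '\t' '
                /^@/ { next }                                   # skip header lines
                { total++; if (int($2 / 4) % 2 == 1) unmapped++ }
                END {
                    printf "total=%d mapped=%d unmapped=%d\n", total, total - unmapped, unmapped
                }' "$1"
        }
        ```

        Calling sam_mapping_summary on the aligner's output file prints one line of counts; a very low mapped fraction is a hint that the wrong genome or index was used.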

        • #5
          That's fair enough. I'm having a tough time even trying to go through the tutorial because I simply don't understand how/where I am supposed to issue certain commands in my terminal.

          I'm looking to align it to the human genome.

          • #6
            Go through part 1 of this tutorial to get yourself oriented on command-line use in a terminal: http://korflab.ucdavis.edu/Unix_and_...ent.html#part1 Luckily, the example there is for OS X, which is what you are using. It should not take you longer than 30 minutes.

            Start downloading the human genome files from the iGenomes site I linked in my previous post. It is going to take some time (~25 GB download). We are going to need those files.

            • #7
              Have you figured out the basics of the command line by now and managed to run through the tutorial hands-on?

              • #8
                I have. I'm just trying to find a way to align it to the human genome. The Bowtie download I have only includes the lambda virus genome, so I was able to align to that, but I'm confused about how to align to the human genome and where I can get that genome.

                • #9
                  Illumina provides premade bowtie2 indices for many organisms, including human. Note that they don't yet provide files for the most recent human reference build (GRCh38). If you want to align to that, just download the sequence from UCSC and use bowtie2-build.
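
                  A sketch of that route, assuming the GRCh38/hg38 FASTA is available as hg38.fa.gz under UCSC's goldenPath downloads (the URL below is an assumption; check the UCSC downloads page) and that your bowtie2-build wants an uncompressed FASTA. The file and index names are placeholders. Commands are only printed unless RUN=1 is set.

                  ```shell
                  GENOME_URL="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz"  # assumed URL
                  FASTA_GZ="hg38.fa.gz"   # name curl -O saves under (last part of the URL)
                  FASTA="hg38.fa"         # name after decompression
                  INDEX_BASE="hg38"       # hypothetical index prefix

                  # Print a command; execute it only when RUN=1.
                  run_or_show() {
                      echo "+ $*"
                      if [ "${RUN:-0}" = "1" ]; then
                          "$@"
                      fi
                  }

                  run_or_show curl -O "$GENOME_URL"                  # download the compressed FASTA
                  run_or_show gunzip "$FASTA_GZ"                     # leaves hg38.fa
                  run_or_show bowtie2-build "$FASTA" "$INDEX_BASE"   # writes hg38.*.bt2 index files
                  ```

                  The resulting prefix (hg38 here) is what goes after -x in the bowtie2 call. Expect the build step to take a good while for a genome of this size.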

                  • #10
                    I should note that the files from Illumina iGenomes are HUGE, since they contain more than just indices.
