Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Why 50

    Originally posted by blakeoft View Post
    You can use bamToBed from bedtools to convert from bam to bed format as you have mentioned. It's quite easy to adjust the start and end site of reads in bed format. I'm not an awk master, so there's probably a better way to do this, but it should work:

    Code:
    awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 50, $3 + 50, $4, $5, $6; else print $1, $2 - 50, $3 - 50, $4, $5, $6}' alignment.bed > alignment_shifted.bed
    That will shift the start and end forward by fifty base pairs if the strand is +, backward if the strand is -, and print fields 1 through 6. If you need more fields, just tack them on the end.


    I don't think 50 is the number to shift. In Jason's paper, they did "all reads aligning to the + strand were offset by +4 bp, and all reads aligning to the – strand were offset −5 bp". In fact, whether you shift or not, the distribution of reads will not significantly change.

    Comment


    • #17
      Originally posted by dariober View Post
      I was going to say this myself. I treated ATAC-Seq libraries pretty much like ChIP-Seq libraries.

      Trim adapters -> Align (bwa mem) -> Remove reads in blacklist regions if applicable (https://sites.google.com/site/anshul...cts/blacklists) -> Remove reads with MAPQ < 10 or there about -> Remove duplicates (debatable, but I tend to see better results w/o dups) -> call peaks (macs2, with mitochondrial reads removed, see below)

      We find 10s to 100s of thousands of peaks, quite sharp, often mapping to transcription start sites. A feature we find in our data and in the published one is the large number of reads mapping to the mitochondrial genome (up to 40-60%), but it can vary a lot. So I prefer to remove these reads before peak calling.
      Hi Dariober,

      I'm also analyzing ATAC-seq data. I did in a similar way to your procedure:

      1. Map to hg19 with bowtie2 (-X 2000).
      2. Remove reads with MAPQ < 10 and sort bam file using samtools.
      3. Remove duplicates using picard tools.
      4. Merge sorted bam of four replicates together.
      5. Get cutting position for each fragment and visualize cutting frequency (per basepair) in IGV.

      But the peaks I observed are quite noisy. I tried the data from Buenrostro 2013 Nature method paper. I got a good mapping rate (>90%) but a lot of duplicates (60%-70%). After removing duplicates, the per base cutting frequency becomes quite low. I was wondering if you observed a similar pattern for the public data. Would you have some suggestions on how I could improve the signal? Thank you!!

      Comment


      • #18
        Do not remove duplicates

        Originally posted by cloverliang View Post
        Hi Dariober,

        I'm also analyzing ATAC-seq data. I did in a similar way to your procedure:

        1. Map to hg19 with bowtie2 (-X 2000).
        2. Remove reads with MAPQ < 10 and sort bam file using samtools.
        3. Remove duplicates using picard tools.
        4. Merge sorted bam of four replicates together.
        5. Get cutting position for each fragment and visualize cutting frequency (per basepair) in IGV.

        But the peaks I observed are quite noisy. I tried the data from Buenrostro 2013 Nature method paper. I got a good mapping rate (>90%) but a lot of duplicates (60%-70%). After removing duplicates, the per base cutting frequency becomes quite low. I was wondering if you observed a similar pattern for the public data. Would you have some suggestions on how I could improve the signal? Thank you!!

        According to my experience, removing duplicates and trimming the reads to 1 bp (i.e, tracking the integration site) will both decrease the resolution. Removing duplicates will force the coverage to be no higher than the read length, thus compromise good separation of peaks and non-peaks ; trimming the reads to 1bp is equivalent to giving up Kernel functions, thus the peaks will look very narrow and nasty. If you do want to trim to 1bp, you'd better separate the track into plus and minus strands so that you can easily tell which peak is real and which is not based on tag shifting explained here http://www.genomebiology.com/content/9/9/R137

        Comment


        • #19
          Originally posted by resara View Post
          Thanks blakeoft, checking for negative indices and removing them is a good idea.

          I understand what you mean, and in most cases you can disregard the 0/1-based positioning, as it won't affect your shift (it will still shift it by 50bp).

          I was thinking that since bam is 0-based, if the conversion to bed changes the position to 1-based, that's a 1nt difference in position, which could be an issue if you're planning to do footprinting with the alignments.

          edit: sorry, looks like bed is 0-based, so should be fine
          Just fyi, SAM/BAM is 1-based

          Comment


          • #20
            Originally posted by baritone View Post
            According to my experience, removing duplicates and trimming the reads to 1 bp (i.e, tracking the integration site) will both decrease the resolution. Removing duplicates will force the coverage to be no higher than the read length, thus compromise good separation of peaks and non-peaks ; trimming the reads to 1bp is equivalent to giving up Kernel functions, thus the peaks will look very narrow and nasty. If you do want to trim to 1bp, you'd better separate the track into plus and minus strands so that you can easily tell which peak is real and which is not based on tag shifting explained here http://www.genomebiology.com/content/9/9/R137
            Thank you baritone!! I'm looking for TF binding footprints, which presumably have less transposase cuts compared to its flanking region. So I wanted to trim reads to the 1bp cutting position. The peaks I got are really shadow (as in the first track of attached figure). I was wondering if that's common pattern people observed in ATAC-seq data.
            Attached Files

            Comment


            • #21
              Dear all,

              I am currently analyzing my ATAC-seq data and I am wondering what is the best way to adjust the coordinates of alignments.

              I am aware of the solution proposed in this thread by blakeoft, which consists in converting bam file to bed, and to shift reads using awk.

              Even if this approach is efficient and does the job, I was wondering if anyone knows about an easy way to shift coordinates on bam directly, and to conserve read sequences (and qualities, at least for the bases that belong to the original sequence before shifting), as well as information about paires.

              Thanks :-)
              Last edited by Pef; 09-28-2015, 07:39 AM.

              Comment


              • #22
                Originally posted by baritone View Post
                I don't think 50 is the number to shift. In Jason's paper, they did "all reads aligning to the + strand were offset by +4 bp, and all reads aligning to the – strand were offset −5 bp". In fact, whether you shift or not, the distribution of reads will not significantly change.
                This. I do not understand the reasoning behind blakeoft's suggestion to simply shift everything by 50 into one direction. What is this supposed to achieve? Also, as baritone pointed out the original paper has a very different explanation of how they fixed the coordinates.

                Comment


                • #23
                  Since BED files are a combination of +/- strands how can I use this AWK script for the individual strands?


                  Originally posted by blakeoft View Post
                  You can use bamToBed from bedtools to convert from bam to bed format as you have mentioned. It's quite easy to adjust the start and end site of reads in bed format. I'm not an awk master, so there's probably a better way to do this, but it should work:

                  Code:
                  awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 50, $3 + 50, $4, $5, $6; else print $1, $2 - 50, $3 - 50, $4, $5, $6}' alignment.bed > alignment_shifted.bed
                  That will shift the start and end forward by fifty base pairs if the strand is +, backward if the strand is -, and print fields 1 through 6. If you need more fields, just tack them on the end.

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM
                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 06:37 PM
                  0 responses
                  10 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, Yesterday, 06:07 PM
                  0 responses
                  10 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-22-2024, 10:03 AM
                  0 responses
                  51 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-21-2024, 07:32 AM
                  0 responses
                  67 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X