Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • ATAC Seq data analysis

    Hi everybody,

    We have recently done our first ATAC Seq project in our lab and my task is analysing the data. But since I am quite new to bioinformatics, I was wondering whether somebody here has experience with the different analysis steps?

    So far, I've mapped the reads with bwt and have the .bam files.
    Now in the paper (Buenrostro et al. 2013) they say they adjusted the read start sites to represent the center of the transposon binding event for peak calling and footprinting and then used ZINBA for peak calling.

    How would I go about adjusting the read start sites?
    And ZINBA wants .bed tagAlign or bowtie files. Would you simply transfom bam to bed with bedtools?

    Thanks!

  • #2
    You can use bamToBed from bedtools to convert from bam to bed format as you have mentioned. It's quite easy to adjust the start and end site of reads in bed format. I'm not an awk master, so there's probably a better way to do this, but it should work:

    Code:
    awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 50, $3 + 50, $4, $5, $6; else print $1, $2 - 50, $3 - 50, $4, $5, $6}' alignment.bed > alignment_shifted.bed
    That will shift the start and end forward by fifty base pairs if the strand is +, backward if the strand is -, and print fields 1 through 6. If you need more fields, just tack them on the end.

    Comment


    • #3
      Thanks a lot!

      Comment


      • #4
        Originally posted by Sciurus View Post
        Hi everybody,

        We have recently done our first ATAC Seq project in our lab and my task is analysing the data. But since I am quite new to bioinformatics, I was wondering whether somebody here has experience with the different analysis steps?

        So far, I've mapped the reads with bwt and have the .bam files.
        Now in the paper (Buenrostro et al. 2013) they say they adjusted the read start sites to represent the center of the transposon binding event for peak calling and footprinting and then used ZINBA for peak calling.

        How would I go about adjusting the read start sites?
        And ZINBA wants .bed tagAlign or bowtie files. Would you simply transfom bam to bed with bedtools?

        Thanks!

        For trimming follow what blakeoft said, and for peak calling you do not need to use ZINBA, rather use macs2 which is more easier to use, faster, accurate and works well with ATAC-SEQ.
        Good luck

        Comment


        • #5
          Originally posted by hkoohy View Post
          for peak calling you do not need to use ZINBA, rather use macs2 which is more easier to use, faster, accurate and works well with ATAC-SEQ.
          Good luck
          I was going to say this myself. I treated ATAC-Seq libraries pretty much like ChIP-Seq libraries.

          Trim adapters -> Align (bwa mem) -> Remove reads in blacklist regions if applicable (https://sites.google.com/site/anshul...cts/blacklists) -> Remove reads with MAPQ < 10 or there about -> Remove duplicates (debatable, but I tend to see better results w/o dups) -> call peaks (macs2, with mitochondrial reads removed, see below)

          We find 10s to 100s of thousands of peaks, quite sharp, often mapping to transcription start sites. A feature we find in our data and in the published one is the large number of reads mapping to the mitochondrial genome (up to 40-60%), but it can vary a lot. So I prefer to remove these reads before peak calling.

          Comment


          • #6
            Originally posted by dariober View Post
            I was going to say this myself. I treated ATAC-Seq libraries pretty much like ChIP-Seq libraries.

            Trim adapters -> Align (bwa mem) -> Remove reads in blacklist regions if applicable (https://sites.google.com/site/anshul...cts/blacklists) -> Remove reads with MAPQ < 10 or there about -> Remove duplicates (debatable, but I tend to see better results w/o dups) -> call peaks (macs2, with mitochondrial reads removed, see below)

            We find 10s to 100s of thousands of peaks, quite sharp, often mapping to transcription start sites. A feature we find in our data and in the published one is the large number of reads mapping to the mitochondrial genome (up to 40-60%), but it can vary a lot. So I prefer to remove these reads before peak calling.
            Have you ever investigated the effect of mitcondria reads on peak calling(in particular with macs2) for ATAC_Seq for which we have a lot of MT reads?

            Comment


            • #7
              Originally posted by hkoohy View Post
              Have you ever investigated the effect of mitcondria reads on peak calling(in particular with macs2) for ATAC_Seq for which we have a lot of MT reads?
              Not quite. From what I recall keeping chrM doesn't affect the peak calling in the rest of genome too much. After all chrM is small and the ATAC-Seq reads there are fairly uniform, so macs is not too affected. Nevertheless I feel more comfortable removing chrM to avoid such imbalance.

              Comment


              • #8
                thanks a lot,
                fair enough.
                Hkoohy

                Comment


                • #9
                  Thanks for your replies!

                  But in order to work with macs2, I'd need to use Python or is there another way of implementing it in linux?

                  I don't know how to program with Python, btw. ;-)

                  Comment


                  • #10
                    Originally posted by Sciurus View Post
                    Thanks for your replies!

                    But in order to work with macs2, I'd need to use Python or is there another way of implementing it in linux?

                    I don't know how to program with Python, btw. ;-)

                    No, you do not have do any python scripting, rather you must first install it and then
                    Code:
                    macs2 --help
                    will provide you with the functions and in particular you need the callpeak function:
                    Code:
                    macs2 callpeak --help

                    Comment


                    • #11
                      It looks like you just need to have Python installed, so you won't need to do any kind of Python programming. Check out the INTALL.rst file in the macs2 tarball for information on what you need to have installed and steps for installing macs2.

                      Comment


                      • #12
                        Originally posted by blakeoft View Post
                        You can use bamToBed from bedtools to convert from bam to bed format as you have mentioned. It's quite easy to adjust the start and end site of reads in bed format. I'm not an awk master, so there's probably a better way to do this, but it should work:

                        Code:
                        awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 50, $3 + 50, $4, $5, $6; else print $1, $2 - 50, $3 - 50, $4, $5, $6}' alignment.bed > alignment_shifted.bed
                        That will shift the start and end forward by fifty base pairs if the strand is +, backward if the strand is -, and print fields 1 through 6. If you need more fields, just tack them on the end.
                        Oh man, this is very useful. I need to school myself in AWK. I'm still making perl scripts to do this stuff.

                        I have a question though: I always thought that bam files were 0-indexed - does the bam to bed conversion change this, and so therefore do you need to account for it when you shift the read positions?

                        Comment


                        • #13
                          resara,

                          I picked 50 base pairs as an example. However, I don't think that 0-indexing or otherwise will affect how much you want to shift the reads. For example, 0 -> 50 vs 1 -> 51 is is still a shift by 50 bases.

                          Regarding my awk code, you might also want to make sure you aren't shifting your reads to negative indices. You could check for that with a similar awk call.

                          Comment


                          • #14
                            Thanks blakeoft, checking for negative indices and removing them is a good idea.

                            I understand what you mean, and in most cases you can disregard the 0/1-based positioning, as it won't affect your shift (it will still shift it by 50bp).

                            I was thinking that since bam is 0-based, if the conversion to bed changes the position to 1-based, that's a 1nt difference in position, which could be an issue if you're planning to do footprinting with the alignments.

                            edit: sorry, looks like bed is 0-based, so should be fine

                            Comment


                            • #15
                              Thanks a lot!

                              Now, when using macs2 is it correct to set the offset of 75bp, that was used by Buenrostro et al. with
                              --nomodel --shift 37 --extsize 37?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X