
  • Downsampling a BAM file

    Hi ChIP-seq experts

    I'm a newbie in the field of ChIP-seq data mining and need some help! I've sequenced several samples and mapped them using bowtie, and everything looks fine so far.

    But they differ somewhat in sequencing depth, which makes them difficult to compare. Until now I've used coverageBed (bedtools) to find the read coverage around TSSs and in my peak regions, and then normalized the read counts in these regions to sequencing depth.

    But for some of my future analysis it would be really nice if the BAM file itself was normalized to sequencing depth. Simply put, I want to remove random reads from one sample so that it has the same number of reads as my second sample. I've found that Picard "DownsampleSam" should be able to do this, however I cannot get the program to work on my (Mac) computer.

    I hope someone can help!!

    BR, Kathrine

  • #2
    If you have no headers or you convert to BED, the attached Perl script should work as well. Warning: I'm not really sure what the script does, but it seems to work.
    --------------
    Ethan


    • #3
      You could use bamtools random for this as well.
      What errors is Picard giving you?


      • #4
        Thank you so much for your input!!

        I've tried the code, but it doesn't seem to work. I'm also not that keen on converting to a BED file, as I need the BAM format later on.

        I've tried bamtools random, but keep getting the same error message:

        bamtools random ERROR: could not load index data for all input BAM file(s)... Aborting.

        My code line is as follows:

        bamtools random -in Input_file.bam -out output_reduced.bam -n 1000000

        The input bam file originates from the SAM file produced when mapping with bowtie - it is converted to BAM with "samtools view", and sorted with "samtools sort" - then I extract all mapped reads with "samtools view -b -F 4"

        As you can imagine, I'm a newbie in this field and all help is very much appreciated!


        • #5
          Regarding the Picard errors - I think it relates to the (mac) version of my java (which otherwise is up to date):

          Exception in thread "main" java.lang.NoClassDefFoundError: jvm-args
          Caused by: java.lang.ClassNotFoundException: jvm-args
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:247)


          • #6
            You need to index the bam file first. You can do this with bamtools:

            bamtools index -in Input_file.bam
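
The two bamtools steps from this thread can be chained as in the sketch below. It only wraps the commands already quoted above; the function name `bamtools_downsample` and the PATH check are assumptions added for illustration, and the input is assumed to be coordinate-sorted (as it is after "samtools sort"), since indexing needs a sorted file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: index a BAM file, then draw a random subset of
# alignments with bamtools. The bamtools commands are the ones quoted in
# this thread; everything else is illustrative.
bamtools_downsample() {
    local in_bam="$1" out_bam="$2" n_reads="$3"

    # Stop early with a clear message if bamtools is not installed.
    if ! command -v bamtools >/dev/null 2>&1; then
        echo "bamtools not found on PATH" >&2
        return 127
    fi

    # "bamtools random" needs an index, so build it first.
    bamtools index -in "$in_bam" || return 1

    # Write n_reads randomly chosen alignments to the output file.
    bamtools random -in "$in_bam" -out "$out_bam" -n "$n_reads"
}

# Usage (file names as in the thread):
# bamtools_downsample Input_file.bam output_reduced.bam 1000000
```

The index is rebuilt on every call to keep the sketch short; if the file is already indexed, that step could be skipped.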


            • #7
              Originally posted by KathrineBL
              Regarding the Picard errors - I think it relates to the (mac) version of my java (which otherwise is up to date):

              Exception in thread "main" java.lang.NoClassDefFoundError: jvm-args
              [...]
              You're using this as a template, I'm guessing:
              java jvm-args -jar PicardCommand.jar OPTION1=value1 OPTION2=value2...

              Like the command and the options, jvm-args is a placeholder that needs to be replaced with actual Java arguments. A typical example is -Xmx2g (setting the maximum Java heap size for the run to 2 GB).
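
Filling in that template for the original question might look like the sketch below. DownsampleSam keeps each read with a given PROBABILITY rather than keeping a fixed number of reads, so the fraction has to be worked out from the read counts first, and the resulting file will only have approximately the target number of reads. The helper names, the read counts and the file names are assumptions for illustration; the per-tool jar name follows the PicardCommand.jar template style quoted above. The total could be obtained with something like "samtools view -c Input_file.bam".

```shell
#!/usr/bin/env bash
# Fraction of reads to keep so that a sample with `total` reads ends up
# with roughly `target` reads. Prints 1.000000 if the sample is already
# at or below the target. (Hypothetical helper.)
downsample_fraction() {
    local target="$1" total="$2"
    awk -v t="$target" -v n="$total" 'BEGIN {
        if (n <= 0) { exit 1 }
        f = t / n
        if (f > 1) f = 1
        printf "%.6f\n", f
    }'
}

# Build (but do not run) a Picard command line. -Xmx2g takes the place of
# the jvm-args placeholder in the template. (Hypothetical helper.)
picard_downsample_cmd() {
    local in_bam="$1" out_bam="$2" fraction="$3"
    printf 'java -Xmx2g -jar DownsampleSam.jar INPUT=%s OUTPUT=%s PROBABILITY=%s\n' \
        "$in_bam" "$out_bam" "$fraction"
}

# Example with made-up counts: keep about 1,000,000 of 4,000,000 reads.
fraction="$(downsample_fraction 1000000 4000000)"
picard_downsample_cmd Input_file.bam output_reduced.bam "$fraction"
```

The command is printed rather than executed so it can be checked before running; the path to the jar will depend on where Picard is installed.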
