Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bedtools 2.25 vs bedtools2.20 output of coverageBed

    In bedTools 2.20 if I run:

    Code:
    coverageBed -hist -d -abam /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam -b /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed > /home/cmccabe/Desktop/NGS/pool_I_090215/output.bam.hist.txt
    output.bam.hist.txt
    Code:
    chr11	1785011	1785099	chr11:1785011-1785099	CTSD	1	17
    chr11	1785011	1785099	chr11:1785011-1785099	CTSD	2	20
    chr11	1785011	1785099	chr11:1785011-1785099	CTSD	3	20
    Now in bedTools 2.25 since I can not use -hist and -d together (ERROR: -counts, -d, -mean, and -hist are all mutually exclusive options), I run:

    Code:
    coverageBed -hist -abam /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam -b /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed > /home/cmccabe/Desktop/NGS/pool_I_090215/output.bam.hist.txt
    output.bam.hist.txt
    Code:
    chr1	14251	14393	X28LU:04862:12482	0	+	14251	14393	0,0,0	1	142,	0,	0	142	142	1.0000000
    chr1	16224	16400	X28LU:08504:06628	0	-	16224	16400	0,0,0	1	176,	0,	0	176	176	1.0000000
    chr1	16324	16500	X28LU:06201:09146	3	-	16324	16500	0,0,0	1	176,	0,	0	176	176	1.0000000
    The two outputs look very different and basically all that I am trying to get is the depth per nucleotide in the target.bed.

    target.bed
    Code:
    chr1	955542	955763	+	AGRN:exon.1
    chr1	957570	957852	+	AGRN:exon.2
    chr1	976034	976270	+	AGRN:exon.2;AGRN:exon.3;AGRN:exon.4
    Thank you .

  • #2
    This is just a guess but…

    It is likely the '-hist' and '-d' were ALWAYS incompatible with each other and it is only in the latest revisions where the user is explicitly told that they are attempting to use incompatible options. In earlier versions it may have been that when incompatible options were given on the command line only the last option was used and any previous ones ignored.

    This means that when running 2.20 and you used the command

    Code:
    coverageBed -hist -d ……
    was equivalent to

    Code:
    coverageBed -d ………
    so your previous output is what you would get for the -d option alone. Now when you run 2.25 with

    Code:
    coverageBed -hist ……
    you are getting the output expected for the -hist option.

    Separately run each version of coverageBed (2.20 and 2.25) with the '-d' and '-hist' options alone and compare the outputs from each version with each single option.
    Last edited by kmcarr; 09-24-2015, 05:20 AM.

    Comment


    • #3
      I compared the different options in 2.20 and 2.25 and I really only need -d. However the outputs are very different. I use a xeon processor with 64GB of RAM. I included the times and size of the output. Basically, I just need:

      chrmosome start end strand gene and exon base read depth (output in 2.20).

      I am not sure what changed in 2.25 but it seems to have effected the data. Any sugesstions? Thank you .

      Code:
      coverageBed -d -abam /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam -b /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed > /home/cmccabe/Desktop/NGS/pool_I_090215/output.bam.hist_2.20.txt
      2.20 output (720MB) took about 5 minutes to process
      Code:
      chr1	161480613	161480756	+	FCGR2A:exon.3;FCGR2A:exon.5	1	250
      chr1	161480613	161480756	+	FCGR2A:exon.3;FCGR2A:exon.5	2	251
      chr1	161480613	161480756	+	FCGR2A:exon.3;FCGR2A:exon.5	3	251


      Code:
      coverageBed -d -abam /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam -b /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed > /home/cmccabe/Desktop/NGS/pool_I_090215/output.bam.hist_2.25.txt
      2.25 output (took ~45 minutes for 425MB file)
      Code:
      chr1	14251	14393	X28LU:04862:12482	0	+	14251	14393	0,0,0	1	142,	0,	1	0
      chr1	14251	14393	X28LU:04862:12482	0	+	14251	14393	0,0,0	1	142,	0,	2	0
      chr1	14251	14393	X28LU:04862:12482	0	+	14251	14393	0,0,0	1	142,	0,	3	0

      Comment


      • #4
        Originally posted by cmccabe View Post
        Any sugesstions?
        Yes, read the documentation. When software behavior appears to change from one version to the next the first thing to be done is read the change log:

        Version 2.24.0 (27-May-2015)
        1. The coverage tool now takes advantage of pre-sorted intervals via the -sorted option. This allows the coverage tool to be much faster, use far less memory, and report coverage for intervals in their original order in the input file.
        2. We have changed the behavior of the coverage tool such that it is consistent with the other tools. Specifically, coverage is now computed for the intervals in the A file based on the overlaps with the B file, rather than vice versa.
        3. The subtract tool now supports pre-sorted data via the -sorted option and is therefore much faster and scalable.
        4. The -nonamecheck option provides greater tolerance for chromosome labeling when using the -sorted option.
        5. Support for multiple SVLEN tags in VCF format, and fixed a bug that failed to process SVLEN tags coming at the end of a VCF INFO field.
        6. Support for reverse complementing IUPAC codes in the getfasta tool.
        7. Provided greater flexibility for “BED+” files, where the first 3 columns are chrom, start, and end, and the remaining columns are free-form.
        8. We now detect stale FAI files and recreate an index thanks to a fix from @gtamazian.
        9. New feature from Pierre Lindenbaum allowing the sort tool to sort files based on the chromosome order in a faidx file.
        10. Eliminated multiple compilation warnings thanks to John Marshall.
        11. Fixed bug in handling INS variants in VCF files.
        Thank you .
        You're welcome.

        Comment


        • #5
          You must have overlooked this important note

          Code:
          As of version 2.24.0, the coverage tool has changed such that the coverage is computed for the A file, not the B file. This changes the command line interface to be consistent with the other tools. Also, the coverage tool can accept multiple files for the -b option. This allows one to measure coverage between a single query (-a) file and multiple database files (-b) at once!
          Edit: @kmcarr beat me to it.

          Comment


          • #6
            The new output really doesn't make sense to me and I don't think is useful (probably wrong in my thinking)

            Anyway, would something like below (or is there something better) give me the output I am after? Basically, the count of each nucleotide and the output those targets that are less than 30 reads. Thank you .

            Code:
            bedtools genomecov -ibam /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam -bga \
            >    | awk '$4<30' \
            >    | bedtools intersect -a /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed -b - -f 1.0 >  /home/cmccabe/Desktop/NGS/pool_I_090215/output.bam.txt
            I tried that code and here is the output:

            Code:
            chr1	1950852	1950940	+	GABRD:exon.1
            chr1	6880230	6880320	+	CAMTA1:exon.2
            chr1	16482332	16482437	+	EPHA2:exon.1
            So is each record in that file a target with less than 30 reads? Is there a way to get the read count/depth at that position? Thank you .

            Comment


            • #7
              Guess you did not read what @kmcarr and I pointed out.

              Comment


              • #8
                No I read it and finding the output hard to read and not very useful (probably don't understand it though). The old output in 2.20 made sense and I was attempting to achieve that again. I don't know why -d takes so long in 2.25 but -hist is fast in 2.25. I guess I can run that and use awk? Thank you both for your help .

                Comment


                • #9
                  You need to switch your input files around.

                  Comment


                  • #10
                    I was not even thinking of that. So, use the target bed as the input and those are used to overlap the bam? Thank you .

                    Code:
                    coverageBed -d -b /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed -abam /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam > /home/cmccabe/Desktop/NGS/pool_I_090215/output.bam.hist.txt

                    Comment


                    • #11
                      Originally posted by cmccabe View Post
                      I was not even thinking of that.
                      That pretty much seems to be your problem.

                      Switching the input files means leaving the program options intact and only switching the inputs.

                      Code:
                      $ coverageBed -d -a /home/cmccabe/Desktop/NGS/bed/sorted_unix_5column_xgen_targets.bed -b /home/cmccabe/Desktop/NGS/pool_I_090215/IonXpress_008_150902_newheader.bam > /home/cmccabe/Desktop/NGS/pool_I_090215/output.bam.hist.txt

                      Comment


                      • #12
                        Thank you very much, I am learning (moving from the lab bench to more bioinformatics). It's a learning process, but I really appreciate your help . Thank you.

                        Comment


                        • #13
                          Did the last command iteration produce output that made sense?

                          Comment


                          • #14
                            Both the -d and -hist options took awhile to run and I had to go to a meeting. I am going to sort and index the reheader bam and try again. The command did run though and I will post back. Thanks again .

                            Comment


                            • #15
                              When I go to sort the bam file I get the below output. I am looking on the "net" but wanted to post as well. I am thuinking that the aligned bam file is already sorted and maybe just needs to be indexed? Thank you .

                              Code:
                              cmccabe@DTV-A5211QLM:~/Desktop/NGS$ samtools sort /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.bam /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0043.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0044.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0045.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0046.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0047.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0048.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0049.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0050.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0051.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0052.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0053.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0054.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0055.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0056.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0057.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0058.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0059.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0060.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0061.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0062.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0063.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0064.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0065.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0066.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0067.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0068.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0069.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0070.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0071.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0072.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0073.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0074.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0075.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0076.bam'
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0077.bam'
                              [bam_sort_core] merging from 78 files...
                              [W::bam_hdr_read] EOF marker is absent. The input is probably truncated.
                              [E::hts_open] fail to open file '/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0043.bam'
                              [bam_merge_core] fail to open file /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.sorted.0043.bam

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Recent Advances in Sequencing Analysis Tools
                                by seqadmin


                                The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                                Yesterday, 07:48 AM
                              • seqadmin
                                Essential Discoveries and Tools in Epitranscriptomics
                                by seqadmin




                                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                04-22-2024, 07:01 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Today, 06:57 AM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 07:17 AM
                              0 responses
                              14 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 05-02-2024, 08:06 AM
                              0 responses
                              19 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-30-2024, 12:17 PM
                              0 responses
                              24 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X