  • Genome size and corrected genome size

    Hi,
    I came across the term "corrected genome size" while reading a paper. Is there any difference between genome size and corrected genome size? If so, what is being corrected?

  • #2
    Hi priya,
    You haven't given any specific details. Was the term used in the context of NGS analysis? Was it referring to a reference genome or a de novo assembled one?



    • #3
      I came across this term in a paper describing the normalization of ChIP-seq reads.
      For your reference, I have attached a screenshot of the relevant lines from the paper.
      Last edited by priya; 03-09-2015, 03:06 AM.



      • #4
        Hi,
        It seems that the corrected genome size is the portion of the genome covered by all the ChIP-seq reads in that sample.
        So, if sample A has 20M reads and they cover 2 Gb of hg19, then its corrected genome size is 2 Gb.
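One way to write this reading of the term (the notation is introduced here for illustration and is not taken from the paper): if read $i$ covers the set of genomic positions $R_i$ and $N$ reads are aligned, the corrected genome size $G_c$ is the size of the union of those sets, so a base covered by several reads is counted only once.

```latex
G_c = \Bigl|\, \bigcup_{i=1}^{N} R_i \Bigr| \;\le\; \sum_{i=1}^{N} |R_i|
```

For example, two 50 bp reads aligned to [100, 150) and [120, 170) on the same chromosome cover 70 bp, not 100 bp.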



        • #5
          Hi amitm,
          Thank you for your reply!

          Could you please explain how to calculate the genome coverage from a sequencing experiment?
          For the number of mapped reads per sample, I can easily check the alignment logs (e.g., Bowtie log files), which report it clearly.



          • #6
            Hi,
            Once you have mapped the reads, use the resulting BAM file to create a BED file.
            Use bedtools -


            The returned coordinates will overlap, so you need to merge them into "unique" (non-overlapping) regions/coordinate intervals.
            Use -


            Once that's done, add up the lengths of all intervals; that total is the portion of the genome covered, i.e. the corrected genome size.
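The merge-and-sum steps can be sketched without extra tools in plain `sort` and `awk`. This is a dependency-free stand-in for the bedtools commands, not a reproduction of them; the function name `covered_bases` is hypothetical, and the input is assumed to be BED (chrom, start, end; 0-based, half-open) with tab-separated fields.

```shell
# covered_bases (hypothetical helper): read BED records on stdin and print
# the number of bases covered by at least one interval.
# Sorting by chromosome, then numerically by start, lets one awk pass merge
# overlapping intervals before their lengths are summed.
covered_bases() {
  sort -k1,1 -k2,2n | awk '
    BEGIN { FS = "\t"; total = 0; chrom = "" }
    {
      s = $2 + 0; e = $3 + 0
      if ($1 != chrom || s > cur_end) {
        # New merged block: add the length of the previous block first.
        if (chrom != "") total += cur_end - cur_start
        chrom = $1; cur_start = s; cur_end = e
      } else if (e > cur_end) {
        # Overlapping interval: extend the current merged block.
        cur_end = e
      }
    }
    END {
      # Flush the last block; %.0f avoids exponent notation for large totals.
      if (chrom != "") total += cur_end - cur_start
      printf "%.0f\n", total
    }'
}

# Toy input with made-up intervals: chr1 merges to [0,20), plus chr2 [0,5).
printf 'chr1\t0\t10\nchr1\t5\t20\nchr2\t0\t5\n' | covered_bases   # prints 25
```

On real data you would pipe the BED file converted from your BAM into `covered_bases`; sorting a genome-wide BED this way can be slow, which is one reason dedicated tools are usually preferred.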



            • #7
              Hi amitm,
              Thanks a lot for your clear explanation. I will try it out.



              • #8
                You can use BEDOPS bam2bed to convert from BAM to BED, pipe the result to bedops to merge overlapping elements, then pipe to bedmap to generate a list of lengths per merged element, which awk sums:

                $ bam2bed < foo.bam | bedops --merge - | bedmap --echo-overlap-size - | awk '{s += $1;} END {print s;}' > answer.txt

                In this case, bedmap is mapping merged elements against themselves. Merged elements coming out of bedops are guaranteed to be disjoint, so --echo-overlap-size is guaranteed to report the unique length of each merged element.
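Because the merged elements are disjoint, a possible simplification is to sum end minus start directly and skip the bedmap step. The sketch below wraps that sum in a shell function; the name `sum_merged_lengths` is hypothetical, and the function is only correct if its input is already disjoint, as the output of `bedops --merge` is stated to be.

```shell
# sum_merged_lengths (hypothetical helper): print the total length of the
# BED intervals read on stdin. Assumes the input is already disjoint
# (e.g. the output of `bedops --merge -`), so each base is counted once.
sum_merged_lengths() {
  awk 'BEGIN { s = 0 } { s += $3 - $2 } END { printf "%.0f\n", s }'
}

# Hypothetical usage with the pipeline from the post (requires BEDOPS):
#   bam2bed < foo.bam | bedops --merge - | sum_merged_lengths > answer.txt
```

Printing with `%.0f` rather than a bare `print` keeps genome-scale totals in plain integer form on awk implementations that would otherwise switch to exponent notation.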
