
  • How to create GC% track for IGV?

    Anyone know a program or method to create a GC% track for IGV?

    I want to convert a bacterial genome sequence into a %GC track to compare depth of coverage with %GC. If there were some code to produce a (for example) wiggle format file from a genome sequence, I could easily display it. Seems like such a program, or at least a module, will already have been written, but I am not finding it.

    More details:

    I have a couple of Deinococcus radiodurans (~70% GC bacterial genome) TruSeq library data sets, sequenced as 2x100 paired-end reads and aligned to the reference sequence. The average coverage is around 100X, but it is variable and in places very low. I would like to see whether the areas of low coverage correlate with the areas of higher %GC.

    Here is an example window:


    Thanks,
    --
    Phillip

  • #2
    Windowsize or binary?

    Hi Phillip,

    Could you be more specific about what you want to see? A per-position %GC track would just tell you, for each position, whether the base is a G or C or not.
    You probably want to compute the %GC in a window around each position, or in a set of bins into which you partition your genome.

    Either can be achieved, for example, by using HTSeq to read in your genome, computing the binned or windowed GC percentage, and then writing that to a wiggle file (which basically means writing all the values into a plain text file, one value per line, with a header line giving the name of the track etc.).
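As a rough illustration of the binned approach described above, here is a minimal sketch in plain Python. It does not use HTSeq; the function names (`read_fasta`, `gc_percent_bins`, `gc_wiggle_lines`), the default bin size of 100 and the track name are all assumptions made for this example. It writes a fixedStep wiggle track and skips a trailing partial bin so that no bin runs past the end of the sequence.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent_bins(seq, bin_size):
    """Return %GC for consecutive, non-overlapping, full-length bins."""
    seq = seq.upper()
    values = []
    # Stop before a trailing partial bin so every bin has bin_size bases.
    for start in range(0, len(seq) - bin_size + 1, bin_size):
        window = seq[start:start + bin_size]
        gc = window.count("G") + window.count("C")
        values.append(100.0 * gc / bin_size)
    return values


def gc_wiggle_lines(records, bin_size=100, track_name="GC_percent"):
    """Build the lines of a fixedStep wiggle track from (name, seq) pairs."""
    lines = ['track type=wiggle_0 name="%s"' % track_name]
    for name, seq in records:
        # Wiggle coordinates are 1-based; one value per bin follows.
        lines.append("fixedStep chrom=%s start=1 step=%d span=%d"
                     % (name, bin_size, bin_size))
        lines.extend("%.2f" % v for v in gc_percent_bins(seq, bin_size))
    return lines


# Example use (file names are placeholders):
# with open("genome.fasta") as fin, open("gc.wig", "w") as fout:
#     fout.write("\n".join(gc_wiggle_lines(read_fasta(fin), 100)) + "\n")
```

The sequence names in the wiggle file need to match the names of the reference loaded in IGV, otherwise the track will not line up with the alignments. A sliding window (one value per position) would follow the same pattern with `step=1`.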

    By the way, you might also want to check mappability, to see whether those regions are poorly covered because reads originating there cannot be placed uniquely (e.g. repetitive regions).

    Cheers,
    Paul

    "You are only young once, but you can stay immature indefinitely."


    • #3
      Hi Paul,
      Yes I did mean a windowed %GC.

      I will look at HTSeq. I am also interested in mappability. Will HTSeq produce a wiggle plot of that as well? This would be a plot of the number of repetitions of a given window of sequence (read length sized?) in the genome. It would have to allow a certain number of mismatches to usefully estimate the mappability of a given area.
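The repetition count described here can be sketched for the simplest case, exact matches on the forward strand only. This is an assumption-laden toy, not something HTSeq provides: it ignores mismatches and the reverse complement, and the function name `kmer_repeat_counts` is made up for this example.

```python
from collections import Counter


def kmer_repeat_counts(seq, k):
    """For each start position, count exact occurrences of the k-mer
    beginning there anywhere in seq (forward strand, no mismatches)."""
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return []
    # First pass: tally every k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n))
    # Second pass: look up the tally for the k-mer at each position.
    return [counts[seq[i:i + k]] for i in range(n)]
```

A value of 1 means the window is unique under these restrictions. Storing every k-mer as a string is memory hungry for read-length windows on a whole genome, so this is only meant to show the idea.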

      Thanks,
      Phillip


      • #4
        Mappability

        Hi Phillip,

        This can be done with HTSeq, but it requires some experience programming in Python (or some time to learn Python).

        For mappability, I like to generate reads of the same length as my library from the reference and then align them with the same tool used for the reads from my sample. This way the estimate of mappability best reflects what happened to the reads from the sample. (I usually don't give them qualities, but if your aligner works with a quality cut-off instead of a mismatch-count cut-off and you have already determined the quality distribution of your reads, then simulating that could be helpful as well.)
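The read-generation step could look roughly like the sketch below. It is a simplification under stated assumptions: single-end, error-free, forward-strand reads without qualities, one read per start position, and a read-naming scheme (`name:start`, 1-based) invented here so that the origin of each read can be recovered after alignment. The function names are hypothetical.

```python
def simulated_reads(name, seq, read_length, step=1):
    """Yield (read_name, read_sequence) for reads tiled along seq.

    The read name encodes the sequence name and the 1-based start
    position the read was taken from (a naming scheme assumed here).
    """
    for start in range(0, len(seq) - read_length + 1, step):
        yield "%s:%d" % (name, start + 1), seq[start:start + read_length]


def fasta_lines(reads):
    """Turn (read_name, read_sequence) pairs into FASTA lines."""
    for read_name, read_seq in reads:
        yield ">" + read_name
        yield read_seq


# Example use (file name is a placeholder):
# with open("simulated_reads.fasta", "w") as fout:
#     for line in fasta_lines(simulated_reads("chr1", genome_seq, 100)):
#         fout.write(line + "\n")
```

The resulting FASTA file would then be aligned with the same aligner and settings as the real library. Simulating read pairs, as for a paired-end library, is not covered by this sketch.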

        Then you can check which reads could be mapped (uniquely, if you want to apply such a restriction) and directly get the mappability of each position from whether or not the read originating there was mapped. Following that with some binning or a sliding-window approach gives you a nice estimate of the mappability.
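The binning step might be sketched as follows, assuming you have already extracted the 1-based start positions of the simulated reads that mapped (uniquely, if desired) from the aligner output. How to extract them depends on the aligner and is not shown. The function name `binned_mappability` and its arguments are hypothetical.

```python
def binned_mappability(n_positions, mapped_starts, bin_size):
    """Fraction of positions per bin whose simulated read was mapped.

    n_positions   -- number of read start positions that were simulated
    mapped_starts -- 1-based start positions of reads that were mapped
    bin_size      -- number of consecutive positions per bin
    """
    # Per-position flag: 1 if the read originating there mapped, else 0.
    flags = [0] * n_positions
    for start in mapped_starts:
        flags[start - 1] = 1
    values = []
    for i in range(0, n_positions, bin_size):
        window = flags[i:i + bin_size]  # the last bin may be shorter
        values.append(sum(window) / len(window))
    return values
```

The values lie between 0 and 1 and can be written to a wiggle file in the same way as a %GC track, so both can be viewed next to the coverage in IGV.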

        Cheers,
        Paul

        "You are only young once, but you can stay immature indefinitely."
