**Dethecor** · 05-18-2011, 05:10 AM

Window size or binary?

Hi Phillip,

You should be more specific about what you want to see. %GC per position would be a track that just tells you, for each position, whether it is a G or C or not.
But you probably want to compute the %GC in a window around each position, or in a set of bins into which you partition your genome.

Either can be achieved, for example, by using HTSeq to read in your genome, computing the binned or windowed GC percentage, and then writing that to a wiggle file (which basically means writing all the values into a plain text file, one value per line, with a header line giving the name of the track etc.).
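To make the binned variant concrete, here is a minimal sketch in plain Python (not HTSeq itself, so reading the genome from FASTA is left out). The function names `gc_percent_bins` and `write_wiggle_track` and the demo sequence are made up for illustration. It assumes non-overlapping bins, drops a trailing partial bin, and counts N as non-GC; adjust as needed.

```python
import io

def gc_percent_bins(seq, bin_size):
    """Return %GC for each full, non-overlapping bin of bin_size bases.

    Assumptions of this sketch: a trailing partial bin is dropped, and
    any base other than G/C (including N) counts as non-GC.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - bin_size + 1, bin_size):
        window = seq[start:start + bin_size]
        gc = window.count("G") + window.count("C")
        values.append(100.0 * gc / bin_size)
    return values

def write_wiggle_track(handle, track_name, chrom, values, bin_size):
    """Write one chromosome as a fixedStep wiggle track, one value per line."""
    # Track header line: gives the track its name.
    handle.write('track type=wiggle_0 name="%s"\n' % track_name)
    # Declaration line: wiggle coordinates are 1-based.
    handle.write("fixedStep chrom=%s start=1 step=%d span=%d\n"
                 % (chrom, bin_size, bin_size))
    for value in values:
        handle.write("%.2f\n" % value)

# Tiny demo on a made-up 10 bp sequence, written to an in-memory buffer.
out = io.StringIO()
write_wiggle_track(out, "GC percent", "chr1",
                   gc_percent_bins("GGCCATATGC", 4), 4)
wiggle_text = out.getvalue()
```

For a real genome you would open a file instead of the `StringIO` buffer and write one `fixedStep` declaration per chromosome under a single track header.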

Btw.: you might also want to check out mappability, to see whether those regions are unmappable because the reads originating there are not unique (e.g. repetitive regions).

Cheers,
Paul

**pmiguel** · 05-18-2011, 06:40 AM

Hi Paul,
Yes, I did mean a windowed %GC.

I will look at HTSeq. I am also interested in mappability. Will HTSeq produce a wiggle plot of that as well? This would be a plot of the number of repetitions of a given window of sequence (read length sized?) in the genome. It would have to allow a certain number of mismatches to usefully estimate the mappability of a given area.

Thanks,
Phillip

**Dethecor** · 05-18-2011, 06:50 AM

Mappability

Hi Phillip,

This can be done with HTSeq, but it requires some expertise in Python programming (or some time to learn some Python).

For mappability I like to generate reads of the same length as my library from the reference and then align them with the same tool used for the reads from my sample. This way the estimate of mappability best reflects what happened to the reads from the sample. (I usually don't give them qualities, but if your aligner works with a quality cut-off instead of a mismatch-count cut-off and you have already determined the quality distribution of your reads, then simulating that could be helpful as well.)
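A rough sketch of the read-generation step in plain Python, assuming one read is cut from every start position of the reference and written as FASTA (no qualities). The helper names, the `chrom:position` naming scheme and the choice to skip reads containing N are assumptions of this example, not something any particular aligner requires.

```python
import io

def simulate_reads(chrom, seq, read_length):
    """Yield (name, read) for every start position on the reference.

    The name encodes the 1-based origin as "chrom:position" so that it
    can be recovered from the aligner output later on.
    """
    for start in range(len(seq) - read_length + 1):
        read = seq[start:start + read_length]
        # Assumption: reads overlapping unknown bases are not simulated.
        if "N" in read.upper():
            continue
        yield "%s:%d" % (chrom, start + 1), read

def write_fasta(handle, reads):
    """Write (name, read) pairs as FASTA records."""
    for name, read in reads:
        handle.write(">%s\n%s\n" % (name, read))

# Tiny demo on a made-up 6 bp reference with 4 bp reads.
out = io.StringIO()
write_fasta(out, simulate_reads("chr1", "ACGTAC", 4))
fasta_text = out.getvalue()
```

The resulting file is then aligned with the same tool and settings as the real sample.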

Then you can check which reads could be mapped (uniquely, if you want to apply such a restriction) and directly get the mappability of each position based on whether the read originating there was mapped. Follow that with some binning or a sliding-window approach and you get a nice estimate of the mappability.
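The last step could look roughly like this in plain Python. It assumes you have already collected, from your aligner's output, the set of 1-based origin positions whose simulated read was mapped (how to extract that depends on the aligner and is not shown). The function names are hypothetical, and the binning is a simple non-overlapping mean.

```python
def per_position_mappability(chrom_length, read_length, mapped_starts):
    """Return 1 or 0 for each possible read start (1-based) on a chromosome.

    mapped_starts: set of origin positions whose simulated read was
    mapped (uniquely, if that restriction was applied upstream).
    """
    n_starts = chrom_length - read_length + 1
    return [1 if pos in mapped_starts else 0
            for pos in range(1, n_starts + 1)]

def binned_mean(values, bin_size):
    """Average values in non-overlapping bins; the last bin may be shorter."""
    means = []
    for i in range(0, len(values), bin_size):
        chunk = values[i:i + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means

# Tiny demo: 8 bp chromosome, 3 bp reads, reads from 1, 2 and 5 mapped.
flags = per_position_mappability(8, 3, {1, 2, 5})
track = binned_mean(flags, 2)
```

The binned values can then be written to a wiggle file in the same way as the %GC track.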

Cheers,
Paul

How to create GC% track for IGV?
