**GenoMax** · 01-06-2015, 05:30 PM

See posts #2 and #3 in this thread: http://seqanswers.com/forums/showthread.php?t=20248. What you need are the .cif files, which most people running sequencers have not saved for the last 2-3 years. I am not sure why you need the intensity files, but your best bet would be to ask someone who owns a MiSeq whether they would be willing to save them for a run.

I found this document, which discusses a .cif file simulator (see section 6.1): http://www.wpi.edu/Pubs/E-project/Av...Correction.pdf If you can use simulated data, you may want to contact the authors.

**pmiguel** · 01-07-2015, 09:46 AM

What use is the signal intensity data to you, Mark?

--
Phillip

**Mark2** · 01-07-2015, 10:39 AM

Thanks for your responses. I am interested in detecting heterogeneity in cell populations. More specifically, I am thinking about when one sequences cancer cells from a tumor in which some cells have a certain mutation, and others do not (there may be multiple subclones, or it may just be that there are some normal cells mixed in with the cancer cells, especially if it's a solid tumor).

For example, suppose you have a population of cells in which half of the cells have a G at a given locus and the other half have a C, due to a mutation in an 'ancestor.' How well would one be able to detect this sort of heterogeneity at the base level with sequencing data? In any event, this is why I am interested in base signal intensity.

**pmiguel** · 01-07-2015, 10:55 AM

Originally posted by Mark2:

> Thanks for your responses. I am interested in detecting heterogeneity in cell populations. More specifically, I am thinking about when one sequences cancer cells from a tumor in which some cells have a certain mutation, and others do not (there may be multiple subclones, or it may just be that there are some normal cells mixed in with the cancer cells, especially if it's a solid tumor).
>
> For example, suppose you have a population of cells in which half of the cells have a G at a given locus and the other half have a C, due to a mutation in an 'ancestor.' How well would one be able to detect this sort of heterogeneity at the base level with sequencing data? In any event, this is why I am interested in base signal intensity.

This will not be detectable via intensity files of next gen sequencing data for reasons I won't go into at the moment.

I guess you are thinking about Sanger sequencing intensity files. These are .ab1 files, for example. For Sanger sequencing each base intensity reading is a summation of all the signal from thousands or millions of sequence product strands. Importantly, these product strands potentially derive from a mixed population of templates.

Usage of Sanger sequencing has fallen off dramatically because the price per base of next gen sequence is many orders of magnitude lower.

To obtain the equivalent of Sanger intensity values from next gen data sets, you would count the number of reads supporting each base at each position of interest in the .bam file. This is arguably more accurate than Sanger for this purpose.

There are, of course, caveats to using either method depending on details of the samples and assays used.

--
Phillip
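
As a rough illustration of the counting idea above, here is a minimal Python sketch that turns the bases observed at one position into per-base fractions. The function name `base_fractions` and the example read column are made up for illustration; real data would come from the reads covering that position in the .bam file, and sequencing errors would typically add a small fraction of other bases.

```python
from collections import Counter


def base_fractions(bases):
    """Return the fraction of reads supporting each base at one position.

    `bases` is any iterable of single-character base calls (one per read).
    Anything that is not A, C, G or T (for example N) is ignored.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


# Hypothetical column of base calls from ten reads covering one locus,
# mimicking a population where half the cells carry G and half carry C.
column = "GGGCCGCGCC"
fractions = base_fractions(column)
print(fractions)
```

For the made-up column above, G and C each get a fraction of 0.5, which is the read-count analogue of two overlapping peaks in a Sanger trace.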

**Mark2** · 01-07-2015, 12:15 PM

Thanks pmiguel. Would counting numbers of bases at each position be simple to do in IGV? (I ask about IGV because it's the only tool for viewing bam files I'm aware of, feel free to suggest another if preferable).

Edit: actually, can one just use R to view bam files? I just discovered the Rsamtools package. This might be easier as I'm more familiar with R.

**dpryan** · 01-07-2015, 12:48 PM

You could use the coverage histogram in IGV, which would be somewhat simpler than manual counting. An even simpler method would be to just do variant calling with a tool that's intended for complex samples (just google "variant call admixture" or "variant call heterogenous"). Such tools are more likely to directly do what it is you want.

I would generally recommend against processing BAM files in R. Rsamtools works fine, but the R model for this sort of thing generally involves reading the whole BAM file into memory and then processing it, which is often not desirable.

**pmiguel** · 01-08-2015, 10:55 AM

Originally posted by Mark2:

> Thanks pmiguel. Would counting numbers of bases at each position be simple to do in IGV? (I ask about IGV because it's the only tool for viewing bam files I'm aware of, feel free to suggest another if preferable).
>
> Edit: actually, can one just use R to view bam files? I just discovered the Rsamtools package. This might be easier as I'm more familiar with R.

It's simple but not scalable. In IGV, IIRC, you just mouse over the position of interest in the coverage histogram and you get the percentage of each possible base at that position. If you wanted to check a few positions, then IGV might be your tool.
I am unfamiliar with Rsamtools.
I agree with dpryan that a variant caller of some sort is the way to go if you want to assess a large number of positions.

--
Phillip

**Mark2** · 01-08-2015, 11:26 AM

Thanks for the suggestions. I am currently looking at a public data set in IGV and am pleasantly surprised at how easy it was to see the coverage histogram.

It would be useful to be able to find all loci at which one base doesn't get 100% of the reads, as opposed to just checking specified loci for this condition. Would a variant caller allow me to do this?

Edit: actually, following dpryan's suggested Google search I found a few variant callers that claim to be able to detect this sort of heterogeneity, including one from Illumina: http://www.illumina.com/documents/pr...ant_caller.pdf

Anyone familiar with any particular variant callers of this sort?

dpryan: would using Python for this necessarily have the same problem you describe regarding R?

Thanks.

**dpryan** · 01-08-2015, 11:53 AM

No, Python wouldn't suffer from the same issues. The simplest route would be to use pysam to make a pileup from a sorted and indexed BAM file (you could also simply use "samtools mpileup" and pipe the output into a Python script).

I'm not personally familiar with variant callers for this use case, I just knew they existed. You might post a new question asking about that.
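
For the "samtools mpileup piped into a Python script" route, a minimal sketch might look like the following. It assumes the standard tab-separated mpileup text columns (chromosome, position, reference base, depth, read bases, qualities) and processes one line at a time, so the whole file never has to sit in memory. The names `count_bases` and `mixed_sites`, the two thresholds, and the sample lines are all made up for illustration; this is not a replacement for a proper variant caller.

```python
import re
from collections import Counter

_INDEL = re.compile(r"[+-](\d+)")  # e.g. "+2" or "-13" before an indel sequence


def count_bases(ref, pileup):
    """Count A/C/G/T calls in the read-bases column of one mpileup line."""
    counts = Counter()
    ref = ref.upper()
    i = 0
    while i < len(pileup):
        ch = pileup[i]
        if ch == "^":  # read start; the next character is a mapping quality
            i += 2
            continue
        if ch in "+-":  # indel: sign, length, then that many bases
            m = _INDEL.match(pileup, i)
            i = m.end() + int(m.group(1)) if m else i + 1
            continue
        if ch in ".,":  # match to the reference on forward/reverse strand
            if ref in "ACGT":
                counts[ref] += 1
        elif ch.upper() in "ACGT":  # mismatch: the observed base itself
            counts[ch.upper()] += 1
        # "$" (read end), "*" (deleted base), "<", ">" and N are skipped
        i += 1
    return counts


def mixed_sites(lines, min_minor_fraction=0.05, min_depth=10):
    """Yield (chrom, pos, counts) for positions with non-major base support."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, ref, _depth, pileup = fields[:5]
        counts = count_bases(ref, pileup)
        total = sum(counts.values())
        if total == 0 or total < min_depth:
            continue
        major = counts.most_common(1)[0][1]
        if (total - major) / total >= min_minor_fraction:
            yield chrom, int(pos), dict(counts)


# Two made-up mpileup lines: one uniform site and one G/C mixed site.
sample = [
    "chr1\t100\tG\t12\t......,,,,,,\tIIIIIIIIIIII",
    "chr1\t101\tG\t12\t^I.....$CCcccC-2AT.\tIIIIIIIIIIII",
]
for site in mixed_sites(sample):
    print(*site, sep="\t")
```

With real data, the lines could instead be read from `sys.stdin`, with something like `samtools mpileup -f ref.fa sample.bam` piped into the script (the file names are placeholders). The 5% minor-fraction and depth-10 cutoffs are arbitrary assumptions meant only to keep isolated sequencing errors from flagging every position.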

**Mark2** · 01-08-2015, 12:30 PM

Originally posted by dpryan:

> No, Python wouldn't suffer from the same issues. The simplest route would be to use pysam to make a pileup from a sorted and indexed BAM file (you could also simply use "samtools mpileup" and pipe the output into a Python script).
>
> I'm not personally familiar with variant callers for this use case, I just knew they existed. You might post a new question asking about that.

OK, thanks, I'll try it with Python.


Where to find public sequencing data with signal intensity for each base?
