
**ETHANol** · 07-09-2012, 09:50 AM

intersectBed from Bedtools. Can that do what you are trying to do?

**swNGS** · 10-15-2012, 01:12 PM

I'm having another stab at this question, and I've tried to visually represent it in an attempt to explain better what I am trying to do:

I have a library that is comprised of thousands of overlapping amplicons.
I have the genomic coordinates of all the amplicons that go to make the library.

I want to count the read pairs derived from each of the original amplicons. The rationale is to look for amplicon dropout in a single sample relative to the other multiplexed samples, which would indicate a possible variant under a site involved in generating the original amplicon (probe binding site or restriction endonuclease recognition site).

The following image attempts to explain it:
Green track is the expected amplicons.
Blue is the region of interest

Amplicon1 = chr19:11203020-11203404
Amplicon2 = chr19:11203059-11203231
Amplicon3 = chr19:11202984-11203057

I want to be able to count the aligned read pairs that represent each amplicon.
If no amplicons overlapped, I could probably do this relatively easily, but I'm slightly stumped about what to do in situations like amplicons 1 and 2, where one amplicon lies inside another. In that case, I don't want reads from amplicon 2 counting towards amplicon 1.

The end result should be something like:

Genomic_coord AmpliconID Occurrences
chr19:11203020-11203404 Amplicon1 153
chr19:11203059-11203231 Amplicon2 15
chr19:11202984-11203057 Amplicon3 48

While I can find plenty of tools that produce summary information on insert size ranges, I can't find anything that specifically does this.

I think intersectBed is probably a place to start, as suggested above, but I can't get my head around the caveats that I listed. I was hoping there would be other helpful suggestions out there.

Thanks,

Chris

**frozenlyse** · 10-15-2012, 08:56 PM

In R, the GenomicRanges package has the function countOverlaps which allows you to specify type="within" - that way the query must be completely inside the subject.
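To make the "within" rule concrete, here is a minimal Python sketch of the same containment test. It is not the GenomicRanges implementation, only an illustration of the logic. The amplicon coordinates are the ones from the post above; the fragment list and the function name `count_within` are invented for the example. Each fragment stands for one read pair, from the leftmost aligned base to the rightmost.

```python
from collections import Counter

# Amplicon coordinates from the post above (1-based, inclusive).
AMPLICONS = {
    "Amplicon1": ("chr19", 11203020, 11203404),
    "Amplicon2": ("chr19", 11203059, 11203231),
    "Amplicon3": ("chr19", 11202984, 11203057),
}


def count_within(fragments, amplicons):
    """Count fragments that lie completely inside each amplicon."""
    counts = Counter({name: 0 for name in amplicons})
    for chrom, start, end in fragments:
        for name, (a_chrom, a_start, a_end) in amplicons.items():
            if chrom == a_chrom and a_start <= start and end <= a_end:
                counts[name] += 1
    return counts


# Hypothetical fragments, invented for illustration only.
fragments = [
    ("chr19", 11203020, 11203404),  # spans Amplicon1 exactly
    ("chr19", 11203059, 11203231),  # spans Amplicon2 exactly
    ("chr19", 11202984, 11203057),  # spans Amplicon3 exactly
]

within_counts = count_within(fragments, AMPLICONS)
for name, n in within_counts.items():
    print(name, n)
```

With these three fragments, Amplicon1 is credited with two fragments because the Amplicon2 fragment also lies inside it. A pure containment test therefore cannot separate nested amplicons on its own.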

**swNGS** · 10-15-2012, 10:21 PM

How would that work if I have two amplicons where one lies entirely within another?
I'm not sure if the image has uploaded correctly so here is a link
http://s14.postimage.org/rbmswg41r/i...pshot_Main.png

**frozenlyse** · 10-15-2012, 10:33 PM

Ah sorry I didn't notice one was inside the other.

One (hacky) thing you could do is to subtract the counts of amplicon 2 from those of amplicon 1, since amplicon 2 is the one that lies inside amplicon 1. Another option, if the reads from each amplicon always start and end at defined positions, is to use type="equal", so that only reads that start and end exactly at your amplicon start and end are counted.
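The exact-match idea can be sketched in Python as follows. This is an illustration of the logic rather than the GenomicRanges code, and the `tolerance` parameter is an assumption added here for the case where trimming or soft clipping shifts the fragment ends by a base or two. The fragment list is invented; the amplicon coordinates come from the earlier post.

```python
from collections import Counter

# Amplicon coordinates from the earlier post (1-based, inclusive).
AMPLICONS = {
    "Amplicon1": ("chr19", 11203020, 11203404),
    "Amplicon2": ("chr19", 11203059, 11203231),
    "Amplicon3": ("chr19", 11202984, 11203057),
}


def count_exact(fragments, amplicons, tolerance=0):
    """Count fragments whose start and end both match an amplicon.

    tolerance is a hypothetical slack (in bases) allowed at each end.
    """
    counts = Counter({name: 0 for name in amplicons})
    for chrom, start, end in fragments:
        for name, (a_chrom, a_start, a_end) in amplicons.items():
            if (
                chrom == a_chrom
                and abs(start - a_start) <= tolerance
                and abs(end - a_end) <= tolerance
            ):
                counts[name] += 1
    return counts


# Hypothetical fragments, invented for illustration only.
fragments = [
    ("chr19", 11203020, 11203404),  # Amplicon1, exact
    ("chr19", 11203021, 11203404),  # Amplicon1, start shifted by one base
    ("chr19", 11203059, 11203231),  # Amplicon2, exact
    ("chr19", 11202984, 11203057),  # Amplicon3, exact
    ("chr19", 11203100, 11203200),  # matches no amplicon end points
]

exact_counts = count_exact(fragments, AMPLICONS)
tolerant_counts = count_exact(fragments, AMPLICONS, tolerance=2)
print(dict(exact_counts))
print(dict(tolerant_counts))
```

Because both ends must match, the Amplicon2 fragment no longer counts towards Amplicon1. If two amplicons share nearly identical end points, a non-zero tolerance could assign one fragment to both, so the slack should stay smaller than the smallest difference between amplicon ends.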

**Chipper** · 10-15-2012, 11:57 PM

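I would use a short Perl one-liner with the reference name, fragment start and fragment length as a hash key. TLEN (column 9) is positive for the leftmost mate and negative for the rightmost, so keeping only positive values counts each pair once:

samtools view file.bam | perl -lane 'next unless $F[8] > 0; $count{"$F[2]:$F[3]:$F[8]"}++; END { print "$_\t$count{$_}" for sort keys %count }' > fragments.count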
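For readers who prefer Python, the same fragment-key idea might look like the sketch below. It parses SAM text lines and keys each pair on reference name, start, and end (start + TLEN - 1), counting only the mate with a positive TLEN. The function name `count_fragments` and the SAM records are made up for the example; the coordinates were chosen to match Amplicon1 and Amplicon2 from the earlier post.

```python
from collections import Counter


def count_fragments(sam_lines):
    """Count read pairs per (chrom, start, end) fragment.

    Only records with TLEN > 0 (the leftmost mate) are counted,
    so each properly aligned pair contributes once.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, tlen = fields[2], int(fields[3]), int(fields[8])
        if tlen > 0:
            counts[(chrom, pos, pos + tlen - 1)] += 1
    return counts


# Hypothetical SAM records, invented for illustration only.
sam_lines = [
    "@HD\tVN:1.0\tSO:coordinate",
    "pair1\t99\tchr19\t11203020\t60\t100M\t=\t11203305\t385\t*\t*",
    "pair1\t147\tchr19\t11203305\t60\t100M\t=\t11203020\t-385\t*\t*",
    "pair2\t99\tchr19\t11203059\t60\t100M\t=\t11203132\t173\t*\t*",
    "pair2\t147\tchr19\t11203132\t60\t100M\t=\t11203059\t-173\t*\t*",
]

fragment_counts = count_fragments(sam_lines)
for (chrom, start, end), n in sorted(fragment_counts.items()):
    print(f"{chrom}:{start}-{end}\t{n}")
```

In practice `sys.stdin` could be passed in place of the list, with the output of `samtools view file.bam` piped into the script. The resulting keys can then be looked up against the amplicon coordinate list.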

Ideas on how to count the number of specific amplicons in aligned amplicon data
