Hello everyone,
I'm trying to analyze small ncRNAs from Solexa sequencing data.
I've clipped, collapsed, and then aligned the sequences up to 15 times each to the genome.
The problem comes up in the analysis: a lot of the sequences share essentially the same alignment. I would like to group these sequences whenever their coordinates are collapsible, and sum their counts when they are collapsed. This would make comparisons between samples much easier.
Any help with this problem would definitely be appreciated. I'm a little stumped.
Here is an example of the data I have:
Code:
Chr2 4628 4644 TGAAAGACGAACAACT
Chr2 4628 4645 TGAAAGACGAACAACTG
Chr2 4628 4646 TGAAAGACGAACAACTGC
Chr2 4628 4647 TGAAAGACGAACAACTGCG
Chr2 4628 4648 TGAAAGACGAACAACTGCGA
Chr2 4628 4649 TGAAAGACGAACAACTGCGAA
Chr2 4628 4650 TGAAAGACGAACAACTGCGAAA
Chr2 4628 4651 TGAAAGACGAACAACTGCGAAAG
Chr2 4628 4652 TGAAAGACGAACAACTGCGAAAGC
Chr2 4628 4653 TGAAAGACGAACAACTGCGAAAGCA
Chr2 4628 4654 TGAAAGACGAACAACTGCGAAAGCAT
Chr2 4628 4655 TGAAAGACGAACAACTGCGAAAGCATT
Chr2 4628 4656 TGAAAGACGAACAACTGCGAAAGCATTT
Chr2 4628 4659 TGAAAGACGAACAACTGCGAAAGCATTTGCC
Chr2 4629 4645 GAAAGACGAACAACTG
Chr2 4629 4646 GAAAGACGAACAACTGC
Chr2 4629 4647 GAAAGACGAACAACTGCG
Chr2 4629 4648 GAAAGACGAACAACTGCGA
Chr2 4629 4649 GAAAGACGAACAACTGCGAA
Chr2 4629 4650 GAAAGACGAACAACTGCGAAA
Chr2 4629 4651 GAAAGACGAACAACTGCGAAAG
Chr2 4629 4652 GAAAGACGAACAACTGCGAAAGC
Chr2 4629 4653 GAAAGACGAACAACTGCGAAAGCA
Chr2 4629 4654 GAAAGACGAACAACTGCGAAAGCAT
Chr2 4629 4655 GAAAGACGAACAACTGCGAAAGCATT
Chr2 4629 4657 GAAAGACGAACAACTGCGAAAGCATTTG
Chr2 4629 4659 GAAAGACGAACAACTGCGAAAGCATTTGCC
Chr2 4629 4660 GAAAGACGAACAACTGCGAAAGCATTTGCCA
Chr2 4629 4661 GAAAGACGAACAACTGCGAAAGCATTTGCCAA
Chr2 4630 4645 AAAGACGAACAACTG
Chr2 4630 4646 AAAGACGAACAACTGC
Chr2 4630 4647 AAAGACGAACAACTGCG
Chr2 4630 4648 AAAGACGAACAACTGCGA
Chr2 4630 4649 AAAGACGAACAACTGCGAA
Chr2 4630 4650 AAAGACGAACAACTGCGAAA
Chr2 4630 4651 AAAGACGAACAACTGCGAAAG
Chr2 4630 4652 AAAGACGAACAACTGCGAAAGC
Chr2 4630 4653 AAAGACGAACAACTGCGAAAGCA
Chr2 4630 4654 AAAGACGAACAACTGCGAAAGCAT
Chr2 4630 4655 AAAGACGAACAACTGCGAAAGCATT
Chr2 4630 4656 AAAGACGAACAACTGCGAAAGCATTT
Chr2 4630 4657 AAAGACGAACAACTGCGAAAGCATTTG
Chr2 4630 4660 AAAGACGAACAACTGCGAAAGCATTTGCCA
Chr2 4630 4662 AAAGACGAACAACTGCGAAAGCATTTGCCAAG
Chr2 4631 4646 AAGACGAACAACTGC
Chr2 4631 4647 AAGACGAACAACTGCG
Chr2 4631 4648 AAGACGAACAACTGCGA
Chr2 4631 4649 AAGACGAACAACTGCGAA
Chr2 4631 4650 AAGACGAACAACTGCGAAA
Chr2 4631 4651 AAGACGAACAACTGCGAAAG
Chr2 4631 4652 AAGACGAACAACTGCGAAAGC
Chr2 4631 4653 AAGACGAACAACTGCGAAAGCA
Chr2 4631 4654 AAGACGAACAACTGCGAAAGCAT
Chr2 4631 4655 AAGACGAACAACTGCGAAAGCATT
Chr2 4631 4656 AAGACGAACAACTGCGAAAGCATTT
Chr2 4631 4659 AAGACGAACAACTGCGAAAGCATTTGCC
Chr2 4632 4647 AGACGAACAACTGCG
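To make the grouping I have in mind concrete, here is a rough Python sketch of one way it might work. It is only an illustration under assumptions: it treats two alignments as "collapsible" when they are on the same chromosome and their coordinates overlap or touch, and it assumes each record carries a read count. The excerpt above does not show a count column, so the counts in `example` are made up, and the names `Cluster` and `collapse_alignments` are hypothetical.

```python
from collections import namedtuple

# One collapsed group: merged coordinates, summed count, number of records merged.
Cluster = namedtuple("Cluster", ["chrom", "start", "end", "count", "n_records"])


def collapse_alignments(records):
    """Merge overlapping alignments per chromosome and sum their counts.

    records: iterable of (chrom, start, end, count) tuples.
    Assumption: "collapsible" means same chromosome and start <= current
    cluster end (overlapping or touching coordinates).
    """
    clusters = []
    # Sort so that overlapping alignments on a chromosome are adjacent.
    for chrom, start, end, count in sorted(records):
        last = clusters[-1] if clusters else None
        if last is not None and last.chrom == chrom and start <= last.end:
            # Extend the current cluster and add this record's count.
            clusters[-1] = Cluster(
                chrom,
                last.start,
                max(last.end, end),
                last.count + count,
                last.n_records + 1,
            )
        else:
            # No overlap with the current cluster: start a new one.
            clusters.append(Cluster(chrom, start, end, count, 1))
    return clusters


# Coordinates taken from the listing above; the counts are invented, and the
# last record is a made-up, non-overlapping alignment for contrast.
example = [
    ("Chr2", 4628, 4644, 5),
    ("Chr2", 4629, 4645, 2),
    ("Chr2", 4632, 4647, 1),
    ("Chr2", 9000, 9020, 7),
]

for cluster in collapse_alignments(example):
    print(cluster.chrom, cluster.start, cluster.end, cluster.count)
```

A stricter rule (for example, requiring a minimum overlap, or the same strand) would only change the `start <= last.end` test. Since each sequence was aligned up to 15 times, a read that maps to several loci would contribute its count to every cluster it lands in unless that is handled separately.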