Finding transcriptome matches for thousands of 21-mers

greymouse451

09-24-2024, 12:04 PM

I've been assigned a project and could use some advice.

My task is to find sequence matches in mRNA databases across 20+ taxa. I was given an EPA memorandum that outlines a method I've been asked to replicate. I'm not sure it is the best method, and the memorandum doesn't go into much detail. I've experimented a bit, but I'd appreciate some guidance.

Setup:
I will have some number of dsRNA segments (I haven't been told how many yet), each approximately 300 bp long. I need to match these against the human transcriptome as well as the transcriptomes of more than 20 other taxa.

Criteria:
For each 300 bp dsRNA, I need to find mRNAs in each taxon that have 14 or more matching bases within a 21 bp window. I then sort the results by taxon, matched transcript, and the annotation of the matched mRNA.
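
To make the criterion concrete, here is a minimal Python sketch that slides a 21-base window along one transcript and counts position-by-position identities. It assumes that "14 or more matches" means ungapped identities anywhere in the window (the memorandum may define it differently), it scans only the transcript strand as given, and the names `count_matches` and `find_hits` are hypothetical. A brute-force scan like this is only meant to illustrate the rule or to spot-check aligner output on a handful of sequences.

```python
def count_matches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences agree (no gaps allowed)."""
    return sum(x == y for x, y in zip(a, b))


def find_hits(kmer: str, transcript: str, min_matches: int = 14):
    """Yield (offset, matches) for each window of len(kmer) that meets the threshold.

    Assumptions: the criterion is ungapped and case-insensitive, and only the
    transcript strand as given is scanned (no reverse complement).
    """
    kmer, transcript = kmer.upper(), transcript.upper()
    k = len(kmer)
    for offset in range(len(transcript) - k + 1):
        matches = count_matches(kmer, transcript[offset:offset + k])
        if matches >= min_matches:
            yield offset, matches


# Toy example: 14 identical bases followed by 7 mismatches still qualifies.
kmer = "A" * 21
print(list(find_hits(kmer, "A" * 14 + "C" * 7)))  # [(0, 14)]
print(list(find_hits(kmer, "A" * 13 + "C" * 8)))  # []
```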

Approach (this is where I have questions):
The EPA memorandum says that the Burrows-Wheeler Aligner (BWA) was used to align a 21-mer sliding window along the target transcriptomes, looking for matches of 14 or more bases within the window. My PI said to create all 21-mers by sliding a window along each dsRNA sequence. Easy enough.
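
Generating the 21-mers takes only a few lines. A minimal Python sketch follows. It assumes k-mer names of the form `<name>_<start>_<end>` with a 0-based start and an exclusive end, which appears consistent with names like `seqA_332_353` in the output further down (353 - 332 = 21), though that naming scheme is a guess. `sliding_kmers` and `write_kmer_fasta` are hypothetical helper names.

```python
from typing import Iterable, Iterator, Tuple


def sliding_kmers(name: str, seq: str, k: int = 21) -> Iterator[Tuple[str, str]]:
    """Yield (kmer_name, kmer) for every k-length window along seq, step 1.

    Names use a 0-based start and exclusive end (an assumption), e.g. seqA_332_353.
    """
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        yield f"{name}_{start}_{start + k}", seq[start:start + k]


def write_kmer_fasta(records: Iterable[Tuple[str, str]], path: str, k: int = 21) -> None:
    """Write every k-mer from (name, sequence) records to one FASTA file."""
    with open(path, "w") as out:
        for name, seq in records:
            for kmer_name, kmer in sliding_kmers(name, seq, k):
                out.write(f">{kmer_name}\n{kmer}\n")


# A 300 bp dsRNA gives 300 - 21 + 1 = 280 overlapping 21-mers.
toy = "ACGT" * 75
kmers = list(sliding_kmers("seqA", toy))
print(len(kmers), kmers[0][0], kmers[-1][0])  # 280 seqA_0_21 seqA_279_300
```

Under these assumptions, the FASTA passed to `bwa mem` would contain one record per 21-mer.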

Here are my questions:
Is BWA the best tool for this? I've never used BWA-MEM on sequences this short. Is there a better approach?

How should I set BWA's parameters for this case? The defaults are inadequate, but I'm just taking stabs in the dark to see what falls out. So far, I have adjusted:

Minimum seed length (-k): lowered to 3

Band width (-w): lowered to 7

Minimum score to output an alignment (-T): values from 1 to 21

Gap open penalty (-O): values from 1 to 6

Mismatch penalty (-B): values from 1 to 4
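
One way to see why the defaults fail is to work out the best score a 21-mer can reach. Per the bwa manual, `bwa mem` defaults to a match score of 1 (`-A 1`), mismatch penalty 4 (`-B 4`), gap open 6 (`-O 6`), gap extension 1 (`-E 1`; a gap of length k costs O + k*E), minimum output score 30 (`-T 30`) and minimum seed length 19 (`-k 19`). The sketch below is a simplified linear scoring model that ignores clipping penalties (`-L`) and BWA's extension heuristics, so it approximates rather than reproduces what BWA computes internally. `alignment_score` is a hypothetical helper.

```python
def alignment_score(matches: int, mismatches: int = 0, gap_lengths=(),
                    A: int = 1, B: int = 4, O: int = 6, E: int = 1) -> int:
    """Approximate BWA-MEM-style score: +A per match, -B per mismatch,
    and -(O + E*len) per gap. Ignores clipping penalties (simplification)."""
    gap_cost = sum(O + E * length for length in gap_lengths)
    return matches * A - mismatches * B - gap_cost


DEFAULT_T = 30  # bwa mem -T default: alignments scoring below this are not output

best = alignment_score(21)                # perfect 21-mer match, default scoring
print(best, best >= DEFAULT_T)            # 21 False -> nothing can pass -T 30
print(alignment_score(14))                # 14, cf. 14M with NM:i:0 and AS:i:14
print(alignment_score(14, 1, B=1))        # 13: 14 matches plus 1 mismatch at -B 1
print(alignment_score(14, 0, (1,), O=2))  # 11: 14 matches plus a 1-base gap at -O 2
```

Likewise, with the default `-k 19` a hit needs an exact match of at least 19 bases to be seeded, so most windows with only 14 of 21 matching bases would likely never be found.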

Why do I see a bitwise flag of 0? After adjusting the parameters, the resulting SAM file contains matches whose bitwise flag is 0. This seems like nonsense to me and suggests I may be on the wrong track.
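
For reference, the SAM FLAG (column 2) is a bit field defined in the SAM format specification, and a value of 0 means that none of the bits is set. A small Python decoder makes this explicit for the values 0 and 16 that appear in the output. `decode_flag` is a hypothetical helper, and the bit descriptions are paraphrased from the specification.

```python
# Bit meanings paraphrased from the SAM format specification (FLAG, column 2).
SAM_FLAG_BITS = {
    0x1: "template has multiple segments (paired)",
    0x2: "each segment properly aligned",
    0x4: "segment unmapped",
    0x8: "next segment unmapped",
    0x10: "SEQ reverse complemented",
    0x20: "next segment reverse complemented",
    0x40: "first segment in template",
    0x80: "last segment in template",
    0x100: "secondary alignment",
    0x200: "not passing filters",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list:
    """Return descriptions of the bits set in a SAM FLAG value."""
    return [desc for bit, desc in SAM_FLAG_BITS.items() if flag & bit]


for flag in (0, 16):
    set_bits = decode_flag(flag)
    # An empty list means: unpaired, mapped, forward strand, primary alignment.
    print(flag, set_bits or "no bits set: mapped, forward strand, primary, unpaired")
```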

Sample Execution:
./bwa mem -k 5 -B 1 -O2 -T 5 ../ncbi_dataset/GCF000001405.40.rna.fna ../seqA.fasta | gzip -3 > ../bwa_results/aln_seqA.sam.gz

Bitwise Flag == 0?
seqA_332_353 16 XM_011510229.4 7276 0 7S14M * 0 0 TGATCGGTGTAAATCCCATAT * NM:i:0 MD:Z:14 AS:i:14 XS:i:14
seqA_333_354 0 XM_017008212.3 1680 0 7S14M * 0 0 TATGGGATTTACACCGATCAA * NM:i:0 MD:Z:14 AS:i:14 XS:i:13
seqA_334_355 0 XM_017008212.3 1680 0 6S15M * 0 0 ATGGGATTTACACCGATCAAC * NM:i:1 MD:Z:14A0 AS:i:14 XS:i:13
seqA_335_356 0 XM_017008212.3 1680 0 5S14M2S * 0 0 TGGGATTTACACCGATCAACT * NM:i:0 MD:Z:14 AS:i:14 XS:i:13
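
To apply the 14-of-21 criterion to BWA output, the number of matching bases can be recovered from the CIGAR string and the NM tag. The Python sketch below assumes tab-separated records, as in a real SAM file, and relies on the SAM definition of NM as mismatches plus inserted and deleted bases, so matches = aligned M/=/X columns - (NM - I - D). `matched_bases` and `passes_criterion` are hypothetical names. For each of the four records above it gives 14 matching bases.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def matched_bases(cigar: str, nm: int) -> int:
    """Aligned bases that match the reference.

    Uses the SAM definition NM = mismatches + inserted + deleted bases,
    so mismatches = NM - I - D.
    """
    aligned = inserted = deleted = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned += n
        elif op == "I":
            inserted += n
        elif op == "D":
            deleted += n
    return aligned - (nm - inserted - deleted)


def passes_criterion(sam_line: str, min_matches: int = 14) -> bool:
    """True if a mapped SAM record has at least min_matches matching bases."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":  # unmapped record
        return False
    nm = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NM:i:")), None)
    if nm is None:  # no NM tag, so mismatches cannot be counted
        return False
    return matched_bases(cigar, nm) >= min_matches


record = "\t".join(["seqA_334_355", "0", "XM_017008212.3", "1680", "0", "6S15M",
                    "*", "0", "0", "ATGGGATTTACACCGATCAAC", "*",
                    "NM:i:1", "MD:Z:14A0", "AS:i:14", "XS:i:13"])
print(matched_bases("6S15M", 1), passes_criterion(record))  # 14 True
```
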
Tags: alignment, bwa, k-mers, off-target, rnai
