  • Finding transcriptome matches for thousands of 21-mers

    I have received a project and could use some advice.

    My task is to find sequence matches in mRNA databases across 20+ taxa. I was given an EPA memorandum outlining a method that I have been asked to replicate. I am not sure it is the best method, and the memorandum doesn't go into much detail. I have experimented a little and have some specific questions.

    Setup:
    I have an unspecified number (I haven't been told yet) of dsRNA segments, each approximately 300 bp in length. I need to match these against the human transcriptome as well as the transcriptomes of over 20 other taxa.

    Criteria:
    For each 300 bp dsRNA, I am to find the mRNAs in each taxon that have 14 or more matches within a 21 bp window. I then sort the results by taxon, transcript matched, and the annotation of the matched mRNA.

    Approach (this is where I have questions):
    The EPA memorandum says the Burrows-Wheeler Aligner (BWA) was used to align a 21-mer sliding window along the target transcriptomes to look for matches of 14 or greater within the window. The PI said to create all 21-mers using a sliding window along the dsRNA sequence. Easy enough.
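As a minimal sketch of the sliding-window step, the following Python generates every 21-mer from one dsRNA sequence and writes them as FASTA reads. The function names are hypothetical, and the `name_start_end` read IDs with 0-based, end-exclusive coordinates are an assumption chosen to resemble the read names in the sample output further down; the memorandum's actual convention is not known.

```python
import io

def sliding_kmers(name, seq, k=21):
    """Yield (read_id, kmer) for every k-length window along seq.

    Coordinates in the read ID are 0-based and end-exclusive
    (an assumed convention, not one taken from the memorandum).
    """
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        end = start + k
        yield f"{name}_{start}_{end}", seq[start:end]

def write_fasta(records, handle):
    """Write (read_id, sequence) pairs to an open text handle as FASTA."""
    for read_id, kmer in records:
        handle.write(f">{read_id}\n{kmer}\n")

# Usage: a 300 bp placeholder sequence yields 300 - 21 + 1 = 280 reads.
example = "ACGT" * 75
buffer = io.StringIO()
write_fasta(sliding_kmers("seqA", example), buffer)
```

A sequence of length n gives n - 20 windows, so the read count grows linearly with the number of dsRNA segments.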

    Here are my questions:
    1. Is BWA the best approach to use? I've never used BWA MEM for sequences this short. Is there a better approach?
    2. How should I set the BWA parameters for this case? The defaults are inadequate, but I'm just taking stabs in the dark to see what falls out. So far, I have adjusted:
       a. minimum seed length (-k), down to 3
       b. band width (-w), down to 7
       c. minimum score to output (-T), values from 1 to 21
       d. gap open penalty (-O), between 1 and 6
       e. mismatch penalty (-B), between 1 and 4
    3. Why do I see a Bitwise Flag of 0? As I adjust the parameters, the resulting SAM contains matches where the Bitwise Flag is 0. This seems like nonsense to me, suggesting that I may be on the wrong track.
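For reference on the flag question: in the SAM specification the FLAG column is a bit field, so a value of 0 simply means that no bit is set, i.e. an unpaired read that is mapped, on the forward strand, as a primary alignment. A value of 16 sets only the reverse-strand bit. A small decoder makes this easy to check; the helper name and the label strings are illustrative, while the bit values follow the SAM specification.

```python
# Bit values as defined in the SAM specification; labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]

print(decode_flag(0))   # [] : mapped, forward strand, primary
print(decode_flag(16))  # ['reverse_strand']
print(decode_flag(4))   # ['unmapped']
```

Under this reading, an unaligned read would carry flag 4 rather than 0.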

    Sample Execution:
    ./bwa mem -k 5 -B 1 -O2 -T 5 ../ncbi_dataset/GCF000001405.40.rna.fna ../seqA.fasta | gzip -3 > ../bwa_results/aln_seqA.sam.gz

    Bitwise Flag == 0?
    seqA_332_353 16 XM_011510229.4 7276 0 7S14M * 0 0 TGATCGGTGTAAATCCCATAT * NM:i:0 MD:Z:14 AS:i:14 XS:i:14
    seqA_333_354 0 XM_017008212.3 1680 0 7S14M * 0 0 TATGGGATTTACACCGATCAA * NM:i:0 MD:Z:14 AS:i:14 XS:i:13
    seqA_334_355 0 XM_017008212.3 1680 0 6S15M * 0 0 ATGGGATTTACACCGATCAAC * NM:i:1 MD:Z:14A0 AS:i:14 XS:i:13
    seqA_335_356 0 XM_017008212.3 1680 0 5S14M2S * 0 0 TGGGATTTACACCGATCAACT * NM:i:0 MD:Z:14 AS:i:14 XS:i:13
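One way to apply the 14-of-21 criterion to records like these is to count matching bases from the CIGAR string and the NM tag. The sketch below is a hypothetical helper, not the memorandum's method. It assumes that "14 or more matches" means total matching bases in the aligned portion (not necessarily contiguous), that soft-clipped bases count as unmatched, and that a missing NM tag can be treated as zero mismatches. It splits on any whitespace so that both tab-separated SAM and the space-flattened lines shown above parse.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def matched_bases(sam_line):
    """Estimate the number of matching bases in one SAM alignment record."""
    fields = sam_line.split()
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return 0  # unmapped read: nothing matched
    totals = {}
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    # Columns where a read base is aligned to a reference base.
    aligned = totals.get("M", 0) + totals.get("=", 0) + totals.get("X", 0)
    # NM is the edit distance: mismatches plus inserted and deleted bases.
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), 0)
    mismatches = nm - totals.get("I", 0) - totals.get("D", 0)
    return aligned - mismatches

def passes_criterion(sam_line, min_matches=14):
    """True when the record has at least min_matches matching bases."""
    return matched_bases(sam_line) >= min_matches

record = ("seqA_334_355 0 XM_017008212.3 1680 0 6S15M * 0 0 "
          "ATGGGATTTACACCGATCAAC * NM:i:1 MD:Z:14A0 AS:i:14 XS:i:13")
print(matched_bases(record))  # 14 (15 aligned columns, 1 mismatch)
```

Filtering on a count like this keeps the threshold independent of the scoring parameters passed to the aligner.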

