Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How Tophat tell junctions from deletion?

    HI,

    I wrote to Cole for how Tophat tell the junctions (with CIGAR line xxxMxxxNxxxM) from deletions (xxxMxxxDxxxM). I guess they might be busy to answer all emails, so I dig the code myself a bit further and found a related part (I guess):

    -------------- long_spanning_reads.cpp --------------

    /*
    * FIXME, currently just differentiating between a deletion and a
    * reference skip based on length. However, would probably be better
    * to denote the difference explicitly, this would allow the user
    * to supply their own (very large) deletions
    */
    if ((lb->right - lb->left - 1) <= max_deletion_length)
    {
    new_cigar.push_back(CigarOp(DEL, lb->right - lb->left - 1));
    antisense_closure = prev_hit->is_spliced() ? prev_hit->antisense_splice() : curr_hit->antisense_splice();
    }
    else
    {
    new_cigar.push_back(CigarOp(REF_SKIP, lb->right - lb->left - 1));
    antisense_closure = lb->antisense;
    }

    However, when I change the value of max_deletion_length option, the output does not change.

    To reproduce what I said, I've made a small example file here:
    -- test.fa (http://zlab.umassmed.edu/~dongx/temp/test.fa) contais 4 reference sequences for index (which are identical in sequences with different ID, in order to get multiple hit)
    -- test.reads.fa (http://zlab.umassmed.edu/~dongx/temp/test.reads.fa) contains 3 reads (as below), where reads "read00" is directly extracted from the reference sequence (so should be perfectly matched). As you see, read1 and read2 are part of read00 (I marked the common part as upper case and the part deleted from read1/2 are marked as lower case). So, ideally, read1 should have CIGAR line as xxxM9DxxxxM, read2 has xxxM8DxxxM and read00 has xxxM.

    $ cat test.reads.fa
    >read1
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGgAATACTAGAGGCCACTGCAT
    >read2
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGagAATACTAGAGGCCACTGCAT
    >read00
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGtagagaggagAATACTAGAGGCCACTGCAT

    I made bowtie2 index from test.fa, called 'test', then run Tophat (v2.0.4):

    tophat -o test --max-deletion-length 10 --no-convert-bam -p 8 test test.reads.fa; grep read test/accepted_hits.sam

    read2 256 116 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1168 CP:i:1 HI:i:0
    read00 256 116 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1168 CP:i:1 HI:i:0
    read2 0 1168 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1176 CP:i:1 HI:i:1
    read00 256 1168 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1176 CP:i:1 HI:i:1
    read2 256 1176 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1916 CP:i:1 HI:i:2
    read00 256 1176 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1916 CP:i:1 HI:i:2
    read2 256 1916 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 HI:i:3
    read00 0 1916 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 HI:i:3

    As you see, read1 is not in the output. I also change to use different value of --max-deletion-length, but never can get read1 included. I also tried to use "-i 5" for example, but still no read1 there. I've also tested different deletion size in read2, and found only when the size < or = 8, it can be output; otherwise, it will be filtered. Why??

    Hopefully I can get your help on this.

    I also try to compile the source code myself. Everything works fine except an error in the make step. I made a note here:
    A blog about Tips and Tricks for Unix, Perl, R, HTML, Javascript, Google API and mostly Bioinformatics

    Appreciated if you can help by the way.

    Best,
    -sterding

Latest Articles

Collapse

  • seqadmin
    Advancing Precision Medicine for Rare Diseases in Children
    by seqadmin




    Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
    12-16-2024, 07:57 AM
  • seqadmin
    Recent Advances in Sequencing Technologies
    by seqadmin



    Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

    Long-Read Sequencing
    Long-read sequencing has seen remarkable advancements,...
    12-02-2024, 01:49 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 12-17-2024, 10:28 AM
0 responses
25 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-13-2024, 08:24 AM
0 responses
42 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-12-2024, 07:41 AM
0 responses
28 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-11-2024, 07:45 AM
0 responses
42 views
0 likes
Last Post seqadmin  
Working...
X