  • manual XS:A[-|+] assignment for cufflinks

    Hi,
    I am using GSNAP and have assigned the strand to my reads. I have tried to format my BAM file the same way TopHat's accepted_hits.bam file is formatted (I have placed an example of each below). However, when I run my sorted BAM file through Cufflinks, I get the following error:

    "BAM record error: found spliced alignment without XS attribute"

    Does anyone have any idea as to what I could be doing wrong with my formatting? Thanks.

    *Altered GSNAP file: output.bam*
    D8FF8JN1:127:C066VACXX:8:1207:12974:19520       147     chr1    4610    40      83M140N16M      =       4311    0       CGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGTGCCTCATGAC     DDCADDC@CCDA>CC@@@DDDCADDDDDDDDCDCCCB?DCDDCFFHHBHHDGC=CJIIJGHCHEIIGEHE3JJJJIIHFEGCJIIIIFHHGHDDFDD?C     NM:i:1  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      163     chr1    15      40      75M     =       41      101     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC     144=BDHHHHHJJJJJJJJJJJJJJJJJIJJJJJJGIJJFHJJJJJJJIJIJJIJIGIHHHHFFFFFEEDEDCDB     NM:i:0  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      83      chr1    41      40      69M6S   =       15      -101    CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACTCCAC     DABBCDEEDBDFFFFHAHHJIGDJIIHFFJJJIHFJJHHGBJJIHHDJJIGHHJJIHHGJJJIHHHHFCFEFFFC     NM:i:0  XS:A:-  NH:i:1
    *TOPHAT File: accepted_hits.sam*
    D8FF8JN1:127:C066VACXX:8:2206:17700:177847      113     chr1    5783    1       28M659N71M      =       5783    0       TCGACCACTTCCCTGGCAGCTCCCTGGACTGAAGGAGACGTGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGTT     DDDCDDCDDDBBCC<3(BDDBDDDDDDEDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFHHHHJJJJJJJJIIJIJJIJJJJJJJJJHHHHHFFDB=+@     NM:i:4  XS:A:-  NH:i:3  CC:Z:chrX       CP:i:154906249
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        129     chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7  CC:Z:chr15      CP:i:100331773
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        65      chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7
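Because Cufflinks requires an XS tag with a value of `+` or `-` on every spliced record (one whose CIGAR contains `N`), a few hand-picked lines can look fine while other records in the file do not. The following is a minimal sketch, not part of GSNAP, TopHat, or Cufflinks: `find_bad_spliced` and its helpers are hypothetical names, and the code assumes plain SAM text lines (for example, the output of `samtools view output.bam` saved to a file). It reports spliced records whose XS:A value is missing or is neither `+` nor `-`.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """True if the CIGAR string contains an 'N' (skipped reference region, i.e. an intron)."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def xs_value(optional_fields: List[str]) -> Optional[str]:
    """Return the value of an XS:A tag, or None if the record has no XS:A tag."""
    for field in optional_fields:
        if field.startswith("XS:A:"):
            return field[len("XS:A:"):]
    return None


def find_bad_spliced(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (QNAME, CIGAR, XS value) for spliced records whose XS is missing or not '+'/'-'."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        qname, cigar = cols[0], cols[5]
        if is_spliced(cigar):
            xs = xs_value(cols[11:])  # optional fields follow the 11 mandatory columns
            if xs not in ("+", "-"):
                yield qname, cigar, xs
```

For example, `list(find_bad_spliced(open("output.sam")))` returns every offending record; an empty list means the XS requirement is met throughout the file.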

  • #2
    Hi zorph,

    I too am trying to get Cufflinks to read a GSNAP-generated SAM file. Just curious: were those XS:A:- and XS:A:+ tags inserted automatically by GSNAP, or did you insert them manually?
    If you inserted them manually, how did you determine the strand (+ or -)?
    Thanks



    • #3
      I'll third this complaint, with SAM generated by GMAP. I've checked, and all of the records with N's in their CIGAR strings do have XS:A:[+-] tags. I wondered whether Cufflinks expects the tags in a specific order, but the OP's example doesn't deviate from the example SAM lines given in the manual, so that seems unlikely.

      Please post if either of you crack this case.



      • #4
        The latest version of GSNAP (released 2012-04-27) states that it adds the XS tags, so one does not have to add them manually.

        The XS tag is added to spliced reads and indicates which strand the read came from (not the strand it aligned to). The Cufflinks manual says:

        This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with a 'N' operation in the CIGAR string).
        Note that the strand a read aligned to is easy to get from the SAM flag, but the strand of the RNA it came from is tricky to determine with unstranded sequencing. TopHat uses splice junction information to infer it, and one could try to add the XS tag manually based on the sequence at the alignment's splice junction. The TopHat manual says:

        With long (>=75bp) reads, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. With shorter reads, TopHat only reports alignments across "GT-AG" introns
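To make the distinction above concrete, here is a minimal sketch under stated assumptions, not TopHat's or GSNAP's actual implementation. `aligned_strand`, `introns`, and `infer_xs` are hypothetical helpers: the first reads the alignment strand from SAM flag bit 0x10, the second walks the CIGAR to find each `N` gap's 1-based reference coordinates, and the third guesses the transcript strand from the intron's boundary dinucleotides as read on the reference plus strand. It assumes the GT-AG, GC-AG and AT-AC motifs quoted above, whose reverse complements (CT-AC, CT-GC, GT-AT) indicate a minus-strand transcript.

```python
import re
from typing import Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

# Intron (first two bases, last two bases) on the reference plus strand.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}  # reverse complements of the above


def aligned_strand(flag: int) -> str:
    """Strand the read aligned to: flag bit 0x10 means SEQ is reverse-complemented."""
    return "-" if flag & 0x10 else "+"


def introns(pos: int, cigar: str) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) of each N gap, given the 1-based leftmost POS."""
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            yield ref, ref + n - 1
        if op in REF_CONSUMING:
            ref += n


def infer_xs(intron_seq: str) -> str:
    """Guess the transcript strand from an intron's plus-strand sequence: '+', '-', or '?'."""
    motif = (intron_seq[:2].upper(), intron_seq[-2:].upper())
    if motif in PLUS_MOTIFS:
        return "+"
    if motif in MINUS_MOTIFS:
        return "-"
    return "?"  # non-canonical boundaries: the strand cannot be inferred this way
```

For the first GSNAP record above (POS 4610, CIGAR `83M140N16M`), `introns` gives the gap (4693, 4832); with the chromosome sequence loaded as a plain string `chrom_seq` (a hypothetical variable), `infer_xs(chrom_seq[4692:4832])` would then give the guess. Note that flag 147 reports `-` for the alignment regardless of which strand the transcript came from.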



        • #5
          I should have mentioned that I found my problem. I was careless earlier in saying that all my spliced alignments had XS:A:[+-] tags. Some of them instead have XS:A:? tags (presumably where the transcript's strand couldn't be determined from the sequence at the edges of the splice). When I removed these undetermined XS tags, Cufflinks stopped giving me that error. Hope this helps someone.
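The post above doesn't say whether only the `XS:A:?` tags or the whole records were removed, so this hypothetical `filter_undetermined_xs` sketch supports both: by default it strips just the `XS:A:?` field, and with `drop_record=True` it skips those records entirely. It assumes SAM text input (for example from `samtools view -h input.bam`), passes header lines through unchanged, and only touches the optional fields after the 11 mandatory columns.

```python
from typing import Iterable, Iterator


def filter_undetermined_xs(lines: Iterable[str], drop_record: bool = False) -> Iterator[str]:
    """Yield SAM lines with 'XS:A:?' removed; if drop_record is True, skip those records instead."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        optional = fields[11:]  # TAG:TYPE:VALUE fields follow the 11 mandatory columns
        if "XS:A:?" in optional:
            if drop_record:
                continue  # discard the whole record
            optional = [f for f in optional if f != "XS:A:?"]
        yield "\t".join(fields[:11] + optional) + "\n"
```

The filtered SAM can then be converted back to BAM with `samtools view -b`. A spliced record left with no XS tag at all still falls short of the manual's requirement quoted earlier, so `drop_record=True` may be the safer choice.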
