  • tximport for salmon output

    I am trying to use tximport to analyze Salmon output, but I am stuck on the error below and am not sure which files are causing it.

    Code:
    summarizing abundance
    summarizing counts
    summarizing length
    Error: all(names(aveLengthSampGene) == rownames(lengthMat)) is not TRUE
    In addition: Warning message:
    In names(aveLengthSampGene) == rownames(lengthMat) :
      longer object length is not a multiple of shorter object length
    This is how I run tximport:
    Code:
    txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene, reader = read_tsv)

    Code:
    sessionInfo()
    R version 3.3.1 (2016-06-21)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.9.5 (Mavericks)
    
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] readr_1.0.0        tximport_1.0.3     tximportData_1.0.2
    
    loaded via a namespace (and not attached):
    [1] assertthat_0.1 tools_3.3.1    tibble_1.1     Rcpp_0.12.6    tcltk_3.3.1

  • #2
    I ran into exactly the same error this morning. It turns out there is something "wrong" in the transcript-to-gene mapping (tx2gene). I am using the GENCODE mouse v10 transcriptome, and the tx2gene mapping file provided by GENCODE has lines like this:

    ENSMUST00000023648.5 4930553J12Rik
    ENSMUST00000023648.5 Krtap15
    ENSMUST00000187823.1 4930553J12Rik
    ENSMUST00000187823.1 Krtap15

    The same transcript is mapped to two gene symbols. I solved the issue by simply removing the lines with "4930553J12Rik". Hope this helps.
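
One way to locate such rows before running tximport is to count the distinct genes per transcript. This is a minimal base-R sketch, assuming the mapping is a two-column data frame. The column names `tx_id` and `gene_id` and the row `txA.1` / `GeneA` are made up for illustration; the other four rows are the ones quoted above.

```r
# Toy mapping: four rows from the post plus one made-up, unambiguous row.
tx2gene <- data.frame(
  tx_id   = c("ENSMUST00000023648.5", "ENSMUST00000023648.5",
              "ENSMUST00000187823.1", "ENSMUST00000187823.1",
              "txA.1"),
  gene_id = c("4930553J12Rik", "Krtap15",
              "4930553J12Rik", "Krtap15",
              "GeneA"),
  stringsAsFactors = FALSE
)

# Number of distinct genes each transcript is mapped to.
genes_per_tx <- tapply(tx2gene$gene_id, tx2gene$tx_id,
                       function(g) length(unique(g)))

# Transcripts mapped to more than one gene.
ambiguous_tx <- names(genes_per_tx)[genes_per_tx > 1]

# All mapping rows that involve an ambiguous transcript.
ambiguous_rows <- tx2gene[tx2gene$tx_id %in% ambiguous_tx, ]
print(ambiguous_rows)
```

Inspecting `ambiguous_rows` first makes it easier to decide which gene assignment to keep, rather than dropping rows blindly.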

    • #3
      Thanks sulicon, that helped.
      btw a quick way to remove those rows in R:
      Code:
      tx2gene <- tx2gene[!duplicated(tx2gene$ens_id_version), ]

      • #4
        Isn't aggregating the counts from multiple transcripts to a single gene kinda the point of tximport? Seems like removing the duplicate gene IDs is removing important information.


        Originally posted by granger:
        Thanks sulicon, that helped.
        btw a quick way to remove those rows in R:
        Code:
        tx2gene <- tx2gene[!duplicated(tx2gene$ens_id_version), ]

        • #5
          @peromhc Yes, it is, but the problem with the tx2gene mapping above is that it maps one transcript to multiple genes, not multiple transcripts to one gene. It is best to keep track of which gene IDs were removed.
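
A minimal sketch of that bookkeeping, assuming a data frame with the `ens_id_version` column used earlier in the thread and a hypothetical `gene_symbol` column. The `txA.*` / `GeneA` rows are made up to show that several transcripts of one gene are left alone. Note that `duplicated()` keeps the first row it sees for each transcript, so which gene survives depends on row order.

```r
# Toy mapping: the four rows from the thread plus two made-up transcripts
# that both belong to one (made-up) gene.
tx2gene <- data.frame(
  ens_id_version = c("ENSMUST00000023648.5", "ENSMUST00000023648.5",
                     "ENSMUST00000187823.1", "ENSMUST00000187823.1",
                     "txA.1", "txA.2"),
  gene_symbol    = c("4930553J12Rik", "Krtap15",
                     "4930553J12Rik", "Krtap15",
                     "GeneA", "GeneA"),
  stringsAsFactors = FALSE
)

# TRUE for every repeat of a transcript ID after its first occurrence.
is_dup <- duplicated(tx2gene$ens_id_version)

# Keep a record of what is dropped before dropping it.
removed        <- tx2gene[is_dup, ]
tx2gene_unique <- tx2gene[!is_dup, ]

print(removed)         # the gene assignments that were discarded
print(tx2gene_unique)  # one row per transcript; GeneA keeps both transcripts
```

With this row order the `Krtap15` rows are the ones discarded, which is the opposite of the manual fix in #2. Sorting or filtering the table beforehand would change that, so saving `removed` (for example with `write.csv`) leaves a trace of the choice.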

          I came across a similar problem: multiple Ensembl (ENS) gene IDs that encode the same gene product (e.g. share the same HUGO name). Thousands of HUGO genes have multiple ENS IDs, and that can be a problem.
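
A rough way to check for this in a gene annotation table, sketched under the assumption of a two-column data frame. The column names `gene_id` and `symbol` and all values are placeholders, not real identifiers.

```r
# Placeholder gene table: two IDs share one symbol, one ID is unique.
gene_map <- data.frame(
  gene_id = c("geneid_1", "geneid_2", "geneid_3"),
  symbol  = c("SYMBOL_A", "SYMBOL_A", "SYMBOL_B"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows so each (gene_id, symbol) pair is counted once.
gene_map <- unique(gene_map)

# Number of distinct gene IDs per symbol.
ids_per_symbol <- table(gene_map$symbol)

# Symbols that are shared by more than one gene ID.
multi_symbols <- names(ids_per_symbol)[ids_per_symbol > 1]

# Rows to review before summarizing counts by symbol.
multi_rows <- gene_map[gene_map$symbol %in% multi_symbols, ]
print(multi_rows)
```

Whether such IDs should be merged or kept apart depends on the analysis; the sketch only lists the candidates.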
