  • UMI-Tools version 0.5: now with tools for cell barcoded scRNA-seq

    We are proud to announce the release of UMI-Tools 0.5.

    UMI-tools provides error-aware tools for dealing with short random oligos (Unique Molecular Identifiers/Random Molecular Tags).

    The novel, error-corrected UMI deduplication algorithm was published here. We provide tools to group, deduplicate or count reads by their UMIs.
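    For readers new to the idea, here is a simplified pure-Python sketch of the "directional" grouping described in the UMI-tools paper. Among UMIs seen at one position, UMI a absorbs UMI b when they differ at a single base and count(a) >= 2 * count(b) - 1. This is an illustration under those assumptions, not the library's implementation, and the helper names are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts):
    """Group UMIs observed at one position (hypothetical helper).

    An edge a -> b is drawn when a and b differ at exactly one base and
    count(a) >= 2 * count(b) - 1. Each group starts from the most abundant
    UMI not yet assigned and follows edges outward.
    """
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    edges = {
        a: [b for b in umis
            if b != a and hamming(a, b) == 1
            and umi_counts[a] >= 2 * umi_counts[b] - 1]
        for a in umis
    }
    assigned, groups = set(), []
    for seed in umis:
        if seed in assigned:
            continue
        group, stack = [], [seed]
        assigned.add(seed)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in edges[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups


counts = Counter({"AAAA": 100, "AAAT": 10, "AATT": 2, "CCCC": 50})
print(directional_groups(counts))  # [['AAAA', 'AAAT', 'AATT'], ['CCCC']]
```

    Each group counts as one molecule, so these four distinct UMIs collapse to two.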

    Find us on PyPI, conda or here: https://github.com/CGATOxford/UMI-tools

    General walk-through

    Droplet-barcoded single cell RNA-seq walk-through

    Version 0.5

    Version 0.5.0 introduces new commands to support single-cell RNA-Seq and reduces run-time. The underlying methods have not changed, hence only the minor version number has been incremented.

    UMI-tools goes single cell

    New commands for single cell RNA-Seq (scRNA-Seq):
    whitelist - Extract cell barcodes (CBs) from droplet-based scRNA-Seq FASTQs and estimate the number of "true" CBs. Outputs a flatfile listing the true cell barcodes and the 'error' barcodes within a set distance. Thanks to @Hoohm for input and patience in testing, and to @k3yavi for input in discussions about implementing a 'knee' method.

    count - Count the number of reads per cell per gene after de-duplication. This tool uses the same underlying methods as group and dedup and acts to simplify scRNA-Seq read-counting with umi_tools.

    count_tab - As per count, but works from a flatfile input, e.g. from featureCounts.
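    As a rough illustration of the idea behind whitelist when the number of cells is fixed, the sketch below keeps the N most frequent barcodes as "true" and attaches each remaining barcode to a single true barcode within a mismatch threshold. The helper name and the rule of skipping barcodes that sit close to more than one true barcode are assumptions for this sketch, not a description of UMI-tools internals.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def build_whitelist(counts, n_cells, threshold=1):
    """Hypothetical helper: take the n_cells most frequent barcodes as 'true'
    and map every other barcode within `threshold` mismatches of exactly one
    true barcode onto it. Barcodes close to several true barcodes are skipped."""
    true_bcs = [bc for bc, _ in counts.most_common(n_cells)]
    errors = {bc: [] for bc in true_bcs}
    for bc in sorted(counts):
        if bc in errors:
            continue
        hits = [t for t in true_bcs if hamming(bc, t) <= threshold]
        if len(hits) == 1:
            errors[hits[0]].append(bc)
    return errors


counts = Counter({"AAAA": 500, "CCCC": 400, "AAAT": 3, "CCCA": 2, "ACCA": 1})
print(build_whitelist(counts, n_cells=2))  # {'AAAA': ['AAAT'], 'CCCC': ['CCCA']}
```

    Here ACCA is two mismatches away from both true barcodes, so it is left unassigned.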

    In the process of creating these commands, the options for dealing with UMIs on a "per-gene" basis have been re-jigged to make their purpose clearer.

    To perform group, dedup or count on a per-gene basis, the --per-gene option should be provided. This must be combined with either --gene-tag, if the BAM contains gene assignments in a tag, or --per-contig, if the reads have been aligned to a transcriptome. In the latter case, if each contig is a transcript, the option --gene-transcript-map can be used to operate at the gene level. These options are standardised across all tools, so one can easily change, e.g., a count command into a dedup command.
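    Conceptually, per-gene counting from transcriptome alignments comes down to collapsing transcript IDs to gene IDs and counting distinct UMIs per cell and gene. The sketch below assumes reads have already been reduced to (cell, transcript, UMI) tuples and, for brevity, uses exact UMI matching instead of the error-aware methods UMI-tools applies. The function name and the fallback of keeping unmapped transcript IDs as-is are illustrative choices.

```python
from collections import defaultdict


def count_per_cell_gene(reads, transcript_to_gene):
    """Count distinct UMIs per (cell, gene).

    `reads` yields (cell_barcode, transcript_id, umi) tuples; transcripts
    missing from the map are kept under their own ID (an illustrative choice).
    """
    umis = defaultdict(set)
    for cell, transcript, umi in reads:
        gene = transcript_to_gene.get(transcript, transcript)
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


tx2gene = {"T1": "G1", "T2": "G1", "T3": "G2"}
reads = [("C1", "T1", "AAA"), ("C1", "T2", "AAA"),
         ("C1", "T1", "CCC"), ("C2", "T3", "AAA")]
print(count_per_cell_gene(reads, tx2gene))  # {('C1', 'G1'): 2, ('C2', 'G2'): 1}
```

    The two AAA reads from cell C1 hit different transcripts of the same gene, so they count once at the gene level.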

    Updated options:
    extract - Can now accept regex patterns to describe UMI +/- CB encoding in read(s). See --extract-method=regex option.
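    To show the general idea of regex-based extraction, here is a sketch using Python's re module with named groups for cell-barcode parts (cell_1, cell_2), the UMI (umi_1) and a discarded linker (discard_1), following the group-naming style used in the UMI-tools guide (check the documentation for the exact rules). The read layout and linker sequence are hypothetical, and plain re has no fuzzy matching.

```python
import re

# Hypothetical read-1 layout: 8 nt cell barcode, fixed linker, 8 nt cell barcode, 6 nt UMI.
PATTERN = re.compile(
    r"(?P<cell_1>.{8})(?P<discard_1>ACGTACGT)(?P<cell_2>.{8})(?P<umi_1>.{6})"
)


def extract_cb_umi(seq):
    """Return (cell_barcode, umi), or None if the read does not match."""
    match = PATTERN.match(seq)
    if match is None:
        return None
    groups = sorted(match.groupdict().items())  # cell_1, cell_2, discard_1, umi_1
    # Concatenate the cell-barcode parts in order; the linker is dropped.
    cell = "".join(v for k, v in groups if k.startswith("cell_"))
    umi = "".join(v for k, v in groups if k.startswith("umi_"))
    return cell, umi


read = "AAAAAAAA" + "ACGTACGT" + "CCCCCCCC" + "GGGTTT" + "TTTTTTTTTT"
print(extract_cb_umi(read))  # ('AAAAAAAACCCCCCCC', 'GGGTTT')
print(extract_cb_umi("AAAAAAAATTTTTTTT"))  # None (read too short, no linker)
```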

    We have written a guide for how to use UMI-tools for scRNA-Seq analysis including estimation of the number of true CBs, flexible extraction of cell barcodes and UMIs and per-cell read-counting as well as common workflow variations.

    Reduced run-time

    Introduced a hashing step to limit the scope of the edit-distance comparisons required to build the networks. Big thanks to @mparker2 for this!
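    The hashing scheme itself is not described here, so the sketch below shows one standard way to get this kind of speed-up (an assumption, not necessarily what UMI-tools does). Each UMI is split into threshold + 1 chunks and bucketed by chunk. By the pigeonhole principle, two equal-length UMIs within `threshold` mismatches must share at least one identical chunk at the same position, so only UMIs that land in a common bucket need a full comparison.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def close_pairs(umis, threshold=1):
    """Find all pairs of distinct, equal-length UMIs within `threshold`
    mismatches, comparing only UMIs that share an identical chunk."""
    length = len(umis[0])
    k = threshold + 1
    bounds = [(i * length // k, (i + 1) * length // k) for i in range(k)]
    buckets = defaultdict(set)
    for umi in umis:
        for idx, (start, end) in enumerate(bounds):
            buckets[(idx, umi[start:end])].add(umi)
    pairs = set()
    for members in buckets.values():
        for a, b in combinations(sorted(members), 2):
            if hamming(a, b) <= threshold:
                pairs.add((a, b))
    return pairs


umis = ["AAAA", "AAAT", "TTAA", "CCCC"]
# Brute force over all pairs gives the same answer with more comparisons.
brute = {(a, b) for a, b in combinations(sorted(umis), 2) if hamming(a, b) <= 1}
assert close_pairs(umis) == brute == {("AAAA", "AAAT")}
```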

    Simplified installation

    Previously, extensions were cythonized and compiled on the fly using pyximport, which required users to have access to the install directory the first time an extension was needed. Now the cythonized extension is provided and is compiled at install time.

    Drop us a line here, on Twitter (@IanSudbery) or on GitHub if you need further help or advice.

  • #2
    UnboundLocalError: local variable 'local_min' referenced before assignment

    Hello

    I went through the tutorial you provided at this link (Droplet-barcoded single cell RNA-seq walk-through) and tried to reproduce the commands. I used the following command to make a whitelist with umi-tools (v0.5.0) and Python (v2.7.13):

    umi_tools whitelist --stdin hgmm_100_R1.fastq.gz \
    --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
    --set-cell-number=100 \
    --plot-prefix=100_cells_whitelist \
    --log2stderr > whitelist.txt
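    In the string form of --bc-pattern used above, each C marks a cell-barcode base and each N a UMI base, so this pattern takes the first 16 bases of read 1 as the cell barcode and the next 10 as the UMI. A minimal sketch of that split (hypothetical helper; pattern characters other than C and N are ignored here):

```python
def split_by_pattern(seq: str, pattern: str):
    """Return (cell_barcode, umi) from the bases covered by `pattern`."""
    if len(seq) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    cell = "".join(base for base, p in zip(seq, pattern) if p == "C")
    umi = "".join(base for base, p in zip(seq, pattern) if p == "N")
    return cell, umi


pattern = "C" * 16 + "N" * 10
read1 = "ACGT" * 4 + "TTTTTGGGGG"
print(split_by_pattern(read1, pattern))  # ('ACGTACGTACGTACGT', 'TTTTTGGGGG')
```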

    It starts to run as follows, but then runs into an error:


    Quote:
    # output generated by whitelist --stdin hgmm_100_R1.fastq.gz --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN --set-cell-number=100 --plot-prefix=100_cells_whitelist --log2stderr
    # job started at Wed Sep 6 22:28:00 2017 on n126 -- a096c517-be3e-4d56-9f0e-c4753dbae35e
    # pid: 80367, system: Linux 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64
    # blacklist_tsv : None
    # cell_number : 100
    # compresslevel : 6
    # error_correct_threshold : 1
    # expect_cells : False
    # extract_method : string
    # filter_cell_barcodes : False
    # log2stderr : True
    # loglevel : 1
    # method : reads
    # pattern : CCCCCCCCCCCCCCCCNNNNNNNNNN
    # pattern2 : None
    # plot_prefix : 100_cells_whitelist
    # prime3 : None
    # random_seed : None
    # read2_in : None
    # short_help : None
    # stderr : <open file '<stderr>', mode 'w' at 0x2aaaaaf0d1e0>
    # stdin : <gzip open file 'hgmm_100_R1.fastq.gz', mode 'rb' at 0x2aab4f798db0 0x2aaab49f3b90>
    # stdlog : <open file '<stderr>', mode 'w' at 0x2aaaaaf0d1e0>
    # stdout : <open file '<stdout>', mode 'w' at 0x2aaaaaf0d150>
    # subset_reads : 100000000
    # timeit_file : None
    # timeit_header : None
    # timeit_name : all
    # whitelist_tsv : None
    2017-09-06 22:28:00,838 INFO Starting barcode extraction
    2017-09-06 22:28:00,882 INFO Parsed 0 reads
    2017-09-06 22:28:04,640 INFO Parsed 100000 reads
    2017-09-06 22:28:07,552 INFO Parsed 200000 reads
    2017-09-06 22:28:10,455 INFO Parsed 300000 reads
    2017-09-06 22:28:13,344 INFO Parsed 400000 reads
    2017-09-06 22:28:16,238 INFO Parsed 500000 reads
    2017-09-06 22:28:19,126 INFO Parsed 600000 reads
    2017-09-06 22:28:22,027 INFO Parsed 700000 reads
    2017-09-06 22:28:24,921 INFO Parsed 800000 reads
    2017-09-06 22:28:27,818 INFO Parsed 900000 reads
    2017-09-06 22:28:30,720 INFO Parsed 1000000 reads
    2017-09-06 22:28:33,617 INFO Parsed 1100000 reads
    2017-09-06 22:28:36,510 INFO Parsed 1200000 reads
    2017-09-06 22:28:39,408 INFO Parsed 1300000 reads
    2017-09-06 22:28:42,302 INFO Parsed 1400000 reads
    2017-09-06 22:28:45,195 INFO Parsed 1500000 reads
    2017-09-06 22:28:48,087 INFO Parsed 1600000 reads
    2017-09-06 22:28:50,980 INFO Parsed 1700000 reads
    2017-09-06 22:28:53,870 INFO Parsed 1800000 reads
    2017-09-06 22:28:56,771 INFO Parsed 1900000 reads
    2017-09-06 22:28:59,670 INFO Parsed 2000000 reads
    2017-09-06 22:29:02,566 INFO Parsed 2100000 reads
    2017-09-06 22:29:05,458 INFO Parsed 2200000 reads
    2017-09-06 22:29:08,354 INFO Parsed 2300000 reads
    2017-09-06 22:29:11,248 INFO Parsed 2400000 reads
    2017-09-06 22:29:14,140 INFO Parsed 2500000 reads
    2017-09-06 22:29:17,031 INFO Parsed 2600000 reads
    2017-09-06 22:29:19,919 INFO Parsed 2700000 reads
    2017-09-06 22:29:22,819 INFO Parsed 2800000 reads
    2017-09-06 22:29:25,720 INFO Parsed 2900000 reads
    2017-09-06 22:29:28,614 INFO Parsed 3000000 reads
    2017-09-06 22:29:31,503 INFO Parsed 3100000 reads
    2017-09-06 22:29:34,395 INFO Parsed 3200000 reads
    2017-09-06 22:29:37,287 INFO Parsed 3300000 reads
    2017-09-06 22:29:40,185 INFO Parsed 3400000 reads
    2017-09-06 22:29:43,078 INFO Parsed 3500000 reads
    2017-09-06 22:29:45,970 INFO Parsed 3600000 reads
    2017-09-06 22:29:48,876 INFO Parsed 3700000 reads
    2017-09-06 22:29:51,772 INFO Parsed 3800000 reads
    2017-09-06 22:29:54,667 INFO Parsed 3900000 reads
    2017-09-06 22:29:57,561 INFO Parsed 4000000 reads
    2017-09-06 22:30:00,458 INFO Parsed 4100000 reads
    2017-09-06 22:30:03,348 INFO Parsed 4200000 reads
    2017-09-06 22:30:06,241 INFO Parsed 4300000 reads
    2017-09-06 22:30:09,135 INFO Parsed 4400000 reads
    2017-09-06 22:30:12,028 INFO Parsed 4500000 reads
    2017-09-06 22:30:14,929 INFO Parsed 4600000 reads
    2017-09-06 22:30:17,826 INFO Parsed 4700000 reads
    2017-09-06 22:30:20,728 INFO Parsed 4800000 reads
    2017-09-06 22:30:23,625 INFO Parsed 4900000 reads
    2017-09-06 22:30:26,524 INFO Parsed 5000000 reads
    2017-09-06 22:30:29,420 INFO Parsed 5100000 reads
    2017-09-06 22:30:32,320 INFO Parsed 5200000 reads
    2017-09-06 22:30:35,221 INFO Parsed 5300000 reads
    2017-09-06 22:30:38,132 INFO Parsed 5400000 reads
    2017-09-06 22:30:41,055 INFO Parsed 5500000 reads
    2017-09-06 22:30:43,952 INFO Parsed 5600000 reads
    2017-09-06 22:30:46,861 INFO Parsed 5700000 reads
    2017-09-06 22:30:49,757 INFO Parsed 5800000 reads
    2017-09-06 22:30:52,654 INFO Parsed 5900000 reads
    2017-09-06 22:30:55,553 INFO Parsed 6000000 reads
    2017-09-06 22:30:58,451 INFO Parsed 6100000 reads
    2017-09-06 22:31:01,347 INFO Parsed 6200000 reads
    2017-09-06 22:31:04,246 INFO Parsed 6300000 reads
    2017-09-06 22:31:07,151 INFO Parsed 6400000 reads
    2017-09-06 22:31:10,047 INFO Parsed 6500000 reads
    2017-09-06 22:31:12,946 INFO Parsed 6600000 reads
    2017-09-06 22:31:15,837 INFO Parsed 6700000 reads
    2017-09-06 22:31:18,732 INFO Parsed 6800000 reads
    2017-09-06 22:31:21,622 INFO Parsed 6900000 reads
    2017-09-06 22:31:24,517 INFO Parsed 7000000 reads
    2017-09-06 22:31:27,415 INFO Parsed 7100000 reads
    2017-09-06 22:31:30,242 INFO Starting - whitelist determination
    Traceback (most recent call last):
      File "/home/honardoostma/.conda/envs/pysam-bzlinkerr/bin/umi_tools", line 11, in <module>
        sys.exit(main())
      File "/home/honardoostma/.conda/envs/pysam-bzlinkerr/lib/python2.7/site-packages/umi_tools/umi_tools.py", line 50, in main
        module.main(sys.argv)
      File "/home/honardoostma/.conda/envs/pysam-bzlinkerr/lib/python2.7/site-packages/umi_tools/whitelist.py", line 371, in main
        options.plot_prefix)
      File "/home/honardoostma/.conda/envs/pysam-bzlinkerr/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 399, in getCellWhitelist
        cell_barcode_counts, expect_cells, cell_number, plotfile_prefix)
      File "/home/honardoostma/.conda/envs/pysam-bzlinkerr/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 264, in getKneeEstimate
        if local_min is not None:
    UnboundLocalError: local variable 'local_min' referenced before assignment



    Do you know what the problem is?



    • #3
      There was a bug in the release version that made the `--set-cell-number` and `--plot-prefix` options incompatible. If you run without `--plot-prefix` (as in the TL;DR version of the guide), it should run fine; the plots are not required if you are fixing the number of cells anyway.

      I don't quite know how this made it past our testing of the guide (which I have now updated). I apologise for that.
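      For context on the step that failed: the traceback ends in getKneeEstimate, which checks local_min. The sketch below is a hypothetical pure-Python take on a density-based knee estimate. It looks for the deepest dip in a Gaussian kernel density of log10 read counts per barcode and binds local_min = None before any branch, the usual guard against this kind of UnboundLocalError. It is not the UMI-tools code and does not reproduce the actual bug; the bandwidth and grid size are arbitrary choices.

```python
import math


def knee_threshold(counts, bandwidth=0.1, grid_points=200):
    """Hypothetical density-based knee: return a read-count threshold at the
    deepest interior local minimum of a Gaussian kernel density over
    log10(count), or None when no such minimum exists."""
    logs = [math.log10(c) for c in counts if c > 0]
    if len(logs) < 2:
        return None
    lo, hi = min(logs), max(logs)
    step = (hi - lo) / (grid_points - 1)
    xs = [lo + i * step for i in range(grid_points)]
    density = [sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in logs)
               for x in xs]
    # Bind local_min before any branch so the check below can never hit
    # an UnboundLocalError, whichever path the code takes.
    local_min = None
    for i in range(1, grid_points - 1):
        if density[i] < density[i - 1] and density[i] <= density[i + 1]:
            if local_min is None or density[i] < density[local_min]:
                local_min = i
    if local_min is None:
        return None
    return 10 ** xs[local_min]


background = [8, 9, 10, 11, 12] * 20      # many low-count barcodes
cells = [900, 950, 1000, 1050, 1100] * 5  # fewer high-count barcodes
threshold = knee_threshold(background + cells)
print(threshold is not None and 12 < threshold < 900)  # True
```

      Barcodes with counts above the returned threshold would then be treated as cells.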



      • #4
        Thanks
        It worked.
