  • Is this a valid [CircularConsensus] Result Report?

    Hi all,

    I'm new to PacBio analysis. This is my first time running SMRT Analysis to get reads of insert. I used the following command:
    Code:
    ConsensusTools.sh CircularConsensus  --minFullPasses 0  --minPredictedAccuracy 75 \
        --parameters /user/smrt/install/smrtanalysis_2.3.0.140936/analysis/etc/algorithm_parameters/2015-11 \
        --numThreads 48 --fofn /user/input.fofn \
        -o /user/output
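The --fofn option takes a "file of file names": a plain-text file listing one input path per line. A minimal sketch for building one, assuming (as in this run) that the inputs are the three .bax.h5 files of a movie sitting in a single directory; the make_fofn helper and the directory you pass it are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a FOFN ("file of file names") listing every
# .bax.h5 file in one directory, one absolute path per line, for --fofn.
set -euo pipefail

make_fofn() {
  local data_dir fofn
  data_dir="$(cd "$1" && pwd)"   # resolve the directory to an absolute path
  fofn="$2"
  # Sort so the .1/.2/.3 parts of a movie are listed in order
  find "$data_dir" -maxdepth 1 -type f -name '*.bax.h5' | sort > "$fofn"
  echo "Wrote $(wc -l < "$fofn") paths to $fofn" >&2
}
```

Usage, with a placeholder directory: `make_fofn /path/to/movie_dir /user/input.fofn`. Absolute paths keep the FOFN valid no matter which directory the cluster job starts in.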
    I put the above command in an lsf.q file and ran it on an LSF cluster using:
    Code:
    bsub -q b_large -o lsf.out -n 8 /user/lsf.q
    Then after two hours, I got the results:


    Successfully completed.

    Resource usage summary:

    CPU time : 166472.47 sec.
    Max Memory : 1900 MB
    Average Memory : 1330.23 MB
    Total Requested Memory : -
    Delta Memory : -
    (Delta: the difference between Total Requested Memory and Max Memory.)
    Max Swap : 3026 MB
    Max Processes : 4
    Max Threads : 43

    The output (if any) follows:

    ConsensusTools v2.3.0.149240 (c) 2014 Pacific Biosciences, Inc.
    02:17:47 [CircularConsensus] Result Report for the 163482 Zmws processed
    Zmw Result #-Zmws %-Zmws
    Successful - Quiver consensus found 55133 33.72 %
    Successful - But only 1 region, no true consensus 11694 7.15 %
    Failed - Exception thrown 0 0.00 %
    Failed - ZMW was not productive 90173 55.16 %
    Failed - Outside of SNR ranges 3923 2.40 %
    Failed - No insert regions found 5 0.00 %
    Failed - Not enough full passes 0 0.00 %
    Failed - Insert length too small 0 0.00 %
    Failed - Post POA requirements not met 0 0.00 %
    Failed - CCS Read below predicted accuracy 473 0.29 %
    Failed - CCS Read was palindrome 2081 1.27 %
    Failed - CCS Read below SNR threshold 0 0.00 %
    Failed - CCS Read too short or long 0 0.00 %
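The LSF resource summary can show how many cores the job actually kept busy: average parallelism is CPU time divided by wall-clock time. A rough sketch, assuming the "two hours" mentioned in the post is the wall time (about 7200 s); avg_cores is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Average number of busy cores = total CPU seconds / wall-clock seconds.
# The 7200 s wall time is an assumption based on "after two hours".
avg_cores() {
  awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", cpu / wall }'
}

avg_cores 166472.47 7200   # CPU time from the LSF summary, ~2 h wall time
```

With these numbers the job averaged roughly 23 busy cores: fewer than `--numThreads 48`, but more than the 8 slots reserved with `bsub -n 8`. On shared LSF clusters it is generally safer to request as many slots as the threads the job will use (or lower `--numThreads` to match); check your site's policy.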

    The input is 3 .bax.h5 files, each about 4 GB. The output is 3 FASTQ files, ~100 MB each.
    Does this seem correct? I'm a bit confused about the "Failed" marks. Is there any document that describes the report?

    Thanks!
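One quick sanity check on a report like this is to re-add the rows: here the counts sum to exactly 163482, the number of ZMWs processed, so each ZMW appears to be counted in exactly one category, and each percentage is count / 163482 (for example, 55133 / 163482 = 33.72%). A minimal sketch of that check; check_ccs_report is a hypothetical helper that assumes the layout shown above, with category rows ending in "<count> <percent> %".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a CircularConsensus result report on stdin,
# add up the per-category ZMW counts, and compare them with the total.
check_ccs_report() {
  awk '
    # Header, e.g. "... Result Report for the 163482 Zmws processed"
    /Zmws processed/ {
      for (i = 1; i < NF; i++) if ($(i + 1) == "Zmws") total = $i
    }
    # Category rows such as "Failed - ZMW was not productive 90173 55.16 %"
    $1 ~ /^(Successful|Failed)$/ && $2 == "-" {
      sum += $(NF - 2)
      if ($1 == "Successful") ok += $(NF - 2)
    }
    END {
      if (total == 0) { print "no report header found"; exit 1 }
      printf "total=%d rows_sum=%d successful=%.2f%%\n", total, sum, 100 * ok / total
    }'
}
```

For example, `check_ccs_report < lsf.out` on this run should report rows_sum equal to total, with about 40.9% of ZMWs in the two "Successful" rows combined.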

  • #2
    I have not run a CCS analysis on the command line, but the files you obtained are similar to what I have gotten from SMRTportal. Not every ZMW is productive, so having 55% of ZMWs fail is not unexpected.

    @rhall (Dr. Hall) from PacBio participates here and may have a detailed explanation of the results next week. Have you run ConsensusTools.sh -h to see if there is inline help? There is some documentation here: https://github.com/PacificBioscience...-Documentation
    Last edited by GenoMax; 03-19-2016, 12:24 PM.



    • #3
      Thank you, GenoMax! I'm glad to know that my result is all right. I've read the documentation on GitHub but found no explanation of the result report. I'll keep looking.



      • #4
        The results look reasonable. A lot of the command line tools are not well documented, and as the CCS algorithm has fundamentally changed for the new software release, the documentation is unlikely to be improved for this version. All percentages are of the total number of ZMWs (163,482 in this run). Given how ZMWs are loaded, the best that can be expected is ~40% productive ZMWs (Poisson statistics). Item by item:
        Successful - Quiver consensus found 55133 33.72 %
        Number of consensus sequences built from more than one full pass of the insert.
        Successful - But only 1 region, no true consensus 11694 7.15 %
        Single-pass sequences, reported because of the '--minFullPasses 0' parameter. Normally you would want multiple passes of the insert for a CCS dataset.
        Failed - Exception thrown 0 0.00 %
        General catch for ZMWs that throw an error during the calculation.
        Failed - ZMW was not productive 90173 55.16 %
        ZMWs that are not loaded with a sequencing template; 55% is reasonable for a well-loaded sample.
        Failed - Outside of SNR ranges 3923 2.40 %
        There is a per-ZMW SNR filter; ZMWs without sufficiently high SNR are not used to generate consensus sequences.
        Failed - No insert regions found 5 0.00 %
        Two adapter sequences joined together without an insert sequence.
        Failed - Not enough full passes 0 0.00 %
        You set this to 0, so nothing is filtered.
        Failed - Insert length too small 0 0.00 %
        Minimum length parameter.
        Failed - Post POA requirements not met 0 0.00 %
        I'm not exactly sure; unless this % is high, I wouldn't worry about it.
        Failed - CCS Read below predicted accuracy 473 0.29 %
        Predicted accuracy parameter (--minPredictedAccuracy 75 in your command).
        Failed - CCS Read was palindrome 2081 1.27 %
        Reads are palindromic, i.e. you sequence the forward and reverse strands without an adapter being read. This is likely due to sample prep; 1.27% is expected, but if it is much higher, sample prep should be looked at.
        Failed - CCS Read below SNR threshold 0 0.00 %
        Applies only if an SNR threshold is given as a parameter.
        Failed - CCS Read too short or long 0 0.00 %
        Read length parameter.
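The ~40% ceiling can be reproduced under a simple model, which is an assumption and not PacBio's actual loading model: if templates fall into ZMWs independently, the number per ZMW is Poisson with mean λ, and if only ZMWs holding exactly one template are productive, the productive fraction is λe^(-λ). A short scan over λ (poisson_single_max is a hypothetical helper) shows this peaks at λ = 1, i.e. 1/e ≈ 37%, in the same range as the figure above.

```shell
#!/usr/bin/env bash
# Under Poisson loading with mean lambda templates per ZMW, the fraction
# of ZMWs holding exactly one template is lambda * exp(-lambda).
# Scan lambda on a 0.05 grid and report where that fraction peaks.
poisson_single_max() {
  awk 'BEGIN {
    best = 0; best_l = 0
    for (l = 0.05; l <= 3.0001; l += 0.05) {
      p = l * exp(-l)              # P(exactly one template)
      if (p > best) { best = p; best_l = l }
    }
    printf "lambda=%.2f single=%.1f%%\n", best_l, 100 * best
  }'
}

poisson_single_max
```

Loading more densely does not help under this model: above λ = 1, more ZMWs receive two or more templates, which lowers the single-occupancy fraction again.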
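To illustrate what a palindromic read looks like: if the polymerase reads the forward strand and continues straight onto the reverse strand without reading an adapter, the read is roughly an insert X followed by the reverse complement of X. The toy check below uses hypothetical helpers and exact matching only; it is not how ConsensusTools detects palindromes, and real reads carry enough errors that a practical check would need an error-tolerant alignment.

```shell
#!/usr/bin/env bash
# Reverse-complement a DNA string (A<->T, C<->G; N stays N).
revcomp() {
  local s="$1" out="" i
  for (( i = ${#s} - 1; i >= 0; i-- )); do out+="${s:i:1}"; done
  printf '%s\n' "$out" | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# Toy test: is the first half of the read exactly the reverse complement
# of its second half? (For odd lengths the middle base is ignored.)
is_palindromic() {
  local seq="$1" half
  half=$(( ${#seq} / 2 ))
  [ "${seq:0:half}" = "$(revcomp "${seq: -half}")" ]
}
```

For example, `is_palindromic ACCTGCAGGT` succeeds (ACCTG followed by its reverse complement CAGGT), while `is_palindromic ACCTGACCTG` fails.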



        • #5
          Thank you very much, Dr. Hall.

          You said there is a new release of the CCS algorithm. When I looked around SEQanswers and the PacificBiosciences GitHub, I found a new program, pbccs, that works with BAM files (https://github.com/PacificBiosciences/pbccs). Is pbccs the new release you mentioned?

          I need to get full-length transcripts and do NGS correction after obtaining CCS reads. According to the tutorial on GitHub, the RS_IsoSeq pipeline can easily be applied for downstream analysis (I haven't tried it yet). Since the new release is available and I haven't started processing my data yet, I think it may be good for me to use the new software. But are there other tools I would need for downstream analysis if I switch to pbccs (the new release of the CCS algorithm)?



          • #6
            As a transcript analysis pipeline, RS_IsoSeq is an end-to-end solution; you do not need to run CCS independently. The pbccs on GitHub is the new algorithm, but I wouldn't worry about it for transcripts; RS_IsoSeq is sufficient.



            • #7
              @rhall: Is RS_IsoSeq a SMRTportal only workflow? It appears that @xhuister is working on the command line and may not have SMRTportal installed.
              Last edited by GenoMax; 03-22-2016, 08:25 AM.



              • #8
                It is possible to run RS_IsoSeq (or any SMRTPortal workflow) via the command line, but if you don't have SMRTportal installed, it is difficult to generate a valid parameter file. The recommended method for running a transcript analysis is to run it in steps (see https://github.com/PacificBioscience...ds#commandline). I still wouldn't worry about using the new CCS algorithm unless you already have access to SMRT Link 3.0.



                • #9
                  Thank you very much, GenoMax and rhall. I'll continue to use RS_IsoSeq as you suggested. I've installed SMRT Analysis and tried running it on a computer cluster via LSF. So far I've finished the "Getting full length reads" step via pbtranscript.py. I hope the subsequent analysis goes well.

