Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • HGAP with fastQ reads

    Hi all,

    I have a barcoded SMRTcell in bas.h5 format which I had to de-barcode into .fastq files (you can't de-barcode into bas.h5 format). I now want to assemble these fastQ files with HGAP, but I keep getting an error;

    Code:
    [DEBUG] 2013-09-10 11:20:23,899 [pbpy.smrtpipe.SmrtPipeContext initializeContext 240] Initialize Context with dataUrl xml:input.xml
    [DEBUG] 2013-09-10 11:20:23,899 [pbpy.smrtpipe.SmrtPipeContext initializeContext 241] Initialize Context with settings file HGAP_Assembly_Advanced.1.xml
    [DEBUG] 2013-09-10 11:20:23,899 [pbpy.smrtpipe.InputData loadXml 197] Loading file input.xml
    [DEBUG] 2013-09-10 11:20:23,899 [pbpy.io.MetaAnalysisXml load 101] Loading input.xml
    [DEBUG] 2013-09-10 11:20:23,900 [pbpy.io.MetaAnalysisXml load 116] No header found in input.xml. Unable to load jobId
    [DEBUG] 2013-09-10 11:20:23,900 [pbpy.io.MetaAnalysisXml _parseRef 584] Successfully parsed run:0000000-0000 to run 0000000-0000
    [DEBUG] 2013-09-10 11:20:23,900 [pbpy.smrtpipe.InputData loadXml 214] Skipping assignment of JobId. Unable to find header in input.xml
    [DEBUG] 2013-09-10 11:20:23,900 [pbpy.smrtpipe.SmrtDataService setupOutputDirectories 113] SmrtDataService (oid 488599504) setting up output directories in /net/fs01/mnt/storage/rawdata/NGS_projects/
    [DEBUG] 2013-09-10 11:20:23,900 [pbpy.smrtpipe.SmrtDataService setupOutputDirectories 116] output directory /net/fs01/mnt/storage/rawdata/ has already been created. Skipping
    [DEBUG] 2013-09-10 11:20:23,900 [pbpy.smrtpipe.SmrtDataService put 191] SmrtDataService (oid 488599504) repository has 1 items
    [INFO] 2013-09-10 11:20:23,900 [pbpy.smrtpipe.SmrtDataService setupOutputDirectories 132] Creating /net/fs01/mnt/storage/rawdata/data
    [DEBUG] 2013-09-10 11:20:23,901 [pbpy.smrtpipe.SmrtDataService put 191] SmrtDataService (oid 488599504) repository has 2 items
    [INFO] 2013-09-10 11:20:23,901 [pbpy.smrtpipe.SmrtDataService setupOutputDirectories 132] Creating /net/fs01/mnt/storage/rawdata/results
    [DEBUG] 2013-09-10 11:20:23,902 [pbpy.smrtpipe.SmrtDataService put 191] SmrtDataService (oid 488599504) repository has 3 items
    [DEBUG] 2013-09-10 11:20:23,902 [pbpy.smrtpipe.SmrtDataService setupOutputDirectories 130] Skipping dir creation of log. Found /net/fs01/mnt/storage/rawdata/log
    [DEBUG] 2013-09-10 11:20:23,902 [pbpy.smrtpipe.SmrtDataService put 191] SmrtDataService (oid 488599504) repository has 4 items
    [INFO] 2013-09-10 11:20:23,902 [pbpy.smrtpipe.SmrtDataService setupOutputDirectories 136] Completed setting up DataServices in /net/fs01/mnt/storage/rawdata
    [DEBUG] 2013-09-10 11:20:23,902 [pbpy.smrtpipe.SmrtDataService put 191] SmrtDataService (oid 488599504) repository has 5 items
    [INFO] 2013-09-10 11:20:23,902 [pbpy.smrtpipe.SmrtPipeContext initializeContext 250] Input data is [ run/0000000-0000 ]
    [INFO] 2013-09-10 11:20:23,959 [pbpy.smrtpipe.Heartbeat __init__ 41] heartbeat sleepTime set to 60
    [DEBUG] 2013-09-10 11:20:23,959 [pbpy.smrtpipe.SmrtPipeContext initializeContext 307] No reference found or given
    [DEBUG] 2013-09-10 11:20:23,959 [pbpy.smrtpipe.SmrtPipeContext initializeContext 309] Completed initializeContext
    [DEBUG] 2013-09-10 11:20:23,960 [pbpy.smrtpipe.SmrtPipeContext movieFiles 360] Reading reads data from [ run/0000000-0000 ]
    [INFO] 2013-09-10 11:20:23,960 [pbpy.smrtpipe.SmrtPipeContext movieFiles 364] Pulling data from tags: *
    [INFO] 2013-09-10 11:20:23,960 [pbpy.smrtpipe.SmrtPipeMain _getBasVersions 463] Detected basH5 versions set([])
    [INFO] 2013-09-10 11:20:23,960 [pbpy.smrtpipe.SmrtPipeMain run 551] smrtpipe.py running on n003.genomics.pssclabs.com n003.genomics.pssclabs.com
    [INFO] 2013-09-10 11:20:23,960 [pbpy.smrtpipe.SmrtPipeMain run 567] SMRT Analysis v2.0.0 / SMRTpipe 1.58.120456
    [INFO] 2013-09-10 11:20:23,960 [pbpy.smrtpipe.SmrtPipeMain run 570] Starting smrtpipe version 1.58.120456
    [DEBUG] 2013-09-10 11:20:23,961 [pbpy.smrtpipe.SmrtReportService generateFinalReport 558] jobInfo None dataUrl xml:input.xml paramsFile HGAP_Assembly_Advanced.1.xml
    [INFO] 2013-09-10 11:20:23,961 [pbpy.smrtpipe.SmrtReportService generateFinalReport 593] Skipping JNLP creation because no job id or no cmp.h5
    [DEBUG] 2013-09-10 11:20:23,962 [pbpy.smrtpipe.SmrtReportService _generateIndexHtml 537] Writing Report XML to /net/fs01/mnt/storage/rawdata/toc.xml
    [DEBUG] 2013-09-10 11:20:23,963 [pbpy.smrtpipe.SmrtReportService _generateIndexHtml 546] Generating html file from xsl /opt/smrtanalysis/analysis/etc/xsl/toc.xsl
    [DEBUG] 2013-09-10 11:20:23,967 [pbpy.smrtpipe.SmrtWorker backticks 63] Running on n003.genomics.pssclabs.com with cmd saxonb9 -xsl:/opt/smrtanalysis/analysis/etc/xsl/toc.xsl -s:/net/fs01/mnt/storage/rawdata/toc.xml -o:/net/fs01/mnt/storage/rawdata/index.html
    [DEBUG] 2013-09-10 11:20:24,442 [pbpy.smrtpipe.SmrtWorker backticks 87] Successful output (Return code = 0) in 0.47 sec (0.01 min) of saxonb9 -xsl:/opt/smrtanalysis/analysis/etc/xsl/toc.xsl -s:/net/fs01/mnt/storage/rawdata/toc.xml -o:/net/fs01/mnt/storage/rawdata/index.html
    [DEBUG] 2013-09-10 11:20:24,442 [pbpy.smrtpipe.SmrtPipeContext exe 449] Output = []
    [DEBUG] 2013-09-10 11:20:24,442 [pbpy.smrtpipe.SmrtPipeContext exe 451] Error Code = 0
    [DEBUG] 2013-09-10 11:20:24,442 [pbpy.smrtpipe.SmrtPipeContext exe 452] Error Message = 
    [DEBUG] 2013-09-10 11:20:24,443 [pbpy.smrtpipe.SmrtReportService _toRdf 314] Adding Auxiliary Files to RDF.
    [DEBUG] 2013-09-10 11:20:24,443 [pbpy.smrtpipe.SmrtReportService _toRdf 355] No barcodeFasta to add to ReportService
    [DEBUG] 2013-09-10 11:20:24,444 [pbpy.smrtpipe.SmrtReportService generateFinalReport 611] Writing Metadata XML to /net/fs01/mnt/storage/rawdata/metadata.rdf
    [DEBUG] 2013-09-10 11:20:24,446 [pbpy.smrtpipe.engine.SmrtPipeFiles __call__ 53] Adding instance file://jobId/input.fofn with key ('file', 'input.fofn', None)
    [DEBUG] 2013-09-10 11:20:24,446 [pbpy.smrtpipe.engine.SmrtPipeFiles __call__ 53] Adding instance file://jobId/results/overview.xml with key ('file', 'results/overview.xml', None)
    [DEBUG] 2013-09-10 11:20:24,446 [pbpy.smrtpipe.engine.SmrtPipeFiles __call__ 53] Adding instance file://jobId/data/chemistry_mapping.xml with key ('file', 'data/chemistry_mapping.xml', None)
    [DEBUG] 2013-09-10 11:20:24,446 [pbpy.smrtpipe.engine.SmrtPipeTasks task 48] Setting default Task type to LocalTask
    [DEBUG] 2013-09-10 11:20:24,446 [pbpy.smrtpipe.engine.SmrtPipeTasks task 48] Setting default Task type to LocalTask
    [DEBUG] 2013-09-10 11:20:24,446 [pbpy.smrtpipe.engine.SmrtPipeTasks task 48] Setting default Task type to LocalTask
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.SmrtPipeMain _dynamicallyLoadModule 895] Loaded pbpy
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.SmrtPipeMain _dynamicallyLoadModule 902] Dynamically loaded module smrtpipe
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.SmrtPipeMain _dynamicallyLoadModule 902] Dynamically loaded module modules
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.SmrtPipeMain _dynamicallyLoadModule 902] Dynamically loaded module P_Fetch
    [INFO] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.SmrtPipeMain _dynamicallyLoadModule 913] Dynamically Loaded P_Fetch module
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.modules.P_Module setting 76] P_Fetch v1.1.120456 looking for setting p_fetch.getChemistry
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.modules.P_Module setting 83] Found p_fetch.getChemistry with True for P_Fetch v1.1.120456 settings.
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.modules.P_Module setting 76] P_Fetch v1.1.120456 looking for setting p_fetch.overviewRpt
    [DEBUG] 2013-09-10 11:20:24,447 [pbpy.smrtpipe.modules.P_Module setting 83] Found p_fetch.overviewRpt with True for P_Fetch v1.1.120456 settings.
    [DEBUG] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.modules.P_Module setting 76] P_Fetch v1.1.120456 looking for setting p_fetch.toFofn
    [DEBUG] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.modules.P_Module setting 83] Found p_fetch.toFofn with True for P_Fetch v1.1.120456 settings.
    [DEBUG] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.modules.P_Fetch validateSettings 69] Reordered inputFiles to:
    [ERROR] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.modules.P_Fetch validateSettings 74] P_Fetch failed with errors No input bas.h5 files found. Analysis cannot be completed.
    [ERROR] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.SmrtPipeMain _loadModules 198] No input bas.h5 files found. Analysis cannot be completed.
    [INFO] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.SmrtPipeMain exit 780] Beginning exit process
    [INFO] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.SmrtPipeMain shutdown 800] beginning Shutdown Process
    [INFO] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.SmrtPipeMain _generate_final_report 731] Writing final TOC page
    [DEBUG] 2013-09-10 11:20:24,448 [pbpy.smrtpipe.SmrtReportService merge_from_serialized_datastore 192] Unable to find DataStore /net/fs01/mnt/storage/rawdata/data/data.items.pickle for merging.
    [DEBUG] 2013-09-10 11:20:24,449 [pbpy.smrtpipe.SmrtReportService generateFinalReport 558] jobInfo None dataUrl xml:input.xml paramsFile HGAP_Assembly_Advanced.1.xml
    [INFO] 2013-09-10 11:20:24,449 [pbpy.smrtpipe.SmrtReportService generateFinalReport 593] Skipping JNLP creation because no job id or no cmp.h5
    [DEBUG] 2013-09-10 11:20:24,450 [pbpy.smrtpipe.SmrtReportService _generateIndexHtml 537] Writing Report XML to /net/fs01/mnt/storage/rawdata/toc.xml
    [DEBUG] 2013-09-10 11:20:24,450 [pbpy.smrtpipe.SmrtReportService _generateIndexHtml 546] Generating html file from xsl /opt/smrtanalysis/analysis/etc/xsl/toc.xsl
    [DEBUG] 2013-09-10 11:20:24,454 [pbpy.smrtpipe.SmrtWorker backticks 63] Running on n003.genomics.pssclabs.com with cmd saxonb9 -xsl:/opt/smrtanalysis/analysis/etc/xsl/toc.xsl -s:/net/fs01/mnt/storage/rawdata/toc.xml -o:/net/fs01/mnt/storage/rawdata/index.html
    [DEBUG] 2013-09-10 11:20:24,929 [pbpy.smrtpipe.SmrtWorker backticks 87] Successful output (Return code = 0) in 0.48 sec (0.01 min) of saxonb9 -xsl:/opt/smrtanalysis/analysis/etc/xsl/toc.xsl -s:/net/fs01/mnt/storage/rawdata/toc.xml -o:/net/fs01/mnt/storage/rawdata/index.html
    [DEBUG] 2013-09-10 11:20:24,930 [pbpy.smrtpipe.SmrtPipeContext exe 449] Output = []
    [DEBUG] 2013-09-10 11:20:24,930 [pbpy.smrtpipe.SmrtPipeContext exe 451] Error Code = 0
    [DEBUG] 2013-09-10 11:20:24,930 [pbpy.smrtpipe.SmrtPipeContext exe 452] Error Message = 
    [DEBUG] 2013-09-10 11:20:24,930 [pbpy.smrtpipe.SmrtReportService _toRdf 314] Adding Auxiliary Files to RDF.
    [DEBUG] 2013-09-10 11:20:24,930 [pbpy.smrtpipe.SmrtReportService _toRdf 355] No barcodeFasta to add to ReportService
    [DEBUG] 2013-09-10 11:20:24,931 [pbpy.smrtpipe.SmrtReportService generateFinalReport 611] Writing Metadata XML to /net/fs01/mnt/storage/rawdata/metadata.rdf
    [INFO] 2013-09-10 11:20:24,932 [pbpy.smrtpipe.SmrtPipeMain shutdown 835] Completed shutdown gracefully
    [CRITICAL] 2013-09-10 11:20:24,932 [pbpy.smrtpipe.SmrtPipeMain exit 789] hard exiting smrtpipe v1.58.120456 with returncode -1
    [DEBUG] 2013-09-10 11:20:24,933 [pbpy.smrtpipe.SmrtPipeMain _releaseLock 417] -- Released lock (/net/fs01/mnt/storage/rawdata/smrt.lock) for this output directory.
    [DEBUG] 2013-09-10 11:20:24,934 [pbpy.smrtpipe.SmrtPipeMain run 688] Shutting down heartbeat service
    [INFO] 2013-09-10 11:21:23,991 [pbpy.smrtpipe.Heartbeat _startHeartbeat 65] Heartbeat attempting to stop
    The protocol I use is the 'HGAP_Assembly_Advanced.1.xml' with the fastQ file as input. I've also used the 'RS_HGAP_Assembly.1.xml as input, but that did cause an error too.

    Does anyone know if I should change a parameter in 'HGAP_Assembly_Advanced.1.xml' protocol in order to use fastQ files as input for HGAP?


    Regards and thanks in advance,
    Boetsie

  • #2
    "Rhall" from PacBio participates in this forum and will likely notice this query but you should open a ticket with PacBio tech support in case you have not done so.

    Comment


    • #3
      What is the use case? Given the difficulty in barcoding long, none ccs reads, the usefulness of HGAP for barcoded reads is limited? HGAP in SMRT Analysis requires bas.h5 / bax.h5 input, the rich quality information is needed. It is possible to filter for specific reads - originally implemented for removing contamination:
      GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.

      It is also technically possible to use custom region tables for trimming the bas.h5, but unless you are very comfortable with h5 files this may be difficult.
      Outside of SMRT Analysis, the development toolkit for HGAP allows much more freedom, and will work from fasta files, but it can be difficult to use.
      GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.

      Comment


      • #4
        The goal was to assemble plasmid sequences. Multiple plasmid-pools were multiplexed on a single SMRT cell. After demultiplexing, we had several fastQ files, each belonging to a different plasmid-pool.

        For those interested, the fastQ files could not be assembled with the HGAP software, since HGAP requires the bas.h5 files. Therefore, we had to use the original bas.h5 where all pooled plasmids are stored (and thus not demultiplexed). We had to provide the HGAP protocol a set of read-names (so-called whitelist) to indicate which reads to filter out. See this page about whitelists;
        Code:
        https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-Whitelisting-Tutorial
        We had four samples multiplexed on this SMRTcell, so for each sample we made a whitelist containing the read-names of the other three samples. To get the read-names out of the fastQ files, we did;

        Code:
        awk -F '/' '/^@/ {print substr($1,2)"/"$2}' <FASTQ-file>| uniq > whitelist.txt
        Next, we added the whitelist to the HGAP protocol (XML);

        Code:
        <param name="whiteList" label="Minimum Subread Length">
                    <value>/PATH-TO/whitelist.txt</value>
                    </param>
        So, this will filter out the reads belonging to the different samples, but still the barcode of the to-be-assembled sample is present. Therefore, we added an extra trimming for the barcodes (within modulestage "assembly"):
        Code:
        <param name="layoutOpts" hidden="false">
              <value> --overlapTolerance 100 --trimHit 50 </value>
        </param>
        These were the steps we had taken (with the help of PacBio support) to get the results. Hope this helps for anyone in the future.

        Regards,
        Boetsie

        Comment


        • #5
          Hi I would like to tie in with part of the question.
          Similarly like you, I discovered that HGAP2 would not take filtered sequences

          One might wonder about the use of such a scenario. But e.g. I want to filter my reads and only assemble the chloroplastic genome. So, now I have to take ALL of my 30 smrtcells and then filter later ?? --> not very smart, but hmm...

          My problem is in particular with the whitelist. Whenever I launch it, I get errors and it wont succeed. In the end I followed the tutorial step by step, but was still unlucky.

          @boetsie: Could you please give me a little "head" of how your data looked like ? I am desperately looking for my mistake.

          Error:
          Code:
           smrtpipe.py --params=HGAP2.xml xml:input.xml
          Traceback (most recent call last):
            File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/engine/SmrtPipeWorkflow.py", line 609, in execute
              raise WorkflowError(task.error)
          WorkflowError: task://jobId/moduleName/subreadRpt Failed
          Whitelist:
          Code:
          m130905_150152_42182_c100546852550000001823086111241313_s1_p0/0
          m130905_150152_42182_c100546852550000001823086111241313_s1_p0/1
          m130905_150152_42182_c100546852550000001823086111241313_s1_p0/2
          m130905_150152_42182_c100546852550000001823086111241313_s1_p0/3
          Snippet of settings:
          Code:
              <moduleStage name="filtering" editable="true">
                  <module label="PreAssembler Filter v1" id="P_Filter" editableInJob="true">
                      <description>Filter reads for use in the pre-assembly step of HGAP, the hierarchical genome assembly process.</description>
                      <param name="minSubReadLength" label="Minimum Subread Length">
                          <title>The minimum subread length. Shorter subreads will be filtered and excluded from further analysis.</title>
                          <value>500</value>
                          <input type="text" size="3"/>
                          <rule type="number" min="0.0" message="Value must be a positive integer"/>
                      </param>
                      <param name="readScore" label="Minimum Polymerase Read Quality">
                          <title>The minimum polymerase read quality determines the quality cutoff. Polymerase reads with lower quality will be filtered and excluded from further analysis.</title>
                          <value>0.80</value>
                          <input type="text" size="3"/>
                          <rule type="number" min="0.0" message="Value must be between 0 and 1" max="1.0"/>
                      </param>
                      <param name="minLength" label="Minimum Polymerase Read Length">
                          <title>The minimum polymerase read length. Shorter polymerase reads will be excluded from further analysis.</title>
                          <value>500</value>
                          <input type="text" size="3"/>
                          <rule type="number" min="0.0" message="Value must be a positive integer"/>
                      </param>
                          <param name="whiteList" label="Minimum Subread Length">
              <value>/local/scratch/eschmid/HGAP_chloro/whitelist.txt</value>
          </param>
                  </module>
                  <module label="PreAssemblerSFilter Reports v1" id="P_FilterReports" editableInJob="false"/>
              </moduleStage>
          Remark: If I remove my whitelist, everything works well

          Comment


          • #6
            This is an obvious question but I will still ask. Is "/local/scratch/eschmid/HGAP_chloro/whitelist.txt" readable by "smrtanalysis" user that runs the pipeline?

            Comment


            • #7
              Hi GenoMax
              Thanks for the reply.
              Yes, since I am running as a local user (eschmid) the pipeline - not in the smartportal but terminal mode. It is actually the current working directory in which I put all the necessary data.

              Nevertheless I changed the access of the file to "everybody", to avoid it might complicate things within the pipeline - no change
              Last edited by ebioman; 12-11-2013, 06:12 AM.

              Comment


              • #8
                Hi ebioman,

                My whitelist looks like:
                HTML Code:
                m130906_063502_42179_c100578172550000001823093601191416_s1_p0/513
                m130906_063502_42179_c100578172550000001823093601191416_s1_p0/1219
                m130906_063502_42179_c100578172550000001823093601191416_s1_p0/2063
                m130906_063502_42179_c100578172550000001823093601191416_s1_p0/2266
                m130906_063502_42179_c100578172550000001823093601191416_s1_p0/2572
                m130906_063502_42179_c100578172550000001823093601191416_s1_p0/3088
                I used the attached .xml (I had to rename it to .txt b/c seqanswers could not upload XML files). The command I used is;

                HTML Code:
                smrtpipe.py --params=HGAP_Assembly.txt xml:input.xml
                Input.xml looks something like;

                HTML Code:
                <?xml version="1.0"?>
                <pacbioAnalysisInputs>
                  <dataReferences>
                    <url ref="run:0000000-0002"><location>/home/boetsie/files/m130906.1.bax.h5</location></url>
                    <url ref="run:0000000-0002"><location>/home/boetsie/files/m130906.2.bax.h5</location></url>
                    <url ref="run:0000000-0002"><location>/home/boetsie/files/m130906.3.bax.h5</location></url>
                  </dataReferences>
                </pacbioAnalysisInputs>
                Hope this helps.
                Boetsie
                Attached Files

                Comment


                • #9
                  Thanks a lot boetsie
                  Indeed your whitelist looks very similar, as does the config and start-command.

                  I tried your config file which worked, but stopped at the same position and gave me again an error.
                  I think there might be really some problem with my whitelist - maybe it does not leave enough long reads or something similar. I will try to move from my minimal example to a bigger set to see whether it is a scaling problem.

                  Comment


                  • #10
                    It does not look like the filtering is failing, but rather that the filtering step does not result in any data.
                    task://jobId/moduleName/subreadRpt Failed
                    Is post whitelisting and filtering.
                    Does ./data/filtered_subreads.fasta file have any contents?
                    You can check specific errors in the ./log/P_Filter directory, subreadRpt.log should have a more verbose error.

                    Comment


                    • #11
                      Hello rhall
                      Indeed I did not realize before that the filtered_subreads.fasta file was empty.

                      It is strange I broke down everything to a more minimal style

                      Code:
                      <?xml version="1.0"?>
                      <pacbioAnalysisInputs>
                        <dataReferences>
                          <url ref="run:0000000-0000"><location>/home/eschmid/Napoleom/raw/PacBio/0xJIqqkV_m130905_172422_42182_c100546852550000001823086111241314_s1_p0.2.bax.h5</location></url>
                        </dataReferences>
                      </pacbioAnalysisInputs>
                      So, only one cell and the whitelist with only one entry from this cell:

                      Code:
                      m130905_172422_42182_c100546852550000001823086111241314_s1_p0/100233
                      Then I checked the filtered_subreads_summary.csv for the particular ZMV and it is present:

                      Code:
                      m130905_172422_42182_c100546852550000001823086111241314_s1_p0,100233,17452,20647,3195,0
                      m130905_172422_42182_c100546852550000001823086111241314_s1_p0,100258,1538,2287,749,0
                      m130905_172422_42182_c100546852550000001823086111241314_s1_p0,100258,2329,4161,1832,0
                      same for filtered_summary.csv

                      Code:
                      0xJIqqkV_m130905_172422_42182_c100546852550000001823086111241314_s1_p0,0xJIqqkV_m130905_172422_42182_c100546852550000001823086111241314_s1_p0/100233,20658,3195,0.7850,1,1,0,0
                      0xJIqqkV_m130905_172422_42182_c100546852550000001823086111241314_s1_p0,0xJIqqkV_m130905_172422_42182_c100546852550000001823086111241314_s1_p0/100234,863,0,0.0000,1,0,0,0
                      But then the log file for the filter Report (/log/P_FilterReports/subreadRPT.log) gives an error (see as well attachment):

                      Code:
                      echo 'Finished on' `date -u`;
                      # Success
                      exit 0
                      Running task://Anonymous/P_FilterReports/subreadRpt on Linux dgmserv01.vital-it.ch 2.6.18-348.16.1.el5 #1 SMP Wed Aug 21 04:00:25 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
                      Started on Thu Dec 12 08:34:03 UTC 2013
                      Validating existence of Input Files
                      Successfully found /local/scratch/eschmid/HGAP_chloro/data/filtered_subread_summary.csv
                      Successfully validated input files
                      Traceback (most recent call last):
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/bin/makeFilterSubreadReport.py", line 5, in <module>
                          pkg_resources.run_script('pbpy==0.1', 'makeFilterSubreadReport.py')
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/redist/python2.7/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 489, in run_script
                          keys.append(dist.key)
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/redist/python2.7/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 1207, in run_script
                          def __init__(self,module):
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/makeFilterSubreadReport.py", line 205, in <module>
                          sys.exit(main(sys.argv))
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/makeFilterSubreadReport.py", line 195, in main
                          state = app.run()
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/makeFilterSubreadReport.py", line 126, in run
                          dpi=self.dpi, useSdf=useSdf)
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/makeFilterSubreadReport.py", line 163, in _makeHistogram
                          useSdf=useSdf, useMegabases=True)
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/plot/PlotHelpers.py", line 460, in makeHistogramWithCumulativePngThemeLpr
                          hy, hx, hp = ax.hist(data, bins=60, ec=edgeColor, fc=barcolor)
                        File "/software/UHTS/PacBio/SMRTPipe/2.0.1/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/axes.py", line 7582, in hist
                      subreadRpt.log
                      I really have difficulties to break down the problem here

                      Comment


                      • #12
                        If you look at the filtered_subreads.csv file, you need at least one entry with 1,1,1 in the final three columns. If a ZMW is included in the white list it does not necessarily produce sequence, if no sequence is produced the report fails. The last three columns of the filtered_subreads.csv indicate:
                        1. If the ZMW is a sequencing ZMW - A set number of ZMWs on the chip are non sequencing
                        2. The productivity of the ZMW (loading) - 0 = no polymerase, 1 = correctly loaded, 2 = multi loaded.
                        3. If the ZMW passed filter - This included the whitelist, RQ, subread and readlength filters.

                        Comment


                        • #13
                          Thanks for clearing a little bit the fog

                          I went through the list and found a few reads which hat 0,0,0 but most of them were either
                          1,1,0 or 1,2,0 (similar for the ones in my example). What bothers me is, that I have no clue why he does not take my whitelist correctly - or, what I thought would be correct.

                          Since he succeeds if I remove the whitelist, my guess would be that the last ZMV pass filter doesnt fail due to any normal filter but my whitelist.

                          Comment


                          • #14
                            Have you tired whitelisting all the ZMWs? 0-150000

                            Comment


                            • #15
                              I think I get slowly closer to the underlying problem rhall.
                              I checked for the particular smartcell how many ZMWs were present:

                              Code:
                              h5ls -r XWLUnE4W_m130905_172422_42182_c100546852550000001823086111241314_s1_p0.3.bax.h5
                              
                              /PulseData/BaseCalls/ZMW/HoleNumber Dataset {54494/Inf}
                              Then I generated the whitelist:

                              Code:
                              for i in {0..54494} ; do echo m130905_172422_42182_c100546852550000001823086111241314_s1_p0/$i >> whitelist.txt; done
                              Which resulted in a format such as the one from @boetsie:

                              Code:
                              m130905_172422_42182_c100546852550000001823086111241314_s1_p0/0
                              m130905_172422_42182_c100546852550000001823086111241314_s1_p0/1
                              m130905_172422_42182_c100546852550000001823086111241314_s1_p0/2
                              m130905_172422_42182_c100546852550000001823086111241314_s1_p0/3
                              And again it fails with the same error:
                              WorkflowError: task://jobId/moduleName/subreadRpt Failed

                              I tried both, my protocol and the one from @boetsie which both succeed if I remove the whitelist.

                              The problem is reflected as well in the filtered_summary.csv. If I run it without the whitelist,
                              most reads get 1,1,1 whereas here again I got 1,1,0.

                              I have the impression that there is somehow a problem with the format of the whitelist or the position of the "module" in the protocol (which I would dismiss as well since @boetsies protocol produced the same error).
                              Attached Files

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              51 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              67 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X