  • #16
    The algorithms perform well with small files in my environment. The problem is that the file size is reported as zero, and when the file size is zero, the subsequent error correction does not work as expected. If the path does not exist (for example, because of a typo), the file size is also reported as zero. However, I do not think you made a mistake in typing the path. I will look into it after my interview on Thursday.
    Thanks so much, Mr. Bushnell, for your rigorous testing.
    Last edited by abysslover; 06-23-2015, 12:02 PM.
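The zero-size symptom above suggests validating every entry of the input list before any sizes are summed, so that a missing or empty file fails loudly instead of silently counting as 0 bytes. A minimal Python sketch, assuming the list file holds one FASTQ path per line (as produced by the `echo ... > ecoli_list` command in the next post); `read_input_list` and `total_input_size` are hypothetical helpers, not Trowel's actual code:

```python
import os
from typing import List


def read_input_list(list_path: str) -> List[str]:
    """Read one input path per line from a list file, skipping blank lines."""
    with open(list_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def total_input_size(paths: List[str]) -> int:
    """Sum input file sizes, failing loudly instead of silently counting 0 bytes."""
    total = 0
    for path in paths:
        # A typo in the path should be an error, not a file of size zero.
        if not os.path.isfile(path):
            raise FileNotFoundError(f"input file not found: {path!r}")
        size = os.path.getsize(path)
        if size == 0:
            raise ValueError(f"input file is empty: {path!r}")
        total += size
    return total
```

Usage would be `total_input_size(read_input_list("ecoli_list"))`; the point of the design is that a bad path stops the run before block creation rather than producing a zero total.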

    • #17
      I have tested with the following commands:
      wget https://s3.amazonaws.com/public.ged....-trim.fastq.gz
      gunzip ecoli_ref-5m-trim.fastq.gz
      echo ecoli_ref-5m-trim.fastq > ecoli_list && trowel2 -f ecoli_list

      Code:
      [BlockReader.set_k] K-mer size: 11
      [BlockReader.calculate_all_file_sizes] Total: 802243778 bytes
      [BlockReader.calculate_all_file_sizes] Ends at 2015-06-25.20:06:25, Real: 0.005 sec, CPU: 0.004 sec, RSS: 0.001(GB)/212.872(GB), CPU: 64
      [BlockReader.create_all_blocks] starts at 2015-06-25.20:06:25
      [BlockReader.create_all_blocks] Ends at 2015-06-25.20:06:25, Real: 0.004 sec, CPU: 0.008 sec, RSS: 0.001(GB)/212.868(GB), CPU: 64
      [BlockReader.count_all_n_reads] starts at 2015-06-25.20:06:25
      [BlockReader.count_n_reads_to_target] Blocks: 11(4766908)
      [BlockReader.count_n_reads_to_target] Ends at 2015-06-25.20:06:25, Real: 0.573 sec, CPU: 2.776 sec, RSS: 0.001(GB)/211.994(GB), CPU: 64
      [BlockReader.count_all_n_reads] Ends at 2015-06-25.20:06:25, Real: 0.574 sec, CPU: 2.776 sec, RSS: 0.001(GB)/211.993(GB), CPU: 64
      [BlockReader.create_all_read_ids] starts at 2015-06-25.20:06:25
      [BlockReader.create_read_ids_to_target_alt] /ebio/abt6_projects8/crosscut_elim/tmp/programs/_castle/ecoli_ref-5m-trim.fastq
      [BlockReader.create_read_ids_to_target_alt] Ends at 2015-06-25.20:06:29, Real: 3.559 sec, CPU: 3.428 sec, RSS: 0.034(GB)/211.378(GB), CPU: 64
      [BlockReader.create_all_read_ids] Ends at 2015-06-25.20:06:29, Real: 3.566 sec, CPU: 3.432 sec, RSS: 0.001(GB)/211.411(GB), CPU: 64
      [BlockReader.compute_premers_all] starts at 2015-06-25.20:06:29
      [BlockReader.compute_premers_alt] Files: 1023
      [BlockReader.generate_premer_array_all] starts at 2015-06-25.20:06:29
      [BlockReader.find_premers_in_blocks_direct_alt] Ends at 2015-06-25.20:06:35, Real: 6.174 sec, CPU: 26.452 sec, RSS: 0.037(GB)/212.952(GB), CPU: 64
      [BlockReader.generate_premer_array_all] # Premers: 90882
      [BlockReader.calculate_premers_cluster_size_in_blocks_direct] starts at 2015-06-25.20:06:36
      [BlockReader.calculate_premers_cluster_size_in_blocks_direct] Ends at 2015-06-25.20:06:40, Real: 4.262 sec, CPU: 18.272 sec, RSS: 0.042(GB)/211.664(GB), CPU: 64
      [BlockReader.generate_premer_array_all] Final Premers: 85735
      [BlockReader.generate_premer_array_all] Est. Cluster Size: 821016, Total Cluster Size: 799903631
      [BlockReader.generate_premer_array_all] Ends at 2015-06-25.20:06:40, Real: 10.540 sec, CPU: 45.296 sec, RSS: 0.023(GB)/211.671(GB), CPU: 64
      [BlockReader.compute_premers_alt] # Premers: array: 85735, inverted: 85735, count: 90883
      [BlockReader.create_premer_blocks_given_file_id] starts at 2015-06-25.20:06:40
      [BlockReader.create_premer_blocks_given_file_id] Ends at 2015-06-25.20:06:50, Real: 9.763 sec, CPU: 33.408 sec, RSS: 0.131(GB)/211.090(GB), CPU: 64
      [BlockReader.correct_errors_only] starts at 2015-06-25.20:06:50
      [BlockReader.remove_bwts] starts at 2015-06-25.20:07:03
      [BlockReader.remove_bwts] Ends at 2015-06-25.20:07:04, Real: 0.775 sec, CPU: 0.360 sec, RSS: 0.657(GB)/210.064(GB), CPU: 64
      [BlockReader.correct_errors_only] Ends at 2015-06-25.20:07:04, Real: 14.227 sec, CPU: 516.712 sec, RSS: 0.657(GB)/210.064(GB), CPU: 64
      [BlockReader.revert_to_reads_all] starts at 2015-06-25.20:07:04
      [BlockReader.split_premer_clusters] starts at 2015-06-25.20:07:04
      [BlockReader.split_premer_clusters] Ends at 2015-06-25.20:07:07, Real: 2.744 sec, CPU: 34.028 sec, RSS: 0.657(GB)/210.216(GB), CPU: 64
      [BlockReader.combine_segmented_clusters_alt] starts at 2015-06-25.20:07:07
      [BlockReader.split_reads_by_interval] starts at 2015-06-25.20:07:07
      [BlockReader.split_reads_by_interval] Ends at 2015-06-25.20:07:10, Real: 3.388 sec, CPU: 63.416 sec, RSS: 0.658(GB)/210.095(GB), CPU: 64
      [BlockReader.sort_reads_by_interval] starts at 2015-06-25.20:07:10
      [BlockReader.sort_reads_by_interval] Ends at 2015-06-25.20:07:12, Real: 2.100 sec, CPU: 57.216 sec, RSS: 0.658(GB)/210.557(GB), CPU: 64
      [BlockReader.merge_reads_by_interval] starts at 2015-06-25.20:07:12
      [BlockReader.reassign_original_read_ids] starts at 2015-06-25.20:07:12
      [BlockReader.reassign_original_read_ids] Ends at 2015-06-25.20:07:15, Real: 2.553 sec, CPU: 26.496 sec, RSS: 0.658(GB)/210.102(GB), CPU: 64
      [BlockReader.merge_reads_by_interval] Ends at 2015-06-25.20:07:18, Real: 5.877 sec, CPU: 28.468 sec, RSS: 0.658(GB)/209.249(GB), CPU: 64
      [BlockReader.combine_segmented_clusters_alt] Ends at 2015-06-25.20:07:18, Real: 11.366 sec, CPU: 149.104 sec, RSS: 0.658(GB)/209.249(GB), CPU: 64
      [BlockReader.revert_to_reads_all] Ends at 2015-06-25.20:07:18, Real: 14.111 sec, CPU: 183.132 sec, RSS: 0.658(GB)/209.249(GB), CPU: 64
      [BlockReader.remove_temporary_files] starts at 2015-06-25.20:07:18
      [BlockReader.remove_bwts] starts at 2015-06-25.20:07:18
      [BlockReader.remove_bwts] Ends at 2015-06-25.20:07:18, Real: 0.360 sec, CPU: 0.140 sec, RSS: 0.658(GB)/209.247(GB), CPU: 64
      [BlockReader.remove_temporary_files] Ends at 2015-06-25.20:07:20, Real: 1.700 sec, CPU: 0.924 sec, RSS: 0.658(GB)/211.571(GB), CPU: 64
      [Correction.Main] Ends at 2015-06-25.20:07:20, Real: 54.782 sec, CPU: 785.772 sec, RSS: 0.658(GB)/211.571(GB), CPU: 64
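The log reports wall-clock (`Real`) and `CPU` time for each stage, so their ratio gives a rough measure of how many cores a stage kept busy: `correct_errors_only` used 516.712 CPU seconds in 14.227 s of wall time (about 36 cores on average), while the whole run averaged about 14 (785.772 / 54.782). A small Python sketch that extracts these figures from the `Ends at` lines, assuming the exact line format shown above; the trailing `CPU: 64` field is presumably the number of available cores and is ignored here, and `stage_timings` is a hypothetical helper:

```python
import re
from typing import Iterable, List, Tuple

# Matches stage-completion lines such as
# [BlockReader.correct_errors_only] Ends at <time>, Real: 14.227 sec, CPU: 516.712 sec, ...
END_LINE = re.compile(
    r"^\s*\[(?P<stage>[^\]]+)\] Ends at .*?"
    r"Real: (?P<real>[\d.]+) sec, CPU: (?P<cpu>[\d.]+) sec"
)


def stage_timings(lines: Iterable[str]) -> List[Tuple[str, float, float, float]]:
    """Return (stage, real_sec, cpu_sec, cpu_per_real) for every 'Ends at' line."""
    rows = []
    for line in lines:
        m = END_LINE.match(line)
        if m is None:
            continue  # 'starts at' lines and other messages carry no timings
        real = float(m.group("real"))
        cpu = float(m.group("cpu"))
        # CPU/Real approximates the average number of busy cores during the stage.
        ratio = cpu / real if real > 0 else float("nan")
        rows.append((m.group("stage"), real, cpu, ratio))
    return rows
```

Feeding it the saved log (for example `stage_timings(open("trowel.log"))`, file name hypothetical) makes it easy to see which stages scale well across cores and which are mostly serial.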

      • #18
        Hi,

        As a first step, I tested Trowel on some raw PE and LMP FASTQ files from a bird genome.
        It runs quite fast (~5 h on 32 CPUs with 2 × 225 million reads from a non-deduplicated mate-pair library).

        What I miss is some kind of statistics or report after error correction,
        perhaps as an optional additional tag in the FASTQ sequence headers.
        Currently I have no idea to what extent my library has been corrected.
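A per-read report of this kind can be approximated after the fact by comparing the input and output FASTQ files. A minimal Python sketch, assuming both files list the same reads in the same order and that correction only substitutes bases (never changes read length); the `corrected=N` header tag and all function names are hypothetical, not something Trowel currently emits:

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def count_corrections(original: str, corrected: str) -> int:
    """Count substituted bases; assumes correction never changes read length."""
    if len(original) != len(corrected):
        raise ValueError("read length changed; substitution-only assumption violated")
    return sum(1 for a, b in zip(original, corrected) if a != b)


def correction_report(original_lines, corrected_lines) -> Tuple[List[str], Counter]:
    """Tag each corrected header with 'corrected=N' and build a histogram of N."""
    histogram: Counter = Counter()
    tagged = []
    pairs = zip(read_fastq(original_lines), read_fastq(corrected_lines))
    for (h_orig, s_orig, _), (h_corr, s_corr, _) in pairs:
        # Assumes both files list the same reads in the same order.
        if h_orig.split()[0] != h_corr.split()[0]:
            raise ValueError(f"read order differs: {h_orig} vs {h_corr}")
        n = count_corrections(s_orig, s_corr)
        histogram[n] += 1
        tagged.append(f"{h_corr} corrected={n}")  # hypothetical tag format
    return tagged, histogram
```

The histogram answers "to what extent was my library corrected" at a glance, while the tagged headers keep the per-read detail for later filtering.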

        The resulting FASTQ files contain Phred+64 encoded quality values. Why?
        (Though I think it is the better encoding.) Illumina data itself is now Phred+33 (Sanger) encoded.
        I am not sure whether all common downstream tools can deal with Phred+64 encoded FASTQ files.
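Whether a downstream tool handles such files correctly usually depends on it recognizing the quality offset. A common heuristic, sketched here in Python (the function name is hypothetical), inspects the lowest quality character: ASCII codes below 59 (`;`) cannot occur in +64 encodings, while a minimum of 64 or more suggests Phred+64. The heuristic is not definitive, as the comments note:

```python
from typing import Iterable


def guess_quality_encoding(quality_strings: Iterable[str]) -> str:
    """Heuristically guess the FASTQ quality offset from the lowest ASCII code seen."""
    lowest = min((ord(c) for q in quality_strings for c in q), default=None)
    if lowest is None:
        return "unknown"    # no quality characters to inspect
    if lowest < 59:
        return "phred33"    # codes below ';' cannot occur in +64 encodings
    if lowest >= 64:
        return "phred64"    # likely, but Phred+33 data with every Q >= 31 looks the same
    return "ambiguous"      # 59-63: Solexa+64, or Phred+33 Q26-Q30
```

In practice, sampling the quality lines of the first few thousand reads is usually enough to reach a confident answer.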

        best,
        Sven
        Last edited by sklages; 07-27-2015, 12:34 AM.

        • #19
          Hi, thanks for testing.

          I agree with your first point: some error correction statistics would be useful for reverting overly aggressive corrections in the final stage. By aggressive error correction I mean that the number of corrected bases in a read is unusually high compared with the known error rates of Illumina sequencers. However, I am currently a little busy due to an upcoming international conference deadline. I will try to add the feature afterwards.
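One way to make "unusually high" concrete, sketched here as an assumption rather than Trowel's method, is to model each base of a read of length L as independently erroneous with per-base rate p and compute the binomial tail P(X ≥ n) = Σ_{k=n}^{L} C(L, k) p^k (1 − p)^(L−k) for a read with n corrected bases. Reads whose tail probability falls below a cutoff are flagged. The default `error_rate=0.01` and `alpha=1e-3` are placeholder values, not figures from this thread:

```python
from math import comb


def binomial_upper_tail(n: int, length: int, p: float) -> float:
    """P(X >= n) for X ~ Binomial(length, p); summed directly for accuracy in the tail."""
    if not 0 <= n <= length:
        raise ValueError("n must lie between 0 and the read length")
    # Direct summation is fine for short-read lengths (well under ~1000 bp).
    return sum(comb(length, k) * p**k * (1 - p) ** (length - k) for k in range(n, length + 1))


def is_aggressive(n_corrected: int, read_length: int,
                  error_rate: float = 0.01, alpha: float = 1e-3) -> bool:
    """Flag a read whose correction count is implausible under the assumed error rate."""
    return binomial_upper_tail(n_corrected, read_length, error_rate) < alpha
```

With these placeholder settings, one or two corrections in a 100 bp read are unremarkable, while ten corrections would be flagged as a candidate for reverting.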
          Trowel 2 converts all encoding schemes to Phred+64 in order to reconcile differences in quality standards among samples. For instance, my test datasets contain a Phred+64 encoded library and a Phred+33 encoded one. I simply selected Phred+64 as the representative quality encoding. I do not think changing the quality standard is a big deal for downstream analysis: subtracting 31 (that is, 64 − 33) from the ASCII code of each quality character gives the Phred+33 standard.
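The conversion described above is a fixed shift of every quality character; the underlying Phred score does not change. A minimal Python sketch, assuming well-formed 4-line FASTQ records with valid Phred+64 qualities (both function names are hypothetical):

```python
from typing import Iterable, Iterator


def phred64_to_phred33(quality: str) -> str:
    """Shift each quality character down by 31 (= 64 - 33); the Phred score is unchanged."""
    out = []
    for ch in quality:
        code = ord(ch)
        if code < 64:
            raise ValueError(f"{ch!r} is below the Phred+64 range")
        out.append(chr(code - 31))
    return "".join(out)


def convert_fastq_lines(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite the quality line (every 4th line) of 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield phred64_to_phred33(line.rstrip("\n")) + "\n"
        else:
            yield line
```

For example, Q40 is `h` (ASCII 104) in Phred+64 and `I` (ASCII 73) in Phred+33. Rejecting characters below `@` guards against accidentally converting a file that is already Phred+33.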
          I think most well-known bioinformatics tools can handle both standards without much trouble. In addition, I prefer to keep the number of options as small as possible so that the tool stays easy to use. Because the choice of quality standard has no effect on the accuracy of error correction, I may change the encoding scheme to Phred+33, but I cannot promise.

          Regards,
          Euncheon
