
  • Getting the original read IDs for reads corrected by the ALLPATHS-LG error correction module

    Hello SEQanswers Community,

    I am new to developing bioinformatics tools, although I have a lot of experience in computer and electrical engineering.

    I've developed an error corrector for Illumina sequencing data. To evaluate it, I need the read IDs. I've generated several sets of corrected DNA sequences with Quake, SOAPec, Musket, Coral, Hybrid SHREC, and the ALLPATHS-LG EC (error corrector).

    The problem is that ALLPATHS-LG EC and Hybrid SHREC change the read IDs. I only need the original IDs of the reads corrected by ALLPATHS-LG EC.

    The output folder contains the following files:
    1. Input files: uncorrected_1.fastq, uncorrected_2.fastq
    2. Output files: uncorrected.paired.A.fastq, uncorrected.paired.B.fastq, uncorrected.unpaired.fastq, uncorrected.fastq, uncorrected.fastq.ids

    The content of the uncorrected.fastq.ids file looks like this:
    #New_ID, Original_ID
    0, 0
    1, 1
    2, 2
    3, 3
    4, 6
    5, 7
    6, 8
    7, 9
    8, 10
    9, 12
    10, 13
    ...
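
A minimal sketch of how a mapping file in the format listed above could be read and applied. It assumes, without confirmation from the ALLPATHS-LG documentation, that Original_ID is the 0-based position of a read in the order the input reads were loaded; how uncorrected_1.fastq and uncorrected_2.fastq are interleaved into that order is not stated here and would need to be checked. The function names and the demo reads (`read0` ... `read7`) are hypothetical.

```python
import io


def load_id_map(handle):
    """Return {new_id: original_id} from a '#New_ID, Original_ID' style file."""
    id_map = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and the '#New_ID, Original_ID' header.
        if not line or line.startswith("#"):
            continue
        new_id, orig_id = (int(field) for field in line.split(","))
        id_map[new_id] = orig_id
    return id_map


def fastq_names(handle):
    """Yield the read name (without '@') of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            yield line[1:].strip()


def original_names(id_map, names):
    """Map each new ID to the name of the read at its original index.

    Assumption: original IDs index directly into `names`.
    """
    return {new_id: names[orig_id] for new_id, orig_id in id_map.items()}


# Demo with the first rows of the listing and made-up read names.
ids_text = "#New_ID, Original_ID\n0, 0\n1, 1\n2, 2\n3, 3\n4, 6\n5, 7\n"
id_map = load_id_map(io.StringIO(ids_text))

fastq_text = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(8))
names = list(fastq_names(io.StringIO(fastq_text)))

recovered = original_names(id_map, names)
print(recovered[4])
```

With real data, `names` would be built from the input FASTQ files in whatever order matches the original IDs, and the result would apply to the reads in uncorrected.fastq, the only output file that has an accompanying .ids file.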

    Unfortunately, there are no files such as 'uncorrected.paired.A.fastq.ids' or 'uncorrected.paired.B.fastq.ids'.

    Do you know how to get the original read IDs with these files?

    Best,
    Euncheon
    Last edited by abysslover; 07-14-2013, 04:08 AM.

  • #2
    Hello,
    Can anybody suggest a solution for this ALLPATHS-LG error?
    Dump of stack:

    0. CRD::exit(int), in Exit.cc:49
    1. our_new_handler(), in RunTime.cc:577
    2. __gnu_cxx::new_allocator<BaseVec>::allocate( ... ), in new_allocator.h:89
    3. OuterVec<BaseVec, MempoolOwner<unsigned char>, std::allocator<BaseVec> >::reserve( ... ), in OuterVec.h:212
    4. OuterVec<BaseVec, MempoolOwner<unsigned char>, std::allocator<BaseVec> >::resize( ... ), in OuterVec.h:189

    make: *** [/home/assembly/data/all_paths_lg/sample/wild_chickpea/chickpeawild.genome/data/run/ASSEMBLIES/test/linear_scaffolds0.clean.remodel.applied.tag.fixed.local.assembly.efasta] Error 1

    Sat May 24 22:32:43 2014 : make process finished.

    Sat May 24 22:32:44 2014: Computing runtime statistics.

    e_time(sec) u_time(sec) vmrss(MB) vmsize(MB) module name
    --------------------------------------------------------------------------------
    300 137 18310 18437 ValidateAllPathsInputs
    1308 412 46858 47330 RemoveDodgyReads
    17059 534223 124432 144626 FindErrors
    2430 56101 104897 125823 CleanCorrectedReads
    19645 603300 124126 133523 PathReads
    3146 96977 85915 95543 FillFragments
    8014 21546 106541 149749 CommonPather
    37 26 1818 2511 MakeRcDb
    117 108 4750 4880 Unipather
    12235 45739 95660 135382 CloseUnipathGaps
    1667 10524 81894 118501 ShaveUnipathGraph
    12 1 2047 2159 ReplacePairsStats
    6470 1806 125357 134834 RemoveDodgyReads
    621 565 43304 46439 SamplePairedReadStats
    9644 41045 121759 139476 UnipathPatcher
    14295 27455 124924 186173 CommonPather
    9 6 462 575 MakeRcDb
    62 55 1780 1905 Unipather
    70 38 2233 2578 FilterPrunedReads
    406 359 13476 15942 CreateLookupTab
    6724 180176 52146 62651 ErrorCorrectJump
    36 19 1505 1644 SplitUnibases
    102 47 1172 1758 MergeReadSets
    173 114 6608 8416 MakeRcDb
    9694 18331 76791 104845 UnibaseCopyNumber3
    301 2058 11064 19734 UnipathLocsLG
    136 230 5352 12347 SamplePairedReadDistributions
    709 453 15791 15904 BuildUnipathLinkGraphsLG
    11151 2703 15678 23531 SelectSeeds
    50182 202553 25415 32482 LocalizeReadsLG
    13070 8445 80776 111905 MergeNeighborhoods1
    53 238 3648 18715 MergeNeighborhoods2
    259 253 7211 9207 MergeNeighborhoods3
    590 2796 13674 30373 RecoverUnipaths
    33 31 494 665 FlattenHKP
    168 1497 31228 38936 AlignPairsToFasta
    1171 7802 47162 67112 RemoveHighCNAligns
    1222 1602 12016 19741 MakeScaffoldsLG
    673 2261 9383 17301 CleanAssembly
    3057 124226 18239 25946 RemodelGaps
    4844 1385 50618 59228 PostPatcher
    182 176 1384 1496 ApplyGapPatches
    3531 96877 41758 68465 AlignReads
    239 796 6935 14621 TagCircularScaffolds
    1231 656 40623 49270 FixPrecompute
    2544 22519 64569 72119 FixSomeIndels
    65 61 1196 1316 ApplyAssemblyEdits
    4434 89029 41544 68317 AlignReads
    2982 85867 18353 26065 FixLocal
    --------------------------------------------------------------------------------
    217122 2293644 125357 186173 Total/Peak 49 modules

    Sat May 24 22:32:46 2014: Compiling assembly report.

    ------------------ FindErrors -> frag_reads_edit.fastb

    251943192 total number of original fragment reads
    101.0 mean length of original fragment reads in bases
    35.1 % gc content of fragment reads
    0.0 % of bases pre-corrected
    870830049 estimated genome size in bases
    66.0 % genome estimated to be repetitive (at K=25 scale)
    22 estimated genome coverage by fragment reads
    0.37 estimated standard deviation of sequencing bias (at K=25 scale)
    61.6 % of bases confirmed in cycle 0
    0.08 % of bases corrected in cycle 0
    0.00 % of bases with conflicting corrections in cycle 0
    62.5 % of bases confirmed in cycle 1
    0.05 % of bases corrected in cycle 1
    0.00 % of bases with conflicting corrections in cycle 1

    ------------------ CleanCorrectedReads -> frag_reads_corr.25mer.kspec

    1.3 % of reads removed because of low frequency kmers

    ------------------ FillFragments -> filled_reads.fastb

    14.4 % of fragment pairs that were filled

    ------------------ SamplePairedReadStats -> jump_reads_filt.outies

    Paired Read Separation Stats:
    Lib      OrigSep  NewSep  NewDev  3sigma%  %NonJumps  %ReadsAlgnd
    MP_10Kb     9798    8335    1179       68          0           10
    MP_3Kb      2798    2596     479      100          0           10

    ------------------ ErrorCorrectJump -> jump_reads_ec.fastb

    20.92 % of jump reads pairs that are error corrected

    ------------------ SamplePairedReadDistributions -> jump_reads_ec.distribs

    Libraries statistics tables:

    Table 1: library names, number of pairs (N), original (L0) and new sizes (L)

    --------------------------------------------------------------------------
    id   library name   num pairs N   orig size L0    new size L
    ---  ------------   -----------   -------------   -------------
    0    MP_10Kb            4771107   8537 +/- 1179   9166 +/- 2374
    1    MP_3Kb            31237581   2798 +/- 479    2740 +/- 545

    tot  total             36008688
    --------------------------------------------------------------------------


    Table 2: fraction of reads in each length interval

    ---------------------------------------------------------------------------
    id <L> L < 0 0-500 500-1k 1k-2k 2k-4k 4k-8k 8k-16k >16k
    --- ----- ------- ------- ------- ------- ------- ------- ------- -------
    0 9166 0.2% 0.2% 0.8% 1.6% 2.8% 13.3% 81.2%
    1 2740 6.4% 92.9% 0.6%
    ---------------------------------------------------------------------------


    Table 3: number of bridging links over a specific gap size

    --------------------------------------------------------------------
    id <L> <= 0 0 1k 2k 3k 4k 6k 8k 12k 16k
    --- ----- ---- ----- ----- ----- ----- ----- ----- ----- ----- -----
    0 9166 0% 132 117 103 89 76 49 23 1
    1 2740 306 195 84 14
    tot 438 312 187 103 76 49 23 1
    --------------------------------------------------------------------

    ------------------ Memory and CPU usage

    48 available cpus
    504.8 GB of total available memory
    14110.2 GB of available disk space
    60.24 hours of total elapsed time
    60.31 hours of total per-module elapsed time
    637.12 hours of total per-module user time
    10.56 effective parallelization factor
    122.42 GB memory usage peak



    Sat May 24 22:32:47 2014 : ALLPATHS-LG Pipeline Finished.

    Run directory: /home/assembly/data/all_paths_lg/sample/wild_chickpea/chickpeawild.genome/data/run
    Log directory: /home/assembly/data/all_paths_lg/sample/wild_chickpea/chickpeawild.genome/make_log/data/run/test/2014-05-22T10:13:27

    *** Make encountered an error, see above for error messages. ***
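
The stack dump above passes through `our_new_handler` and `new_allocator<BaseVec>::allocate`, which suggests (but does not prove) that a memory allocation failed. As a rough aid for reading the runtime statistics, here is a small sketch that parses rows in the `e_time u_time vmrss vmsize module` layout, finds the modules with the highest memory use, and recomputes the summary figures from the totals line. The excerpt contains only a few rows copied from the log; the function name and variable names are hypothetical.

```python
def parse_module_stats(lines):
    """Parse 'e_time u_time vmrss vmsize module' rows; ignore other lines."""
    rows = []
    for line in lines:
        fields = line.split()
        # Data rows have exactly four integer columns and one module name.
        if len(fields) == 5 and all(f.isdigit() for f in fields[:4]):
            e_time, u_time, vmrss, vmsize = map(int, fields[:4])
            rows.append({
                "module": fields[4],
                "e_time": e_time,   # elapsed seconds
                "u_time": u_time,   # user CPU seconds
                "vmrss": vmrss,     # resident memory, MB
                "vmsize": vmsize,   # virtual memory, MB
            })
    return rows


# A subset of rows copied from the log above.
log_excerpt = """\
e_time(sec) u_time(sec) vmrss(MB) vmsize(MB) module name
17059 534223 124432 144626 FindErrors
6470 1806 125357 134834 RemoveDodgyReads
14295 27455 124924 186173 CommonPather
2982 85867 18353 26065 FixLocal
""".splitlines()

rows = parse_module_stats(log_excerpt)
peak_rss = max(rows, key=lambda r: r["vmrss"])
peak_vm = max(rows, key=lambda r: r["vmsize"])
print("peak vmrss:", peak_rss["module"], peak_rss["vmrss"], "MB")
print("peak vmsize:", peak_vm["module"], peak_vm["vmsize"], "MB")

# Totals taken from the 'Total/Peak' line of the log.
total_elapsed_s = 217122
total_user_s = 2293644
parallel_factor = total_user_s / total_elapsed_s
peak_gb = peak_rss["vmrss"] / 1024

print("elapsed hours:", round(total_elapsed_s / 3600, 2))
print("user hours:", round(total_user_s / 3600, 2))
print("parallelization factor:", round(parallel_factor, 2))
print("peak memory (GB):", round(peak_gb, 2))
```

The recomputed values agree with the figures in the "Memory and CPU usage" section of the log: the peak resident memory of about 122 GB is well below the 504.8 GB reported as total available memory, so if the failure is an allocation problem, it may come from a single very large request rather than from steadily growing usage.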
