  • PredictHaplo 1.0 Issues

    Hi everyone,

    I have a question regarding the haplotype reconstruction algorithm PredictHaplo-1.0 (http://bmda.cs.unibas.ch/HivHaploTyper/)


    I've been using this tool on Roche 454 FLX data and it works well. However, since switching to Illumina TruSeq data, I keep getting the segmentation fault below.

    After parsing the reads in file /home/DataFiles/PredictHaplo_Files/087.sam: average read length= -nan
    First read considered in the analysis starts at position 100000. Last read ends at position 0
    There are 0 reads
    Average read length: -nan
    Local window size: -2147483648
    min_overlap : -2083059138
    Reconstruction starts at position 100000 and stops at position 0
    Segmentation fault (core dumped)
    I've had a brief look at the .sam file, but I don't see anything out of the ordinary. What I don't understand is why it fails to compute the average read length when there are 2M+ reads in the .sam file. I tried both sorted and unsorted .sam files in case it was confused about read positions, but I still get the same error message.

    I've tried another TruSeq data set, and the same error message appears. Might this be a TruSeq thing? I managed to borrow some Nextera XT data to check whether the algorithm runs on Illumina data at all, and that set worked.

    I'm very confused, any help would be greatly appreciated.
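The printed values look like what a program reports when no reads survive its input filters: an average over zero reads is 0/0, which is NaN, and the start/stop positions keep their initial sentinel values (100000 and 0). A minimal Python sketch, not PredictHaplo's own parser, that summarizes mapped reads in a SAM file can show whether reads are present and how many survive a length cutoff. The names `summarize_sam`, `ref_span`, and the `min_readlength` argument are hypothetical, chosen to mirror the config parameter of the same name.

```python
import math
import re
from typing import Iterable, Optional

# CIGAR operations; of these, M, D, N, = and X consume reference bases (SAM spec)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(cigar: str) -> int:
    """Number of reference bases covered by one alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def summarize_sam(lines: Iterable[str], min_readlength: int = 0) -> dict:
    """Count mapped reads that pass a length cutoff and report the region they cover.

    Hypothetical diagnostic only; PredictHaplo's real filters may differ.
    """
    n_reads = 0
    total_len = 0
    first_start: Optional[int] = None
    last_end: Optional[int] = None
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\r\n").split("\t")
        flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        if len(seq) < min_readlength:
            continue  # shorter than the configured cutoff
        end = pos + ref_span(cigar) - 1  # 1-based inclusive end position
        n_reads += 1
        total_len += len(seq)
        first_start = pos if first_start is None else min(first_start, pos)
        last_end = end if last_end is None else max(last_end, end)
    # With zero reads the mean is 0/0; return NaN explicitly instead of raising
    avg_len = total_len / n_reads if n_reads else math.nan
    return {"n_reads": n_reads, "avg_len": avg_len,
            "first_start": first_start, "last_end": last_end}
```

Running it twice on the same file, e.g. `summarize_sam(open("/home/DataFiles/PredictHaplo_Files/087.sam"))` and again with `min_readlength=220`, separates "the file has no usable alignments" from "a filter is discarding every read".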

  • #2
    Did you ever find a solution to this? I am having the same problem for both 454 and Illumina MiSeq data. Thank you!

    • #3
      Have you written to the author(s) directly? You probably have a better chance of getting a resolution that way.

      • #4
        What is the command you are using?

        • #5
          Yes I have contacted the authors, waiting to hear back.

          The command runs the PredictHaplo executable on the config file:
          PredictHaplo-Paired config.txt

          To give context, I align my reads with the Mosaik aligner. For my 454 data, I know the problem can be resolved by using the bwa aligner instead. However, I would like to know how to run PredictHaplo on the SAM files produced by Mosaik.
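Since the same reads work after bwa but not after Mosaik, one way to look for the difference is to tally which CIGAR operators and optional tags each aligner writes, then compare the two files. Whether PredictHaplo rejects any particular operator (such as the extended `=`/`X` operators) or depends on any particular tag is not established in this thread; this is only a diagnostic sketch, and `tally_cigar_and_tags` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_cigar_and_tags(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count CIGAR operator occurrences and optional-tag names in SAM records."""
    ops: Counter = Counter()
    tags: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        cigar = fields[5]
        if cigar != "*":
            for _, op in CIGAR_OP.findall(cigar):
                ops[op] += 1
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE
        for tag in fields[11:]:
            tags[tag[:2]] += 1
    return ops, tags
```

Calling it on a Mosaik SAM and a bwa SAM of the same reads and printing both counters side by side gives a quick list of formatting differences to investigate.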

          Thanks for your help!

          • #6
            Hi,

            I emailed the authors directly before posting the question in this forum, but I never got a reply.

            In the end, no, I never found a solution. Unfortunately, I am still clueless as to why it doesn't work on my data, so I switched from PredictHaplo to QuasiRecomb (https://github.com/armintoepfer/QuasiRecomb/).

            I never understood why the output reported those figures. To recap:
            After parsing the reads in file /home/DataFiles/PredictHaplo_Files/087.sam: average read length= -nan
            First read considered in the analysis starts at position 100000. Last read ends at position 0
            There are 0 reads
            Apologies for not solving the problem, but I decided I had to move on to something else; otherwise I could have been stuck for a long time, hahaha.

            If anyone does understand what's going on, I am still very interested in finding out. >_<

            • #7
              Thank you. I am now using QuasiRecomb as well and am having an issue detecting paired reads.

              I run:
              java -jar QuasiRecomb.jar -i alignment.sorted.bam
              and get the following:
              00:01:42 Parsing done
              00:01:42 Start pairing
              00:01:56 End pairing
              00:01:56 Begin sorting
              00:01:57 Finished sorting
              00:01:57 Modifying reads 100%
              00:01:59 Computing entropy 100%
              00:02:00 Allel frequencies 100%
              00:02:00 Alignment entropy 0.082
              00:02:00 Unique reads 330664
              00:02:00 Paired reads 0
              00:02:00 Insert size 146 (±220)
              00:02:00 Merged reads 305158
              When I check my alignment, I do find properly paired mates:
              samtools flagstat alignment.sorted.bam
              2642674 + 0 in total (QC-passed reads + QC-failed reads)
              0 + 0 duplicates
              1743911 + 0 mapped (65.99%:-nan%)
              2642674 + 0 paired in sequencing
              1321337 + 0 read1
              1321337 + 0 read2
              143036 + 0 properly paired (5.41%:-nan%)
              1679356 + 0 with itself and mate mapped
              64555 + 0 singletons (2.44%:-nan%)
              0 + 0 with mate mapped to a different chr
              0 + 0 with mate mapped to a different chr (mapQ>=5)
              Are you able to use QuasiRecomb to detect paired mates?
              Thanks again.
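The flagstat numbers come from the SAM FLAG bits, so decoding those bits directly shows what any downstream tool can see. Note that in the output above 1679356 records have both mates mapped but only 143036 (5.41%) carry the aligner's "properly paired" bit. The sketch below counts the relevant bits for primary alignments and, as a labeled assumption about how a pairing step might work, also counts read names that appear on exactly two mapped records; `count_flags` and the `name_pairs` key are hypothetical, and this is not QuasiRecomb's pairing logic.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM specification)
FLAG_BITS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "read1": 0x40,
    "read2": 0x80,
}
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def count_flags(lines: Iterable[str]) -> Counter:
    """Tally FLAG bits over primary alignments, plus a name-based pair count."""
    counts: Counter = Counter()
    mapped_names: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        qname, flag_str = line.split("\t", 2)[:2]
        flag = int(flag_str)
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # count each read once, primary alignments only
        for name, bit in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
        if flag & 0x1 and not flag & (0x4 | 0x8):
            counts["both_mapped"] += 1  # like flagstat's "with itself and mate mapped"
        if not flag & 0x4:
            mapped_names[qname] += 1
    # Hypothetical pairing rule: two mapped records sharing an identical QNAME
    counts["name_pairs"] = sum(1 for c in mapped_names.values() if c == 2)
    return counts
```

If `both_mapped` is large but `name_pairs` is near zero, the mates' QNAMEs differ between the two records (for example, different suffixes), which would defeat any pairing that matches on read names.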

              • #8
                Hello everyone,

                I solved that issue by changing "%min_readlength" in the configuration file. Its default is 220, but my HiSeq Illumina reads are only 100 nt long, so lowering it was the solution.

                After changing this parameter, PredictHaplo worked perfectly.
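This fix fits the symptoms: if reads shorter than min_readlength are discarded, a 220 cutoff removes every 100-nt read, leaving 0 reads and a NaN average, which matches the output in the first post (though that poster never reported their read lengths). The config shown in post #10 stores each parameter as a `%` comment line followed by its value line, so the value can be changed programmatically. `set_param` is a hypothetical helper that assumes exactly that layout.

```python
from typing import List


def set_param(lines: List[str], key: str, value: str) -> List[str]:
    """Return a copy of the config with the value under the '%<key>' comment replaced.

    Assumes the layout shown in post #10: a '%' comment line whose first word
    names the parameter, followed by a line holding its value.
    """
    out = list(lines)
    for i, line in enumerate(out):
        text = line.strip()
        if not text.startswith("%"):
            continue
        words = text[1:].split()
        # Tolerate a trailing ';' on the name (e.g. "max_reads_in_window;")
        if words and words[0].rstrip(";") == key:
            if i + 1 >= len(out):
                raise ValueError(f"no value line after %{key}")
            out[i + 1] = value
            return out
    raise KeyError(key)
```

For example, read the file with `open("config.txt").read().splitlines()`, call `set_param(lines, "min_readlength", "100")`, and write the joined lines back before rerunning `PredictHaplo-Paired config.txt`.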

                • #9
                  We had the same issue. It looks like it was down to the SAM file format: our reads were originally aligned with bowtie2, which gave the PredictHaplo error, but using bwa instead resolved it.
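One concrete difference between the two aligners' SAM output is score scales: bowtie2's MAPQ tops out at 42 versus 60 for bwa mem, and in bowtie2's default end-to-end mode the alignment score (AS tag) is 0 for a perfect match and negative otherwise, while bwa mem scores are positive. The config has both %min_mapping_qual and %min_align_score_fraction (relative to read length). Assuming, without confirmation from the authors, that PredictHaplo compares the AS tag against that fraction of the read length, every bowtie2 end-to-end read would fail. The sketch below applies such filters; `passes_filters` and `count_passing` are hypothetical.

```python
from typing import Iterable, Optional


def passes_filters(line: str, min_mapq: int, min_score_fraction: float) -> bool:
    """Apply an assumed MAPQ cutoff and an assumed AS >= fraction * read_length rule."""
    fields = line.rstrip("\r\n").split("\t")
    flag, mapq, seq = int(fields[1]), int(fields[4]), fields[9]
    if flag & 0x4 or mapq < min_mapq:
        return False  # unmapped or below the mapping-quality cutoff
    score: Optional[int] = None
    for tag in fields[11:]:
        if tag.startswith("AS:i:"):
            score = int(tag[5:])
    if score is None:
        return True  # no AS tag: this sketch does not filter on score
    return score >= min_score_fraction * len(seq)


def count_passing(lines: Iterable[str], min_mapq: int = 20,
                  min_score_fraction: float = 0.35) -> int:
    """Count alignment records that pass both filters (defaults from post #10's config)."""
    return sum(1 for line in lines
               if line.strip() and not line.startswith("@")
               and passes_filters(line, min_mapq, min_score_fraction))
```

Comparing `count_passing` on a bowtie2 SAM and a bwa SAM of the same reads would show whether score scales alone could explain the difference.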

                  • #10
                    Hi everyone,
                    I found this thread after testing Bowtie2 and PredictHaplo.

                    Using BWA I am having similar issues – only a tiny proportion of reads are being recognised by PredictHaplo. In this test case of 2x150 NextSeq viral sequences, only 154 of the 70k mapped reads in this subsampled SAM are detected according to the output (see below). In Tablet everything looks fine with the SAM and the pairings are recognised.

                    I have even tried an older build of BWA to see if an update might have caused the issue. I don't have any strange characters or line endings in my reference sequence, and I am at a loss as to what could be causing this.
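Checking the reference for stray line endings and name mismatches can be automated. Aligners such as bwa typically name each reference by its FASTA header up to the first whitespace, so the SAM `@SQ SN:` and RNAME values should match those names exactly. Whether PredictHaplo matches reads to the reference by name is an assumption; `check_reference` is a hypothetical helper.

```python
from typing import Iterable, Set


def check_reference(fasta_lines: Iterable[str], sam_lines: Iterable[str]) -> dict:
    """Report CR/CRLF line endings in the FASTA and SAM reference names absent from it."""
    fasta_lines = list(fasta_lines)
    # Windows (CRLF) or old Mac (CR) line endings in the reference FASTA
    crlf = any(ln.endswith("\r\n") or ln.endswith("\r") for ln in fasta_lines)
    # Header up to the first whitespace, the usual sequence name aligners emit
    fasta_names: Set[str] = {ln[1:].split()[0] for ln in fasta_lines
                             if ln.startswith(">") and ln[1:].split()}
    sam_names: Set[str] = set()
    for line in sam_lines:
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("@SQ"):
            sam_names.update(f[3:] for f in fields[1:] if f.startswith("SN:"))
        elif line.strip() and not line.startswith("@") and fields[2] != "*":
            sam_names.add(fields[2])  # RNAME of an aligned record
    return {"crlf_in_fasta": crlf, "fasta_names": fasta_names,
            "missing_from_fasta": sam_names - fasta_names}
```

Open the FASTA with `open(path, newline="")` when calling this; otherwise Python's universal-newline mode silently converts `\r\n` to `\n` and the line-ending check cannot see it.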

                    Does anyone have any ideas? Has anyone had responses from the authors?

                    bede@ubuntu:~/ph/PredictHaplo-Paired-0.4$ ./PredictHaplo-Paired config_test
                    config_test
                    0 hrv_21_sub_
                    0 % filename of reference sequence (FASTA)
                    1 /home/bede/hrv_21/hrv1b.cns.fa
                    1 % do_visualize (1 = true, 0 = false)
                    2 1
                    2 % filname of the aligned reads (sam format)
                    3 /home/bede/hrv_21/SM_21A_S14.1pc.bwa_old.sam
                    3 % have_true_haplotypes (1 = true, 0 = false)
                    4 1
                    4 % filname of the true haplotypes (MSA in FASTA format) (fill in any dummy filename if there is no "true" haplotypes)
                    5 truehaps.fasta
                    5 % do_local_analysis (1 = true, 0 = false) (must be 1 in the first run)
                    6 1
                    6 % max_reads_in_window;
                    7 10000
                    7 % entropy_threshold
                    8 4e-2
                    8 %reconstruction_start
                    9 9
                    9 %reconstruction_stop
                    10 6950
                    10 %min_mapping_qual
                    11 20
                    11 %min_readlength
                    12 50
                    12 %max_gap_fraction (relative to alignment length)
                    13 0.05
                    13 %min_align_score_fraction (relative to read length)
                    14 0.35
                    14 %alpha_MN_local (prior parameter for multinomial tables over the nucleotides)
                    15 25
                    15 %min_overlap_factor (reads must have an overlap with the local reconstruction window of at least this factor times the window size)
                    16 0.85
                    16 %local_window_size_factor (size of local reconstruction window relative to the median of the read lengths)
                    17 0.7
                    17 % max number of clusters (in the truncated Dirichlet process)
                    18 25
                    18 % MCMC iterations
                    19 501
                    19 % include deletions (0 = no, 1 = yes)
                    20 1
                    20
                    rm: cannot remove ‘hrv_21_sub_*.fas’: No such file or directory
                    rm: cannot remove ‘hrv_21_sub_*.lab’: No such file or directory
                    rm: cannot remove ‘hrv_21_sub_*.reads’: No such file or directory
                    rm: cannot remove ‘hrv_21_sub_*.html’: No such file or directory
                    rm: cannot remove ‘hrv_21_sub_*.pgm’: No such file or directory
                    After parsing the reads in file /home/bede/hrv_21/SM_21A_S14.1pc.bwa_old.sam: average read length= 104.409 154
                    First read considered in the analysis starts at position 9. Last read ends at position 6950
                    There are 154 reads
                    Median of read lengths: 104.500
                    Local window size: 73
                    Minimum overlap of reads to local analysis windows: 62
                    terminate called after throwing an instance of 'std::bad_alloc'
                    what(): std::bad_alloc
                    Aborted (core dumped)
                    Last edited by bede; 03-01-2016, 08:11 AM.
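The window values in this output line up with the config factors: local_window_size_factor 0.7 × median read length 104.5 = 73.15, printed as 73, and min_overlap_factor 0.85 × 73 = 62.05, printed as 62. A short sketch of that arithmetic follows, assuming truncation toward the lower integer (rounding gives the same values here); `local_window` is a hypothetical name, not a PredictHaplo function.

```python
import math


def local_window(median_read_len: float, window_factor: float, overlap_factor: float):
    """Assumed derivation of the local window size and minimum read overlap."""
    # math.floor raises ValueError for NaN. C code that casts NaN to int has an
    # undefined result; on x86 it is typically INT_MIN (-2147483648), the window
    # size printed in the first post of this thread.
    window = math.floor(window_factor * median_read_len)
    min_overlap = math.floor(overlap_factor * window)
    return window, min_overlap
```

`local_window(104.5, 0.7, 0.85)` gives `(73, 62)`, matching the printed values; passing `float("nan")` as the median raises `ValueError` instead of producing a negative window.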
