SEQanswers

Old 10-26-2014, 03:35 PM   #1
NeoneX
Junior Member
 
Location: Australia

Join Date: Oct 2014
Posts: 2
Predict Haplo 1.0 Issues

Hi everyone,

I have a question regarding the haplotype reconstruction algorithm PredictHaplo-1.0 (http://bmda.cs.unibas.ch/HivHaploTyper/).


I've been using this tool on Roche 454 RLX data and it works well. However, since I switched to Illumina TruSeq data, I keep getting the segmentation fault shown below.

Quote:
After parsing the reads in file /home/DataFiles/PredictHaplo_Files/087.sam: average read length= -nan
First read considered in the analysis starts at position 100000. Last read ends at position 0
There are 0 reads
Average read length: -nan
Local window size: -2147483648
min_overlap : -2083059138
Reconstruction starts at position 100000 and stops at position 0
Segmentation fault (core dumped)
I've had a brief look at the .sam file, but I don't see anything out of the ordinary. What I don't understand is why it fails to compute an average read length when there are more than 2 million reads in the .sam file. I tried both sorting and not sorting the .sam file in case it was confused about read positions, but I still get the same error message.

I've tried another TruSeq data set, and the same error message appears. Might this be a TruSeq thing? I managed to borrow some Nextera XT data to see whether the algorithm runs on Illumina data at all, and that set worked.

I'm very confused; any help would be greatly appreciated.
Old 02-11-2015, 11:27 AM   #2
viral2143
Junior Member
 
Location: boston MA

Join Date: Feb 2015
Posts: 3

Did you ever find a solution to this? I am having the same problem for both 454 and Illumina MiSeq data. Thank you!
Old 02-11-2015, 11:32 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

Have you written to the author(s) directly? You probably have a better chance of getting a resolution that way.
Old 02-11-2015, 11:59 AM   #4
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

What is the command you are using?
Old 02-11-2015, 12:04 PM   #5
viral2143
Junior Member
 
Location: boston MA

Join Date: Feb 2015
Posts: 3

Yes, I have contacted the authors and am waiting to hear back.

The command runs the PredictHaplo executable on the config file:
PredictHaplo-Paired config.txt

To give context, I align my reads using the Mosaik aligner. For my 454 data I know the problem can be resolved by using the bwa aligner instead. However, I would like to know how to run PredictHaplo on the SAM files produced by Mosaik.

Thanks for your help!
Old 02-11-2015, 02:19 PM   #6
NeoneX
Junior Member
 
Location: Australia

Join Date: Oct 2014
Posts: 2

Hi,

I've emailed the authors directly before, but I have never gotten a reply from them. I emailed them first before I posted the question in this forum.

In the end, no. Unfortunately I am still clueless as to why it doesn't work on my data set. So instead of PredictHaplo, I switched to QuasiRecomb (https://github.com/armintoepfer/QuasiRecomb/).

I wasn't able to understand why the figures reported in the output were like that. Recalling from before:
Quote:
After parsing the reads in file /home/DataFiles/PredictHaplo_Files/087.sam: average read length= -nan
First read considered in the analysis starts at position 100000. Last read ends at position 0
There are 0 reads
Apologies for not solving the problem, but I decided I had to move on to something else, otherwise I could be stuck for a long time hahahaha.

If anyone does understand what's happening, I am still very interested in finding out. >_<
Old 02-18-2015, 10:54 AM   #7
viral2143
Junior Member
 
Location: boston MA

Join Date: Feb 2015
Posts: 3

Thank you. I am now using QuasiRecomb as well and am having an issue detecting paired reads.

I run:
Quote:
java -jar QuasiRecomb.jar -i alignment.sorted.bam
and get the following:
Quote:
00:01:42 Parsing done
00:01:42 Start pairing
00:01:56 End pairing
00:01:56 Begin sorting
00:01:57 Finished sorting
00:01:57 Modifying reads 100%
00:01:59 Computing entropy 100%
00:02:00 Allel frequencies 100%
00:02:00 Alignment entropy 0.082
00:02:00 Unique reads 330664
00:02:00 Paired reads 0
00:02:00 Insert size 146 (220)
00:02:00 Merged reads 305158
When I check for properly aligned mate pairs in my alignment, I do find properly paired mates:
Quote:
samtools flagstat alignment.sorted.bam
2642674 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1743911 + 0 mapped (65.99%:-nan%)
2642674 + 0 paired in sequencing
1321337 + 0 read1
1321337 + 0 read2
143036 + 0 properly paired (5.41%:-nan%)
1679356 + 0 with itself and mate mapped
64555 + 0 singletons (2.44%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Are you able to use QuasiRecomb to detect paired mates?
Thanks again.
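
For anyone wanting to cross-check the pairing numbers independently of samtools, here is a minimal sketch that tallies the SAM FLAG bits directly. The FLAG values (0x1 paired, 0x2 properly paired, 0x4 unmapped, 0x8 mate unmapped) come from the SAM specification; the function name `pairing_summary` is just an illustrative choice, and the counts should be roughly comparable to flagstat rather than guaranteed identical, since flagstat also accounts for QC-failed and non-primary records.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1          # read is paired in sequencing
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def pairing_summary(sam_lines):
    """Tally pairing-related FLAG bits over an iterable of SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        counts["total"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
            if flag & PROPER_PAIR:
                counts["properly_paired"] += 1
            if not (flag & UNMAPPED) and not (flag & MATE_UNMAPPED):
                counts["both_mapped"] += 1
    return counts
```

To use it on a BAM file, convert to SAM text first (for example with `samtools view -h alignment.sorted.bam > alignment.sam`) and pass the open file object to `pairing_summary`. If the FLAG bits look sane here, the "Paired reads 0" figure is more likely down to how the tool pairs reads than to the alignment itself, though that is only a guess.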
Old 03-20-2015, 03:38 AM   #8
Luiky
Junior Member
 
Location: Spain

Join Date: Mar 2015
Posts: 1

Hello everyone,

I solved that issue by changing the "%min_readlength" value in the configuration file. It is 220 by default, but my Illumina HiSeq reads are only 100 nt long, so that was the solution.

After changing this parameter, PredictHaplo worked perfectly.
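
A quick way to see whether this applies to your data is to look at the read lengths in the SAM file before choosing a threshold. The sketch below is only an approximation: it measures the length of the SEQ column and treats a read as passing when its length is at least the threshold, and the thread does not say whether PredictHaplo measures length or applies the cutoff in exactly that way. The function names are hypothetical.

```python
import statistics


def read_lengths(sam_lines):
    """Return the SEQ length of every alignment record in SAM text lines."""
    lengths = []
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        seq = fields[9]  # SEQ is the tenth SAM column
        if seq != "*":   # "*" means the sequence is not stored
            lengths.append(len(seq))
    return lengths


def fraction_passing(lengths, min_readlength):
    """Fraction of reads at least min_readlength long (assumed >= comparison)."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= min_readlength) / len(lengths)


def length_report(lengths, min_readlength):
    """Small summary to compare against the %min_readlength setting."""
    return {
        "reads": len(lengths),
        "median": statistics.median(lengths) if lengths else None,
        "fraction_passing": fraction_passing(lengths, min_readlength),
    }
```

With 100 nt reads and a threshold of 220, `fraction_passing` comes out as 0.0, which would be consistent with the "There are 0 reads" line in the original error output.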
Old 02-11-2016, 04:22 AM   #9
rjorton
Junior Member
 
Location: UK

Join Date: Jan 2012
Posts: 2

We had the same issue. It looks like it was down to the SAM file format: our reads were originally aligned with bowtie2, which gave the PredictHaplo error, but using bwa instead resolved the error.
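
The thread does not establish which difference between the two aligners' SAM output matters, so the following is only a diagnostic sketch for comparing them: it tallies which CIGAR operation types and which optional tags appear in a SAM file. Running it on the bowtie2 and bwa outputs for the same reads and comparing the two profiles may point at a candidate difference. The name `sam_profile` is hypothetical.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation letter from the SAM spec.
CIGAR_OP = re.compile(r"\d+([MIDNSHP=X])")


def sam_profile(sam_lines):
    """Count CIGAR operation types and optional tag names in SAM text lines.

    Returns (ops, tags): ops counts CIGAR elements per operation letter,
    tags counts how many records carry each optional tag.
    """
    ops = Counter()
    tags = Counter()
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ops.update(CIGAR_OP.findall(fields[5]))  # CIGAR is the sixth column
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        tags.update(field.split(":", 1)[0] for field in fields[11:])
    return ops, tags
```

Differences in soft clipping (`S`), use of `=`/`X` instead of `M`, or missing tags are the kind of thing this would surface; whether any of them is what trips up PredictHaplo is an open question here.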
Old 03-01-2016, 06:48 AM   #10
bede
Junior Member
 
Location: Manchester

Join Date: Mar 2016
Posts: 1

Hi everyone,
I found this thread after testing Bowtie2 and PredictHaplo.

Using BWA I am having similar issues: only a tiny proportion of reads are being recognised by PredictHaplo. In this test case of 2x150 NextSeq viral sequences, only 154 of the 70k mapped reads in this subsampled SAM are detected according to the output (see below). In Tablet everything looks fine with the SAM and the pairings are recognised.

I have even tried an older build of BWA to see if an update might have caused the issue. I don't have any strange characters or line endings in my reference sequence, and am at a loss as to what could be causing this issue.

Does anyone have any ideas? Has anyone had responses from the authors?

Quote:
bede@ubuntu:~/ph/PredictHaplo-Paired-0.4$ ./PredictHaplo-Paired config_test
config_test
0 hrv_21_sub_
0 % filename of reference sequence (FASTA)
1 /home/bede/hrv_21/hrv1b.cns.fa
1 % do_visualize (1 = true, 0 = false)
2 1
2 % filname of the aligned reads (sam format)
3 /home/bede/hrv_21/SM_21A_S14.1pc.bwa_old.sam
3 % have_true_haplotypes (1 = true, 0 = false)
4 1
4 % filname of the true haplotypes (MSA in FASTA format) (fill in any dummy filename if there is no "true" haplotypes)
5 truehaps.fasta
5 % do_local_analysis (1 = true, 0 = false) (must be 1 in the first run)
6 1
6 % max_reads_in_window;
7 10000
7 % entropy_threshold
8 4e-2
8 %reconstruction_start
9 9
9 %reconstruction_stop
10 6950
10 %min_mapping_qual
11 20
11 %min_readlength
12 50
12 %max_gap_fraction (relative to alignment length)
13 0.05
13 %min_align_score_fraction (relative to read length)
14 0.35
14 %alpha_MN_local (prior parameter for multinomial tables over the nucleotides)
15 25
15 %min_overlap_factor (reads must have an overlap with the local reconstruction window of at least this factor times the window size)
16 0.85
16 %local_window_size_factor (size of local reconstruction window relative to the median of the read lengths)
17 0.7
17 % max number of clusters (in the truncated Dirichlet process)
18 25
18 % MCMC iterations
19 501
19 % include deletions (0 = no, 1 = yes)
20 1
20
rm: cannot remove ‘hrv_21_sub_*.fas’: No such file or directory
rm: cannot remove ‘hrv_21_sub_*.lab’: No such file or directory
rm: cannot remove ‘hrv_21_sub_*.reads’: No such file or directory
rm: cannot remove ‘hrv_21_sub_*.html’: No such file or directory
rm: cannot remove ‘hrv_21_sub_*.pgm’: No such file or directory
After parsing the reads in file /home/bede/hrv_21/SM_21A_S14.1pc.bwa_old.sam: average read length= 104.409 154
First read considered in the analysis starts at position 9. Last read ends at position 6950
There are 154 reads
Median of read lengths: 104.500
Local window size: 73
Minimum overlap of reads to local analysis windows: 62
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

Last edited by bede; 03-01-2016 at 07:11 AM.
All times are GMT -8.