SEQanswers

06-24-2016, 01:47 AM   #1
lingling huang
Member
 
Location: changsha

Join Date: Mar 2016
Posts: 46
No "combined/" subdirectory after isoform-level clustering with pbtranscript-tofu

"tofu_wrap.py" divides the input into size bins, runs clustering on each bin, and combines the results afterwards. My problem is that the tofu_wrap output contains no 'combined/' subdirectory, so I can't get the final output files I want to work with.
The command I used is below:
Code:
tofu_wrap.py --nfl_fa isoseq_nfl.fasta --ccs_fofn reads_of_insert.fofn --bas_fofn input.fofn -d clusterOut --quiver --gmap_db /zs32/data-analysis/liucy_group/llhuang/Reflib/gmapdb --gmap_name hg19 isoseq_flnc.fasta final.consensus.fa
My server is a high-powered single-node computer, so I couldn't install SGE and therefore did not use the '--use_sge' option. I don't know whether this is the cause of the problem. By the way, I have already tried adding '--bin_manual' back to my command, following bowhan's advice (thank you), but the output still has no 'combined/' subdirectory. The directory clusterOut contains:
0to1kb/
1to2kb/
2to3kb/
3to4kb/
4to5kb/
fasta_fofn_files/
Also, what should I do next once "tofu_wrap.py" runs successfully? I want to find the differences in transcripts between human and mouse from third-generation sequencing data.
Any advice will be appreciated. Thank you in advance!
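
For anyone checking the same symptom, here is a minimal sketch that lists the subdirectories of the output directory and reports whether 'combined' is among them. The function name `summarize_output_dir` is made up for illustration, and the only thing assumed about the tofu_wrap layout is what is stated above (size-bin directories plus an expected 'combined/' directory).

```python
import os


def summarize_output_dir(out_dir):
    """Return (sorted subdirectory names, True if 'combined' is one of them)."""
    subdirs = sorted(
        name
        for name in os.listdir(out_dir)
        if os.path.isdir(os.path.join(out_dir, name))
    )
    return subdirs, "combined" in subdirs


# Example (hypothetical usage; "clusterOut" is the -d argument from the command above):
#   subdirs, has_combined = summarize_output_dir("clusterOut")
#   print(subdirs, "combined/ present" if has_combined else "combined/ missing")
```

A missing 'combined/' directory only tells you that the final step did not finish; the cause still has to come from the error output.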

06-27-2016, 11:18 AM   #2
bowhan
Member
 
Location: San Francisco

Join Date: Sep 2015
Posts: 27

Can you please paste the log and the contents of the output directory (for example, the output of the `tree` command)?

06-29-2016, 05:49 PM   #3
lingling huang

I have tried many times, and it always fails with the following error message: "Segmentation fault (core dumped)". There is no log file.
Attachments:
BUG: bug.png
"clusterOut": Cluster out.png
All files: all files.png

06-29-2016, 06:41 PM   #4
bowhan

The job failed because the system call to the `blasr` command failed.
It actually gave an error message complaining that "m151230.../29032/536_60_CCS" is not unique. This is odd if you did not modify anything in the Iso-Seq runs.

Can you please check how many times this header appears in your input `isoseq_flnc.fasta` file? Perhaps with
Code:
grep -c '/29032/536_60_CCS' isoseq_flnc.fasta
which prints the number of lines containing that read ID.

That said, I am not sure this has anything to do with the segmentation fault, which is usually caused by a memory issue. But let's see whether fixing the duplicate FASTA entries makes your issue go away.
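
If the duplicates turn out to be exact repeats, one way to drop them is sketched below. This is only an illustration under assumptions: the function name `dedupe_fasta` is made up, the read ID is taken to be the first whitespace-delimited token of each '>' line, and keeping the first copy of each record is assumed to be acceptable. It treats the symptom and says nothing about why the duplicates arose.

```python
def dedupe_fasta(lines):
    """Yield FASTA lines, keeping only the first record seen for each read ID.

    The read ID is the first whitespace-delimited token after '>'.
    Lines that appear before the first header are dropped.
    """
    seen = set()
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            keep = read_id not in seen  # skip records whose ID was already seen
            seen.add(read_id)
        if keep:
            yield line


# Example (hypothetical file names):
#   with open("isoseq_flnc.fasta") as src, open("isoseq_flnc.dedup.fasta", "w") as dst:
#       dst.writelines(dedupe_fasta(src))
```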

06-29-2016, 07:07 PM   #5
lingling huang

It appears twice.

06-29-2016, 07:14 PM   #6
bowhan

Can you please check how many headers are duplicated? Perhaps with
Code:
awk '/^>/{++a[$1]} END{for (b in a) if (a[b] > 1) printf "%s\t%d\n", b, a[b]}' isoseq_flnc.fasta
Thanks

06-29-2016, 07:22 PM   #7
lingling huang

I don't understand the results.
Attachment: result.png

06-29-2016, 07:28 PM   #8
bowhan

The awk command reads the input file line by line and counts how many times each header appears. At the end, it prints every header that appears more than once, together with its count.

Looks like all of your sequences are duplicated.
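
For readers more comfortable with Python than awk, the same counting logic can be sketched roughly as follows. The function name `duplicated_headers` is made up for illustration; like `$1` in the awk one-liner, it uses the first whitespace-delimited field of each header line (including the leading '>') as the key.

```python
from collections import Counter


def duplicated_headers(lines):
    """Return {header: count} for FASTA headers that appear more than once."""
    counts = Counter(line.split()[0] for line in lines if line.startswith(">"))
    return {header: n for header, n in counts.items() if n > 1}


# Example (hypothetical usage):
#   with open("isoseq_flnc.fasta") as handle:
#       for header, n in sorted(duplicated_headers(handle).items()):
#           print("%s\t%d" % (header, n))
```

An empty result means every header is unique; if nearly every header shows a count of 2, the whole input was probably processed twice somewhere upstream.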

Can you please check your `input.fofn` file (the one you fed into `ConsensusTools.sh CircularConsensus`) to see whether each line (which is a path to a `bax.h5` file) is unique, or whether each file appears twice?
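
A quick way to test a fofn for repeated lines is sketched below. The function name `duplicate_entries` is made up for illustration, and the sketch only catches paths that are spelled identically; the same file reached through two different paths would not be flagged.

```python
from collections import Counter


def duplicate_entries(lines):
    """Return the paths that are listed more than once, ignoring blank lines."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(path for path, n in counts.items() if n > 1)


# Example (hypothetical usage):
#   with open("input.fofn") as handle:
#       print(duplicate_entries(handle) or "every line is unique")
```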

06-29-2016, 07:39 PM   #9
lingling huang

Each line in my input.fofn file seems to be unique.
Attachment: input.png

06-29-2016, 07:46 PM   #10
bowhan

You seem to have one `bax.h5` file missing and one `bas.h5` file listed. Can you please fix that and try again?
You should already see the duplication right after the `CircularConsensus` run (no need to run pbtranscript and tofu).
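
This kind of mix-up can be spotted with a small check like the one sketched here. It rests on an assumption that should be verified against your own data: that each movie is split into three parts named `<movie>.1.bax.h5`, `<movie>.2.bax.h5` and `<movie>.3.bax.h5`, alongside one `<movie>.bas.h5`. The function name `check_fofn_entries` is made up for illustration.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme: <movie>.<part>.bax.h5 with part in 1..3.
BAX_RE = re.compile(r"^(?P<movie>.+)\.(?P<part>[123])\.bax\.h5$")


def check_fofn_entries(lines):
    """Return (bas_entries, incomplete) for the paths listed in a fofn.

    bas_entries: paths whose file name ends in .bas.h5
    incomplete:  {movie: sorted list of missing bax part numbers}
    """
    bas_entries = []
    parts = defaultdict(set)
    for line in lines:
        path = line.strip()
        if not path:
            continue
        name = os.path.basename(path)
        match = BAX_RE.match(name)
        if match:
            parts[match.group("movie")].add(int(match.group("part")))
        elif name.endswith(".bas.h5"):
            bas_entries.append(path)
    incomplete = {
        movie: sorted({1, 2, 3} - found)
        for movie, found in parts.items()
        if found != {1, 2, 3}
    }
    return bas_entries, incomplete


# Example (hypothetical usage):
#   with open("input.fofn") as handle:
#       bas, incomplete = check_fofn_entries(handle)
#   print("bas.h5 entries:", bas)
#   print("movies with missing bax parts:", incomplete)
```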

06-29-2016, 08:03 PM   #11
lingling huang

I'll try it. Thank you so much. You are a great help to me!

07-04-2016, 05:27 PM   #12
lingling huang

Hi bowhan, sorry to disturb you again. I did what you suggested: I fixed my input.fofn file, and each line is unique. However, tofu_wrap.py now fails with another error:
error.png

Log files (c7to372.sh.elog and c7to372.sh.olog):
c7to372.sh.elog.png c7to372.sh.olog.png

07-05-2016, 08:23 AM   #13
bowhan

The error is clearly different now: it says that you don't have pysam.
Did you follow the instructions here to install tofu? They ask you to install pysam with `pip install pysam`. If not, please follow the instructions to install it, and make sure you do so under smrtshell and the virtualenv.
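
To confirm that the interpreter you are actually using can see pysam, a check along these lines may help. It is a generic sketch, not part of tofu; the helper names `module_available` and `report` are made up for illustration. Run it from the same smrtshell and virtualenv session in which you launch tofu_wrap.py, since a different shell may pick up a different Python.

```python
import sys


def module_available(name):
    """Return True if `name` can be imported by the running interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


def report(name):
    """Describe whether `name` is importable and which Python was asked."""
    status = "found" if module_available(name) else "MISSING"
    return "%s: %s (interpreter: %s)" % (name, status, sys.executable)


# Example: print(report("pysam"))
```

If the module is reported as missing, the interpreter path in the output shows which environment `pip install pysam` needs to target.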

07-06-2016, 06:43 PM   #14
lingling huang

Hi bowhan, how is it going? I'm going out of my mind! I've run into a new problem, shown below.

read.png

08-08-2016, 09:29 AM   #15
Magdoll
Member
 
Location: Bay Area

Join Date: Aug 2011
Posts: 30

Not sure if this problem was solved, but I wonder whether some of the .bax.h5 files are actually missing (not just missing from input.fofn, but not present on disk at all). This has occasionally happened before during file transfers.

If you are still experiencing the issue, try checking that all the files listed in input.fofn exist.

You can do so by downloading this simple script:
https://github.com/Magdoll/PacBio-ge...file_exists.py

Then do
`python file_exists.py input.fofn`

If all the listed files exist, it will report that the check passed. Otherwise it will tell you which ones are missing. Remove the missing .bax.h5 files from input.fofn.
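
The idea behind such a check is simple enough to sketch here. To be clear, this is not the linked script, just an independent illustration of the same idea, and the function name `missing_files` is made up.

```python
import os


def missing_files(lines):
    """Return the paths listed in a fofn that do not exist as files on disk."""
    paths = (line.strip() for line in lines)
    return [path for path in paths if path and not os.path.isfile(path)]


# Example (hypothetical usage):
#   with open("input.fofn") as handle:
#       missing = missing_files(handle)
#   print("check passed" if not missing else "missing:\n" + "\n".join(missing))
```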

08-08-2016, 07:36 PM   #16
lingling huang

Thank you for your reply. My problem is solved, and I have produced the final result.