Originally posted by frymor
View Post
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
-
Originally posted by frymor View PostI would sugget putting it in the same directory as the other LinkerFilter file, just with a different name.
But to understand it correctly, If I have a fastq file as in my example above, where the structure of the read is as such
HTML Code:tag <->linker[A|B]<->linker[A|B]<->tag
So basically I need to cut the reads myself between the two linker sequences and add the /1|/2 at the head of the header so that the tool can work with them?
Will that suffice to run the program in a PE mode?
Assa
PS
It will be great if we can test the single-end script for the tool
Comment
-
Originally posted by frymor View PostAfter we manage to solve the problem with the java script (thanks a lot Guoliang), I am encountering another problem with the script batman.py.
This is the last input in my log file:
Code:2013-03-25 12:04:36,212 INFO [CSA Mapper/chr1] Merging the outputs and converting it to format recognized by ChIA-PET pipeline... 2013-03-25 12:04:36,212 DEBUG [CSA Mapper/chr1] START [cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map] 2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] Execution failed: 'cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map' 2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] > Traceback (most recent call last): File "/export/chiapet/src/python/pre/batmap.py", line 43, in <module> sys.exit(main()) File "/export/chiapet/src/python/pre/batmap.py", line 25, in main id, hseq, tseq = line.split('\t') ValueError: too many values to unpack cat: write error: Broken pipe
This is how it looks in the *.bat file, where the error happens:
Code:>chr1.15945:1 AAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC GAGATATCGGATCCAACATAAATCCACTCAGGCTCAA H - chr1:61671579; 0 19 H - chr1:88019527; 2 19 H + chr1:14799656; 1 19 H + chr1:94966502; 1 19 @ [B]>chr1.15949:1 ---AAAAAAGGGGCGCGATATC AACGTTGGATCCGATATCG CG CG[/B] @ >chr1.15953:1 AAAAAAGGGGGGATTGAAATGGTTGGATCCGATATCG CCCGATATCGGATCCAACGCAGCTACTTGGGAGGCTG H - chr1:79407253; 0 19 H - chr1:51165829; 2 19 H + chr1:238895172; 2 19 H + chr1:116055838; 2 19 @
This is the part of the *link file which is extracted into this part of the *bat file, where this problem happens (I think).
Code:GTCCTTCAGAGATGTCTCAA TTTTGTTATGTTCTCTCCAA GTCCTTCAGAGATGTCTCAAAAAAGGGGCGCGATATC CGATATCGGATCCAACGTTTTGTTATGTTCTCTCCAA AAAAAAGGGGCGCGATATC AACGTTGGATCCGATATCG Score: 23 ---AAAAAAGGGGCGCGATATC AACGTTGGATCCGATATCG CG CG GTT------GGATC-CGATATC ---GTTGGATCCGATATCG ||XX| ||||||| |||||||||||||||| 12 19 8 15 3 19 0 16
Thanks for any help.
Assa
The main information is from this step. Need to confirm the linker filtering output format for the mapping. The expected information format is each row with two DNA fragments separated by TAB as follows:
AGAATGTCGTATAGTTGANA TAACCCCCAAAGTGATTGTA
Comment
-
Originally posted by guoliang View PostThe main information is from this step. Need to confirm the linker filtering output format for the mapping.
The way I see it, the pipeline tajes the *.link files, divide them in four parts (****linked****). It is followed by the decode script to create *.encoded files.
For that the tool uses the given genome files. I have created my own files for the human genome, but only for chromosome 1.
For that I added the genome files as mentioned in the config.py file (I added a file in genome/size, genome/gap, genome/gene and genome/chrband - I just copied the chr1 part from the complete hg19 genome files).
I also made a csa.index file using the instructions given in the reference guide, which created several files. These I copied to the data/batman/chr1 folder.
These creates several *.bat files.
The betmap.py script try to merge the *.bat files into a map file. It starts working, but than gives this error.
Why does it happens? I can't figure it out.
Any suggestions?
Thanks
Assa
Comment
-
I have put the linker filtering for single reads to the web linker: http://blog.sciencenet.cn/home.php?m...rd=1&id=674053
You may download the script for the data processing.
Comment
-
-
I have put the linker filtering for single reads in the web link: http://blog.sciencenet.cn/home.php?m...rd=1&id=674053
You can download the Java program and read the "readme.txt" file for how to use it.
Comment
-
Hi again,
Originally posted by guoliang View PostPlease check the output format from the linker filtering step. It may not be in the right format.
This is the command I am using:
Code:python /export/chiapet/src/python/main/csa_mapper.py --asm chr1 --lib chr1 --proc 4 --head IHH015_1r56_headseq.txt --tail IHH015_1r56_taseq.txt --run 2-4 --linker linker_a --seqlen=38 1> csamapper26032013.log 2>&1
The mapping is done for three files unfiltered, linker_a.1, linker_a.2.
Question - why is it being done also for the unfiltered file - what is this file anyway?
This is how the file chr1.linker_a.1.link
Code:TTGCATGTTAACTTTATCTG ATCAAAGTCAGGGTACAGGC TTGCATGTTAACTTTATCTGGTTGTATCCGATATCGC GCGATATCGGATCCAACATCAAAGTCAGGGTACAGGC TGGTTGTATCCGATATCGC ATGTTGGATCCGATATCGC Score: 32 TGGTTGTATCCGATATCGC ATGTTGGATCCGATATCGC CG CG --GTTGGATCCGATATCGC --GTTGGATCCGATATCGC ||||X|||||||||||| ||||||||||||||||| 2 19 0 17 2 19 0 17 CCACATTGTCAAAGGGTCAC AGAAATAGGTCATTTGAGAC CCACATTGTCAAAGGGTCACGTTGGATCCGATATCGC GCGATATCGGATACAACAGAAATAGGTCATTTGAGAC ACGTTGGATCCGATATCGC CTGTTGTATCCGATATCGC Score: 32 ACGTTGGATCCGATATCGC CTGTTGTATCCGATATCGC CG CG --GTTGGATCCGATATCGC --GTTGGATCCGATATCGC ||||||||||||||||| ||||X|||||||||||| 2 19 0 17 2 19 0 17
Code:>chr1.1:1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA H chr1 + 62390903 0 H chr1 + 62390902 0 H chr1 - 186796803 0 H chr1 - 203800090 0 T chr1 + 62390903 0 T chr1 + 62390902 0 T chr1 - 186796803 0 T chr1 - 203800090 0 >chr1.5:1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ACACACTCTTGCCTACCTCTCCTCCTATTTTGGTTTG H chr1 + 62390903 0 H chr1 + 62390902 0 H chr1 - 186796803 0 H chr1 - 203800090 0 >chr1.9:1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ACTCTGCCCACATACAACATACTACCCAACTCTAACT H chr1 + 62390903 0 H chr1 + 62390902 0 H chr1 - 186796803 0 H chr1 - 203800090 0 >chr1.13:1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ATACCGTACTTTTTATATCTTGCCTGTTTTTTTTGTT H chr1 + 62390903 0 H chr1 + 62390902 0 H chr1 - 186796803 0 H chr1 - 203800090 0
The next steps are the analysis of the chr1.linker_a.1.link file
Here it is the same procedure.
The files are being flatten using /flatten_fasta.py and separated into four parts. Than the encoded files are made and afterwards the bat files. all that run with no problems. Than the concatenation of the four bat files into bat is started. It goes well for a while, but than it stopped. here:
This is the last input in the map file:
Code:>chr1.12509:1 AAAAAAAAAATTACATCTCG ATATCTTCTAGTAAGCAAAC H chr1 + 102182243 1 H chr1 + 200626391 1 H chr1 - 215778009 1 H chr1 - 16192379 1 T chr1 - 25767370 2 T chr1 + 234802276 2 T chr1 + 50585810 2
Code:>chr1.12509:1 AAAAAAAAAATTACATCTCG ATATCTTCTAGTAAGCAAAC H + chr1:102182243; 1 19 H + chr1:200626391; 1 19 H - chr1:215778009; 1 19 H - chr1:16192379; 1 19 T - chr1:25767370; 2 19 T + chr1:234802276; 2 19 T + chr1:50585810; 2 19 @ >chr1.12513:1 -AAAAAAAAAATTGGATCCG CCGTTGGATCCGATATCG CG CG H - chr1:27980263; 1 19 H - chr1:23295789; 2 19 H + chr1:223996436; 1 19 H + chr1:59008663; 2 19 @
Am I doing something wrong?
I added my log file so that you can follow the pipeline output, which describe the same workflow. Maybe you'll find something I didn't think about or maybe I missed another step in the analysis.
Thanks
AssaAttached Files
Comment
-
There are four different types of input in the *.bat files.
Code:>chr1.4:1 ||||||||||||||||||| | ||||||||||||||| >chr1.8:1 ||||||||||||||||||| |||||||||||||||| || >chr1.11456:19876 0 13 >chr1.11460:1019476 0 17 >chr1.11612:1 AAAAAAAAAAAAAAAAAAAAAAAGGAACCGATATCCC GCGATATCGGATCCAACTAGGTGAAGAAAGTATGAAT >chr1.11616:1 AAAAAAAAAAAAAAAAAAAA AAATTGATTAAGAAAGGCTT >chr1.38440:1 AAAAGTTGGATCCGATATC TGGTGGGTCCGATATCGCG >chr1.38444:1 AAAAGTTGGATCCGATATC TGGTTGGATCCGATATCGC CG CG >chr1.38448:1 AAAAGTTGGATCCGATATC TTGGTTGGGTCCGGTTTCG >chr1.38452:5 AAAAGTTGGATCCGATATC TTGTTGGATCCGATATCGC CG CG
I would appreciate your help.
Assa
Any comments on that problem?
Does this happens before to other users?Last edited by frymor; 04-01-2013, 11:29 PM.
Comment
-
Any comments on that problem?
As there are no comments on this problem, I would like you ask you for a favor, which will save us and also future users both time and efforts.
I was thinking about whether or not it is possible to get a working VM of the tool. This way We do not need to worry about installations and dependencies for the tool, as everything will be delivered with the VM.
I would very like to hear your opinions about it.
Thanks
Assa
Comment
-
we are getting there
Ok,
I am one step closer to victory.
I succeeded, running the mapper script all the way to the end.
Now all I have to do is to manage to run a chiapet.py script with 8 (!!!) different steps.
It didn't take much time and here is the first error massage.
Code:013-04-12 10:40:34,126 INFO [ChIA-PET/chr1] *** START chr1 *** 2013-04-12 10:40:34,127 INFO [ChIA-PET/chr1] Arguments: Namespace(asm='chr1', cutoff=-1, database='chiapetdb', extlen=-1, force=False, gff_minsup=2, group_id='chr1', java_maxheap=None, lib='chr1', map_file=['prep/chr1/chr1.linker_a.1.link.chr1.map', 'prep/chr1/chr1.linker_a.2.link.chr1.map', 'prep/chr1/chr1.linker_a.c.link.chr1.map'], run='1-8', target='POLII') 2013-04-12 10:40:34,127 INFO [ChIA-PET/chr1] Performing section (1) 2013-04-12 10:43:27,981 INFO [ChIA-PET/chr1] Calculated cut-off value is 3000 2013-04-12 10:43:27,981 INFO [ChIA-PET/chr1] Automatically determining extension length from data in chr1_intensity_dist.txt... 2013-04-12 10:43:27,983 INFO [ChIA-PET/chr1] Calculated extension length is 420 2013-04-12 10:43:27,983 INFO [ChIA-PET/chr1] Updating .info file... Traceback (most recent call last): File "src/python/main/chiapet.py", line 242, in <module> retcode, msg = main() File "src/python/main/chiapet.py", line 155, in main run_v2(mapfiles, args.lib, libworkdir, args.asm, args.cutoff, args.extlen, group_id, shell) File "src/python/main/v2_runner.py", line 102, in run_v2 sh('{p} {script} {infofile} {distfile}'.format(p=sys.executable, **locals()), label='Compatibility hack') TypeError: 'tuple' object is not callable
Assa
Comment
-
the never ending story
so, here is the next step of the analysis
Now I was somehow able to recover from this error (no thanks to any help from the authors), just by changing the format of lines of the script.
I always get the feeling, that no one has ever tested these scripts before uploading them. It is just not possible, as so many small errors are in these scripts, some of them are easy to detect, others are a bit more difficult.
such as the next one here:
This is the error massage i get, when running the chiapet.py script with only step 2
the command:
Code:python $CHIAPETPATH/src/python/main/chiapet.py --asm chr1 --target POLII --lib chr1 --database chiapetdb --group-id chr1 --run 2 $CHIAPETPATH/prep/chr1/*.map > chiapet12042013-step2_1.log 2>&1
Code:2013-04-23 09:50:31,047 INFO [ChIA-PET/chr1] *** START chr1 *** 2013-04-23 09:50:31,047 INFO [ChIA-PET/chr1] Arguments: Namespace(asm='chr1', cutoff=-1, database='chiapetdb', extlen=-1, force=False, gff_minsup=2, group_id='chr1', java_maxheap=None, lib='chr1', map_file=['chiapet/prep/chr1/chr1.linker_a.1.link.chr1.map', 'chiapet/prep/chr1/chr1.linker_a.2.link.chr1.map', 'chiapet/prep/chr1/chr1.linker_a.c.link.chr1.map'], run='2', target='POLII') 2013-04-23 09:50:31,047 INFO [ChIA-PET/chr1] Sieving out multiply-mapped PETs... 2013-04-23 09:50:31,047 INFO [ChIA-PET/chr1] Reading extension length and cut-off values... 2013-04-23 09:50:31,051 INFO [ChIA-PET/chr1] Preparing to check for intersection with satellite regions... Traceback (most recent call last): File "chiapet/src/python/main/chiapet.py", line 242, in <module> retcode, msg = main() File "chiapet/src/python/main/chiapet.py", line 173, in main run_v3(uniqfile, args.lib, libworkdir, args.asm, infofile, shell) File "chiapet/src/python/main/v3_runner.py", line 89, in run_v3 .format(j=Interpreter.Java, repeat=Assembly.SatelliteRepeatInfo[asm], KeyError: 'chr1'
But more than that i just can't find out. I don't know if it has anything to do with the fact, that chr1 is not in the sql DB. But i can't imagine, each time i would like to work with a different organism, I will need to reconstruct the mysql DB, and if so, There are instructions as to how do i add new organisms to the DB.
It might also be enough to add some single files to the different folders, where there is information on the organism, but i think I already did that.
So, again, I would like to ask for any kind of help from anyone.
Thanks,
Assa
Comment
-
another example of me not understanding how this scripts are working
Hi
finally got over the tuple error massage.
I keep writing in this forum for other people who might think working with this workflow is worth the troubles.
The problem was in script v2_runner.py, line 59-60.
Code:sh = sh('{j} {script} {input}'.format(j=Interpreter.Java, script=script, input=tmp.name)) #original cutoff = int(sh[0].strip())
All I needed to do was change the assignment of sh to this:
Code:sh1 = sh('{j} {script} {input}'.format(j=Interpreter.Java, script=script, input=tmp.name)) #original cutoff = int(sh1[0].strip())
Now step two of the chia-pet script is working fine.
Code:python $CHIAPETPATH/src/python/main/chiapet.py --asm chr1 --target POLII --lib chr1 --database chiapetdb --group-id chr1 --run 2 $CHIAPETPATH/prep/chr1/*.map
Comment
-
KeyError: 'chr1
I also managed to overcome this error.
this was made due to the fact, that I didn't have the right SatelliteRepeatInfo-file in the correct folder folder.
In the file structure under/data/genome/repeat/
So I just added the file with the name
Code:data/genome/repeat/satellite_repeat_chr1.txt
Comment
-
next error in step 3
This is the command I am using for step 3:
Code:python $CHIAPETPATH/src/python/main/chiapet.py --asm chr1 --target POLII --lib chr1 --database chiapetdb --group-id chr1 --run 3 $CHIAPETPATH/prep/chr1/*.map
2013-06-07 10:25:24,543 INFO [ChIA-PET/chr1] *** START chr1 ***
2013-06-07 10:25:24,543 INFO [ChIA-PET/chr1] Arguments: Namespace(asm='chr1', cutoff=-1, database='chiapetdb', extlen=-1, force=False, gff_minsup=2, group_id='chr1', java_maxheap=None, lib='chr1', map_file=['/chiapet/prep/chr1/chr1.linker_a.1.link.chr1.map', '/chiapet/prep/chr1/chr1.linker_a.2.link.chr1.map', '/chiapet/prep/chr1/chr1.linker_a.c.link.chr1.map'], run='3', target='POLII')
2013-06-07 10:25:24,544 INFO [ChIA-PET/chr1] Generating SQL files...
2013-06-07 10:25:24,544 INFO [ChIA-PET/chr1] Reading extension length and cut-off values...
2013-06-07 10:25:26,172 ERROR [ChIA-PET/chr1] Execution failed: 'mysql -u root -p*** -D chiapetdb'
2013-06-07 10:25:26,173 ERROR [ChIA-PET/chr1] ??? ERROR 29 (HY000) at line 3: File '/chiapet/work/chr1/chr1.peak' not found (Errcode: 13)
Traceback (most recent call last):
File "/chiapet/src/python/main/chiapet.py", line 242, in <module>
retcode, msg = main()
File "/chiapet/src/python/main/chiapet.py", line 182, in main
args.asm, args.target, args.force, shell)
File "/chiapet/src/python/main/sql_runner.py", line 57, in upload_sql
call_db(db, user, pswd, sql, shell, 'Uploading to table {0}'.format(table))
File "/chiapet/src/python/main/sql_runner.py", line 171, in call_db
return shell(cmd, sql, label=label)[0]
File "/chiapet/src/python/common/exec_util.py", line 61, in __call__
return super(TimedShell, self).__call__(cmd, *args, **kwargs)
File "/chiapet/src/python/common/exec_util.py", line 46, in __call__
raise Exception("Execution failed: '{0}'".format(cmd))
Exception: Execution failed: 'mysql -u root -p*** -D chiapetdb'
-rwxrwxrwx 1 ****** ******* 7722 Jun 6 16:53 chr1.peak*
thanks,
Assa
Comment
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
59 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
57 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
51 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
56 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Comment