
**guoliang** · 03-25-2013, 06:53 AM

Originally posted by frymor View Post

Another quick question - how did you generate the head and tail fastq files? Did you have them from a paired-end experiment?

Just to know for next time - does it make more sense to run paired-end sequencing when working with ChIA-PET data?

Assa

The head and tail fastq files are from paired-end experiments. The original protocol was designed for paired-end sequencing because read lengths were limited. Sequencers can now generate longer reads, so single reads are fine for the study; they just need a new script for single-read linker filtering.

**guoliang** · 03-25-2013, 06:57 AM

Originally posted by frymor View Post

I would suggest putting it in the same directory as the other LinkerFilter file, just with a different name.

But to understand it correctly: if I have a fastq file as in my example above, where the structure of the read is as follows

HTML Code:

tag <->linker[A|B]<->linker[A|B]<->tag

I can't use it with this tool?

So basically I need to cut the reads myself between the two linker sequences and add the /1|/2 at the head of the header so that the tool can work with them?

Will that suffice to run the program in a PE mode?

Assa

PS
It will be great if we can test the single-end script for the tool

I am considering putting the script on a web space. I haven't integrated it into the pipeline or tested it there yet. Currently the script is tested stand-alone and distributed individually. You may check your email for the script.

**guoliang** · 03-25-2013, 07:03 AM

Originally posted by frymor View Post

After we managed to solve the problem with the Java script (thanks a lot, Guoliang), I am encountering another problem, this time with the script batmap.py.

This is the last input in my log file:

Code:

2013-03-25 12:04:36,212  INFO [CSA Mapper/chr1] Merging the outputs and converting it to format recognized by ChIA-PET pipeline...
2013-03-25 12:04:36,212 DEBUG [CSA Mapper/chr1] START [cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map]
2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] Execution failed: 'cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map'
2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] > Traceback (most recent call last):
  File "/export/chiapet/src/python/pre/batmap.py", line 43, in <module>
    sys.exit(main())
  File "/export/chiapet/src/python/pre/batmap.py", line 25, in main
    id, hseq, tseq = line.split('\t')
ValueError: too many values to unpack
cat: write error: Broken pipe

This error occurs due to a formatting problem in the *.bat file. The script expects it to have a certain format, which strangely becomes a different one in specific rows.
This is how it looks in the *.bat file, where the error happens:

Code:

>chr1.15945:1	AAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC	GAGATATCGGATCCAACATAAATCCACTCAGGCTCAA
H	-	 chr1:61671579;	0	19
H	-	 chr1:88019527;	2	19
H	+	 chr1:14799656;	1	19
H	+	 chr1:94966502;	1	19
@
>chr1.15949:1	---AAAAAAGGGGCGCGATATC	AACGTTGGATCCGATATCG	CG	CG
@
>chr1.15953:1	AAAAAAGGGGGGATTGAAATGGTTGGATCCGATATCG	CCCGATATCGGATCCAACGCAGCTACTTGGGAGGCTG
H	-	 chr1:79407253;	0	19
H	-	 chr1:51165829;	2	19
H	+	 chr1:238895172;	2	19
H	+	 chr1:116055838;	2	19
@

The header for each part has three elements separated by \t. In the middle header (chr1.15949:1) there are more elements. I can't understand why they are there.

This is the part of the *link file which is extracted into this part of the *bat file, where this problem happens (I think).

Code:

GTCCTTCAGAGATGTCTCAA    TTTTGTTATGTTCTCTCCAA
GTCCTTCAGAGATGTCTCAAAAAAGGGGCGCGATATC   CGATATCGGATCCAACGTTTTGTTATGTTCTCTCCAA
AAAAAAGGGGCGCGATATC     AACGTTGGATCCGATATCG
Score: 23
---AAAAAAGGGGCGCGATATC  AACGTTGGATCCGATATCG     CG      CG
GTT------GGATC-CGATATC  ---GTTGGATCCGATATCG
         ||XX| |||||||     ||||||||||||||||
12      19
8       15
3       19
0       16

Can anyone explain this kind of problem?

Thanks for any help.

Assa

2013-03-25 12:04:36,212 INFO [CSA Mapper/chr1] Merging the outputs and converting it to format recognized by ChIA-PET pipeline...

The main information is from this step. You need to confirm that the linker filtering output is in the format expected by the mapping step. The expected format is one pair per row, with the two DNA fragments separated by a TAB, as follows:
AGAATGTCGTATAGTTGANA TAACCCCCAAAGTGATTGTA
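
A quick way to check a *.link file against the format described above is to flag every row that is not exactly two DNA fragments separated by one TAB. The following is only a minimal sketch: the function name `nonconforming_rows` is made up for illustration, and the allowed alphabet (A, C, G, T, N) is an assumption based on the example row, not something taken from the pipeline documentation.

```python
import re

# Assumption: a valid row is two runs of A/C/G/T/N separated by exactly one TAB.
ROW_PATTERN = re.compile(r"^[ACGTN]+\t[ACGTN]+$")


def nonconforming_rows(lines):
    """Return (line_number, text) for every row that does not match ROW_PATTERN.

    Hypothetical helper for illustration; not part of the ChIA-PET pipeline.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not ROW_PATTERN.match(text):
            bad.append((number, text))
    return bad


sample = [
    "AGAATGTCGTATAGTTGANA\tTAACCCCCAAAGTGATTGTA\n",          # expected format
    "Score: 23\n",                                            # alignment report line
    "---AAAAAAGGGGCGCGATATC\tAACGTTGGATCCGATATCG\tCG\tCG\n",  # extra fields and gaps
]

report = nonconforming_rows(sample)
for number, text in report:
    print("row {0} does not match: {1!r}".format(number, text))
```

For a real file, pass an open file handle instead of the `sample` list, for example `nonconforming_rows(open("chr1.linker_a.1.link"))`.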

**frymor** · 03-25-2013, 08:23 AM

Originally posted by guoliang View Post

The main information is from this step. Need to confirm the linker filtering output format for the mapping.

Doesn't it happen automatically, or do I need to do it myself?

The way I see it, the pipeline takes the *.link files and divides them into four parts (****linked****). This is followed by the decode script, which creates the *.encoded files.

For that the tool uses the given genome files. I have created my own files for the human genome, but only for chromosome 1.
To do so I added the genome files mentioned in the config.py file (one file each in genome/size, genome/gap, genome/gene and genome/chrband; I just copied the chr1 part from the complete hg19 genome files).
I also made a csa.index file using the instructions given in the reference guide, which created several files. These I copied to the data/batman/chr1 folder.

This creates several *.bat files.

The batmap.py script tries to merge the *.bat files into a map file. It starts working, but then gives this error.

Why does it happen? I can't figure it out.

Any suggestions?

Thanks
Assa

**guoliang** · 03-25-2013, 07:41 PM

I have put the linker filtering script for single reads at this web link: http://blog.sciencenet.cn/home.php?m...rd=1&id=674053

You may download the script for the data processing.

**guoliang** · 03-25-2013, 07:44 PM

Originally posted by frymor View Post

Doesn't it happen automatically, or do I need to do it myself?

Why does it happen? I can't figure it out.

Any suggestions?

Thanks
Assa

Please check the output format from the linker filtering step. It may not be in the right format.

**guoliang** · 03-25-2013, 07:46 PM

I have put the linker filtering program for single reads at this web link: http://blog.sciencenet.cn/home.php?m...rd=1&id=674053

You can download the Java program and read the "readme.txt" file for how to use it.

**frymor** · 03-26-2013, 07:35 AM

Hi again,

Originally posted by guoliang View Post

Please check the output format from the linker filtering step. It may not be in the right format.

This is easier said than done, as I am not sure what to expect.
This is the command I am using:

Code:

python /export/chiapet/src/python/main/csa_mapper.py --asm chr1 --lib chr1 --proc 4 --head IHH015_1r56_headseq.txt --tail IHH015_1r56_taseq.txt --run 2-4 --linker linker_a --seqlen=38 1> csamapper26032013.log 2>&1

Apparently there is something wrong with the file format.
The mapping is done for three files: unfiltered, linker_a.1, and linker_a.2.

Question - why is it also being done for the unfiltered file? What is this file anyway?

This is how the file chr1.linker_a.1.link looks:

Code:

TTGCATGTTAACTTTATCTG	ATCAAAGTCAGGGTACAGGC		
TTGCATGTTAACTTTATCTGGTTGTATCCGATATCGC	GCGATATCGGATCCAACATCAAAGTCAGGGTACAGGC		
TGGTTGTATCCGATATCGC	ATGTTGGATCCGATATCGC		
Score: 32			
TGGTTGTATCCGATATCGC	ATGTTGGATCCGATATCGC	CG	CG
--GTTGGATCCGATATCGC	--GTTGGATCCGATATCGC		
  ||||X||||||||||||	  |||||||||||||||||		
2	19		
0	17		
2	19		
0	17		
CCACATTGTCAAAGGGTCAC	AGAAATAGGTCATTTGAGAC		
CCACATTGTCAAAGGGTCACGTTGGATCCGATATCGC	GCGATATCGGATACAACAGAAATAGGTCATTTGAGAC		
ACGTTGGATCCGATATCGC	CTGTTGTATCCGATATCGC		
Score: 32			
ACGTTGGATCCGATATCGC	CTGTTGTATCCGATATCGC	CG	CG
--GTTGGATCCGATATCGC	--GTTGGATCCGATATCGC		
  |||||||||||||||||	  ||||X||||||||||||		
2	19		
0	17		
2	19		
0	17

And this is how the first 24 lines of the file chr1.unfiltered.map look:

Code:

>chr1.1:1	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
H	chr1	+	62390903	0
H	chr1	+	62390902	0
H	chr1	-	186796803	0
H	chr1	-	203800090	0
T	chr1	+	62390903	0
T	chr1	+	62390902	0
T	chr1	-	186796803	0
T	chr1	-	203800090	0
>chr1.5:1	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	ACACACTCTTGCCTACCTCTCCTCCTATTTTGGTTTG
H	chr1	+	62390903	0
H	chr1	+	62390902	0
H	chr1	-	186796803	0
H	chr1	-	203800090	0
>chr1.9:1	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	ACTCTGCCCACATACAACATACTACCCAACTCTAACT
H	chr1	+	62390903	0
H	chr1	+	62390902	0
H	chr1	-	186796803	0
H	chr1	-	203800090	0
>chr1.13:1	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	ATACCGTACTTTTTATATCTTGCCTGTTTTTTTTGTT
H	chr1	+	62390903	0
H	chr1	+	62390902	0
H	chr1	-	186796803	0
H	chr1	-	203800090	0

The mapping of these files also completed successfully.
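
To get an overview of a *.map file like the one above, the records can be tallied per PET. This is a rough sketch under stated assumptions: it assumes a record starts with a `>` header line and is followed by hit lines whose first field is `H` or `T` (read here as head and tail hits, which is an interpretation of the listing and not confirmed by the pipeline documentation). The function name `count_hits` is made up for illustration.

```python
from collections import OrderedDict


def count_hits(lines):
    """Map each PET id to the number of 'H' and 'T' hit lines that follow its header.

    Hypothetical helper; the record layout is inferred from the listing above.
    """
    counts = OrderedDict()
    current = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith(">"):
            current = fields[0][1:]          # drop the leading '>'
            counts[current] = {"H": 0, "T": 0}
        elif fields[0] in ("H", "T") and current is not None:
            counts[current][fields[0]] += 1
    return counts


sample = [
    ">chr1.1:1\tAAAAAAAA\tAAAAAAAA\n",
    "H\tchr1\t+\t62390903\t0\n",
    "T\tchr1\t+\t62390903\t0\n",
    ">chr1.5:1\tAAAAAAAA\tACACACTC\n",
    "H\tchr1\t+\t62390903\t0\n",
]

hits = count_hits(sample)
# PETs where only one end has any hit line
one_sided = [pet for pet, c in hits.items() if c["H"] == 0 or c["T"] == 0]
print(one_sided)
```

In the listing above, chr1.5:1, chr1.9:1 and chr1.13:1 would show up as one-sided, since only `H` lines follow their headers.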

The next step is the analysis of the chr1.linker_a.1.link file.

Here it is the same procedure.
The files are flattened using /flatten_fasta.py and separated into four parts. Then the encoded files are made, and afterwards the bat files. All of that runs with no problems. Then the concatenation of the four bat files into the map file is started. It goes well for a while, but then it stops here:

This is the last entry in the map file:

Code:

>chr1.12509:1	AAAAAAAAAATTACATCTCG	ATATCTTCTAGTAAGCAAAC		
H	chr1	+	102182243	1
H	chr1	+	200626391	1
H	chr1	-	215778009	1
H	chr1	-	16192379	1
T	chr1	-	25767370	2
T	chr1	+	234802276	2
T	chr1	+	50585810	2

and this is the same position in the bat file, together with the entry after it:

Code:

>chr1.12509:1	AAAAAAAAAATTACATCTCG	ATATCTTCTAGTAAGCAAAC
H	+	 chr1:102182243;	1	19
H	+	 chr1:200626391;	1	19
H	-	 chr1:215778009;	1	19
H	-	 chr1:16192379;	1	19
T	-	 chr1:25767370;	2	19
T	+	 chr1:234802276;	2	19
T	+	 chr1:50585810;	2	19
@
>chr1.12513:1	-AAAAAAAAAATTGGATCCG	CCGTTGGATCCGATATCG	CG	CG
H	-	 chr1:27980263;	1	19
H	-	 chr1:23295789;	2	19
H	+	 chr1:223996436;	1	19
H	+	 chr1:59008663;	2	19
@

So as you can see, something happens to the linker files (bat, encoded) so that they don't have the correct format. Because of that, the script batmap.py cannot concatenate them correctly.
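
The offending headers can be located before batmap.py runs into them. The sketch below assumes, based only on the traceback (`id, hseq, tseq = line.split('\t')`), that every `>` header line must have exactly three tab-separated fields. The helper name `find_bad_headers` is hypothetical and not part of the pipeline.

```python
def find_bad_headers(lines, expected_fields=3):
    """Return (line_number, field_count, text) for '>' headers with an unexpected field count.

    Hypothetical diagnostic helper; the three-field rule is inferred from the traceback.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.startswith(">"):
            continue  # hit lines (H/T ...) and '@' separators are not checked
        fields = text.split("\t")
        if len(fields) != expected_fields:
            bad.append((number, len(fields), text))
    return bad


sample = [
    ">chr1.12509:1\tAAAAAAAAAATTACATCTCG\tATATCTTCTAGTAAGCAAAC\n",
    "H\t+\t chr1:102182243;\t1\t19\n",
    "@\n",
    ">chr1.12513:1\t-AAAAAAAAAATTGGATCCG\tCCGTTGGATCCGATATCG\tCG\tCG\n",
    "H\t-\t chr1:27980263;\t1\t19\n",
    "@\n",
]

bad_headers = find_bad_headers(sample)
for number, count, text in bad_headers:
    print("line {0}: {1} fields: {2}".format(number, count, text))
```

Run against a real *.bat file by passing an open file handle. Every reported line is one that would make the three-way unpacking in batmap.py raise `ValueError: too many values to unpack`.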

Am I doing something wrong?

I attached my log file so that you can follow the pipeline output, which describes the same workflow. Maybe you'll find something I didn't think about, or maybe I missed another step in the analysis.

Thanks

Assa

Attached Files

csamapper26032013log.txt.zip (4.4 KB, 2 views)

**frymor** · 03-27-2013, 12:57 AM

There are four different types of entries in the *.bat files.

Code:

>chr1.4:1			
|||||||||||||||||||	|   |||||||||||||||		
>chr1.8:1			
|||||||||||||||||||	 |||||||||||||||| ||		
			
>chr1.11456:19876			
0	13		
>chr1.11460:1019476			
0	17		
>chr1.11612:1			
AAAAAAAAAAAAAAAAAAAAAAAGGAACCGATATCCC	GCGATATCGGATCCAACTAGGTGAAGAAAGTATGAAT		
>chr1.11616:1			
AAAAAAAAAAAAAAAAAAAA	AAATTGATTAAGAAAGGCTT		
			
>chr1.38440:1			
AAAAGTTGGATCCGATATC	TGGTGGGTCCGATATCGCG		
>chr1.38444:1			
AAAAGTTGGATCCGATATC	TGGTTGGATCCGATATCGC	CG	CG
>chr1.38448:1			
AAAAGTTGGATCCGATATC	TTGGTTGGGTCCGGTTTCG		
>chr1.38452:5			
AAAAGTTGGATCCGATATC	TTGTTGGATCCGATATCGC	CG	CG

It is difficult to say whether the link file is bad, as I can't compare the headers to it: the link file has no headers.

I would appreciate your help.

Assa

Any comments on that problem?
Has this happened before to other users?

**frymor** · 04-08-2013, 11:46 PM

Any comments on that problem?

I don't understand how other users were able to work with the tool if it does not function even with the provided data it was apparently designed to work with.

As there are no comments on this problem, I would like to ask you for a favor, which would save us and future users both time and effort.

I was wondering whether it is possible to get a working VM of the tool. This way we would not need to worry about installation and dependencies, as everything would be delivered with the VM.

I would very much like to hear your opinions about it.

Thanks

Assa

**frymor** · 04-12-2013, 12:49 AM

we are getting there

Ok,

I am one step closer to victory.

I succeeded in running the mapper script all the way to the end.

Now all I have to do is manage to run the chiapet.py script with its 8 (!!!) different steps.
It didn't take long, and here is the first error message.

Code:

2013-04-12 10:40:34,126  INFO [ChIA-PET/chr1] *** START chr1 ***
2013-04-12 10:40:34,127  INFO [ChIA-PET/chr1] Arguments: Namespace(asm='chr1', cutoff=-1, database='chiapetdb', extlen=-1, force=False, gff_minsup=2, group_id='chr1', java_maxheap=None, lib='chr1', map_file=['prep/chr1/chr1.linker_a.1.link.chr1.map', 'prep/chr1/chr1.linker_a.2.link.chr1.map', 'prep/chr1/chr1.linker_a.c.link.chr1.map'], run='1-8', target='POLII')
2013-04-12 10:40:34,127  INFO [ChIA-PET/chr1] Performing section (1)
2013-04-12 10:43:27,981  INFO [ChIA-PET/chr1] Calculated cut-off value is 3000
2013-04-12 10:43:27,981  INFO [ChIA-PET/chr1] Automatically determining extension length from data in chr1_intensity_dist.txt...
2013-04-12 10:43:27,983  INFO [ChIA-PET/chr1] Calculated extension length is 420
2013-04-12 10:43:27,983  INFO [ChIA-PET/chr1] Updating .info file...
Traceback (most recent call last):
  File "src/python/main/chiapet.py", line 242, in <module>
    retcode, msg = main()
  File "src/python/main/chiapet.py", line 155, in main
    run_v2(mapfiles, args.lib, libworkdir, args.asm, args.cutoff, args.extlen, group_id, shell)
  File "src/python/main/v2_runner.py", line 102, in run_v2
    sh('{p} {script} {infofile} {distfile}'.format(p=sys.executable, **locals()), label='Compatibility hack')
TypeError: 'tuple' object is not callable

Does anyone have any idea about the source of the problem?

Assa

**frymor** · 04-23-2013, 12:45 AM

the never ending story

So, here is the next step of the analysis.
I was somehow able to recover from this error (no thanks to any help from the authors), just by changing the formatting of some lines in the script.

I always get the feeling that no one ever tested these scripts before uploading them. There are so many small errors in them; some are easy to detect, others are a bit more difficult.

Such as the next one here.
This is the error message I get when running the chiapet.py script with only step 2.
The command:

Code:

python $CHIAPETPATH/src/python/main/chiapet.py --asm chr1 --target POLII --lib chr1 --database chiapetdb --group-id chr1 --run 2 $CHIAPETPATH/prep/chr1/*.map > chiapet12042013-step2_1.log 2>&1

and the error message:

Code:

2013-04-23 09:50:31,047  INFO [ChIA-PET/chr1] *** START chr1 ***
2013-04-23 09:50:31,047  INFO [ChIA-PET/chr1] Arguments: Namespace(asm='chr1', cutoff=-1, database='chiapetdb', extlen=-1, force=False, gff_minsup=2, group_id='chr1', java_maxheap=None, lib='chr1', map_file=['chiapet/prep/chr1/chr1.linker_a.1.link.chr1.map', 'chiapet/prep/chr1/chr1.linker_a.2.link.chr1.map', 'chiapet/prep/chr1/chr1.linker_a.c.link.chr1.map'], run='2', target='POLII')
2013-04-23 09:50:31,047  INFO [ChIA-PET/chr1] Sieving out multiply-mapped PETs...
2013-04-23 09:50:31,047  INFO [ChIA-PET/chr1] Reading extension length and cut-off values...
2013-04-23 09:50:31,051  INFO [ChIA-PET/chr1] Preparing to check for intersection with satellite regions...
Traceback (most recent call last):
  File "chiapet/src/python/main/chiapet.py", line 242, in <module>
    retcode, msg = main()
  File "chiapet/src/python/main/chiapet.py", line 173, in main
    run_v3(uniqfile, args.lib, libworkdir, args.asm, infofile, shell)
  File "chiapet/src/python/main/v3_runner.py", line 89, in run_v3
    .format(j=Interpreter.Java, repeat=Assembly.SatelliteRepeatInfo[asm],
KeyError: 'chr1'

As far as I can interpret this error, it has something to do with the fact that I am using 'chr1' as a subset of the complete 'hg19' genome to test the program.
But more than that I just can't find out. I don't know if it has anything to do with the fact that chr1 is not in the SQL DB. But I can't imagine that each time I want to work with a different organism I will need to reconstruct the MySQL DB, and if so, are there instructions on how to add new organisms to the DB?
It might also be enough to add some single files to the different folders that hold information on the organism, but I think I already did that.

So, again, I would like to ask for any kind of help from anyone.

Thanks,
Assa

**frymor** · 06-07-2013, 12:01 AM

another example of me not understanding how these scripts work

Hi

I finally got over the tuple error message.

I keep writing in this forum for other people who might think working with this workflow is worth the trouble.

The problem was in the script v2_runner.py, lines 59-60.

Code:

      sh = sh('{j} {script} {input}'.format(j=Interpreter.Java, script=script, input=tmp.name)) #original
      cutoff = int(sh[0].strip())

This is how it was. The result of the sh call was assigned to a variable named sh, which, as far as I can understand, overwrites the sh function. This of course means that the next time sh is called, it is no longer callable.

All I needed to do was change the assignment of sh to this:

Code:

      sh1 = sh('{j} {script} {input}'.format(j=Interpreter.Java, script=script, input=tmp.name)) #fixed: result stored under a new name
      cutoff = int(sh1[0].strip())

and it all works fine.
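
The effect can be reproduced outside the pipeline. The following is a minimal sketch, not the pipeline's code: `fake_shell` is a made-up stand-in that returns a tuple the way the real `sh` helper apparently does (judging by the `sh[0]` indexing), and the command strings are placeholders.

```python
def fake_shell(cmd):
    """Hypothetical stand-in for the pipeline's shell helper: returns (stdout, stderr)."""
    return ("3000\n", "")


def broken(sh):
    sh = sh("first command")       # rebinds the name: sh is now a tuple
    cutoff = int(sh[0].strip())
    sh("second command")           # raises TypeError: 'tuple' object is not callable
    return cutoff


def fixed(sh):
    out = sh("first command")      # result stored under a different name
    cutoff = int(out[0].strip())
    sh("second command")           # sh is still the callable
    return cutoff


error_message = None
try:
    broken(fake_shell)
except TypeError as exc:
    error_message = str(exc)

result = fixed(fake_shell)
print(error_message)
print(result)
```

Any name other than `sh` works for the result; the only requirement is that the callable is not overwritten while later code still needs to call it.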

Now step two of the chia-pet script is working fine.

Code:

python $CHIAPETPATH/src/python/main/chiapet.py --asm chr1 --target POLII --lib chr1 --database chiapetdb --group-id chr1 --run 2 $CHIAPETPATH/prep/chr1/*.map

**frymor** · 06-07-2013, 12:22 AM

KeyError: 'chr1'

I also managed to overcome this error.

It was caused by the fact that I didn't have the right SatelliteRepeatInfo file in the correct folder.

In the file structure under

/data/genome/repeat/

there are some files for the satellite repeats. If you want to use a genome other than the few provided there, you need to add a file here with the information.

So I just added a file with the name

Code:

data/genome/repeat/satellite_repeat_chr1.txt

and the problem was solved.
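
A small pre-flight check can catch a missing file before chiapet.py is started. This sketch only follows the naming seen above (`data/genome/repeat/satellite_repeat_<asm>.txt`). Whether the pipeline needs anything beyond the file itself, for example an entry in config.py, is not known from this thread. Both function names are made up for illustration.

```python
import os
import tempfile


def satellite_repeat_path(data_dir, asm):
    """Build the expected file path, following the naming used in the post above."""
    return os.path.join(data_dir, "genome", "repeat",
                        "satellite_repeat_{0}.txt".format(asm))


def has_satellite_repeat_file(data_dir, asm):
    """Return True if the satellite repeat file for this assembly exists."""
    return os.path.isfile(satellite_repeat_path(data_dir, asm))


# Demonstration in a throwaway directory instead of the real data folder.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "genome", "repeat"))

before = has_satellite_repeat_file(demo_root, "chr1")    # file not created yet
open(satellite_repeat_path(demo_root, "chr1"), "w").close()
after = has_satellite_repeat_file(demo_root, "chr1")     # file now present

print(before, after)
```

For a real installation, call `has_satellite_repeat_file` with the path to the pipeline's data directory and the value passed to `--asm`.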

**frymor** · 06-07-2013, 12:43 AM

next error in step 3

This is the command I am using for step 3:

Code:

python $CHIAPETPATH/src/python/main/chiapet.py --asm chr1 --target POLII --lib chr1 --database chiapetdb --group-id chr1 --run 3 $CHIAPETPATH/prep/chr1/*.map

and here is the log output I am getting:

Code:

2013-06-07 10:25:24,543 INFO [ChIA-PET/chr1] *** START chr1 ***
2013-06-07 10:25:24,543 INFO [ChIA-PET/chr1] Arguments: Namespace(asm='chr1', cutoff=-1, database='chiapetdb', extlen=-1, force=False, gff_minsup=2, group_id='chr1', java_maxheap=None, lib='chr1', map_file=['/chiapet/prep/chr1/chr1.linker_a.1.link.chr1.map', '/chiapet/prep/chr1/chr1.linker_a.2.link.chr1.map', '/chiapet/prep/chr1/chr1.linker_a.c.link.chr1.map'], run='3', target='POLII')
2013-06-07 10:25:24,544 INFO [ChIA-PET/chr1] Generating SQL files...
2013-06-07 10:25:24,544 INFO [ChIA-PET/chr1] Reading extension length and cut-off values...
2013-06-07 10:25:26,172 ERROR [ChIA-PET/chr1] Execution failed: 'mysql -u root -p*** -D chiapetdb'
2013-06-07 10:25:26,173 ERROR [ChIA-PET/chr1] ??? ERROR 29 (HY000) at line 3: File '/chiapet/work/chr1/chr1.peak' not found (Errcode: 13)

Traceback (most recent call last):
  File "/chiapet/src/python/main/chiapet.py", line 242, in <module>
    retcode, msg = main()
  File "/chiapet/src/python/main/chiapet.py", line 182, in main
    args.asm, args.target, args.force, shell)
  File "/chiapet/src/python/main/sql_runner.py", line 57, in upload_sql
    call_db(db, user, pswd, sql, shell, 'Uploading to table {0}'.format(table))
  File "/chiapet/src/python/main/sql_runner.py", line 171, in call_db
    return shell(cmd, sql, label=label)[0]
  File "/chiapet/src/python/common/exec_util.py", line 61, in __call__
    return super(TimedShell, self).__call__(cmd, *args, **kwargs)
  File "/chiapet/src/python/common/exec_util.py", line 46, in __call__
    raise Exception("Execution failed: '{0}'".format(cmd))
Exception: Execution failed: 'mysql -u root -p*** -D chiapetdb'

The file is there! I can see it, and it has read rights for all users:

-rwxrwxrwx 1 ****** ******* 7722 Jun 6 16:53 chr1.peak*

Does anyone have any idea why this happens?

thanks,
Assa
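
One thing that may be worth checking: errno 13 is the operating system's "permission denied" code (EACCES), so the "not found" in the MySQL message may really be an access problem. A possible cause, not confirmed anywhere in this thread, is that the MySQL server process cannot traverse one of the parent directories even though the file itself is world-readable. The sketch below lists parent directories that lack the execute (search) bit for "others". The function name `dirs_without_world_execute` is made up, and the demo uses a temporary directory, not the real /chiapet/work path.

```python
import errno
import os
import stat
import tempfile


def dirs_without_world_execute(path):
    """List parent directories of `path` that lack the 'others' execute (search) bit.

    Hypothetical diagnostic helper; it inspects mode bits only and ignores
    group membership, ACLs and mechanisms such as AppArmor or SELinux.
    """
    blocked = []
    current = os.path.dirname(os.path.realpath(path))
    while True:
        if not os.stat(current).st_mode & stat.S_IXOTH:
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:      # reached the filesystem root
            break
        current = parent
    return blocked


print(errno.errorcode[13])         # symbolic name of errno 13

# Demo: a world-accessible file inside a directory only its owner may enter.
demo_root = os.path.realpath(tempfile.mkdtemp())
work_dir = os.path.join(demo_root, "work")
os.mkdir(work_dir)
os.chmod(work_dir, 0o700)
peak_file = os.path.join(work_dir, "chr1.peak")
open(peak_file, "w").close()
os.chmod(peak_file, 0o777)

blocked = dirs_without_world_execute(peak_file)
print(blocked)
```

If the list is empty for the real chr1.peak path, directory permissions are probably not the cause and other server-side restrictions would need to be looked at.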
