SEQanswers

03-19-2013, 12:57 AM   #1
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149
ChIA-PET tool

Hi,

I was wondering if anyone has tried the ChIA-PET Tool pipeline. We are trying to install it and are running into several problems.

As the manual is not much help, we would like to ask people here for their experience.
Was anybody able to run it?

To be more specific: what are the head and tail sequences, and how do I get them from a normal fastq file?
Do I need to split the file on my own, or does the tool do it for me?

I would appreciate any help or suggestions.

Thanks
Assa

03-19-2013, 08:06 AM   #2
crazyhottommy
Senior Member
Location: Gainesville
Join Date: Apr 2012
Posts: 140

Hi, see here:
http://seqanswers.com/forums/showthread.php?t=17103

I would recommend Simon's SeqMonk for this kind of analysis.

03-19-2013, 11:04 AM   #3
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149

Thanks, but I have already read this post, as well as the (very) few others about ChIA-PET data. What I would like to know is how SeqMonk works with this kind of data.

I have two fastq files, which I can't map, as they still have the two linkers inside them.

Did you work with ChIA-PET data in SeqMonk?

I would appreciate any kind of help.

Assa

PS

In general I find it really amazing that the technique is already a few years old, yet only a few people are working with it and even fewer are willing to share their information.

03-19-2013, 11:21 AM   #4
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

Hi Frymor,

Could you please describe the exact questions or issues you have with ChIA-PET Tool?

You need the linker filtering script to identify the linker category and extract the real DNA tags.

Best regards,
Guoliang

03-19-2013, 12:59 PM   #5
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149

Well that is exactly the point.

As far as I understand, linker filtering is something that happens automatically.

I can't even figure out how to run the program.
I am trying to work with the head and tail sequences provided by the people who created the tool.

The problem is that I always get the error message that the linkers are not found.

This is the command I use:

Code:
python ~/chiapet/src/python/main/csa_mapper.py --asm hg18 --lib lib18233 --proc 4 --head IHH015_1r56_headseq.txt --tail IHH015_1r56_tailseq.txt --run 3-4 --linker linker_a
The linkers I am using are the ones set in the config file:
Quote:
'linker_a.1': 'GTTGGATCCGATATCGCGG'
'linker_a.2': 'GTTGGATCATATATCGCGG'
But I always get the same error, that the linkers are not found:
Code:
cat: lib18233_link.part0002.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
cat: lib18233_link.part0003.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
cat: lib18233_link.part0004.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
The complete output is in the attachment.
It looks like two linker sequences are being concatenated, one forward and one reverse-complemented.
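The observation above can be checked directly. The following is a minimal Python sketch, not part of ChIA-PET Tool: the helper `reverse_complement` is a hypothetical name introduced here, and the three sequences are copied from the config quote and the `cat` error lines in this post. It compares the file-name suffix with linker_a.1 followed by a reverse-complemented linker.

```python
# Sketch only: reverse_complement is a helper written for this check,
# it is not a function from the ChIA-PET Tool code base.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


linker_a1 = "GTTGGATCCGATATCGCGG"  # 'linker_a.1' from the config quote
linker_a2 = "GTTGGATCATATATCGCGG"  # 'linker_a.2' from the config quote
suffix = "GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC"  # from the cat error lines

# Is the suffix linker_a.1 followed by its own reverse complement?
print(suffix == linker_a1 + reverse_complement(linker_a1))
# Or linker_a.1 followed by the reverse complement of linker_a.2?
print(suffix == linker_a1 + reverse_complement(linker_a2))
```

With these inputs the first comparison holds and the second does not, so the suffix is linker_a.1 followed by its own reverse complement. That would be consistent with the file name being built from one linker and its reverse complement, although the thread does not confirm how the pipeline constructs the name.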

Am I using the wrong script?
Attached Files
File Type: txt error_massage_chiapet.txt (11.7 KB, 23 views)

Last edited by frymor; 03-19-2013 at 01:20 PM.

03-19-2013, 08:44 PM   #6
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

The errors start at the linker filtering step. Have you compiled the Java programs?

03-20-2013, 06:56 AM   #7
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149

And how do I do that?

As far as I know, I did. In the directory that $CHIAPETPATH points to, I have the file bin/LGL/chiapet/LinkerFilter.class,

as well as some more files.
What I don't understand, and cannot find an explanation for, is this error message:

Code:
/export/chiapet/bin:/export/chiapet/lib/java/commons-cli-1.2.jar:/export/chiapet/lib/java/guava-r05.jar 
sg.edu.astar.gis.chiapet.LinkerFilter 
--flip-tail /export/chiapet/prep/lib18233/IHH015_1r56_headseq.txt.part0002.yut7eo 
/export/chiapet/prep/lib18233/IHH015_1r56_tailseq.txt.part0002.jvtNMT lib18233_link.part0002 GTTGGATCCGATATCGCGG GTTGGATCATATATCGCGG 1>/dev/null 2>&1]
Where is that coming from?

The file LinkerFilter.java exists at least twice in this structure: once under 3rd/LGL and once under src/java/sg/edu/astar/gis/chiapet.
Then I have LinkerFilter.class files in both:
Code:
export/chiapet/bin/LGL/chiapet/LinkerFilter.class
export/chiapet/bin/sg/edu/astar/gis/chiapet/LinkerFilter.class
It would be helpful to know which of the two I need to use, and how to point the config.py file to it.

Thanks

Assa

03-20-2013, 11:49 AM   #8
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

With the compiled Java programs, you are now getting a different message.

You had better use the LinkerFilter program from the original package: export/chiapet/bin/sg/edu/astar/gis/chiapet/LinkerFilter.class.

Best regards,
Guoliang

03-21-2013, 05:42 AM   #9
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149

Yes, I managed to do it now, no thanks to the manual.

I had to add some print statements to my Python and Java files to finally find out that the hard-coded command in the csa_mapper.py file is not the right one. This was the command:
Code:
#    t = tshell('''{i} {script} --flip-tail {fasq1} {fasq2} {output} 
#                  {link1} {link2} 1>/dev/null 2>&1'''
and I changed it to this (with new parameters):
Code:
    t = tshell('''{i} {script} {fasq1} {fasq2} {output} 
                  {link1} {link2}  --bar-start_1 9 --bar-start_2 9 --bar-length_1 2 --bar-length_2 2 --flip-tail 1>tem.txt 2>&1'''
Thanks for the help and the advice about compiling the Java programs.
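Part of what made this hard to debug is that the original command sent its output to /dev/null (`1>/dev/null 2>&1`), so any message from the Java program was lost. A minimal Python 3 sketch of running an external command while keeping its output is shown below. It is an illustration only: `run_and_capture` is a hypothetical helper, it is not the pipeline's `tshell` function, and the command in the demo is a stand-in rather than the real LinkerFilter call.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (exit_code, combined_output)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
        universal_newlines=True,   # return text instead of bytes
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command: replace with the actual java invocation when debugging.
    code, output = run_and_capture(
        [sys.executable, "-c", "print('linker filter stand-in')"]
    )
    if code != 0:
        print("command failed with exit code", code)
    print(output, end="")
```

Checking the exit code and printing the captured text makes a failing step visible immediately, instead of surfacing later as missing output files.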

03-21-2013, 05:51 AM   #10
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

Good to know you have fixed the issue. The manual is outdated. An updated version of the manual can be expected.

03-21-2013, 05:58 AM   #11
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149
Now to the next problem

Until now I worked with the files provided on the tool's web site (the separate head and tail files).

Now I would like to analyze my own data. This data is in single-end fastq format.

I don't have a clue how to run the command now, as there is no mention of fastq files in the manuals.
The fastq file contains both tags and both linkers in one long read.

Can I run the program with just the
Code:
--head filename.fastq
option?

Do I need to cut it on my own?

The fasq files (this is not a typo) that came with the program look like this (the head file):
Code:
@GA001-PE-R00056-26052008-F:1:1:298:699/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+GA001-PE-R00056-26052008-F:1:1:298:699/1
VVVVVVVVVVVVVVVXXXXXTTTTTMMMMMHHHHHHH
@GA001-PE-R00056-26052008-F:1:1:121:314/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+GA001-PE-R00056-26052008-F:1:1:121:314/1
VVVVVVVVVVVVVVVUUUUUKNKKSNNKLLECGCGEG
Here I have a shorter read with only one tag+linker pair. Each header ends with /1 to show that this read is the head.

My files look like this:
Code:
@HWI-ST225:523:D1AY5ACXX:8:1101:1566:2149 1:N:0:GTGGCC
GCATACCCTCCCTGTCTCAGTTGCTGTTGAAAGAAGAAATGTTGGATAAGATATCGCGGCCGCGATATCTTATCCAACGAAGCCAAAACCCTCGCAGTCTG
+
??@DDDDDHHHHHIGIHGHICHEGAHC<HHII9?FB3?F4C<FDD>0?9D*99*??D@;AA9=?'''3@@>@A@>::?B?2?-?CC3<A??8?B@B#####
@HWI-ST225:523:D1AY5ACXX:8:1101:1629:2247 1:N:0:GTGGCC
TGTCCTGTTGCGTGTCTCAGTCAATCGTGAATACATAACATGTTGGATAAGATATCGCGGCCGCGATATCTTATCCAACTAAGGCGATTCTCTCTGCAGCC
+
@@@DFDEEHFHGF1<CFHIIIGFB4CFFHII>@GGGHICGBG?BFH@FGBHGHGIIIFGEHEFC@B?@BC:3@AC@CCCCCCC<3<<<BCCCCCCA@####
In these I have both linkers and both tags in one read. Does the program know how to handle these files?
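To make the read structure concrete, here is a minimal Python sketch that looks for a linker directly followed by its reverse complement and returns the tags on either side. Everything in it is an assumption for illustration: `locate_linker_pair` and `reverse_complement` are hypothetical helpers, the linker sequence is read off the two example reads above (it is not taken from config.py), and only exact matches are found, so reads with sequencing errors in the linker would be missed.

```python
# Sketch only: exact-match search, no tolerance for mismatches.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_linker_pair(read, linker):
    """Return (head_tag, tail_tag) around linker + revcomp(linker), or None."""
    junction = linker + reverse_complement(linker)
    start = read.find(junction)
    if start == -1:
        return None
    return read[:start], read[start + len(junction):]


# Linker as it appears in the example reads above (an observation, not config).
observed_linker = "GTTGGATAAGATATCGCGG"
read = ("GCATACCCTCCCTGTCTCAGTTGCTGTTGAAAGAAGAAATGTTGGATAAGATATCGCGG"
        "CCGCGATATCTTATCCAACGAAGCCAAAACCCTCGCAGTCTG")

tags = locate_linker_pair(read, observed_linker)
if tags is not None:
    head_tag, tail_tag = tags
    print(len(head_tag), head_tag)
    print(len(tail_tag), tail_tag)
```

Note that the linker seen in these reads differs from both linker_a.1 and linker_a.2 quoted earlier in the thread, so the configured linkers may also need to be adjusted for this data.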

I would appreciate any kind of help.

Assa

03-21-2013, 07:31 AM   #12
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

You can't use the Java program in the published ChIA-PET Tool package for the single-read linker filtering.

I have a Java program for single-read linker filtering. I am considering where to put it.

03-21-2013, 07:54 AM   #13
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149

I would suggest putting it in the same directory as the other LinkerFilter file, just with a different name.

But to understand it correctly: if I have a fastq file as in my example above, where the structure of the read is

HTML Code:
tag <->linker[A|B]<->linker[A|B]<->tag
I can't use it with this tool?

So basically I need to cut the reads myself between the two linker sequences and add /1 or /2 to the end of the header so that the tool can work with them?

Will that suffice to run the program in PE mode?
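The manual split described here could be sketched roughly as follows. This is only an illustration of the idea, under several assumptions: `split_record` is a hypothetical helper, the cut position has to be supplied by the caller (for example the boundary between the two linkers), and the tail is written in the same orientation as the original read. Whether ChIA-PET Tool accepts files produced this way, and which tail orientation it expects, is not confirmed anywhere in this thread.

```python
def split_record(header, seq, qual, cut):
    """Split one FASTQ record at position `cut` into a head and a tail record.

    Returns two 4-line FASTQ strings whose read names end in /1 and /2.
    The tail is NOT reverse-complemented (an assumption of this sketch).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if not 0 < cut < len(seq):
        raise ValueError("cut position must fall inside the read")
    # Drop the leading '@' and the Illumina comment after the first space.
    name = header.lstrip("@").split()[0]
    head = "@{0}/1\n{1}\n+\n{2}\n".format(name, seq[:cut], qual[:cut])
    tail = "@{0}/2\n{1}\n+\n{2}\n".format(name, seq[cut:], qual[cut:])
    return head, tail


if __name__ == "__main__":
    # Toy record, shortened for readability; real reads are much longer.
    head, tail = split_record(
        "@HWI-ST225:523:D1AY5ACXX:8:1101:1566:2149 1:N:0:GTGGCC",
        "AAAACCCC",
        "IIIIHHHH",
        4,
    )
    print(head, end="")
    print(tail, end="")
```

Splitting the quality string at the same position keeps each half a valid FASTQ record, which is the minimum any downstream parser will require.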

Assa

PS
It would be great if we could test the single-end script for the tool.

03-21-2013, 07:57 AM   #14
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149

Another quick question - how did you generate the head and tail fastq files? Did you have them from a paired-end experiment?

Just to know for next time: does it make more sense to run paired-end sequencing when working with ChIA-PET data?

Assa

03-25-2013, 04:51 AM   #15
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149
Next problem

After we managed to solve the problem with the Java program (thanks a lot Guoliang), I am encountering another problem, this time with the script batmap.py.

This is the last output in my log file:
Code:
2013-03-25 12:04:36,212  INFO [CSA Mapper/chr1] Merging the outputs and converting it to format recognized by ChIA-PET pipeline...
2013-03-25 12:04:36,212 DEBUG [CSA Mapper/chr1] START [cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map]
2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] Execution failed: 'cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map'
2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] > Traceback (most recent call last):
  File "/export/chiapet/src/python/pre/batmap.py", line 43, in <module>
    sys.exit(main())
  File "/export/chiapet/src/python/pre/batmap.py", line 25, in main
    id, hseq, tseq = line.split('\t')
ValueError: too many values to unpack
cat: write error: Broken pipe
This error is caused by a formatting problem in the *.bat file. The script expects a certain format, which strangely differs in specific rows.
This is how the *.bat file looks where the error happens:
Code:
>chr1.15945:1	AAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC	GAGATATCGGATCCAACATAAATCCACTCAGGCTCAA
H	-	 chr1:61671579;	0	19
H	-	 chr1:88019527;	2	19
H	+	 chr1:14799656;	1	19
H	+	 chr1:94966502;	1	19
@
>chr1.15949:1	---AAAAAAGGGGCGCGATATC	AACGTTGGATCCGATATCG	CG	CG
@
>chr1.15953:1	AAAAAAGGGGGGATTGAAATGGTTGGATCCGATATCG	CCCGATATCGGATCCAACGCAGCTACTTGGGAGGCTG
H	-	 chr1:79407253;	0	19
H	-	 chr1:51165829;	2	19
H	+	 chr1:238895172;	2	19
H	+	 chr1:116055838;	2	19
@
The header of each record has three elements separated by \t. The middle header has more elements (five). I can't understand why they are there.
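A quick way to find every such header before batmap.py trips over it is to count the tab-separated fields on each `>` line. The sketch below is a stand-alone Python 3 check, not part of the pipeline; `find_malformed_headers` is a hypothetical name, and the sample lines are taken from the excerpt above on the assumption that the columns are separated by real TAB characters.

```python
def find_malformed_headers(lines, expected_fields=3):
    """Yield (line_number, field_count, line) for '>' headers with a wrong field count."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.startswith(">"):
            continue  # hit lines ('H ...') and '@' separators are skipped
        fields = line.split("\t")
        if len(fields) != expected_fields:
            yield number, len(fields), line


# Lines copied from the *.bat excerpt above, assuming TAB-separated columns.
sample = [
    ">chr1.15945:1\tAAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC\tGAGATATCGGATCCAACATAAATCCACTCAGGCTCAA\n",
    "H\t-\t chr1:61671579;\t0\t19\n",
    "@\n",
    ">chr1.15949:1\t---AAAAAAGGGGCGCGATATC\tAACGTTGGATCCGATATCG\tCG\tCG\n",
    "@\n",
]

for number, count, line in find_malformed_headers(sample):
    print("line {0}: {1} fields: {2}".format(number, count, line))
```

To scan a real file, the same function can be given an open file handle instead of the sample list, since it only iterates over lines.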

This is the part of the *link file that (I think) ends up in the part of the *bat file where the problem happens.

Code:
GTCCTTCAGAGATGTCTCAA    TTTTGTTATGTTCTCTCCAA
GTCCTTCAGAGATGTCTCAAAAAAGGGGCGCGATATC   CGATATCGGATCCAACGTTTTGTTATGTTCTCTCCAA
AAAAAAGGGGCGCGATATC     AACGTTGGATCCGATATCG
Score: 23
---AAAAAAGGGGCGCGATATC  AACGTTGGATCCGATATCG     CG      CG
GTT------GGATC-CGATATC  ---GTTGGATCCGATATCG
         ||XX| |||||||     ||||||||||||||||
12      19
8       15
3       19
0       16
Can anyone explain this kind of problem?

Thanks for any help.

Assa

03-25-2013, 06:53 AM   #16
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

Quote:
Originally Posted by frymor
Another quick question - how did you generate the head and tail fastq files? Did you have them from a paired-end experiment?

Just to know for next time: does it make more sense to run paired-end sequencing when working with ChIA-PET data?

Assa
The head and tail fastq files are from paired-end experiments. The original protocol was designed for paired-end sequencing because read lengths were short at the time. Sequencers can now generate longer reads, so single reads are fine for this kind of study; they just need new scripts for single-read linker filtering.

03-25-2013, 06:57 AM   #17
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

Quote:
Originally Posted by frymor
I would suggest putting it in the same directory as the other LinkerFilter file, just with a different name.

But to understand it correctly: if I have a fastq file as in my example above, where the structure of the read is

HTML Code:
tag <->linker[A|B]<->linker[A|B]<->tag
I can't use it with this tool?

So basically I need to cut the reads myself between the two linker sequences and add /1 or /2 to the end of the header so that the tool can work with them?

Will that suffice to run the program in PE mode?

Assa

PS
It would be great if we could test the single-end script for the tool.
I am considering putting the script on a web space. I haven't integrated it into the pipeline or tested it there yet. Currently the script is tested stand-alone and distributed individually. Please check your email for the script.

03-25-2013, 07:03 AM   #18
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

Quote:
Originally Posted by frymor
After we managed to solve the problem with the Java program (thanks a lot Guoliang), I am encountering another problem, this time with the script batmap.py. [...]

2013-03-25 12:04:36,212 INFO [CSA Mapper/chr1] Merging the outputs and converting it to format recognized by ChIA-PET pipeline...

The key information is from this step. You need to check that the linker filtering output is in the format the mapping step expects: each row should contain two DNA fragments separated by a TAB, as follows:
AGAATGTCGTATAGTTGANA TAACCCCCAAAGTGATTGTA
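The expected row format can be checked mechanically before the mapping step runs. The following Python 3 sketch is only an illustration: `is_valid_link_row` and `invalid_rows` are hypothetical helpers, and the rule that a fragment consists solely of A, C, G, T or N is an assumption based on the example row above. The sample rows are adapted from the *link excerpt earlier in the thread, with the columns assumed to be TAB-separated.

```python
import re

# Assumption: a valid fragment contains only the letters A, C, G, T or N.
DNA = re.compile(r"[ACGTN]+")


def is_valid_link_row(line):
    """True if the row is exactly two TAB-separated DNA fragments."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) == 2 and all(DNA.fullmatch(f) for f in fields)


def invalid_rows(lines):
    """Return (line_number, text) for every row that breaks the expected format."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if not is_valid_link_row(line)
    ]


sample = [
    "AGAATGTCGTATAGTTGANA\tTAACCCCCAAAGTGATTGTA\n",
    "Score: 23\n",
    "---AAAAAAGGGGCGCGATATC\tAACGTTGGATCCGATATCG\tCG\tCG\n",
]

for number, text in invalid_rows(sample):
    print("row {0} does not match the expected format: {1}".format(number, text))
```

Rows such as the alignment and score lines in the excerpt fail this check, which would explain why a parser that expects exactly two fragments per row breaks on them.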

03-25-2013, 08:23 AM   #19
frymor
Senior Member
Location: Germany
Join Date: May 2010
Posts: 149

Quote:
Originally Posted by guoliang
The key information is from this step. You need to check that the linker filtering output is in the format the mapping step expects.
Doesn't that happen automatically, or do I need to do it myself?

The way I see it, the pipeline takes the *.link files and divides them into four parts (****linked****). This is followed by the decode script, which creates *.encoded files.

For that, the tool uses the given genome files. I have created my own files for the human genome, but only for chromosome 1.
To do so, I added the genome files mentioned in the config.py file (I added a file in genome/size, genome/gap, genome/gene and genome/chrband; I just copied the chr1 part from the complete hg19 genome files).
I also made a csa.index file following the instructions in the reference guide, which created several files. I copied these to the data/batman/chr1 folder.

This creates several *.bat files.

The batmap.py script tries to merge the *.bat files into a map file. It starts working, but then gives this error.

Why does it happen? I can't figure it out.

Any suggestions?

Thanks
Assa

03-25-2013, 07:44 PM   #20
guoliang
Junior Member
Location: Singapore
Join Date: Mar 2009
Posts: 9

Quote:
Originally Posted by frymor
Doesn't that happen automatically, or do I need to do it myself?

Why does it happen? I can't figure it out.

Any suggestions?

Thanks
Assa
Please check the output format from the linker filtering step. It may not be in the right format.

Tags
chia-pet, chiapet, chromosome conformation