**crazyhottommy** · 03-19-2013, 08:06 AM

Hi, see here:

visualization of 3C interactions - SEQanswers

http://seqanswers.com/forums/showthread.php?t=17103

I would recommend Simon's SeqMonk for this kind of analysis.

**frymor** · 03-19-2013, 11:04 AM

Thanks, but I have already read that post, as well as a (very) few others about ChIA-PET data. What I would like to know is how SeqMonk works with this kind of data.

I have two fastq files that I can't map, because the reads still contain the two linkers.

Did you work with ChIA-PET data in SeqMonk?

I would appreciate any kind of help.

Assa

PS

In general, I find it really amazing that the technique is already a few years old, yet only a few people are working with it and even fewer are willing to share their information.

**guoliang** · 03-19-2013, 11:21 AM

Hi Frymor,

Could you please describe the exact questions or issues you have run into with ChIA-PET Tool?

You need the linker filtering script to identify the linker category and extract the real DNA tags.

Best regards,
Guoliang

**frymor** · 03-19-2013, 12:59 PM

Well that is exactly the point.

As far as I understand, it is something that happens automatically.

I can't even figure out how to run the program.
I am trying to work with the head and tail sequences provided by the people who created the tool.

The problem is that I always get an error message saying that the linkers are not found.

This is the command I use:

Code:

python ~/chiapet/src/python/main/csa_mapper.py --asm hg18 --lib lib18233 --proc 4 --head IHH015_1r56_headseq.txt --tail IHH015_1r56_tailseq.txt --run 3-4 --linker linker_a

The linkers I am using are the ones set in the config file:

'linker_a.1': 'GTTGGATCCGATATCGCGG'
'linker_a.2': 'GTTGGATCATATATCGCGG'

But I always get the same error, that the linkers are not found:

Code:

cat: lib18233_link.part0002.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
cat: lib18233_link.part0003.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
cat: lib18233_link.part0004.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory

The complete output is in the attachment.
It looks like the two linkers are being joined together, one in forward orientation and one as a reverse complement.
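
The suffix in the missing file names can be checked directly. The following is a minimal Python sketch; the helper `reverse_complement` is hypothetical and not part of ChIA-PET Tool. Under that check, the suffix appears to be linker_a.1 followed by its own reverse complement, rather than a mix of the two configured linkers:

```python
# Hypothetical helper for illustration; not part of ChIA-PET Tool.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# linker_a.1 as quoted from the config file in this post.
linker_a1 = "GTTGGATCCGATATCGCGG"

# Suffix taken from the "No such file or directory" messages.
suffix = "GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC"

joined = linker_a1 + reverse_complement(linker_a1)
print(joined == suffix)
```

This only explains how the file name is built. It does not by itself explain why the files are missing.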

Am I using the wrong script?

Attached Files

error_massage_chiapet.txt (11.7 KB, 23 views)

**guoliang** · 03-19-2013, 08:44 PM

The errors start at the linker filtering step. Have you compiled the Java programs?

**frymor** · 03-20-2013, 06:56 AM

And how do I do that?

As far as I know, I did it. In the directory that $CHIAPETPATH points to, I have bin/LGL/chiapet/LinkerFilter.class, as well as some more files.
What I don't understand, and cannot find any explanation for, is this error message:

Code:

/export/chiapet/bin:/export/chiapet/lib/java/commons-cli-1.2.jar:/export/chiapet/lib/java/guava-r05.jar 
sg.edu.astar.gis.chiapet.LinkerFilter 
--flip-tail /export/chiapet/prep/lib18233/IHH015_1r56_headseq.txt.part0002.yut7eo 
/export/chiapet/prep/lib18233/IHH015_1r56_tailseq.txt.part0002.jvtNMT lib18233_link.part0002 GTTGGATCCGATATCGCGG GTTGGATCATATATCGCGG 1>/dev/null 2>&1]

Where does the class name sg.edu.astar.gis.chiapet.LinkerFilter come from?

The file LinkerFilter.java exists at least twice in this structure: once under 3rd/LGL and once under src/javasg/edu/astar/gis/chiapet.
Then I have the LinkerFilter.class files in both of these locations:

Code:

export/chiapet/bin/LGL/chiapet/LinkerFilter.class
export/chiapet/bin/sg/edu/astar/gis/chiapet/LinkerFilter.class

It would be helpful to know which of the two I need to use, and how to point the config.py file to it.

Thanks

Assa

**guoliang** · 03-20-2013, 11:49 AM

With the compiled Java programs, you now get different messages.

You had better use the linker filtering program from the original package: export/chiapet/bin/sg/edu/astar/gis/chiapet/LinkerFilter.class.

Best regards,
Guoliang

**frymor** · 03-21-2013, 05:42 AM

Yes, I managed to do it now, no thanks to the manual.

I had to add some output statements to my Python and Java files to finally find out that the hard-coded command in the csa_mapper.py file is not the right one. This was the command:

Code:

#    t = tshell('''{i} {script} --flip-tail {fasq1} {fasq2} {output} 
#                  {link1} {link2} 1>/dev/null 2>&1'''

and I changed it to this (with new parameters):

Code:

    t = tshell('''{i} {script} {fasq1} {fasq2} {output} 
                  {link1} {link2}  --bar-start_1 9 --bar-start_2 9 --bar-length_1 2 --bar-length_2 2 --flip-tail 1>tem.txt 2>&1'''

Thanks for the help and for the advice about compiling the Java programs.
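
The barcode parameters in the modified command (start 9, length 2) are consistent with the positions at which the two configured linkers differ. The sketch below is only an illustration: the function `differing_positions` is hypothetical, and reading `--bar-start_1` as a 1-based position is an assumption that is not confirmed anywhere in this thread.

```python
# The two linkers as quoted from the config file earlier in the thread.
linker_a1 = "GTTGGATCCGATATCGCGG"
linker_a2 = "GTTGGATCATATATCGCGG"


def differing_positions(a, b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


positions = differing_positions(linker_a1, linker_a2)
print(positions)  # [9, 10]

# Assumption: the barcode is the run of differing bases, counted from 1.
start, length = positions[0], len(positions)
barcode_1 = linker_a1[start - 1:start - 1 + length]
barcode_2 = linker_a2[start - 1:start - 1 + length]
print(barcode_1, barcode_2)  # CG AT
```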

**guoliang** · 03-21-2013, 05:51 AM

Good to know you have fixed the issue. The manual is outdated. An updated version of the manual can be expected.

**frymor** · 03-21-2013, 05:58 AM

Now on to the next problem.

Until now I worked with the given files from the tool web site (the separate head and tail files).

Now I would like to analyze my data. This data is in single-end fastq format.

I don't have a clue how to run the command now, as the manuals make no mention of fastq files.
The fastq file contains both tags and both linkers in one long read.

Can I run the program with just the

Code:

--head filename.fastq

option?

Do I need to cut it on my own?

The fasq files (this is not a typing error) that were provided with the program look like this (the head file):

Code:

@GA001-PE-R00056-26052008-F:1:1:298:699/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+GA001-PE-R00056-26052008-F:1:1:298:699/1
VVVVVVVVVVVVVVVXXXXXTTTTTMMMMMHHHHHHH
@GA001-PE-R00056-26052008-F:1:1:121:314/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+GA001-PE-R00056-26052008-F:1:1:121:314/1
VVVVVVVVVVVVVVVUUUUUKNKKSNNKLLECGCGEG

Here I have a shorter read with only one tag+linker pair. Each header ends with /1 to show that the read is the head.

My files look like this:

Code:

@HWI-ST225:523:D1AY5ACXX:8:1101:1566:2149 1:N:0:GTGGCC
GCATACCCTCCCTGTCTCAGTTGCTGTTGAAAGAAGAAATGTTGGATAAGATATCGCGGCCGCGATATCTTATCCAACGAAGCCAAAACCCTCGCAGTCTG
+
??@DDDDDHHHHHIGIHGHICHEGAHC<HHII9?FB3?F4C<FDD>0?9D*99*??D@;AA9=?'''3@@>@A@>::?B?2?-?CC3<A??8?B@B#####
@HWI-ST225:523:D1AY5ACXX:8:1101:1629:2247 1:N:0:GTGGCC
TGTCCTGTTGCGTGTCTCAGTCAATCGTGAATACATAACATGTTGGATAAGATATCGCGGCCGCGATATCTTATCCAACTAAGGCGATTCTCTCTGCAGCC
+
@@@DFDEEHFHGF1<CFHIIIGFB4CFFHII>@GGGHICGBG?BFH@FGBHGHGIIIFGEHEFC@B?@BC:3@AC@CCCCCCC<3<<<BCCCCCCA@####

In these I have both linkers and both tags in one read. Does the program know how to handle these files?

I would appreciate any kind of help.

Assa

**guoliang** · 03-21-2013, 07:31 AM

You can't use the Java program in the published ChIA-PET Tool package for single-read linker filtering.

I have a Java program for single-read linker filtering. I am considering where to put it.

**frymor** · 03-21-2013, 07:54 AM

I would suggest putting it in the same directory as the other LinkerFilter file, just with a different name.

But just to understand it correctly: if I have a fastq file as in my example above, where the structure of the read is

HTML Code:

tag <->linker[A|B]<->linker[A|B]<->tag

I can't use it with this tool?

So basically I need to cut the reads myself between the two linker sequences and add /1 or /2 to the header so that the tool can work with them?

Will that suffice to run the program in PE mode?
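
The cutting step described above could look roughly like the Python sketch below. It is only an illustration, not a tested replacement for the tool's linker filter. The function `split_read` and the junction string `GCGGCCGC` (the last four bases of one linker followed by the first four bases of the reverse-complemented linker) are assumptions. The thread does not confirm whether ChIA-PET Tool accepts reads prepared this way, or whether the tail must also be reverse-complemented to match what `--flip-tail` expects.

```python
def split_read(header, seq, qual, junction="GCGGCCGC"):
    """Split one single-end record at the middle of the linker-linker junction.

    Returns two (header, sequence, quality) tuples, or None when the junction
    is not found. Only the first occurrence of the junction is used.
    """
    idx = seq.find(junction)
    if idx < 0:
        return None
    cut = idx + len(junction) // 2
    read_id = header.split()[0]  # drop the Illumina comment after the space
    head = (read_id + "/1", seq[:cut], qual[:cut])
    tail = (read_id + "/2", seq[cut:], qual[cut:])
    return head, tail


# First read from the example above; the quality string is a placeholder.
header = "@HWI-ST225:523:D1AY5ACXX:8:1101:1566:2149 1:N:0:GTGGCC"
seq = ("GCATACCCTCCCTGTCTCAGTTGCTGTTGAAAGAAGAAATGTTGGATAAGATATCGCGG"
       "CCGCGATATCTTATCCAACGAAGCCAAAACCCTCGCAGTCTG")
qual = "I" * len(seq)

head, tail = split_read(header, seq, qual)
for name, bases, quals in (head, tail):
    print(name)
    print(bases)
    print("+")
    print(quals)
```

Each half still carries its part of the linker, as in the head and tail files shipped with the tool, so linker filtering would still have to run afterwards.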

Assa

PS
It would be great if we could test the single-end script for the tool.

**frymor** · 03-21-2013, 07:57 AM

Another quick question - how did you generate the head and tail fastq files? Did you have them from a paired-end experiment?

Just so I know for next time: does it make more sense to run paired-end sequencing when working with ChIA-PET data?

Assa

**frymor** · 03-25-2013, 04:51 AM

Next problem

After we managed to solve the problem with the Java program (thanks a lot, Guoliang), I am encountering another problem, this time with the script batmap.py.

This is the last entry in my log file:

Code:

2013-03-25 12:04:36,212  INFO [CSA Mapper/chr1] Merging the outputs and converting it to format recognized by ChIA-PET pipeline...
2013-03-25 12:04:36,212 DEBUG [CSA Mapper/chr1] START [cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map]
2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] Execution failed: 'cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map'
2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] > Traceback (most recent call last):
  File "/export/chiapet/src/python/pre/batmap.py", line 43, in <module>
    sys.exit(main())
  File "/export/chiapet/src/python/pre/batmap.py", line 25, in main
    id, hseq, tseq = line.split('\t')
ValueError: too many values to unpack
cat: write error: Broken pipe

This error occurs because of a formatting problem in the *.bat file. The script expects a certain format, which strangely changes in specific rows.
This is how the *.bat file looks where the error happens:

Code:

>chr1.15945:1	AAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC	GAGATATCGGATCCAACATAAATCCACTCAGGCTCAA
H	-	 chr1:61671579;	0	19
H	-	 chr1:88019527;	2	19
H	+	 chr1:14799656;	1	19
H	+	 chr1:94966502;	1	19
@
>chr1.15949:1	---AAAAAAGGGGCGCGATATC	AACGTTGGATCCGATATCG	CG	CG
@
>chr1.15953:1	AAAAAAGGGGGGATTGAAATGGTTGGATCCGATATCG	CCCGATATCGGATCCAACGCAGCTACTTGGGAGGCTG
H	-	 chr1:79407253;	0	19
H	-	 chr1:51165829;	2	19
H	+	 chr1:238895172;	2	19
H	+	 chr1:116055838;	2	19
@

The header for each part has three elements separated by \t. The middle header (chr1.15949:1) has more elements, and I can't understand why they are there.
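
The failing unpack can be reproduced, and the offending records located, with a small check like the one below. This is a sketch only: `parse_header` and `find_bad_headers` are hypothetical helpers, not functions from batmap.py, and the check assumes that every line starting with `>` should have exactly three tab-separated fields.

```python
def parse_header(line):
    """Return (read_id, head_seq, tail_seq), or None if the field count is not 3."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None
    return tuple(fields)


def find_bad_headers(lines):
    """Yield (line_number, line) for header lines without exactly three fields."""
    for number, line in enumerate(lines, start=1):
        if line.startswith(">") and parse_header(line) is None:
            yield number, line.rstrip("\n")


# A few lines copied from the *.bat excerpt above, with tabs written as \t.
example = [
    ">chr1.15945:1\tAAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC\tGAGATATCGGATCCAACATAAATCCACTCAGGCTCAA\n",
    "H\t-\t chr1:61671579;\t0\t19\n",
    "@\n",
    ">chr1.15949:1\t---AAAAAAGGGGCGCGATATC\tAACGTTGGATCCGATATCG\tCG\tCG\n",
    "@\n",
]

bad = list(find_bad_headers(example))
for number, line in bad:
    print(number, repr(line))
```

Running the same check over the whole *.bat file (for example with `find_bad_headers(open(path))`) would show how many records are affected before any decision is made about skipping or repairing them.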

This is (I think) the part of the *link file that is extracted into this part of the *bat file, where the problem happens.

Code:

GTCCTTCAGAGATGTCTCAA    TTTTGTTATGTTCTCTCCAA
GTCCTTCAGAGATGTCTCAAAAAAGGGGCGCGATATC   CGATATCGGATCCAACGTTTTGTTATGTTCTCTCCAA
AAAAAAGGGGCGCGATATC     AACGTTGGATCCGATATCG
Score: 23
---AAAAAAGGGGCGCGATATC  AACGTTGGATCCGATATCG     CG      CG
GTT------GGATC-CGATATC  ---GTTGGATCCGATATCG
         ||XX| |||||||     ||||||||||||||||
12      19
8       15
3       19
0       16

Can anyone explain this kind of problem?

Thanks for any help.

Assa

ChIA-PET tool