  • ChIA-PET tool

    Hi,

    I was wondering if anyone has tried the ChIA-PET Tools pipeline. We are trying to install it now and are having difficulties due to various problems.

    As the manual is not much help, we would like to ask people here for their opinion.
    Was anybody able to run it?

    To make my problem clearer: what are the head and tail sequences, and how do I get them from a normal fastq file?
    Do I need to split the file myself, or does the tool do it for me?

    I would appreciate any help or suggestions.

    Thanks
    Assa

  • #2
    Hi, see here:
    [link to a thread in the forum section "Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc"]


    I would recommend Simon's SeqMonk for this kind of analysis.

    • #3
      Thanks, but I have already read this post, as well as the (very) few others about ChIA-PET data. What I would like to know is how SeqMonk works with this kind of data.

      I have two fastq files, which I can't map, as they still have the two linkers inside them.

      Did you work with ChIA-PET data in SeqMonk?

      I would appreciate any kind of help.

      Assa

      PS

      In general I find it really amazing that the technique is already a few years old, yet only a few people are working with it and even fewer are willing to share their information.

      • #4
        Hi Frymor,

        Could you please name the exact questions or issues you ran into with ChIA-PET Tool?

        You need the linker filtering script to identify the linker category and extract the real DNA tags.
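A minimal sketch of what "identifying the linker category" can mean for a single read, assuming the layout tag + half-linker + reverse complement of a half-linker + tag. The two half-linker sequences are the ones quoted from the config file in post #5. The function names and the "same"/"mixed" labels are hypothetical, not ChIA-PET Tool's own categories, and matching here is exact (no mismatches), unlike a real linker filter. In ChIA-PET designs with two barcoded half-linkers, mixed pairs are usually treated as chimeric ligation products.

```python
# Sketch: which pair of half-linkers joins the two tags in a single read?
# Assumed read layout: tag + half-linker + reverse complement of a half-linker + tag.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Half-linkers as quoted from the config file in post #5.
HALF_LINKERS = {
    "linker_a.1": "GTTGGATCCGATATCGCGG",
    "linker_a.2": "GTTGGATCATATATCGCGG",
}


def linker_pairs(read, half_linkers=HALF_LINKERS):
    """Return every (first, second) pair whose junction first + revcomp(second)
    occurs as an exact substring of the read (no mismatches allowed)."""
    hits = []
    for name1, seq1 in half_linkers.items():
        for name2, seq2 in half_linkers.items():
            if seq1 + revcomp(seq2) in read:
                hits.append((name1, name2))
    return hits


def category(pair):
    """Hypothetical label: 'same' if both halves come from one half-linker, else 'mixed'."""
    first, second = pair
    return "same" if first == second else "mixed"


# Synthetic read: made-up tags around an a.1/a.2 junction.
a1, a2 = HALF_LINKERS["linker_a.1"], HALF_LINKERS["linker_a.2"]
read = "ACGTACGTACGTACGTACGT" + a1 + revcomp(a2) + "TTGCATTGCATTGCATTGCA"
for pair in linker_pairs(read):
    print(pair, category(pair))  # ('linker_a.1', 'linker_a.2') mixed
```

Note that the reads shown in post #11 contain GTTGGATAAGATATCGCGG, which matches neither configured half-linker, so that sequence would have to be added to the dictionary before such reads are classified at all.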

        Best regards,
        Guoliang

        • #5
          Well, that is exactly the point.

          As far as I understand, it is something that happens automatically.

          I can't even figure out how to run the program.
          I am trying to work with the head and tail sequences provided by the people who created the tool.

          The problem is that I always get the error message that the linkers are not found.

          This is the command I use:

          Code:
          python ~/chiapet/src/python/main/csa_mapper.py --asm hg18 --lib lib18233 --proc 4 --head IHH015_1r56_headseq.txt --tail IHH015_1r56_tailseq.txt --run 3-4 --linker linker_a
          The linkers I am using are the ones set in the config file:
          'linker_a.1': 'GTTGGATCCGATATCGCGG'
          'linker_a.2': 'GTTGGATCATATATCGCGG'
          But I always get the same error, that the linkers are not found:
          Code:
          cat: lib18233_link.part0002.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
          cat: lib18233_link.part0003.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
          cat: lib18233_link.part0004.GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC: No such file or directory
          The complete output is in the attachment.
          It looks like the two linkers are being combined together, one in forward and one in reverse complement.
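A quick string check on the missing file names (hypothetical helper, plain Python): the suffix appended to lib18233_link.part000N turns out to be linker_a.1 followed by its own reverse complement, not linker_a.1 joined to linker_a.2. Why the pipeline names its per-linker output files this way is not documented in the thread; the check below only compares sequences.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


linker_a1 = "GTTGGATCCGATATCGCGG"  # 'linker_a.1' from the config file
linker_a2 = "GTTGGATCATATATCGCGG"  # 'linker_a.2' from the config file
# Sequence appended to the missing lib18233_link.part000N.* file names
suffix = "GTTGGATCCGATATCGCGGCCGCGATATCGGATCCAAC"

print(suffix == linker_a1 + revcomp(linker_a1))  # True
print(suffix == linker_a1 + revcomp(linker_a2))  # False
```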

          Am I using the wrong script?
          Last edited by frymor; 03-19-2013, 01:20 PM.

          • #6
            The errors start at the linker filtering step. Have you compiled the Java programs?

            • #7
              And how do I do that?

              As far as I know, I did. In the directory that $CHIAPETPATH points to, I have bin/LGL/chiapet/LinkerFilter.class, as well as some more files.

              What I don't understand, and cannot find any explanation for, is this error message:

              Code:
              /export/chiapet/bin:/export/chiapet/lib/java/commons-cli-1.2.jar:/export/chiapet/lib/java/guava-r05.jar 
              sg.edu.astar.gis.chiapet.LinkerFilter
              --flip-tail /export/chiapet/prep/lib18233/IHH015_1r56_headseq.txt.part0002.yut7eo 
              /export/chiapet/prep/lib18233/IHH015_1r56_tailseq.txt.part0002.jvtNMT lib18233_link.part0002 GTTGGATCCGATATCGCGG GTTGGATCATATATCGCGG 1>/dev/null 2>&1]
              Where is that coming from?

              The file LinkerFilter.java exists at least twice in this tree: once under 3rd/LGL and once under src/java/sg/edu/astar/gis/chiapet.
              Then I have LinkerFilter.class files in both:
              Code:
              export/chiapet/bin/LGL/chiapet/LinkerFilter.class
              export/chiapet/bin/sg/edu/astar/gis/chiapet/LinkerFilter.class
              It would help to know which of the two I need, and also how to point the config.py file to it.
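The logged command names the class sg.edu.astar.gis.chiapet.LinkerFilter and puts /export/chiapet/bin first on the classpath. Java looks up a fully qualified class name by turning its package segments into subdirectories of each classpath directory, so this command would load bin/sg/edu/astar/gis/chiapet/LinkerFilter.class rather than the copy under bin/LGL. The helper below is a hypothetical illustration of that lookup for directory entries only; JAR entries are skipped.

```python
from pathlib import PurePosixPath


def class_file_candidates(classpath, class_name):
    """Where the JVM would look for class_name in each directory entry of a
    ':'-separated classpath (JAR entries are skipped in this sketch)."""
    *packages, simple_name = class_name.split(".")
    candidates = []
    for entry in classpath.split(":"):
        if entry.endswith(".jar"):
            continue  # classes inside JARs are not resolved here
        candidates.append(PurePosixPath(entry, *packages, simple_name + ".class"))
    return candidates


# Classpath as it appears in the log line above
cp = ("/export/chiapet/bin:/export/chiapet/lib/java/commons-cli-1.2.jar:"
      "/export/chiapet/lib/java/guava-r05.jar")
for path in class_file_candidates(cp, "sg.edu.astar.gis.chiapet.LinkerFilter"):
    print(path)  # /export/chiapet/bin/sg/edu/astar/gis/chiapet/LinkerFilter.class
```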

              Thanks

              Assa

              • #8
                Now that the Java programs are compiled, you get different messages.

                You'd better use the LinkerFilter program from the original package: export/chiapet/bin/sg/edu/astar/gis/chiapet/LinkerFilter.class.

                Best regards,
                Guoliang

                • #9
                  Yes, I managed it now, no thanks to the manual.

                  I had to add some output statements to my Python and Java files to finally find out that the command hard-coded in csa_mapper.py is not the right one. This was the command:
                  Code:
                  #    t = tshell('''{i} {script} --flip-tail {fasq1} {fasq2} {output} 
                  #                  {link1} {link2} 1>/dev/null 2>&1'''
                  and I changed it to this (with new parameters):
                  Code:
                      t = tshell('''{i} {script} {fasq1} {fasq2} {output} 
                                    {link1} {link2}  --bar-start_1 9 --bar-start_2 9 --bar-length_1 2 --bar-length_2 2 --flip-tail 1>tem.txt 2>&1'''
                  Thanks for the help and for the advice about compiling the Java programs.
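The edited template passes the input files, output prefix and linkers as positional arguments first, and the barcode options and --flip-tail after them. A sketch of the same argument list built in Python, assuming {i} expands to a Java launcher with a classpath and {script} to the LinkerFilter class name (both are assumptions; tshell's internals are not shown in the thread). The function names and file names are hypothetical; run_and_log mirrors the 1>tem.txt 2>&1 redirection without going through a shell.

```python
import shlex
import subprocess


def linker_filter_args(launcher, main_class, head, tail, output, link1, link2,
                       bar_start=(9, 9), bar_length=(2, 2), flip_tail=True):
    """Argument list in the order of the edited template: positional inputs
    first, then the barcode options, then --flip-tail."""
    args = list(launcher) + [main_class, head, tail, output, link1, link2]
    args += ["--bar-start_1", str(bar_start[0]), "--bar-start_2", str(bar_start[1]),
             "--bar-length_1", str(bar_length[0]), "--bar-length_2", str(bar_length[1])]
    if flip_tail:
        args.append("--flip-tail")
    return args


def run_and_log(args, log_path):
    """Run without a shell, sending stdout and stderr to one file
    (the effect of '1>tem.txt 2>&1'); returns the exit code."""
    with open(log_path, "w") as log:
        return subprocess.run(args, stdout=log, stderr=subprocess.STDOUT).returncode


args = linker_filter_args(
    ["java", "-cp", "/export/chiapet/bin"],              # assumed launcher ({i})
    "sg.edu.astar.gis.chiapet.LinkerFilter",             # class from the log in post #7
    "head.fasq", "tail.fasq", "lib18233_link.part0002",  # hypothetical file names
    "GTTGGATCCGATATCGCGG", "GTTGGATCATATATCGCGG",        # linker_a.1, linker_a.2
)
print(shlex.join(args))  # command line as a shell would see it
```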

                  • #10
                    Good to know you have fixed the issue. The manual is outdated; an updated version of the manual can be expected.

                    • #11
                      Now on to the next problem.

                      Until now I have worked with the example files from the tool's web site (the separate head and tail files).

                      Now I would like to analyze my own data, which is in single-end fastq format.

                      I have no idea how to run the command now, as fastq files are not mentioned in the manuals.
                      The fastq file contains both tags and the linker in one long read.

                      Can I run the program with just the
                      Code:
                      --head filename.fastq
                      option?

                      Do I need to cut the reads myself?

                      The fasq files (this is not a typo) that came with the program look like this (the head file):
                      Code:
                      @GA001-PE-R00056-26052008-F:1:1:298:699/1
                      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                      +GA001-PE-R00056-26052008-F:1:1:298:699/1
                      VVVVVVVVVVVVVVVXXXXXTTTTTMMMMMHHHHHHH
                      @GA001-PE-R00056-26052008-F:1:1:121:314/1
                      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                      +GA001-PE-R00056-26052008-F:1:1:121:314/1
                      VVVVVVVVVVVVVVVUUUUUKNKKSNNKLLECGCGEG
                      Here each read is shorter and contains only one tag+linker pair. Each header ends with /1 to mark the read as the head part.
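A sketch for checking that a head file and a tail file stay in step before running the pipeline. It assumes 4-line FASTQ records and that tail records carry /2 where head records carry /1 (the /1 and /2 convention comes up again in post #13). read_fastq, pair_id and in_sync are hypothetical helpers; the example parses the two head records shown above.

```python
from itertools import islice, zip_longest


def read_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records.
    The '+' line may repeat the header (as in the example .fasq files) or be bare."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        header, seq, _, qual = record
        yield header[1:], seq, qual


def pair_id(header):
    """Read name without the Illumina comment field and without a trailing /1 or /2."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def in_sync(head_records, tail_records):
    """True if head and tail list the same read IDs in the same order."""
    for head, tail in zip_longest(head_records, tail_records):
        if head is None or tail is None or pair_id(head[0]) != pair_id(tail[0]):
            return False
    return True


head_text = """@GA001-PE-R00056-26052008-F:1:1:298:699/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+GA001-PE-R00056-26052008-F:1:1:298:699/1
VVVVVVVVVVVVVVVXXXXXTTTTTMMMMMHHHHHHH
@GA001-PE-R00056-26052008-F:1:1:121:314/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+GA001-PE-R00056-26052008-F:1:1:121:314/1
VVVVVVVVVVVVVVVUUUUUKNKKSNNKLLECGCGEG"""

head = list(read_fastq(head_text.splitlines()))
print([pair_id(h) for h, _, _ in head])
```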

                      My files look like this:
                      Code:
                      @HWI-ST225:523:D1AY5ACXX:8:1101:1566:2149 1:N:0:GTGGCC
                      GCATACCCTCCCTGTCTCAGTTGCTGTTGAAAGAAGAAATGTTGGATAAGATATCGCGGCCGCGATATCTTATCCAACGAAGCCAAAACCCTCGCAGTCTG
                      +
                      ??@DDDDDHHHHHIGIHGHICHEGAHC<HHII9?FB3?F4C<FDD>0?9D*99*??D@;AA9=?'''3@@>@A@>::?B?2?-?CC3<A??8?B@B#####
                      @HWI-ST225:523:D1AY5ACXX:8:1101:1629:2247 1:N:0:GTGGCC
                      TGTCCTGTTGCGTGTCTCAGTCAATCGTGAATACATAACATGTTGGATAAGATATCGCGGCCGCGATATCTTATCCAACTAAGGCGATTCTCTCTGCAGCC
                      +
                      @@@DFDEEHFHGF1<CFHIIIGFB4CFFHII>@GGGHICGBG?BFH@FGBHGHGIIIFGEHEFC@B?@BC:3@AC@CCCCCCC<3<<<BCCCCCCA@####
                      In these, both linkers and both tags are in one read. Does the program know how to handle these files?

                      I would appreciate any kind of help.

                      Assa

                      • #12
                        You can't use the Java program in the published ChIA-PET Tool package for the single-read linker filtering.

                        I have a Java program for single-read linker filtering and am considering where to put it.

                        • #13
                          I would suggest putting it in the same directory as the other LinkerFilter file, just with a different name.

                          But just to make sure I understand correctly: if I have a fastq file as in my example above, where the structure of the read is

                          Code:
                          tag <->linker[A|B]<->linker[A|B]<->tag
                          then I can't use it with this tool?

                          So basically I need to cut the reads myself between the two linker sequences and add /1 or /2 to the end of the header so that the tool can work with them?

                          Will that be enough to run the program in PE mode?
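A minimal sketch of that manual split, under stated assumptions: the read is tag + half-linker + reverse complement of the same half-linker + tag, the half-linker is GTTGGATAAGATATCGCGG (the sequence visible in the two reads from post #11, not one of the config-file linkers), and the match must be exact. split_single_read is a hypothetical helper. It keeps the tail in the original read orientation; whether the tool expects it reverse-complemented (compare the --flip-tail option) is left open.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_single_read(header, seq, qual, half_linker):
    """Cut 'tag + half + revcomp(half) + tag' between the two half-linkers.

    Returns (head_record, tail_record) as FASTQ text named <id>/1 and <id>/2,
    or None if the full linker is not found exactly. The tail keeps the
    original read orientation (no reverse complement)."""
    pos = seq.find(half_linker + revcomp(half_linker))
    if pos < 0:
        return None
    cut = pos + len(half_linker)             # boundary between the two halves
    read_id = header.lstrip("@").split()[0]  # drop the Illumina comment field
    head = f"@{read_id}/1\n{seq[:cut]}\n+\n{qual[:cut]}\n"
    tail = f"@{read_id}/2\n{seq[cut:]}\n+\n{qual[cut:]}\n"
    return head, tail


# First read from post #11; HALF is the half-linker visible in these reads (assumption).
HALF = "GTTGGATAAGATATCGCGG"
header = "@HWI-ST225:523:D1AY5ACXX:8:1101:1566:2149 1:N:0:GTGGCC"
seq = "GCATACCCTCCCTGTCTCAGTTGCTGTTGAAAGAAGAAATGTTGGATAAGATATCGCGGCCGCGATATCTTATCCAACGAAGCCAAAACCCTCGCAGTCTG"
qual = "??@DDDDDHHHHHIGIHGHICHEGAHC<HHII9?FB3?F4C<FDD>0?9D*99*??D@;AA9=?'''3@@>@A@>::?B?2?-?CC3<A??8?B@B#####"

head, tail = split_single_read(header, seq, qual, HALF)
print(head + tail, end="")
```

Reads with mismatches in the linker, or with a different half-linker on each side, come back as None here and would need a more tolerant matcher.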

                          Assa

                          PS
                          It would be great if we could test the single-end script for the tool.

                          • #14
                            Another quick question: how did you generate the head and tail fastq files? Did they come from a paired-end experiment?

                            Just so I know for next time: does it make more sense to do paired-end sequencing when working with ChIA-PET data?

                            Assa

                            • #15
                              Next problem.

                              After we managed to solve the problem with the Java program (thanks a lot, Guoliang), I am running into another problem, this time with the script batmap.py.

                              These are the last entries in my log file:
                              Code:
                              2013-03-25 12:04:36,212  INFO [CSA Mapper/chr1] Merging the outputs and converting it to format recognized by ChIA-PET pipeline...
                              2013-03-25 12:04:36,212 DEBUG [CSA Mapper/chr1] START [cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map]
                              2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] Execution failed: 'cat /export/chiapet/prep/chr1/chr1.linker_a.1.linkd4PDdJ.part0001._Qe3V9.bat | /usr/bin/python /export/chiapet/src/python/pre/batmap.py chr1 >> /export/chiapet/prep/chr1/chr1.linker_a.1.link.map'
                              2013-03-25 12:04:37,565 ERROR [CSA Mapper/chr1] > Traceback (most recent call last):
                                File "/export/chiapet/src/python/pre/batmap.py", line 43, in <module>
                                  sys.exit(main())
                                File "/export/chiapet/src/python/pre/batmap.py", line 25, in main
                                  id, hseq, tseq = line.split('\t')
                              ValueError: too many values to unpack
                              cat: write error: Broken pipe
                              This error is caused by a formatting problem in the *.bat file. The script expects a certain format, which strangely changes in specific rows.
                              This is how the *.bat file looks where the error happens:
                              Code:
                              >chr1.15945:1	AAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC	GAGATATCGGATCCAACATAAATCCACTCAGGCTCAA
                              H	-	 chr1:61671579;	0	19
                              H	-	 chr1:88019527;	2	19
                              H	+	 chr1:14799656;	1	19
                              H	+	 chr1:94966502;	1	19
                              @
                              >chr1.15949:1	---AAAAAAGGGGCGCGATATC	AACGTTGGATCCGATATCG	CG	CG
                              @
                              >chr1.15953:1	AAAAAAGGGGGGATTGAAATGGTTGGATCCGATATCG	CCCGATATCGGATCCAACGCAGCTACTTGGGAGGCTG
                              H	-	 chr1:79407253;	0	19
                              H	-	 chr1:51165829;	2	19
                              H	+	 chr1:238895172;	2	19
                              H	+	 chr1:116055838;	2	19
                              @
                              The header of each record has three tab-separated elements. The middle header has more, and I can't understand why they are there.
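The traceback is consistent with this: line 25 of batmap.py unpacks line.split('\t') into exactly three names, and the marked header splits into five fields. A hypothetical sketch (not part of ChIA-PET Tool) that reproduces the check with a clearer error message and lists the '>' lines that would trigger it; the lines are copied from the excerpt above, with tabs written as \t.

```python
def parse_bat_header(line):
    """Split a '>' header of a *.bat file into (id, head_seq, tail_seq),
    raising a descriptive error instead of a bare unpacking error."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}: {line!r}")
    read_id, head_seq, tail_seq = fields
    return read_id, head_seq, tail_seq


def malformed_headers(lines):
    """(line_number, field_count) for every '>' line without exactly 3 fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            count = len(line.rstrip("\n").split("\t"))
            if count != 3:
                bad.append((number, count))
    return bad


good = ">chr1.15945:1\tAAAAAAGGGCTGCAAAATATGTTGGATCCGATATCGC\tGAGATATCGGATCCAACATAAATCCACTCAGGCTCAA"
hit = "H\t-\t chr1:61671579;\t0\t19"  # hit lines are not headers and are ignored
bad = ">chr1.15949:1\t---AAAAAAGGGGCGCGATATC\tAACGTTGGATCCGATATCG\tCG\tCG"

print(parse_bat_header(good)[0])            # >chr1.15945:1
print(malformed_headers([good, hit, bad]))  # [(3, 5)]
```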

                              This is the part of the *link file that (I think) was extracted into the problematic part of the *bat file:

                              Code:
                              GTCCTTCAGAGATGTCTCAA    TTTTGTTATGTTCTCTCCAA
                              GTCCTTCAGAGATGTCTCAAAAAAGGGGCGCGATATC   CGATATCGGATCCAACGTTTTGTTATGTTCTCTCCAA
                              AAAAAAGGGGCGCGATATC     AACGTTGGATCCGATATCG
                              Score: 23
                              ---AAAAAAGGGGCGCGATATC  AACGTTGGATCCGATATCG     CG      CG
                              GTT------GGATC-CGATATC  ---GTTGGATCCGATATCG
                                       ||XX| |||||||     ||||||||||||||||
                              12      19
                              8       15
                              3       19
                              0       16
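Assuming a well-formed *link line holds exactly two tag sequences separated by whitespace (like the first lines of the excerpt), the "Score" line, the gapped alignment and the coordinate pairs look like stray diagnostic output, and the gapped line reappears as the five-field header that breaks batmap.py. Whether those lines come from the extra output statements added while debugging in post #9 is not confirmed in the thread. A hypothetical checker for such lines:

```python
import re

# Assumed format of a valid *link line: two DNA sequences separated by whitespace.
DNA_PAIR = re.compile(r"[ACGTN]+\s+[ACGTN]+")


def suspicious_link_lines(lines):
    """(line_number, text) for lines that are not exactly two DNA sequences."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if not DNA_PAIR.fullmatch(line.strip())]


excerpt = [
    "GTCCTTCAGAGATGTCTCAA    TTTTGTTATGTTCTCTCCAA",
    "GTCCTTCAGAGATGTCTCAAAAAAGGGGCGCGATATC   CGATATCGGATCCAACGTTTTGTTATGTTCTCTCCAA",
    "AAAAAAGGGGCGCGATATC     AACGTTGGATCCGATATCG",
    "Score: 23",
    "---AAAAAAGGGGCGCGATATC  AACGTTGGATCCGATATCG     CG      CG",
    "12      19",
]
for number, text in suspicious_link_lines(excerpt):
    print(number, text)
```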
                              Can anyone explain this kind of problem?

                              Thanks for any help.

                              Assa
