  • #16
    Originally posted by Fabrice ODEFREY
    Hi KaiYe,

    I'm working with SOLiD data...and would like to use Pindel but couldn't find anything about it. is Pindel only for Illumina data?
    thanks in advance for your reply.
    Fabrice
    hi Fabrice,

    I don't have a procedure for SOLiD data, but I would be happy to explore this together with you.

    First you need to convert the data from color space to sequence space.

    Second, convert the sequences to the correct strand. Pindel assumes the data are paired-end, as with Illumina, so that the two reads of a pair face each other rather than lying on the same strand.

    You may then try my sam2pindel.cpp to extract reads and run Pindel.

    Please visit https://trac.nbic.nl/pindel and register as a Pindel user.

    Kai



    • #17
      thanks a lot Kai for your quick reply.
      for the second step is there a tool to do that?
      thanks again!
      Fabrice



      • #18
        Originally posted by Fabrice ODEFREY
        thanks a lot Kai for your quick reply.
        for the second step is there a tool to do that?
        thanks again!
        Fabrice
        The second step is fairly straightforward, but it requires knowing the strand orientation of your SOLiD reads. Some SOLiD data sets already satisfy the second requirement without any modification, but others need an additional conversion step.

        You may need to write a script to do that.

        Kai
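A script for that step could be as small as the Python sketch below. It only shows how to flip one read to the opposite strand (reverse-complement the bases and reverse the quality string). Which mate of each pair needs flipping depends on the library, so that decision is left to the caller; the function names are hypothetical.

```python
# Translation table for complementing bases; N stays N, case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a base-space sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_read(seq, qual):
    """Move a read to the opposite strand.

    The bases are reverse-complemented and the per-base quality string
    is reversed so that each quality value stays with its base.
    """
    return reverse_complement(seq), qual[::-1]
```

For pairs that lie on the same strand, applying `flip_read` to one mate of each pair would make the two reads face each other, which is the orientation Pindel expects.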



        • #19
          alright, that's what I thought, thanks!
          Fabrice



          • #20
            Originally posted by KaiYe
            I will send you my source code via email.
            Hi KaiYe,

            I am having the same problem as jtjli (http://seqanswers.com/forums/showthr...0820#post30820). I did as follows:

            1) Downloaded all files from http://www.ebi.ac.uk/~kye/pindel/v_0.2.0/. I aligned with BWA, processed with samtools and filtered by MAPQ (<30).
            2) Ran bam2pindel.pl on one paired-end sample (aligned using BWA). My BAM file is sorted and duplicates are removed, but it does not have the header expected by your program, so I used -om to force the script to run. A file was generated for each chromosome, e.g. myprefix.1.txt (chr1).
            3) I downloaded your source code from SourceForge (with svn) and compiled Pindel from scratch. It seems to work.
            4) I ran the following command:
            /home/Pindel_source_v0.2.2/pindel -f /home/hg19.fa -i /s_4_QC_sort_pind.bam_chr1.txt -o ./s4 -c chr1 empty

            but whichever chromosome I try, I always get "There are no reads for this chromosome":

            BreakDancer events: 0
            Processing chromosome: chr10
            Skipping chromosome: chr10
            ...

            Processing chromosome: chr1
            Chromosome Size: 249250621
            26926 10000
            Looking at chromosome chr1 bases 0 to 10000000.
            BinBorder 0 10000000
            There are no reads for this bin.
            Looking at chromosome chr1 bases 10000000 to 20000000.
            BinBorder 10000000 20000000
            There are no reads for this bin.
            ....
            Loading genome sequences and reads: 0 seconds.
            Mining, Sorting and output results: 0 seconds.

            What am I doing wrong? How did you solve jtjli's problem?
            Last edited by chariko; 05-09-2011, 11:36 PM. Reason: Incomplete



            • #21
              Originally posted by chariko
              Hi KaiYe,

              I am having the same problem as jtjli (http://seqanswers.com/forums/showthr...0820#post30820). I did as follows:

              1) Downloaded all files from http://www.ebi.ac.uk/~kye/pindel/v_0.2.0/. I aligned with BWA, processed with samtools and filtered by MAPQ (<30).
              2) Ran bam2pindel.pl on one paired-end sample (aligned using BWA). My BAM file is sorted and duplicates are removed, but it does not have the header expected by your program, so I used -om to force the script to run. A file was generated for each chromosome, e.g. myprefix.1.txt (chr1).
              3) I downloaded your source code from SourceForge (with svn) and compiled Pindel from scratch. It seems to work.
              4) I ran the following command:
              /home/Pindel_source_v0.2.2/pindel -f /home/hg19.fa -i /s_4_QC_sort_pind.bam_chr1.txt -o ./s4 -c chr1 empty

              but whichever chromosome I try, I always get "There are no reads for this chromosome":

              BreakDancer events: 0
              Processing chromosome: chr10
              Skipping chromosome: chr10
              ...

              Processing chromosome: chr1
              Chromosome Size: 249250621
              26926 10000
              Looking at chromosome chr1 bases 0 to 10000000.
              BinBorder 0 10000000
              There are no reads for this bin.
              Looking at chromosome chr1 bases 10000000 to 20000000.
              BinBorder 10000000 20000000
              There are no reads for this bin.
              ....
              Loading genome sequences and reads: 0 seconds.
              Mining, Sorting and output results: 0 seconds.

              What am I doing wrong? How did you solve jtjli's problem?
              hi,

              You should use -p for extracted reads; -i is for a configuration file.

              Pindel accepts two types of input:
              1. reads extracted with sam2pindel or bam2pindel, passed with -p
              2. a configuration file listing BAM files, passed with -i
              Each line of the configuration file gives the path to a BAM file, the insert size, and a sample label:
              /path/to/bam_1/BAM_1 400 sample_1
              /path/to/bam_2/BAM_2 400 sample_2
              ...
              /path/to/bam_n/BAM_n 400 sample_n

              You may also use -c chrN:start-end to restrict a run to a small region of a chromosome, which lets you parallelize the computation.

              Kai
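As a rough illustration of the above, the Python sketch below builds the text of a configuration file and one Pindel command per region. It uses only the options that appear in this thread (-f, -i, -o, -c). The helper names, the 10 Mb window size, the output-prefix naming and the assumption that region coordinates are 1-based and inclusive are all choices made for this sketch, not documented behaviour.

```python
def config_text(samples):
    """Build configuration-file text from (bam_path, insert_size, label) tuples."""
    return "".join(f"{bam} {insert_size} {label}\n"
                   for bam, insert_size, label in samples)


def regions(chrom, length, step=10_000_000):
    """Yield 'chrom:start-end' strings that tile a chromosome.

    Coordinates are assumed to be 1-based and inclusive.
    """
    for start in range(1, length + 1, step):
        yield f"{chrom}:{start}-{min(start + step - 1, length)}"


def pindel_commands(pindel, reference, config, out_prefix, chrom, length,
                    step=10_000_000):
    """Yield one argument list per region, suitable for subprocess.run()."""
    for region in regions(chrom, length, step):
        # Give every region its own output prefix so runs do not overwrite
        # each other.
        prefix = f"{out_prefix}_{region.replace(':', '_')}"
        yield [pindel, "-f", reference, "-i", config, "-o", prefix, "-c", region]
```

Each yielded list can be handed to a job scheduler or to `subprocess.run`. For reads extracted with bam2pindel or sam2pindel, the same idea applies with -p and the extracted-reads file in place of -i and the configuration file.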



              • #22
                sam2pindel

                Originally posted by KaiYe
                Would you please inform me your email address? I have cpp code to extract reads from sam files for Pindel.

                Thanks.

                Hi KaiYe,

                I'm trying to convert my BAM file (illumina single-end reads, aligned using Novoalign) to the pindel format, using sam2pindel but the output file is empty.

                I used the following command:
                ./sam2pindel novo.sam Output4Pindel.txt 300 test 0

                What am I doing wrong?


                Thanks in advance for your reply,

                Inbar



                • #23
                  Originally posted by icg
                  Hi KaiYe,

                  I'm trying to convert my BAM file (illumina single-end reads, aligned using Novoalign) to the pindel format, using sam2pindel but the output file is empty.

                  I used the following command:
                  ./sam2pindel novo.sam Output4Pindel.txt 300 test 0

                  What am I doing wrong?


                  Thanks in advance for your reply,

                  Inbar
                  hi Inbar,

                  sam2pindel requires mate information to be stored in each record. I guess Novoalign doesn't report that for your data.

                  Can you provide a few lines of your SAM records?

                  Kai
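A quick way to check this yourself is to count how many SAM records carry mate information. The Python sketch below uses only the standard SAM columns (FLAG bit 0x1 for "paired", plus RNEXT and PNEXT). Exactly which fields sam2pindel inspects is not stated in this thread, so the test in `has_mate_info` is an assumption, and both function names are made up.

```python
def has_mate_info(sam_line):
    """Return True if a tab-delimited SAM record describes its mate.

    Assumed criterion: the 'paired' FLAG bit (0x1) is set and the mate
    reference (RNEXT, column 7) and mate position (PNEXT, column 8) are
    filled in.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rnext, pnext = fields[6], int(fields[7])
    return bool(flag & 0x1) and rnext != "*" and pnext > 0


def count_mate_info(lines):
    """Return (records_with_mate_info, total_records), skipping '@' header lines."""
    total = with_mate = 0
    for line in lines:
        if line.startswith("@"):
            continue
        total += 1
        with_mate += has_mate_info(line)
    return with_mate, total
```

Running `count_mate_info(open("novo.sam"))` on a single-end alignment would be expected to report zero records with mate information, which would be consistent with an empty sam2pindel output.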



                  • #24
                    Originally posted by KaiYe
                    hi,

                    You should use -p for extracted reads; -i is for a configuration file.

                    Pindel accepts two types of input:
                    1. reads extracted with sam2pindel or bam2pindel, passed with -p
                    2. a configuration file listing BAM files, passed with -i
                    Each line of the configuration file gives the path to a BAM file, the insert size, and a sample label:
                    /path/to/bam_1/BAM_1 400 sample_1
                    /path/to/bam_2/BAM_2 400 sample_2
                    ...
                    /path/to/bam_n/BAM_n 400 sample_n

                    You may also use -c chrN:start-end to restrict a run to a small region of a chromosome, which lets you parallelize the computation.

                    Kai
                    I finally managed to get it to work. After following your instructions, I also had to change my input files (those generated by bam2pindel): when I compared them with the demo data, mine had only one "@" on each line instead of the two in the demo data. I don't know why that happened, since I obtained those input files with bam2pindel, but anyway it works now.

                    Thanks a lot



                    • #25
                      Hi Kai,

                      Thank you for the quick reply!

                      Here are the first 20 lines of my SAM file.

                      Many thanks,
                      Inbar

                      @HD VN:1.0 SO:unsorted
                      @PG ID:novoalign VN:V2.07.05 CL:novoalign -d NC_007530.fna.nix -f output4.fastq -r ALL -o SAM
                      @SQ SN:gi|50196905|ref|NC_007530.2| AS:NC_007530.fna.nix LN:5227419
                      @SQ SN:gi|47566322|ref|NC_007322.2| AS:NC_007530.fna.nix LN:181677
                      @SQ SN:gi|50163691|ref|NC_007323.3| AS:NC_007530.fna.nix LN:94830
                      4:1:1169:930:Y 4 * 0 0 * * 0 0 NAAACAGTGAAGTATATAACGTACATGTCNAANNNNNNNNNNNNNNGNNNNNNNNNANNNNNNNNNNNNNNNNN #+,)-23444@@8@@@@@@@C@@@C################################################# PG:Z:novoalign ZS:Z:NM
                      4:1:1205:937:Y 0 gi|50196905|ref|NC_007530.2| 3730834 150 1S73M * 0 0 NAAAGAAGAATTACATCGCCATCTGTAGAATGAGCATAAGCTTTCACTACCGCTTCATCTAAAGTATCGACACT #(()'3..22@@@@@@@@7@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ PG:Z:novoalign AS:i:10 UQ:i:10 NM:i:0 MD:Z:73
                      4:1:1231:930:Y 16 gi|50196905|ref|NC_007530.2| 899640 150 73M1S * 0 0 GAAAAGACCGAATTATCAGAATGTGTCGAATCTTCTTTTGAGAAAGTTCTTGATAACGAATGGTTTTGTATAGN 22CC@@@@@@C@@C@@@@C@C@CCC@@@@@@222@@@@@CC@@@C@C@@C@@CCC@@@@@CC2257777+000# PG:Z:novoalign AS:i:6 UQ:i:6 NM:i:0 MD:Z:73
                      4:1:1259:938:Y 4 * 0 0 * * 0 0 NAGCAAGGCAATGTAAAAGGCGAAAGACAAACAGCGGAAAGAGAAATTGAAATACAAAATAAATTAAGAAATAC ########################################################################## PG:Z:novoalign ZS:Z:QC
                      4:1:1290:941:Y 0 gi|47566322|ref|NC_007322.2| 169212 150 1S73M * 0 0 NGAAAATGCTCTTCAACTATTTGTATAGTCTATGTCACTCTTTTTTGGACTTTCCATATTGGGAGGGATGATTA #(*()00322@@@@C@C@@C@C@C@C@@@C@@@C@@@@@@@@@2222@@@C@C@@@@@C@@@@:@222:<@@@@ PG:Z:novoalign AS:i:10 UQ:i:10 NM:i:0 MD:Z:73
                      4:1:1307:933:Y 0 gi|50196905|ref|NC_007530.2| 4528377 150 1S73M * 0 0 NACGTAGTGGAATAGTTGAAAATTTAGATGAAGCTGATCCAGAAATTATTTTCTACACAAAAAAGCTCAGAGCA #(.((*,*))77755/0/00@@@@@@@@@@@@@:@@@@@@57055@@@@@22222@@@@@@2222@@@@@@7@@ PG:Z:novoalign AS:i:13 UQ:i:13 NM:i:0 MD:Z:73
                      4:1:1349:932:Y 0 gi|50196905|ref|NC_007530.2| 5190228 150 1S73M * 0 0 NGAACTATTTGAAAGATTATCTACGACTATAATTTTATAATTATTATTTAATAATTCTACACATGTATGACTAC #)*,.3103.@@@7@3<<<:@@@@@@@@@@@@@@@@@22@@@@@@@@@22@@@@@@@@@@<<<:::::::@@@@ PG:Z:novoalign AS:i:8 UQ:i:8 NM:i:0 MD:Z:73
                      4:1:1427:930:Y 4 * 0 0 * * 0 0 NATGTATTTGAATTATAACGTGATTCAATTTGGTTCTGGCGCAAGGAACCCAAGGGAGTTATAACTAACTCCCT ########################################################################## PG:Z:novoalign ZS:Z:QC
                      4:1:1455:932:Y 16 gi|50196905|ref|NC_007530.2| 4436146 150 73M1S * 0 0 CAAGACCTCCGGAATATGCTAATACAACTTTTTTCTTCTCCATTTTGCATCCCCCTAAAGAATAAATATTCATN @C@@@@C@CC@CC@@C@C@@@@@@@@@22222@@@@@@@@CC@@@CC@@@CCC@@22CC22C@C55566*(,,# PG:Z:novoalign AS:i:8 UQ:i:8 NM:i:0 MD:Z:73
                      4:1:1503:932:Y 0 gi|50196905|ref|NC_007530.2| 5174187 150 1S73M * 0 0 NAGAAGGAGAAACTTCAAATACAGTGAAACACCGCGATGGCCGTGTTTATGCGGAAGTAAGTGCAAAACTAACA #(*)&)*)*+<77<:58777:::::<:<<:8888885888:<:<:<<3<<:::1:@@@@@@@@@@@@@@@::<< PG:Z:novoalign AS:i:15 UQ:i:15 NM:i:0 MD:Z:73
                      4:1:1513:948:Y 16 gi|50196905|ref|NC_007530.2| 2854792 150 73M1S * 0 0 TGTAGAAAGTGAAAGTAAAAAAGATTCCAAAGACGCTCGTCCTTTTTCTCTATGAAATTCTTCTGCAAAATAAN C@C@C@CC@@@@@@@2222@C@@CC@@@C@C@@C@C@C@@@222C@@@C@@CCC@CCC@@@@@C71115-///# PG:Z:novoalign AS:i:6 UQ:i:6 NM:i:0 MD:Z:73
                      4:1:1536:944:Y 16 gi|50163691|ref|NC_007323.3| 77515 150 73M1S * 0 0 TTGCTTCAAGAAGGCGAAGAACAAATTTCTCTTTTCGATAATGTCACGCAACGAGAACAAGAAGTAAAGCTTAN @CC@@@@C@@CCC@@@C@22C@@@@@@@22@@C@@C@CC@@@@@@@@@@C@C@CC@@@@C@@C@58454,0*,# PG:Z:novoalign AS:i:7 UQ:i:7 NM:i:0 MD:Z:73
                      4:1:1696:942:Y 4 * 0 0 * * 0 0 NCTTATCTGCAATTGAAGGAATTAAAGTAGACAAACATTCAACTGGTGGTGTTGGTGATACAACAACATTAGTA ########################################################################## PG:Z:novoalign ZS:Z:QC
                      4:1:1724:952:Y 0 gi|50196905|ref|NC_007530.2| 641128 150 1S73M * 0 0 NAGATCTATTTTCGATAAAAATAACGAATGAAATTCCTACAATTGTGATGGACCAGAGAACGCCGACAAATGTA #+++-32223C22CC@@CC222222@@@@0:::::CC@@@CC@C@@@C@@CC@@CCC@CC@C@C@C@@@@@@@@ PG:Z:novoalign AS:i:6 UQ:i:6 NM:i:0 MD:Z:73
                      4:1:1766:932:Y 4 * 0 0 * * 0 0 NCATTAAGAAGTTTCATCATGTCCGCTGTAAACTGTTGTTCTAGTTCGTTACTTAAGACGCTTCCCTTTGAAAG ########################################################################## PG:Z:novoalign ZS:Z:QC



                      • #26
                        Originally posted by chariko
                        I finally managed to get it to work. After following your instructions, I also had to change my input files (those generated by bam2pindel): when I compared them with the demo data, mine had only one "@" on each line instead of the two in the demo data. I don't know why that happened, since I obtained those input files with bam2pindel, but anyway it works now.

                        Thanks a lot
                        one @ is enough.



                        • #27
                          pindel_filter

                          Hi Kai,

                          Thanks again to you and Eric_Wubbo for the new pindel and pindel2vcf. Is it still a good idea to filter the bam2pindel.pl result files before using the new pindel?

                          Thanks,

                          Dex



                          • #28
                            Originally posted by DexterDuncan
                            Hi Kai,

                            Thanks again to you and Eric_Wubbo for the new pindel and pindel2vcf. Is it still a good idea to filter the bam2pindel.pl result files before using the new pindel?

                            Thanks,

                            Dex
                            If you use BAM files as input, you certainly don't need any filtering. If bam2pindel.pl is used first to extract reads, you can use its output directly as input.

                            So in either case you don't need to filter.

                            Kai



                            • #29
                              new pindel output format

                              Hi Kai,

                              Would you briefly explain the new Pindel output for version 0.2.3 below?

                              Thanks,

                              Dex

                              ####################################################################################################
                              0 D 2 NT 0 "" ChrID 20 BP 74310 74313 BP_range 74310 74316 Supports 19 18 + 9 8 - 10 10 S1 110 SUM_MS 1016 1 NumSupSamples 1 1 blood 9 8 10 10



                              • #30
                                I will be more explicit.

                                Hi Kai,

                                Here is one SV from the output of the new version of Pindel. Could you explain how to get the number of normal reads versus SV reads? Also, the header for each SV is different in the new version, and for some reason I can't work out what it all means. I must admit, I need to become more educated about SVs.

                                ####################################################################################################
                                4 D 1 NT 0 "" ChrID 20 BP 34005 34007 BP_range 34005 34023 Supports 11 11 + 2 2 - 9 9 S1 30 SUM_MS 660 2 NumSupSamples 2 2 COLO-829 2 2 5 5 COLO-829-BL 0 0 4 4
                                CAACCAGATATGCCTCCTTACAAGAGATTCTTAAGGGAGCTCTAAACCTACAATCAAAAGAACAACACCTGCTACaAAAAAAAAAAAAAAAACATACTTATGCACATAAAGACACTATAAAGCAACTACACTATCAAGTCTACATAATAA
                                CTTAAGGGAGCTCTAAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCAC - 34175 60 COLO-829 @@EAS188_62:6:20:111:1106/2
                                CAAAAAAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCACATAAAGACACTATAAAGCAACTACA + 33660 60 COLO-829 @@EAS188_62:3:40:104:1946/1
                                CTAAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTTTGCACATAAAGACACTA - 34184 60 COLO-829 @@EAS139_60:7:37:896:889/2
                                AACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCAAAAAAAAACACTATAAAGCAACTACACTATCA - 34396 60 COLO-829 @@EAS139_60:5:24:381:681/1
                                AAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACAAACTTATGCACATAAAGACACTATA - 34196 60 COLO-829 @@EAS131_8:8:43:784:1438/2
                                TAAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCACATAAAGACACTAT - 34388 60 COLO-829 @@EAS131_6:8:39:243:1719/1
                                CTCTAAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCACATAAAGACAC + 33667 60 COLO-829 @@EAS25_5:1:80:1493:28/1
                                TCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCACATAAAGACACTATAAAGCAACTAC - 34198 60 COLO-829-BL @@USI-EAS39_8289_FC30GCV_PE:5:18:1550:123/1
                                CTTAAGGGAGCTCTAAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCAC - 34175 60 COLO-829-BL @@HWI-EAS300_8282_FC30BVC_PE:1:15:777:1187/1
                                TAAGGGAGCTCTAAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCACAT - 34164 60 COLO-829-BL @@HWI-EAS255_8291_FC30GRN_PE:2:73:533:1356/2
                                GAGCTCTAAACCTACAATCAAAAGAACAACACCTGCTAC AAAAAAAAAAAAAAAACATACTTATGCACATAAAGA - 34192 60 COLO-829-BL @@HWI-EAS138_4_FC30GP8:4:54:1227:1320/2


                                Thanks for all of your assistance,

                                Dex
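For anyone who wants to pull the numbers out of such a header line, here is a tentative Python sketch. The field meanings are my reading of the record and are not confirmed in this thread: after `Supports` come the total and unique supporting reads, `+` and `-` each give total and unique counts for one strand, the number after `SUM_MS` is followed by the number of samples, and each sample name is followed by four counts (+ total, + unique, - total, - unique). That reading is at least consistent with the arithmetic in both records above, where S1 equals (+ total + 1) x (- total + 1). The function name and dictionary keys are invented for this example.

```python
def parse_pindel_header(line):
    """Parse one Pindel 0.2.3-style header line into a dictionary.

    Field positions are inferred from the two example records in this
    thread and should be treated as an assumption.
    """
    t = line.split()
    record = {
        "index": int(t[0]),
        "type": t[1],                        # e.g. D for deletion
        "length": int(t[2]),
        "nt_length": int(t[4]),
        "nt_seq": t[5].strip('"'),
        "chrom": t[7],
        "bp": (int(t[9]), int(t[10])),
        "bp_range": (int(t[12]), int(t[13])),
        "support": int(t[15]),
        "unique_support": int(t[16]),
        "plus": (int(t[18]), int(t[19])),    # (total, unique) on + strand
        "minus": (int(t[21]), int(t[22])),   # (total, unique) on - strand
        "s1": int(t[24]),
        "sum_ms": int(t[26]),
        "n_samples": int(t[27]),
        "n_sup_samples": (int(t[29]), int(t[30])),
    }
    # Remaining tokens come in groups of five: sample name plus four counts.
    rest = t[31:]
    record["samples"] = {
        rest[i]: tuple(int(x) for x in rest[i + 1:i + 5])
        for i in range(0, len(rest), 5)
    }
    return record
```

Note that the header only counts reads that support the SV; it does not appear to report the number of normal (reference-supporting) reads, so that part of the question is not answered by parsing it.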

