  • boetsie
    Senior Member
    • Feb 2010
    • 245

    #61
    Originally posted by VidJa
    Hi Boetsie,

    Do you plan to support SAM/BAM output instead of the default Bowtie output? That way you could drop in any SAM/BAM-capable aligner, such as BWA (for longer reads with the sw option), Smalt http://www.sanger.ac.uk/resources/software/smalt/
    or Stampy http://www.well.ox.ac.uk/project-stampy
    and probably a whole bag of other aligners.
    After some testing with Bowtie, I came to the conclusion that using SAM/BAM output instead of the default Bowtie output is much slower.

    SAM output contains a line for every read, whether or not it maps to a contig, whereas Bowtie output contains only the reads that map. Since SSPACE maps against only a fraction of each contig (just its beginning and end), the number of reads that map is low. SSPACE goes through every output line, so the more output lines there are, the slower the program runs.

    Say there are 1M reads and only 10,000 of them map. Bowtie will generate 10,000 output lines, while SAM produces 1M output lines, so there are 100 times as many lines to read in.

    Therefore, I'll make it possible to supply tab-delimited files with information about paired reads in the format:

    <read1_tig> <read1_start> <read1_end> <read2_tig> <read2_start> <read2_end>

    I will provide a script that can convert SAM output files to a TAB file. This way, all SAM capable aligners can be used.
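A conversion of this kind could look roughly like the Python sketch below. It is not the script mentioned above, only an illustration under stated assumptions: the reads were mapped as pairs (so the SAM flag bits 0x40 and 0x80 identify the two mates), both mates must be mapped for a pair to be written, the end position is derived from the CIGAR string, and strand is not encoded because the post does not say how the TAB format represents it. The names `sam_to_tab` and `reference_span` are hypothetical.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar):
    """Number of reference bases an alignment covers (M, D, N, = and X consume the reference)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")

def sam_to_tab(sam_lines):
    """Yield one tab-delimited line per read pair in which both mates are mapped."""
    mates = defaultdict(dict)
    for line in sam_lines:
        if line.startswith("@"):            # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, contig = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or flag & 0x900:      # unmapped, secondary or supplementary
            continue
        mate = 1 if flag & 0x40 else 2      # assumption: paired-mode flags are set
        end = pos + reference_span(cigar) - 1
        mates[name][mate] = (contig, pos, end)
    for pair in mates.values():
        if 1 in pair and 2 in pair:
            yield "\t".join(str(value) for value in pair[1] + pair[2])
```

Each yielded line has the six columns listed above: contig, start and end for read 1, then the same for read 2.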

    In addition, multiple TAB files from different libraries can be given, as well as a combination of TAB files and normal paired reads. For example, say you have one paired-end library of 200bp and one of 500bp. For both libraries you map the reads to the contigs, generating two SAM files, which you convert to .tab files. Both can be given to SSPACE. First, SSPACE scaffolds the contigs using the 200bp library. Next, the positions of the contigs are updated by determining their new positions within the scaffolds. Then the 500bp library is used to scaffold the scaffolds generated with the 200bp library.
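The position update between the two libraries can be pictured with the small Python sketch below. It is only an illustration of the coordinate arithmetic, not SSPACE's own code. It assumes 1-based inclusive coordinates and a hypothetical `Placement` record describing where a contig ended up in a scaffold.

```python
from typing import NamedTuple

class Placement(NamedTuple):
    scaffold: str   # scaffold the contig was placed in
    offset: int     # 1-based scaffold position of the contig's first placed base
    length: int     # contig length in bases
    reverse: bool   # True if the contig was reverse-complemented in the scaffold

def to_scaffold_coords(placement, start, end):
    """Translate 1-based inclusive contig coordinates into scaffold coordinates."""
    if placement.reverse:
        # On a reversed contig, the last contig base becomes the first placed base.
        new_start = placement.offset + (placement.length - end)
        new_end = placement.offset + (placement.length - start)
    else:
        new_start = placement.offset + start - 1
        new_end = placement.offset + end - 1
    return placement.scaffold, new_start, new_end
```

With read positions from the 500bp library translated this way, reads that mapped to different contigs in the first round can be treated as hits on the scaffolds built from the 200bp library.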

    This is still in the testing phase, but the results so far look OK. I get similar results whether I input a paired-end FASTQ file or a .tab file.

    I'll keep you updated!

    Kind regards,
    Boetsie

    Comment

    • jstjohn
      Member
      • Jun 2010
      • 35

      #62
      Originally posted by boetsie

      SAM output contains a line for every read, whether or not it maps to a contig, whereas Bowtie output contains only the reads that map. Since SSPACE maps against only a fraction of each contig (just its beginning and end), the number of reads that map is low. SSPACE goes through every output line, so the more output lines there are, the slower the program runs.
      Boetsie
      Hi Boetsie,
      I don't know about all mappers that output SAM format, but BWA, for example, can output SAM to stdout if you don't give it the -o outfile.sam option. If you are only interested in reads that map, or in pairs where only one mate maps, you can either pipe the output through samtools and filter on the flag, or pipe it through a Perl or awk one-liner that filters the SAM output on the flag. The flag records whether the read is mapped and whether its mate is mapped.
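As a rough illustration of filtering on the flag, here is a Python sketch rather than a Perl or awk one-liner. It relies only on the SAM convention that flag bit 0x4 marks an unmapped read. The function name `keep_mapped` is made up for this example.

```python
def keep_mapped(sam_lines, keep_header=True):
    """Yield header lines plus the alignment records whose read is mapped (flag bit 0x4 unset)."""
    for line in sam_lines:
        if line.startswith("@"):
            if keep_header:
                yield line
            continue
        flag = int(line.split("\t", 2)[1])   # the second SAM column is the bitwise FLAG
        if not flag & 0x4:
            yield line
```

Fed with `sys.stdin` and written to `sys.stdout`, this can sit in a pipe after the aligner. It should give about the same result as `samtools view -F 4`, which excludes records that have that bit set.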
      -John

      Comment

      • jstjohn
        Member
        • Jun 2010
        • 35

        #63
        Originally posted by jstjohn
        BWA for example can output SAM to stdout if you don't give it the -o outfile.sam option.
        -John
        Actually, on most *nix systems, including Mac and every Linux I have worked on so far, you can use "/dev/fd/0" as the file name when you want to send output to standard out and the program doesn't offer that option.

        For example:
        echo "hello world">/dev/fd/0

        Comment

        • westerman
          Rick Westerman
          • Jun 2008
          • 1104

          #64
          Originally posted by jstjohn
          Actually, on most *nix systems, including Mac and every Linux I have worked on so far, you can use "/dev/fd/0" as the file name when you want to send output to standard out and the program doesn't offer that option.

          For example:
          echo "hello world">/dev/fd/0
          Shouldn't that be /dev/fd/1? File descriptor #0 is usually stdin. Granted, /0 works, but I think that this is a side effect and not something to be relied on.

          Besides, that example proves nothing, since you are taking stdout and (if you use /dev/fd/1) putting it into stdout. What you need for proof is a program like:

          Code:
          #!/usr/bin/perl
          open (TEST, '>', $ARGV[0]) or die "Can not open file $ARGV[0]\n";
          print TEST "This is a test\n";
          exit;
          If I name the above 't.pl' then I can run the following examples:

          1) If no file name is given then nothing is output, neither in a file nor on stdout.

          Code:
          rm -f test.tmp; ls test.tmp; ./t.pl ; ls test.tmp
          ls: test.tmp: No such file or directory
          ls: test.tmp: No such file or directory
          2) If a file name is given then the file is created but nothing goes to stdout.

          Code:
          rm -f test.tmp; ls test.tmp; ./t.pl test.tmp ; ls test.tmp
          ls: test.tmp: No such file or directory
          test.tmp
          3) If I give the stdout file descriptor as the file name then I get text on stdout but not in the file.

          Code:
           rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/1 ; ls test.tmp
          ls: test.tmp: No such file or directory
          This is a test
          ls: test.tmp: No such file or directory
          4) Likewise I can use the stdin file descriptor

          Code:
          rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/0 ; ls test.tmp
          ls: test.tmp: No such file or directory
          This is a test
          ls: test.tmp: No such file or directory
          5) But writing to descriptor #0 as above only works as a side effect, as can be shown by piping stdout to another program. In this case I'll use 'od' (octal dump). Using descriptor #1 (stdout, the recommended descriptor) I get 'od' output.

          Code:
           rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/1 | od -c; ls test.tmp
          ls: test.tmp: No such file or directory
          0000000   T   h   i   s       i   s       a       t   e   s   t  \n
          0000017
          ls: test.tmp: No such file or directory
          6) Whereas if I use the incorrect descriptor #0 (stdin), 'od' receives nothing and I just see the text on the terminal.

          Code:
           rm -f test.tmp; ls test.tmp; ./t.pl /dev/fd/0 | od -c; ls test.tmp
          ls: test.tmp: No such file or directory
          This is a test
          0000000
          ls: test.tmp: No such file or directory
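The same check can be sketched in Python. This is an illustrative equivalent of the 't.pl' experiment, not part of the tests above. It assumes a system that provides /dev/fd (Linux or macOS), and the names `CHILD` and `run_child` are made up for the example. The child's stdout is a pipe and its stdin is redirected from /dev/null, so anything written to descriptor #0 cannot reach the pipe.

```python
import subprocess
import sys

# Child program: writes one line to whatever path it is given, like t.pl.
CHILD = "import sys; f = open(sys.argv[1], 'w'); f.write('This is a test\\n'); f.close()"

def run_child(path):
    """Run the child with `path` as its output file and return what arrived on its stdout pipe."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, path],
        stdout=subprocess.PIPE,
        stdin=subprocess.DEVNULL,   # stdin is /dev/null here, not the terminal
        check=True,
    )
    return result.stdout.decode()
```

Calling `run_child("/dev/fd/1")` returns the test line, while `run_child("/dev/fd/0")` returns an empty string. That matches the point above: descriptor #0 only appears to work when stdin happens to be the terminal.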
          Whew! Now back to what I was doing. Which is running sspace in various configurations.
          Last edited by westerman; 06-15-2011, 07:48 AM. Reason: minor correction

          Comment

          • jstjohn
            Member
            • Jun 2010
            • 35

            #65
            Nice tests! Good to see how that /dev/fd stuff works.

            Comment

            • VidJa
              Junior Member
              • Apr 2010
              • 7

              #66
              Hi Boetsie,

              Great explanation; I expected SAM/BAM to be slower. Another nice aligner that could be considered is PASS: http://pass.cribi.unipd.it/cgi-bin/pass.pl
              It is very fast and offers its own output format or GFF, besides SAM/BAM. It has an option to attempt contig association and handles long reads.

              Comment

              • boetsie
                Senior Member
                • Feb 2010
                • 245

                #67
                Thank you for the suggestion, VidJa; I appreciate your help. However, right now I think including an independent script for converting .sam to .tab would be the best idea. SAM input is possible, but I have to find a way to identify the paired reads. With Bowtie I do this by appending /1 and /2 to the read names, so I know which is the left and which is the right read (hence I do single-read mapping, because Bowtie only finds read pairs on the same contig). Read names differ for each platform, so I should probably let the user specify the postfix of both reads. That's why I think .tab is the best way at the moment.

                Conclusion: I have to think about it.
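The idea of letting the user supply the postfix of both reads might look something like this Python sketch. It is a hypothetical illustration, not SSPACE code. The function names and the `suffixes` parameter are assumptions, with "/1" and "/2" as defaults because those are the postfixes mentioned above.

```python
def split_read_name(name, suffixes=("/1", "/2")):
    """Return (pair_id, mate_number) for a read name, or (name, None) if no suffix matches."""
    for mate, suffix in enumerate(suffixes, start=1):
        if suffix and name.endswith(suffix):
            return name[: -len(suffix)], mate
    return name, None

def pair_hits(hits, suffixes=("/1", "/2")):
    """Group single-read hits {read_name: position} into {pair_id: (mate1, mate2)} for complete pairs."""
    by_pair = {}
    for name, position in hits.items():
        pair_id, mate = split_read_name(name, suffixes)
        if mate is not None:
            by_pair.setdefault(pair_id, {})[mate] = position
    return {pid: (m[1], m[2]) for pid, m in by_pair.items() if 1 in m and 2 in m}
```

A platform that names its mates differently would only need another pair of suffixes, for example `suffixes=("_F", "_R")`.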

                Comment

                • KanyeDidIt
                  Junior Member
                  • Sep 2010
                  • 8

                  #68
                  Hi Boetsie,

                  Just out of curiosity, is this script written in 32-bit Perl? I see an issue with the program not using more than a certain amount of memory.

                  Comment

                  • christinawu2008
                    Member
                    • Feb 2011
                    • 13

                    #69
                    problem with SSPACE

                    Hi

                    I have a problem using SSPACE. The process failed at bowtie-build. I can use Bowtie to index my genome file, but it doesn't seem to work in SSPACE... What's the problem?

                    Thanks

                    Comment

                    • boetsie
                      Senior Member
                      • Feb 2010
                      • 245

                      #70
                      Originally posted by christinawu2008
                      Hi

                      I have a problem using SSPACE. The process failed at bowtie-build. I can use Bowtie to index my genome file, but it doesn't seem to work in SSPACE... What's the problem?

                      Thanks
                      I quote an older post in this thread:
                      The problem was mainly solved by using the command line to go to the directory where the main SSPACE script (SSPACE_v1-x.pl) and folders are stored. Then do one of the following:

                      chmod a+x bowtie/*

                      or

                      chmod 777 *

                      in your command line.

                      If this doesn't work, you can try downloading the newest Bowtie version at http://sourceforge.net/projects/bowt...bowtie/0.12.7/

                      Replace the files in the bowtie folder with the ones you've downloaded.
                      Kind regards,
                      Boetsie

                      Comment

                      • boetsie
                        Senior Member
                        • Feb 2010
                        • 245

                        #71
                        Originally posted by KanyeDidIt
                        Hi Boetsie,

                        Just out of curiosity, is this script written in 32-bit Perl? I see an issue with the program not using more than a certain amount of memory.
                        Yes, I think it is written in 32-bit.

                        Comment

                        • Ashu
                          Member
                          • Aug 2010
                          • 15

                          #72
                          Hi Boetsie,
                          I am trying to use SSPACE for scaffolding, without any success. I have the following questions for you:
                          1. Can it use contigs from an ABySS or CLCbio assembly, and add paired-end and mate-pair reads?
                          2. Why am I getting an empty result, with no improvement in N50 or anything else? I have tested every possible option, such as -k2 and -k5.
                          Can you please give some suggestions about it?

                          Thank you,
                          with kind regards,
                          Ashu

                          Comment

                          • boetsie
                            Senior Member
                            • Feb 2010
                            • 245

                            #73
                            Hi again Ashu,

                            1. Yes, this is possible. I've used both assemblies for scaffolding before, including paired-end and/or mate-pair reads.

                            2. Could you send me your library file and summary file as a private message, so I can check what is going on?

                            Regards,
                            Boetsie

                            Comment

                            • user1313
                              Junior Member
                              • May 2011
                              • 5

                              #74
                              Hi boetsie,

                              I tried SSPACE, and it is pretty good. However, I want to use sequencing technologies besides Illumina for scaffolding. Is the *.tab file meant for that? Also, I already have the assemblies in *.caf, *.maf and *.ace formats. Can I extract information on the mates from these files (the former two) and use it in SSPACE? Do you have any scripts that can do that properly?

                              Sincerely yours
                              Nestor

                              Comment

                              • DeNovoG
                                Junior Member
                                • May 2010
                                • 7

                                #75
                                Hello there,

                                Great tool! my organellar assembly improved quite nicely.

                                Now that I have the scaffolds, I wanted to ask here how I can upload them to GenBank and comply with their requirement of an AGP file for scaffolds, listing each contig and the number of Ns linking them.

                                Is there a way, or do you have any suggestions, to produce these AGP files from the SSPACE output files? That would be great.

                                Thank you very much for any info,

                                Best Regards,

                                G

                                Comment
