  • #16
    OK. I think that problem was simply the viewer telling me that read was a problem. I've got past that, but now get a

    [main_samview] fail to get the reference name. Continue anyway.

    error, and nothing in the output.

    Does anyone know what that means? It happens during the sharpenedges part of the script.

    cheers



    • #17
      Hello!

      I'm trying to make structural variation calls from 1000 Genomes data. I thought I might try Breakway but ran into problems :/. When calculating PED values with the DNAA script dbampairedenddist, I need to specify a range based on the predicted PED from library generation. As far as I know, the 1000 Genomes BAM files combine reads from several different raw read files, so how can I know which range to choose? Or does this make 1000 Genomes data incompatible with Breakway SV detection?

      Thank you very much!



      • #18
        Originally posted by Michael.James.Clark
        While I haven't tested it on such datasets, it ought to work on them. The key will be in the reference genome used.

        Breakway functions by looking for clusters of aberrantly spaced paired reads, so the key is to have an appropriate reference genome for it to compare to.

        For exon capture, it should work with the normal reference genome just as well as it will with whole genomes.

        For RNAseq, and I'm not an expert so I welcome other suggestions, the transcriptome will probably be best used as the reference genome.
        I'm very interested in using this with RNA-Seq. I figure aligning against transcriptome is an issue because it limits the size of the indel that you can have (e.g. no 2-transcript mappings, where one end maps to one transcript and the other maps to a completely different transcript).



        • #19
          Originally posted by orcy
          OK. I think that problem was simply the viewer telling me that read was a problem. I've got past that, but now get a

          [main_samview] fail to get the reference name. Continue anyway.

          error, and nothing in the output.

          Does anyone know what that means? It happens during the sharpenedges part of the script.

          cheers
          Sorry for the late reply on this.

          Sharpenedges uses samtools as part of its activity, and this is a samtools error.

          Make sure that you've properly indexed the BAM file, and that the file is in BAM format.

          If you still have a problem, please run samtools view and post an example read here for me to look at.
          Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
          Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
          Projects: U87MG whole genome sequence [Website] [Paper]



          • #20
            Originally posted by megnetz
            Hello!

            I'm trying to make structural variation calls from 1000 Genomes data. I thought I might try Breakway but ran into problems :/. When calculating PED values with the DNAA script dbampairedenddist, I need to specify a range based on the predicted PED from library generation. As far as I know, the 1000 Genomes BAM files combine reads from several different raw read files, so how can I know which range to choose? Or does this make 1000 Genomes data incompatible with Breakway SV detection?

            Thank you very much!
            Breakway works on a library-by-library basis. One can combine libraries with very similar PEDs in a single analysis and it will still function.

            If you have libraries with very different PEDs, it will have difficulty working correctly. You can isolate reads with very different PEDs from each other and run it independently on each one, then combine the results, though. This is what I have done.

            I'm not very familiar with 1000 genomes data, but if they use the read group flag in their BAM files with the library field clarifying which library specific RGs are sourced from, you can use that to isolate the reads.

            Sorry I can't be more help--Breakway was designed to function optimally on a sample-by-sample basis, not on a batch of samples.
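To illustrate the read-group idea above, here is a minimal Python sketch that groups read-group IDs by library from SAM header text. It assumes the header carries @RG lines with ID and LB tags, as the SAM format allows; the function name and the example header are made up for illustration and are not part of Breakway.

```python
from collections import defaultdict


def read_groups_by_library(header_text):
    """Map each library (LB tag) to the read-group IDs (ID tag) sourced from it."""
    libraries = defaultdict(list)
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "ID" in fields:
            # Read groups without an LB tag are collected under "unknown".
            libraries[fields.get("LB", "unknown")].append(fields["ID"])
    return dict(libraries)


# Made-up header, for illustration only.
example_header = "\n".join([
    "@HD\tVN:1.0\tSO:coordinate",
    "@RG\tID:rg1\tLB:libA\tSM:sample1",
    "@RG\tID:rg2\tLB:libA\tSM:sample1",
    "@RG\tID:rg3\tLB:libB\tSM:sample1",
])

print(read_groups_by_library(example_header))
```

The header text itself should be obtainable with samtools view -H, and reads from a single read group can then be pulled out with the -r option of samtools view, so each library can be run through Breakway separately.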


            • #21
              Originally posted by Lee Sam
              I'm very interested in using this with RNA-Seq. I figure aligning against transcriptome is an issue because it limits the size of the indel that you can have (e.g. no 2-transcript mappings, where one end maps to one transcript and the other maps to a completely different transcript).
              True, it would be blind to fusion transcripts if you were to use the transcriptome.

              An alternative might be using all possible fusions as a reference.

              I believe TopHat/Cufflinks are very popular for this type of analysis, so you may want to take a look at them!


              • #22
                I'll try that, thanks!



                • #23
                  Hi Michael,

                  Shouldn't Breakway.ReadCluster.pl find both clusters of reads implicating insertions or deletions exceeding the floor-pe-length and ceiling-pe-length and translocations? In a quick test of some Illumina mate-pair data you only see the intra-chromosomal events but not the inter-chromosomal events, even though a quick parsing of the dtranslocations table clearly identifies positive control events that should meet the -mincs and -maxcs options used.



                  • #24
                    Originally posted by Jon_Keats
                    Hi Michael,

                    Shouldn't Breakway.ReadCluster.pl find both clusters of reads implicating insertions or deletions exceeding the floor-pe-length and ceiling-pe-length and translocations? In a quick test of some Illumina mate-pair data you only see the intra-chromosomal events but not the inter-chromosomal events, even though a quick parsing of the dtranslocations table clearly identifies positive control events that should meet the -mincs and -maxcs options used.
                    Hi Jon,

                    Sorry for the late reply, I've been otherwise occupied, but I hope I can help solve this with you.

                    I'm a little bit unclear on what you're seeing. Are you observing that an event that should pass your parameters is not being reported by Breakway? If so, would it be possible to provide the library design (insert size, read length, sequence depth, etc.), parameters you used in dtranslocations and Breakway and the segment of the dtranslocations file in question?

                    Usually if this type of thing happens, I find it's due to the dtranslocations spot being sporadic to the point that the event doesn't meet the minimum requirements for Breakway. These minimums are determined by mincs/maxcs, so you can decrease mincs and increase maxcs and often they will then come through.


                    • #25
                      Dear Michael,

                      We are trying to install Breakway. We successfully installed BFAST and SAMtools in the root as suggested, but are having issues during installation of DNAA. During ./configure, it shows "fatal: Not a git repository", and when we run make it gives this error:

                      make all-recursive
                      make[1]: Entering directory `/storage/Software/dnaa-0.1.2'
                      Making all in dkbaseencoding
                      make[2]: Entering directory `/storage/Software/dnaa-0.1.2/dkbaseencoding'
                      make[2]: *** No rule to make target `all'. Stop.
                      make[2]: Leaving directory `/storage/Software/dnaa-0.1.2/dkbaseencoding'
                      make[1]: *** [all-recursive] Error 1
                      make[1]: Leaving directory `/storage/Software/dnaa-0.1.2'
                      make: *** [all] Error 2

                      We are using 64-bit Debian.

                      Could you please help?



                      • #26
                        Hm, not sure what's going on. I'm not the author of DNAA, I'm afraid, but I have gotten it to install successfully myself.

                        I assume you got the tar.gz from here:
                        DNAA is the DNA analysis package, for analyzing next-generation post-alignment whole genome resequencing data. Specifically, DNAA is able to find…

                        Then obviously followed the INSTALL.
                        If you got it through git, maybe that is a problem and you should try making it from the tarball.

                        A search on google for the error "fatal: Not a git repository" has a number of hits that you might want to look at.

                        Just to let you know, I just successfully installed DNAA from scratch on my Mac Pro here.
                        Last edited by Michael.James.Clark; 10-29-2010, 12:59 PM.


                        • #27
                          The most common mistake I find people making is forgetting to index their BAM file. Always index your BAM file! Breakway will look in the same folder as the BAM file for a file with the same exact name with the ".bai" appended to the end, which is the standard output from the samtools index program.
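As a quick pre-flight check along the lines described above, this Python sketch tests whether an index sits next to a BAM file under the same name with ".bai" appended. The helper names are hypothetical and the check only looks for the file; it does not confirm that the index is valid or up to date.

```python
import os
import tempfile


def bam_index_path(bam_path):
    """Return the index path: the BAM path with '.bai' appended."""
    return bam_path + ".bai"


def has_bam_index(bam_path):
    """True if a file named '<bam_path>.bai' exists beside the BAM file."""
    return os.path.isfile(bam_index_path(bam_path))


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    bam = os.path.join(folder, "sample.bam")
    open(bam, "w").close()
    print(has_bam_index(bam))  # no index yet

    open(bam_index_path(bam), "w").close()
    print(has_bam_index(bam))  # index file now present
```

If the check fails, running samtools index on the BAM file should create the expected ".bai" file.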


                          • #28
                            Hi all,

                            Breakway has been updated to version 0.7.

                            In this update:

                            -The breakway.parameters.pl script has been improved. It no longer requires the dbampairedenddist program from DNAA to run. Now BAM files can be directly passed to breakway.parameters.pl along with insert size range and the program will report mean, standard deviation and 95% bounds of the entire BAM file. See The Breakway Compendium at breakway.sf.net for usage.

                            -A bug in breakway.sharpenedges.pl has been fixed. Though it was supposed to default the --score parameter to zero, the default was actually undefined, so running the program without setting this optional parameter would crash it. Now the script runs successfully with the default --score value.
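For readers who want a feel for the kind of summary the parameters script reports, here is a rough Python sketch that computes the mean, standard deviation and 95% bounds of insert sizes within a given range. It is not the Breakway code: the function name is made up, and the bounds are assumed to be mean plus or minus 1.96 standard deviations (a normal approximation), which may differ from what breakway.parameters.pl actually does.

```python
import statistics


def insert_size_summary(insert_sizes, floor, ceiling):
    """Summarise absolute insert sizes that fall within [floor, ceiling]."""
    in_range = [abs(size) for size in insert_sizes if floor <= abs(size) <= ceiling]
    if len(in_range) < 2:
        raise ValueError("need at least two insert sizes inside the range")
    mean = statistics.mean(in_range)
    sd = statistics.stdev(in_range)  # sample standard deviation
    return {
        "n": len(in_range),
        "mean": mean,
        "sd": sd,
        # Assumption: 95% bounds taken as mean +/- 1.96 standard deviations.
        "lower95": mean - 1.96 * sd,
        "upper95": mean + 1.96 * sd,
    }


# Made-up insert sizes; 5000 falls outside the range and is ignored.
summary = insert_size_summary([190, -200, 210, 5000], floor=100, ceiling=1000)
print(summary)
```

The insert size range plays the same role as the range passed to the script: pairs far outside it are left out so that they do not distort the mean and standard deviation.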


                            • #29
                              Hi,
                              We have the same problem. Breakway chokes at:
                              samtools view -X sample.bam chr1:56-230|egrep "pPUr[0-9]d"| head -5
                              286_89_1940 pPUr1d chr1 97 16 50M ...
                              since the string
                              pPUr1d
                              is not captured in its entirety by this line in the load_alignments function
                              if($line =~ m/^(\S+)\s+([pPrRuU12]*)\s+(\S+)\s+(\d+)\s+\d+\s+\S+\s+\S+\s+(\d+)\s+-?([0-9]+)\s+(\w+)/)
                              in breakway.sharpenedges.pl

                              Is there a particular reason for accepting only strings of type "pPrR1"?



                              • #30
                                Thanks for pointing that out! I honestly was at a loss for what this bug was as I hadn't seen the "d" before.

                                Can I ask what version of Samtools you've been using? I have only tested it against an old version that Breakway was designed to work with (v0.1.6 (r453) as stated in the Breakway script headers).

                                This quick fix should work. You can change that line to the following:

                                Code:
                                if($line =~ m/^(\S+)\s+(.*[pPrRuU12]*.*)\s+(\S+)\s+(\d+)\s+\d+\s+\S+\s+\S+\s+(\d+)\s+-?([0-9]+)\s+(\w+)/)
                                That way, it should be robust against anything else in the flag field that might get added subsequently.

                                I have uploaded the program with that bug fix to the Breakway site, so alternatively you can just download and extract it (the only difference is that line!).
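To see why the original character class rejects a flag string such as pPUr1d, here is a small Python sketch using only the first four fields of the pattern. The read line is made up, and the relaxed pattern uses \S+ for the flag field, which is an illustrative alternative and not the fix shipped with Breakway. As far as I know, the "d" in the samtools view -X flag string marks a duplicate read.

```python
import re

# First four fields of the original pattern: name, flag string, reference, position.
ORIGINAL = re.compile(r"^(\S+)\s+([pPrRuU12]*)\s+(\S+)\s+(\d+)")

# Illustrative alternative: accept any run of non-whitespace as the flag string.
RELAXED = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(\d+)")

# Made-up read line, for illustration only.
line = "read_1\tpPUr1d\tchr1\t97"

# The class [pPrRuU12] has no "d", so the flag field cannot be consumed in full
# and the whole match fails.
print(ORIGINAL.match(line))

# The relaxed pattern captures the complete flag string.
print(RELAXED.match(line).group(2))
```

Matching the flag field with \S+ keeps it confined to a single whitespace-delimited column, so any new flag letters are accepted without letting the capture spill into neighbouring fields.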
