  • short reads mapping software with GUI needed

    (Warning message: total noob talking)

    Hello all,

    I know nothing about sequencing or sequences alignment.
    I was given a collection of short reads (millions of them) and I would like to map them to the mouse genome and to visualize the surrounding sequence and the alignment (and do some kind of quantification of expression, maybe...).

    I have no clue on how to do this and started looking around for some software that would help me (I won't blast millions of reads one by one, right?).

    However, I found most of the software to be CLI-based. In addition to my total lack of experience in sequencing and sequence handling, I'm not very comfortable with the terminal either, and I fear I would not understand the output of such programs.

    I am thus looking for a GUI-based software (any platform is OK, mac would be my favorite choice) in order to allow me to visualize where all these reads map in the mouse genome.

    I am ready to read tons of manual pages and tutorials if necessary but, again, I am a bit allergic to the terminal. I mean, if the software is CLI-based but the output is something easy to understand (like a zoomable image, in my dreams) I would go for it.

    Tell me if you guys know any tool that could help me (or a better way to answer my questions).

    Thank you very much in advance for your help.
    Best regards
    -a-

  • #2
    IGV does a great job of visualizing next-gen read alignments. You first need to generate a BAM file (a binary representation of reads aligned to a reference), which you can do with a variety of mapping tools. If your data is from Illumina I would recommend BWA or Stampy, but there are lots of programs to choose from. They are easy to use but not GUI-based, and most run on Macs.



    • #3
      Maybe something like Partek Genomics Suite, but I think it's expensive. You can get a free trial from them though.



      • #4
        Thank you for these answers.

        I think I will try to use IGV first.
        Now I need to generate a BAM file from the multiple .sra files that I have.
        I'll look into that (BWA or... can bowtie do this as well?).
        If you know any "simple" way to do that, I'd be very interested to know about it.
        Thanks again
        -a-



        • #5
          You should really consider how easy it is to learn how to use CLI programs.

          This is coming from an ex-totally clueless noob.


          Go through the first section on UNIX and you're ready to go. We're talking an investment in the hours range. A minimal investment of time when you take into consideration how much work you have and will put into your project.
          --------------
          Ethan



          • #6
            Check this: http://seqanswers.com/wiki/Software/list

            You do not say if you are looking for something free or are willing to buy. In either case you can find something you will like.

            CLC Genomics workbench or Geneious (both commercial) would fit. I am not endorsing either. Just providing a pointer.

            If you are not averse to using a web accessible resource try: http://usegalaxy.org. Check the wiki and learn links on the Galaxy site to get started.



            • #7
              I would look into Simon Andrews' programs, SeqMonk (http://www.bioinformatics.bbsrc.ac.uk/projects/seqmonk/) and FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/).
              SeqMonk is for visualizing after you have done an alignment (used an alignment program like bowtie or bwa; which can be done in Galaxy, http://main.g2.bx.psu.edu/.) and produced an alignment file (preferably in SAM format). Very easy to use and you can go from a whole chromosome view down to a particular gene. Many other things can be done with SeqMonk as well.
              First, you want to know how your file of sequences is formatted. If it has been provided by a sequencing facility using Illumina sequencers then it is most likely in FASTQ format. (Don't confuse FASTQ with FASTQC. FASTQC is a quality assessment program; FASTQ is a format for sequence files.) You can Google 'FASTQ' to see what this format looks like. It is very simple. Then look at a portion of your sequence file and see if it is in the same format. If it is a FASTQ file the rest will be easy.
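To make "see if it is in the same format" concrete, here is a minimal Python sketch that checks whether the first records of a file follow the usual four-line FASTQ layout (an `@` header, the sequence, a `+` line, and a quality string as long as the sequence). The function name `looks_like_fastq` and the `max_records` limit are made up for illustration, and the check assumes each sequence sits on a single line, which is typical for Illumina output but not guaranteed by the format.

```python
import io


def looks_like_fastq(handle, max_records=1000):
    """Return True if the first records in `handle` follow the 4-line FASTQ layout.

    Assumes one sequence line and one quality line per record.
    """
    checked = 0
    while checked < max_records:
        header = handle.readline()
        if not header:
            break  # reached end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            return False
        if not plus.startswith("+"):
            return False
        # Every base needs exactly one quality character.
        if len(sequence) == 0 or len(sequence) != len(quality):
            return False
        checked += 1
    return checked > 0


# Small in-memory example; for a real file use open("reads.fastq") instead.
example = io.StringIO("@read1\nACGTACGTAC\n+\nIIIIIIIIII\n@read2\nTTGCA\n+\nIIIII\n")
print(looks_like_fastq(example))
```

Only the first `max_records` records are inspected, so this is a quick sanity check and not a full validation of a multi-gigabyte file.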

              Upload it to Galaxy under Get Data in the Options. You can then first do some quality assessment by finding FASTQC on Galaxy in the Options under NGS:QC and manipulation. FASTQC will work on the FASTQ file and provide you with some quality assessment of your sequences.
              The FASTQ file can then be used to align the sequences with bowtie or bwa. Under the Options in Galaxy, go to NGS: mapping and you can then align the FASTQ file using bowtie or bwa. The resulting file can be downloaded to your computer and put into SeqMonk. Download and load the mouse genome into SeqMonk. This can easily be done using SeqMonk when you click 'New Project'. After loading the genome, load your alignment file and you will then be able to visualize the position of your sequences in relation to genes and other annotation on the genome.



              • #8
                Originally posted by mattanswers
                Thank you


                Originally posted by mattanswers
                SeqMonk is for visualizing after you have done an alignment (used an alignment program like bowtie or bwa; which can be done in Galaxy, http://main.g2.bx.psu.edu/.) and produced an alignment file (preferably in SAM format). Very easy to use and you can go from a whole chromosome view down to a particular gene. Many other things can be done with SeqMonk as well.
                My problem right now is to know how to "get" the data.
                - I have 20 .sra files.
                - In order to align them with BWA (or bowtie) I guess I should merge them into 1 .fastq file, right?
                - NCBI says I should use "fastq-dump" to extract fastq from sra, but I can also download the data directly in fastq (compressed as fastq.gz). I wonder if these are the same as the files that would be generated by "fastq-dump". I guess the answer is yes...
                - I do not know how to "merge" 20 .fastq files into a single one. Can I just copy-paste using textedit for instance? (.fastq files are just text files after all, right?)
                - the resulting .fastq file will be very large (I guess something like 10Gigabytes). Too large to upload?

                Tell me what you think about it.

                Thank you very much.
                -a-



                • #9
                  Converting .sra files to fastq with fastq-dump will give you the same thing as downloading the fastq files, and in my experience downloading the sra and converting is way faster than downloading the fastq files. Recently, though, I had some trouble where the files converted from sra weren't correct and I had to download the fastq files; no idea what went wrong...

                  Why do you need to merge the fastq files? Usually you would only do this if they are all different runs of the same sample, and you wanted to analyze them combined.



                  • #10
                    Examine the fastq from SRA via the fastq-dump utility.


                    The Short Read Archive folks added a complication. The fastq files are often not BWA compatible. You further need to cook the data. Here tools like perl, sed, awk and gcc come in handy.

                    You'd want to use the "cat" program (from the cygwin/bash/linux command line). A text editor would struggle with the data. "cat" concatenates input files into an output file.
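For readers who would like to see what "cat" does before trusting it with 20 files, the following is a rough Python equivalent of `cat run0.fastq run1.fastq > merged.fastq`. The function name `concatenate_files` and the file names in the demo are invented for illustration. It streams each input in chunks, so file size is not a problem the way it would be in a text editor.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate_files(input_paths, output_path, chunk_size=1024 * 1024):
    """Append each input file to output_path, in order, like `cat a b > out`."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                # Copy in chunks so large files never have to fit in memory.
                shutil.copyfileobj(src, out, chunk_size)


# Demo on three tiny, made-up FASTQ files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    parts = []
    for i in range(3):
        part = tmp / f"run{i}.fastq"
        part.write_text(f"@read{i}\nACGT\n+\nIIII\n")
        parts.append(part)
    merged = tmp / "merged.fastq"
    concatenate_files(parts, merged)
    print(merged.read_text().count("\n") // 4, "records")
```

This only works cleanly if every input file ends with a newline; otherwise the last quality line of one file would run into the header of the next.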

                    At this point in the game "enterprise java-bean enabled cloud GUI just click and watchen the blinken lichten" ain't there. You really do need to interface with command line. For today at least.

                    10GB is pretty small these days. Your home DSL may not handle it well, but it's very manageable.



                    • #11
                      Originally posted by biznatch
                      Why do you need to merge the fastq files? Usually you would only do this if they are all different runs of the same sample, and you wanted to analyze them combined.
                      do you mean I could align the (millions of) reads contained in 20 .fastq files to the mouse genome? I mean, without getting 20 alignment files at the end, but a single one to look at with a viewer...

                      Tx



                      • #12
                        You'll find it convenient to run fastqs in chunks through alignment software like BWA.
                        Check out samtools to merge and BAMize the resulting SAM files.



                        • #13
                          Originally posted by Richard Finney
                          Examine the fastq from SRA via the fastq-dump utility.


                          The Short Read Archive folks added a complication. The fastq files are often not BWA compatible.
                          Is that due to
                          "note that the NCBI have converted this FASTQ data from the original Solexa/Illumina encoding to the Sanger standard"
                          ?

                          Originally posted by Richard Finney
                          You further need to cook the data. Here tools like perl, sed, awk and gcc come in handy.
                          by "cooking" you mean going back to the illumina encoding? manually? ouch!!
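On the encoding question: in the Sanger convention each quality character is the Phred score plus 33, while the Illumina 1.3+ pipeline wrote the score plus 64. If the NCBI note quoted above is accurate, the downloaded files are already in Sanger encoding, so no conversion back should be needed for tools that expect Sanger qualities. The sketch below only illustrates how small the difference is. The function names are made up, the offset guess is a heuristic (a Sanger file that happens to contain only high scores would be misclassified), and the older Solexa score scale is not handled.

```python
def guess_quality_offset(quality_strings):
    """Heuristically guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Characters below ';' (ASCII 59) cannot occur in the +64 encodings,
    so seeing one implies offset 33. Raises ValueError on empty input.
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    return 33 if lowest < 59 else 64


def to_phred33(quality, offset):
    """Re-encode one quality string to offset 33; the scores are unchanged."""
    return "".join(chr(ord(ch) - offset + 33) for ch in quality)


illumina_style = "hgfe"  # Phred 40, 39, 38, 37 written with offset 64
offset = guess_quality_offset([illumina_style])
print(offset, to_phred33(illumina_style, offset))
```

So "cooking" the qualities, when it is needed at all, is a character-by-character shift that a short script can apply to every fourth line of the file; nothing has to be done by hand.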


                          Originally posted by Richard Finney
                          You'd want to use the "cat" program (from the cygwin/bash/linux command line). A text editor would struggle with the data. "cat" concatenates input files into an output file.
                          in order to merge the 20 .fastq files into a single one, right?


                          Originally posted by Richard Finney
                          At this point in the game "enterprise java-bean enabled cloud GUI just click and watchen the blinken lichten" ain't there. You really do need to interface with command line. For today at least.
                          to do what part of the job?
                          1) .sra -> .fastq conversion? (including encoding restoration)
                          2).fastq files merge (I think I can read "man cat" and maybe even understand it)
                          3) align the .fastq file (millions of 25bp reads) with the mouse genome (using BWA or bowtie) to get a SAM/BAM file. (can I do this using Galaxy?)
                          4) view the alignment with a GUI viewer

                          did I miss anything?
                          I guess points 1) and 3) are the most difficult, right?


                          Originally posted by Richard Finney
                          10GB is pretty small these days. Your home DSL may not handle it well, but it's very manageable.
                          OK... cool
                          Thank you



                          • #14
                            Originally posted by Richard Finney
                            You'll find it convenient to run fastqs in chunks through alignment software like BWA.
                            Check out samtools to merge and BAMize the resulting SAM files.
                            So it would be more like:
                            1) convert .sra files to .fastq files using fastq-dump.
                            I don't know how to do that. Should I install the "SRA toolkit"?

                            2) align .fastq files to the mouse genome to generate SAM files. Using BWA or bowtie (via Galaxy?)
                            I don't know how to do that but it should be "easy" to find out.

                            3) merge SAM files and generate a BAM file using SAMtools.
                            I do not know how to do that. I guess there's a user manual for samtools

                            4) view the alignment using a graphical viewer (SeqMonk, IGV, others?)
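The four steps above can be written down as a command plan. The Python sketch below only builds and prints the command lines; it does not run anything. Several things are assumptions and not something stated in this thread: the file names (`run01.sra`, `mouse.fa`, `merged.bam`), that `fastq-dump` writes `<name>.fastq` into the current directory, that the reads are single-end (hence `bwa aln` followed by `bwa samse`), and that a recent samtools 1.x is installed, where `samtools sort -o` accepts a SAM file directly. Check each tool's manual for the version you actually have.

```python
import shlex
from pathlib import Path


def plan_commands(sra_files, reference, merged_bam="merged.bam"):
    """Return shell command strings for SRA -> FASTQ -> SAM -> merged BAM."""
    ref = shlex.quote(reference)
    commands = [f"bwa index {ref}"]  # build the reference index once
    sorted_bams = []
    for sra in sra_files:
        stem = shlex.quote(Path(sra).stem)
        commands.append(f"fastq-dump {shlex.quote(sra)}")  # step 1
        commands.append(f"bwa aln {ref} {stem}.fastq > {stem}.sai")  # step 2
        commands.append(f"bwa samse {ref} {stem}.sai {stem}.fastq > {stem}.sam")
        commands.append(f"samtools sort -o {stem}.sorted.bam {stem}.sam")  # step 3
        sorted_bams.append(f"{stem}.sorted.bam")
    commands.append(f"samtools merge {shlex.quote(merged_bam)} " + " ".join(sorted_bams))
    commands.append(f"samtools index {shlex.quote(merged_bam)}")  # needed by viewers
    return commands


for command in plan_commands(["run01.sra", "run02.sra"], "mouse.fa"):
    print(command)
```

Aligning each file separately and merging at the end means there is no need to build one huge FASTQ first, and the final sorted, indexed BAM is the single file to open in a viewer for step 4.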

                            am I right?

                            thanks
                            -a-



                            • #15
                              Originally posted by asheenlevrai
                              So it would be more like:
                              1) convert .sra files to .fastq files using fastq-dump.
                              I don't know how to do that. Should I install the "SRA toolkit"?
                              So I used the "fastq-dump" command from the "SRA toolkit" in order to convert 1 of the .sra files to .fastq
                              I compared (using textedit) this file to the corresponding .fastq file I downloaded from NCBI. The first thing I saw is that the reads are 36bp long using fastq-dump. This is because the barcode (3bp) and the adapter (8bp) sequences are still there (flanking the "true" read). I guess I do not want these sequences for the alignment, right?

                              The quality values are different among the 2 .fastq files (as expected due to the re-encoding from NCBI).
                              ______EDIT: Actually they are the same. The different size (36bp vs 25bp) made them just look different. I guess I should just download all the 20 files in .fastq format since this seems the easiest solution._____

                              _____(should I find a way to get rid of the barcode and adapter sequences?)________
                              Last edited by asheenlevrai; 11-07-2011, 02:13 PM. Reason: new info
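If the fastq-dump output were used instead of the downloaded files, the flanking bases could be clipped with a few lines of Python. The sketch below assumes the 3bp barcode sits at the start of each read and the 8bp adapter at the end; the post only says they flank the read, so check a few records before relying on those positions. The function name `trim_record` and the example read are made up.

```python
def trim_record(header, sequence, quality, left=3, right=8):
    """Remove `left` bases from the start and `right` bases from the end.

    The quality string is clipped identically so the two stay aligned.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    if len(sequence) <= left + right:
        raise ValueError("read too short to trim")
    end = len(sequence) - right
    return header, sequence[left:end], quality[left:end]


# Made-up 36bp read: 3bp barcode + 25bp insert + 8bp adapter.
raw_sequence = "TTT" + "ACGTACGTACGTACGTACGTACGTA" + "GGGGGGGG"
raw_quality = "I" * len(raw_sequence)
header, sequence, quality = trim_record("@read1", raw_sequence, raw_quality)
print(len(sequence), sequence)
```

Clipping the quality string together with the sequence matters, because aligners expect one quality character per base.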
