  • sequence alignment

    Hi all,
    I want to align some read data in FASTA format using the Bowtie short read aligner, but before I can align the reads I need a reference sequence. I'm new to bioinformatics; I searched for information on reference sequences and didn't find anything useful about why a reference sequence is needed for read alignment.
    Please help me understand that, and how I can download the required reference sequences.

    Thank you all. justin
    Last edited by biouser; 08-14-2012, 11:46 AM.

  • #2
    You need something to align against; that is the purpose of the reference sequence. What organism is your sequencing data from? That will pretty much answer the question of what to download.

    • #3
      I have a FASTA file containing 15 million reads. They are 454 sequences from Human HapMap, downloaded from a genomic paired-end library at NCBI.

      • #4
        So, you have a bunch of reads from a human and you want to know where they map. For that you need a human reference genome sequence. You could use the one from NCBI or the one from the 1000 Genomes Project (there are probably others; I actually don't know off-hand if the NCBI reference differs from that of the 1000 Genomes Project, as I don't do any human sequencing).
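
To make "where they map" concrete, here is a minimal sketch of what alignment against a reference means. It is not how Bowtie works internally (Bowtie uses a compressed index and tolerates mismatches); it only does exact forward-strand matching with plain string search. The toy `reference` and `reads` values are made up for illustration.

```python
def map_reads(reference, reads):
    """Return {read: [0-based start positions]} for exact forward-strand matches."""
    hits = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            # Keep searching just past the previous hit to find repeats.
            start = reference.find(read, start + 1)
        hits[read] = positions
    return hits


# Toy data (hypothetical): a 20-base "genome" and three short reads.
reference = "ACGTTAGCCGATACGTTAGG"
reads = ["TAGCC", "ACGTTAG", "GGGGG"]

hits = map_reads(reference, reads)
for read, positions in hits.items():
    print(read, positions if positions else "unaligned")
```

A read with an empty position list is what an aligner would report as "failed to align": without a reference there is nothing to search, and with the wrong or an incomplete reference most reads end up in that category.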

        • #5
          On the NCBI FTP site there are two kinds of files: some are in .fa format and others in .rm.out.
          Which one is used as the reference sequence?
          I got the output below from Bowtie:

          # reads processed: 15281579
          # reads with at least one reported alignment: 610764 (4.00%)
          # reads that failed to align: 14670815 (96.00%)
          Reported 610764 alignments to 1 output stream(s)

          What does this output report mean? Does it mean that 4% of the reads belong to chromosome 2, which I used as the reference sequence?

          • #6
            You'll want the .fa (FASTA format) files. The .rm.out files are RepeatMasker output.

            • #7
              Thank you, dpryan.
              And what about the second question, the Bowtie report?

              • #8
                Originally posted by biouser
                Thank you, dpryan.
                And what about the second question, the Bowtie report?
                Ah, I missed that, mea culpa. It really just means that only 4% of the reads aligned. The remainder may not have aligned because (1) they didn't come from chromosome 2, (2) you didn't quality trim prior to alignment and so things couldn't align, or (3) there was adapter contamination that wasn't trimmed, which prevented alignment. For your real run, I would use the "cat" command to concatenate the various chromosomes into a single file, which would then be indexed and mapped against. Since you're using Bowtie, you might be able to download prebuilt indexes from the Bowtie website. That'll save you a bit of time!
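
As a check on that reading of the report, the numbers can be pulled out of the summary and recomputed. This is only a sketch: the report text is copied from the post above, and `parse_bowtie_summary` is a hypothetical helper written for this example, not part of Bowtie.

```python
import re

# Summary text as posted above.
report = """\
# reads processed: 15281579
# reads with at least one reported alignment: 610764 (4.00%)
# reads that failed to align: 14670815 (96.00%)
Reported 610764 alignments to 1 output stream(s)
"""


def parse_bowtie_summary(text):
    """Collect the '# label: count' lines into a {label: count} dict."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_bowtie_summary(report)
processed = counts["reads processed"]
aligned = counts["reads with at least one reported alignment"]
failed = counts["reads that failed to align"]

# Every processed read is either aligned or unaligned.
aligned_pct = 100.0 * aligned / processed
print(f"aligned: {aligned} of {processed} ({aligned_pct:.2f}%)")
print(f"aligned + failed == processed: {aligned + failed == processed}")
```

The two categories add up to the total, so the 4% is simply the share of reads that found a match in the single chromosome used as the reference.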

                • #9
                  For your real run, I would use the "cat" command to concatenate the various chromosomes into a single file, which would then be indexed and mapped against. Since you're using Bowtie, you might be able to download prebuilt indexes from the Bowtie website. That'll save you a bit of time!
                  No problem. Actually, building an index of the reference sequence took only 3 minutes, and the alignment against it took several hours.
                  But is "cat" one of Bowtie's commands, or does it come from other software?

                  • #10
                    Ah, I assumed that you're using Linux or a Mac, in which case cat is a standard command-line program. If you're using Windows then I wouldn't have a clue; presumably there's something similar.
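
If cat is not available, the same concatenation can be sketched in Python, which runs the same way on Linux, Mac, and Windows. The file names (`chr1.fa`, `chr2.fa`, `genome.fa`) and the tiny sequences are placeholders created in a temporary directory for the demo; with real data you would pass the downloaded chromosome files instead.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate_fasta(inputs, output):
    """Append each input file to `output` in order, like `cat a b > out`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream in chunks so whole chromosomes never sit in memory.
                shutil.copyfileobj(src, out)


# Demo with two tiny placeholder chromosome files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "chr1.fa").write_bytes(b">chr1\nACGT\n")
    (tmp / "chr2.fa").write_bytes(b">chr2\nTTGA\n")
    combined = tmp / "genome.fa"
    concatenate_fasta([tmp / "chr1.fa", tmp / "chr2.fa"], combined)
    merged = combined.read_bytes().decode("ascii")

print(merged)
```

Like cat, this copies bytes as they are, so each input file should end with a newline; otherwise the next file's `>` header line would be glued onto the previous sequence.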

                    • #11
                      Yes! I forgot about that command. I will certainly use it.
                      You helped me a lot, dpryan.
