
  • bowtie index file & matching GTF for tophat2, for specific human genome cytoband

    Are there examples of a bowtie2 reference file (the FASTA used for indexing with bowtie2-build) and a GTF file that matches it, so I can use them as a standard for my mapping?

    I am having trouble creating a bowtie index that matches the GTF file for a specific cytoband of the human genome.

    Also, if you have any comments on narrowing down the region of interest, I'd appreciate them. (I am currently doing this to test my files, as it's my first time running these analyses.)

  • #2
    Illumina's iGenomes site hosts bundles (sequence/index/annotation) for several genomes: http://support.illumina.com/sequenci...e/igenome.html

    • #3
      But, for example, say I downloaded the hg19 version from the iGenomes page. In the bundle I find the chr10.fa reference file, from which I create the bowtie index (bt2 files). Then there's also a GTF file for the whole genome.
      So is it correct if I run tophat like this:
      tophat -p 8 -G hg19.gtf chr10 reads1.fastq,reads2.fastq

      Every time I include the GTF file it simply won't work. I also tried assembling transcripts with cufflinks using a BAM file generated by tophat without the GTF file (only the chr10 bowtie index as reference).

      The thing is: do I need to modify these files before using them in a run?

      I am sorry! I am very new to this and tried several times changing small things in these files, but it simply didn't work.

      • #4
        And yeah, I am aware that the first column of the GTF file needs to match the headers of my chr10.fa reference file (which I check with the command "bowtie2-inspect --names chr10"). So I manually changed the headers of the FASTA file, created new index files, and tried to run again, and it also didn't work (which is expected since it lost all the genomic coordinates information haha).

        I don't know how to do this; that's why I was asking for an example of files ready for a run, hehe.

        Thanks a lot for your attention!
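The name check described in this post can be automated rather than done by eye. Below is a minimal sketch, assuming a standard tab-separated GTF and a FASTA whose sequence name is the header text up to the first whitespace. The function name `gtf_names_missing_from_fasta` and the file names in the usage line are hypothetical.

```shell
# Sketch: print each sequence name used in column 1 of a GTF that has no
# matching header in a FASTA file. Function name is hypothetical.
gtf_names_missing_from_fasta() {
    # $1 = GTF file, $2 = FASTA file (awk reads the FASTA first, then the GTF)
    awk -F '\t' '
        NR == FNR {
            # First file (FASTA): remember each header name
            if (/^>/) {
                name = substr($0, 2)         # drop the leading ">"
                sub(/[ \t].*/, "", name)     # keep text up to first whitespace
                in_fasta[name] = 1
            }
            next
        }
        /^#/ || NF == 0 { next }             # skip GTF comments and blank lines
        !($1 in in_fasta) && !reported[$1]++ { print $1 }
    ' "$2" "$1"
}

# Example call (placeholder file names):
#   gtf_names_missing_from_fasta hg19.gtf chr10.fa
```

With a whole-genome GTF and a single-chromosome FASTA, this would be expected to list every other chromosome named in the GTF, which shows at a glance how much of the annotation refers to sequences that are not in the index.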

        • #5
          I am not sure why you are trying to recreate the index files since the bundle contains BowtieIndex and Bowtie2Index, which already have the index files.

          GTF files define features in the genome. If you want to add additional ones, then as long as you stick to the correct format you should not need to touch the reference or the index files.

          If you make any changes to the reference itself (adding or deleting even a single base), then you will need to recreate the index files and edit the GTF file as well, since you will have effectively changed the coordinates of the original genome reference.

          Are you trying to look at a specific region of the chromosome? If so, you can extract just the reads aligning in that region using samtools (check out the view command). Then use the original (or modified) GTF file with this extracted region in the new BAM file.
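The extraction suggested here might look like the sketch below. It assumes samtools 1.x command syntax (older 0.1.x releases use a different `sort` syntax). The helper name `extract_region` and all file names are placeholders, not something from the iGenomes bundle or the TopHat output.

```shell
# Sketch: coordinate-sort and index a BAM, then write the reads overlapping
# one region to a new BAM. Assumes samtools 1.x; helper name is hypothetical.
extract_region() {
    # $1 = input BAM, $2 = region such as chr10:10000-20000, $3 = output BAM
    sorted="${1%.bam}.sorted.bam"
    samtools sort -o "$sorted" "$1" &&          # region queries need a sorted BAM
    samtools index "$sorted" &&                 # ... and an index next to it
    samtools view -b -o "$3" "$sorted" "$2"     # -b: write BAM rather than SAM
}

# Example call (placeholder names):
#   extract_region aligned.bam chr10:10000-20000 region.bam
```

If the input is already coordinate-sorted, the sort step can be skipped and the index built on the original file.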

          • #6
            Originally posted by GenoMax
            Are you trying to look at a specific region of the chromosome? If so, you can extract just the reads aligning in that region using samtools (check out the view command). Then use the original (or modified) GTF file with this extracted region in the new BAM file.

            Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations.

            Thanks so much for your help!

            • #7
              Originally posted by rodrigo.duarte88
              Ah!! This is what I needed to know!! Yes, I am trying to align my reads to a specific part of the chromosome due to computational limitations.

              There are a few ways of doing this, as I alluded to above.

              1. Align to the entire genome, then use samtools view to extract a particular region (this needs a sorted and indexed BAM file). This would be the more straightforward route.
              Code:
              $ samtools view your.bam chr1:10000-20000
              2. If you really want to work with just the region of interest, then (using the iGenomes file bundle):
              a. Get that sequence from "genome.fa" (use the appropriate chromosome sequence from this file) using the bedtools "getfasta" option: http://bedtools.readthedocs.org/en/l.../getfasta.html
              b. Create the appropriate indexes (bowtie/bowtie2) from this file.
              c. Select the appropriate regions from the GTF file. You will have to shift the coordinates so that they are relative to the extracted sequence (not sure how big a region you are looking at).
              Steps a and b are straightforward; step c would be difficult.

              3. You could use BioMart or the UCSC Table Browser to extract the sequence and the annotations. Make the indexes using that sequence file.

              Unless you are really constrained for computational power, option 1 is the most straightforward.
              Last edited by GenoMax; 05-17-2015, 06:01 AM.
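For a single contiguous region, step 2c can be sketched with awk. This is only an illustration under stated assumptions: the region is given in 1-based inclusive coordinates (as in GTF), features that are not fully inside the region are dropped, and the new sequence name you pass in must match the FASTA header of the extracted sequence (check it with `grep '^>'`). The function name `subset_gtf_to_region` and the names in the usage line are hypothetical.

```shell
# Sketch: keep GTF features that lie fully inside chrom:start-end and rewrite
# their coordinates relative to the extracted sequence. Name is hypothetical.
subset_gtf_to_region() {
    # $1 = GTF file, $2 = chromosome, $3 = region start (1-based),
    # $4 = region end (inclusive), $5 = name of the extracted sequence
    awk -F '\t' -v OFS='\t' -v chr="$2" -v s="$3" -v e="$4" -v name="$5" '
        /^#/ { next }                          # skip comment lines
        $1 == chr && $4 >= s && $5 <= e {      # feature fully inside the region
            $1 = name                          # must match the new FASTA header
            $4 = $4 - s + 1                    # region start becomes position 1
            $5 = $5 - s + 1
            print
        }' "$1"
}

# Example call (placeholder names):
#   subset_gtf_to_region genes.gtf chr10 1001 2000 region1 > region1.gtf
```

Note that bedtools reads the region from a BED file, which uses 0-based start coordinates, so the 1-based region 1001-2000 above corresponds to the BED interval 1000 2000. Dropping features that cross the region boundary can leave transcripts with missing exons, so results near the edges should be treated with care.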

              • #8
                I will try option 1, since from what I've been reading it's the more correct approach.

                But anyway, you've been very helpful! Thanks so much!!
