  • DESeq - need FASTA and GTF from same resource

    I am following the DESeq protocol.

    It suggests that I download a reference genome sequence for the organism under study (Homo sapiens) in (compressed) FASTA format. It also asks me to download a GTF file with gene models for the same organism.

    After some Googling, I began to download hg19 from this site (http://tophat.cbcb.umd.edu/igenomes.shtml). While it is downloading, the file is called Homo_sapiens_UCSC_hg19.tar.gz.crdownload, which makes me wonder whether it really is in FASTA format.

    Moreover, the site does not seem to include a separate GTF file for human.

    So, I am wondering, what is the safest place to download both of these files?

  • #2
    The Homo_sapiens_UCSC_hg19.tar.gz file will contain the sequence, prebuilt indices for bowtie2, and annotation files in GTF format. You can also download the FASTA and GTF files yourself from Ensembl or UCSC (but don't mix the two) and then build the indices with bowtie2-build.
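
A minimal sketch of how one might check what such an archive holds without extracting it. The `.crdownload` suffix only marks a partial Chrome download, so the archive has to finish downloading before it can be opened. The function names and the extension lists are assumptions made for illustration; the real layout of the iGenomes archive is not described in this thread.

```python
import tarfile


def classify_members(names):
    """Group archive member names by file type, judged by extension only."""
    groups = {"fasta": [], "gtf": [], "bowtie2_index": []}
    for name in names:
        lower = name.lower()
        if lower.endswith((".fa", ".fasta", ".fa.gz")):
            groups["fasta"].append(name)
        elif lower.endswith((".gtf", ".gtf.gz")):
            groups["gtf"].append(name)
        elif lower.endswith(".bt2"):
            # bowtie2 index files, including the *.rev.1.bt2 variants
            groups["bowtie2_index"].append(name)
    return groups


def list_archive(path):
    """Read the member names of a .tar.gz file and classify them."""
    with tarfile.open(path, "r:gz") as archive:
        return classify_members(archive.getnames())
```

Calling `list_archive("Homo_sapiens_UCSC_hg19.tar.gz")` on the completed download would then show whether FASTA, GTF, and index files are all present.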



    • #3
      Thanks again, dpryan.

      I think I may have done it correctly then. I did download three items:

      1) Homo_sapiens_UCSC_hg19.tar.gz (thanks for your explanation of this file)
      2) UCSC compressed FASTA file (http://hgdownload.cse.ucsc.edu/golde...chromFa.tar.gz)
      3) UCSC GTF file (refseq-hg19.gtf.gz)

      I downloaded (2) and (3) from a Harvard site (https://atgu.mgh.harvard.edu/plinkseq/resources.shtml), but both were labeled as coming from UCSC.

      Thank you!



      • #4
        You should use (and only need) files from download #1. It should have all the information you need.

        Mixing and matching files (even if they are labeled as belonging to the same genome build) *may* lead to some strange complications (e.g., the GTF files from UCSC and Ensembl generally differ).
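
One common symptom of mixed sources is a naming mismatch: UCSC names chromosomes `chr1`, `chr2`, and so on, while Ensembl uses `1`, `2`. The sketch below compares the sequence names in a FASTA file with those in the first column of a GTF file. The function names are hypothetical, and this checks names only, not whether the gene models themselves agree.

```python
def fasta_sequence_names(lines):
    """Collect names from FASTA headers: the text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_sequence_names(lines):
    """Collect the first (seqname) column of a GTF, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def gtf_names_missing_from_fasta(fasta_lines, gtf_lines):
    """Return the sorted GTF sequence names that have no matching FASTA record."""
    return sorted(gtf_sequence_names(gtf_lines) - fasta_sequence_names(fasta_lines))
```

Both functions accept any iterable of text lines, so open file handles can be passed directly. An empty result means every annotated sequence has a FASTA record with the same name.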

        BTW: Illumina hosts an expanded selection of iGenomes at: https://support.illumina.com/sequenc...e/igenome.ilmn



        • #5
          GenoMax, thanks for your help.

          The first download has crashed twice. I have restarted it, but for the past 2 hours it has shown an estimated time of 2-3 days.

          However, the other two files downloaded successfully.

          If this first file still keeps crashing, how dangerous do you think it would be to use the latter two? They both say they are from UCSC.

          Thanks!



          • #6
            Originally posted by SuzuBell:
            "If this first file still keeps crashing, how dangerous do you think it would be to use the latter two? They both say they are from UCSC. Thanks!"

            Try downloading from the iGenomes site at Illumina.

            Though this is a larger upfront investment of time (I assume you are not paying for the bandwidth), it could save you a significant amount of downstream work (creating indexes, etc.).
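
For a sense of the downstream work involved when the prebuilt archive is not used, here is a rough sketch of driving `bowtie2-build` from Python. It assumes `bowtie2-build` is installed and on the PATH; the helper names, file names, and the `hg19` basename are placeholders. `bowtie2-build` takes a comma-separated list of FASTA files followed by an index basename.

```python
import subprocess


def bowtie2_build_command(fasta_paths, index_basename):
    """Assemble the argument list for bowtie2-build without running it."""
    if not fasta_paths:
        raise ValueError("at least one FASTA file is required")
    # bowtie2-build expects the reference FASTA files as one comma-separated argument
    return ["bowtie2-build", ",".join(fasta_paths), index_basename]


def build_index(fasta_paths, index_basename):
    """Run bowtie2-build; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(bowtie2_build_command(fasta_paths, index_basename), check=True)
```

For example, `build_index(["chr1.fa", "chr2.fa"], "hg19")` would write the index files with the `hg19` prefix into the current directory. For a full human genome this step can take a long time, which is the cost the prebuilt iGenomes indices avoid.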
