
  • Generating genome with splice junctions - STAR

    Hi,

    I am wondering how to go about including a splice junction database when generating a genome with STAR. I am using the bovine genome downloaded from UCSC (BosTau7/Btau_4.6.1).

    I have successfully generated this genome as it is and run mapping jobs against it. I'm just wondering if anyone can point me in the right direction for finding an annotation file listing introns in the four-column format:

    Chr <tab> Start <tab> End <tab> Strand (+/-)

    I can't see one anywhere on the UCSC site. Do I have to create one myself?

    Any help would be much appreciated, I'm new to this!

  • #2
    You can find the GFF files for (BosTau7/Btau_4.6.1) at NCBI: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/GFF/



    • #3
      Thanks a million. I'm having a bit of trouble now with generating the genome though.

      code is:

      /shared/STAR_2.3.0e/STAR --runMode generateGenome --runThreadN 8 --genomeDir /data/shared/genomes/bosTau7 --genomeFastaFiles /data/shared/genomes/bosTau7/bosTau7.fa

      This is meant to generate the genome without splice junctions. I'm not quite sure how to include the GFF file, but I was going to add --sjdbFileChrStartEnd alt_BosTau_4.6.1_scaffolds.gff3 to the end of the above. Please let me know if this is wrong.

      However, whenever I try anything now all that comes up is:

      "Aug 06 13:29:03 ..... Started STAR run
      shivshiv@compute:/data/shared/genomes/bosTau7>"

      It's just returning to the command line without any error message.



      • #4
        Check whether the file from NCBI is in the 4-column format (quoted from the STAR manual):

        Chr <tab> Start <tab> End <tab> Strand (+ or -)

        According to the manual, for GFF3 files you need to specify
        --sjdbGTFtagExonParentTranscript Parent
        along with
        --sjdbOverhang <N>: the length of the "overhang" on each side of a splice junction. Ideally it should be equal to (MateLength - 1).
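
For anyone following along, the (MateLength - 1) rule quoted above is plain arithmetic on the read length. A minimal sketch, where the function name `sjdb_overhang` is made up for illustration and is not part of STAR:

```shell
# sjdb_overhang READ_LENGTH
# Print the suggested --sjdbOverhang value, i.e. MateLength - 1.
# (Hypothetical helper, not a STAR command.)
sjdb_overhang() {
    echo $(( $1 - 1 ))
}

sjdb_overhang 101   # prints 100
```

So 101-base reads would give an overhang of 100, and 76-base reads would give 75.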



        • #5
          You have "--runMode generateGenome" when it should be "--runMode genomeGenerate"



          • #6
            Hi @shivshiv,

            The advice from @bruce01 and @GenoMax was spot on. Your command for generating the genome with GFF3 annotation would look like:

            STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /data/shared/genomes/bosTau7/ --genomeFastaFiles /data/shared/genomes/bosTau7/bosTau7.fa --sjdbGTFfile /data/shared/genomes/bosTau7/alt_BosTau_4.6.1_scaffolds.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 100



            • #7
              Hi - @shivshiv I realise you've already solved your original problem, but in case anyone else has the same problem here is how I made my own table:

              1. UCSC table browser for your genome
              2. Extract fasta sequence for each intron (keep each one separate)
              3. Grep out the fasta headers (will contain all the relevant details)
              4. Open in Excel, use data-to-columns to pull out the relevant details as separate columns, and delete the rest.

              Definitely not the most high-tech or efficient way of doing it, but it got the job done quickly!
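
A scripted alternative to the Excel route above, in case it is useful: this is a rough sketch (not what the poster did) that derives the four-column intron table directly from a GFF3 file. It assumes exons appear as `exon` features whose ninth column carries a `Parent=` attribute, and that introns are simply the gaps between consecutive exons of the same parent. The function name `gff3_to_introns` is hypothetical.

```shell
# gff3_to_introns FILE
# Print Chr<TAB>Start<TAB>End<TAB>Strand (1-based, inclusive) for every gap
# between consecutive exons that share a Parent in a GFF3 file.
gff3_to_introns() {
    # Step 1: emit one line per (parent, exon): parent, chr, start, end, strand.
    awk -F '\t' 'BEGIN { OFS = "\t" }
        $3 == "exon" {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^Parent=/) {
                    # An exon may list several comma-separated parents.
                    m = split(substr(attrs[i], 8), parents, ",")
                    for (j = 1; j <= m; j++)
                        print parents[j], $1, $4, $5, $7
                }
        }' "$1" |
    # Step 2: order exons by parent, then by start coordinate.
    sort -t "$(printf '\t')" -k1,1 -k3,3n |
    # Step 3: the intron runs from previous exon end + 1 to this exon start - 1.
    awk -F '\t' 'BEGIN { OFS = "\t" }
        $1 == prev_parent && $3 > prev_end + 1 {
            print $2, prev_end + 1, $3 - 1, $5
        }
        { prev_parent = $1; prev_end = $4 }' |
    # Step 4: drop junctions shared by several transcripts.
    sort -u
}
```

Redirecting the output to a file, e.g. `gff3_to_introns annotation.gff3 > introns.tab`, should give a table in the Chr/Start/End/Strand layout that `--sjdbFileChrStartEnd` is described as expecting earlier in the thread. Worth spot-checking a few junctions against a genome browser before relying on it.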



              • #8
                If you want to do cross-species alignment, particularly RNA-seq, I suggest BBMap. I don't know of anything else capable of cross-species RNA-seq. BBMap does not use GFF files, but for RNA-seq you do need to set the maxindel flag appropriately, e.g. "maxindel=200000" if you expect most introns to be under 200 kb.
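
As a rough illustration of the suggestion above, a BBMap call with maxindel set might be assembled like this. The file names are placeholders (assumptions, not from this thread), and the command is only printed so it can be checked before being run:

```shell
# Placeholder file names; replace with your own.
REF=bosTau7.fa
READS=reads.fq
MAX_INTRON=200000   # longest intron you expect, in bases

# Build the command as a string so it can be inspected first.
bbmap_cmd="bbmap.sh ref=$REF in=$READS out=mapped.sam maxindel=$MAX_INTRON"
echo "$bbmap_cmd"

# Once it looks right, run it (requires BBMap on your PATH):
# $bbmap_cmd
```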



                • #9
                  Thanks. Let me see if I can find more info about BBMap to help me understand it further.
                  Last edited by kurban910; 09-27-2014, 02:31 AM.
