
  • #31
    Originally posted by Nino View Post
    Hey, does anyone know whether the reference genome needs to be indexed specifically for STAR? I know that for TopHat2 the reference genome needs a Bowtie2 index (*.bt2).

    Thanks,
    Nino
    You will need to generate special genome files for STAR.
    This is done with the following command:
    STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --runThreadN <Nthreads>
    If you want to use annotations for improved mapping accuracy, you also need to use:
    --sjdbGTFfile /path/to/Annot.gtf --sjdbOverhang <N>, where ideally N=ReadMateLength-1, or you could generically use ~100.
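The pieces above can be put together roughly as follows. This is only a sketch: the paths, the read length of 100 and the thread count are placeholder assumptions, and the script prints the command instead of running it so it can be checked first.

```shell
# Placeholder values (assumptions for illustration, not taken from the thread).
genome_dir="/path/to/GenomeDir"
fasta="/path/to/genome/fasta1"
gtf="/path/to/Annot.gtf"
read_len=100     # length of one mate
threads=8

# Ideal overhang is the mate length minus one.
sjdb_overhang=$((read_len - 1))

star_cmd="STAR --runMode genomeGenerate --genomeDir $genome_dir --genomeFastaFiles $fasta --sjdbGTFfile $gtf --sjdbOverhang $sjdb_overhang --runThreadN $threads"

# Print the command for inspection rather than executing it.
echo "$star_cmd"
```

Once the printed command looks right, it can be pasted into the shell or job script as is.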

    Comment


    • #32
      Originally posted by alexdobin View Post
      You will need to generate special genome files for STAR.
      This is done with the following command:
      STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --runThreadN <Nthreads>
      If you want to use annotations for improved mapping accuracy, you also need to use:
      --sjdbGTFfile /path/to/Annot.gtf --sjdbOverhang <N>, where ideally N=ReadMateLength-1, or you could generically use ~100.
      Alex, I successfully used STAR to generate the SAM file. But I can't find how to specify the output for chimeric alignments. Should I use "--outSAMunmapped Within" to include everything in the SAM and use samtools to find chimeric alignments? And also for "--outReadsUnmapped", does it include chimeric and singleton?

      Thanks.

      Comment


      • #33
        Originally posted by Auction View Post
        Alex, I successfully used STAR to generate the SAM file. But I can't find how to specify the output for chimeric alignments. Should I use "--outSAMunmapped Within" to include everything in the SAM and use samtools to find chimeric alignments? And also for "--outReadsUnmapped", does it include chimeric and singleton?

        Thanks.
        To switch on chimeric detection and output, you need to specify a non-zero --chimSegmentMin, which is the minimum length of each segment (piece) a chimera is made of. For example, if you have 2x100 PE reads and specify --chimSegmentMin 20, you could have a chimera in which one segment of (100-mate1+80-mate2) bases maps non-chimerically to one chromosome, and another segment of 20b-mate2 maps to another chromosome.
        The Chimeric output will go into Chimeric.out.sam and Chimeric.out.junction files.

        Note that the same read can have both acceptable non-chimeric alignments (output to Aligned.out.sam) and chimeric alignments (output to Chimeric.out.*). A read is considered "unmapped" if it does not have an acceptable non-chimeric alignment. --outSAMunmapped Within will output "unmapped" reads into Aligned.out.sam without alignment coordinates (which allows you to fully reconstruct the fastq file from the SAM file), while --outReadsUnmapped Fastx will output them into fastq or fasta files.

        There are other parameters that control chimeric detection:
        chimJunctionOverhangMin 20
        int>0: minimum overhang for a chimeric junction
        chimScoreMin 0
        int>0: minimum total (summed) score of the chimeric segments
        chimScoreDropMax 20
        int>0: max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length
        chimScoreSeparation 10
        int>0: minimum difference (separation) between the best chimeric score and the next one
        chimScoreJunctionNonGTAG -1
        int: penalty for a non-GT/AG chimeric junction
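As a worked version of the 2x100 example above, here is a small sketch. The value 20 for --chimSegmentMin and the file names are assumptions for illustration; the command is only printed, not run.

```shell
mate_len=100
chim_segment_min=20   # assumed minimum chimeric segment length

# If mate2 contributes the shortest allowed chimeric segment, the rest of
# the pair (all of mate1 plus the remainder of mate2) forms the main segment.
main_segment=$((mate_len + mate_len - chim_segment_min))
echo "main segment: $main_segment bases, chimeric segment: $chim_segment_min bases"

# Hypothetical paths and file names; only flags mentioned in this thread are used.
star_chim_cmd="STAR --genomeDir /path/to/GenomeDir --readFilesIn mate1.fq mate2.fq --chimSegmentMin $chim_segment_min --outSAMunmapped Within"
echo "$star_chim_cmd"
```

With these settings the chimeric alignments would be expected in Chimeric.out.sam and Chimeric.out.junction, as described above.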

        Comment


        • #34
          I am pretty new to RNA-seq analysis. I am now using STAR instead of TopHat, and I am very satisfied with both the quality of the results and the speed at which I get them. One thing I miss, though, is the .GTF file I get from TopHat that contains new genes predicted from the reads and splice junctions.
          Does anyone know if there is a way to combine an existing GTF file with the .tab file to create a new .GTF (or GFF) file containing newly predicted gene sites (with random names for these)?

          Comment


          • #35
            Originally posted by [email protected] View Post
            I am pretty new to RNA-seq analysis. I am now using STAR instead of TopHat, and I am very satisfied with both the quality of the results and the speed at which I get them. One thing I miss, though, is the .GTF file I get from TopHat that contains new genes predicted from the reads and splice junctions.
            Does anyone know if there is a way to combine an existing GTF file with the .tab file to create a new .GTF (or GFF) file containing newly predicted gene sites (with random names for these)?
            As far as I know TopHat does not produce a GTF file on its own, at least it was true for the last version I tried (~2.0.3). You need to feed the alignments to Cufflinks, which will assemble and quantify transcripts, and produce the GTF file.

            You can run Cufflinks on STAR alignments.
            If you have un-stranded RNA-seq data you will need to run STAR with --outSAMstrandField intronMotif option, which will generate the XS strand attribute for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical junctions) will be suppressed.

            If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the --library-type option. For example,
            cufflinks ... ... --library-type fr-firststrand
            should be used for the “standard” dUTP protocol. This option has to be used only for Cufflinks runs and not for STAR runs.
            It is recommended to remove the non-canonical junctions for Cufflinks runs using STAR's options:
            --outFilterIntronMotifs RemoveNoncanonical OR RemoveNoncanonicalUnannotated
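The rules above can be summarised in a small helper. This is a sketch only: the function names are made up for illustration, "stranded" here assumes the "standard" dUTP protocol, and the commands are printed rather than executed.

```shell
# Hypothetical helper: extra STAR options for a Cufflinks-bound run.
# $1 is "stranded" or "unstranded".
star_opts_for_cufflinks() {
    opts="--outFilterIntronMotifs RemoveNoncanonical"
    if [ "$1" = "unstranded" ]; then
        # Unstranded data needs the XS attribute derived from intron motifs.
        opts="$opts --outSAMstrandField intronMotif"
    fi
    echo "$opts"
}

# Hypothetical helper: extra Cufflinks options for the same library.
cufflinks_opts() {
    if [ "$1" = "stranded" ]; then
        # Assumes the dUTP protocol mentioned above.
        echo "--library-type fr-firststrand"
    fi
}

for lib in unstranded stranded; do
    echo "[$lib] STAR ... $(star_opts_for_cufflinks "$lib")"
    echo "[$lib] cufflinks ... $(cufflinks_opts "$lib")"
done
```

RemoveNoncanonicalUnannotated could be swapped in for RemoveNoncanonical if annotated non-canonical junctions should be kept.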

            Comment


            • #36
              As far as I know TopHat does not produce a GTF file on its own, at least it was true for the last version I tried (~2.0.3). You need to feed the alignments to Cufflinks, which will assemble and quantify transcripts, and produce the GTF file.
              You are right, sorry I mixed it up a bit. Thanks for the information on the options I should use.

              Comment


              • #37
                Hi all, sorry for the basic question:

                I am writing a bash script to submit star jobs, remove duplicates, get counts etc. The dataset I have has multiple fastq per sample, but different numbers for each. I have made files containing fastq in the specified format (fq_r1_1,..,fq_r1_n). Can I use these when submitting the STAR job? Ie:


                STAR [options] readFilesIn $files/file_read1 $files/file_read2

                ?

                Have tried a few ways to do this but can't figure it out or get STAR to accept input. I am a 'midrange' bioinformatics PhD, so don't hold back on most efficient or crazy way of doing this!

                Thanks in advance,

                Bruce.
                Last edited by bruce01; 05-07-2013, 05:12 AM.

                Comment


                • #38
                  Originally posted by bruce01 View Post
                  I have made files containing fastq in the specified format (fq_r1_1,..,fq_r1_n). Can I use these when submitting the STAR job? Ie:

                  STAR [options] readFilesIn $files/file_read1 $files/file_read2
                  Have you just tried the following?
                  Code:
                  STAR --readFilesIn Sample1_r1_1.fq,Sample1_r1_2.fq,Sample1_r1_3.fq... Sample1_r2_1.fq,Sample1_r2_2.fq,Sample1_r2_3.fq...
                  You could also just concatenate the files together as appropriate and use the result.

                  Comment


                  • #39
                    Dpryan, yes have tried using wildcards as input to test it works, I get a segmentation fault. When I run it with all filenames included as standard it runs fine. I have a lot of samples, with variable numbers of fastq files per sample, and want a single script to submit to a queue. So inputting all fastq by hand is not an option, hence my original question.

                    Concatenating the fastqs will mean I have to uncompress them, which uses computing time, and I am keen to work directly from the .gz files that my facility has supplied. This shouldn't be too big a problem, should it?

                    Comment


                    • #40
                      My example didn't use wildcards, so I'm not sure where that idea came from.

                      You can just concatenate the gzipped files together without uncompressing them first.

                      The other normal approach would be to simply write your script to generate the comma-separated list that's then fed to STAR. You should be able to do that easily enough in bash, which whatever you're using for job scheduling can probably already handle.
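The point about concatenating gzipped files without uncompressing them can be checked on toy data. This sketch uses two made-up single-read fastq files in a temporary directory; nothing here touches real data.

```shell
tmp=$(mktemp -d)

# Two tiny gzipped fastq files (toy reads, 4 lines each).
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/part1.fq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp/part2.fq.gz"

# Concatenate the compressed files directly; the originals are left unchanged.
cat "$tmp/part1.fq.gz" "$tmp/part2.fq.gz" > "$tmp/merged.fq.gz"

# Decompress the merged file and count lines and reads.
merged_lines=$(gzip -dc "$tmp/merged.fq.gz" | wc -l | tr -d ' ')
merged_reads=$(gzip -dc "$tmp/merged.fq.gz" | grep -c '^@r')
echo "merged file: $merged_lines lines, $merged_reads reads"

rm -rf "$tmp"
```

The merged file decompresses to both reads in order, because gzip treats a file made of several compressed members as one stream.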

                      Comment


                      • #41
                        Ok, asked over on Stackoverflow, this works:

                        group1=( $files/Sample1*r1* );
                        group2=( $files/Sample1*r2* );
                        ( IFS=,; STAR --readFilesIn "${group1[*]}" "${group2[*]}" [OPTIONS]);

                        Thanks for the help and ideas Dpryan.

                        ##Edit: DPryan, sorry, I got my wires crossed between here and Stackoverflow. I was asking how to give STAR the input that I had created; the above works. I am reluctant to concatenate gzip files: I don't want to create duplicates and don't want to change the gzips in any way before aligning. Paranoia!
                        Last edited by bruce01; 05-08-2013, 03:48 AM. Reason: Miscommunication with poster
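For many samples with different numbers of fastq files, the same IFS trick can be wrapped in a loop. This is a sketch under assumptions: the sample names, the file naming pattern and the temporary demo directory are invented, and --readFilesCommand zcat is included on the assumption that the inputs stay gzipped. The STAR command is printed, not run.

```shell
# Join all arguments with commas; runs in a subshell so IFS is not changed globally.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Demo layout with made-up sample names and a different number of files per sample.
files=$(mktemp -d)
touch "$files/Sample1_r1_1.fq.gz" "$files/Sample1_r1_2.fq.gz" \
      "$files/Sample1_r2_1.fq.gz" "$files/Sample1_r2_2.fq.gz" \
      "$files/Sample2_r1_1.fq.gz" "$files/Sample2_r2_1.fq.gz"

for sample in Sample1 Sample2; do
    # Globs expand in sorted order, so read1 and read2 lists stay in step.
    reads1=$(join_by_comma "$files/$sample"_r1_*)
    reads2=$(join_by_comma "$files/$sample"_r2_*)
    echo "STAR --readFilesCommand zcat --readFilesIn $reads1 $reads2 [OPTIONS]"
done
```

The original gzip files are only listed, never modified, which should suit the concern about leaving them untouched.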

                        Comment


                        • #42
                          You can also try following commands, it works for me.
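                          # Comma-separated list of all R1 files (assumes no spaces in file names)
                          fq1=$(ls -m *_R1_*.fastq.gz | tr -d '\n ')
                          # Matching R2 list, derived by substituting _R1_ with _R2_ (bash only)
                          fq2=${fq1//_R1_/_R2_}
                          STAR --readFilesIn "$fq1" "$fq2"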
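Deriving the R2 list by substitution assumes every R1 file has a matching R2 file. A possible sanity check before submitting the job is sketched below; the helper names and the demo files are hypothetical, and the STAR command is only printed.

```shell
# Replace every _R1_ with _R2_ in a comma-separated list.
derive_r2_list() {
    printf '%s\n' "$1" | sed 's/_R1_/_R2_/g'
}

# Return non-zero if any file in a comma-separated list is missing.
all_files_exist() {
    old_ifs=$IFS
    IFS=,
    for f in $1; do
        if [ ! -e "$f" ]; then
            IFS=$old_ifs
            echo "missing: $f" >&2
            return 1
        fi
    done
    IFS=$old_ifs
}

# Demo with made-up files: the second R1 file has no R2 partner.
demo=$(mktemp -d)
touch "$demo/s_R1_001.fastq.gz" "$demo/s_R2_001.fastq.gz" "$demo/s_R1_002.fastq.gz"
fq1="$demo/s_R1_001.fastq.gz,$demo/s_R1_002.fastq.gz"
fq2=$(derive_r2_list "$fq1")

if all_files_exist "$fq2"; then
    echo "STAR --readFilesIn $fq1 $fq2"
else
    echo "refusing to run: incomplete read pairs"
fi
```

In this demo the check fails on purpose, so the script reports the missing R2 file instead of printing a STAR command.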

                          Comment


                          • #43
                            Originally posted by alexdobin View Post


                            If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the --library-type option. For example,
                            cufflinks ... ... --library-type fr-firststrand
                            should be used for the “standard” dUTP protocol. This option has to be used only for Cufflinks runs and not for STAR runs.
                            It is recommended to remove the non-canonical junctions for Cufflinks runs using STAR's options:
                            --outFilterIntronMotifs RemoveNoncanonical OR RemoveNoncanonicalUnannotated

                            Hi Alex,
                            I am trying to use STAR to align the reads and then Cufflinks to get expression values. I have stranded RNA-seq data. May I ask why it is recommended to remove the non-canonical junctions for Cufflinks runs? How will it affect Cufflinks if I use the default setting (no filtering)?

                            Comment


                            • #44
                              Hi Priya, you may want to post this and carry on the conversation at the Google group for rna-star:

                              Comment


                              • #45
                                I believe it's best to feed Cufflinks only the highest-confidence alignments, and in my experience non-canonical junctions contain more false positives.
                                Also, many non-canonical splices occur just a few bases away from a highly expressed canonical junction, which could be caused by sequencing/mapping errors, and possibly by spliceosome errors. These splices will likely throw the Cufflinks assembly off.
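To get a feel for how many junctions such filtering would affect, the junction table can be tallied by motif. This sketch assumes the SJ.out.tab layout in which column 5 is the intron motif code (0 for non-canonical, above 0 for canonical motifs); the three rows are invented toy data, including a non-canonical junction 3 bases away from a well-supported canonical one.

```shell
# Toy junction table (made-up rows): chr, start, end, strand, motif,
# annotated, unique reads, multi-mapping reads, max overhang.
sj=$(mktemp)
printf 'chr1\t100\t200\t1\t1\t1\t50\t0\t40\n' > "$sj"
printf 'chr1\t103\t200\t0\t0\t0\t2\t0\t12\n' >> "$sj"
printf 'chr2\t500\t900\t2\t2\t0\t7\t1\t30\n' >> "$sj"

# Count junctions by motif class using column 5.
noncanonical=$(awk -F'\t' '$5 == 0' "$sj" | wc -l | tr -d ' ')
canonical=$(awk -F'\t' '$5 > 0' "$sj" | wc -l | tr -d ' ')
echo "canonical: $canonical, non-canonical: $noncanonical"

rm -f "$sj"
```

On a real run, pointing the awk commands at the actual SJ.out.tab file would show how much of the junction set is non-canonical before deciding on a filter.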

                                Comment
