
  • Originally posted by SamCurt
    So, just for gene expression profiling purposes, should I keep my sjDb file set for second-pass alignment constant?

    Complete story: I have a set of ~40 samples that have already completed the full double-pass alignment for both gene expression and variation analysis purposes. The sjDb files from the first passes of these samples were used for their second-pass alignments.

    Now I have received a further ~15 samples within the same project, on which I'd perform gene expression analysis only. I wonder whether I should do a first pass on these new samples and pool their sjDb's with the old ones for the second pass, or just do a "second pass" with the old sjDb's. My concern is obviously not about time, but rather whether using a different sjDb set would make the gene counts less comparable.
    Hi Sam,

    To avoid quantification bias it's better to use the same splice junctions for the 2nd pass mapping. However, this affects only the novel (unannotated) junctions, so if you are quantifying only annotated genes, the bias is likely to be very small.

    The ideal solution is to combine the splice junction files (SJ.out.tab) from the 1st pass of all samples (old and new), and then run the 2nd pass on *all* samples.

    The 2nd best solution (for differential expression) is to use only the junctions from the old samples for the "2nd" pass mapping of the new samples (you would not need the 1st pass mapping for the new samples, nor another 2nd pass on the old samples). This way you would avoid bias for junctions detected only in the new samples.

    Cheers
    Alex
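
A minimal sketch of the pooling step described above, assuming the usual 9-column, tab-separated SJ.out.tab layout (chromosome, intron start, intron end, strand code 0/1/2, motif, annotated flag, unique reads, multi-mapping reads, max overhang). The function name `merge_sj_tabs` and the `min_unique_reads` threshold are hypothetical and not part of STAR; the output is the 4-column chr/start/end/strand form that `--sjdbFileChrStartEnd` is documented to read.

```python
import io
from collections import defaultdict

# Assumed SJ.out.tab strand codes: 0 = undefined, 1 = plus, 2 = minus.
STRAND = {"0": ".", "1": "+", "2": "-"}


def merge_sj_tabs(handles, min_unique_reads=1):
    """Pool junctions from several SJ.out.tab streams.

    Returns a sorted list of (chrom, start, end, strand) tuples, keeping
    junctions whose unique-read count summed over all inputs reaches
    min_unique_reads (a hypothetical filter, not a STAR default).
    """
    unique_reads = defaultdict(int)
    for handle in handles:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # skip blank or malformed lines
            key = (fields[0], int(fields[1]), int(fields[2]),
                   STRAND.get(fields[3], "."))
            unique_reads[key] += int(fields[6])
    return sorted(k for k, n in unique_reads.items() if n >= min_unique_reads)


if __name__ == "__main__":
    old = io.StringIO("chr1\t100\t200\t1\t1\t0\t3\t0\t20\n")
    new = io.StringIO("chr1\t100\t200\t1\t1\t0\t2\t1\t30\n"
                      "chr2\t50\t90\t0\t0\t0\t1\t0\t12\n")
    for chrom, start, end, strand in merge_sj_tabs([old, new]):
        print(chrom, start, end, strand, sep="\t")
```

For the "2nd best" option, the same function would simply be fed the SJ.out.tab files of the old samples only.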



    • Hi everyone
      Do you think we can align with STAR on a laptop with an Intel Core Extreme i7-4940MX and 32GB RAM, even overnight? I will have about 130 million reads to align to the human genome.
      Thank you



      • Originally posted by mdidish
        Hi everyone
        Do you think we can align with STAR on a laptop with an Intel Core Extreme i7-4940MX and 32GB RAM, even overnight? I will have about 130 million reads to align to the human genome.
        Thank you
        Hi,

        Depending on the read length, the speed should be 20-50M reads per hour per core, so it should be doable. 32GB is just enough for the human genome.

        Cheers
        Alex
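
A back-of-the-envelope sketch of the estimate above. It uses only the figures quoted in this thread (130 million reads, 20-50M reads per hour per core) and assumes, optimistically, that throughput scales linearly with the number of cores; the function name and the core counts are illustrative. On a single core the two rates give 6.5 and 2.6 hours respectively.

```python
def alignment_hours(total_reads, reads_per_hour_per_core, cores=1):
    """Rough wall-clock estimate, assuming linear scaling across cores."""
    if total_reads < 0 or reads_per_hour_per_core <= 0 or cores <= 0:
        raise ValueError("reads must be >= 0; rate and cores must be > 0")
    return total_reads / (reads_per_hour_per_core * cores)


if __name__ == "__main__":
    total = 130e6  # reads, from the question above
    for rate in (20e6, 50e6):  # reads per hour per core, from the reply above
        for cores in (1, 4):   # illustrative core counts
            hours = alignment_hours(total, rate, cores)
            print(f"{rate / 1e6:.0f}M reads/h/core, {cores} core(s): {hours:.1f} h")
```

Even the slowest case here fits within an overnight run, which is consistent with "it should be doable".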



        • Hi,
          Thank you for your response. In the end, I should have a laptop with an Intel Core Extreme i7-4940MX and 64GB RAM.
          The duration is not important, I just wanted to make sure I can start the analysis.
          Marc



          • Dear Alex, or other STAR experts,

            Which parameters should I set to get ALL non-canonical (i.e. back-spliced) reads in the unmapped SAM file? I want to call circular RNAs with these reads.

            I have 50 bp paired-end unstranded RNA-seq reads, and a genome index with a splice database from the same data (2-pass over all samples). I hope back-spliced junctions are NOT present in this (joined) splice database. Or should I filter these databases to remove the back-splice junctions?

            Is

            --outFilterIntronMotifs RemoveNoncanonicalUnannotated

            the correct setting, i.e. will all spliced reads not present in the splice junction database end up in the unmapped SAM file?

            Best wishes and thanks in advance,

            dietmar



            • Originally posted by dietmar13
              Which parameters should I set to get ALL non-canonical (i.e. back-spliced) reads in the unmapped SAM file? I want to call circular RNAs with these reads.

              I have 50 bp paired-end unstranded RNA-seq reads, and a genome index with a splice database from the same data (2-pass over all samples). I hope back-spliced junctions are NOT present in this (joined) splice database. Or should I filter these databases to remove the back-splice junctions?

              Is

              --outFilterIntronMotifs RemoveNoncanonicalUnannotated

              the correct setting, i.e. will all spliced reads not present in the splice junction database end up in the unmapped SAM file?

              Best wishes and thanks in advance,

              dietmar
              Hi Dietmar,

              The non-canonical junctions have non-canonical motifs, but they are still "linear" in the genome, i.e. the acceptor site follows the donor site. The circular junctions are classified as "chimeric", so you need to enable chimeric detection, e.g. --chimSegmentMin 15 --chimJunctionOverhangMin 15. You can extract the circular junctions from Chimeric.out.junction (see this post); an example script is in the STAR source distribution: extras/scripts/filterCirc.awk. The chimeric alignments are also written to the SAM/BAM files.

              Cheers
              Alex
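
A rough sketch of the kind of filter described above, written in Python rather than awk. It is not a port of filterCirc.awk. It assumes the first seven tab-separated columns of Chimeric.out.junction are donor chromosome, donor position, donor strand, acceptor chromosome, acceptor position, acceptor strand and junction type (with negative types meaning the junction lies between the mates rather than inside a read). The 1 Mb span limit and the function names are hypothetical choices for illustration, so check them against the STAR manual for your version.

```python
def is_backsplice(fields, max_span=1_000_000):
    """True if one Chimeric.out.junction row looks like a back-splice.

    A back-splice is taken here to mean: same chromosome, same strand,
    junction located inside a read, and the acceptor lying "behind" the
    donor in transcript orientation, within max_span bases.
    """
    chr_donor, pos_donor, strand_donor = fields[0], int(fields[1]), fields[2]
    chr_acc, pos_acc, strand_acc = fields[3], int(fields[4]), fields[5]
    junction_type = int(fields[6])
    if junction_type < 0:
        return False  # junction not covered by a read segment
    if chr_donor != chr_acc or strand_donor != strand_acc:
        return False
    if strand_donor == "+":
        span = pos_donor - pos_acc  # donor downstream of acceptor
    else:
        span = pos_acc - pos_donor  # mirrored on the minus strand
    return 0 < span < max_span


def circular_candidates(lines, max_span=1_000_000):
    """Yield the split fields of rows that pass is_backsplice."""
    for line in lines:
        # Some STAR versions write a header and '#' comment lines (assumption).
        if line.startswith("#") or line.startswith("chr_donorA"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and is_backsplice(fields, max_span):
            yield fields
```

Passing an open file handle for Chimeric.out.junction to `circular_candidates` yields the candidate rows, which can then be grouped by donor and acceptor position to count supporting reads.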



              • Hi Alex

                I've been using STAR for several weeks now and I love it! Thanks for creating such a great tool.

                I'd like to use STAR to try to align macaque reads to the human genome. I think this might work best if I relax the alignment stringency. Do you have any recommendations for how I should do this?



                • Hi Alex,
                  I understand that with --quantMode TranscriptomeSAM --quantTranscriptomeBan Singleend I can generate a transcript-coordinate BAM file with indels and soft-clips. But do you consider it acceptable for variant calling (e.g. for allele-specific expression purposes)?



                  • duplicated reference genomes

                    I don't understand why the genomeGenerate mode is creating a duplicated (concatenated) reference. This is resulting in at least two identical alignments for every read:

                    Command issued:
                    Code:
                    STAR --runMode genomeGenerate --genomeDir NPB_Pi9 --genomeFastaFiles NPB_Pi9.fasta --runThreadN 2 --genomeSAindexNbases 14
                    resulting SAM header and first two alignments:
                    Code:
                    @HD	VN:1.4
                    @SQ	SN:chr01	LN:43270923
                    @SQ	SN:chr02	LN:35937250
                    @SQ	SN:chr03	LN:36413819
                    @SQ	SN:chr04	LN:35502694
                    @SQ	SN:chr05	LN:29958434
                    @SQ	SN:chr06	LN:31248787
                    @SQ	SN:chr07	LN:29697621
                    @SQ	SN:chr08	LN:28443022
                    @SQ	SN:chr09	LN:23012720
                    @SQ	SN:chr10	LN:23207287
                    @SQ	SN:chr11	LN:29021106
                    @SQ	SN:chr12	LN:27531856
                    @SQ	SN:AC155918	LN:32941
                    @SQ	SN:AC156495	LN:88500
                    @SQ	SN:AC160949	LN:128256
                    @SQ	SN:AP008246	LN:206004
                    @SQ	SN:AP008247	LN:157458
                    @SQ	SN:AC174930	LN:15426
                    @SQ	SN:Syng_TIGR_002	LN:14476
                    @SQ	SN:Syng_TIGR_004	LN:19457
                    @SQ	SN:Syng_TIGR_005	LN:21787
                    @SQ	SN:Syng_TIGR_007	LN:7820
                    @SQ	SN:Syng_TIGR_008	LN:16676
                    @SQ	SN:Syng_TIGR_009	LN:10296
                    @SQ	SN:Syng_TIGR_010	LN:15493
                    @SQ	SN:Syng_TIGR_011	LN:10901
                    @SQ	SN:Syng_TIGR_012	LN:16417
                    @SQ	SN:Syng_TIGR_013	LN:10512
                    @SQ	SN:Syng_TIGR_014	LN:21421
                    @SQ	SN:Syng_TIGR_015	LN:10595
                    @SQ	SN:Syng_TIGR_016	LN:12792
                    @SQ	SN:Syng_TIGR_019	LN:10422
                    @SQ	SN:Syng_TIGR_020	LN:10699
                    @SQ	SN:Syng_TIGR_021	LN:17477
                    @SQ	SN:Syng_TIGR_022	LN:9889
                    @SQ	SN:Syng_TIGR_023	LN:24772
                    @SQ	SN:Syng_TIGR_024	LN:10060
                    @SQ	SN:Syng_TIGR_026	LN:19971
                    @SQ	SN:Syng_TIGR_027	LN:11522
                    @SQ	SN:Syng_TIGR_028	LN:31094
                    @SQ	SN:Syng_TIGR_029	LN:12884
                    @SQ	SN:Syng_TIGR_030	LN:10794
                    @SQ	SN:Syng_TIGR_031	LN:9548
                    @SQ	SN:Syng_TIGR_032	LN:9603
                    @SQ	SN:Syng_TIGR_033	LN:11093
                    @SQ	SN:Syng_TIGR_034	LN:10311
                    @SQ	SN:Syng_TIGR_035	LN:10686
                    @SQ	SN:Syng_TIGR_036	LN:10434
                    @SQ	SN:Syng_TIGR_037	LN:13061
                    @SQ	SN:Syng_TIGR_038	LN:8197
                    @SQ	SN:Syng_TIGR_039	LN:6269
                    @SQ	SN:Syng_TIGR_041	LN:10210
                    @SQ	SN:Syng_TIGR_042	LN:5510
                    @SQ	SN:Syng_TIGR_043	LN:4236
                    @SQ	SN:Syng_TIGR_044	LN:6000
                    @SQ	SN:Syng_TIGR_045	LN:22545
                    @SQ	SN:Syng_TIGR_046	LN:11447
                    @SQ	SN:Syng_TIGR_047	LN:20829
                    @SQ	SN:Syng_TIGR_048	LN:7140
                    @SQ	SN:Syng_TIGR_049	LN:6261
                    @SQ	SN:Syng_TIGR_050	LN:8529
                    @SQ	SN:Pi9_cDNA	LN:4650
                    @SQ	SN:chr01	LN:43270923
                    @SQ	SN:chr02	LN:35937250
                    @SQ	SN:chr03	LN:36413819
                    @SQ	SN:chr04	LN:35502694
                    @SQ	SN:chr05	LN:29958434
                    @SQ	SN:chr06	LN:31248787
                    @SQ	SN:chr07	LN:29697621
                    @SQ	SN:chr08	LN:28443022
                    @SQ	SN:chr09	LN:23012720
                    @SQ	SN:chr10	LN:23207287
                    @SQ	SN:chr11	LN:29021106
                    @SQ	SN:chr12	LN:27531856
                    @SQ	SN:AC155918	LN:32941
                    @SQ	SN:AC156495	LN:88500
                    @SQ	SN:AC160949	LN:128256
                    @SQ	SN:AP008246	LN:206004
                    @SQ	SN:AP008247	LN:157458
                    @SQ	SN:AC174930	LN:15426
                    @SQ	SN:Syng_TIGR_002	LN:14476
                    @SQ	SN:Syng_TIGR_004	LN:19457
                    @SQ	SN:Syng_TIGR_005	LN:21787
                    @SQ	SN:Syng_TIGR_007	LN:7820
                    @SQ	SN:Syng_TIGR_008	LN:16676
                    @SQ	SN:Syng_TIGR_009	LN:10296
                    @SQ	SN:Syng_TIGR_010	LN:15493
                    @SQ	SN:Syng_TIGR_011	LN:10901
                    @SQ	SN:Syng_TIGR_012	LN:16417
                    @SQ	SN:Syng_TIGR_013	LN:10512
                    @SQ	SN:Syng_TIGR_014	LN:21421
                    @SQ	SN:Syng_TIGR_015	LN:10595
                    @SQ	SN:Syng_TIGR_016	LN:12792
                    @SQ	SN:Syng_TIGR_019	LN:10422
                    @SQ	SN:Syng_TIGR_020	LN:10699
                    @SQ	SN:Syng_TIGR_021	LN:17477
                    @SQ	SN:Syng_TIGR_022	LN:9889
                    @SQ	SN:Syng_TIGR_023	LN:24772
                    @SQ	SN:Syng_TIGR_024	LN:10060
                    @SQ	SN:Syng_TIGR_026	LN:19971
                    @SQ	SN:Syng_TIGR_027	LN:11522
                    @SQ	SN:Syng_TIGR_028	LN:31094
                    @SQ	SN:Syng_TIGR_029	LN:12884
                    @SQ	SN:Syng_TIGR_030	LN:10794
                    @SQ	SN:Syng_TIGR_031	LN:9548
                    @SQ	SN:Syng_TIGR_032	LN:9603
                    @SQ	SN:Syng_TIGR_033	LN:11093
                    @SQ	SN:Syng_TIGR_034	LN:10311
                    @SQ	SN:Syng_TIGR_035	LN:10686
                    @SQ	SN:Syng_TIGR_036	LN:10434
                    @SQ	SN:Syng_TIGR_037	LN:13061
                    @SQ	SN:Syng_TIGR_038	LN:8197
                    @SQ	SN:Syng_TIGR_039	LN:6269
                    @SQ	SN:Syng_TIGR_041	LN:10210
                    @SQ	SN:Syng_TIGR_042	LN:5510
                    @SQ	SN:Syng_TIGR_043	LN:4236
                    @SQ	SN:Syng_TIGR_044	LN:6000
                    @SQ	SN:Syng_TIGR_045	LN:22545
                    @SQ	SN:Syng_TIGR_046	LN:11447
                    @SQ	SN:Syng_TIGR_047	LN:20829
                    @SQ	SN:Syng_TIGR_048	LN:7140
                    @SQ	SN:Syng_TIGR_049	LN:6261
                    @SQ	SN:Syng_TIGR_050	LN:8529
                    @SQ	SN:Pi9_cDNA	LN:4650
                    @PG	ID:STAR	PN:STAR	VN:STAR_2.5.4b	CL:STAR   --runThreadN 16   --genomeDir NPB_Pi9   --genomeFastaFiles NPB_Pi9.fasta      --genomeSAindexNbases 1   --readFilesIn STARFILES/MF046_S4_L002_R1_001.fastq.gz      --readFilesCommand gunzip   -c      --outFileNamePrefix STARFILES/MF046.NPB_Pi9   --outFilterMatchNmin 40
                    @CO	user command line: STAR --runThreadN 16 --genomeDir NPB_Pi9 --genomeFastaFiles NPB_Pi9.fasta --genomeSAindexNbases 1 --readFilesCommand gunzip -c --readFilesIn STARFILES/MF046_S4_L002_R1_001.fastq.gz --outFileNamePrefix STARFILES/MF046.NPB_Pi9 --outFilterMatchNmin 40
                    K00282:141:HJTJWBBXX:2:1101:2656:1068	16	chr01	12873883	3	50M1S	*	0	0	CTTGAGNCGANCACACTATAGCCATGTACATTAGTATAGGTTTACACTAGN	JJJJJJ#JJJ#J<FJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAFAA#	NH:i:2	HI:i:1	AS:i:47	nM:i:0
                    K00282:141:HJTJWBBXX:2:1101:2656:1068	272	chr01	12873883	3	50M1S	*	0	0	CTTGAGNCGANCACACTATAGCCATGTACATTAGTATAGGTTTACACTAGN	JJJJJJ#JJJ#J<FJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAFAA#	NH:i:2	HI:i:2	AS:i:47	nM:i:0
                    Last edited by GenoMax; 04-03-2018, 03:35 AM.
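
A small sketch for confirming the symptom programmatically: it counts how often each reference name appears in the @SQ lines of a SAM header and reports the ones listed more than once. The function name is hypothetical, and the header text would come from something like `samtools view -H`. Separately, the @PG line above records that --genomeFastaFiles was also passed on the mapping command; as far as I understand the STAR manual, that option at the mapping stage adds sequences to the loaded genome, so it may be worth checking as a possible cause.

```python
from collections import Counter


def duplicated_references(header_lines):
    """Return {reference_name: count} for names seen in more than one @SQ line."""
    counts = Counter()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                counts[field[3:]] += 1
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Shortened header modelled on the one posted above.
    header = [
        "@HD\tVN:1.4",
        "@SQ\tSN:chr01\tLN:43270923",
        "@SQ\tSN:Pi9_cDNA\tLN:4650",
        "@SQ\tSN:chr01\tLN:43270923",
        "@SQ\tSN:Pi9_cDNA\tLN:4650",
    ]
    for name, n in duplicated_references(header).items():
        print(f"{name} appears {n} times")
```

An empty result means every reference name is listed once, which is what a correctly built index should produce.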
