  • harrike
    Member
    • Jun 2010
    • 29

    Does the 2nd column indicate the strand? I checked a few lines of my data. The frequency of each 2nd-column value (the number) is listed below.

    Frequency Number
    847 147
    75 163
    902 339
    94 355
    97 403
    909 419
    71 83
    847 99

    Does the 419/339 pair stand for the "+" strand and the 147/99 pair for the "-" strand? What about the others?

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      Originally posted by harrike
      Does the 2nd column indicate the strand? I checked a few lines of my data. The frequency of each 2nd-column value (the number) is listed below.

      Frequency Number
      847 147
      75 163
      902 339
      94 355
      97 403
      909 419
      71 83
      847 99

      Does the 419/339 pair stand for the "+" strand and the 147/99 pair for the "-" strand? What about the others?
      See the added info in my post above.
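
The values in the 2nd column of a SAM/BAM record are bitwise FLAGs, so each number listed in the question can be decoded bit by bit. The following is a minimal sketch based on the FLAG bits defined in the SAM specification; the helper names `decode_flag` and `summarize` are made up for illustration. Note that the strand reported here is the alignment strand of the read itself; how that relates to the transcript strand depends on the library protocol.

```python
# FLAG bits as defined in the SAM format specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def summarize(flag):
    """Reduce a paired-end FLAG to (mate, read strand, primary/secondary)."""
    names = decode_flag(flag)
    mate = "read1" if "read1" in names else "read2"
    strand = "-" if "reverse" in names else "+"
    kind = "secondary" if "secondary" in names else "primary"
    return mate, strand, kind


# The eight FLAG values listed in the question.
for flag in (99, 147, 83, 163, 355, 403, 339, 419):
    print(flag, summarize(flag))
```

Decoded this way, 99/147 and 83/163 are the two orientations of a properly paired primary alignment, and 355/403 and 339/419 are the same two orientations with the secondary-alignment bit (256) added, which is how multi-mapping reads show up.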

      • harrike
        Member
        • Jun 2010
        • 29

        Hi Genomax,

        Thanks for providing the info. It is quite helpful. I am clear now.

        Rui

        • harrike
          Member
          • Jun 2010
          • 29

          Hi Alex,

          This time I am running STAR on another data set, which is strand-specific, paired-end, with 150 bp reads. The command I used is:

          STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --genomeDir Zmay_AGPv2_STAR_index/ --runThreadN 24 --readFilesIn Zm_TGACCA_L001_R1_001.fastq Zm_TGACCA_L001_R2_001.fastq --outSAMtype BAM Unsorted --outFilterMultimapNmax 20 --alignIntronMax 10000 --alignMatesGapMax 10000 --alignEndsType EndToEnd --outFilterIntronMotifs RemoveNoncanonical --outFilterType BySJout --outFileNamePrefix Zm_TGACCA_L001_R1R2_

          56.34% of the reads are unmapped as "too short"; see the Log.final.out file below:

          Started job on | Feb 19 07:56:41
          Started mapping on | Feb 19 07:57:00
          Finished on | Feb 19 08:02:30
          Mapping speed, Million of reads per hour | 209.57

          Number of input reads | 19210657
          Average input read length | 302
          UNIQUE READS:
          Uniquely mapped reads number | 5863394
          Uniquely mapped reads % | 30.52%
          Average mapped length | 301.78
          Number of splices: Total | 5527054
          Number of splices: Annotated (sjdb) | 5329770
          Number of splices: GT/AG | 5445210
          Number of splices: GC/AG | 76305
          Number of splices: AT/AC | 5539
          Number of splices: Non-canonical | 0
          Mismatch rate per base, % | 0.83%
          Deletion rate per base | 0.07%
          Deletion average length | 3.02
          Insertion rate per base | 0.07%
          Insertion average length | 1.98
          MULTI-MAPPING READS:
          Number of reads mapped to multiple loci | 1335209
          % of reads mapped to multiple loci | 6.95%
          Number of reads mapped to too many loci | 5625
          % of reads mapped to too many loci | 0.03%
          UNMAPPED READS:
          % of reads unmapped: too many mismatches | 4.69%
          % of reads unmapped: too short | 56.34%
          % of reads unmapped: other | 1.46%
          What are the possible reasons for this low mapping rate? Thanks,

          Rui
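
When reading a Log.final.out like the one above, it can help to pull the numbers into a script and check how the reads are distributed across categories. Below is a minimal sketch that parses the `key | value` lines; the function names are hypothetical, and only an excerpt of the log from this post is embedded. Two points worth keeping in mind: for paired-end runs STAR counts read pairs and reports the summed length of both mates (hence 302 here), and "too short" refers to the aligned portion being too small a fraction of the read, not to the input reads being short.

```python
def parse_star_log(text):
    """Parse 'key | value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:'
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Return a percentage entry such as '30.52%' as a float."""
    return float(stats[key].rstrip("%"))


# Excerpt of the log posted above.
LOG_EXCERPT = """\
Number of input reads | 19210657
Uniquely mapped reads % | 30.52%
% of reads mapped to multiple loci | 6.95%
% of reads mapped to too many loci | 0.03%
% of reads unmapped: too many mismatches | 4.69%
% of reads unmapped: too short | 56.34%
% of reads unmapped: other | 1.46%
"""

PERCENT_KEYS = [
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]

stats = parse_star_log(LOG_EXCERPT)
total = sum(percent(stats, key) for key in PERCENT_KEYS)
too_short = round(
    int(stats["Number of input reads"])
    * percent(stats, "% of reads unmapped: too short") / 100
)
print(f"categories sum to {total:.2f}%")
print(f"approx. read pairs unmapped as 'too short': {too_short}")
```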

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            I suggest that you start by looking at a few (10-20) unmapped reads and blast them against nt to see what they are aligning to. You may be surprised by what you find and it may provide an explanation for the low % alignment.

            • alexdobin
              Senior Member
              • Feb 2009
              • 161

              Hi Rui,

              here are a few suggestions in addition to @GenoMax's suggestion.

              1. You are using --alignEndsType EndToEnd, which requires end-to-end alignment for each read (no soft clipping). This might be too harsh for longer reads, which are more likely to have poor quality tails, adapters at the ends etc. Please try to map without this option.
              2. Map read1 and read2 separately - you may have a problem with one of the reads.
              3. Check sequencing quality by plotting quality scores vs position in read (Illumina pipelines typically produce these plots). If sequencing quality drops towards the ends of the reads for a substantial portion of the reads, this would explain poor mappability.

              Cheers
              Alex
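
For suggestion 3, if no quality plots are at hand, the mean quality per read position can be computed directly from the FASTQ file. The sketch below assumes plain four-line FASTQ records and Phred+33 encoding (both assumptions, not stated in the thread); the function name is hypothetical.

```python
import io


def mean_quality_by_position(fastq_handle, offset=33):
    """Mean Phred quality at each read position.

    Assumes four-line FASTQ records and Phred+33 encoded qualities.
    """
    sums, counts = [], []
    for i, line in enumerate(fastq_handle):
        if i % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy input: the second read has a poor-quality tail.
toy_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII##\n"
)
print(mean_quality_by_position(toy_fastq))
```

On a real file, the same function can be called with an opened file handle for read 1 and read 2 separately; a clear drop toward the end of the profile would point to low-quality tails.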

              • harrike
                Member
                • Jun 2010
                • 29

                Hi Alex,

                Thanks for your suggestions.

                I manually checked a couple of reads as GenoMax suggested and found that the major reason for the low mapping rate is that most of the reads contain adapter sequence, due to poor construction of the RNA-seq library. I am going to trim the adapters and map again. The read quality is good according to FastQC.

                I will also try relaxing the --alignEndsType option to see whether the mapping improves.

                Rui
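
A quick way to quantify the adapter problem before trimming is to count how many reads contain the adapter and how much sequence would remain after cutting at it. The sketch below is only an illustration: it assumes the common Illumina TruSeq adapter prefix `AGATCGGAAGAGC` (the actual adapter of this library is not given in the thread), uses exact matching only, and is not a replacement for a dedicated trimming tool.

```python
import io

# Assumption: common Illumina TruSeq adapter prefix; replace with the
# adapter actually used in the library.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def read_sequences(fastq_handle):
    """Yield the sequence line of each four-line FASTQ record."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:
            yield line.strip().upper()


def adapter_report(sequences, adapter=ADAPTER_PREFIX):
    """Return (fraction of reads with adapter, lengths left after trimming)."""
    total = with_adapter = 0
    kept_lengths = []
    for seq in sequences:
        total += 1
        pos = seq.find(adapter)  # exact match only, no mismatches allowed
        if pos >= 0:
            with_adapter += 1
            kept_lengths.append(pos)
        else:
            kept_lengths.append(len(seq))
    fraction = with_adapter / total if total else 0.0
    return fraction, kept_lengths


# Toy input: the first read runs into the adapter after 8 bases.
toy_fastq = "\n".join([
    "@r1", "ACGTACGTAGATCGGAAGAGCACAC", "+", "I" * 25,
    "@r2", "ACGTACGTACGTACGTACGTACGTA", "+", "I" * 25,
]) + "\n"
print(adapter_report(read_sequences(io.StringIO(toy_fastq))))
```

Reads that contain the adapter close to their start leave very little insert sequence, which is consistent with many reads failing an end-to-end alignment.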

                  • SamCurt
                    Member
                    • May 2010
                    • 40

                    Just a quick question here. Is the parameters file used with --parametersFile just a list of command-line options, written the same way I would type them in the console?

                    • alexdobin
                      Senior Member
                      • Feb 2009
                      • 161

                      Originally posted by SamCurt
                      Just a quick question here. Is the parameters file used with --parametersFile just a list of command-line options, written the same way I would type them in the console?
                      The file with parameters should have each parameter on a separate line:
                      <parameterName> <parameterValue(s)>
                      parameterName should not contain --
                      For instance,
                      genomeChrBinNbits 18
                      genomeSAsparseD 1
                      readFilesIn Read1 Read2
                      readFilesCommand -
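
The format described above (one parameter per line, name without the leading `--`, followed by its values) can be produced mechanically from an existing command line. The following is a minimal sketch; the function name is hypothetical, and it assumes that no parameter value itself starts with `--` and that values contain no spaces.

```python
import shlex


def cli_to_parameters_file(command_line_options):
    """Convert '--name value(s)' options into parameters-file lines.

    Assumes no value starts with '--' and no value contains spaces.
    """
    entries = []
    for token in shlex.split(command_line_options):
        if token.startswith("--"):
            entries.append([token[2:]])  # new parameter, '--' stripped
        elif entries:
            entries[-1].append(token)  # value of the current parameter
        else:
            raise ValueError(f"value {token!r} appears before any parameter")
    return "\n".join(" ".join(parts) for parts in entries) + "\n"


options = (
    "--genomeChrBinNbits 18 --genomeSAsparseD 1 "
    "--readFilesIn Read1 Read2 --readFilesCommand -"
)
print(cli_to_parameters_file(options), end="")
```

For the options in the example, this prints the same four lines as in the answer above.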

                      • SamCurt
                        Member
                        • May 2010
                        • 40

                        Thank you for the quick reply, Alex.

                        I also have another problem here. My new institution only has 2.4.0j on their cluster, and it'd take about a week to get a newer version installed. Do you think it's safe to run the first pass using 2.4.0j, and use its SJ.out.tab files for --sjdbFileChrStartEnd when I get, say, 2.5.1b?

                        • alexdobin
                          Senior Member
                          • Feb 2009
                          • 161

                          Originally posted by SamCurt
                          Thank you for the quick reply, Alex.

                          I also have another problem here. My new institution only has 2.4.0j on their cluster, and it'd take about a week to get a newer version installed. Do you think it's safe to run the first pass using 2.4.0j, and use its SJ.out.tab files for --sjdbFileChrStartEnd when I get, say, 2.5.1b?

                          Hi Sam,

                          This would generally be safe; however, when you publish your method, reviewers and readers will have a bone to pick with you.
                          STAR does not really require installation: you can download a pre-compiled executable and run it instead of the one "installed" on your cluster.
                          I recommend re-generating the genome indexes for 2.5.1b.

                          Cheers
                          Alex

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            Originally posted by alexdobin
                            I recommend re-generating the genome indexes for 2.5.1b.

                            Cheers
                            Alex
                            @Alex: Does that mean indexes generated with older versions won't work, or do you just recommend that they be regenerated?

                            • alexdobin
                              Senior Member
                              • Feb 2009
                              • 161

                              Originally posted by GenoMax
                              @Alex: Does that mean indexes generated with older versions won't work, or do you just recommend that they be regenerated?
                              In rare cases, new versions of STAR may not work with old genome indexes, hence my recommendation to re-generate them with 2.5.1, which is very stable.

                              • SamCurt
                                Member
                                • May 2010
                                • 40

                                So, just for gene expression profiling purposes, should I keep the set of sjDb files used for second-pass alignment constant?

                                Complete story: I have a set of ~40 samples that have already gone through the full two-pass alignment, for both gene expression and variation analysis. The sjDb files from the first passes of these samples were used for their second-pass alignments.

                                Now I have received a further ~15 samples within the same project, for which I will do gene expression analysis only. I wonder whether I should do a first pass on these new samples and pool their sjDb files with the old ones for the second pass, or just do a "second pass" with the old sjDb files. My concern is obviously not about time, but rather whether using a different sjDb set would make the gene counts less comparable.
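
If the pooling route is chosen, the first-pass SJ.out.tab files of all samples can be merged into one non-redundant junction list, so that every sample sees the same junction set in the second pass. The sketch below is one possible way to do that and does not settle which option is preferable. It relies on the SJ.out.tab column layout (chromosome, intron start, intron end, strand coded as 0/1/2, ..., uniquely mapped read count in column 7); the function names and the `min_unique_reads` threshold are hypothetical additions for illustration.

```python
# Strand codes used in SJ.out.tab, mapped to the +/-/. notation.
STRAND_SYMBOL = {"0": ".", "1": "+", "2": "-"}


def pool_junctions(sj_tables, min_unique_reads=0):
    """Merge several SJ.out.tab contents into one sorted junction list.

    Unique-read counts are summed across samples; junctions whose summed
    count is below `min_unique_reads` (a hypothetical filter) are dropped.
    """
    pooled = {}
    for table in sj_tables:
        for line in table.splitlines():
            if not line.strip():
                continue
            fields = line.split("\t")
            key = (fields[0], int(fields[1]), int(fields[2]),
                   STRAND_SYMBOL[fields[3]])
            pooled[key] = pooled.get(key, 0) + int(fields[6])
    return sorted(k for k, n in pooled.items() if n >= min_unique_reads)


def format_junctions(junctions):
    """Render junctions as tab-separated chr/start/end/strand lines."""
    return "".join(f"{c}\t{s}\t{e}\t{strand}\n" for c, s, e, strand in junctions)


# Toy SJ.out.tab contents from two samples sharing one junction.
old_sample = "chr1\t100\t200\t1\t1\t1\t5\t0\t30\nchr1\t300\t400\t2\t1\t0\t1\t0\t12\n"
new_sample = "chr1\t100\t200\t1\t1\t1\t3\t2\t40\n"
print(format_junctions(pool_junctions([old_sample, new_sample])), end="")
```

The resulting four-column file could then be passed to --sjdbFileChrStartEnd for all samples that are meant to be compared.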
