Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • STAR coupled with Cufflinks

    I am working on a plant, in which the genome has been sequenced but the gene-annotation quality are not as good as I expected. I am trying to use STAR and Cufflinks to annotate novel transcripts from pair-end RNA-seq data. When I used the default settings, I got many alternative-spliced transcripts, some of which don't look like real. Therefore, for each novel splicing junction, I would like to have a filter, that is, at least ten reads (unique mapped or multi-mapped) support the junction. I don't know how to do that. What I am currently using is to do 2-pass mapping using STAR. After the 1st pass mapping, I will have the unsupported (with less than 10 reads) junctions removed from the *SJ.out.tab, and use this modified *SJ.out.tab as "annotated" junctions (--sjdbFileChrStartEnd value) for the 2nd pass mapping. Will this step improve the the mapping and get a more reliable list of novel transcripts.

    Thanks!

  • #2
    Hi @harrike,

    --outSJfilterCountUniqueMin 10 10 10 10 --outFilterType BySJout
    will remove all junctions that are supported by <10 uniquely mapping reads. The 4 numbers are for different intron motifs: non-canonical, GT/AG, GC/AG, AT/AC.
    You can also use --outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout, which will remove junctions supported by <10 reads including unique and multimappers.

    This option will work with 1-pass or 2-pass mapping.

    There are also other splice junction filtering options - please check the manual for outSJfilter* otpions, they can all be used together in "AND" fashion.

    Cheers
    Alex

    Comment


    • #3
      Hi Alex,

      Great ! Thank you very much for your reply. I am testing those settings.

      I am curious about what happens to those uniquely mapped reads spanning a junction with <10 reads support. Are they considered as unmapped reads?

      Thanks,

      Rui

      Comment


      • #4
        Hi Alex,

        I tried one of my libraries, using the "--outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout". In the *SJ.out.tab file, some of the junctions are still of <10 reads support (see below). Should the sum of the column 7 and 8 be >=10?

        scaffold2131 22287 22614 1 1 0 62 0 39
        scaffold2131 22322 22614 1 1 0 3 0 23
        scaffold2131 22504 22614 1 1 0 1 0 35
        scaffold2131 22527 22614 1 1 0 1 0 16
        scaffold2131 22702 22827 1 1 0 40 0 44
        scaffold2131 22933 23197 1 1 0 31 0 44
        scaffold2131 23273 23348 1 1 0 157 0 45
        scaffold2131 23479 23582 1 1 0 94 61 44
        scaffold2131 23675 23875 1 1 0 134 24 44
        scaffold2131 23927 24382 1 1 0 150 0 44
        scaffold2131 24460 24565 1 1 0 131 23 44
        scaffold2131 24621 24694 1 1 0 2 0 34
        scaffold2131 24621 24699 1 1 0 1 0 40

        The command I used is:

        STAR --genomeDir /home/harrikex/workspot/litchi/RNAseq/star_index/ --runThreadN 24 --readFilesIn 121224_I663_FCC1L8RACXX_L6_LITgqjTABRAAPEI-62_1.fq 121224_I663_FCC1L8RACXX_L6_LITgqjTABRAAPEI-62_2.fq --outSAMtype BAM Unsorted --outFilterMultimapNmax 20 --alignIntronMax 10000 --alignMatesGapMax 10000 --outSAMstrandField intronMotif --alignEndsType EndToEnd --outFilterIntronMotifs RemoveNoncanonical --outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout --outFileNamePrefix FMO_121224_

        Thanks,

        Rui

        Comment


        • #5
          The 2-pass pipeline for STAR works very fine, I used it many times. Moreover, STAR with Cufflinks scores among the best results in alignment-based quantification pipelines when dealing with not very specific samples, e.g. tissue samples or whole-organism samples.

          Comment


          • #6
            Originally posted by harrike View Post
            Hi Alex,

            I tried one of my libraries, using the "--outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout". In the *SJ.out.tab file, some of the junctions are still of <10 reads support (see below). Should the sum of the column 7 and 8 be >=10?

            ...

            Thanks,

            Rui
            Hi Rui,

            sorry, I forgot that --outSJfilterCountUniqueMin and --outSJfilterCountTotalMin filters are used in the OR fashion, i.e. if one or the other is satisifed, the junction will be kept - so you have to specify both options. If you want only the Total filter, you need to make --outSJfilterCountUniqueMin numbers larger or -1, e.g. --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 .

            The reads that cross the junctions that do not pass this filter will be re-mapped forcing them not to cross these junctions. Most often this will result in soft-clipped reads, i.e. donor or acceptor part of the read over the prohibited junction will be dropped.

            Cheers
            Alex

            Comment


            • #7
              Originally posted by alexdobin View Post
              Hi Rui,

              sorry, I forgot that --outSJfilterCountUniqueMin and --outSJfilterCountTotalMin filters are used in the OR fashion, i.e. if one or the other is satisifed, the junction will be kept - so you have to specify both options. If you want only the Total filter, you need to make --outSJfilterCountUniqueMin numbers larger or -1, e.g. --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 .

              The reads that cross the junctions that do not pass this filter will be re-mapped forcing them not to cross these junctions. Most often this will result in soft-clipped reads, i.e. donor or acceptor part of the read over the prohibited junction will be dropped.

              Cheers
              Alex

              Hi Alex,

              Thanks for your explanation. I tried the combination of "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10", it works. It has only half of the junctions (using the default setting) reported. It seems to have greater reliability.

              One more question: After the 1st-pass mapping, using the above setting, only the junctions with >=10 reads support will be reported into the *SJ.out.tab file. As I have a couple of samples, I will get a few *SJ.out.tab files. In the 2nd-pass, all these *SJ.out.tab will be used as references (annotated junctions) for the mapping, do I still need to use the setting of "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10" ? as it is possible, a few reads, <10 in one library, are mapped to a junction which is of >10 read support in another library; in this case, I still want those read not to be filtered out. On the other hand, if I don't use those settings, I feel many false positives will be introduced in the 2nd-pass mapping.

              In other words, in the 2nd-pass mapping, will all the reads be preferentially mapped according the annotated junctions (provide in the *SJ.out.tab files using --sjdbFileChrStartEnd)? If this is true, I feel it is likely good without the >10 filter (--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10) in the 2nd-pass mapping.

              Thanks,

              Rui

              Comment


              • #8
                Hi Rui,

                the --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 filter will not apply to the "annotated" junctions, i.e. junctions that are introduced via --sjdbFileChrStartEnd or --sjdbGTFfile options. So in the 2nd pass the junctions from your combined-SJ file will not be filtered out.
                However, STAR will still be detecting the junctions which never made it to combined-SJ file, and you will have to use this filter to get rid of them.

                Cheers
                Alex

                Comment


                • #9
                  Hi Alex,

                  Great, thanks for your clarification.

                  I decide to using the filter "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10" for my 2nd-pass mapping.

                  Rui

                  Comment


                  • #10
                    Dear Rui and Alex,
                    Many thanks for asking and explaining the SJout filter combinations so well!
                    I want to completely suppress detection of any and all SJ in the second pass. Do need to set --outSJfilterCountUniqueMin -1 -1 -1 -1 AND --outSJfilterCountTotalMin -1 -1 -1 -1 for the second passes to achieve that?

                    Comment


                    • #11
                      Hi biokloot,

                      To me, all "-1" does suppress all SJ, as "-1 means no output for that motif". But I am not sure how a read spanning an intron will be mapped in this case.

                      I am sure Alex will give a good clarification.

                      Rui

                      Comment


                      • #12
                        Originally posted by biokloot View Post
                        Dear Rui and Alex,
                        Many thanks for asking and explaining the SJout filter combinations so well!
                        I want to completely suppress detection of any and all SJ in the second pass. Do need to set --outSJfilterCountUniqueMin -1 -1 -1 -1 AND --outSJfilterCountTotalMin -1 -1 -1 -1 for the second passes to achieve that?
                        Hi @biokloot,

                        the best way to suppress detection of any new junctions in 2nd pass (i.e. junctions not detected in the 1st pass) is to use --alignSJoverhang 100000 (or any number > read length).
                        Alternatively, you can use
                        --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin -1 -1 -1 -1 --outFilterType BySJout
                        or
                        --outSJfilterOverhangMin -1 -1 -1 -1 --outFilterType BySJout
                        but it's slightly slower.

                        Cheers
                        Alex

                        Comment


                        • #13
                          Thanks for the quick replies, Rui and Alex!

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            Current Approaches to Protein Sequencing
                            by seqadmin


                            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                            04-04-2024, 04:25 PM
                          • seqadmin
                            Strategies for Sequencing Challenging Samples
                            by seqadmin


                            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                            03-22-2024, 06:39 AM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 04-11-2024, 12:08 PM
                          0 responses
                          25 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-10-2024, 10:19 PM
                          0 responses
                          29 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-10-2024, 09:21 AM
                          0 responses
                          25 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-04-2024, 09:00 AM
                          0 responses
                          52 views
                          0 likes
                          Last Post seqadmin  
                          Working...
                          X