  • alekzs
    Junior Member
    • Jan 2018
    • 8

    ERCC - no gene counts

    Hi!
    I have a problem recovering the ERCC spike-ins I added to my RNA-seq samples. Briefly, I'm doing Smart-seq2 on single human T cells and aligning with STAR to hg19 plus the ERCC sequences. Both the FASTA and the GTF file contain the ERCCs, but they don't show up in the gene count results at all, regardless of whether I use STAR's GeneCounts option, htseq-count or RSEM.
    Let's assume I "forgot" to add the ERCC spike-ins and the count is actually 0. Shouldn't the gene names still appear in the downstream files, just with a value of 0? If I run samtools idxstats on the STAR output (sorted or unsorted BAM file), it shows the ERCC "chromosomes".
    I'm confused by all of this and can't even figure out where in the pipeline my mistake might be. Help!

    Here are my commands:
    STAR --runMode genomeGenerate --runThreadN 8 --genomeDir indices/STAR --genomeFastaFiles path/to/genomeE.fa --sjdbGTFfile path/to/genesE.gtf --genomeChrBinNbits 12

    STAR --runMode alignReads \
    --genomeLoad NoSharedMemory \
    --genomeDir indices/STAR \
    --readFilesIn XX_R1_001.fastq.gz XX_R2_001.fastq.gz \
    --outFileNamePrefix /results/ercc/$i \
    --quantMode GeneCounts \
    --twopassMode Basic \
    --outSAMtype BAM Unsorted SortedByCoordinate \
    --readFilesCommand zcat

    htseq-count --mode=union --idattr=gene_name -f bam --order=pos --stranded=no XX-Aligned.bam /path/to/genesE.gtf > XX-gene.count
    ### I tried stranded=yes or reverse but that didn't help either.

    Any pointers highly appreciated!!

    Alex
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    You made a "new" reference by appending the fasta ERCC sequences to end of human genome and then created the STAR indexes from this hybrid file?

    • alekzs
      Junior Member
      • Jan 2018
      • 8

      #3
      Originally posted by GenoMax
      You made a "new" reference by appending the fasta ERCC sequences to end of human genome and then created the STAR indexes from this hybrid file?
      Yes, I added both the FASTA and GTF annotations and used the hybrid!

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Then I am inclined to speculate that someone forgot to spike the ERCC aliquots. Unless alignments are not being reported since they fail STAR's multi-mapping threshold. Look into that as well.

        Did you make the libraries (and add ERCC)?

        • alekzs
          Junior Member
          • Jan 2018
          • 8

          #5
          Originally posted by GenoMax
          Then I am inclined to speculate that someone forgot to spike the ERCC aliquots. Unless alignments are not being reported since they fail STAR's multi-mapping threshold. Look into that as well.

          Did you make the libraries (and add ERCC)?
          I did everything myself so chances are 50-50 I guess.
          Anyhow, even if I didn't add the spike ins, shouldn't the gene names from the reference appear in a gene count file? Like, normal genes get 0 alignments/counts but they're still in the list, right?

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            When you added them to the GTF file they were in the correct format?

            Are you able to see alignments for them in the BAM file?

            • alekzs
              Junior Member
              • Jan 2018
              • 8

              #7
              Originally posted by GenoMax
              When you added them to the GTF file they were in the correct format?

              Are you able to see alignments for them in the BAM file?
              Code:
              Tail of FASTA file:
              >ERCC-00171 DQ854994 Ac03459967_a1 Ac03460063_a1
              CTGGAGATTGTCTCGTACGGTTAAGAGCCTCCGCCCGTCTCTGGGACTATGGACGGGCACGCTCATATCAGGCTATATTTGGTCCGGGTTATTATCGTCGCGGTTACCGTAATACTTCAGATCAGTTAAGTAGGGCCATATGCCTCGGGAATAAGCTGACGGTGACAAGGTTTCCCCCTAATCGAGACGCTGCAATAACACAGGGGCATACAGTAACCAGGCAAGAGTTCAATCGCTTAGTTTCGTGGCGGGATTTGAGGAAAACTGCGACTGTTCTTTAACCAAACATCCGTGCGATTCGTGCCACTCGTAGACGGCATCTCACAGTCACTGAAGGCTATTAAAGAGTTAGCACCCACCATTGGATGAAGCCCAGGATAAGTGACCCCCCCGGACCTTGGAGTTTCATGCTAATCAAAGAAGAGCTAATCCGACGTAAAGTTGCGGCGTTGATTACGCAGGATTGCGACCAAAGAACGAGAAAAAAAAAAAAAAAAAAAAAAAA
              
              Tail of GTF file
              >ERCC-00171	ercc	gene	1	506	.	+	.	gene_id "GERCC-00171"; gene_version "1"; gene_name "ERCC-00171"; gene_source "ercc"; gene_biotype "ercc";
              
              samtools view -h 10BTreg02_S290_L003Aligned.sortedByCoord.out.bam ERCC-00171
              >NS500597:113:HH5HKBGX5:3:11406:4418:20117	83	ERCC-00171	441	255	9S29M	=	60	-410	ACGACGTAGGTTGCGGCGTTGATTACGCAGGATTGCGA	EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	NH:i:1	HI:i:1	AS:i:65	nM:i:0
              NS500597:113:HH5HKBGX5:3:21612:3414:12240	89	ERCC-00171	442	255	36M	*	0	0	TTGCGGCGTTGATTACGCAGGATTGCTACCAAAGAA	EAEEEEEEAEAEEEEE/EEEEEEEAEEEEEEAAAAA	NH:i:1	HI:i:1	AS:i:33	nM:i:1
              (there's many more lines, finds other ERCC numbers as well)
              
              tail -n 20 10BTreg02_S290_L003ReadsPerGene.out.tab
              ENSG00000224240	0	0	0
              ENSG00000227629	0	0	0
              ENSG00000237917	0	0	0
              ENSG00000231514	0	0	0
              ENSG00000235857	0	0	0
              That's all I have to offer.
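
To make "are the ERCC genes in the count file at all?" a direct check rather than a `tail`, something like the sketch below could help. It is only an illustration: `missing_ercc_genes` is a made-up helper name, and it assumes a tab-separated GTF whose ERCC sequence names contain `ERCC-` and a count table with the gene identifier in the first column (as in STAR's ReadsPerGene.out.tab or htseq-count output).

```shell
# Hypothetical helper (not part of STAR, HTSeq or RSEM).
# Prints every ERCC gene found in the GTF that is absent from the count table.
# Usage: missing_ercc_genes genesE.gtf counts.tab [attribute]
#   attribute defaults to gene_id; use gene_name for htseq-count --idattr=gene_name.
# Assumption: the count table is not empty (the NR == FNR idiom relies on that).
missing_ercc_genes() {
    awk -F'\t' -v attr="${3:-gene_id}" '
        BEGIN { pat = attr " \"[^\"]+\"" }     # e.g. gene_id "GERCC-00171"
        NR == FNR { counted[$1] = 1; next }    # first file: identifiers in the count table
        $1 ~ /ERCC-/ && $3 == "gene" && match($9, pat) {
            # strip the attribute name, the space and the surrounding quotes
            id = substr($9, RSTART + length(attr) + 2, RLENGTH - length(attr) - 3)
            if (!(id in counted)) print id
        }
    ' "$2" "$1"
}
```

If this prints every ERCC identifier, the counting step most likely never saw the ERCC annotation (wrong GTF or wrong index) rather than seeing it and counting 0.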
              Last edited by GenoMax; 04-27-2018, 11:52 AM. Reason: Added [code] tags

              • r.rosati
                Member
                • Aug 2015
                • 95

                #8
                Here I am with makeshift solutions, but if you make the BAM into a SAM, you can `grep` it to see if the sequences are there.

                • alekzs
                  Junior Member
                  • Jan 2018
                  • 8

                  #9
                  Originally posted by r.rosati
                  Here I am with makeshift solutions, but if you make the BAM into a SAM, you can `grep` it to see if the sequences are there.
                  ha, that approach was far easier...

                  grep "ERCC-" 10B02_3.sam -c
                  6770

                  So, yes... they are there, just don't end up in any count file.
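
One caveat with `grep -c "ERCC-"` on a whole SAM file: if the file was written with headers, the `@SQ` lines for the ERCC sequences are counted too, and so are reads that merely have a mate on an ERCC sequence. A rough per-reference tally that looks only at the reference name column might look like this (`ercc_alignment_counts` is a hypothetical helper name; it counts alignment records, not read pairs, and does not filter secondary alignments):

```shell
# Hypothetical helper: count alignment records per ERCC reference in a SAM file.
# Usage: ercc_alignment_counts sample.sam
ercc_alignment_counts() {
    awk -F'\t' '
        /^@/ { next }                   # skip header lines such as @SQ SN:ERCC-...
        $3 ~ /^ERCC-/ { n[$3]++ }       # column 3 is the reference sequence name
        END { for (ref in n) print ref "\t" n[ref] }
    ' "$1" | sort
}
```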

                  • r.rosati
                    Member
                    • Aug 2015
                    • 95

                    #10
                    I meant like grepping for
                    CTGGAGATTGTCTCGTACGGTTAAGAGCCTCCGCCC
                    (or any other fragment in the ERCC controls, I copy-pasted the one you wrote in a previous post)
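
A small sketch of that idea, restricted to the read sequence column so that read names and tags cannot produce false hits. `count_reads_with_fragment` is a hypothetical helper name. For aligned reads the SAM sequence field is given in reference orientation, so a fragment copied from the ERCC FASTA can be searched as-is; unmapped reads would also need a reverse-complement search, which this sketch does not do.

```shell
# Hypothetical helper: count SAM records whose read sequence contains a fragment.
# Usage: count_reads_with_fragment sample.sam CTGGAGATTGTCTCGTACGGTTAAGAGCCTCCGCCC
count_reads_with_fragment() {
    awk -F'\t' -v frag="$2" '
        /^@/ { next }                   # skip header lines
        index($10, frag) { n++ }        # column 10 is the read sequence (SEQ)
        END { print n + 0 }             # print 0 rather than an empty line
    ' "$1"
}
```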

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      Can you try featureCounts to do the counts? It will not count multi-mapping reads by default.

                      • arnollito
                        Junior Member
                        • Jul 2018
                        • 1

                        #12
                        Hi alekzs, how did you solve this issue in the end? Greetings from Switzerland.

                        • alekzs
                          Junior Member
                          • Jan 2018
                          • 8

                          #13
                          Originally posted by arnollito
                          Hi alekzs, how did you solve this issue in the end? Greetings from Switzerland.
                          Yes... I used RSEM for the counting, and the index generation with the ERCC-appended hg19 file had failed because the chr labels weren't compatible, so RSEM used an old index without the ERCC genes.
                          I edited my fused ERCC-hg19 file, re-ran the index step, and then it worked. Hope that helps!
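
For anyone hitting the same thing, a quick pre-flight check before building an index could be to list sequence names that the GTF uses but the FASTA does not define. This is only a sketch under assumptions: the thread does not say exactly which labels clashed, so the example assumes a FASTA/GTF naming mismatch such as `chr1` versus `1`, and `seqnames_missing_from_fasta` is a made-up helper name.

```shell
# Hypothetical helper: print GTF sequence names (column 1) that have no
# matching ">name" header in the FASTA file.
# Usage: seqnames_missing_from_fasta genomeE.fa genesE.gtf
# Assumption: the FASTA file is not empty (the NR == FNR idiom relies on that).
seqnames_missing_from_fasta() {
    awk -F'\t' '
        NR == FNR {                         # first file: the FASTA
            if ($0 ~ /^>/) {
                name = substr($0, 2)        # drop the leading ">"
                sub(/[ \t].*$/, "", name)   # keep only the first word of the header
                in_fasta[name] = 1
            }
            next
        }
        /^#/ || NF < 9 { next }             # skip GTF comments and malformed lines
        !($1 in in_fasta) && !($1 in reported) {
            reported[$1] = 1
            print $1
        }
    ' "$1" "$2"
}
```

Empty output means every sequence name in the GTF has a FASTA record; any name printed is one the index builder may reject or silently ignore.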
