  • ERCC - no gene counts

    Hi!
    I have a problem recovering the ERCC spike-ins I added to my RNA-seq samples. Briefly, I'm doing Smart-seq2 on single human T cells and aligning with STAR to hg19 + ERCC sequences. Both the FASTA and the GTF file contain the ERCCs, but when I look at the gene count results they don't show up at all, regardless of whether I use STAR's GeneCounts option, htseq-count or RSEM.
    Let's assume I "forgot" to add the ERCC spike-ins and the count is actually 0... Shouldn't the gene names still appear in the downstream files, just with a value of 0? If I run samtools idxstats on the STAR output (sorted or unsorted BAM file), it shows the ERCC "chromosomes".
    I'm confused by all of this and can't even figure out where in the pipeline my mistake might be. Help!

    Here are my commands:
    STAR --runMode genomeGenerate --runThreadN 8 --genomeDir indices/STAR --genomeFastaFiles path/to/genomeE.fa --sjdbGTFfile path/to/genesE.gtf --genomeChrBinNbits 12

    STAR --runMode alignReads \
    --genomeLoad NoSharedMemory \
    --genomeDir indices/STAR \
    --readFilesIn XX_R1_001.fastq.gz XX_R2_001.fastq.gz \
    --outFileNamePrefix /results/ercc/$i \
    --quantMode GeneCounts \
    --twopassMode Basic \
    --outSAMtype BAM Unsorted SortedByCoordinate \
    --readFilesCommand zcat

    htseq-count --mode=union --idattr=gene_name -f bam --order=pos --stranded=no XX-Aligned.bam /path/to/genesE.gtf > XX-gene.count
    ### I tried stranded=yes or reverse but that didn't help either.

    Any pointers highly appreciated!!

    Alex
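
A note on the idxstats check mentioned in this post: `samtools idxstats` prints one row for every reference sequence in the BAM header, so seeing the ERCC "chromosomes" listed only confirms that they were in the index. The third column (mapped read-segments) is what shows whether anything actually aligned to them. A minimal Python sketch, assuming the idxstats output has been captured as text; the function name and the example numbers are made up for illustration and are not from this thread:

```python
def ercc_mapped_reads(idxstats_text, prefix="ERCC-"):
    """Return {contig: mapped read-segments} for spike-in contigs.

    Expects the tab-separated output of `samtools idxstats`:
    name, sequence length, mapped read-segments, unmapped read-segments.
    """
    counts = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, mapped = fields[0], int(fields[2])
        if name.startswith(prefix):
            counts[name] = mapped
    return counts


# Illustrative input only (hypothetical numbers).
example = (
    "chr1\t249250621\t1200\t3\n"
    "ERCC-00002\t1061\t57\t0\n"
    "ERCC-00171\t505\t0\t0\n"
    "*\t0\t0\t42\n"
)
counts = ercc_mapped_reads(example)
print(counts, "total:", sum(counts.values()))
```

If the total is zero, the spike-ins were probably never in the library (or never aligned); if it is clearly non-zero, the reads exist and the problem is more likely in the annotation or the counting step.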

  • #2
    You made a "new" reference by appending the fasta ERCC sequences to end of human genome and then created the STAR indexes from this hybrid file?

    • #3
      Originally posted by GenoMax View Post
      You made a "new" reference by appending the fasta ERCC sequences to end of human genome and then created the STAR indexes from this hybrid file?
      Yes, I added both the FASTA sequences and the GTF annotations and used the hybrid!

      • #4
        Then I am inclined to speculate that someone forgot to spike the ERCC aliquots. Unless alignments are not being reported since they fail STAR's multi-mapping threshold. Look into that as well.

        Did you make the libraries (and add ERCC)?

        • #5
          Originally posted by GenoMax View Post
          Then I am inclined to speculate that someone forgot to spike the ERCC aliquots. Unless alignments are not being reported since they fail STAR's multi-mapping threshold. Look into that as well.

          Did you make the libraries (and add ERCC)?
          I did everything myself so chances are 50-50 I guess.
          Anyhow, even if I didn't add the spike ins, shouldn't the gene names from the reference appear in a gene count file? Like, normal genes get 0 alignments/counts but they're still in the list, right?

          • #6
            When you added them to the GTF file they were in the correct format?

            Are you able to see alignments for them in the BAM file?

            • #7
              Originally posted by GenoMax View Post
              When you added them to the GTF file they were in the correct format?

              Are you able to see alignments for them in the BAM file?
              Code:
              Tail of FASTA file:
              >ERCC-00171 DQ854994 Ac03459967_a1 Ac03460063_a1
              CTGGAGATTGTCTCGTACGGTTAAGAGCCTCCGCCCGTCTCTGGGACTATGGACGGGCACGCTCATATCAGGCTATATTTGGTCCGGGTTATTATCGTCGCGGTTACCGTAATACTTCAGATCAGTTAAGTAGGGCCATATGCCTCGGGAATAAGCTGACGGTGACAAGGTTTCCCCCTAATCGAGACGCTGCAATAACACAGGGGCATACAGTAACCAGGCAAGAGTTCAATCGCTTAGTTTCGTGGCGGGATTTGAGGAAAACTGCGACTGTTCTTTAACCAAACATCCGTGCGATTCGTGCCACTCGTAGACGGCATCTCACAGTCACTGAAGGCTATTAAAGAGTTAGCACCCACCATTGGATGAAGCCCAGGATAAGTGACCCCCCCGGACCTTGGAGTTTCATGCTAATCAAAGAAGAGCTAATCCGACGTAAAGTTGCGGCGTTGATTACGCAGGATTGCGACCAAAGAACGAGAAAAAAAAAAAAAAAAAAAAAAAA
              
              Tail of GTF file
              >ERCC-00171	ercc	gene	1	506	.	+	.	gene_id "GERCC-00171"; gene_version "1"; gene_name "ERCC-00171"; gene_source "ercc"; gene_biotype "ercc";
              
              samtools view -h 10BTreg02_S290_L003Aligned.sortedByCoord.out.bam ERCC-00171
              >NS500597:113:HH5HKBGX5:3:11406:4418:20117	83	ERCC-00171	441	255	9S29M	=	60	-410	ACGACGTAGGTTGCGGCGTTGATTACGCAGGATTGCGA	EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA	NH:i:1	HI:i:1	AS:i:65	nM:i:0
              NS500597:113:HH5HKBGX5:3:21612:3414:12240	89	ERCC-00171	442	255	36M	*	0	0	TTGCGGCGTTGATTACGCAGGATTGCTACCAAAGAA	EAEEEEEEAEAEEEEE/EEEEEEEAEEEEEEAAAAA	NH:i:1	HI:i:1	AS:i:33	nM:i:1
              (there's many more lines, finds other ERCC numbers as well)
              
              tail -n 20 10BTreg02_S290_L003ReadsPerGene.out.tab
              ENSG00000224240	0	0	0
              ENSG00000227629	0	0	0
              ENSG00000237917	0	0	0
              ENSG00000231514	0	0	0
              ENSG00000235857	0	0	0
              That's all I have to offer.
              Last edited by GenoMax; 04-27-2018, 11:52 AM. Reason: Added [code] tags
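
One thing worth checking in a GTF like the one quoted in this post: the ERCC record shown has the feature type `gene`. htseq-count counts reads over `exon` features by default (`--type=exon`), and STAR reads exon records (`--sjdbGTFfeatureExon`, default `exon`) when it loads the annotation, so a contig that only has `gene` lines would not produce a row in either count table. Whether that applies here is a guess, since only the last line of the file is shown. A minimal Python sketch that lists spike-in contigs lacking exon records; the function names and the second example record are hypothetical:

```python
from collections import defaultdict


def feature_types_by_contig(gtf_lines, prefix="ERCC-"):
    """Map each spike-in contig to the set of feature types (GTF column 3)."""
    features = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        seqname, feature = cols[0], cols[2]
        if seqname.startswith(prefix):
            features[seqname].add(feature)
    return dict(features)


def contigs_without_exons(gtf_lines, prefix="ERCC-"):
    """Return spike-in contigs that have no `exon` record at all."""
    by_contig = feature_types_by_contig(gtf_lines, prefix)
    return sorted(name for name, feats in by_contig.items() if "exon" not in feats)


# Illustrative records: the first mirrors the gene-only line in the post,
# the second is a made-up exon record for comparison.
example_gtf = [
    'ERCC-00171\tercc\tgene\t1\t506\t.\t+\t.\tgene_id "GERCC-00171"; gene_name "ERCC-00171";',
    'ERCC-00002\tercc\texon\t1\t1061\t.\t+\t.\tgene_id "GERCC-00002"; transcript_id "TERCC-00002";',
]
print(contigs_without_exons(example_gtf))
```

On a real file, `contigs_without_exons(open("path/to/genesE.gtf"))` would do the same check line by line.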

              • #8
                Here I am with makeshift solutions, but if you make the BAM into a SAM, you can `grep` it to see if the sequences are there.

                • #9
                  Originally posted by r.rosati View Post
                  Here I am with makeshift solutions, but if you make the BAM into a SAM, you can `grep` it to see if the sequences are there.
                  ha, that approach was far easier...

                  grep "ERCC-" 10B02_3.sam -c
                  6770

                  So, yes... they are there, just don't end up in any count file.
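
A caveat on the `grep -c` number: it counts every line containing "ERCC-", which includes the `@SQ` header lines if the SAM was written with its header. Counting on the reference-name column (column 3) of the alignment records avoids that and also gives a per-spike-in breakdown. A minimal Python sketch, assuming a plain-text SAM; the function name and the toy records are made up for illustration:

```python
from collections import Counter


def alignments_per_reference(sam_lines, prefix="ERCC-"):
    """Count alignment records per reference whose name starts with `prefix`."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # SAM alignment records have 11 mandatory fields
        rname = cols[2]
        if rname.startswith(prefix):
            counts[rname] += 1
    return counts


# Toy input: one header line and two alignments (hypothetical reads).
example_sam = [
    "@SQ\tSN:ERCC-00171\tLN:505",
    "r1\t0\tERCC-00171\t442\t255\t4M\t*\t0\t0\tACGT\tEEEE",
    "r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tEEEE",
]
per_ref = alignments_per_reference(example_sam)
print(dict(per_ref))
```

In this toy input a plain `grep -c "ERCC-"` would report 2 (header plus one read), while the column-based count reports 1.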

                  • #10
                    I meant like grepping for
                    CTGGAGATTGTCTCGTACGGTTAAGAGCCTCCGCCC
                    (or any other fragment in the ERCC controls, I copy-pasted the one you wrote in a previous post)

                    • #11
                      Can you try featureCounts to do the counts? It will not count multi-mapping reads by default.

                      • #12
                        Hi alekzs, how did you solve this issue in the end? Greetings from Switzerland.

                        • #13
                          Originally posted by arnollito View Post
                          Hi alekzs, how did you solve this issue in the end? Greetings from Switzerland.
                          Yes... I used RSEM for the counting. The index generation with the ERCC-appended hg19 file had failed because the chr labels weren't compatible, so RSEM was still using an old index without the ERCC genes.
                          I edited my fused ERCC-hg19 file, re-ran the index step, and then it worked. Hope that helps!
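
For anyone hitting the same thing: a label mismatch of this kind (for example `chr1` in the FASTA but `1` in the GTF) can be caught before building any index by comparing the sequence names in the two files. A minimal Python sketch under that assumption; the function names and the example records are hypothetical, and the exact mismatch in this thread was not specified:

```python
def fasta_contigs(fasta_lines):
    """Collect sequence names: the first word after '>' on each header line."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_contigs(gtf_lines):
    """Collect the seqname (column 1) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_contigs(fasta_lines, gtf_lines):
    """Return (names only in the GTF, names only in the FASTA), both sorted."""
    fa, gtf = fasta_contigs(fasta_lines), gtf_contigs(gtf_lines)
    return sorted(gtf - fa), sorted(fa - gtf)


# Illustrative input with a deliberate chr-label mismatch.
example_fasta = [">chr1", "ACGT", ">ERCC-00171 DQ854994", "CTGGAGATTG"]
example_gtf = [
    '1\thavana\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'ERCC-00171\tercc\texon\t1\t505\t.\t+\t.\tgene_id "GERCC-00171"; transcript_id "TERCC-00171";',
]
only_in_gtf, only_in_fasta = compare_contigs(example_fasta, example_gtf)
print("annotated but not in FASTA:", only_in_gtf)
print("in FASTA but not annotated:", only_in_fasta)
```

Names that appear only in the GTF are the ones an indexing step is likely to choke on. It is also worth checking that the index step actually finished, so that a stale index from an earlier run is not picked up silently.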
