Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • NDUFB11
    Member
    • Jul 2017
    • 34

    RNAseq analysis, and bam files

    Hello everyone,
    I m handling for the first time bam files from RNAseq and my work is to perform a differential expression analysis. Looking online, I could see there are many tutorials and different ways to do this analysis. I was able to perform example from bioconductor website but I m not able to process my files.

    My question is, what is the first step to do with two bam files?

    Some tutorial says to use salmon other to use something else,
    some others says I have to convert bam to sam files for other bam is enough ...


    I would be more than happy if you guys can give me an idea about how to proceed first
    thank you
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    1. Run featureCounts to produce counts.
    2. Load into DESeq2 or edgeR or limma and follow the appropriate vignette

    Salmon is a newer method that requires either fastq files or BAM files of transcriptome alignments. It's unlikely that you have those. There is never a reason any more to make a SAM file, that's just a waste of space.

    Comment

    • NDUFB11
      Member
      • Jul 2017
      • 34

      #3
      Hi Devon Ryan,
      Thank you for your answer, do you know where I can find a good tutorial that explains how to use featureCounts?

      or if I can have the comman line

      thanks
      Last edited by NDUFB11; 07-03-2017, 06:07 PM.

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        featureCounts manual.

        Comment

        • NDUFB11
          Member
          • Jul 2017
          • 34

          #5
          Thank you genoMax, doesn t seem to be easy :/

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            It is not that difficult if you have some command like skills. If not, can you do R? There is an R package for subread/featurecounts that could be used instead.

            All you basically need to do is this (-s 2 is for stranded libraries, if yours is not then use -s 1):
            Code:
            featureCounts -s 2 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_file1.bam mapping_results_file2.bam
            This will use the annotation file (in GTF format) to identify regions of interest (in this case exon) and then summarize the counts at the gene level (using gene_id) and write the results out to a file (count.txt) as a matrix. Rows would be your gene_id and columns your samples. First 5 columns in the matrix would be annotation. Last two columns will contain the counts for your samples (you have 2 correct, if there are more then include those files in the command line and you will get extra columns in the matrix file). You don't even need to sort the bam files since featureCounts can do that itself.

            Comment

            • NDUFB11
              Member
              • Jul 2017
              • 34

              #7
              Ok I tell you my story,

              I have R, containing the package Rsubread::featureCounts() and I have two bam files which I named Cell1.bam and Cell2.bam.

              I m in the directory where this files are

              > getwd()
              [1] "/Users/name/Desktop/bamfolder"


              now, if I copy paste this command line I get this
              > featureCounts -s 2 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_file1.bam mapping_results_file2.bam
              Error: unexpected numeric constant in "featureCounts -s 2"
              >



              This is because upstream I miss something, like the word file1.bam and file2.bam need to be Cell1.bam and Cell2.bam, is this correct?

              What is missing I guess is the GTF file, should I take it from ensamble?



              Thanks a lot
              Last edited by NDUFB11; 07-04-2017, 06:27 PM.

              Comment

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                The command I had posted is for using in a shell. You should be able to figure out the R equivalent from subread vignette.

                You would need to find a GTF for the genome build that was used to generate your alignment files (you can't mix/match otherwise the results may be nonsense).

                Comment

                • NDUFB11
                  Member
                  • Jul 2017
                  • 34

                  #9
                  Dear GenoMax,
                  I think I did another step forward, I run FutureCounts in R.
                  I used this command line:

                  > z <- featureCounts(fls, "hg19", nthreads = 30)

                  but it s not clear to me, how the program counts the reads just writing hg19 as genome reference on the line, without loading any file. Can you tell me more please?

                  I got this at the end:

                  ========== _____ _ _ ____ _____ ______ _____
                  ===== / ____| | | | _ \| __ \| ____| /\ | __ \
                  ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
                  ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
                  ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
                  ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
                  Rsubread 1.26.0

                  //========================== featureCounts setting ===========================\\
                  || ||
                  || Input files : 2 BAM files ||
                  || S wgEncodeRikenCageGm12878CellPapAlnRep1.bam ||
                  || S wgEncodeRikenCageHelas3CellPapAlnRep1.bam ||
                  || ||
                  || Dir for temp files : . ||
                  || Threads : 16 ||
                  || Level : meta-feature level ||
                  || Paired-end : no ||
                  || Strand specific : no ||
                  || Multimapping reads : not counted ||
                  || Multi-overlapping reads : not counted ||
                  || Min overlapping bases : 1 ||
                  || ||
                  \\===================== http://subread.sourceforge.net/ ======================//

                  //================================= Running ==================================\\
                  || ||
                  || Load annotation file /usr/local/lib/R/site-library/Rsubread/annot/hg19 ... ||
                  || Features : 225074 ||
                  || Meta-features : 25702 ||
                  || Chromosomes/contigs : 52 ||
                  || ||
                  || Process BAM file wgEncodeRikenCageGm12878CellPapAlnRep1.bam... ||
                  || Single-end reads are included. ||
                  || Assign reads to features... ||
                  || Total reads : 19677397 ||
                  || Successfully assigned reads : 14167723 (72.0%) ||
                  || Running time : 0.15 minutes ||
                  || ||
                  || Process BAM file wgEncodeRikenCageHelas3CellPapAlnRep1.bam... ||
                  || Single-end reads are included. ||
                  || Assign reads to features... ||
                  || Total reads : 24319886 ||
                  || Successfully assigned reads : 18864794 (77.6%) ||
                  || Running time : 1.08 minutes ||
                  || ||
                  || Read assignment finished. ||
                  || ||
                  \\===================== http://subread.sourceforge.net/ ======================//
                  Last edited by NDUFB11; 07-11-2017, 01:34 PM.

                  Comment

                  Latest Articles

                  Collapse

                  • SEQadmin2
                    Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                    by SEQadmin2


                    I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                    Here are nine questions we think about, in roughly the order they matter, before...
                    06-18-2026, 07:11 AM
                  • SEQadmin2
                    From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                    by SEQadmin2


                    Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                    The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                    ...
                    06-02-2026, 10:05 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by SEQadmin2, 06-17-2026, 06:09 AM
                  0 responses
                  38 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-09-2026, 11:58 AM
                  0 responses
                  100 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-05-2026, 10:09 AM
                  0 responses
                  121 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-04-2026, 08:59 AM
                  0 responses
                  114 views
                  0 reactions
                  Last Post SEQadmin2  
                  Working...