Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • problemas with Rip-seq, ripseeker, rownames and R

    Dear all

    I am trying to run a RIp-seq experiment using RIPseeker in order to process two bams (of two human samples mapped on the last ensembl release with tophat) with the front-end tool of ripseek of ripseeker using the following command

    ripSeek(bamPath="/home/cllorens/nextseq/gabaldon/bam", cNAME="EK1_all_accepted_hits.bam", genomeBuild="GRCh37.p11", uniqueHit=TRUE, assignMultihits=TRUE, rerunWithDisambiguatedMultihits=TRUE, binSize=NULL, strandType=NULL, paired=FALSE, biomart="ensembl", biomaRt_dataset="hsapiens_gene_ensembl", goAnno="org.Hs.eg.db", exportFormat="gff3", annotateFormat="txt", annotateType="TSS", outDir="/home/cllorens/nextseq/gabaldon/3_out_rip", padjMethod="BH", logOddCutoff = 0, pvalCutoff=1, pvalAdjCutoff = 1, eFDRCutoff = 1)

    All aparently goes well in the run until the script arrives to this step where it performs annotation

    *IV. Annotate RIP regions via online ensembl database (hsapiens_gene_ensembl):

    and gives me the following error

    Error in `rownames<-`(`*tmp*`, value = c("1", "0")) :
    invalid 'row.names' length

    It seems not to be a problem of the bams but more properly something wrong with the vector assigned to the array that make the calls to ensembl, perhaps something wrong with R? Anyone had the same or simular trouble with an R script.

    All the best and thanks in advance

    Carlos

  • #2
    Well, It seems thatnot so much people is familiarized with RIp-seq and ripseeker which are both probably quite new technology and associated tool

    Ok i will post the solution just if any other find the same issue.

    although it would be good to also say that I had anoter issue with this pipeline in addition to that the posted above. The pipeline gave me the following error

    ***0. Computing optimal bin size.
    ***0. Computing optimal bin size.
    Error in GRangesList(nbhGRList) :
    all elements in '...' must be GRanges objects

    Thinking about it seems that "Granges" means "Genomic ranges" ( i am not an expert in R) and the firts issue i posted seems to be a problem of calling to biomart because differences in format. Taking into account the examples given in the url of ripseeker are based on the use of the hg19 genome reference files provided by illumina via a link in the tophat web site, which only implements chromosomes 1-22 plus sexuals and "mit" and that i was using the Grch37 last versión which implements a lot of supercontings not assigned to any chromosome because are repetitive or problematic sequences in term of assembly, the solution it was quite obvious.

    Use the hg19 as such and voila "problemo resolved", probably the grch37 will work too if the supercontigs not assigned to chromosomes are removed from the fasta and the index o including if such labels are removed from the bam via samtools etc.

    Just hope this to be useful for others in order to not to waste time in trying to find the bug.

    CArlix
    Last edited by cllorens; 07-15-2013, 01:06 AM.

    Comment


    • #3
      One additional comment,
      I downloaded the hg19 from the ucsc with all the features (chromosomes plus unasigned supercontigs) and ripseeker work well using this material if previously a vector from list.files to the bampath is assigned before launching the ripseek all in one function.

      bamFiles <- list.files(path="your path", recursive=TRUE, full.names=TRUE)

      and then

      ripSeek(bamPath=bamFiles,... same string to that above)

      Cheers

      Comment


      • #4
        Hi cllorens,
        I am using RIPSeeker as well, and received 50+(probably 100s or 1000s) warnings saying for Nbm_em, Alpha had become negative. Have you seen this error? Is this a sign of bad input reads or an issue with the analysis?

        Comment


        • #5
          Hi George,
          I did not receive that warning. If it is a warning you might ignore it (careful with this Word, it is just an option of your choice) if the results are reasonable. Nbm_em is the Expectation conditional maximization of likelihood for negative binomial
          mixture model of RIPseeker. I am not an expert on this tool. Can only say that It Works well when using (at least in my case) aligned data generated by tophat and using the all-in-one main function ripseek and then computeRPKm and rulebaseRIPseek. I documented the steps and protocol of commands i successfully used after some proofs i have it in spanglish but if your find it useful just let it to known and i will write in english and will upload it here. My recomendation as for the function Nbm_em and your issue is to keep an eye to the tutorial available here (http://bioconductor.org/packages/2.1.../RIPSeeker.pdf) as it is quite useful and also contact to Yue Li (the autor) which is a nice guy that i am sure will be deligth to answer you any question about the theoretical underlying background. I am not sure if it is appropiate to publicly type his email here, but it is available in the NAR paper where he introduce the tool or just write me directly to my email ([email protected]) and i will reply you with the Yue address.

          Best

          Carlos

          Comment


          • #6
            Could you post that please? My inputs look like this:
            cNAME <- "RIP1_IgG"
            > getwd()
            [1] "/home/georgem/BAM"
            > outDir <- paste(getwd(), "escRIP_Brg1_1", sep="/")
            > binSize <- NULL
            > biomart <- "ENSEMBL_MART_ENSEMBL"
            > dataset <- "mmusculus_gene_ensembl"
            > host <- "dec2011.archive.ensembl.org"
            > goAnno <- "org.Mm.eg.db"
            > seekOut <- ripSeek(bamPath="/home/georgem/BAM", cNAME=cNAME, binSize=binSize, strandType=NULL, paired=TRUE, outDir=outDir, genomeBuild="mm9", biomart=biomart, host=host, biomaRt_dataset = dataset, goAnno = goAnno, uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = TRUE, multicore=FALSE).

            Everything proceeds nicely from chromosome to chromosome, but it halts with an error at:

            *IV. Annotate RIP regions via online ensembl database (mmusculus_gene_ensembl):

            Error: require(biomaRt) is not TRUE.

            This was after a day+ of computing......

            Thanks for your help!

            Comment


            • #7
              ok George, give me a couple of days, anyway contact Yue and check if you are using the appropiate reference reléase for calling Biomart. I do not if you used it but it would be a good idea to use the annotation illumina material provided by tophat in its web site.

              Comment


              • #8
                Hi George,

                Here you have the protocol I followed.

                I am not sure if it will helps you but it was success in running my analysis of two human samples single end control and rip.

                just a comment for your problem. Try with the index and annotation material provided by Tophap and use the last mm release (if so you do not need to include the host info in your command.

                Hope it helps,

                Carlix



                A) mapping

                for both Control and RIP

                ./tophat --read-mismatches 2 --read-gap-length 3 --read-edit-dist 3 --GTF /home/your_path/Homo_sapiens/UCSC/hg19/Annotation/Archives/archive-2013-03-06-11-23-03/Genes/genes.gtf --transcriptome-index=/home/your_path/Homo_sapiens/UCSC/hg19/Sequence/ref_transcriptome_to_create /home/your_path/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome /home/your_path/your.fastq

                B) Then RIP-seq analisis using RIPseeker (http://bioconductor.org/packages/2.1...RIPSeeker.html)

                1) calling R (assume you have it installed)

                R

                2) installing RIPseeker

                source("http://bioconductor.org/biocLite.R")
                biocLite("RIPSeeker")

                3) Then load it into console

                library(RIPSeeker)

                4) Run Ripseek

                bamFiles <- list.files(path="/home/your_bampath", recursive=TRUE, full.names=TRUE)

                seekOut.CIPF <- ripSeek(bamPath=bamFiles, cNAME="control_accepted_hits.bam", genomeBuild="hg19", uniqueHit=TRUE, assignMultihits=TRUE, rerunWithDisambiguatedMultihits=TRUE, binSize=NULL, strandType=NULL, paired=FALSE, biomart="ensembl", biomaRt_dataset="hsapiens_gene_ensembl", goAnno="org.Hs.eg.db", exportFormat="gff3", annotateFormat="txt", annotateType="TSS", outDir="/home/your_path/output", padjMethod="BH", logOddCutoff = 0, pvalCutoff=1, pvalAdjCutoff = 1, eFDRCutoff = 1)

                5) peak counts

                ripRPKM <- computeRPKM(bamFiles[1], dataset="hsapiens_gene_ensembl", paired=FALSE, moreGeneInfo=TRUE, justRPKM=FALSE, idType="ensembl_transcript_id", featureType="exon",txDbName="biomart", biomart="ensembl", by="tx", saveData="/home/yourpath/ripRPKM")

                rulebase.results <- rulebaseRIPSeek(bamFiles=bamFiles, cNAME="control_accepted_hits.bam", myMin=1, featureGRanges=ripRPKM$featureGRanges, rpkmCutoff = 0.4, fcCutoff = 3, moreRIPGeneInfo=TRUE, idType="ensembl_transcript_id", biomart="ensembl",dataset="hsapiens_gene_ensembl", saveData="/home/yourpath/rulebase.results")

                head(ripRPKM$rpkmDF)

                df <- rulebase.results$rpkmDF

                df <- df[order(df$foldchange, decreasing=TRUE), ]

                write.table(df, file="/home/yourpath/csv", append=FALSE, quote=TRUE, sep = ";", eol = "\n", na = "NA", dec = ".", row.names=TRUE, col.names=TRUE, qmethod = c("escape", "double"), fileEncoding = "")

                6) now_look_for_coverage

                RipGal <- readGappedAlignments("/data1/home/yourpath/control_accepted_hits.bam")

                CtlGal <- readGappedAlignments("/data1/home/yourpath/RIP_accepted_hits.bam")

                ripGR <- as(RipGal, "GRanges")

                ctlGR <- as(CtlGal, "GRanges")

                ripGRList <- split(ripGR, seqnames(ripGR))


                ripGRList <- ripGRList[elementLengths(ripGRList) != 0]

                ctlGRList <- GRangesList(as.list(split(ctlGR, seqnames(ctlGR))))

                ctlGRList <- ctlGRList[lapply(ctlGRList, length) != 0]

                binSize <- 1000

                #c(bottom, left, top, right)

                par(mfrow=c(1, 2), mar=c(2, 2, 2, 0) + 0.1)

                tmp <- lapply(ripGRList, plotStrandedCoverage, binSize=binSize, xlab="", ylab="", plotLegend=TRUE, legend.cex=1.5, main="RIP", cex.main=1.5)

                tmp <- lapply(ctlGRList, plotStrandedCoverage, binSize=binSize, xlab="", ylab="", plotLegend=TRUE, legend.cex=1.5, main="CTL", cex.main=1.5)

                Comment


                • #9
                  Hi cllorens,
                  In analyzing the RIPregions data produced by RIPSeeker, I am a little confused. In many cases, the region in experimental sample and the control sample show similar read counts and FPKM, but the logOddScore will be drastically different (e.g.: Region 1: FPK:3435, logOddScore:88.6; CTRL_fpk: 3110, CTRL_logOddScore: -59.6). How can this be?
                  Thanks! I hope your RIP-seq experiments are going well.

                  Comment


                  • #10
                    Hi,
                    I am also working with RipSeeker v 1.4 and got similar issue:

                    ########### portion of output

                    **D. Re-run NB HMM with unique hits + disambiguated multihits.
                    *** All chromosomes have at least one alignment
                    **********
                    chr1:
                    **********
                    ***0. Computing optimal bin size.
                    **********
                    chr10:
                    **********
                    ***0. Computing optimal bin size.
                    Error in GRangesList(nbhGRList) :
                    all elements in '...' must be GRanges objects

                    ######### the program terminates after over a day of computation.

                    I have successfully used this program before with long reads (100nt control and RIP)
                    Here I decided to use smaller reads (100nt control and 50 bp RIP).
                    Alignment was done with Tophat with the setting described in most recent RIPSeeker paper and it was exactly same for all data sets.
                    I have made sure that the alignment was done to reference genome with only well annotated chromosomes.

                    Any suggestion what may be the problem???

                    ##### the script I used
                    library(RIPSeeker)

                    bamFiles # will have 2 total files
                    iggFiles # will only have igg

                    outDir <- file.path(wd,"ripseek_Med_out")
                    multicore <- TRUE
                    biomart <- "ENSEMBL_MART_ENSEMBL" # use archive to get ensembl 65
                    biomaRt_dataset <- "mmusculus_gene_ensembl" # mouse dataset id name
                    host <- "dec2011.archive.ensembl.org" # use ensembl 65 for annotation for mm9
                    goAnno <- "org.Mm.eg.db" # GO annotation database

                    ripseek_Med <- ripSeek(bamPath=bamFiles, cNAME=iggFiles, genomeBuild="mm9", uniqueHit=TRUE,
                    assignMultihits=TRUE,
                    rerunWithDisambiguatedMultihits=TRUE, biomart=biomart, host=host,
                    biomaRt_dataset=biomaRt_dataset, goAnno=goAnno, multicore=multicore,
                    outDir=outDir)

                    ##### end script
                    Last edited by pavol; 10-02-2014, 06:24 AM.

                    Comment


                    • #11
                      Yet another issue using the same command and settings as seen in my previous post.
                      Any suggestions?

                      ####
                      *III. Seek RIP regions with control library:

                      Error in .normargGenome(value, seqnames(x)) :
                      when length of supplied 'genome' vector is not 1, then it must equal the number of sequences

                      Comment


                      • #12
                        RIPSeeker error

                        Hello Pavol,

                        I am also getting same error. Were you able to resolve this? If yes, please share your solution. Thank you for your help.

                        Error in GRangesList(nbhGRList) :
                        all elements in '...' must be GRanges objects



                        Originally posted by pavol View Post
                        Yet another issue using the same command and settings as seen in my previous post.
                        Any suggestions?

                        ####
                        *III. Seek RIP regions with control library:

                        Error in .normargGenome(value, seqnames(x)) :
                        when length of supplied 'genome' vector is not 1, then it must equal the number of sequences

                        Comment


                        • #13
                          RIP-Seq, RIP-Seeker - annotation

                          I am working with my first RIP-Seq data (plant genome: Arabidopsis Thaliana).

                          Can I get help with the following information for annotating the RIP regions identified by RIPSeeker?
                          genomeBuild= "TAIR10"
                          biomart=
                          host=
                          biomaRt_dataset =
                          goAnno =

                          thanks

                          Comment


                          • #14
                            Hi,

                            I think this should work:

                            biomart="plants_mart"
                            host="plants.ensembl.org"
                            biomaRt_dataset ="athaliana_eg_gene"
                            goAnno ="org.At.tair.db"

                            source: https://support.bioconductor.org/p/75424/

                            Cheers

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Essential Discoveries and Tools in Epitranscriptomics
                              by seqadmin




                              The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                              04-22-2024, 07:01 AM
                            • seqadmin
                              Current Approaches to Protein Sequencing
                              by seqadmin


                              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                              04-04-2024, 04:25 PM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, Today, 11:49 AM
                            0 responses
                            10 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, Yesterday, 08:47 AM
                            0 responses
                            16 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-11-2024, 12:08 PM
                            0 responses
                            61 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-10-2024, 10:19 PM
                            0 responses
                            60 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X