  • mapping SOLID reads under 24nt

    Hi everybody,
    I used AB SOLID to sequence smallRNAs, especially miRNAs, and I want to obtain ~ 18-25nt length sequences. For that, I try some softwares to map the color reads against a genome reference (hundreds of thousands contigs).

    I tried SHRiMP, but it can't map reads shorter than 24 nt. Here are the commands I used:
    ./utils/splitreads.py 250000 reads1_bcSample1_F3.csfasta
    ./bin/rmapper-cs -P 0_to_249999.csfasta genome.fasta > 0_to_249999.csfasta.lib1.out

    In the end, only 3.5% of the reads mapped to the reference genome, which is expected since the large majority of my small RNAs are shorter than 24 nt.

    I also tried MAQ, but the results were poor. I used these commands:
    perl solid2fastq.pl reads1_bcSample1_ lib1;
    gunzip lib1.single.fastq.gz;
    maq fastq2bfq -n 1000000 lib1.single.fastq lib1.bfq;
    maq fasta2csfa genome.fasta > genome.csfa;
    maq fasta2bfa genome.csfa genome.csbfa;
    for f in *.bfq; do maq map -d Primer.lib1.txt -c $f.aln.cs.map genome.csbfa $f 2> aln.log; done;
    maq mapmerge lib1.map $(ls *aln.cs.map);
    maq mapview lib1.map > lib1.map.view.txt

    The results are strange: I only get 35 nt sequences, and just 0.5% of the reads are mapped.

    In the SHRiMP and MAQ output I never see any trace of the adapters, and I shouldn't, since adapters are not supposed to map to the genome. Using shorter seeds didn't improve the results either.

    Other tools exist and are recommended by some people on this forum, such as BFAST and ABySS, but both are limited and can't map reads shorter than 25 nt. There are also BLAT, SOAP, BWA, etc., but I haven't tried them. The SOAP article claims it can map reads of 18 to 26 bp.

    I have read that many people consider MAQ the reference, but I don't understand why I get such bad results with it. Also, has anyone ever tried SOAP?

    Thank you all

    Mike
    Last edited by mycky; 07-30-2010, 08:57 AM.

  • #2
    The SOLiD platform itself has a small RNA pipeline. It takes into account the fact that miRNA reads contain the RNA plus the adaptors. If the mappers you tried failed because of the small read size, then I think using the SOLiD pipeline might be a good idea.


    • #3
      I'm trying rna2map, but it also has limitations. The big problem is that rna2map is designed for genomes with complete chromosomes: it creates one folder and one script (for LSF or PBS) per chromosome. In my case I'm working on an incomplete genome with hundreds of thousands of contigs, and rna2map creates one folder and one script for each of them, which is too much. I tried anyway, but after 91,000 folders my system couldn't create any more, and I don't blame it!
      On top of that, each folder is named after its contig. If the contig header carries annotations (most of them do), rna2map does not strip the illegal characters, and it simply crashes because they are forbidden in folder names.
      I think I'll have to change the code if I continue with it.
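
The folder-name crash described above could probably be avoided without touching rna2map by cleaning the FASTA headers before running it. The following is only a minimal sketch: the function names, the `c0000001_` prefix scheme, and the set of allowed characters are assumptions made for illustration, not anything rna2map requires. It does not address the number of folders, only the illegal characters.

```python
import re


def safe_contig_name(header: str, index: int) -> str:
    """Turn a FASTA header into a name that is safe to use as a folder name."""
    fields = header.lstrip(">").split()
    # Keep only the first whitespace-delimited token (drops free-text annotation).
    token = fields[0] if fields else ""
    # Replace anything that is not a letter, digit, dot, underscore or hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", token)
    # A running index keeps names unique even if two cleaned tokens collide.
    return f"c{index:07d}_{cleaned}" if cleaned else f"c{index:07d}"


def sanitize_fasta(lines):
    """Yield FASTA lines with sanitized headers; sequence lines pass through."""
    index = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            index += 1
            yield ">" + safe_contig_name(line, index)
        else:
            yield line
```

Writing the old and new names to a two-column file while doing this would let the original annotations be recovered after mapping.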


      • #4
        I think I'm going to concatenate all the contigs into just a few hundred sequences and store the positions in a database so I can locate the original contigs. That will be easier than modifying rna2map.
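
The concatenation idea above could be sketched roughly as follows. This is a hedged illustration, not a tested pipeline: `concatenate_contigs`, `locate`, `max_len`, and `spacer_len` are hypothetical names, and the in-memory list stands in for the database. The N spacer is meant to keep a read from aligning across a junction between two contigs; whether a particular color-space mapper treats N that way is an assumption to verify.

```python
def concatenate_contigs(contigs, max_len, spacer_len=50):
    """Pack (name, sequence) pairs into super-contigs of roughly max_len bases.

    Returns (supers, index). supers is a list of sequences; index is a list of
    (contig_name, super_id, start, end) with 0-based, end-exclusive coordinates.
    """
    spacer = "N" * spacer_len
    supers, index = [], []
    parts, length = [], 0
    for name, seq in contigs:
        # Close the current super-contig if adding this contig would overflow it.
        if parts and length + spacer_len + len(seq) > max_len:
            supers.append("".join(parts))
            parts, length = [], 0
        # Separate neighbouring contigs with a run of N.
        if parts:
            parts.append(spacer)
            length += spacer_len
        index.append((name, len(supers), length, length + len(seq)))
        parts.append(seq)
        length += len(seq)
    if parts:
        supers.append("".join(parts))
    return supers, index


def locate(index, super_id, pos):
    """Map a 0-based position on a super-contig back to (contig_name, offset)."""
    for name, sid, start, end in index:
        if sid == super_id and start <= pos < end:
            return name, pos - start
    return None  # the position falls inside an N spacer
```

With hundreds of thousands of contigs, the linear scan in `locate` would be slow; a real database table or a sorted lookup would be the natural replacement.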


        • #5
          You are still better off running the SOLiD small RNA analysis tool.

          It will automatically trim your reads and filter out primer sequence.

          You are going to have even more issues if you make contigs. If you are going to pass on the provided small RNA analysis tool, you should just trim your reads.
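
For anyone who does go the "just trim your reads" route, here is a rough sketch of what adapter trimming looks like in color space. It assumes csfasta reads of the form primer base followed by colors (e.g. `T0123...`), requires an exact match, and tolerates no color errors. The function names and the `min_match` parameter are hypothetical, and the adapter sequence has to come from your own library kit documentation; none is assumed here.

```python
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(bases: str) -> str:
    """Encode a base sequence as SOLiD di-base colors (one per adjacent pair)."""
    codes = [_BASE_CODE[b] for b in bases.upper()]
    # With A=0, C=1, G=2, T=3 the color of a base pair is the XOR of its codes.
    return "".join(str(x ^ y) for x, y in zip(codes, codes[1:]))


def trim_adapter(read: str, adapter_bases: str, min_match: int = 6) -> str:
    """Trim a csfasta read at the first exact occurrence of the adapter.

    The color spanning the insert/adapter junction depends on the last base of
    the insert, so only colors internal to the adapter are searched for.
    """
    primer_base, colors = read[0], read[1:]
    probe = to_colors(adapter_bases)[:min_match]
    if len(probe) < min_match:
        return read  # adapter too short to build a probe of the requested size
    # Start at index 1: a junction color must precede the adapter colors.
    hit = colors.find(probe, 1)
    if hit == -1:
        return read  # no adapter found, leave the read untouched
    # colors[hit - 1] is the junction color; drop it along with the adapter.
    return primer_base + colors[:hit - 1]
```

After trimming, reads can be filtered by length (for example keeping 18-25 colors) before mapping, which is where the short-read limits of the mappers discussed in this thread start to matter.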
