
  • Questions from reference to tview by samtools

    Hello, guys.

    I have several questions about NGS. Please help me!

    1. Where do I download the hg19 reference file?

    As far as I know, NCBI and UCSC both provide the reference sequence in FASTA format, one file per chromosome.

    Does it matter where I get the reference sequence?

    I actually downloaded the per-chromosome reference sequence files from NCBI:

    http://www.ncbi.nlm.nih.gov/genome/?term=homo%20sapiens -> 'genome' tab -> files from Genome Reference Consortium.

    However, in other posts I found, users recommend downloading the files from 1000 Genomes or the UCSC Golden Path instead.

    Is there any difference?


    2. When using BWA, what is the minimal unit of a reference file?

    Do I have to use the whole hg19 reference sequence as ref.fasta, or can I use a single chromosome such as chr3.fasta?

    Or can I use the FASTA sequence of a specific gene as the reference file?
    (I also downloaded it from NCBI. To use EGFR as a reference, I searched for 'EGFR' in the 'gene' category, clicked the Homo sapiens result, and downloaded the sequence in FASTA format.)

    As you know, a FASTA file starts with a header line, '>~~~~~~~~', and the sequence follows from the next line, 'AGCTCCTG~~~~'.

    Is the first line ('>~~~~') important when using BWA?

    If I use a gene-specific FASTA file, what should I write on the first line of the file?

    3. When I use BWA in paired-end mode, the command looks like this:

    'bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam'

    The reads in read1.fq carried a 6 bp barcode, so I trimmed the barcode sequence from read1.fq with my own code rather than with a command-line option. (I also trimmed the corresponding 6 characters from the quality scores.)

    As a result, the reads in read1.fq and read2.fq do not have the same length.

    When I ran the paired-end alignment command, the terminal showed 'weird pair', but it still produced the result file 'aln.sam'.

    Is this okay? Has anyone had the same experience?

  • #2
    In general, you should align against the whole genome, not one chromosome at a time. If a read aligns with one error to Chr 3 but with no errors to Chr 8, and you only provide Chr 3 to align against, you will get a wrong alignment. Yes, it might take a little longer than aligning one chromosome at a time, but it will be more accurate.
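
If the reference was downloaded as one FASTA file per chromosome, the files can be combined into a single whole-genome file before indexing. The following is only a rough sketch: `merge_fasta` and the file names are made up for illustration, and it assumes the sequence name is the header text after '>' up to the first whitespace.

```python
import os
import tempfile


def merge_fasta(paths, out_path):
    """Concatenate FASTA files into out_path and return the sequence names.

    The name is taken as the header text after '>' up to the first
    whitespace (an assumption of this sketch).
    """
    names = []
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        names.append(fields[0] if fields else "")
                    # Make sure every line ends with a newline so that the
                    # next file's header never gets glued onto a sequence line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return names


if __name__ == "__main__":
    # Tiny made-up example files, standing in for chr3.fasta and chr8.fasta.
    tmp = tempfile.mkdtemp()
    chr3 = os.path.join(tmp, "chr3.fasta")
    chr8 = os.path.join(tmp, "chr8.fasta")
    with open(chr3, "w") as f:
        f.write(">chr3 example\nACGTACGT\n")
    with open(chr8, "w") as f:
        f.write(">chr8 example\nGGCCGGCC")  # no trailing newline on purpose
    merged = os.path.join(tmp, "ref.fasta")
    print(merge_fasta([chr3, chr8], merged))
```

The merged file is what you would then index and pass to the aligner in place of a single-chromosome file.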

    Every time I've ever seen "weird pair", it was because I did something wrong. sampe should show a line like this:

    [infer_isize] inferred external isize from 179561 pairs: 222.453 +/- 120.731
    If it says that the pairs have an appropriate distance between them, it's probably fine. If the distance is far too large, or it won't calculate it at all, double-check that your command line is right and that you aren't mixing up files. I don't think that having reads of different lengths should make a difference.
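
Both checks can be scripted. This is a minimal sketch, not part of BWA: `parse_isize` pulls the numbers out of an `[infer_isize]` line in the format shown above, and `first_mismatch` looks for the first position where the two FASTQ files disagree on the read name. It assumes plain 4-line FASTQ records and that mates share a name apart from an optional `/1` or `/2` suffix; all function names are hypothetical.

```python
import re
from itertools import zip_longest

# Matches the sampe log line format quoted in the post above.
ISIZE_RE = re.compile(
    r"\[infer_isize\] inferred external isize from (\d+) pairs: "
    r"([0-9.]+) \+/- ([0-9.]+)"
)


def parse_isize(line):
    """Return (n_pairs, mean, stddev) from an infer_isize line, or None."""
    match = ISIZE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


def read_names(fastq_lines):
    """Yield read names, assuming exactly 4 lines per FASTQ record."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 0:
            fields = line.strip()[1:].split()
            name = fields[0] if fields else ""
            # Drop the mate suffix so that 'r1/1' and 'r1/2' compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(fq1_lines, fq2_lines):
    """Return (record_index, name1, name2) for the first disagreement.

    Returns None when both inputs list the same reads in the same order.
    A missing record in the shorter input shows up as a name of None.
    """
    pairs = zip_longest(read_names(fq1_lines), read_names(fq2_lines))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None


if __name__ == "__main__":
    log = ("[infer_isize] inferred external isize from 179561 pairs: "
           "222.453 +/- 120.731")
    print(parse_isize(log))
    # Read lengths differ (barcode trimmed from read 1), names still match.
    fq1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII"]
    fq2 = ["@r1/2", "TTGGAACCTT", "+", "IIIIIIIIII",
           "@r2/2", "CCAATTGGCC", "+", "IIIIIIIIII"]
    print(first_mismatch(fq1, fq2))
```

With real files, pass the open file handles for read1.fq and read2.fq to `first_mismatch`. The comparison only looks at read names, so trimmed reads of different lengths do not trigger a mismatch.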
