questions from reference to tview by samtool

dkrtndhkd

Member

Join Date: Jan 2012

Posts: 42
#1

01-30-2012, 06:06 AM

Hello, everyone.

I have several questions about NGS. Please help!

1. Where should I download the hg19 reference file?

As far as I know, NCBI and UCSC both provide the reference sequence in FASTA format, one file per chromosome.

Does it matter where I get the reference sequence?

I actually downloaded the per-chromosome reference sequence files from NCBI:

http://www.ncbi.nlm.nih.gov/genome/?term=homo%20sapiens -> 'genome' tab -> files from Genome Reference Consortium.

However, when I searched other posts, I saw that other users recommend downloading the files from 1000 Genomes or the UCSC Golden Path.

Is there any difference?

2. When using BWA, what is the minimal unit of a reference file?

Do I have to use the whole hg19 reference sequence as ref.fasta, or can I use a single chromosome (e.g. chr3.fasta)?

Or can I use the FASTA sequence of a specific gene as the reference file?
(I also downloaded this from NCBI. To use EGFR as a reference, I entered 'EGFR' in the 'gene' category, clicked the Homo sapiens result, and downloaded the sequence in FASTA format.)

As you know, a FASTA file starts with a '>~~~~~~~~' line, and the sequence, 'AGCTCCTG~~~~', begins on the next line.

Is the first line ('>~~~~') important when using BWA?

If I use a specific gene's FASTA file, what should I write on the first line?
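For context on the header question: by the usual FASTA convention, the record name is the text after '>' up to the first whitespace, and as far as I know BWA reports that name as the reference name in its SAM output. Below is a minimal Python sketch of how that name is derived. The function name `fasta_names` and the example header and bases are made up for illustration and are not a real NCBI record.

```python
def fasta_names(lines):
    """Yield (name, description) for every FASTA header line in `lines`."""
    for line in lines:
        if line.startswith(">"):
            # Split once on any whitespace: first token is the name,
            # whatever follows is a free-text description.
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            description = parts[1].strip() if len(parts) > 1 else ""
            yield name, description


# Hypothetical single-gene reference; header text and bases are placeholders.
example = [
    ">EGFR_region example gene sequence",
    "AGCTCCTGAGCT",
    "TTGACCGTAAGC",
]

for name, description in fasta_names(example):
    print(name, "|", description)
```

A short name without spaces that is unique within the file is the safe choice for a hand-made header, since that token is what later tools will use to refer to the sequence.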

3. When I run BWA paired-end alignment, the command looks like this:

'bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam'

My reads in read1.fq carry a barcode, so I trimmed the barcode sequence (6 bp) from read1.fq, not with a command-line option but with my own code. (I also trimmed the corresponding 6 characters of the quality string.)

As a result, the reads in read1.fq and read2.fq are no longer the same length.

When I ran the paired-end alignment command, the terminal showed 'weird pair', but it still produced the result file 'aln.sam'.

Is this okay? Has anyone had the same experience?
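The poster's own trimming code is not shown, so the following is only a minimal sketch of the kind of trimming described above. It assumes plain 4-line FASTQ records and that the barcode occupies the first 6 bases of each read; `trim_fastq` and `BARCODE_LEN` are hypothetical names.

```python
import io

BARCODE_LEN = 6  # barcode length stated in the post


def trim_fastq(src, dst, n=BARCODE_LEN):
    """Copy 4-line FASTQ records from src to dst, dropping the first n
    bases and the matching first n quality characters of every read."""
    while True:
        header = src.readline()
        if not header:
            break  # end of file
        seq = src.readline().rstrip("\n")
        plus = src.readline()
        qual = src.readline().rstrip("\n")
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in " + header.strip())
        dst.write(header)
        dst.write(seq[n:] + "\n")
        dst.write(plus)
        dst.write(qual[n:] + "\n")


# Made-up record: 6-base barcode followed by 8 bases of insert.
raw = io.StringIO("@read1/1\nACGTACTTGGCCAA\n+\nIIIIIIHHHHHHHH\n")
out = io.StringIO()
trim_fastq(raw, out)
print(out.getvalue(), end="")
```

Trimming this way changes only the read length. The number and order of records stay the same, which matters because, as far as I know, sampe pairs the reads in the two files by their position.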
swbarnes2

Senior Member

Join Date: May 2008

Posts: 910
#2

01-30-2012, 10:22 AM

In general, you should align against the whole genome, not one chromosome at a time. If your read aligns with one error to Chr 3 but with no errors to Chr 8, and you only provide Chr 3 to align against, you will get a wrong alignment. Yes, it might take a little longer than aligning one chromosome at a time, but it will be more accurate.
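The Chr 3 versus Chr 8 argument can be illustrated with a toy example. This is not how BWA works internally; it is a brute-force, ungapped mismatch count over invented sequences, and `mismatches`, `best_hit`, `genome` and `read` are all hypothetical names.

```python
def mismatches(read, ref, pos):
    """Count mismatching bases when read is placed at ref[pos:]."""
    return sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))


def best_hit(read, refs):
    """Return (name, position, mismatches) of the best ungapped placement."""
    best = None
    for name, seq in refs.items():
        for pos in range(len(seq) - len(read) + 1):
            mm = mismatches(read, seq, pos)
            if best is None or mm < best[2]:
                best = (name, pos, mm)
    return best


# Invented toy "chromosomes", not real sequence.
genome = {
    "chr3": "TTGACGTACCTAGG",  # holds the read with one mismatch
    "chr8": "CCACGTAGCTAGTT",  # holds the read exactly
}
read = "ACGTAGCTAG"

print(best_hit(read, {"chr3": genome["chr3"]}))  # chr3 only
print(best_hit(read, genome))                    # whole toy genome
```

Given only chr3, the search still reports a plausible-looking hit with one mismatch. The exact match on chr8 is found only when both sequences are available.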

Every time I've ever seen "weird pair", it was because I did something wrong. sampe should show a line like this:

[infer_isize] inferred external isize from 179561 pairs: 222.453 +/- 120.731

If it says that the pairs have an appropriate distance between them, it's probably fine. If the distance is far too large, or it won't calculate it at all, double-check that your command line is right and that you aren't mixing up files. I don't think that having reads of different lengths should make a difference.
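To check this over a long log, the numbers can be pulled out of the `[infer_isize]` line with a small script. The sketch below assumes the message looks exactly like the line quoted above; other bwa versions may word it differently, and `parse_isize` is a hypothetical helper name.

```python
import re

# Pattern modelled on the single log line quoted in this thread.
ISIZE_RE = re.compile(
    r"\[infer_isize\] inferred external isize from (\d+) pairs: "
    r"([\d.]+) \+/- ([\d.]+)"
)


def parse_isize(line):
    """Return (pair_count, mean_isize, std_dev), or None if no match."""
    m = ISIZE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


line = "[infer_isize] inferred external isize from 179561 pairs: 222.453 +/- 120.731"
print(parse_isize(line))
```

Comparing the reported mean with the expected fragment size of the library is a quick sanity check that the two read files really belong together.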