Comparative Genomics. Human Genome Polymorphism. FBB, 4th year. Vasily Evgenievich Ramensky, Institute of Molecular Biology RAS.


People are different…

…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgCggggaggccctgct…
…caccagctcctgtgCggggaggccctgct…
…and so are their genomes

Definition. SNP (single nucleotide polymorphism): the existence in a population, at the same position in genomic DNA, of two nucleotide variants, where the frequency of the rarer variant (allele) is at least 1%.
Diagram: an A-T base pair observed $N_a$ times and a G-C base pair observed $N_g$ times at the same position.
$N_a + N_g = N$, $N_a / N \geq 0.01$, $N_g / N \geq 0.01$
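The frequency criterion in the definition can be checked mechanically. The following is a minimal sketch, assuming allele counts at one position are already available as a dictionary; the function name `is_snp` and its parameters are illustrative and do not come from the lecture.

```python
def is_snp(allele_counts, min_freq=0.01):
    """Return True if exactly two alleles are observed at the position and
    the rarer one reaches min_freq (the 1% threshold of the definition)."""
    # Ignore alleles that were never observed.
    observed = {allele: n for allele, n in allele_counts.items() if n > 0}
    if len(observed) != 2:
        return False
    total = sum(observed.values())
    return min(observed.values()) / total >= min_freq


# The five example sequences carry G three times and C twice.
print(is_snp({"G": 3, "C": 2}))
# A variant seen once in 1000 chromosomes is below the 1% threshold.
print(is_snp({"A": 999, "G": 1}))
```

A sample of five sequences is, of course, far too small to measure a population frequency reliably; the call only illustrates the arithmetic.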

Comments on the definition
* the comparison is between sequences of a single biological species
* the word "polymorphism" has no plural in Russian (N. Lyapunova, personal communication)
* in everyday usage, "polymorphism" most often refers to the nucleotide variant itself (i.e. it is used as a synonym for "mutation")
* the definition presumes reliable measurement of allele frequencies in the population(s), which is still rare in current practice

Types of polymorphism in the genome
* single-nucleotide (SNP)
* short insertion/deletion
* microsatellite repeat of variable length (VNTR, variable number tandem repeat)
* insertion of an element
* multiple-nucleotide (MNP)

Some properties of SNPs
* Account for ~90% of human genetic variation
* Occur with an average density of ~1 per 1000 bp
* The transition C↔T (G↔A) accounts for ~2/3 of all cases; the three transversions C↔A (G↔T), C↔G (G↔C) and T↔A (A↔T) together account for the remaining ~1/3
* Most of them (~85%) are common to all populations (with differing allele frequencies)
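The transition/transversion distinction used above depends only on whether both alleles belong to the same chemical class (purine or pyrimidine). A minimal sketch, with the hypothetical helper name `substitution_class`:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-nucleotide substitution as a transition
    (purine<->purine or pyrimidine<->pyrimidine) or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different nucleotides from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for pair in [("C", "T"), ("G", "A"), ("C", "A"), ("C", "G"), ("T", "A")]:
    print(pair, substitution_class(*pair))
```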

Why are SNPs important?
* Convenient genetic markers
* Responsible for the existence of various phenotypes, with primary interest in disease phenotypes
* Pharmacogenomics: individual response to drugs
* Clues to understanding human evolution

SNPs in the human genome

dbSNP build statistics [table: build, date, number of rs IDs]

Estimates of SNP density in the human genome Li and Sadler (1991), Genetics, ~1/1000 bp Zhao et al., (2003), Gene: ~1/1200 bp dbSNP, build 124 (2005): ~1/300 bp (?)

Classification of SNPs by position in the genome
1. genes
1.1 UTRs
1.2 exons (cSNP): synonymous (sSNP), non-synonymous (nsSNP)
1.3 introns
1.4 splice sites
2. regulatory regions of genes (rSNP)
3. intergenic regions

Synonymous vs. non-synonymous SNPs:
…CAC CAG CTC CTG TGG GGG GAG GCC CTG CT…
…CAC CAG CTC CTG TGC GGG GAG GCT CTG CT…
HGVBase ID: SNP G→C. Hypothetical SNP: C→T.
… H Q L L W G E A L …
… H Q L L C G E A L …
Example: Lysosomal alpha-glucosidase precursor (SwissProt P10253): nsSNP Trp746→Cys, sSNP Ala749→Ala
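The two substitutions in the example can be classified by translating the reference and variant codons with the standard genetic code. This is a minimal sketch; the function names `translate` and `classify_codon_change` are illustrative, and the reading frame is assumed to start at the first base of the fragment shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def classify_codon_change(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same amino acid."""
    same = CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]
    return "synonymous" if same else "non-synonymous"


print(translate("CACCAGCTCCTGTGGGGGGAGGCCCTG"))
print(classify_codon_change("TGG", "TGC"))  # Trp -> Cys
print(classify_codon_change("GCC", "GCT"))  # Ala -> Ala
```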

Summary of annotation on human genome Build 33, dbSNP Build 124 [table columns: function class, code, SNP count, gene count]. Functional classes:
* Locus region
* Allele synonymous to contig nucleotide
* Allele nonsynonymous to contig nucleotide
* Untranslated region
* Intron
* Splice site
* Allele is same as contig nucleotide
* Coding: synonymy unknown

Exercise. One database contains ~11,000 nsSNPs in ~6,000 proteins. Another database contains ~47,000 protein sequences with a total length of ~19.5x10^6 residues. Estimate (a) the average protein length, (b) the average number of nsSNPs per protein, (c) the average number of nsSNPs per unit of protein length.
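One way to work through the arithmetic of this exercise is sketched below. It assumes that the mean protein length from the second database is representative of the proteins in the first; that assumption is not stated in the exercise, and the variable names are illustrative.

```python
n_nssnps = 11_000            # nsSNPs in the first database
n_proteins_with_snps = 6_000  # proteins carrying those nsSNPs
n_sequences = 47_000          # protein sequences in the second database
total_residues = 19.5e6       # their combined length, in residues

# (a) average protein length
mean_length = total_residues / n_sequences

# (b) average number of nsSNPs per protein
nssnps_per_protein = n_nssnps / n_proteins_with_snps

# (c) average number of nsSNPs per residue, assuming the mean length
#     from the second database also applies to the first
nssnps_per_residue = nssnps_per_protein / mean_length

print(f"(a) {mean_length:.0f} residues")
print(f"(b) {nssnps_per_protein:.2f} nsSNPs per protein")
print(f"(c) {nssnps_per_residue:.4f} nsSNPs per residue")
```

Under these assumptions the estimates come out at roughly 415 residues per protein, about 1.8 nsSNPs per protein, and about one nsSNP per 230 residues.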

SNP life cycle (after Miller & Kwok, 2001)
I. Appearance of a new allelic variant by mutation (~100 mutations per individual)
II. "Survival" until homozygotes for this allele appear
III. Slow increase of the allele's frequency in the population
IV. Fixation of the new allele (0 vs. 100%), turning it into a between-species difference

Exercise. The SNP life cycle described above takes ~0.3 million years. Assuming that humans and chimpanzees diverged ~5 million years ago, and that H. sapiens left Africa and the various populations separated ~ million years ago, argue for the possibility of (a) identical SNPs in humans and other species, (b) "private" SNPs, i.e. SNPs confined to a single human population.

Why are polymorphisms maintained in the population?
* Selectionists: because heterozygotes have higher fitness
* Neutralists: because all observed polymorphisms are selectively neutral
* Reality: is always somewhat more complicated

Why are SNPs important?
* Convenient genetic markers
* Responsible for the existence of various phenotypes, with primary interest in disease phenotypes
* Pharmacogenomics: individual response to drugs
* Clues to understanding human evolution

nsSNPs vs. disease mutations Disease mutations are rare (

Some common nsSNPs are known to affect critical structural features. The frequency of the haemochromatosis allelic variant Cys260Tyr of the HLA-H protein (which destroys a disulphide bond) is up to 6% in Northern Europe.

Identifying SNPs responsible for specific phenotypes
* whole genome scan: a hypothesis-free approach with an extraordinary number of candidate SNPs
* candidate gene studies: require a priori models; nevertheless, large numbers of candidate SNPs must be tested
Both methods, however, require huge amounts of expensive experimental data and are statistically unreliable. Therefore, in silico expertise is required.

Methods for prediction of effect of nsSNPs * Sequence-based methods: analysis of multiple alignment with homologs Ng-Henikoff [2002] * Structure-based methods: analysis of various structural parameters Wang, Moult [2001]; Chasman, Adams [2001] * Combined methods: sequence and structure analysis Sunyaev,Ramensky,Bork [2000, 2001, 2002]

PolyPhen : prediction of amino acid substitution effect on protein function Data sources: 1.Sequence annotation of the query protein 2.PSIC profile matrix values derived from multiple alignment with homologous proteins 3.Structural parameters and contacts of query protein structure or its >50% homolog Prediction: benign (neutral), damaging (deleterious)

PolyPhen query processing flowchart
INPUT: Sequence: …IMAGLQQTNSE…; Position: 133; Var1: Q; Var2: P; ACC/ID (if known protein): DMD_HUMAN
Steps: sequence annotation; PSIC profile scores for the two amino acid variants; structural parameters and contacts; prediction rules
PREDICTION: damaging, benign, or unknown

I. Sequence annotation Hereditary hemochromatosis protein precursor (HLA-H, Q30201) Features checked: * bond: DISULFID, THIOLEST, THIOETH * site: BINDING, ACT_SITE, LIPID, METAL, SITE, MOD_RES, SE_CYS * region: TRANSMEM, SIGNAL, PROPEP

II. PSIC: profile analysis of homologous sequences 1.Align with homologous proteins with seq. ide %

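II. PSIC: profile analysis of homologous sequences 2. Calculate the profile matrix with the PSIC algorithm. Profile matrix: $S_{a,j} = \ln(p_{a,j} / q_a)$, $a \in \{1, \dots, 20\}$, $j \in \{1, \dots, N\}$, where $N$ is the alignment length. Highlighted in the example: $S_{\mathrm{Asn},4}$ and $S_{\mathrm{Cys},4}$.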

II. PSIC: profile analysis of homologous sequences 3. Analyse the difference between the profile scores for the two amino acid variants, $S_{\mathrm{Asn},4}$ and $S_{\mathrm{Cys},4}$. Asn→Cys: $\Delta = |S_{\mathrm{Asn},4} - S_{\mathrm{Cys},4}| = 1.591$
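The log-odds form of the profile score and the score difference between two variants can be illustrated with a deliberately simplified stand-in. The sketch below is not the PSIC algorithm: it estimates the column probability from raw counts plus a pseudocount and takes a uniform background of 1/20, whereas PSIC derives its probabilities from position-specific independent counts. The function names and the toy alignment column are assumptions made for the illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_scores(column, pseudocount=1.0):
    """Log-odds scores ln(p / q) for one alignment column.

    p: pseudocount-smoothed frequency of each residue in the column
       (a simplification; PSIC weights sequences differently).
    q: uniform background 1/20 (an assumption of this sketch).
    """
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    q = 1.0 / len(AMINO_ACIDS)
    return {aa: math.log((counts[aa] + pseudocount) / total / q)
            for aa in AMINO_ACIDS}


def score_difference(column, aa1, aa2):
    """Absolute difference between the profile scores of two variants."""
    scores = column_scores(column)
    return abs(scores[aa1] - scores[aa2])


# Toy column: Asn in nine homologs, Asp in one, Cys never observed.
print(round(score_difference("NNNNNNNNDN", "N", "C"), 3))
```

A large difference means that one variant is common among homologs at this position while the other is rarely or never seen there.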

III. 3D structure analysis 1. Residues that are in spatial contact with a ligand or other critical residues Zen 999 residues in 5Å contact with Zen 999 Bos Taurus trypsin [PDB ID :1ql7]

III. 3D structure analysis 2. Residues that form the hydrophobic core of the protein (buried residues) Bos Taurus trypsin [PDB ID :1ql7] Surface residues Buried residues

Structural parameters and contacts Secondary structure Phi-psi dihedral angles Solvent accessible surface area, normed s.a.s.a Change in accessible surface propensity Change in residue side chain volume Contacts with heteroatoms Interchain contacts Contacts with functional sites (BINDING, ACT_SITE, LIPID, and METAL) Region of the phi-psi map (Ramachandran map) Normalised B-factor (temperature factor)

RULES (conditions within a rule are connected with logical AND). Each rule lists: PSIC score difference Δ; substitution site properties; substitution type properties; PREDICTION.
1. Δ arbitrary; site annotated as a functional* or bond formation** site; type arbitrary; probably damaging
2. Δ not considered; site in a region annotated or predicted as transmembrane; PHAT matrix difference resulting from the substitution is negative; possibly damaging
3. Δ ≤ 0.5; site arbitrary; type arbitrary; benign
4. Δ > 1.0; atoms are closer than 3.0 Å to atoms of a ligand or of a residue annotated as BINDING, ACT_SITE, LIPID, METAL; type arbitrary; probably damaging
5. 0.5 < Δ ≤ 1.5; normed accessibility ACC ≤ 15%; absolute change of accessible surface propensity is ≥ 0.75 or absolute change of side chain volume is ≥ 60; possibly damaging
6. 0.5 < Δ ≤ 1.5; normed accessibility ACC ≤ 5%; absolute change of accessible surface propensity is ≥ 1.0 or absolute change of side chain volume is ≥ 80; probably damaging
7. 1.5 < Δ ≤ 2.0; site arbitrary; type arbitrary; possibly damaging
8. Δ > 2.0; site arbitrary; type arbitrary; probably damaging
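The rule table can be read as a cascade of checks. The sketch below is an illustration under stated assumptions, not PolyPhen's implementation: the inequality directions (score difference at most 0.5 for benign, accessibility at most 15% or 5%, changes at least 0.75/60 or 1.0/80) are this sketch's reading of the table, the transmembrane rule is omitted, the order of the checks is a choice made here, and returning "benign" when no rule fires in the 0.5 to 1.5 range is an assumption. All names (`Site`, `predict`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    psic_diff: Optional[float] = None    # PSIC score difference, if an alignment exists
    functional_site: bool = False        # annotated functional or bond-forming site
    ligand_contact: bool = False         # within 3.0 A of a ligand or functional residue
    acc_percent: Optional[float] = None  # normed accessibility, if a structure exists
    d_propensity: float = 0.0            # |change of accessible surface propensity|
    d_volume: float = 0.0                # |change of side chain volume|


def predict(site: Site) -> str:
    if site.functional_site:
        return "probably damaging"
    d = site.psic_diff
    if d is None:
        return "unknown"
    if d <= 0.5:
        return "benign"
    if d > 2.0:
        return "probably damaging"
    if d > 1.0 and site.ligand_contact:
        return "probably damaging"
    if d <= 1.5:
        acc = site.acc_percent
        if acc is not None:
            # Stricter buried-residue rule first, then the looser one.
            if acc <= 5 and (site.d_propensity >= 1.0 or site.d_volume >= 80):
                return "probably damaging"
            if acc <= 15 and (site.d_propensity >= 0.75 or site.d_volume >= 60):
                return "possibly damaging"
        return "benign"  # assumption: no rule fired
    return "possibly damaging"  # 1.5 < d <= 2.0


# The Asn -> Cys example with a score difference of 1.591 and no structural data.
print(predict(Site(psic_diff=1.591)))
```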

Control sets [table: columns all, dam, unknown, dam/(dam+ben); rows: Disease mutations (strict set; total, 2,782 in all), Between-species substitutions (total)]

PolyPhen: predictions for nsSNPs All SNPs from HGVBase, rel ,589 synonymous ,310 (5,378 proteins) non-synonymous ,152 (6,124 proteins) Predictions for nsSNPs: unknown ,987 benign ,317 possibly damaging ,591 probably damaging ,257 Prediction basis: multiple alignment ,654 sequence annotation structure

PolyPhen predictions for dbSNP b.121
All:
9,502 unknown
27,991 benign
7,905 possibly damaging
5,521 probably damaging
50,919 total (44,005 unique rss)
With structure:
42 unknown
2,142 benign
531 possibly damaging
1,076 probably damaging
3,791 total (,167 unique rss)
[ Ivan Adzhubei, 2004 ]

PolyPhen predictions for dbSNP b.121
All, filtered: 5 seq. in multiple alignment
16,813 benign
5,195 possibly damaging
4,168 probably damaging
26,176 total (21,677 unique rss)
With structure, filtered: 5 seq. in multiple alignment
2,021 benign
499 possibly damaging
1,050 probably damaging
3,570 total (2,983 unique rss)
[ Ivan Adzhubei, 2004 ]

Hydrophobic core stability parameters are the best predictors Ramensky et al., Nucleic Acids Res. (2002) 30:

PolyPhen PolyPhen input : Protein identifier OR sequence Substitution position Substitution type

PolyPhen

PolyPhen: nsSNPs data collection

DAMAGING nsSNPs Transthyretin (PDB: 1tyr, SNP ) Thr118→Asn occurs at the ligand (REA) binding site. Figure labels: Thr 118, REA 130

DAMAGING nsSNPs Trypsin (PDB: 1trn, SNP ) Ser142→Phe results in a strong side chain volume change at a buried position. Figure label: Ser 142

PolyPhen: the child of seven nannies
THE CYCLOPS POLYPHEMUS WAS A UNIQUE SUBSPECIES OF DWARF ELEPHANT (Izvestia-Nauka, 18 November 2003)
By driving a sharpened log into the single eye of the ferocious cyclops Polyphemus, the legendary Odysseus was exterminating a unique species of dwarf elephants that lived on the island of Sicily. The ancient myth of one-eyed humanoid giants has been dispelled by Italian palaeontologists at the scientific exhibition "Polyphemus in Modena". The exhibition presents skulls found by researchers in Sicily that have a single frontal socket. At first glance it closely resembles an eye in the forehead. The bones found next to the skulls do belong to a sizeable mammal, about as large as a big bear. The owner of these remains was not a cyclops but a dwarf elephant. The "eye" in the forehead is the opening for the airways, that is, for the trunk.

Polyphenism : the ability of a single genome to produce two or more alternative morphologies within a single population in response to an environmental cue (such as temperature, photoperiod, or nutrition). [ Dr. Ehab Abouheif, McGill University, Montréal Québec ] The seasonal morphs of the buckeye butterfly, Precis coenia ( Nymphalidae ). The ventral surfaces are shown. The Summer morph ("linea") is on the left; the Fall morph ("rosa") is on the right. [ Scott F.Gilbert, A Companion to Developmental Biology. Chapter 22, Seasonal Polyphenism in Butterfly Wings ]

Damaging nsSNPs
* We estimate that ~20% of non-synonymous cSNPs from databases are damaging
* The average allele frequency of non-synonymous cSNPs predicted to be damaging is half that of benign non-synonymous cSNPs
* We propose to use these predictions for prioritisation of candidates for association studies