Family Heart Study Quality Control of Linkage Analyses

February 6, 2001

Submitted by:  FHS Data Analysis Committee

 

Genome scan linkage analyses for FHS are being conducted at five different analysis centers (North Carolina, Minnesota, Framingham, Utah, and St. Louis).  This powerful strategy allows each analysis group to take advantage of its own unique expertise, interests, and strengths in the genetic epidemiology of particular phenotypes, and allows overall FHS analysis progress to proceed in parallel rather than serially.  However, a non-centralized analysis strategy carries the risk that differences in results may be due to lack of standardization in analysis procedures and/or software differences.  To assess the degree of heterogeneity of results due to technical differences, we conducted a quality control experiment in which the same phenotypes in the same datasets were analyzed in genome-wide linkage scans by the five different groups.

In this report, we summarize the experience of all five analysis groups.  Details of the QC plan are given in section I, followed by the five individual group reports as sections II-VI.  Each group contributed at least one phenotype to be replicated by another group, and also replicated at least one other group's genome scan analyses.  Therefore, each of the five group reports has two parts, the first summarizing their own primary genome scan analyses and the second describing their replication efforts.  In addition, the Coordinating Center replicated all phenotypes. 

In general, this QC experience was quite gratifying and comforting.  Although there were differences in results, this is to be expected, given the differences in software and the compromises that often have to be made in real data analysis.  For example, some software requires trimming of pedigrees that are too large, or nuclearization of pedigrees.  Some groups used model-based parametric linkage and others model-free variance components methods.  Even within the most commonly used approach (variance components), there are software differences, for instance, in the algorithms used for IBD estimation between relatives.  There can also be differences in the availability/use of ancillary, so-called "nuisance parameters," such as marital correlations, extra sibling resemblance, dominance effects, etc.  These can make a big difference in the overall likelihoods and LOD scores for any given locus.  In fact, it is remarkable that we have the degree of agreement across sites that we see here.

Qualitatively, the results seem to replicate well across centers and software.  In particular, peaks found in one site seem to be replicated as peaks by other groups, although the exact magnitude of the peaks (exact LOD value) may differ.  Nonetheless, we conclude that the FHS analysis groups are well calibrated and that this QC experiment has demonstrated reasonable comparability of results across centers.
I.  QC Plan

Each site contributed at least one of their own primary phenotypes to be replicated by at least one other site.  All age/sex/center adjusted phenotypes were put into a single dataset, QCPHENOS, with 1 record/subject, summarized below, and distributed back to all five sites.

Table I.1 Characteristics of Dataset QCPHENOS

 

Varname   Label                               N     Mean       Std      Min    Max
SUBJECT   Unique Subject Number               5954  16160.12   9414.84  1.00   40091.00
PAIADCST  log pai adj age sex center & stand  4944  -1.30E-16  0.999    -3.31  2.76
PPADJ     Pulse Pressure adj                  813   0.026      0.234    -0.80  0.861
FEV1ADJ   FEV1 adj age center sex             4387  0.0003     1.000    -4.41  4.599
HBP       High BP 1=Norm 2=Hypert             5663  1.137      0.344    1.00   2.000
SBPADJ    SBP adj age center sex              5663  103.51     39.308   0      186
DBPADJ    DBP adj age center sex              5663  60.95      23.466   0      106
BMIADJ    BMI adj age center sex              5082  0.050      1.092    -2.75  5.718

 

NOTES:

1.     Each phenotype, except the qualitative HBP (Y/N), was already minimally adjusted by the primary analysis center for the significant age/sex/field center effects, using the entire Phase II sample for adjustment.

2.     Although phenotypes were calculated for the entire Phase II set of subjects, the replication analyses should be done ONLY on the subset of the 1,027 subjects genotyped in MGS batch 1.
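The kind of adjustment described in note 1 can be sketched as follows. This is only an illustration: the report does not give the regression models the centers used, so the stratification by sex and center, the single linear age term, the pooled standardization, and all names (`adjust_phenotype`, the record keys) are assumptions.

```python
from statistics import mean, stdev

def adjust_phenotype(records):
    """Return standardized age-adjusted residuals, in input order.

    records: list of dicts with keys 'pheno', 'age', 'sex', 'center'
    (hypothetical layout). A missing phenotype is None and stays None.
    Each (sex, center) stratum needs at least two distinct ages.
    """
    # Group the indices of phenotyped subjects by (sex, center).
    strata = {}
    for i, rec in enumerate(records):
        if rec["pheno"] is not None:
            strata.setdefault((rec["sex"], rec["center"]), []).append(i)

    resid = [None] * len(records)
    for idx in strata.values():
        ages = [records[i]["age"] for i in idx]
        vals = [records[i]["pheno"] for i in idx]
        # Simple least-squares line of phenotype on age within the stratum.
        mean_age, mean_val = mean(ages), mean(vals)
        sxx = sum((a - mean_age) ** 2 for a in ages)
        sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, vals))
        slope = sxy / sxx
        for i, a, v in zip(idx, ages, vals):
            resid[i] = v - (mean_val + slope * (a - mean_age))

    # Standardize the pooled residuals to mean 0, standard deviation 1.
    observed = [r for r in resid if r is not None]
    m, s = mean(observed), stdev(observed)
    return [None if r is None else (r - m) / s for r in resid]

# Made-up example values, not FHS data.
example = [
    {"pheno": 120.0, "age": 40, "sex": "F", "center": "NC"},
    {"pheno": 131.0, "age": 50, "sex": "F", "center": "NC"},
    {"pheno": 138.0, "age": 60, "sex": "F", "center": "NC"},
    {"pheno": 125.0, "age": 45, "sex": "M", "center": "NC"},
    {"pheno": 129.0, "age": 55, "sex": "M", "center": "NC"},
    {"pheno": 140.0, "age": 65, "sex": "M", "center": "NC"},
    {"pheno": None, "age": 70, "sex": "M", "center": "NC"},
]
adjusted = adjust_phenotype(example)
```

Standardized variables built this way have mean near 0 and standard deviation near 1, as PAIADCST and FEV1ADJ do in Table I.1.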

 

Table I.2 Replication Scheme

 

Phenotype           Variable  Primary Analysis Site  Replication Site*
PAI-1               PAIADCST  NC                     MN
Pulse Pressure      PPADJ     MN                     MA
FEV1                FEV1ADJ   MA                     UT
Hypertension (Y/N)  HBP       UT                     CC
Systolic BP         SBPADJ
Diastolic BP        DBPADJ
BMI                 BMIADJ    CC                     NC

*In addition, the CC replicated the analysis of ALL of these phenotypes

 

II.  North Carolina QC Report

The North Carolina (UNC & Wake Forest) group has taken the lead in the linkage analysis of inflammation and diabetes related phenotypes and is serving as the site to replicate linkage studies of obesity (BMI) performed by the Coordinating Center.

II.1.   Linkage Analysis for diabetes-related phenotypes

Fasting glucose and fasting insulin were analyzed by the variance components method as implemented in SOLAR.  Prior to performing linkage analysis, pedigrees that had been "nuclearized" were joined to form larger families for increased power (by providing more pairwise relationships).  For diabetics, both fasting glucose and fasting insulin were coded as "missing".  For fasting glucose, the covariates used in the model were age, sex and center.  This model resulted in an estimated heritability for fasting glucose of 0.21 ± 0.05 (p < 0.001).  The same model was used for fasting insulin, with the heritability of fasting insulin estimated at 0.34 ± 0.06 (p < 0.001).

In Table II.1 we present the results of the analyses of the initial genome screen data for fasting glucose and fasting insulin.  Both single point (by marker) and multipoint linkage analyses were performed.  As can be seen, the highest multipoint LOD scores for fasting glucose occurred on chromosomes 2, 5, 8, 9, 10, 12, 20 and 22 (LOD scores of about 1.5 or greater; the chromosome 2 value is 1.4999).  For fasting insulin, the highest multipoint LOD scores occurred on chromosomes 12, 14 and 15 (LOD scores greater than 1.5).  The highest LOD score for fasting glucose (2.95) occurred on chromosome 5, near markers D5S820 and D5S1456, in a region with many genes related to inflammation.  The highest LOD score for fasting insulin (2.18) occurred on chromosome 14, near markers D14S617 and D14S1434, again in a region with multiple genes involved in inflammation.

Table II.1. Multipoint LOD scores for Fasting Glucose and Fasting Insulin.

 

 

            Fasting Glucose (Multipoint)  Fasting Insulin (Multipoint)
Chromosome  Marker     LOD                Marker     LOD
1           D1S1589    0.2194             D1S3721    0.2975
2           D2S1356    1.4999             D2S1363    0.8900
3           D3S1764    0.2745             D3S2409    0.4286
4           D4S2431    0.8487             D4S3248    0.0418
5           D5S820     2.9542             D5S1456    0.0412
6           D6S1053    0.5416             D6S2410    0.0307
7           D7S821     0.0099             D7S3056    0.3486
8           D8S2324    1.5453             D8S1145    0.0583
9           D9S1825    1.8818             D9S1118    0.2279
10          D10S1430   2.4573             D10S2470   0.0035
11          D11S1984   0.7447             D11S2371   0.4784
12          D12S395    1.8499             D12S1052   1.9843
13          D13S787    0.3205             D13S800    0.2386
14          D14S588    0.2751             D14S617    2.1823
15          D15S817    0.1142             D15S1232   1.8455
16          D16S403    0.1632             D16S403    1.0212
17          D17S1308   0.3040             D17S1290   0.1584
18          D18S843    0.9683             D18S535    0.4878
19          D19S589    1.1436             D19S245    0.4944
20          D20S481    2.5160             D20S164    0.4606
21          D21S1437   1.0592             D21S1440   0.0881
22          D22S686    2.1555             D22S683    0.2498
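The chromosome lists quoted in section II.1 can be checked directly against Table II.1. The short sketch below uses the tabulated LOD scores; the function name and the optional rounding argument are illustrative choices, not part of the North Carolina analysis. It shows that chromosome 2 for fasting glucose (LOD 1.4999) reaches the 1.5 cut-off only after rounding.

```python
# Multipoint LOD scores by chromosome, copied from Table II.1.
glucose_lod = {
    1: 0.2194, 2: 1.4999, 3: 0.2745, 4: 0.8487, 5: 2.9542, 6: 0.5416,
    7: 0.0099, 8: 1.5453, 9: 1.8818, 10: 2.4573, 11: 0.7447, 12: 1.8499,
    13: 0.3205, 14: 0.2751, 15: 0.1142, 16: 0.1632, 17: 0.3040, 18: 0.9683,
    19: 1.1436, 20: 2.5160, 21: 1.0592, 22: 2.1555,
}
insulin_lod = {
    1: 0.2975, 2: 0.8900, 3: 0.4286, 4: 0.0418, 5: 0.0412, 6: 0.0307,
    7: 0.3486, 8: 0.0583, 9: 0.2279, 10: 0.0035, 11: 0.4784, 12: 1.9843,
    13: 0.2386, 14: 2.1823, 15: 1.8455, 16: 1.0212, 17: 0.1584, 18: 0.4878,
    19: 0.4944, 20: 0.4606, 21: 0.0881, 22: 0.2498,
}

def chromosomes_at_or_above(lods, threshold=1.5, decimals=None):
    """List chromosomes whose LOD is >= threshold.

    If decimals is given, each LOD is rounded to that many decimal
    places before the comparison.
    """
    hits = []
    for chrom, lod in sorted(lods.items()):
        value = lod if decimals is None else round(lod, decimals)
        if value >= threshold:
            hits.append(chrom)
    return hits

glucose_exact = chromosomes_at_or_above(glucose_lod)
glucose_rounded = chromosomes_at_or_above(glucose_lod, decimals=1)
insulin_exact = chromosomes_at_or_above(insulin_lod)
print("glucose, unrounded:", glucose_exact)
print("glucose, rounded to 1 decimal:", glucose_rounded)
print("insulin, unrounded:", insulin_exact)
```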

 

II.2.   Replication of Linkage Analysis for Obesity (BMI)

The North Carolina group was assigned the task of replicating the genome scan on the BMI phenotype created by the Coordinating Center.  The phenotypic data was merged with the family structure data that we had previously organized for analyses on diabetes-related traits.  Again, the family structures represent larger pedigrees, rather than nuclear families.  As before, the genome scan of BMI was performed using variance components methods implemented in SOLAR.  The estimated heritability for BMI_adj (the phenotype provided by the CC) was 0.45 ± 0.06 (p < 0.001).  No regions had multipoint LOD scores greater than 1.5.  Several chromosomes had multipoint LOD scores greater than 1.0: chromosomes 2 (1.1037), 3 (1.0502), 6 (1.1836), 7 (1.2945), 8 (1.1239), 14 (1.2994), 15 (1.0877), and 21 (1.4918).  The maximum multipoint LOD score on chromosome 21 occurred at marker D21S1432.

III.  Minnesota QC Report

The Minnesota group has taken the lead in the linkage analysis of pulse pressure and is serving as the site to replicate linkage studies of PAI1 performed in North Carolina.  The pulse pressure linkage is to be replicated by the Framingham group.

III.1.   Linkage Analysis for pulse pressure (PP)

For the purposes of the FHS quality control experiment, we created a minimally adjusted PP phenotype.  PP was adjusted for age and center separately in each gender.  Linkage analysis was performed in GENEHUNTER2 using variance components analysis with no dominance component.  Using the supercomputer, we were able to set MAXBITS to 20.  This setting allowed GENEHUNTER2 to handle all but four of the families without deleting anyone.  Two chromosomes had multipoint LOD scores greater than 1.5.  The following table presents the position of the highest LOD score in each region.

Table III.1  LOD Scores for Pulse Pressure

 

Chromosome  cM distance  LOD
8           32.52        3.01
8           101.1        2.35
14          80.01        1.52

 

III.2.   Replication of Linkage Analysis for PAI1

The Minnesota group was assigned the task of replicating the genome scan on the PAI1 phenotype created by North Carolina.  The phenotypic data was merged with the family structure data that we had previously organized for analyses on pulmonary traits.  The genome scan of PAI1 was performed in GENEHUNTER2 using variance components analysis with no dominance component.  Three chromosomes had multipoint LOD scores greater than 1.5.  The following table presents the position of the highest LOD score in each region.

Table III.2  Replication LOD Scores for PAI1

 

Chromosome  cM distance  LOD
1           71.44        2.61
2           231.01       1.83
14          100.71       2.64

 

IV.  Framingham QC Report

The Framingham group has taken the lead in the linkage analysis of pulmonary function and is serving as the site to replicate linkage studies of pulse pressure performed in Minneapolis.  The pulmonary linkage studies are to be replicated by the Utah group.

IV.1.   Linkage Analysis for FEV1

For the purposes of the FHS quality control experiment, we created a minimally adjusted FEV1 phenotype.  FEV1 was adjusted for age and center separately in each gender and standardized to a mean of 0 and a standard deviation of 1.  Linkage analysis was performed in GENEHUNTER2 using variance components analysis with no QTL or polygenic dominance variance.  The default maxbits of 16 limits the number of individuals who can be analyzed in a family.  Therefore, GH2 trimmed families that were too large.  We sorted the family members by descending absolute value of the FEV1 phenotype, so that individuals with missing phenotype information were trimmed first.  Five chromosomes had multipoint LOD scores greater than 1.5.  The following table presents the position of the highest LOD score in each region.

Table IV.1  LOD Scores for FEV1

 

Chromosome  cM distance  LOD
4           84.94        1.538
5           78.84        1.994
6           128.93       1.680
10          32.2         2.212
18          28.1         1.945

 

IV.2.   Replication of Linkage Analysis for Pulse Pressure

The Framingham group was assigned the task of replicating the genome scan on the pulse pressure phenotype created by Minnesota.  The phenotypic data was merged with the family structure data that we had previously organized for analyses on pulmonary traits.  The genome scan of pulse pressure (pp) was performed in GENEHUNTER2 using variance components analysis with no QTL or polygenic dominance variance.  Family members were sorted by descending absolute values of pp, so that individuals with missing phenotype information would be the first ones trimmed by GH2 when families were too large.  Three regions on two chromosomes had multipoint LOD scores greater than 1.5.  The following table presents the position for the highest LOD score in the region.

Table IV.2  Replication LOD Scores for Pulse Pressure

 

Chromosome  cM distance  LOD
8           34.45        3.126
8           101.10       3.346
14          79.16        1.663
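One simple way to compare the primary (Table III.1) and replication (Table IV.2) pulse pressure scans is to pair peaks that fall on the same chromosome within a fixed distance of each other and then look at the difference in LOD. The sketch below does this with the tabulated values. The 10 cM matching window and the function name are assumptions made for illustration; the report does not define a formal matching rule.

```python
def match_peaks(primary, replicate, window_cm=10.0):
    """Pair each primary peak with the nearest replicate peak.

    Peaks are (chromosome, position_cM, LOD) tuples. A replicate peak
    is eligible only if it is on the same chromosome and within
    window_cm of the primary peak. Unmatched primary peaks get None
    for the replicate position and LOD.
    """
    pairs = []
    for chrom, pos, lod in primary:
        candidates = [
            (abs(pos - r_pos), r_pos, r_lod)
            for r_chrom, r_pos, r_lod in replicate
            if r_chrom == chrom and abs(pos - r_pos) <= window_cm
        ]
        if candidates:
            _, r_pos, r_lod = min(candidates)
            pairs.append((chrom, pos, lod, r_pos, r_lod))
        else:
            pairs.append((chrom, pos, lod, None, None))
    return pairs

# Values copied from Table III.1 (Minnesota) and Table IV.2 (Framingham).
minnesota = [(8, 32.52, 3.01), (8, 101.1, 2.35), (14, 80.01, 1.52)]
framingham = [(8, 34.45, 3.126), (8, 101.10, 3.346), (14, 79.16, 1.663)]

pairs = match_peaks(minnesota, framingham)
for chrom, pos, lod, r_pos, r_lod in pairs:
    if r_pos is None:
        print(f"chr {chrom} at {pos} cM: no replicate peak within window")
    else:
        print(f"chr {chrom}: {pos} cM (LOD {lod}) vs {r_pos} cM (LOD {r_lod}), "
              f"difference {r_lod - lod:+.3f}")
```

With this window, all three Minnesota peaks have a Framingham counterpart within about 2 cM, while the LOD values themselves differ by up to about 1 unit.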

 

V.  Utah QC Report

The Utah center was assigned to replicate the genome scan of a minimally adjusted FEV1 phenotype defined and analyzed at the Boston University (BU) center.  The original results from the two centers were somewhat discrepant, so we have been following up to identify the source of the discrepancy.  To be efficient, we are focusing our analyses on a 50 cM region of chromosome 4 that shows varying degrees of linkage.  In the first scans, Utah obtained a LOD score of 1.8 at 58.4 cM, where BU obtained 1.212; conversely, BU obtained a LOD score of 1.538 at 84.94 cM, where Utah obtained 0.909.

Our initial review of the data identified some differences in the locus description files being used at the centers, but reanalysis of the BU genotype/phenotype file with the Utah locus description file resulted in LOD scores only minimally different from the original BU results.  Thus, it did not appear to be simply the allele frequency specifications that were driving the discrepancy.  Second, we evaluated whether the different trait definitions had an impact on the results.  BU defined the phenotype using a standardized residual, while Utah performed a transformation on the standardized residual to make the phenotypes all positive values.  This transformation had no impact on the LOD scores.  As a check, we also ran GENEHUNTER2 on the BU machine using both Utah’s locus description files and genotype/phenotype files.  The results were identical to Utah’s results, indicating that the discrepancies were not due to differences in the way the machines ran.

We then began scrutinizing the log files from the runs using the BU data file and the Utah data file.  It soon became apparent that different pedigrees were being skipped in the two data sets and that different people within a pedigree were being trimmed.  The BU data was sorted by descending absolute value of the phenotype within a pedigree prior to running linkage.  When running GENEHUNTER2 with maxbits = 16, this sorting helps to retain as many phenotyped people in the analysis as possible, because GENEHUNTER2 trims families that are too large starting with the last members listed in the family.  Therefore, the trimming of the BU sorted data was removing mostly people with missing phenotype, while the trimming of the Utah data was removing a greater number of people with phenotype.  In addition, GENEHUNTER2 was skipping 8 large pedigrees in the Utah data, consisting of 94 phenotyped people, and 4 pedigrees in the BU data set, consisting of 26 phenotyped people.  The differences in the people being included in the analysis of the two data sets do seem to be driving the discrepant LOD scores.  Using maxbits = 20 in conjunction with the sorting reduced the number of phenotyped people trimmed, but did not allow the skipped families to be incorporated into the analysis.
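The effect of the BU sorting step can be illustrated with a small sketch. It is not GENEHUNTER2 code. The record layout, the use of None for a missing phenotype, and the `max_members` cap are hypothetical stand-ins: the real limit is set by maxbits and depends on pedigree structure rather than on a simple head count.

```python
def sort_for_trimming(members):
    """Order members by descending |phenotype|, missing phenotypes last.

    members: list of dicts with keys 'id' and 'pheno' (None if missing).
    """
    return sorted(
        members,
        key=lambda m: (
            m["pheno"] is None,  # False sorts before True
            -abs(m["pheno"]) if m["pheno"] is not None else 0.0,
        ),
    )

def trim_from_end(members, max_members):
    """Stand-in for trimming: drop members from the end of the list."""
    return members[:max_members]

def count_phenotyped(members):
    return sum(1 for m in members if m["pheno"] is not None)

# Made-up pedigree listing, not FHS data.
pedigree = [
    {"id": "A", "pheno": None},
    {"id": "B", "pheno": 0.4},
    {"id": "C", "pheno": -1.7},
    {"id": "D", "pheno": None},
    {"id": "E", "pheno": 1.2},
]

kept_unsorted = trim_from_end(pedigree, max_members=3)
kept_sorted = trim_from_end(sort_for_trimming(pedigree), max_members=3)
print("unsorted:", [m["id"] for m in kept_unsorted],
      "phenotyped kept:", count_phenotyped(kept_unsorted))
print("sorted:  ", [m["id"] for m in kept_sorted],
      "phenotyped kept:", count_phenotyped(kept_sorted))
```

In this toy listing the unsorted order keeps two phenotyped members after trimming, while the sorted order keeps all three, which is the pattern described above for the Utah and BU files.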

As a final test to evaluate the effects of the different pedigrees being excluded from the analyses of the two data sets, we fixed the families being skipped.  We replaced the 8 families being skipped in the Utah data with the corresponding 12 families (4 were split into 2) from the BU data set that were not being skipped.  In addition, the problems in the 4 families being skipped in the BU data were fixed.  Thus, the final analysis of the two data sets did not have any pedigrees being skipped, and only one phenotyped person was being skipped in each analysis.  The phenotyped person being skipped was from the same pedigree in each data set, but not the same person.  The results of analyses using data sets in which no pedigrees were skipped are much more concordant than earlier results.  The LOD score curves are parallel, with the BU LOD scores being 0.2 LOD units higher on average.  The maximum LOD scores in this 50cM region were 1.719 for BU and 1.466 for Utah at 67.58 cM.

VI.  St. Louis QC Report

The St. Louis group has taken the lead in the linkage analysis of adiposity and cardiovascular disease phenotypes, and is serving to replicate linkage studies of each of the other sites as follows:  North Carolina - PAI-1; Minneapolis - Pulse Pressure; Boston - FEV1; and Utah - Blood Pressure.

VI.1.   Linkage Analysis for BMI

BMI was analyzed using a variance components linkage analysis, as implemented in the computer program SEGPATH (Province et al. 2000).  The phenotype was adjusted for the effects of age, sex, and center, within three age groups, using all White phase II subjects.  Further adjustments on the variance also were carried out by regressing the squared residual on age and higher order terms, also within sex.  The final phenotype was standardized using the sample mean and standard deviation.  However, only the subset of subjects for whom Marshfield genotypes were assessed was used in the linkage analysis.  For computational efficiency, nuclear families were analyzed.  The IBD estimates were obtained using MAPMAKER/sibs, using the multipoint option.  The null hypothesis (that the heritability of the QTL at a locus is equal to zero) was tested allowing for residual sibling correlation and spouse correlation, which were treated as nuisance parameters.  The loci with lod > 1.5 are shown below:

Table VI.1  LOD Scores for BMI

 

 

Chrom  Marker   Location (cM)  LOD  P
1      GX12A07  152            1.8  .00178
       GY5F09   171            1.8  .00183
3      AX10H11  71             2.0  .0013
7      GX43C11  137            2.2  .00073
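The report does not state how the P column of Table VI.1 was obtained. A common convention for a variance components test of a single variance parameter on its boundary is to treat the likelihood ratio statistic, 2 ln(10) x LOD, as a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. The sketch below applies that convention, which is an assumption here; the function name is illustrative.

```python
import math

def lod_to_p(lod):
    """One-sided p-value for a LOD score under a 50:50 mixture of
    chi-square(0) and chi-square(1) for the likelihood ratio statistic.

    LRT = 2 * ln(10) * LOD, and for 1 degree of freedom
    P(chi2 > x) = erfc(sqrt(x / 2)), so p = 0.5 * erfc(sqrt(ln(10) * LOD)).
    """
    if lod <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(math.log(10) * lod))

for lod in (1.8, 2.0, 2.2):
    print(f"LOD {lod:.1f} -> p = {lod_to_p(lod):.5f}")
```

Under this convention a LOD of 2.2 gives a p-value of about 0.0007, in line with the tabulated .00073. The tabulated LOD scores are rounded to one decimal, so exact agreement for every row should not be expected.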

 

VI.2.   Replication of Linkage Analysis for Pulse Pressure

Pulse pressure was analyzed originally by the Minneapolis group using GENEHUNTER2, which accommodates intact pedigrees; pedigrees that were too large were trimmed by the program.  We used the adjusted phenotypes sent to us and variance components linkage analysis as implemented in SEGPATH was carried out in the nuclearized data, allowing for marital and residual sibling correlation.  The loci for which we obtained a lod score > 1.5 are shown below:

Table VI.2  Replication LOD Scores for Pulse Pressure

 

 

Chrom  Marker    Location (cM)  LOD
6      AX11D10   112            1.5
8      GX151F02  27             2.0
       GX72C10   37             3.4
       GX14E09   94             3.4
       GAAT1A4   110            2.2
       GX8B01    104            4.0

 

Lod scores in excess of 1.5 were found on chromosome 8 in the Minneapolis study.  From the plots, the maximum lod score of ~2.5 occurs at ~100cM, with another bimodal peak at ~20-30cM with lod just over 2.  Although the magnitudes of the lod scores differ, the analysis methods differ fundamentally and, thus, such differences are perhaps not surprising.  Nonetheless, both analyses seem to point to the same regions.

VI.3.  Replication of linkage analysis of PAI1

These data were originally analyzed by the North Carolina group using SEGPATH, which we also used.  The phenotypes sent by UNC were merged into our nuclearized data set and, surprisingly, we initially obtained different results.  After communication between the groups, we found some differences in the nuclearization of the families.  Once family structure as well as phenotypes were standardized between the analysis groups, analysis of the same data by the same program yielded identical results.  The primary finding was on chromosome 14, shown below:

Table VI.3  LOD Scores for PAI1

 

 

Chrom  Marker    Location (cM)  LOD  P
14     GY21G11   106            2.3  .00056
       GX168F06  113            2.9  .00012
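Section VI.3 attributes the initial discrepancy to differences in how the families were nuclearized. As a rough illustration of what nuclearization involves, the sketch below splits an extended pedigree into nuclear families by grouping offspring under each parent pair. The input layout and function name are hypothetical, and the report does not say which nuclearization rules the two groups used or how they differed.

```python
from collections import defaultdict

def nuclearize(pedigree):
    """Split a pedigree into nuclear families.

    pedigree: list of (person_id, father_id, mother_id) tuples, with
    None for the parents of founders (hypothetical layout).
    Returns a dict mapping (father_id, mother_id) to a sorted list of
    their offspring.
    """
    families = defaultdict(list)
    for person, father, mother in pedigree:
        if father is not None and mother is not None:
            families[(father, mother)].append(person)
    return {parents: sorted(kids) for parents, kids in families.items()}

# Made-up three-generation pedigree: 1 and 2 are founders, 3 and 4 are
# their children, 5 marries in, and 6 and 7 are children of 3 and 5.
extended = [
    (1, None, None), (2, None, None),
    (3, 1, 2), (4, 1, 2),
    (5, None, None),
    (6, 3, 5), (7, 3, 5),
]
nuclear_families = nuclearize(extended)
for parents, kids in nuclear_families.items():
    print("parents", parents, "offspring", kids)
```

Person 3 appears as an offspring in one nuclear family and as a parent in another. Choices about such individuals, and about which relative pairs are retained, are the kind of detail that can differ between groups working from the same extended pedigrees.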

 

VI.4.  Replication of linkage analysis of FEV1

The analysis of this variable was contributed by the Framingham group.  For the purposes of the FHS quality control experiment, a minimally adjusted FEV1 phenotype was created.  FEV1 was adjusted for age and center separately in each gender and standardized to a mean of 0 and a standard deviation of 1.  The Framingham linkage analysis was performed in GENEHUNTER2 using variance components analysis with no QTL or polygenic dominance variance.  The default maxbits of 16 limits the number of individuals who can be analyzed in a family, so GH2 trimmed families that were too large.  Family members were sorted by descending absolute value of the FEV1 phenotype, so that individuals with missing phenotype information were trimmed first.  Five chromosomes had multipoint LOD scores greater than 1.5 in that analysis; the position of the highest LOD score in each region is given in Table IV.1.  Again, the CC follow-up was conducted using nuclearized data (with no individuals trimmed) with SEGPATH, allowing for marital and sibling correlations.  Results for the regions in which either group obtained a lod score >1.5 are shown below:

Table VI.4  Replication LOD Scores for FEV1

 

 

            Framingham results   St. Louis results
Chromosome  cM distance  LOD     cM distance  LOD
2           -            -       145          1.9
                                 152          2.2
                                 165          1.9
4           84.94        1.538   73           1.2
5           78.84        1.994   69           2.0
                                 85           1.3
6           128.93       1.680   {same}       1.55
10          32.2         2.212   32           1.4
13          -            -       111          1.6
18          28.1         1.945   {same}       1.4

 

One problem in comparing these results is that the St. Louis analysis only computed lod scores at the positions of the markers, whereas it appears as though the linkage hypothesis was evaluated within the intervals in the Framingham analysis; thus, with the exception of two cases, the locations are not strictly comparable.  For comparison, we have presented the lod scores at the most closely flanking markers.  Even when the same locations could be compared, the lod scores were generally higher from GENEHUNTER than from SEGPATH, perhaps reflecting the differing information content between nuclearized and (virtually) intact pedigrees.  In contrast to this general pattern, at least one additional region of interest was identified by SEGPATH on chromosome 2, where, presumably, a score of <1.5 was obtained from GH.  These analyses are not wildly discrepant; however, this comparison shows the greatest variability of any of the QC comparisons with SEGPATH.

VI.5.  Replication of linkage analysis of Blood Pressure

While we were analyzing all these quantitative variables, we also examined systolic blood pressure (SBP) and diastolic blood pressure (DBP), even though the initial focus of the Utah group is on the qualitative hypertension phenotype.  The loci with lod > 1.5 are presented here for documentary purposes:

Table VI.4a  Replication LOD Scores for Systolic BP

 

 

Chrom  Marker    Location (cM)  LOD
4      M165XC11  195            1.9
5      BX2H09    139            2.9
12     AX25F09   125            2.2
       GX4H01    137            1.9
16     AX55A11   64             2.5
       GX138C05  81             1.6
       GX22F09   72             1.8
       GY3G05    58             2.0

 

Table VI.4b  Replication LOD Scores for Diastolic BP

 

 

Chrom  Marker    Location (cM)  LOD
5      GX2H09    139            2.1
7      GX5D08    109            2.1
12     AX24F09   125            2.3
       GX4H01    137            2.1
16     AX55A11   64             2.4
       GX138C05  81             2.2
       GX22F09   72             1.7
       GY3G05    58             1.8