ALSPAC Omics Data Catalogue


1. Introduction

Welcome to the ALSPAC Omics Catalogue, a guide to the omics data offered by ALSPAC. This catalogue features a variety of named ALSPAC datasets, each consisting of collected or produced data that has been organized, named, and curated for ease of use. Every named ALSPAC dataset comes with accompanying metadata that provides information about the dataset as a whole. Each named ALSPAC dataset has at least one release version that includes a curated selection of files detailed in the metadata sections.

Please note that these datasets are not generally accessible. See http://www.bristol.ac.uk/alspac/researchers/access/ for details on access.

The information within this catalogue is made available for browsing to help both internal ALSPAC users and external researchers understand the data and facilitate prospective data requests.

For external ALSPAC collaborators, we offer "freezes" of specific versions of named ALSPAC datasets as standard. These freezes, along with their metadata, are described in this catalogue. External collaborators will be granted access to these freezes on request (see http://www.bristol.ac.uk/alspac/researchers/access/). A freeze is a carefully selected subset of the data files within a version. It contains the core data from a dataset, with data from participants who have withdrawn consent removed and specific dataset IDs applied. Freezes are updated periodically.

Due to the removal of withdrawn individuals from the freezes, please note that the number of participants within each dataset may change over time and may not match those found in the Methodology fields.

Freeze 1 timings: July 2021 - Dec 2022
Freeze 2 timings: Dec 2022 - Dec 2023
Freeze 3 timings: Jan 2023 - Current

Documentation for the current freeze is provided below as a YAML file. It lists the files external collaborators will receive, together with their metadata.

Figure: NamedALSPACDataset → DatasetVersion → Freeze

The metadata presented in our catalogue adhere to the ALSPAC Data Catalogue Schema, which is written in LinkML. For the full schema documentation, please visit: https://alspac.github.io/alspac-data-catalogue-schema/

This website is annotated with RDFa. This makes the metadata machine-readable and allows it to be queried with SPARQL using compatible tools such as Apache Any23 and Apache Jena.

For more information, see the document on FAIR data principles and the document describing the rationale for and construction of this catalogue here.

2. Catalogue overview

3. Genetic Array Data

3.1. Genome-wide - Illumina 550 quad - G1 (gwa_550_g1)

3.1.1. Description

This dataset contains genome-wide array genotype calls for G1 individuals.

3.1.2. Methodology

ALSPAC children were genotyped on the Illumina HumanHap550 quad chip genotyping platform by 23andMe. 23andMe subcontracted the work to the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%); and insufficient sample replication (IBD < 0.8). Population stratification was assessed by multidimensional scaling analysis and compared with HapMap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed. SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 5E-7) were removed. Cryptic relatedness was defined as a proportion of identity by descent (IBD) > 0.1. Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. In total, 9,115 subjects and 500,527 SNPs passed these quality control filters.
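
The three SNP-level filters above can be expressed as a single predicate. The following is a minimal Python sketch and not the pipeline that was actually used. It assumes that per-SNP minor allele frequency, call rate and HWE P values have already been computed. The `SnpSummary` record and the example SNP names are hypothetical.

```python
from dataclasses import dataclass

# Thresholds quoted in the G1 methodology above.
MIN_MAF = 0.01        # remove SNPs with minor allele frequency < 1%
MIN_CALL_RATE = 0.95  # remove SNPs with call rate < 95%
MIN_HWE_P = 5e-7      # remove SNPs with Hardy-Weinberg P < 5E-7


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary statistics (not an ALSPAC file format)."""
    snp_id: str
    maf: float
    call_rate: float
    hwe_p: float


def passes_snp_qc(snp):
    """Return True if the SNP survives all three SNP-level filters."""
    return (
        snp.maf >= MIN_MAF
        and snp.call_rate >= MIN_CALL_RATE
        and snp.hwe_p >= MIN_HWE_P
    )


def filter_snps(snps):
    """Keep only the SNPs that pass QC, preserving their order."""
    return [s for s in snps if passes_snp_qc(s)]


example = [
    SnpSummary("snp_a", maf=0.30, call_rate=0.99, hwe_p=0.40),   # kept
    SnpSummary("snp_b", maf=0.005, call_rate=0.99, hwe_p=0.40),  # rare: removed
    SnpSummary("snp_c", maf=0.20, call_rate=0.90, hwe_p=0.40),   # low call rate: removed
    SnpSummary("snp_d", maf=0.20, call_rate=0.99, hwe_p=1e-8),   # HWE violation: removed
]
print([s.snp_id for s in filter_snps(example)])  # ['snp_a']
```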

3.1.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gwa_550_g1_2022-12-05_f3
name: >-
  Genome-wide array data including raw files and genotype calls
  for G1 individuals 2022-12-05 freeze 3
description: >-
  The third freeze of the genome-wide array data for G1 based on a
  2022-12-05 release. The data is in plink format.
freeze_size: 997M
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gwa_550_g1/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:gwa_550_g1_2022-12-05_f2
freeze_of_alspac_dataset_version: alspacdcs:gwa_550_g1_2022-12-05
freeze_of_named_alspac_dataset: alspacdcs:gwa_550_g1

has_containers:
  - id: alspacdcs:5b87a9bf-879b-4d26-b3e2-aab9b14a1fdb ## uuid
    name: data
    description: A dir/folder containing the freeze data files


has_parts:
  - id: alspacdcs:2fde6fb6-a1a9-454b-b0bc-51d450a80447
    name: Biallelic genotype table
    description: >-
      genotype data
    data_distributions:
      - id: alspacdcs:0dcc9a3c-7d5c-446e-a9a0-3493db443d0e
        name: freeze_id.bed
        description: >-
          Plink bed file.
          Primary representation of genotype calls at biallelic
          variants. Must be accompanied by .bim and .fam files.
        md5sum: 8ce44ce1dbf5c4d7f3299681cbf3dacf
        filesize: 982M
        filetype: .bed
        number_of_participants: 8224
        number_of_variants: 500527
        belongs_to_container: alspacdcs:5b87a9bf-879b-4d26-b3e2-aab9b14a1fdb
  - id: alspacdcs:af4a19ce-a0c0-4086-80da-da4a6865dae0
    name: Variant Information
    description: >-
      Information about SNPs
    data_distributions:
      - id: alspacdcs:f00b1310-f7f6-47c7-b46d-7082f43f542d
        name: freeze_id.bim
        description: >-
          Extended variant information file accompanying a .bed binary
          genotype table. (--make-just-bim can be used to update just
          this file.) A text file with no header line, and one line per
          variant with the following six fields:

            1. Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT';
               '0' indicates unknown) or name
            2. Variant identifier
            3. Position in morgans or centimorgans (safe to use dummy value of '0')
            4. Base-pair coordinate (1-based; limited to 2^31-2)
            5. Allele 1 (corresponding to clear bits in .bed; usually minor)
            6. Allele 2 (corresponding to set bits in .bed; usually major)
        md5sum: c7fa007331fab0e8b6ce5b78412848da
        filesize: 14M
        filetype: .bim
        number_of_variants: 500527
        belongs_to_container: alspacdcs:5b87a9bf-879b-4d26-b3e2-aab9b14a1fdb
  - id: alspacdcs:c3ac5077-d8d4-44d5-9456-3b731d23f67f
    name: sample info
    description: >-
      Sample ids
    data_distributions:
      - id: alspacdcs:c20cd22a-61ac-49a6-8adb-2b877868784f
        name: freeze_id.fam
        description: >-
          A text file with no header line, and one line per sample
          with the following six fields:
            1. Family ID ('FID')
            2. Within-family ID ('IID'; cannot be '0')
            3. Within-family ID of father ('0' if father isn't in dataset)
            4. Within-family ID of mother ('0' if mother isn't in dataset)
            5. Sex code ('1' = male, '2' = female, '0' = unknown)
            6. Phenotype value ('1' = control, '2' = case,
               '-9'/'0'/non-numeric = missing data if case/control)
        md5sum: befc659c2383222218e4e002665bdfb7
        filesize: 249k
        filetype: .fam
        number_of_participants: 8224
        belongs_to_container: alspacdcs:5b87a9bf-879b-4d26-b3e2-aab9b14a1fdb
  - id: alspacdcs:8d57fbeb-51de-48f9-a92f-92d70f936a5a
    name: Heterozygous haploid and nonmale Y chromosome call list
    description: >-
      A plink report
    data_distributions:
      - id: alspacdcs:dfc98933-69fa-4e53-99ab-55e50836ccbf
        name: freeze_id.hh
        description: >-
          Produced automatically when the input data contains
          heterozygous calls where they shouldn't be possible (haploid
          chromosomes, male X/Y), or there are nonmissing calls for
          nonmales on the Y chromosome.

          A text file with one line per error (sorted primarily by
          variant ID, secondarily by sample ID) with the following three fields:

            1. Family ID
            2. Within-family ID
            3. Variant ID
        md5sum: 2b5b4d40d9f4a18755a94efd7c9709e3
        filesize: 1.7M
        filetype: .hh
        belongs_to_container: alspacdcs:5b87a9bf-879b-4d26-b3e2-aab9b14a1fdb

  - id: alspacdcs:e32ff428-5f2d-4c04-9c29-940c2812a867
    name: Logs
    description: >-
      plink log
    data_distributions:
      - id: alspacdcs:09912f6a-dd58-4723-8f0b-ae6825d30dc4
        name: freeze_id.log
        description: >-
          plink log file
        md5sum: a54bea59148ed6b1aabfd590cea050a6
        filesize: 1.2K
        filetype: .log
        belongs_to_container: alspacdcs:5b87a9bf-879b-4d26-b3e2-aab9b14a1fdb
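
The .bim and .fam layouts described above are headerless, whitespace-separated text with six fields per line. A minimal Python sketch for reading them into records follows, based only on the field lists in the freeze docs. The record class names are hypothetical. The phenotype is kept as text because '-9', '0' or a non-numeric value can all mean missing.

```python
from dataclasses import dataclass


@dataclass
class BimRecord:
    chrom: str       # chromosome code, e.g. '1', 'X', 'MT'; '0' = unknown
    variant_id: str
    cm: float        # position in morgans/centimorgans (often a dummy 0)
    bp: int          # 1-based base-pair coordinate
    allele1: str     # clear bits in .bed; usually minor
    allele2: str     # set bits in .bed; usually major


@dataclass
class FamRecord:
    fid: str
    iid: str
    father_iid: str  # '0' if father not in dataset
    mother_iid: str  # '0' if mother not in dataset
    sex: int         # 1 = male, 2 = female, 0 = unknown
    phenotype: str   # '-9', '0' or non-numeric may mean missing


def parse_bim(lines):
    """Parse .bim lines (no header, six whitespace-separated fields)."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        chrom, vid, cm, bp, a1, a2 = line.split()
        records.append(BimRecord(chrom, vid, float(cm), int(bp), a1, a2))
    return records


def parse_fam(lines):
    """Parse .fam lines (no header, six whitespace-separated fields)."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        fid, iid, pat, mat, sex, pheno = line.split()
        records.append(FamRecord(fid, iid, pat, mat, int(sex), pheno))
    return records


# Usage (the docs write file names as freeze_id.*; substitute the names you received):
# with open("freeze_id.bim") as fh:
#     variants = parse_bim(fh)
# len(variants) should match number_of_variants in the freeze docs.
```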

3.2. Genome-wide - Illumina exome core array - G0 partners (gwa_exome_g0p)

3.2.1. Description

This dataset contains genome-wide array genotype calls for G0 mothers and partners.

3.2.2. Methodology

3,453 ALSPAC mothers and fathers were genotyped at 535,478 SNPs by the ALSPAC lab, using the Illumina HumanCoreExome chip genotyping platform, and genotypes were called using GenomeStudio. The resulting raw genome-wide data were subjected to standard quality control methods using PLINK (v1.07). Individuals were excluded on the basis of gender mismatches (n = 80); minimal or excessive heterozygosity (n = 64); disproportionate levels of individual missingness (>5%, n = 60); and possible contamination (n = 3). Population stratification was assessed by multidimensional scaling analysis compared with 1000 Genomes phase 3 data, and by principal component analysis (n = 266); all individuals with non-European ancestry were removed. Cryptic relatedness was measured as SNP relatedness in GCTA (relatedness > 0.1, n = 69 removed). SNPs were removed (n = 21,298) if they had a call rate of < 95%, showed evidence for violations of Hardy-Weinberg equilibrium (P < 1E-7), or failed GenomeStudio quality control measures. A further 6,594 duplicate SNPs were removed. This resulted in genotypes for 2,911 unrelated mothers and fathers at 507,586 SNPs. We then identified 2,217 samples for which the aln historically assigned by the lab matched the genetically assigned aln.
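
The sample and SNP counts above can be checked arithmetically. This sketch assumes that each exclusion count is disjoint, so that no individual or SNP is counted twice. The reported totals are consistent with that assumption.

```python
# Sample exclusions reported in the G0 partners methodology above.
initial_samples = 3453
sample_exclusions = {
    "gender mismatch": 80,
    "heterozygosity": 64,
    "missingness > 5%": 60,
    "possible contamination": 3,
    "non-European ancestry": 266,
    "relatedness > 0.1": 69,
}

# SNP exclusions reported in the same paragraph.
initial_snps = 535_478
snp_exclusions = {
    "call rate, HWE or GenomeStudio QC failures": 21_298,
    "duplicates": 6_594,
}

final_samples = initial_samples - sum(sample_exclusions.values())
final_snps = initial_snps - sum(snp_exclusions.values())

print(final_samples)  # 2911
print(final_snps)     # 507586
```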

3.2.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gwa_exome_g0p_2016-11-22_f3
name: Freeze 3 version 2016-11-22 Genome-wide - Illumina exome core array - G0 partners
description: >-
  Freeze 3 version 2016-11-22 Genome-wide array data including raw files and genotype calls for G0 partners, also including additional G0 mothers who were absent from previous genotyping rounds
freeze_size: 281M
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gwa_exome_g0p/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:gwa_exome_g0p_2016-11-22_f2
freeze_of_alspac_dataset_version: alspacdcs:gwa_exome_g0p_2016-11-22
freeze_of_named_alspac_dataset: alspacdcs:gwa_exome_g0p

has_containers:
  - id: alspacdcs:6c843f1c-5225-4780-8a93-58315f5e9dfe
    name: data
    description: A dir/folder containing the plink data files

has_parts:
- id: alspacdcs:c5d1dff1-3f6d-4506-aebd-c56db36e8d85
  name: freeze_id
  data_distributions:
  - id: alspacdcs:353f07d8-4345-4904-bca2-f6549612f38d
    name: freeze_id.fam
    description: >-
      A text file with no header line, and one line per sample with the following six fields:

      1. Family ID ('FID')
      2. Within-family ID ('IID'; cannot be '0')
      3. Within-family ID of father ('0' if father isn't in dataset)
      4. Within-family ID of mother ('0' if mother isn't in dataset)
      5. Sex code ('1' = male, '2' = female, '0' = unknown)
      6. Phenotype value ('1' = control, '2' = case,
      '-9'/'0'/non-numeric = missing data if case/control)

      Here we use both of the first two fields to hold the participant's
      full ID, i.e. they are not separate family and within-family IDs.
    md5sum: 7c8d1559304240941ef9a047d84299f4
    filesize: 123KB
    filetype: .fam
    number_of_participants: 2198
    belongs_to_container: alspacdcs:6c843f1c-5225-4780-8a93-58315f5e9dfe
- id: alspacdcs:c5f6f2d2-3f61-4bfa-a43c-c6e349a76607
  name: freeze_id
  data_distributions:
  - id: alspacdcs:1b737f6f-f7b5-425c-ae16-45ce9ae8796c
    name: freeze_id.bim
    description: >-
      Extended variant information file accompanying a .bed binary
      genotype table. (In plink, --make-just-bim can be used to update
      just this file.) A text file with no header line, and one line per
      variant with the following six fields:

        1. Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT';
           '0' indicates unknown) or name
        2. Variant identifier
        3. Position in morgans or centimorgans (safe to use dummy value of '0')
        4. Base-pair coordinate (1-based; limited to 2^31-2)
        5. Allele 1 (corresponding to clear bits in .bed; usually minor)
        6. Allele 2 (corresponding to set bits in .bed; usually major)

    md5sum: 0fe43f888776059fef0a76d3f08d00ad
    filesize: 14MB
    filetype: .bim
    number_of_variants: 507586
    belongs_to_container: alspacdcs:6c843f1c-5225-4780-8a93-58315f5e9dfe
- id: alspacdcs:cceed8c5-0276-4dbe-a617-7b585078caa0
  name: freeze_id
  data_distributions:
  - id: alspacdcs:40da3e0d-a450-46ee-954e-c1b70751f3d0
    name: freeze_id.bed
    description: >-
      Primary representation of genotype calls at biallelic
      variants. Must be accompanied by .bim and .fam files.

    md5sum: 304b0d356880c5174806ce08d7beffd3
    filesize: 267MB
    filetype: .bed
    number_of_participants: 2198
    number_of_variants: 507586
    belongs_to_container: alspacdcs:6c843f1c-5225-4780-8a93-58315f5e9dfe
- id: alspacdcs:19556134-925a-4753-8834-933c4c74e784
  name: freeze_id
  data_distributions:
  - id: alspacdcs:2d9618aa-32bb-4b32-b62a-1b43f785584d
    name: freeze_id.log
    md5sum: 9df3e3d178b71bbc1370e89a329ae543
    filesize: 1.2KB
    filetype: .log
    belongs_to_container: alspacdcs:6c843f1c-5225-4780-8a93-58315f5e9dfe
- id: alspacdcs:b0d09887-a085-4d7c-a0d2-c9986a72c3db
  name: freeze_id
  data_distributions:
  - id: alspacdcs:e69564ad-9e75-4eea-8779-5ecaf04cff23
    name: freeze_id.hh
    description: >-
      plink .hh file; see
      https://www.cog-genomics.org/plink/1.9/formats#hh
    md5sum: 96660f1fa14a45bda605acdfb92f2d3e
    filesize: 116K
    filetype: .hh
    belongs_to_container: alspacdcs:6c843f1c-5225-4780-8a93-58315f5e9dfe
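
Each data distribution in the freeze docs lists an `md5sum`, so you can check the files you receive against these values. The following is a minimal Python sketch using only the standard library. The checksums are copied from the 3.2 docs above. The docs write the file names as `freeze_id.*`, so substitute the names of the files you actually received. The directory layout is also an assumption.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(expected, data_dir="."):
    """Compare files in data_dir with md5sum values from the freeze docs.

    `expected` maps file name -> md5sum. Returns the names that are
    missing or whose checksum does not match.
    """
    problems = []
    for name, checksum in expected.items():
        path = Path(data_dir) / name
        if not path.is_file() or md5sum(path) != checksum:
            problems.append(name)
    return problems


# md5sum values copied from the gwa_exome_g0p freeze 3 docs.
EXPECTED = {
    "freeze_id.fam": "7c8d1559304240941ef9a047d84299f4",
    "freeze_id.bim": "0fe43f888776059fef0a76d3f08d00ad",
    "freeze_id.bed": "304b0d356880c5174806ce08d7beffd3",
}
# Usage (hypothetical directory): print(verify_files(EXPECTED, "data"))
```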

3.3. Genome-wide - Illumina 660 quad - G0 mothers (gwa_660_g0m)

3.3.1. Description

This dataset contains genome-wide array data including raw files and genotype calls for G0 mothers.

3.3.2. Methodology

ALSPAC mothers were genotyped using the Illumina Human660W-Quad array at the Centre National de Génotypage (CNG), and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs. SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally, SNPs with a minor allele frequency of less than 1% were removed. Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or had extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity-by-state pairwise distances, using the four HapMap populations as a reference, and were then excluded. Cryptic relatedness was assessed using an IBD estimate of more than 0.125. This is expected to correspond to roughly 12.5% of alleles shared IBD, i.e. relatedness at the first-cousin level. Related subjects that passed all other quality control thresholds were retained. In total, 9,048 subjects and 526,688 SNPs passed these quality control filters.
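
The 0.125 cut-off is the expected proportion of the genome shared IBD by first cousins (twice their kinship coefficient of 1/16). The Python sketch below lists standard expected values for a few relationships and flags pairs above the threshold. The input is assumed to be hypothetical `(id1, id2, pi_hat)` tuples, such as might be taken from a pairwise IBD report. Because the comparison is strict, it follows the "more than 0.125" wording above.

```python
# Expected proportion of the genome shared IBD (pi-hat = 2 x kinship)
# for some common relationships in outbred pedigrees.
EXPECTED_PI_HAT = {
    "parent-offspring": 0.5,
    "full siblings": 0.5,
    "half siblings / grandparent / avuncular": 0.25,
    "first cousins": 0.125,
    "second cousins": 0.03125,
}

IBD_THRESHOLD = 0.125  # cut-off quoted for the G0 mothers above


def flag_related_pairs(pairs, threshold=IBD_THRESHOLD):
    """Return (id1, id2) for pairs whose estimated IBD exceeds the threshold.

    `pairs` is an iterable of (id1, id2, pi_hat) tuples (hypothetical input).
    """
    return [(a, b) for a, b, pi_hat in pairs if pi_hat > threshold]


print(flag_related_pairs([("m1", "m2", 0.5), ("m3", "m4", 0.1), ("m5", "m6", 0.13)]))
# [('m1', 'm2'), ('m5', 'm6')]
```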

3.3.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gwa_660_g0m_2022-12-05_f3
name: Freeze 3 version 2022-12-05 Genome-wide - Illumina 660 quad - G0 mothers
description: >-
   Freeze 3 of genome-wide array data including genotype calls for G0 mothers
freeze_size: 2G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gwa_660_g0m/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
freeze_of_alspac_dataset_version: alspacdcs:gwa_660_g0m_2022-12-05
freeze_of_named_alspac_dataset: alspacdcs:gwa_660_g0m



has_containers:
  - id: alspacdcs:b610f7ab-8af9-4bd4-8edc-4d90cd0d2763
    name: data
    description: A dir/folder containing the plink data files

  - id: alspacdcs:ab3e1d38-8d4f-46c9-b860-cbccddecd012
    name: legacy1
    description: >-
      A dir/folder containing the plink data files. Includes the full set
      of SNPs but is missing ~500 mothers who were excluded in legacy QC
      due to strict relatedness inclusion thresholds.
    belongs_to_container: alspacdcs:b610f7ab-8af9-4bd4-8edc-4d90cd0d2763

  - id: alspacdcs:9f6244b7-b1c6-4164-9180-c996255d8de1
    name: legacy2
    description: >-
      A dir/folder containing the plink data files. Includes the full set
      of individuals but, due to legacy QC, is restricted to a set of
      ~480k SNPs that overlap with the Illumina 550k array (which was
      used for G1).
    belongs_to_container: alspacdcs:b610f7ab-8af9-4bd4-8edc-4d90cd0d2763

has_parts:
  - id: alspacdcs:39b88df9-de1c-4abd-abd2-68751b6a8e26
    name: Biallelic genotype table
    description: >-
      The genetic data
    data_distributions:
    - id: alspacdcs:e0cfc624-5e48-43d0-b31a-160ed23e9768
      name: freeze_id.bed
      description: >-
        Legacy 1 plink bed file.
        Primary representation of genotype calls at biallelic
        variants. Must be accompanied by .bim and .fam files.
        The legacy1 distribution of the plink bed file.
      md5sum: bb6389e3421f8c94994e85cf7390ae79
      filesize: 1021M
      filetype: .bed
      number_of_participants: 8123
      number_of_variants: 526688
      belongs_to_container: alspacdcs:ab3e1d38-8d4f-46c9-b860-cbccddecd012
    - id: alspacdcs:822e8560-f2f1-49fa-8b03-af745fe130ba
      name: freeze_id.bed
      description: >-
        Legacy 2 plink bed file.
        Primary representation of genotype calls at biallelic
        variants. Must be accompanied by .bim and .fam files.
        The legacy2 distribution of the plink bed file.
      md5sum: 870190e42e10c8c902f21e4f2f1cb96e
      filesize: 962M
      filetype: .bed
      number_of_variants: 465740
      number_of_participants: 8653
      belongs_to_container: alspacdcs:9f6244b7-b1c6-4164-9180-c996255d8de1
  - id: alspacdcs:58b8c0ca-ae2c-4cf6-b4f7-7cbf17a3b10f
    name: Variant Information 
    description: >-
      Information about genetic variants
    data_distributions:
    - id: alspacdcs:60b656f8-fb99-4ccd-9701-d7d896b4658d
      name: freeze_id.bim
      description: >-
        Legacy 1
        Extended variant information file accompanying a .bed binary
        genotype table. (--make-just-bim can be used to update just
        this file.) A text file with no header line, and one line per
        variant with the following six fields:

          1. Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT';
             '0' indicates unknown) or name
          2. Variant identifier
          3. Position in morgans or centimorgans (safe to use dummy value of '0')
          4. Base-pair coordinate (1-based; limited to 2^31-2)
          5. Allele 1 (corresponding to clear bits in .bed; usually minor)
          6. Allele 2 (corresponding to set bits in .bed; usually major)

      md5sum: db817272cbc16d31083e1c788f03996c
      filesize: 14M
      filetype: .bim
      number_of_variants: 526688
      belongs_to_container: alspacdcs:ab3e1d38-8d4f-46c9-b860-cbccddecd012
    - id: alspacdcs:de57fccd-d6a5-4355-8522-c283f9ca589c
      name: freeze_id.bim
      description: >-
        Legacy 2
        Extended variant information file accompanying a .bed binary
        genotype table. (--make-just-bim can be used to update just
        this file.) A text file with no header line, and one line per
        variant with the following six fields:

          1. Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT';
             '0' indicates unknown) or name
          2. Variant identifier
          3. Position in morgans or centimorgans (safe to use dummy value of '0')
          4. Base-pair coordinate (1-based; limited to 2^31-2)
          5. Allele 1 (corresponding to clear bits in .bed; usually minor)
          6. Allele 2 (corresponding to set bits in .bed; usually major)

      md5sum: 8eb8e81af2af1e06d818ca391488f210
      filesize: 13M
      filetype: .bim
      number_of_variants: 465740
      belongs_to_container: alspacdcs:9f6244b7-b1c6-4164-9180-c996255d8de1
  - id: alspacdcs:f456e79f-2255-42d8-a121-82d80293a034
    name: Sample information
    description: >-
      Information about the samples for the dataset
    data_distributions:
    - id: alspacdcs:c59e34c3-62e6-4750-8beb-665876b255ff
      name: freeze_id.fam
      description: >-
        Legacy 1

        A text file with no header line, and one line per sample with the following six fields:

        1. Family ID ('FID')
        2. Within-family ID ('IID'; cannot be '0')
        3. Within-family ID of father ('0' if father isn't in dataset)
        4. Within-family ID of mother ('0' if mother isn't in dataset)
        5. Sex code ('1' = male, '2' = female, '0' = unknown)
        6. Phenotype value ('1' = control, '2' = case,
        '-9'/'0'/non-numeric = missing data if case/control)

      md5sum: e8d10db354416efa0a3cfe60dcf7d7df
      filesize: 254K
      filetype: .fam
      number_of_participants: 8123
      belongs_to_container: alspacdcs:ab3e1d38-8d4f-46c9-b860-cbccddecd012
    - id: alspacdcs:b454c198-d15b-4420-ab75-270c0377c6eb
      name: freeze_id.fam
      description: >-
        Legacy 2

        A text file with no header line, and one line per sample with the following six fields:

        1. Family ID ('FID')
        2. Within-family ID ('IID'; cannot be '0')
        3. Within-family ID of father ('0' if father isn't in dataset)
        4. Within-family ID of mother ('0' if mother isn't in dataset)
        5. Sex code ('1' = male, '2' = female, '0' = unknown)
        6. Phenotype value ('1' = control, '2' = case,
        '-9'/'0'/non-numeric = missing data if case/control)

      md5sum: b21a5ceb9b0aa2193614ee6f45da0bfa
      filesize: 448k
      filetype: .fam
      number_of_participants: 8653
      belongs_to_container: alspacdcs:9f6244b7-b1c6-4164-9180-c996255d8de1
  - id: alspacdcs:971d0382-a395-465c-ba91-e73d5957c768
    name: Log information
    description: >-
      Information about the plink run for making the dataset
    data_distributions:
    - id: alspacdcs:d6df50aa-c36f-4af9-8224-40f0cbd44e21
      name: freeze_id.log
      description: >-
        Legacy 1 plink log file
      md5sum: de43ed5543e1cf7ec2abc351dd702190
      filesize: 1.1k
      filetype: .log
      belongs_to_container: alspacdcs:ab3e1d38-8d4f-46c9-b860-cbccddecd012
  - id: alspacdcs:2e99b905-8f7a-4679-91af-19eaa043b345
    name: Log information
    description: >-
      Information about the plink run for making the dataset
    data_distributions:
    - id: alspacdcs:fe7667c9-50ce-4385-a3f4-62a740a65336
      name: freeze_id.log
      description: >-
        Legacy 2 plink log file
      md5sum: 4a9cfccbef5ee2ce0b2ac502a8f83790
      filesize: 1.1k
      filetype: .log
      belongs_to_container: alspacdcs:9f6244b7-b1c6-4164-9180-c996255d8de1
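
The legacy1 and legacy2 folders trade off samples against SNPs. Legacy1 has the full SNP set but fewer mothers, while legacy2 has the full set of mothers but fewer SNPs. Comparing the sample IDs (.fam) or variant IDs (.bim) of the two distributions is a quick way to see what each one gains. Below is a minimal Python sketch. The paths in the usage comment are hypothetical and assume the legacy1/ and legacy2/ folders sit in the current directory.

```python
def ids_from_column(path, column):
    """Read one whitespace-separated column (0-based) from a headerless file.

    For a .fam file column 1 is the within-family ID (IID); for a .bim
    file column 1 is the variant identifier.
    """
    with open(path) as fh:
        return [line.split()[column] for line in fh if line.strip()]


def compare_distributions(legacy1_ids, legacy2_ids):
    """Summarise the overlap between two ID collections."""
    a, b = set(legacy1_ids), set(legacy2_ids)
    return {
        "shared": len(a & b),
        "only_legacy1": len(a - b),
        "only_legacy2": len(b - a),
    }


# Usage (hypothetical paths):
# samples = compare_distributions(ids_from_column("legacy1/freeze_id.fam", 1),
#                                 ids_from_column("legacy2/freeze_id.fam", 1))
# variants = compare_distributions(ids_from_column("legacy1/freeze_id.bim", 1),
#                                  ids_from_column("legacy2/freeze_id.bim", 1))
```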

3.4. Genome-wide - CNV - G1 (cnv_550_g1)

3.4.1. Description

This dataset contains CNVs predicted with PennCNV for ALSPAC G1 individuals, generated from 23andMe raw genotype data.

3.4.2. Methodology

Log R Ratio (LRR) and B Allele Frequency (BAF) data were missing from the 23andMe raw genotype data, so we generated them ourselves using an in-house algorithm. We then ran PennCNV using the hh550 libraries.

The filtered PennCNV calls were produced as follows. Multiple calls were merged using the 'clean_cnv.pl' script with a merge fraction of 0.5. Individuals with more than 30 CNVs, a Log R Ratio SD > 0.3, a BAF drift > 0.002 or a waviness factor > 0.05 were removed. CNVs for which at least 50% of the call's length overlapped telomeric, centromeric or immunoglobulin regions were removed using the 'scan_region.pl' script in PennCNV.
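
The individual-level thresholds above can be written as a single keep/remove test. The Python sketch below assumes that an individual is removed if it fails any one threshold, and that per-individual values (CNV count, LRR SD, BAF drift, waviness factor) have already been collected, for example from PennCNV's QC summary. The `SampleCnvQc` record is hypothetical and does not describe an ALSPAC file.

```python
from dataclasses import dataclass


@dataclass
class SampleCnvQc:
    """Hypothetical per-individual CNV QC summary."""
    sample_id: str
    n_cnvs: int
    lrr_sd: float
    baf_drift: float
    waviness_factor: float


def keep_sample(s):
    """True if the individual passes every quoted threshold (failing any one removes it)."""
    return (
        s.n_cnvs <= 30
        and s.lrr_sd <= 0.3
        and s.baf_drift <= 0.002
        and s.waviness_factor <= 0.05
    )


samples = [
    SampleCnvQc("s1", 12, 0.20, 0.001, 0.01),  # passes
    SampleCnvQc("s2", 45, 0.20, 0.001, 0.01),  # too many CNVs
    SampleCnvQc("s3", 12, 0.35, 0.001, 0.01),  # noisy LRR
]
print([s.sample_id for s in samples if keep_sample(s)])  # ['s1']
```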

In addition, CNVs covering fewer than 5 probes, shorter than 5 kb, or with a confidence score below 10 were removed. Probe density was calculated as the number of probes in a CNV divided by the length of the CNV, and CNVs with a density of < 1 probe per 20 kb across the call were removed.
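
The call-level filters, including the probe-density rule, can be sketched as follows. This assumes that a call failing any one threshold is removed and that 5 kb means 5,000 bp. In the CSV files, these quantities appear to correspond to the V2 (number of markers), V3 (CNV length) and V8 (confidence score) columns. Their exact text format is not documented here, so parsing is left out, and the `CnvCall` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CnvCall:
    """Hypothetical already-parsed CNV call."""
    n_probes: int
    length_bp: int
    confidence: float


def probe_density_per_20kb(call):
    """Number of probes divided by CNV length, expressed per 20,000 bp."""
    return call.n_probes * 20_000 / call.length_bp


def keep_call(call):
    """True if the call passes all quoted call-level thresholds."""
    return (
        call.n_probes >= 5
        and call.length_bp >= 5_000
        and call.confidence >= 10
        and probe_density_per_20kb(call) >= 1
    )


print(probe_density_per_20kb(CnvCall(10, 50_000, 20)))  # 4.0
print(keep_call(CnvCall(10, 50_000, 20)))   # True
print(keep_call(CnvCall(6, 200_000, 15)))   # False: 0.6 probes per 20 kb
```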

These QC parameters are suggestions only; the calls that pass them are provided in filtered.cnv. Analysts can apply their own filter parameters to the raw calls in data.cnv.

3.4.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:cnv_550_g1_2015-11-09_f3
name: Genome-wide - CNV - G1 release version 2015-11-09 freeze 3
description: >-
  This is the third freeze of the 2015-11-09 version of
  cnv_550_g1 dataset.
  It contains two csv versions of the CNV call data, the unfiltered
  and filtered versions.
freeze_size: 27M
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_cnv_550_g1/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:cnv_550_g1_2015-11-09_f2
freeze_of_alspac_dataset_version: alspacdcs:cnv_550_g1_2015-11-09
freeze_of_named_alspac_dataset: alspacdcs:cnv_550_g1
has_parts:
  - id: alspacdcs:cnv_550_g1_2015-11-09_cnvdata_f3
    name: Unfiltered CNV data
    description: >-
      This is the output of PennCNV before filtering.
      Columns:
        V1 - Position
        V2 - Number of markers in the region
        V3 - CNV length
        V4 - Copy number estimate
        V6 - Start SNP
        V7 - End SNP
        V8 - Confidence score
        qlet - within pregnancy ID
        cnv_550_g1 - Individual ID
    data_distributions:
      - id: alspacdcs:cdd6bfc2e28db5a76806aa24c73df110_new_cnvdata.csv
        name: new_cnvdata.csv
        description: >-
          This is the csv file for the output of PennCNV before filtering.
        md5sum: cdd6bfc2e28db5a76806aa24c73df110
        filesize: 21M
        filetype: .csv
        # R: data$id_qlet <- paste(data$cnv_550_g1, data$qlet, sep="_"); length(unique(data$id_qlet))
        number_of_participants: 7450
        # R (file read into `data`): dim(unique(data[1]))
        number_of_cnv_variants: 70030
        belongs_to_container: alspacdcs:bd0fb41e-f720-46a7-9ed0-04dd3e0b22bd

  - id: alspacdcs:cnv_550_g1_2015-11-09_filtered_f3
    name: Filtered CNV data
    description: >-
      CNV data that has been filtered.
      Columns:
        V1 - Position
        V2 - Number of markers in the region
        V3 - CNV length
        V4 - Copy number estimate
        V6 - Start SNP
        V7 - End SNP
        V8 - Confidence score
        qlet - within pregnancy ID
        cnv_550_g1 - Individual ID
    data_distributions:
      - id: alspacdcs:98eb9cb3bfd21eb807800de82f1e8099_new_filtered.csv
        name: new_filtered.csv
        description: >-
          This is the csv file for the output of PennCNV after filtering.
        md5sum: 98eb9cb3bfd21eb807800de82f1e8099
        filesize: 5.9M
        filetype: .csv
        # R (file read into `data2`): data2$id_qlet <- paste(data2$cnv_550_g1, data2$qlet, sep="_"); length(unique(data2$id_qlet))
        number_of_participants: 6793
        # R (file read into `data2`): length(unique(data2$V1))
        number_of_cnv_variants: 14244
        belongs_to_container: alspacdcs:bd0fb41e-f720-46a7-9ed0-04dd3e0b22bd

has_containers:
  - id: alspacdcs:bd0fb41e-f720-46a7-9ed0-04dd3e0b22bd ## uuid
    name: data
    description: A dir/folder containing the two freeze data files
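
The participant and CNV counts in these freeze docs were derived with the R one-liners recorded in the YAML comments. Below is a standard-library Python sketch of the same counts. It assumes that each CSV has a header row containing the `V1`, `qlet` and `cnv_550_g1` columns, as the R code implies. Participants are unique (cnv_550_g1, qlet) pairs, and CNVs are unique V1 values. Run on new_cnvdata.csv and new_filtered.csv, it should reproduce the documented numbers if those assumptions hold.

```python
import csv


def summarise_cnv_csv(path):
    """Return (number of participants, number of distinct CNV positions).

    Participants: unique (cnv_550_g1, qlet) combinations.
    CNVs: unique values of the V1 (position) column.
    """
    participants = set()
    positions = set()
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            participants.add((row["cnv_550_g1"], row["qlet"]))
            positions.add(row["V1"])
    return len(participants), len(positions)


# Usage: summarise_cnv_csv("new_filtered.csv")  # docs report 6793 participants, 14244 CNVs
```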

4. Imputed Data

4.1. Genome-wide - HRC imputed - G0 mothers + G1 (gi_hrc_g0m_g1)

SNP chips are useful for generating data on hundreds of thousands of SNPs, but millions more polymorphisms remain untyped with this technology. Suitable numbers of whole-genome sequences may exist (e.g. 1000 Genomes data). In that case, millions of genotypes that are missing from a sample because they were not typed by SNP chips can be imputed using probabilistic methods. Here, the ALSPAC mother and child data were imputed to a new reference panel known as the Haplotype Reference Consortium (HRC) panel. This comprises around 31,000 sequenced individuals (mostly European), so its coverage of European haplotypes is much greater than that of other panels. As a consequence, imputation accuracy is expected to improve, particularly at lower allele frequencies.
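
Probabilistic imputation gives each untyped genotype as probabilities rather than a hard call. This dataset is distributed in bgen format, which stores such probabilities. A common single-number summary is the expected allele dosage, shown in the small Python sketch below. It assumes a biallelic SNP with genotype probabilities P(AA), P(AB) and P(BB). It is an illustration only and does not describe how ALSPAC analyses use the data.

```python
def expected_dosage(p_aa, p_ab, p_bb):
    """Expected count of allele B given imputed genotype probabilities.

    dosage = 0 * P(AA) + 1 * P(AB) + 2 * P(BB), which ranges from 0 to 2.
    """
    if abs((p_aa + p_ab + p_bb) - 1.0) > 1e-6:
        raise ValueError("genotype probabilities should sum to 1")
    return p_ab + 2.0 * p_bb


print(expected_dosage(1.0, 0.0, 0.0))  # 0.0 (certain AA)
print(expected_dosage(0.0, 0.0, 1.0))  # 2.0 (certain BB)
```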

4.1.1. Description

This dataset contains genotype data imputed to HRC for G0 mothers and G1.

4.1.2. Methodology

ALSPAC children were genotyped on the Illumina HumanHap550 quad chip genotyping platform by 23andMe. 23andMe subcontracted the work to the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%); and insufficient sample replication (IBD < 0.8). Population stratification was assessed by multidimensional scaling analysis and compared with HapMap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed. SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 5E-7) were removed. Cryptic relatedness was defined as a proportion of identity by descent (IBD) > 0.1. Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. In total, 9,115 subjects and 500,527 SNPs passed these quality control filters.

ALSPAC mothers were genotyped using the Illumina Human660W-Quad array at the Centre National de Génotypage (CNG), and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs. SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally, SNPs with a minor allele frequency of less than 1% were removed. Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or had extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity-by-state pairwise distances, using the four HapMap populations as a reference, and were then excluded. Cryptic relatedness was assessed using an IBD estimate of more than 0.125. This is expected to correspond to roughly 12.5% of alleles shared IBD, i.e. relatedness at the first-cousin level. Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. In total, 9,048 subjects and 526,688 SNPs passed these quality control filters.

We combined the 477,482 SNPs genotyped in both the mothers and the children. We removed SNPs with genotype missingness above 1% due to poor quality (11,396 SNPs removed), and removed a further 321 subjects due to potential ID mismatches. This resulted in a dataset of 17,842 subjects (containing 6,305 duos) and 465,740 SNPs; 112 SNPs were removed during liftover and 234 were out of HWE after combination. We estimated haplotypes using SHAPEIT (v2.r644), which utilises relatedness during phasing. The phased haplotypes were then imputed to the Haplotype Reference Consortium (HRCr1.1, 2016) panel of approximately 31,000 phased whole genomes. The HRC panel was phased using SHAPEIT v2.r727, and the imputation was performed using the Michigan Imputation Server.
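
The combined subject and SNP totals follow from the counts quoted above. The arithmetic check below assumes that the combination started from all QC-passing subjects on both arrays and that the SNP removals do not overlap. Both assumptions are consistent with the reported totals.

```python
children_subjects = 9_115  # G1 subjects passing QC
mother_subjects = 9_048    # G0 mother subjects passing QC
id_mismatch_removed = 321

common_snps = 477_482
snp_removals = {
    "missingness > 1%": 11_396,
    "lost during liftover": 112,
    "out of HWE after combination": 234,
}

combined_subjects = children_subjects + mother_subjects - id_mismatch_removed
combined_snps = common_snps - sum(snp_removals.values())

print(combined_subjects)  # 17842
print(combined_snps)      # 465740
```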

This gave 8,237 eligible children and 8,196 eligible mothers with available genotype data after exclusion of related subjects using cryptic relatedness measures described previously.

Phasing parameters:
States: 100
Window: 2
Effective population size: 11418
Genetic map: 1000 Genomes phase 1 version 3 (release date 21/05/2011)
Burn-in iterations: 7
Pruning iterations: 8
Main iterations: 20
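
For readers reproducing the phasing step, the parameters above map onto SHAPEIT2 command-line options. The Python sketch below only assembles an argument list and does not run anything. It assumes the SHAPEIT2 option names `--input-bed`, `--input-map`, `--output-max`, `--states`, `--window`, `--effective-size`, `--burn`, `--prune` and `--main` (check them against the documentation for the SHAPEIT version you use), and the file names are hypothetical placeholders. This is not the command ALSPAC actually ran.

```python
from __future__ import annotations

import shlex

# Phasing parameters listed above; option names are assumed SHAPEIT2 flags.
PHASING_PARAMS = {
    "--states": 100,            # number of conditioning states
    "--window": 2,              # model window size (SHAPEIT2 expresses this in Mb)
    "--effective-size": 11418,  # effective population size
    "--burn": 7,                # burn-in iterations
    "--prune": 8,               # pruning iterations
    "--main": 20,               # main iterations
}


def shapeit_args(plink_prefix: str, genetic_map: str, out_prefix: str) -> list[str]:
    """Assemble a SHAPEIT2-style argument list; nothing is executed."""
    args = [
        "shapeit",
        "--input-bed", f"{plink_prefix}.bed", f"{plink_prefix}.bim", f"{plink_prefix}.fam",
        "--input-map", genetic_map,
        "--output-max", f"{out_prefix}.haps", f"{out_prefix}.sample",
    ]
    for flag, value in PHASING_PARAMS.items():
        args += [flag, str(value)]
    return args


if __name__ == "__main__":
    # Hypothetical chromosome 22 inputs; the command is printed, not run.
    print(shlex.join(shapeit_args("combined_chr22", "genetic_map_chr22.txt", "phased_chr22")))
```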

4.1.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gi_hrc_g0m_g1_2017-05-04_f3
name: >-
  Genome-wide - HRC imputed - G0 mothers + G1 version 2017-05-04
  freeze 3
description: >-
  Freeze 3 of version 2017-05-04 Genome-wide array data imputed to the HRC reference panel for G0 mothers and G1 individuals in bgen and sample file format (version 1.2). 
freeze_size: 115G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gi_hrc_g0m_g1/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:gi_hrc_g0m_g1_2017-05-04_f2.1
next_freeze:
freeze_of_alspac_dataset_version: alspacdcs:gi_hrc_g0m_g1_2017-05-04
freeze_of_named_alspac_dataset: alspacdcs:gi_hrc_g0m_g1

has_containers:
  - id: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c ## uuid
    name: data
    description: A dir/folder containing the freeze data bgen and .sample files

has_parts:
  - id: alspacdcs:0209cfa4-2362-484a-922d-022bac8f1dc9
    name: swapped_23_female
    data_distributions:
    - id: alspacdcs:6382dee1-05cc-48e7-badc-b44c7bd8cc42
      name: swapped_23_female.sample
      md5sum: c2798a33724ef0b56889f36d36420aab
      filesize: 746.1KB
      filetype: .sample
      number_of_participants: 12948
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:138dceb1-a573-48fe-86bb-96e7e6e1ec15
    name: filtered_17
    data_distributions:
    - id: alspacdcs:87c528dc-6ac8-45e2-94d0-7f620ab9d3c4
      name: filtered_17.bgen
      md5sum: a3e2c94bd25abbbb12b88b837a70b627
      filesize: 3.6GB
      filetype: .bgen
      number_of_variants: 1090072
      number_of_participants: 17450 
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:42c56ef1-eba2-418f-b20a-a8a0d65544a5
    name: filtered_11
    data_distributions:
    - id: alspacdcs:03ca58c5-d0ec-401f-9308-b6d0f8eb4107
      name: filtered_11.bgen
      md5sum: 1091def1834d9df7a1250bd9e906771b
      filesize: 5.2GB
      filetype: .bgen
      number_of_variants: 1936990
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:30076bbe-b1b1-4308-8f21-ef89e99fc649
    name: filtered_23female
    data_distributions:
    - id: alspacdcs:049f1f5e-3982-4ad5-b820-36f154cd2309
      name: filtered_23female.bgen
      md5sum: 6a2870bfad1f6dc302e30f348141c702
      filesize: 4.2GB
      filetype: .bgen
      number_of_variants: 1228035
      number_of_participants: 12948
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:12b615a9-c0dc-48b2-ad00-d0fb9555630c
    name: filtered_10
    data_distributions:
    - id: alspacdcs:7fa6070c-e7a9-4a7e-9c3a-680be67ad3cc
      name: filtered_10.bgen
      md5sum: f918d645452701e313c73a497ce0a7d3
      filesize: 5.1GB
      filetype: .bgen
      number_of_variants: 1927504
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:4e3ffa23-5232-4a62-9253-cf1c8b77a850
    name: filtered_16
    data_distributions:
    - id: alspacdcs:b0a79594-0c96-4c8f-b800-489acebeef35
      name: filtered_16.bgen
      md5sum: 06703f1d5b4ea054278c674f03c7fc99
      filesize: 4.1GB
      filetype: .bgen
      number_of_variants: 1281298
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:106c9401-9550-4d0f-b79a-58e556523876
    name: filtered_12
    data_distributions:
    - id: alspacdcs:13841df6-ed45-451e-b6a5-f33d19528c65
      name: filtered_12.bgen
      md5sum: 9cb8162b978697ad6c7ac32073cdf30a
      filesize: 5.1GB
      filetype: .bgen
      number_of_variants: 1848118
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:27eb64d7-9cfd-4a6d-9c9f-70b40e2b8fa7
    name: filtered_08
    data_distributions:
    - id: alspacdcs:72904e08-6d37-49e3-af93-3291c331779b
      name: filtered_08.bgen
      md5sum: 2bbc8cb921804dfd62c24d9b4250a179
      filesize: 5.7GB
      filetype: .bgen
      number_of_variants: 2242706
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:eae1d46d-3999-4a6f-a8fe-a5ab643f4c9b
    name: swapped
    data_distributions:
    - id: alspacdcs:ad8ac21f-60b5-4dbc-920c-e190048d0ec7
      name: swapped.sample
      md5sum: 7d4874c35dd01c388c62bc4fc3ac1409
      filesize: 1005.5KB
      filetype: .sample
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:e1fa5fe7-a9e5-4e99-a401-602b882e01ed
    name: filtered_04
    data_distributions:
    - id: alspacdcs:6b41585e-94bb-483b-aa6c-c2784db43bad
      name: filtered_04.bgen
      md5sum: a548f469f6883f4b1800a9f5af485731
      filesize: 7.9GB
      filetype: .bgen
      number_of_variants: 2787582
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:ccc93888-421b-4196-9a45-ed77094fb288
    name: filtered_23male
    data_distributions:
    - id: alspacdcs:25ce4f1a-01f1-4d1e-bf82-6e662bdb7560
      name: filtered_23male.bgen
      md5sum: 3724d6e36f9e29bfaadeaf28cf3bff19
      filesize: 1.2GB
      filetype: .bgen
      number_of_variants: 1228035
      number_of_participants: 4502
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:f82ab219-40ad-479d-a81a-b6884e62ec5f
    name: filtered_05
    data_distributions:
    - id: alspacdcs:251df0c3-924b-4bdf-b17f-d161abcf410b
      name: filtered_05.bgen
      md5sum: b2cc0285b722d34a91beae96a0336021
      filesize: 6.7GB
      filetype: .bgen
      number_of_variants: 2588170
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:5a4281e6-6a66-4d67-ab41-bb3ad8b28a70
    name: filtered_19
    data_distributions:
    - id: alspacdcs:034b00e1-b895-43df-87c5-4eb50534134d
      name: filtered_19.bgen
      md5sum: 57c941571a65f230946b6cceac9389c5
      filesize: 3.4GB
      filetype: .bgen
      number_of_variants: 868554
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:204cc6dc-3d90-4206-8311-cc0d13c426ce
    name: filtered_15
    data_distributions:
    - id: alspacdcs:45cf7a62-b9fd-4dc6-96e9-d141092fee95
      name: filtered_15.bgen
      md5sum: f667e6a2c995a6d0e34c2088c46a73aa
      filesize: 3.4GB
      filetype: .bgen
      number_of_variants: 1139215
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:50b32525-c314-45aa-8e9b-13aa334e4829
    name: filtered_06
    data_distributions:
    - id: alspacdcs:d54f5f9d-4a18-4578-b583-8ee442c232d0
      name: filtered_06.bgen
      md5sum: ca6e82024a0967dfbabef45f0ee0a36a
      filesize: 6.3GB
      filetype: .bgen
      number_of_variants: 2460112
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:6492b05b-c96e-4212-a9df-66fd8166a303
    name: filtered_07
    data_distributions:
    - id: alspacdcs:302a2770-064d-47c6-89c7-486bae0885de
      name: filtered_07.bgen
      md5sum: 6e1a041862675c915607fba05d2921d9
      filesize: 6.6GB
      filetype: .bgen
      number_of_variants: 2289306
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:190f6eca-5f94-4612-b21b-5ac29c96ed8a
    name: filtered_09
    data_distributions:
    - id: alspacdcs:1634ab85-cff8-4c22-8ffa-a7132473d6c1
      name: filtered_09.bgen
      md5sum: 95dc66c33bae79e5e0e37a165d10382c
      filesize: 4.5GB
      filetype: .bgen
      number_of_variants: 1675899
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:81eea65d-58a4-4761-8718-53d41fce694f
    name: filtered_01
    data_distributions:
    - id: alspacdcs:576d1c2c-bbd5-4ea6-8979-9ff0a1bb8d64
      name: filtered_01.bgen
      md5sum: 761d9e2114870f7d3e7dfe5c94b7ce47
      filesize: 8.6GB
      filetype: .bgen
      number_of_variants: 3069932
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:d6abd7c5-0094-4af7-9bcc-a01a43bf819c
    name: filtered_20
    data_distributions:
    - id: alspacdcs:3915759f-bb8c-4605-aca8-211381ca2ecc
      name: filtered_20.bgen
      md5sum: 7f6feb66e94a43ccc17dd9cee4246885
      filesize: 2.6GB
      filetype: .bgen
      number_of_variants: 884983
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:98612003-9a2a-41df-b5e5-52df68a5509e
    name: filtered_18
    data_distributions:
    - id: alspacdcs:685b818e-5c55-4c28-8d79-5bab676d34f4
      name: filtered_18.bgen
      md5sum: d9a36ab3d148e15190279503d17c67e6
      filesize: 3.1GB
      filetype: .bgen
      number_of_variants: 1104755
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:8ea74f9e-062c-4845-98a9-c8fc8ad4bb32
    name: filtered_03
    data_distributions:
    - id: alspacdcs:69234c86-7086-457d-9c69-71568aaef7e1
      name: filtered_03.bgen
      md5sum: 38813c0a552aa33edb2b4eabb99dae61
      filesize: 7.3GB
      filetype: .bgen
      number_of_variants: 2821895
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:669682ff-de83-463d-a6b3-77ccecbe7668
    name: swapped_23_male
    data_distributions:
    - id: alspacdcs:3621a305-27b0-4637-acdf-018ab3b95127
      name: swapped_23_male.sample
      md5sum: 9e3d7be6d73c999c3c55a0a0dbdfa468
      filesize: 259.5KB
      filetype: .sample
      number_of_participants: 4502
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:047161b7-af54-4314-986b-92001ccb0d5c
    name: filtered_21
    data_distributions:
    - id: alspacdcs:9baef6f9-86d8-4fed-be4f-051da7f0f739
      name: filtered_21.bgen
      md5sum: 222f20b8fb823008e4a66cb9df14ee25
      filesize: 1.7GB
      filetype: .bgen
      number_of_variants: 531276
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:65f75f83-ad4b-4b79-b27d-5121a15625a1
    name: filtered_13
    data_distributions:
    - id: alspacdcs:e3bdf2f7-2d7f-4404-82ee-2ff785c52d08
      name: filtered_13.bgen
      md5sum: df09412f808737ced2e275a3bec68a75
      filesize: 3.7GB
      filetype: .bgen
      number_of_variants: 1385434
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:4c00ce92-d7e7-47c6-a9a8-455fda77b5ef
    name: filtered_14
    data_distributions:
    - id: alspacdcs:f2442b3d-44b3-4478-a447-1e59b965ae48
      name: filtered_14.bgen
      md5sum: 176cb3af44e9a771d9909f4a0a7219f2
      filesize: 3.5GB
      filetype: .bgen
      number_of_variants: 1266536
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:6980c0da-8698-4a0c-bff9-b03dea47369d
    name: filtered_02
    data_distributions:
    - id: alspacdcs:763cd279-809d-45ab-855b-e2492160df9c
      name: filtered_02.bgen
      md5sum: bd3acd8e0c90cf94bd0d0e889b317591
      filesize: 8.7GB
      filetype: .bgen
      number_of_variants: 3392238
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
  - id: alspacdcs:72a3efb1-bc9d-4cea-ad47-11c602c4ab37
    name: filtered_22
    data_distributions:
    - id: alspacdcs:cd0cd547-c990-48c5-b52f-937e4f9f0678
      name: filtered_22.bgen
      md5sum: 15d8c159b8a3a01ebb16db4f915f52c5
      filesize: 1.8GB
      filetype: .bgen
      number_of_variants: 524544
      number_of_participants: 17450
      belongs_to_container: alspacdcs:f5eeb1f7-159b-4068-b876-b09d4864377c
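
The md5sum fields in the freeze description above can be used to check that each file arrived intact. This is a minimal Python sketch using the standard library's hashlib, reading in chunks because the .bgen files are several GB. The `EXPECTED_MD5` mapping copies just two entries from the YAML and would normally be filled from the whole file (for example with a YAML parser, not shown); the local directory name `data` is assumed to match the container name. The sketch also checks that the female and male chromosome 23 sample counts add up to all 17,450 participants.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Two entries copied from the freeze 3 YAML above; extend as needed.
EXPECTED_MD5 = {
    "filtered_22.bgen": "15d8c159b8a3a01ebb16db4f915f52c5",
    "swapped.sample": "7d4874c35dd01c388c62bc4fc3ac1409",
}

# Participant counts from the YAML: chromosome 23 is split by sex.
FEMALE_CHR23, MALE_CHR23, ALL_PARTICIPANTS = 12948, 4502, 17450
assert FEMALE_CHR23 + MALE_CHR23 == ALL_PARTICIPANTS


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data_dir: Path, expected: dict[str, str]) -> dict[str, bool]:
    """Map each expected file name to True if present with a matching checksum."""
    results = {}
    for name, md5 in expected.items():
        path = data_dir / name
        results[name] = path.is_file() and md5_of(path) == md5
    return results


if __name__ == "__main__":
    for name, ok in verify(Path("data"), EXPECTED_MD5).items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```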

4.2. Genome-wide - HapMap2 imputed - G1 (gi_hapmap2_g1)

4.2.1. Description

This dataset contains genotype data imputed to HapMap 2 for G1.

4.2.2. Methodology

A total of 9912 subjects were genotyped using the Illumina HumanHap550 quad genome-wide SNP genotyping platform by 23andMe, subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK, and the Laboratory Corporation of America, Burlington, NC, USA. Individuals were excluded from further analysis on the basis of incorrect gender assignment; minimal or excessive heterozygosity (<0.320 and >0.345 for the Sanger data and <0.310 and >0.330 for the LabCorp data); disproportionate levels of individual missingness (>3%); evidence of cryptic relatedness (>10% IBD); and non-European ancestry (as detected by a multidimensional scaling analysis seeded with HapMap 2 individuals). EIGENSTRAT analysis revealed no additional obvious population stratification, and genome-wide analyses with other phenotypes indicate a low lambda. The resulting data set consisted of 8365 individuals (84% of those genotyped). SNPs with a minor allele frequency of <1% or a call rate of <95% were removed. Furthermore, only SNPs that passed an exact test of Hardy-Weinberg equilibrium (P > 5 × 10⁻⁷) were considered for analysis. Genotypes were subsequently imputed with MACH 1.0.16 Markov Chain Haplotyping software, using CEPH individuals from phase 2 of the HapMap project (release 22) as a reference set.
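
The exclusions above use different heterozygosity bounds for the two genotyping centres. This Python sketch encodes those rules, assuming heterozygosity is measured as the proportion of heterozygous autosomal calls per individual; the function names are hypothetical. It also reproduces the reported retention rate.

```python
# Sample-level heterozygosity bounds by genotyping centre (from the text).
HET_BOUNDS = {
    "Sanger": (0.320, 0.345),
    "LabCorp": (0.310, 0.330),
}
MAX_INDIVIDUAL_MISSINGNESS = 0.03  # exclude individuals with > 3% missingness

# SNP-level thresholds (from the text).
MIN_MAF = 0.01          # remove SNPs with MAF < 1%
MIN_CALL_RATE = 0.95    # remove SNPs with call rate < 95%
HWE_P_THRESHOLD = 5e-7  # keep only SNPs with HWE P > 5e-7


def passes_het(centre: str, heterozygosity: float) -> bool:
    """True if heterozygosity lies within the bounds for that centre."""
    low, high = HET_BOUNDS[centre]
    return low <= heterozygosity <= high


def passes_missingness(missing_rate: float) -> bool:
    return missing_rate <= MAX_INDIVIDUAL_MISSINGNESS


def snp_passes(maf: float, call_rate: float, hwe_p: float) -> bool:
    return maf >= MIN_MAF and call_rate >= MIN_CALL_RATE and hwe_p > HWE_P_THRESHOLD


# Reported counts: 8365 of 9912 genotyped individuals were retained.
retained_fraction = 8365 / 9912
print(f"{retained_fraction:.0%}")  # prints 84%
```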

Associated publication: https://doi.org/10.1093/hmg/ddr309

4.2.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gi_hapmap2_g1_2022-12-07_f3
name: Genome-wide - HapMap2 imputed - G1 version 2022-12-07 freeze 3
description: >-
  Freeze 3 of 2022-12-07 version of Genome-wide array data imputed to the HapMap2 reference panel for G1 individuals

freeze_size: 5G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gi_hapmap2_g1/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:gi_hapmap2_g1_2022-12-07_f2
next_freeze:
freeze_of_alspac_dataset_version: alspacdcs:gi_hapmap2_g1_2022-12-07
freeze_of_named_alspac_dataset: alspacdcs:gi_hapmap2_g1


has_containers:

  - id: alspacdcs:63f74523-9ddd-4cc7-9037-d166bd1edba9
    name: data
    description: A dir/folder containing the plink freeze data files



has_parts:
  - id: alspacdcs:1219c7c8-6a5b-482a-bdfc-2f26a4df5885
    name: freeze_id
    data_distributions:
    - id: alspacdcs:60f60918-c307-4350-a88e-1138c360a72b
      name: freeze_id.fam
      md5sum: 0b42e5ffecb0fef6ed4702d1932eb424
      filesize: 274KB
      filetype: .fam
      number_of_participants: 8224
      belongs_to_container: alspacdcs:63f74523-9ddd-4cc7-9037-d166bd1edba9
  - id: alspacdcs:dd090263-6041-48f4-a28c-af81ff709614
    name: freeze_id
    data_distributions:
    - id: alspacdcs:25a9b8c3-0426-4cfb-b916-d6807c36fc50
      name: freeze_id.bim
      md5sum: 854d50582220c70ae5645b1a1c799af1
      filesize: 68MB
      filetype: .bim
      number_of_variants: 2543887
      belongs_to_container: alspacdcs:63f74523-9ddd-4cc7-9037-d166bd1edba9
  - id: alspacdcs:8b10969b-0e48-4c27-8e9d-efc812042839
    name: freeze_id
    data_distributions:
    - id: alspacdcs:b2436f3c-fbd7-4470-ba44-dac809c767a3
      name: freeze_id.bed
      md5sum: 90f19b52657a4fff8b301efbd87ea057
      filesize: 4.9GB
      filetype: .bed
      number_of_variants: 2543887
      number_of_participants: 8224
      belongs_to_container: alspacdcs:63f74523-9ddd-4cc7-9037-d166bd1edba9
  - id: alspacdcs:260ae08f-7077-4fa6-b59d-e1249afddbcf
    name: freeze_id
    data_distributions:
    - id: alspacdcs:3313fbd3-cd49-4c77-a1c6-5cc48d17fdbb
      name: freeze_id.log
      md5sum: 01f4f9e9e96f38d23fc50f386fd9a081
      filesize: 958B
      filetype: .log
      belongs_to_container: alspacdcs:63f74523-9ddd-4cc7-9037-d166bd1edba9
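
This freeze is a single PLINK binary fileset (freeze_id.bed/.bim/.fam). A .fam file has one line per individual and a .bim file one line per variant, so the counts in the YAML can be checked by counting lines. In SNP-major mode, a PLINK 1 .bed file is 3 header bytes followed by ceil(N/4) bytes per variant, which for 8224 individuals and 2,543,887 variants gives about 4.9 GB, consistent with the listed size. The Python sketch below assumes the files sit in a local `data` directory named after the container; the function names are hypothetical.

```python
from __future__ import annotations

import math
from pathlib import Path

EXPECTED_PARTICIPANTS = 8224    # from freeze_id.fam / freeze_id.bed above
EXPECTED_VARIANTS = 2_543_887   # from freeze_id.bim / freeze_id.bed above


def count_lines(path: Path) -> int:
    """Count lines without loading the whole file into memory."""
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def expected_bed_size(n_samples: int, n_variants: int) -> int:
    """Bytes in a SNP-major PLINK 1 .bed file: 3 magic bytes, then each
    variant packs 4 genotypes per byte (2 bits each)."""
    return 3 + n_variants * math.ceil(n_samples / 4)


def check_fileset(prefix: Path) -> dict[str, bool]:
    bed, bim, fam = (prefix.with_suffix(ext) for ext in (".bed", ".bim", ".fam"))
    n_samples = count_lines(fam)
    n_variants = count_lines(bim)
    return {
        "fam": n_samples == EXPECTED_PARTICIPANTS,
        "bim": n_variants == EXPECTED_VARIANTS,
        "bed": bed.stat().st_size == expected_bed_size(n_samples, n_variants),
    }


if __name__ == "__main__":
    print(check_fileset(Path("data") / "freeze_id"))
```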

4.3. Genome-wide - HapMap2 imputed - G0 mothers (gi_hapmap2_g0m)

4.3.1. Description

This dataset contains genotype data imputed to HapMap 2 for G0 mothers.

4.3.2. Methodology

A total of 10 015 women (mothers from the ALSPAC cohort) were genotyped using the Illumina 660 quad SNP chip, which contains 557 124 SNP markers. Markers with minor allele frequency < 1%, SNPs with > 5% missing genotypes and any markers that failed an exact test of Hardy-Weinberg equilibrium (P < 1 × 10⁻⁶) were excluded from further analyses. Genome-wide identity-by-state (IBS) sharing was calculated for each pair of individuals in the cohort to identify cryptic relatedness. To identify individuals who might have ancestries other than Western European, we merged data from both cohorts with the 60 western European (CEU) founders, 60 Nigerian (YRI) founders and 90 Japanese (JPT) and Han Chinese (CHB) individuals from the International HapMap Project. Genome-wide IBS distances for each pair of individuals were calculated on markers shared between HapMap and the Illumina 660K SNP chip, and the multidimensional scaling option in R was then used to generate a two-dimensional plot based on individuals' scores on the first two principal coordinates from this analysis. Samples that did not cluster with the CEU individuals were excluded from subsequent analyses. In addition, we plotted the proportion of missing data for each individual against their genome-wide heterozygosity, and any individual who did not cluster with the others was removed from further analyses. Samples were also excluded in the case of excessive missingness (> 5%) or unusual genome-wide or X chromosome heterozygosity, and one individual from each pair of putatively related individuals (genome-wide IBD > 10%) was removed. After data cleaning, 8340 individuals and 526 688 SNPs were left in the genome-wide data set.
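
The text does not say which member of each related pair was dropped. A common approach, shown here purely as an assumption, is a greedy pass that repeatedly removes the individual involved in the most remaining related pairs. In practice the pairwise estimates might come from the PI_HAT column of PLINK's --genome output, which this Python sketch does not parse; `prune_related` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter

IBD_THRESHOLD = 0.10  # "genome-wide IBD > 10%" from the text


def prune_related(pairs: dict[tuple[str, str], float]) -> set[str]:
    """Return IDs to remove so that no remaining pair exceeds the threshold.

    Greedy heuristic (an assumption, not the documented ALSPAC rule):
    repeatedly drop the individual appearing in the most related pairs,
    breaking ties by ID so the result is deterministic.
    """
    related = [pair for pair, ibd in pairs.items() if ibd > IBD_THRESHOLD]
    removed: set[str] = set()
    while related:
        counts = Counter(iid for pair in related for iid in pair)
        worst = min(counts, key=lambda iid: (-counts[iid], iid))
        removed.add(worst)
        related = [pair for pair in related if worst not in pair]
    return removed
```

Removing the most-connected individual first tends to drop fewer people than removing one member of each pair independently, which matters when one participant is related to several others.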

We then conducted imputation using the MACH Markov Chain Haplotyping software with CEU individuals from phase 2 of the HapMap project (release 22) as a reference set. The final imputed data set consisted of 8340 individuals, each with 2 594 390 imputed markers. Only imputed genotypes with minor allele frequency ≥ 1% and R-sqr ≥ 0.3 were considered for association. Of the 8340 mothers with genetic data, 2874 also had phenotype data available.
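
The post-imputation filter (MAF ≥ 1% and R-sqr ≥ 0.3) is typically applied to the per-variant information table written alongside the imputed dosages. This Python sketch assumes a whitespace-delimited file whose header contains `SNP`, `MAF` and `Rsq` columns, as in MACH .mlinfo output; check the header of your own file before relying on these names. The example rows are made up.

```python
from __future__ import annotations

import io

MIN_MAF = 0.01  # minor allele frequency >= 1%
MIN_RSQ = 0.3   # imputation R-sqr >= 0.3


def variants_passing(info_handle) -> list[str]:
    """Return SNP IDs with MAF >= 1% and Rsq >= 0.3 from a MACH-style info table."""
    header = info_handle.readline().split()
    snp_col, maf_col, rsq_col = (header.index(name) for name in ("SNP", "MAF", "Rsq"))
    kept = []
    for line in info_handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if float(fields[maf_col]) >= MIN_MAF and float(fields[rsq_col]) >= MIN_RSQ:
            kept.append(fields[snp_col])
    return kept


if __name__ == "__main__":
    # Made-up rows for illustration only.
    example = io.StringIO(
        "SNP Al1 Al2 Freq1 MAF Quality Rsq\n"
        "rs1 A G 0.70 0.30 0.98 0.95\n"
        "rs2 C T 0.995 0.005 0.99 0.90\n"
        "rs3 A C 0.60 0.40 0.70 0.20\n"
    )
    print(variants_passing(example))  # ['rs1']
```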

Associated publication: https://doi.org/10.1093/hmg/ddt239

4.3.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gi_hapmap2_g0m_2022-12-07_f3
name: Genome-wide - HapMap2 imputed - G0 mothers version 2022-12-07 freeze 3
description: >-
  Version 2022-12-07 freeze 3 of Genome-wide array data imputed to the HapMap2 reference panel for G0 mothers.
  The number of variants & individuals within each plink file set can be viewed within the log file.
freeze_size: 4.9G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gi_hapmap2_g0m/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:gi_hapmap2_g0m_2022-12-07_f2
freeze_of_alspac_dataset_version: alspacdcs:gi_hapmap2_g0m_2022-12-07
freeze_of_named_alspac_dataset: alspacdcs:gi_hapmap2_g0m


has_containers:
  - id: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6 ## uuid
    name: plink
    description: A dir/folder containing the plink freeze data files. There are 8123 individuals within this dataset. 


has_parts:
  - id: alspacdcs:2a0c29b0-c0d3-4b42-a1af-ca050ffc4c69
    name: freeze_id_chr7
    data_distributions:
    - id: alspacdcs:a338bb73-c267-4744-87ec-445574c38a7b
      name: freeze_id_chr7.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:cf60a483-992a-4288-a209-7bb8e475c232
    name: freeze_id_chr16
    data_distributions:
    - id: alspacdcs:1e666615-f5f3-48e8-844a-947deb910273
      name: freeze_id_chr16.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:eee7a728-b4c1-4dba-8895-92945ee1d5da
    name: freeze_id_chr6
    data_distributions:
    - id: alspacdcs:c77a7f1f-f262-4c69-b5be-875817dc987f
      name: freeze_id_chr6.log
      md5sum: 1f402fdbae9972ac3de162d17e07686f
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:f1a9a04d-cffc-4ec6-8bde-c7dbd52d11db
    name: freeze_id_chr20
    data_distributions:
    - id: alspacdcs:c6d97d9a-7ac8-49a4-b2bb-433de0913204
      name: freeze_id_chr20.log
      md5sum: 75a0270192c017296773f508784bf94a
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:d0212c8a-b2cd-42d7-9023-9d379588e522
    name: freeze_id_chr5
    data_distributions:
    - id: alspacdcs:941aaf89-6286-455e-b980-a557d2a3d940
      name: freeze_id_chr5.log
      md5sum: 9f55daab5fb6e209e7276744e454a9e7
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c559f04b-256f-4157-ba60-acb657915425
    name: freeze_id_chr5
    data_distributions:
    - id: alspacdcs:56d2a337-61a1-497b-a05a-bbf455e3c6ae
      name: freeze_id_chr5.bim
      md5sum: a54c2542a07c18c0303c29b4f34a107e
      filesize: 4.4MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:0e403787-f35f-4d35-a40d-baf78c4ef900
    name: freeze_id_chr20
    data_distributions:
    - id: alspacdcs:dc1dd187-987d-4fc4-94e8-14d91446f22b
      name: freeze_id_chr20.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:6568b5c1-83e7-4c2f-8b05-3aad674a6cc2
    name: freeze_id_chr6
    data_distributions:
    - id: alspacdcs:eddd9880-45f2-44c8-a9ae-720adf2995b8
      name: freeze_id_chr6.bed
      md5sum: 684a7aed0e2b84bed80fd59175adb083
      filesize: 353.3MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:3042183d-897a-434b-939f-7fcda032740e
    name: freeze_id_chr21
    data_distributions:
    - id: alspacdcs:efd7e847-bd17-4f16-a25b-eee3ad71574a
      name: freeze_id_chr21.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:2ad71f36-75c6-4862-a6e2-b1405fbffec6
    name: freeze_id_chr15
    data_distributions:
    - id: alspacdcs:b6e9d45e-b3a6-42bb-b93e-961b6332b74b
      name: freeze_id_chr15.bim
      md5sum: 5a6ef9aa0087da88eef574bc16d2a359
      filesize: 1.9MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:54626420-1dda-48a7-a817-eba0ce443baa
    name: freeze_id_chr1
    data_distributions:
    - id: alspacdcs:b14f3adb-fca2-4b2f-b86f-f256ac90498c
      name: freeze_id_chr1.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:1a28b5e2-f9bf-4a18-90e2-b21e42847d73
    name: freeze_id_chr3
    data_distributions:
    - id: alspacdcs:9203dd2c-1a3e-453b-bb75-9aeea5f6fff0
      name: freeze_id_chr3.bed
      md5sum: 3c516f3dc65bef87665fa1d4a9293a2a
      filesize: 337.7MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:aa843393-d65c-42b0-a668-ae82db2bfa5a
    name: freeze_id_chr1
    data_distributions:
    - id: alspacdcs:c4e968a3-098e-4c30-af50-2a6c135516b4
      name: freeze_id_chr1.bed
      md5sum: 281cd2702a11d797a9465d63d80a3fc9
      filesize: 374.9MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:1fa3e3a1-d145-4424-adce-439a23e39f23
    name: freeze_id_chr18
    data_distributions:
    - id: alspacdcs:fe71b45e-244e-4dc3-8e76-2c843a9e41fc
      name: freeze_id_chr18.bim
      md5sum: b2c3323ec24277ecff0b221f6d2795e3
      filesize: 2.1MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:5e3b344a-9a0b-4920-80da-c228b7ea9e69
    name: freeze_id_chr12
    data_distributions:
    - id: alspacdcs:ab775cd6-55ad-42b6-a6e4-b56b51f90ecd
      name: freeze_id_chr12.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:d9fd5026-873b-4338-9392-00d550549aad
    name: freeze_id_chr4
    data_distributions:
    - id: alspacdcs:03fab819-6746-44e4-9a8e-b2a0765eb485
      name: freeze_id_chr4.bed
      md5sum: e3d259ed4e229f0b0d82933e1e3e9c00
      filesize: 316.0MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:be3078e0-da76-40f9-99ee-20d87dfc9a1b
    name: freeze_id_chr12
    data_distributions:
    - id: alspacdcs:f7f50212-6814-431c-9839-41aa2adc3e70
      name: freeze_id_chr12.bed
      md5sum: d5472422c20fb1bf4a4dd7d54e55ef72
      filesize: 241.8MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:eaaf914b-2f14-4b45-9d9e-34fd676365e0
    name: freeze_id_chr2
    data_distributions:
    - id: alspacdcs:f5d7d497-eb03-42c7-a976-528e384b9325
      name: freeze_id_chr2.log
      md5sum: 0a49fbf307da5c8d72504241c9604685
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:d1a2426b-701a-47d4-b294-70c7983ab615
    name: freeze_id_chr2
    data_distributions:
    - id: alspacdcs:2f6be232-c686-405d-b94d-e28536bc2bbe
      name: freeze_id_chr2.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c6645029-416d-4773-97fd-d4cf5b8efb75
    name: freeze_id_chr18
    data_distributions:
    - id: alspacdcs:db741e70-cde2-41e9-b03c-f5fe07b43403
      name: freeze_id_chr18.log
      md5sum: 37dc1ac70a199c45688c54a50dbfe62a
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:f80d1aae-a683-474e-a817-4a10bcb1e403
    name: freeze_id_chr20
    data_distributions:
    - id: alspacdcs:6ef7c246-cad3-440b-ab48-6ac5576d6071
      name: freeze_id_chr20.bed
      md5sum: bff6128829dcce99eaa87ce643b53894
      filesize: 122.8MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:801b6dc1-acf3-4531-a343-23ae06f84fac
    name: freeze_id_chr4
    data_distributions:
    - id: alspacdcs:fd01c3e0-5fcc-4aae-8396-90e950896735
      name: freeze_id_chr4.log
      md5sum: c13619d1640920bff07d3c7a59c971d3
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:2689a053-ef8e-4c06-b6b1-61105f4026fe
    name: freeze_id_chr10
    data_distributions:
    - id: alspacdcs:360e890a-738d-40f6-bbb0-2012c2125ac9
      name: freeze_id_chr10.log
      md5sum: c4ccca8e64f12bfcf15a7e70be1a3f06
      filesize: 994.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c74b56df-5025-4a2e-86d5-66d21fd23ffe
    name: freeze_id_chr14
    data_distributions:
    - id: alspacdcs:884070a5-a2d1-4ec1-8bef-e7e5a79567d8
      name: freeze_id_chr14.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:646350e7-af4f-4521-86dc-4efb78d78b39
    name: freeze_id_chr17
    data_distributions:
    - id: alspacdcs:df2cd9cd-bf28-499a-9de2-26f5676e6e4e
      name: freeze_id_chr17.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:456a3a2c-5452-4147-8a44-d8ce22b2cced
    name: freeze_id_chr8
    data_distributions:
    - id: alspacdcs:a376176f-c1d6-4ce1-895d-54a1b3a2c12a
      name: freeze_id_chr8.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:bcc01530-fe95-4c6f-98c1-768a0f9345bf
    name: freeze_id_chr9
    data_distributions:
    - id: alspacdcs:f216feb5-a5ed-4496-9333-74ea7bdf2f9f
      name: freeze_id_chr9.bed
      md5sum: a20184b5477ec3292a7d321404191751
      filesize: 236.5MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:68b6e7cc-1655-4858-87ea-16a8c1afbe94
    name: freeze_id_chr6
    data_distributions:
    - id: alspacdcs:269b45c4-98f5-43d5-abe8-5dd7bf9a1183
      name: freeze_id_chr6.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:f8b35cf8-ccd4-42ad-937c-f69109cfd02d
    name: freeze_id_chr12
    data_distributions:
    - id: alspacdcs:49c411bc-2bb5-4d82-8b42-a625660e1695
      name: freeze_id_chr12.log
      md5sum: a9905ea972946b7522b872a8103df342
      filesize: 994.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:91755239-994d-4458-92f2-cd306ffd4278
    name: freeze_id_chr4
    data_distributions:
    - id: alspacdcs:3d9d591e-aff0-4156-bdf9-6446d48d66af
      name: freeze_id_chr4.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:631dc001-27ef-49ff-94c2-918003a62bc9
    name: freeze_id_chr19
    data_distributions:
    - id: alspacdcs:a933b3eb-a440-4d39-bd5f-ddefbbfa219e
      name: freeze_id_chr19.bim
      md5sum: eb33f90e6c4f631aa9683c2fff5e75f7
      filesize: 1012.3KB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:3ed7b96a-da6e-4f83-abc3-6470482eff59
    name: freeze_id_chr20
    data_distributions:
    - id: alspacdcs:3f0a17a7-52d7-4125-bd48-1b84ff359d53
      name: freeze_id_chr20.bim
      md5sum: 2839fc89dc4e7a0b88107ed8c7d848b5
      filesize: 1.7MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:fb01e5fb-2135-41d4-a81b-fb7e9528ff46
    name: freeze_id_chr21
    data_distributions:
    - id: alspacdcs:52fd64f9-d143-4c05-b528-406b7bfbb14c
      name: freeze_id_chr21.bim
      md5sum: b9b6bcad51c47f94a6b42c9b564c48b3
      filesize: 924.7KB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:488d784f-9ad8-4d35-8066-7b9d219bc1a9
    name: freeze_id_chr13
    data_distributions:
    - id: alspacdcs:eb0c1deb-d2b8-4117-9d72-99560c96e9f1
      name: freeze_id_chr13.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:a26378b3-d3b4-4310-bc08-9daa0ba284f8
    name: freeze_id_chr15
    data_distributions:
    - id: alspacdcs:9d991a05-1e1f-4e5b-b4fd-937c8ed462b9
      name: freeze_id_chr15.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:443a8575-72af-46c3-9c88-5aa01799ac92
    name: freeze_id_chr11
    data_distributions:
    - id: alspacdcs:3b0092d3-8515-4a5a-a750-9a6725d959f8
      name: freeze_id_chr11.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:fdb290af-3470-4bb7-83a2-1df1a812517e
    name: freeze_id_chr14
    data_distributions:
    - id: alspacdcs:ef876dc3-698b-4a5b-a9d2-526644b5d615
      name: freeze_id_chr14.bim
      md5sum: e6098ee196c0fb7b5db26cbd31bcb096
      filesize: 2.3MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:e4b2ec7a-0fee-47ea-b592-61edc0af2eb5
    name: freeze_id_chr2
    data_distributions:
    - id: alspacdcs:590d3a80-352e-47f8-84b9-11958e5364e4
      name: freeze_id_chr2.bed
      md5sum: d6de77dc17d5eb8892b203171d0ced48
      filesize: 427.7MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:e431119d-30b7-46c7-baff-991d30503ffd
    name: freeze_id_chr18
    data_distributions:
    - id: alspacdcs:018f21f5-5f99-4b78-bd94-3dda5dae643b
      name: freeze_id_chr18.bed
      md5sum: 79a3cf5cccd16362910c770e40effa1a
      filesize: 148.8MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:35415f32-a6f8-47d6-bd17-4f9069e28d42
    name: freeze_id_chr7
    data_distributions:
    - id: alspacdcs:cd3b229f-8873-437f-aeb6-a9fc01f79950
      name: freeze_id_chr7.bim
      md5sum: 54756145070ef0d80ba126d0c5ae8ea6
      filesize: 3.8MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:75d323f3-bdf0-468b-9196-c1c55de89b58
    name: freeze_id_chr15
    data_distributions:
    - id: alspacdcs:2c39ea83-48c7-400a-85c9-5b83ec92f9fb
      name: freeze_id_chr15.log
      md5sum: 33ff3129151bfaa0d98d82e628d082e4
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c58f6371-6d98-4f2b-9ea8-1b4346d582ca
    name: freeze_id_chr21
    data_distributions:
    - id: alspacdcs:86375ecf-b414-4392-8ab6-fcd6afbab8d9
      name: freeze_id_chr21.bed
      md5sum: 9e6b8a4df4b7e9f0877f65d1d8de87b9
      filesize: 65.6MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:606de011-edb1-43f1-baa1-54f24048479f
    name: freeze_id_chr4
    data_distributions:
    - id: alspacdcs:7fd75972-cbea-4994-b21c-91dc7d5dffa0
      name: freeze_id_chr4.bim
      md5sum: a2604d5a956bd4cbbdab49202aa69eea
      filesize: 4.3MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:dbf2ba9c-18c6-4097-8c05-583ebb47c7cb
    name: freeze_id_chr17
    data_distributions:
    - id: alspacdcs:12864476-402f-4d29-9fb2-bdcb6f4942a7
      name: freeze_id_chr17.log
      md5sum: cdfc7db390ca5dd8352e2a529bbd3e00
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:5bc41cec-50db-4cf0-ba86-ce685812062d
    name: freeze_id_chr14
    data_distributions:
    - id: alspacdcs:5567581e-b298-4cb9-8433-3ee9c104987f
      name: freeze_id_chr14.bed
      md5sum: 13d314d1b415f6ed09b498bbc722bfd0
      filesize: 162.6MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:bda525de-9e7b-41d3-bca1-34f7788591ff
    name: freeze_id_chr3
    data_distributions:
    - id: alspacdcs:bff82c67-4c01-447d-a41d-b05803940923
      name: freeze_id_chr3.log
      md5sum: d0ea2d5c24cdf426e2c78123ac4574ea
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:272ae0a1-b6bb-4f2c-ba34-23119e2a5275
    name: freeze_id_chr10
    data_distributions:
    - id: alspacdcs:c952d88a-e7e3-4f85-a656-ca7977b67101
      name: freeze_id_chr10.bim
      md5sum: e775a7afd2a2eed2608f2b65b2d829b9
      filesize: 3.8MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:bc80fdc6-cbcb-4a67-a203-04db70d1d527
    name: freeze_id_chr11
    data_distributions:
    - id: alspacdcs:eeaf7bc7-e792-4e8f-957c-c0766a66a08c
      name: freeze_id_chr11.bim
      md5sum: 313af6755f961e49ae6de817c365d509
      filesize: 3.5MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c9bcb40a-20ce-4e47-90e7-18879d64f9b1
    name: freeze_id_chr9
    data_distributions:
    - id: alspacdcs:9b876769-7afe-465f-a0ae-ea11800e9be7
      name: freeze_id_chr9.log
      md5sum: bde38daafb9615032be46b6968cd116b
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:d9df8cc9-910d-45a3-9db3-5d4d632de1f9
    name: freeze_id_chr16
    data_distributions:
    - id: alspacdcs:a0ed7465-65e5-4966-9625-7a3da114b8a0
      name: freeze_id_chr16.bed
      md5sum: 450e93cf17c7306ca1bfc9a5b7d4bbcb
      filesize: 138.6MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:0cdc4c6f-8e65-481c-8d43-7da09480dd04
    name: freeze_id_chr13
    data_distributions:
    - id: alspacdcs:d49044e2-1053-4d53-baec-8ccca789f4cc
      name: freeze_id_chr13.bed
      md5sum: ce0b1edef716d842cc4dadcbea3fbe2f
      filesize: 201.7MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:5792fa9d-fb9b-4f40-9fd2-0aa8f1a1a452
    name: freeze_id_chr16
    data_distributions:
    - id: alspacdcs:6999289c-b783-4f0d-8e0b-6984bdd04787
      name: freeze_id_chr16.bim
      md5sum: b16a743ffcb5b7914b17b20edc7b9d2a
      filesize: 1.9MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c34459eb-d69f-4b62-a452-9d00ef6609f1
    name: freeze_id_chr2
    data_distributions:
    - id: alspacdcs:071dfc83-36ea-4796-9a23-d00d6a2e53ad
      name: freeze_id_chr2.bim
      md5sum: 3462f2499de7d0a050821ce398b54d1a
      filesize: 5.9MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:8cb108f2-8391-4fde-8b5e-83a89c0240a5
    name: freeze_id_chr1
    data_distributions:
    - id: alspacdcs:4c8483bd-e02e-4dbb-bcc7-0185a70bfb36
      name: freeze_id_chr1.log
      md5sum: 6cc071f7b7ff5f12fbcf6eea6dd73c43
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c5aa8d14-feb6-4a7d-914e-7b0181635a90
    name: freeze_id_chr22
    data_distributions:
    - id: alspacdcs:4e2840e7-d233-40dc-bd87-29041a07e130
      name: freeze_id_chr22.bed
      md5sum: 456cb4e948d412c835857055f441e1e8
      filesize: 65.5MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:867c75d3-2575-4a54-ab64-651ff91f1521
    name: freeze_id_chr17
    data_distributions:
    - id: alspacdcs:db449126-ab5c-4458-a175-0784f5941f37
      name: freeze_id_chr17.bim
      md5sum: fa35dae8d4fb5433460cc8d42deddb6e
      filesize: 1.6MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:3038426d-3f94-4957-bdc9-c96788c39157
    name: freeze_id_chr18
    data_distributions:
    - id: alspacdcs:98c73740-bc04-4814-a409-aba44758509f
      name: freeze_id_chr18.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:d56d3675-a4bd-489a-a837-b080d8300fc2
    name: freeze_id_chr14
    data_distributions:
    - id: alspacdcs:707295b8-cd15-4685-b876-74a69e1cee23
      name: freeze_id_chr14.log
      md5sum: c54920b54bd39996480f01e6121ac944
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:bf515d14-b521-4570-96f2-da4d53c49c88
    name: freeze_id_chr9
    data_distributions:
    - id: alspacdcs:51848fd3-3315-41ce-a7ed-de89c26c1480
      name: freeze_id_chr9.bim
      md5sum: 729b7f9b759d4b97fb32cad65887b498
      filesize: 3.2MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:bd53b8a3-bdeb-4530-a855-5eb3802cea7d
    name: freeze_id_chr1
    data_distributions:
    - id: alspacdcs:d42ce2ab-e264-4604-85e7-a4c9bdfa443d
      name: freeze_id_chr1.bim
      md5sum: 443f081673deb28df6fd408bf1accb50
      filesize: 5.1MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c7b4cd49-2719-4e34-a507-14d5b7f5b4b9
    name: freeze_id_chr6
    data_distributions:
    - id: alspacdcs:36b0f6b0-f6f5-4cdc-a2d6-0fdeb005b786
      name: freeze_id_chr6.bim
      md5sum: 079964fd3600112a6241f47a0fa778ce
      filesize: 4.8MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:17f253d0-9d0e-4de7-91bd-236f169a32f2
    name: freeze_id_chr7
    data_distributions:
    - id: alspacdcs:437e339b-691d-4b22-a2ee-fbf7ed2fe32f
      name: freeze_id_chr7.log
      md5sum: 2b418ae4df815b3cb34cdd13fdb6b0fb
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:363ab6c5-2c5f-4e8b-aa8b-d20044a26e06
    name: freeze_id_chr11
    data_distributions:
    - id: alspacdcs:71d4d592-0282-4db0-9037-b23137f1db5c
      name: freeze_id_chr11.bed
      md5sum: 5736da3f7d53daaa4f74f9f0fd8baf40
      filesize: 251.9MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:197f336f-d902-4bf1-ae63-d40308fa8e23
    name: freeze_id_chr12
    data_distributions:
    - id: alspacdcs:1a11a64b-c328-4ae8-8a4b-68ffcb30597e
      name: freeze_id_chr12.bim
      md5sum: 656b255438b8175b1739abf77e171526
      filesize: 3.4MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:7f9a7384-e4eb-47e3-9e82-4857ed172383
    name: freeze_id_chr7
    data_distributions:
    - id: alspacdcs:634bc271-7760-4813-9633-aefc9207a047
      name: freeze_id_chr7.bed
      md5sum: da1d1397bdb574aa88e32401065f80c5
      filesize: 277.4MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:6b31c029-3d6b-4f87-953d-f90e8b6d6d09
    name: freeze_id_chr22
    data_distributions:
    - id: alspacdcs:a0afed4e-dbc6-41f6-bcb6-86462a0fa4e5
      name: freeze_id_chr22.log
      md5sum: 6de55fc23e840f7c2d3e3f4b13296d6e
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:474b2704-2e9a-4ff8-b1a2-b60be16363fe
    name: freeze_id_chr3
    data_distributions:
    - id: alspacdcs:a089aee0-b41d-4668-b22a-559631bb2402
      name: freeze_id_chr3.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:aac0c218-a214-49d9-ba58-eb4311b62e98
    name: freeze_id_chr16
    data_distributions:
    - id: alspacdcs:3a59e242-c2d9-4844-8e0c-b38e226acaa5
      name: freeze_id_chr16.log
      md5sum: 7146b2b9e7a3d5ab8a3e2c73d23e0f4a
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:c01959e8-02a4-494b-9cb2-a8efc4860007
    name: freeze_id_chr9
    data_distributions:
    - id: alspacdcs:2465cb98-ccea-42a1-bcc0-672e1fa9e767
      name: freeze_id_chr9.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:8fdeeceb-a829-44a7-bec2-07d7ada330d3
    name: freeze_id_chr3
    data_distributions:
    - id: alspacdcs:6cf89d0d-b954-4ae2-bce7-7df2e833c392
      name: freeze_id_chr3.bim
      md5sum: 3232f0e3bcf4f18841a75e96ca89dec5
      filesize: 4.6MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:11752567-c3bb-41e2-84f5-1bfe4857290c
    name: freeze_id_chr10
    data_distributions:
    - id: alspacdcs:4abd22b9-fedf-422b-80e8-ceed311a222b
      name: freeze_id_chr10.bed
      md5sum: 27c2dd1b9770772584ef2480c8a68e04
      filesize: 268.1MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:2055447a-93cb-43db-ae2d-24e02e87daa3
    name: freeze_id_chr8
    data_distributions:
    - id: alspacdcs:45792020-f1bf-430d-bec7-0e6790e246f1
      name: freeze_id_chr8.log
      md5sum: cb9f7054a4b8d577de31cf6b91ff11bf
      filesize: 988.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:51c93059-2dfd-4264-8721-b6df496f438c
    name: freeze_id_chr19
    data_distributions:
    - id: alspacdcs:f9bae995-2699-4e0f-9d17-3c192c133759
      name: freeze_id_chr19.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:38c6cb09-02f5-4fe0-abd6-4da5b4df0bce
    name: freeze_id_chr19
    data_distributions:
    - id: alspacdcs:985aaf6f-c3bb-4fbd-a07b-e91f9a3dfbf0
      name: freeze_id_chr19.log
      md5sum: 7a40cbff0a1c6db3fdba28f03a3d43a5
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:5fc77dfb-4515-4a4e-aa68-25ef98407f4d
    name: freeze_id_chr15
    data_distributions:
    - id: alspacdcs:42ba278c-bba0-4287-9ec3-cdb5e38e6464
      name: freeze_id_chr15.bed
      md5sum: 8562fa62c12180b23c101cb361b773cf
      filesize: 140.0MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:b47e36f2-9137-44d5-a2d5-224d373a9186
    name: freeze_id_chr19
    data_distributions:
    - id: alspacdcs:98a3e062-3233-4c09-9520-9ae3f3df7c85
      name: freeze_id_chr19.bed
      md5sum: 29af6d46019438d57c718cb36c71a2b2
      filesize: 71.8MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:4678a09c-3488-4d3c-9f87-787cd2cdbc3d
    name: freeze_id_chr10
    data_distributions:
    - id: alspacdcs:a88879b6-b183-469b-a554-21029bed8da5
      name: freeze_id_chr10.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:7d789eff-f9cd-443a-886f-7a6b0ff2dd4f
    name: freeze_id_chr13
    data_distributions:
    - id: alspacdcs:62277cf1-c229-45af-9675-d66c66ff61a3
      name: freeze_id_chr13.log
      md5sum: ed83e0b365b8955f7e5045f9553b9b28
      filesize: 994.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:94fcb145-3bba-4c7a-b115-1c3d245c5f14
    name: freeze_id_chr22
    data_distributions:
    - id: alspacdcs:3a4bfa2b-71d7-49e2-aa4f-a24863d61e1b
      name: freeze_id_chr22.bim
      md5sum: 86a1da3366ba87e62f561dc09f64f9ac
      filesize: 920.9KB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:aa5a1f7c-8a60-4bd8-b12b-d01efc0134dd
    name: freeze_id_chr8
    data_distributions:
    - id: alspacdcs:b5cb6681-aa68-4102-ae82-b568ed04e1e7
      name: freeze_id_chr8.bim
      md5sum: 72ecdd13e1c56de521d86b684218172d
      filesize: 3.9MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:3f7c5f16-2b9a-467e-ba02-5ef0092c712b
    name: freeze_id_chr17
    data_distributions:
    - id: alspacdcs:e7ab3b7a-0dcd-4778-a2da-7d52707b0dc9
      name: freeze_id_chr17.bed
      md5sum: 8860b99a8dceae07e3254f4f882f64de
      filesize: 113.2MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:430cf27a-376d-4082-9d86-b60beda1e1e6
    name: freeze_id_chr22
    data_distributions:
    - id: alspacdcs:df8edbb4-dfb4-4eb8-b6d9-5aded6c7e6c3
      name: freeze_id_chr22.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:1c4e7e0d-cae4-49c4-9b35-0bd22dc78d1b
    name: freeze_id_chr13
    data_distributions:
    - id: alspacdcs:7a42734f-674d-418f-a225-3a90c0083679
      name: freeze_id_chr13.bim
      md5sum: 59c47611e6fc92f7561366c3dcd6e3fd
      filesize: 2.8MB
      filetype: .bim
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:5af6660f-84dc-4a4c-adf0-54763b36b0d2
    name: freeze_id_chr21
    data_distributions:
    - id: alspacdcs:4091dfef-1578-46c4-b5f5-070c3a9e8318
      name: freeze_id_chr21.log
      md5sum: 4da56042ef5ef9f3be98805a49920350
      filesize: 992.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:78192dcf-2817-4bf1-83e2-d23d4ee74e95
    name: freeze_id_chr11
    data_distributions:
    - id: alspacdcs:9b763e0b-c2f7-44a7-9edf-072506c43a7a
      name: freeze_id_chr11.log
      md5sum: d236624b3f13fe0404baf76b862b6b17
      filesize: 994.0B
      filetype: .log
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:d955659d-27ae-47a0-9268-d3770d313d39
    name: freeze_id_chr8
    data_distributions:
    - id: alspacdcs:13c01f7b-9d87-4b52-b5e7-4ec49464b41e
      name: freeze_id_chr8.bed
      md5sum: 8794a1ea76d763ff1742b65bc3f94289
      filesize: 285.7MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:009c3430-ae51-43d7-93cf-ffe5dc0f4f25
    name: freeze_id_chr5
    data_distributions:
    - id: alspacdcs:9ced704b-db10-47fa-9462-2c48a79099c3
      name: freeze_id_chr5.bed
      md5sum: c4ec69fa754d7687661d56055a891f15
      filesize: 325.7MB
      filetype: .bed
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
  - id: alspacdcs:003b9ab6-afce-4f9d-8a65-35f502f5581a
    name: freeze_id_chr5
    data_distributions:
    - id: alspacdcs:b38a50e6-344a-4b83-95a8-83be59e2fb99
      name: freeze_id_chr5.fam
      md5sum: 67185256052f381f1dc8c8eb3c1b18d2
      filesize: 277.6KB
      filetype: .fam
      belongs_to_container: alspacdcs:dabc73ca-1d45-40bb-b0ed-11fab248ddf6
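
Each entry above records an `md5sum` for the delivered file. A minimal sketch for checking a downloaded file against its catalogue checksum, assuming the file is available locally. The function names and the local path are illustrative only and are not part of any ALSPAC tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    large .bed files never need to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_catalogue(path, expected_md5):
    """Compare a local file's MD5 with the md5sum listed in the freeze YAML."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; the expected value is the md5sum listed
    # above for freeze_id_chr8.fam.
    fam = Path("freeze_id_chr8.fam")
    if fam.exists():
        ok = matches_catalogue(fam, "67185256052f381f1dc8c8eb3c1b18d2")
        print(f"{fam}: {'OK' if ok else 'MISMATCH'}")
```

Every `.fam` file in the listing shares the same md5sum, so the per-chromosome PLINK filesets all carry an identical sample list.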

4.4. Genome-wide - 1000G imputed - G0 partners (gi_1000g_g0p)

4.4.1. Description

This dataset contains genome-wide array data imputed to the 1000 Genomes reference panel for G0 partners, with some additional G0 mothers and G1 individuals. The data have been cleaned, flipped to the positive strand and imputed to 1000 Genomes phase 1 version 3. Positions are given in b37 (GRCh37) coordinates.
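Flipping to the positive strand means replacing each allele of a SNP reported on the minus strand with its complementary base. A minimal sketch, assuming alleles are single-character A/C/G/T codes. The function name is illustrative and is not the pipeline ALSPAC used.

```python
from typing import Tuple

# Watson-Crick complement of each nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_alleles(a1: str, a2: str) -> Tuple[str, str]:
    """Re-express the two alleles of a minus-strand SNP on the positive strand."""
    return COMPLEMENT[a1.upper()], COMPLEMENT[a2.upper()]


if __name__ == "__main__":
    # A SNP reported as A/G on the minus strand is T/C on the positive strand.
    print(flip_alleles("A", "G"))  # ('T', 'C')
```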

4.4.2. Methodology

3,453 ALSPAC mothers and fathers were genotyped at 535,478 SNPs on the Illumina HumanCoreExome chip by the ALSPAC lab, and genotypes were called using GenomeStudio. The raw genome-wide data were then subjected to standard quality control using PLINK (v1.07). Individuals were excluded on the basis of gender mismatches (n = 80), minimal or excessive heterozygosity (n = 64), disproportionate levels of individual missingness (>5%, n = 60) and possible contamination (n = 3). Population stratification was assessed by multidimensional scaling analysis (compared with 1000 Genomes phase 3 data) and by principal component analysis; all individuals of non-European ancestry were removed (n = 266). Cryptic relatedness was measured as SNP relatedness in GCTA, and individuals with relatedness > 0.1 were removed (n = 69). SNPs with a call rate < 95%, evidence of violations of Hardy-Weinberg equilibrium (P < 1E-7), or failing GenomeStudio quality control measures were removed (n = 21,298), as were 6,594 duplicate SNPs. This left 2,911 unrelated mothers and fathers genotyped at 507,586 SNPs.
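The exclusion counts above can be checked by simple subtraction. A minimal worked check using only the figures stated in this section, assuming each individual or SNP is counted under exactly one exclusion reason:

```python
# Sample and SNP counts as reported in the methodology above.
# Assumption: each exclusion is counted once (no overlap between reasons).
GENOTYPED_SAMPLES = 3453
SAMPLE_EXCLUSIONS = {
    "gender mismatch": 80,
    "heterozygosity": 64,
    "missingness > 5%": 60,
    "possible contamination": 3,
    "non-European ancestry": 266,
    "cryptic relatedness > 0.1": 69,
}

GENOTYPED_SNPS = 535478
SNP_EXCLUSIONS = {
    "call rate / HWE / GenomeStudio QC": 21298,
    "duplicate SNPs": 6594,
}

samples_after_qc = GENOTYPED_SAMPLES - sum(SAMPLE_EXCLUSIONS.values())
snps_after_qc = GENOTYPED_SNPS - sum(SNP_EXCLUSIONS.values())

print(samples_after_qc, snps_after_qc)  # 2911 507586
```

Both totals agree with the 2,911 individuals and 507,586 SNPs reported after QC.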

Phasing was carried out in SHAPEIT v2.r837 on the 3,074 samples that passed QC, which still included related individuals. We then removed 155,336 monomorphic SNPs, 1,033 markers not present in 1000 Genomes, 11,842 A/T or G/C (strand-ambiguous) SNPs and 10 duplicate sites, leaving 337,732 SNPs on chromosomes 1-23. Of the 329,363 markers on chromosomes 1-22, 298,742 overlapped the reference genome. The data were imputed to 1000 Genomes phase 1 version 3 using the Michigan Imputation Server. We then identified 2,217 samples where the aln assigned historically by the lab matched the genetically assigned aln, and removed 12 subjects who have withdrawn consent and 6 subjects genotyped in an earlier work package, giving 2,201 subjects.
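Complementing an A/T or G/C SNP yields the same allele pair, so its strand cannot be resolved from the alleles alone; such SNPs are therefore commonly removed before imputation. A minimal sketch of the monomorphic and strand-ambiguous filters over PLINK `.bim`-style records, assuming PLINK's convention of writing `0` for an unobserved allele. The function names are illustrative, not the tooling ALSPAC used, and the 1000 Genomes membership and duplicate-site filters are omitted because they need the reference panel.

```python
from typing import Iterable, List, Tuple

# One PLINK .bim-style record: (chromosome, variant_id, cM, bp_position, allele1, allele2)
BimRow = Tuple[str, str, float, int, str, str]

# Allele pairs whose complement is the same pair (A/T and C/G).
PALINDROMIC = {frozenset("AT"), frozenset("CG")}

# PLINK codes the X chromosome as 23, so 1-23 covers the autosomes plus X.
CHROMOSOMES_1_TO_23 = {str(c) for c in range(1, 24)}


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """True for A/T and G/C SNPs, whose strand cannot be inferred from alleles."""
    return frozenset((a1.upper(), a2.upper())) in PALINDROMIC


def is_monomorphic(a1: str, a2: str) -> bool:
    """True when only one allele is present (PLINK writes '0' for the missing one)."""
    return a1 == "0" or a2 == "0" or a1.upper() == a2.upper()


def keep_for_imputation(rows: Iterable[BimRow]) -> List[BimRow]:
    """Drop monomorphic and strand-ambiguous SNPs and anything outside chromosomes 1-23."""
    kept = []
    for row in rows:
        chrom, _, _, _, a1, a2 = row
        if chrom not in CHROMOSOMES_1_TO_23:
            continue
        if is_monomorphic(a1, a2) or is_strand_ambiguous(a1, a2):
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    example = [
        ("1", "rs_a", 0.0, 100, "A", "G"),   # kept
        ("1", "rs_b", 0.0, 200, "A", "T"),   # strand-ambiguous
        ("2", "rs_c", 0.0, 300, "0", "C"),   # monomorphic
        ("25", "rs_d", 0.0, 400, "A", "C"),  # outside chromosomes 1-23
    ]
    print([row[1] for row in keep_for_imputation(example)])  # ['rs_a']
```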

4.4.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gi_1000g_g0p_2016-11-22_f3
name: Genome-wide - 1000G imputed - G0 partners version 2016-11-22 freeze 3
description: >-
  This dataset is the third freeze of the 2016-11-22 version of the genome-wide array data imputed to the 1000 Genomes reference panel
  for G0 partners, with some additional G0 mothers and G1 individuals.

freeze_size: 44G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gi_1000g_g0p/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:gi_1000g_g0p_2016-11-22_f2
next_freeze:
freeze_of_alspac_dataset_version: alspacdcs:gi_1000g_g0p_2016-11-22
freeze_of_named_alspac_dataset: alspacdcs:gi_1000g_g0p



has_containers:
  - id: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc
    name: data
    description: A directory containing the .bgen data files and the .sample file

has_parts:
  - id: alspacdcs:gi_1000g_g0p_2016-11-22_sample_f3
    name: Samples
    description: >-
      The samples in the data. To be used with the genetic data.
    data_distributions:
      - id: alspacdcs:ec597142ca1f0ace0a33f476c7bf68eb_swapped.sample
        name: swapped.sample
        description: >-
          A plain text .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: ec597142ca1f0ace0a33f476c7bf68eb
        filesize: 165k
        filetype: .sample
        number_of_participants: 2198
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc



  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr1_f3
    name: Chr1
    description: Data for Chr1
    data_distributions:
      - id: alspacdcs:a5eb049e4df5a8b005ae51b47947d830_filtered_data_chr01.bgen
        name: filtered_data_chr01.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr1. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: a5eb049e4df5a8b005ae51b47947d830
        filesize: 3.4G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 2159337
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr2_f3
    name: Chr2
    description: Data for Chr2
    data_distributions:
      - id: alspacdcs:e297c8d30455053d23ac360bcc886bb0_filtered_data_chr02.bgen
        name: filtered_data_chr02.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr2. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: e297c8d30455053d23ac360bcc886bb0
        filesize: 3.6G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 2349883
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr3_f3
    name: Chr3
    description: Data for Chr3
    data_distributions:
      - id: alspacdcs:c0b55e9d65c219ffb1b8c58a0ebb7c18_filtered_data_chr03.bgen
        name: filtered_data_chr03.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr3. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: c0b55e9d65c219ffb1b8c58a0ebb7c18
        filesize: 3.0G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1969275
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr4_f3
    name: Chr4
    description: Data for Chr4
    data_distributions:
      - id: alspacdcs:514f09f02c74fc3eca83379e9e99c5dc_filtered_data_chr04.bgen
        name: filtered_data_chr04.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr4. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: 514f09f02c74fc3eca83379e9e99c5dc
        filesize: 3.1G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1969883
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr5_f3
    name: Chr5
    description: Data for Chr5
    data_distributions:
      - id: alspacdcs:f4accbf5bdd6a2ccc9598e9e2221915d_filtered_data_chr05.bgen
        name: filtered_data_chr05.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr5. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: f4accbf5bdd6a2ccc9598e9e2221915d
        filesize: 2.8G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1809961
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr6_f3
    name: Chr6
    description: Data for Chr6
    data_distributions:
      - id: alspacdcs:a9327ad1591fdf7d349b066544e71c3a_filtered_data_chr06.bgen
        name: filtered_data_chr06.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr6. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: a9327ad1591fdf7d349b066544e71c3a
        filesize: 2.6G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1758025
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr7_f3
    name: Chr7
    description: Data for Chr7
    data_distributions:
      - id: alspacdcs:f832922558eddcf3feed87091c2ec0ae_filtered_data_chr07.bgen
        name: filtered_data_chr07.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr7. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: f832922558eddcf3feed87091c2ec0ae
        filesize: 2.7G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1601293
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr8_f3
    name: Chr8
    description: Data for Chr8
    data_distributions:
      - id: alspacdcs:47d79712e676a0048f90858cbb888179_filtered_data_chr08.bgen
        name: filtered_data_chr08.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr8. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: 47d79712e676a0048f90858cbb888179
        filesize: 2.4G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1558902
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr9_f3
    name: Chr9
    description: Data for Chr9
    data_distributions:
      - id: alspacdcs:82a480f3e8792db2c1cec3adc50e1357_filtered_data_chr09.bgen
        name: filtered_data_chr09.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr9. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: 82a480f3e8792db2c1cec3adc50e1357
        filesize: 1.9G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1189463
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr10_f3
    name: Chr10
    description: Data for Chr10
    data_distributions:
      - id: alspacdcs:8f64fe184e4c876a345a728ed5eeddcf_filtered_data_chr10.bgen
        name: filtered_data_chr10.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr10. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: 8f64fe184e4c876a345a728ed5eeddcf
        filesize: 2.2G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1363104
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr11_f3
    name: Chr11
    description: Data for Chr11
    data_distributions:
      - id: alspacdcs:b1b7e3bef0fe72cd90bd0ba456f687aa_filtered_data_chr11.bgen
        name: filtered_data_chr11.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr11. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: b1b7e3bef0fe72cd90bd0ba456f687aa
        filesize: 2.2G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1359640
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr12_f3
    name: Chr12
    description: Data for Chr12
    data_distributions:
      - id: alspacdcs:509202db22200fe0bd58210ab8e9c757_filtered_data_chr12.bgen
        name: filtered_data_chr12.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr12. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: 509202db22200fe0bd58210ab8e9c757
        filesize: 2.1G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 1316510
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr13_f3
    name: Chr13
    description: Data for Chr13
    data_distributions:
      - id: alspacdcs:176a10d38ab80783a8e392e5791edea7_filtered_data_chr13.bgen
        name: filtered_data_chr13.bgen
        description: >-
          An Oxford Bgen file (bgen v1.2) for Chr13. To be used with the .sample file.
          See https://doi.org/10.1101/308296 for file format details.
        md5sum: 176a10d38ab80783a8e392e5791edea7
        filesize: 1.6G
        filetype: .bgen
        number_of_participants: 2198
        number_of_variants: 988473
        belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr14_f3
    name: Chr14
    description: Data for Chr14
    data_distributions:
      - id: alspacdcs:1ecd96aab2925bafd7d20497d85dd937_filtered_data_chr14.bgen
	name: filtered_data_chr14.bgen
	description: >- 
	  An Oxford Bgen file for Chr14. To be used with

	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	   (bgen v1.2)         
	md5sum: 1ecd96aab2925bafd7d20497d85dd937
	filesize: 1.5G
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 903811
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc        

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr15_f3 
    name: Chr15
    description: Data for Chr15
    data_distributions:
      - id: alspacdcs:f8c5b54206189808e9a361cc0da63798_filtered_data_chr15.bgen
	name: filtered_data_chr15.bgen
	description: >- 
	  An Oxford Bgen file for Chr15. To be used with

	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: f8c5b54206189808e9a361cc0da63798
	filesize: 1.4G
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 814028
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc        

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr16_f3 
    name: Chr16
    description: Data for Chr16
    data_distributions:
      - id: alspacdcs:52f065575d3cb2dff34df6763a583766_filtered_data_chr16.bgen
	name: filtered_data_chr16.bgen
	description: >- 
	  An Oxford Bgen file for Chr16. To be used with

	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	   (bgen v1.2)         
	md5sum: 52f065575d3cb2dff34df6763a583766
	filesize: 1.6G
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 867901
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr17_f3  
    name: Chr17
    description: Data for Chr17
    data_distributions:
      - id: alspacdcs:73d85caf67dcedc63b11a43bd5ccb44d_filtered_data_chr17.bgen
	name: filtered_data_chr17.bgen
	description: >- 
	  An Oxford Bgen file for Chr17. To be used with
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 73d85caf67dcedc63b11a43bd5ccb44d
	filesize: 1.4G
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 755467
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc        

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr18_f3  
    name: Chr18
    description: Data for Chr18
    data_distributions:
      - id: alspacdcs:b8e055a6c0955bb67161c9f7a1d8cad7_filtered_data_chr18.bgen
	name: filtered_data_chr18.bgen
	description: >- 
	  An Oxford Bgen file for Chr18. To be used with

	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: b8e055a6c0955bb67161c9f7a1d8cad7
	filesize: 1.4G
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 783661
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc        

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr19_f3  
    name: Chr19
    description: Data for Chr19
    data_distributions:
      - id: alspacdcs:37ea045cd9f4027cba547b7b89c3a1a0_filtered_data_chr19.bgen
	name: filtered_data_chr19.bgen
	description: >- 
	  An Oxford Bgen file for Chr19. To be used with
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 37ea045cd9f4027cba547b7b89c3a1a0
	filesize: 1.3G
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 606147
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc        

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr20_f3
    name: Chr20
    description: Data for Chr20
    data_distributions:
      - id: alspacdcs:d241eb21be3188c26c460e1f65f0d8c1_filtered_data_chr20.bgen
	name: filtered_data_chr20.bgen
	description: >- 
	  An Oxford Bgen file for Chr20. To be used with
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: d241eb21be3188c26c460e1f65f0d8c1
	filesize: 1.1G
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 618749
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr21_f3
    name: Chr21
    description: Data for Chr21
    data_distributions:
      - id: alspacdcs:7881bdc24e7f0adbfb800b49d1efd590_filtered_data_chr21.bgen
	name: filtered_data_chr21.bgen
	description: >- 
	  An Oxford Bgen file for Chr21. To be used with

	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 7881bdc24e7f0adbfb800b49d1efd590
	filesize: 672M
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 378064
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc        

  - id: alspacdcs:gi_1000g_g0p_2016-11-22_chr22_f3
    name: Chr22
    description: Data for Chr22
    data_distributions:
      - id: alspacdcs:824412e963441699f260c6245f65659d_filtered_data_chr22.bgen
	name: filtered_data_chr22.bgen
	description: >- 
	  An Oxford Bgen file for Chr22. To be used with

	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)
	md5sum: 824412e963441699f260c6245f65659d
	filesize: 722M
	filetype: .bgen
	number_of_participants: 2198
	number_of_variants: 366590
	belongs_to_container: alspacdcs:70b53764-4ed1-4e46-9188-a38d356279dc        
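Each file in the freeze listing above has an `md5sum` field, which can be used to confirm that a download completed without corruption. The sketch below is one way to check a set of files against those values in Python. The helper names (`md5sum`, `verify_files`) and the `data_dir` layout are assumptions for illustration. GNU `md5sum` on the command line does the same job.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte .bgen files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(expected, data_dir="."):
    """Compare files on disk against a {filename: md5sum} mapping.

    Returns a dict mapping each filename to "ok", "missing" or "mismatch".
    """
    results = {}
    for name, expected_md5 in expected.items():
        path = Path(data_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5sum(path) != expected_md5.lower():
            results[name] = "mismatch"
        else:
            results[name] = "ok"
    return results


if __name__ == "__main__":
    # Two entries copied from the freeze 3 listing above; extend as needed.
    # The files are assumed to sit in the current directory (hypothetical).
    expected = {
        "filtered_data_chr21.bgen": "7881bdc24e7f0adbfb800b49d1efd590",
        "filtered_data_chr22.bgen": "824412e963441699f260c6245f65659d",
    }
    for name, status in verify_files(expected).items():
        print(f"{name}: {status}")
```

Any file that comes back as "mismatch" should be downloaded again before it is used in analysis.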

4.5. Genome-wide - 1000G imputed - G0 mothers + G1 (gi_1000g_g0m_g1)

4.5.1. Description

This dataset contains genome-wide 1000G imputed data for G0 mothers + G1. These data have been cleaned, flipped to the positive strand, are in b37 (GRCh37) coordinates, and have been imputed to the 1000 Genomes Phase 1 Version 3 reference panel.
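"Flipped to the positive strand" means that alleles are reported relative to the forward (+) strand of the reference. When a variant was assayed on the reverse (-) strand, each allele is replaced by its complement (A with T, C with G). The sketch below illustrates this idea only and is not the pipeline used by ALSPAC. The `strand` input and the function names are hypothetical. A/T and C/G SNPs are strand-ambiguous: complementing them gives back the same allele pair, so their strand cannot be resolved from the alleles alone.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_allele(allele):
    """Return the reverse complement of an allele string (e.g. "AC" -> "GT")."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose allele pair is unchanged by flipping."""
    return {flip_allele(a1), flip_allele(a2)} == {a1.upper(), a2.upper()}


def to_positive_strand(a1, a2, strand):
    """Report a variant's two alleles on the + strand.

    `strand` is "+" or "-", as it might appear in a genotyping manifest
    (hypothetical input format).
    """
    if strand == "+":
        return a1.upper(), a2.upper()
    if strand == "-":
        return flip_allele(a1), flip_allele(a2)
    raise ValueError(f"unknown strand: {strand!r}")


if __name__ == "__main__":
    print(to_positive_strand("A", "G", "-"))  # ('T', 'C')
    print(is_strand_ambiguous("C", "G"))      # True
```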

4.5.2. Methodology

ALSPAC children were genotyped on the Illumina HumanHap550 quad chip by 23andMe, which subcontracted the Wellcome Trust Sanger Institute, Cambridge, UK, and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches, minimal or excessive heterozygosity, disproportionate levels of individual missingness (>3%) and insufficient sample replication (IBD < 0.8). Population stratification was assessed by multidimensional scaling analysis and compared with the HapMap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed. SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence of violation of Hardy-Weinberg equilibrium (P < 5E-7) were removed. Cryptic relatedness was assessed using the proportion of identity by descent, with IBD > 0.1 indicating relatedness. Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. 9,115 subjects and 500,527 SNPs passed these quality control filters.
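The SNP-level and individual-missingness thresholds in the paragraph above can be expressed as simple predicates. The sketch below assumes that per-SNP summary statistics (MAF, call rate, HWE P value) and per-individual missingness rates have already been computed. The `SnpStats` type and the function names are hypothetical. In practice these filters are usually applied with a tool such as PLINK (for example its `--maf`, `--geno`, `--hwe` and `--mind` options), and the sketch does not claim to reproduce the exact pipeline used for the children.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    maf: float        # minor allele frequency, 0 to 0.5
    call_rate: float  # fraction of samples with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test P value


# Thresholds quoted in the children's QC description above.
MIN_MAF = 0.01
MIN_CALL_RATE = 0.95
MIN_HWE_P = 5e-7
MAX_SAMPLE_MISSINGNESS = 0.03


def snp_passes_qc(s):
    """Keep a SNP unless MAF < 1%, call rate < 95% or HWE P < 5e-7."""
    return s.maf >= MIN_MAF and s.call_rate >= MIN_CALL_RATE and s.hwe_p >= MIN_HWE_P


def filter_snps(snps):
    """Return only the SNPs that pass every SNP-level threshold."""
    return [s for s in snps if snp_passes_qc(s)]


def sample_passes_missingness(missing_rate):
    """Keep an individual unless more than 3% of their genotypes are missing."""
    return missing_rate <= MAX_SAMPLE_MISSINGNESS
```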

ALSPAC mothers were genotyped using the Illumina Human660W-Quad array at the Centre National de Génotypage (CNG), and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs. SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally, SNPs with a minor allele frequency of less than 1% were removed. Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or had extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity-by-state pairwise distances, using the four HapMap populations as a reference, and were then excluded. Cryptic relatedness was assessed using an IBD estimate of more than 0.125, which corresponds to roughly 12.5% of alleles shared IBD, i.e. relatedness at about the first-cousin level. Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. 9,048 subjects and 526,688 SNPs passed these quality control filters.
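The sample-missingness and cryptic-relatedness rules for the mothers can be sketched in a similar way. This sketch assumes pairwise IBD estimates are available as `(id1, id2, ibd)` tuples, for example the PI_HAT column produced by PLINK's `--genome` option. The function names are hypothetical. Related individuals are only flagged here and not dropped, because the text states they were kept for phasing and imputation.

```python
# Thresholds quoted in the mothers' QC description above.
MAX_MISSINGNESS = 0.05   # applied to samples (and, separately, to SNPs)
RELATEDNESS_IBD = 0.125  # roughly first-cousin level


def samples_passing_missingness(missing_rates, max_missing=MAX_MISSINGNESS):
    """Return IDs of samples whose genotype missingness is at most 5%."""
    return {sid for sid, rate in missing_rates.items() if rate <= max_missing}


def related_pairs(pairwise_ibd, threshold=RELATEDNESS_IBD):
    """Return (id1, id2) pairs whose IBD estimate exceeds the threshold."""
    return [(a, b) for a, b, ibd in pairwise_ibd if ibd > threshold]


def flag_related(pairwise_ibd, threshold=RELATEDNESS_IBD):
    """Return every ID that belongs to at least one related pair."""
    flagged = set()
    for a, b in related_pairs(pairwise_ibd, threshold):
        flagged.update((a, b))
    return flagged


if __name__ == "__main__":
    pairs = [("m1", "m2", 0.50), ("m1", "m3", 0.10), ("m4", "m5", 0.13)]
    print(related_pairs(pairs))         # [('m1', 'm2'), ('m4', 'm5')]
    print(sorted(flag_related(pairs)))  # ['m1', 'm2', 'm4', 'm5']
```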

We combined the 477,482 SNP genotypes in common between the sample of mothers and the sample of children. We removed SNPs with genotype missingness above 1% due to poor quality (11,396 SNPs removed) and removed a further 321 subjects due to potential ID mismatches. This resulted in a dataset of 17,842 subjects, containing 6,305 duos, and 465,740 SNPs (112 SNPs were removed during liftover and 234 were out of HWE after combination). We estimated haplotypes using ShapeIT (v2.r644), which utilises relatedness during phasing. We obtained a phased version of the 1000 Genomes reference panel (Phase 1, Version 3) from the Impute2 reference data repository (phased using ShapeIT v2.r644, haplotype release date Dec 2013). Imputation of the target data was performed using Impute v2.2.2 against the reference panel (all polymorphic SNPs excluding singletons), using all 2,186 reference haplotypes (including non-Europeans).
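The counts in the paragraph above can be checked against one another. The arithmetic below treats the combined sample as the 9,115 children plus the 9,048 mothers that passed QC. This is an inference from the figures, although it reproduces the stated 17,842 subjects exactly.

```python
# Figures quoted in the methodology above.
children_subjects = 9_115
mothers_subjects = 9_048
subjects_removed_id_mismatch = 321

snps_in_common = 477_482
snps_removed_missingness = 11_396
snps_removed_liftover = 112
snps_removed_hwe = 234

subjects_final = children_subjects + mothers_subjects - subjects_removed_id_mismatch
snps_final = (snps_in_common - snps_removed_missingness
              - snps_removed_liftover - snps_removed_hwe)

print(subjects_final)  # 17842
print(snps_final)      # 465740
```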

This gave 8,237 eligible children and 8,196 eligible mothers with available genotype data after exclusion of related subjects using the cryptic relatedness measures described above.

4.5.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_f3
name: >-
  Genome-wide - 1000G imputed - G0 mothers + G1 version 2015-10-30
  freeze 3
description: >-
   This is the third freeze of the 2015-10-30 version of the
   gi_1000g_g0m_g1 dataset. It contains data in the Oxford format,
   a combination of bgen (v1.2) and sample files. It is a subset of
   the data in gi_1000g_g0m_g1_2015-10-30, limited to one format and
   with participants who have withdrawn their consent removed.

   The Dec 2013 haplotype release of 1000 Genomes Phase 1 Version 3 has 199 SNPs
   reported to have incorrect strand. These strand issues are present in this imputation version. For more
   information and the origin of this list, please visit:
   https://mathgen.stats.ox.ac.uk/impute/data_download_1000G_phase1_integrated_SHAPEIT2_16-06-14.html

   These errors are very unlikely to have systematic effects across the genome and are most probably confined to these 199 known problematic SNPs.

   Users are advised to discard these SNPs from their analyses. This will be addressed in the next imputation release.
freeze_size: 122G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_gi_1000g_g0m_g1/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:gi_1000g_g0m_g1_2015-10-30_f2
freeze_of_alspac_dataset_version: alspacdcs:gi_1000g_g0m_g1_2015-10-30
freeze_of_named_alspac_dataset: alspacdcs:gi_1000g_g0m_g1
has_parts:
  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_sample_f3
    name: Samples
    description: >-
      The samples in the data. To be used with the genetic data.
    data_distributions:
      - id: alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	name: swapped.sample
	description: >-
	  A plain text .sample file.
	  See https://doi.org/10.1101/308296 for file format details.
	md5sum: 86398f756a748b40e51d0b02ad86ce5b
	filesize: 1.2M
	filetype: .sample
	number_of_participants: 17450



  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr1_f3
    name: Chr1
    description: Data for Chr1
    data_distributions:
      - id: alspacdcs:d4386fe4fcbfd1464fec97335693bb47_filtered_01.bgen
	name: filtered_01.bgen
	description: >- 
	  An Oxford Bgen file for Chr1. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	     (bgen v1.2)       
	md5sum: d4386fe4fcbfd1464fec97335693bb47
	filesize: 9.1G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 2155158

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr2_f3
    name: Chr2
    description: Data for Chr2
    data_distributions:
      - id: alspacdcs:a021b75c0bc519ed48c3342d428d988d_filtered_02.bgen
	name: filtered_02.bgen
	description: >- 
	  An Oxford Bgen file for Chr2. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	   (bgen v1.2)         
	md5sum: a021b75c0bc519ed48c3342d428d988d
	filesize: 9.1G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 2346862

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr3_f3
    name: Chr3
    description: Data for Chr3
    data_distributions:
      - id: alspacdcs:bc61d427013f6a143209714af43fd3a7_filtered_03.bgen
	name: filtered_03.bgen
	description: >- 
	  An Oxford Bgen file for Chr3. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	   (bgen v1.2)         
	md5sum: bc61d427013f6a143209714af43fd3a7
	filesize: 7.7G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1966662


  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr4_f3
    name: Chr4
    description: Data for Chr4
    data_distributions:
      - id: alspacdcs:9616c502415e3aefd3cec770201a1db9_filtered_04.bgen
	name: filtered_04.bgen
	description: >- 
	  An Oxford Bgen file for Chr4. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 9616c502415e3aefd3cec770201a1db9
	filesize: 8.4G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1968171

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr5_f3
    name: Chr5
    description: Data for Chr5
    data_distributions:
      - id: alspacdcs:f7146ed5bfdcc4d6399bbef64809d7a6_filtered_05.bgen
	name: filtered_05.bgen
	description: >- 
	  An Oxford Bgen file for Chr5. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: f7146ed5bfdcc4d6399bbef64809d7a6
	filesize: 6.9G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1808090


  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr6_f3
    name: Chr6
    description: Data for Chr6
    data_distributions:
      - id: alspacdcs:3834f0465729fed20bcf89d7f27a7ef6_filtered_06.bgen
	name: filtered_06.bgen
	description: >- 
	  An Oxford Bgen file for Chr6. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	    (bgen v1.2)        
	md5sum: 3834f0465729fed20bcf89d7f27a7ef6
	filesize: 6.8G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1755859

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr7_f3
    name: Chr7
    description: Data for Chr7
    data_distributions:
      - id: alspacdcs:5de9ed5dc646de7a7a5c9ca503d1212e_filtered_07.bgen
	name: filtered_07.bgen
	description: >- 
	  An Oxford Bgen file for Chr7. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	    (bgen v1.2)        
	md5sum: 5de9ed5dc646de7a7a5c9ca503d1212e
	filesize: 7.1G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1599387

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr8_f3
    name: Chr8
    description: Data for Chr8
    data_distributions:
      - id: alspacdcs:e78c84b883bc8fe52f0c33598cc815a3_filtered_08.bgen
	name: filtered_08.bgen
	description: >- 
	  An Oxford Bgen file for Chr8. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: e78c84b883bc8fe52f0c33598cc815a3
	filesize: 5.9G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1557429

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr9_f3
    name: Chr9
    description: Data for Chr9
    data_distributions:
      - id: alspacdcs:9948344bfdebdcd38a2b09224f1af23d_filtered_09.bgen
	name: filtered_09.bgen
	description: >- 
	  An Oxford Bgen file for Chr9. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	   (bgen v1.2)         
	md5sum: 9948344bfdebdcd38a2b09224f1af23d
	filesize: 5.1G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1187731

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr10_f3
    name: Chr10
    description: Data for Chr10
    data_distributions:
      - id: alspacdcs:1775551d5bac7b13d0e884b2015ba421_filtered_10.bgen
	name: filtered_10.bgen
	description: >- 
	  An Oxford Bgen file for Chr10. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 1775551d5bac7b13d0e884b2015ba421
	filesize: 5.4G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1361506

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr11_f3
    name: Chr11
    description: Data for Chr11
    data_distributions:
      - id: alspacdcs:99685738aff1b79b3028428983bed3f2_filtered_11.bgen
	name: filtered_11.bgen
	description: >- 
	  An Oxford Bgen file for Chr11. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 99685738aff1b79b3028428983bed3f2
	filesize: 5.4G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1356882

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr12_f3
    name: Chr12
    description: Data for Chr12
    data_distributions:
      - id: alspacdcs:c08cd053752044364b342e9873dedaea_filtered_12.bgen
	name: filtered_12.bgen
	description: >- 
	  An Oxford Bgen file for Chr12. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: c08cd053752044364b342e9873dedaea
	filesize: 5.4G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1314328

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr13_f3
    name: Chr13
    description: Data for Chr13
    data_distributions:
      - id: alspacdcs:d6aec668a231fd5509b20f6f99cc5d26_filtered_13.bgen
	name: filtered_13.bgen
	description: >- 
	  An Oxford Bgen file for Chr13. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: d6aec668a231fd5509b20f6f99cc5d26
	filesize: 4.0G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 987740

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr14_f3
    name: Chr14
    description: Data for Chr14
    data_distributions:
      - id: alspacdcs:33ee444ac5cccbc4d5a938f20cfc9506_filtered_14.bgen
	name: filtered_14.bgen
	description: >- 
	  An Oxford Bgen file for Chr14. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	   (bgen v1.2)         
	md5sum: 33ee444ac5cccbc4d5a938f20cfc9506
	filesize: 3.9G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 904351

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr15_f3
    name: Chr15
    description: Data for Chr15
    data_distributions:
      - id: alspacdcs:35a01cfb74f7006fc267a915c5f96531_filtered_15.bgen
	name: filtered_15.bgen
	description: >- 
	  An Oxford Bgen file for Chr15. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 35a01cfb74f7006fc267a915c5f96531
	filesize: 3.7G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 812545

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr16_f3
    name: Chr16
    description: Data for Chr16
    data_distributions:
      - id: alspacdcs:a2be7316bcf32fd554f293650d99b265_filtered_16.bgen
	name: filtered_16.bgen
	description: >- 
	  An Oxford Bgen file for Chr16. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	   (bgen v1.2)         
	md5sum: a2be7316bcf32fd554f293650d99b265
	filesize: 4.3G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 865998

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr17_f3
    name: Chr17
    description: Data for Chr17
    data_distributions:
      - id: alspacdcs:97f06fcb1f5857e9510d2ba30eee6c4c_filtered_17.bgen
	name: filtered_17.bgen
	description: >- 
	  An Oxford Bgen file for Chr17. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 97f06fcb1f5857e9510d2ba30eee6c4c
	filesize: 3.8G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 753174

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr18_f3
    name: Chr18
    description: Data for Chr18
    data_distributions:
      - id: alspacdcs:88606600d2352a1127acf21a440273e2_filtered_18.bgen
	name: filtered_18.bgen
	description: >- 
	  An Oxford Bgen file for Chr18. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 88606600d2352a1127acf21a440273e2
	filesize: 3.5G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 783010

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr19_f3
    name: Chr19
    description: Data for Chr19
    data_distributions:
      - id: alspacdcs:b2d78224a6ab150996caca3e4d3ef1df_filtered_19.bgen
	name: filtered_19.bgen
	description: >- 
	  An Oxford Bgen file for Chr19. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: b2d78224a6ab150996caca3e4d3ef1df
	filesize: 4.0G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 603516

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr20_f3
    name: Chr20
    description: Data for Chr20
    data_distributions:
      - id: alspacdcs:657274f33d9d44a243c59feae7ec561e_filtered_20.bgen
	name: filtered_20.bgen
	description: >- 
	  An Oxford Bgen file for Chr20. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 657274f33d9d44a243c59feae7ec561e
	filesize: 2.8G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 617694

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr21_f3
    name: Chr21
    description: Data for Chr21
    data_distributions:
      - id: alspacdcs:1d85b37ade01bf9921be5a10950e28c2_filtered_21.bgen
	name: filtered_21.bgen
	description: >- 
	  An Oxford Bgen file for Chr21. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)          
	md5sum: 1d85b37ade01bf9921be5a10950e28c2
	filesize: 1.9G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 377554

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr22_f3
    name: Chr22
    description: Data for Chr22
    data_distributions:
      - id: alspacdcs:a25f95d0477de8dc16234a93a9a4108c_filtered_22.bgen
	name: filtered_22.bgen
	description: >- 
	  An Oxford Bgen file for Chr22. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)
	md5sum: a25f95d0477de8dc16234a93a9a4108c
	filesize: 2.1G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 365644

  - id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_chr23_f3
    name: Chr23
    description: Data for Chr23
    data_distributions:
      - id: alspacdcs:9fdb2874bc5f30f22c71be64037ebc70_filtered_23.bgen
	name: filtered_23.bgen
	description: >- 
	  An Oxford Bgen file for Chr23. To be used with
	  alspacdcs:86398f756a748b40e51d0b02ad86ce5b_swapped.sample
	  file.
	  See https://doi.org/10.1101/308296 for file format details.
	  (bgen v1.2)
	md5sum: 9fdb2874bc5f30f22c71be64037ebc70
	filesize: 5.9G
	filetype: .bgen
	number_of_participants: 17450
	number_of_variants: 1250218
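The `number_of_variants` and `number_of_participants` fields above can also be checked against the files themselves. A BGEN file stores its variant count (M) and sample count (N) in the header block at the start of the file, and an Oxford .sample file has two header lines followed by one line per sample. The sketch below reads both, following the BGEN v1.2 header layout described in the format specification. The function names are hypothetical, and the comparison assumes the headers were written after the withdrawn participants were removed.

```python
import struct


def read_bgen_header(path):
    """Read the counts stored in a BGEN file's header block.

    Layout (little-endian, per the BGEN specification):
      bytes 0-3   offset to the first variant data block
      bytes 4-7   LH, length of the header block
      bytes 8-11  M, number of variants
      bytes 12-15 N, number of samples
      bytes 16-19 magic number b"bgen"
      ...         free data, then a 4-byte flags field ending the header block
    """
    with open(path, "rb") as handle:
        _offset, header_len, n_variants, n_samples, magic = struct.unpack(
            "<IIII4s", handle.read(20))
        if magic not in (b"bgen", b"\x00\x00\x00\x00"):
            raise ValueError(f"{path}: not a BGEN file (magic={magic!r})")
        # The header block starts at byte 4; flags are its last 4 bytes.
        handle.seek(4 + header_len - 4)
        (flags,) = struct.unpack("<I", handle.read(4))
    return {
        "n_variants": n_variants,
        "n_samples": n_samples,
        "compression": flags & 0x3,      # 0 none, 1 zlib, 2 zstd
        "layout": (flags >> 2) & 0xF,    # 2 for BGEN v1.2
        "has_sample_ids": bool(flags >> 31),
    }


def count_sample_file_rows(path):
    """Count participants in an Oxford .sample file.

    The first two lines are the column names and the column types;
    every following non-blank line describes one sample.
    """
    with open(path) as handle:
        lines = [line for line in handle if line.strip()]
    return max(len(lines) - 2, 0)
```

For example, `read_bgen_header("filtered_22.bgen")["n_variants"]` would be expected to match the 365644 listed for Chr22 above, and both its `n_samples` and `count_sample_file_rows("swapped.sample")` would be expected to match the 17450 participants listed. If any of these values differ, check the download before analysis.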

5. Sequence Data

5.1. Whole genome sequencing - G1 (wgs_hiseq_g1)

5.1.1. Description

This dataset contains whole genome sequencing for G1 individuals, part of the UK10K dataset.

5.1.2. Methodology

ALSPAC and TwinsUK cohorts were sequenced through the UK10K program (http://www.uk10k.org) at an average read depth of 6.7x on the Illumina HiSeq platform, and reads were aligned to the GRCh37 human reference using BWA. SNVs were called with samtools/bcftools; these calls were then re-called with GATK and filtered using VQSR.

Associated publication: http://www.ncbi.nlm.nih.gov/pubmed/26367797

Please ensure you have permission to access this data (http://www.uk10k.org/data_access.html) before using it.
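The sequencing data in this dataset are provided as bgzip-compressed VCF files (for example `10_freeze.vcf.gz`), each with a `.csi` index. BGZF files are valid multi-member gzip streams, so Python's standard `gzip` module can read them sequentially. The sketch below uses this to list the sample IDs from the `#CHROM` header line and stops before the variant records, so it stays fast even on the 17G files. The function name is hypothetical. Region queries that use the `.csi` index need an index-aware tool such as bcftools, which this sketch does not attempt to replace.

```python
import gzip


def read_vcf_samples(path):
    """Return the sample IDs from a (b)gzipped VCF's #CHROM header line.

    Reading stops at the #CHROM line, so the variant records are never
    decompressed.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed (CHROM ... FORMAT);
                # sample IDs follow.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break
    raise ValueError(f"{path}: no #CHROM header line found")
```

For example, `len(read_vcf_samples("10_freeze.vcf.gz"))` gives the number of individuals in that chromosome's file.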

5.1.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:wgs_hiseq_g1_2016-08-18_f3
name: Whole genome sequencing - G1 version 2016-08-18 freeze 3
description: >-
   This is freeze 3 of version 2016-08-18 of the whole genome sequencing data for G1 individuals, part of the UK10K dataset.
freeze_size: 350G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_wgs_hiseq_g1/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:wgs_hiseq_g1_2016-08-18_f2
freeze_of_alspac_dataset_version: alspacdcs:wgs_hiseq_g1_2016-08-18
freeze_of_named_alspac_dataset: alspacdcs:wgs_hiseq_g1

has_containers:
  - id: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571 ## uuid
    name: data
    description: A dir/folder containing the freeze data files
has_parts:
  - id: alspacdcs:5633a76c-fdc5-4cb2-9ff4-e42df8619662
    name: 10_freeze
    data_distributions:
    - id: alspacdcs:146a37b8-6eec-41e3-b369-0044a20e429b
      name: 10_freeze.vcf.gz.csi
      md5sum: 91511bf844e95e2c3589c5e4b8d29dc0
      filesize: 97K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:48be8dec-e137-474d-808e-535563799202
      name: 10_freeze.vcf.gz
      md5sum: 0fbc3391092f6528bed700bb9678b160
      filesize: 17G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:8b27a150-8809-4471-9374-1977c84496f6
    name: 11_freeze
    data_distributions:
    - id: alspacdcs:4124a844-e108-42ed-8dfb-b0cebd4f1911
      name: 11_freeze.vcf.gz.csi
      md5sum: 0ab4531bb18ae0e0ff4d2b60a91e03bc
      filesize: 97K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:3b35d468-471e-4f13-beef-16e7877f0a8b
      name: 11_freeze.vcf.gz
      md5sum: 9b0fca0596d382b861670db9d1f39a5d
      filesize: 17G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:3ae0b8ea-47f9-4a99-a8df-67a2b62d8688
    name: 12_freeze
    data_distributions:
    - id: alspacdcs:58b10348-5dba-43f5-87c9-0d7dcb7de96c
      name: 12_freeze.vcf.gz.csi
      md5sum: 6cb3ea848c9d2148666ab9d10bf18116
      filesize: 98K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:60ccbba8-6747-46c6-9f87-31889c028199
      name: 12_freeze.vcf.gz
      md5sum: 51c4767c52e3771f6a1cf76f92686389
      filesize: 17G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:ad543513-2135-4780-9af2-502401a46420
    name: 13_freeze
    data_distributions:
    - id: alspacdcs:0f0c59e8-a860-4684-a5fa-8fed592af5f0
      name: 13_freeze.vcf.gz.csi
      md5sum: 0d9c2f70487241a97ffe5076c40bedb6
      filesize: 71K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:b14dd88d-d6c3-4ebc-bd08-5219c9531292
      name: 13_freeze.vcf.gz
      md5sum: 89fbead51842c68140f9232264d0ee8d
      filesize: 13G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:1fcb13e9-5275-4c42-a402-6cdafdbecaaa
    name: 14_freeze
    data_distributions:
    - id: alspacdcs:d580c4bf-e41b-4e71-a24e-a5d2f94e8f82
      name: 14_freeze.vcf.gz.csi
      md5sum: a8073057cb520c8f581b359ad2e2838a
      filesize: 65K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:4c5daf36-da19-4ef9-b2bf-58130d1ef311
      name: 14_freeze.vcf.gz
      md5sum: a8b0753bc6e7abc4b237eb01311a601f
      filesize: 12G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:9ef54eaa-edea-46f1-b528-a8e3f29d15a0
    name: 15_freeze
    data_distributions:
    - id: alspacdcs:a08d92a9-f555-46ab-afb9-92bc8dcc4d6c
      name: 15_freeze.vcf.gz.csi
      md5sum: add4c019bdb6588fa7a215ee326984b5
      filesize: 59K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:de91ccf4-39b2-4084-8c6f-79912b8a926b
      name: 15_freeze.vcf.gz
      md5sum: a25568744021b27b89e5e402d86e7e74
      filesize: 10G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:6208f513-618a-4cbc-8cdb-d5ae602ebf8c
    name: 16_freeze
    data_distributions:
    - id: alspacdcs:e88c099e-7c0b-4473-93bd-51f3083f1b90
      name: 16_freeze.vcf.gz.csi
      md5sum: ee4ce9b3479a9c7ce482b59f0b2bd93d
      filesize: 58K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:a456feb6-7e1a-498a-8d30-39065f4af552
      name: 16_freeze.vcf.gz
      md5sum: 31b6a197d0d939743199bf622e2abdab
      filesize: 11G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:4972b5af-7c9f-4004-8c7a-ffa6c490791b
    name: 17_freeze
    data_distributions:
    - id: alspacdcs:a4d3b46b-7ceb-4d59-ad1e-f1856712470f
      name: 17_freeze.vcf.gz.csi
      md5sum: 7120caa24202ee64a5a4b99e91bbac4d
      filesize: 57K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:dca2737f-b31b-4768-ac29-5a0a64b6aa05
      name: 17_freeze.vcf.gz
      md5sum: f9550f900a3304209eeb3cfd74ffb973
      filesize: 9.4G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:30815635-3a0d-4bbf-bceb-7128fe47c46b
    name: 18_freeze
    data_distributions:
    - id: alspacdcs:bb32b4e7-82d9-4f95-8d72-2c61f9ef1961
      name: 18_freeze.vcf.gz.csi
      md5sum: b8682ac0f332494d59c44601221b42f3
      filesize: 56K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:d47c688b-6819-4469-b3bf-a7dc591abbd9
      name: 18_freeze.vcf.gz
      md5sum: 56120050bd79d79c4cbdf2af9f47cc52
      filesize: 9.7G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571  
  - id: alspacdcs:8a371ba2-bbba-484f-9cee-9bdce9d4253e
    name: 19_freeze
    data_distributions:
    - id: alspacdcs:93ccee60-f677-4d67-ba9a-30a47638126b
      name: 19_freeze.vcf.gz.csi
      md5sum: fd710d3410d43fd0c711eb2685f31d65
      filesize: 41K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:366d7fd4-00fd-45a2-b2e5-f7261081b896
      name: 19_freeze.vcf.gz
      md5sum: 286780a3cd433f41669ebcbbb2797592
      filesize: 7.2G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:27b7b3c6-0bfd-4dbc-b181-a3c5fe4847df
    name: 20_freeze
    data_distributions:
    - id: alspacdcs:91e493ea-8e4f-4e8e-a22e-8f88c6b6766e
      name: 20_freeze.vcf.gz.csi
      md5sum: 2e5117bb3c3e50fb1ad169112975d6c4
      filesize: 44K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:871922c6-eef7-4fdd-bcf6-d81c1c984a2e
      name: 20_freeze.vcf.gz
      md5sum: 60db6e4041f7d5ec491d5a86ffc92756
      filesize: 7.6G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:b2e07654-8076-4fc4-8027-4afa99e34f40
    name: 21_freeze
    data_distributions:
    - id: alspacdcs:518c911f-b753-4a69-a920-4f04c49a5587
      name: 21_freeze.vcf.gz.csi
      md5sum: 918199bf16c55ee8b099e90c78f7722a
      filesize: 25K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:b699bbdb-d686-4796-8d25-da87a06cd122
      name: 21_freeze.vcf.gz
      md5sum: cbdc96d94b2a4bdd2265ac364020bd9e
      filesize: 4.5G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:f2d80d1e-af15-42d0-a8a0-aa18ecf06b2c
    name: 22_freeze
    data_distributions:
    - id: alspacdcs:6a0ae694-8985-419e-a4fb-7a84cd244628
      name: 22_freeze.vcf.gz.csi
      md5sum: 88e896a168ee7773a3bb179985d6dfd9
      filesize: 25K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:f0a555db-93f8-470b-9c76-c8f2c7d2bdc5
      name: 22_freeze.vcf.gz
      md5sum: 3dee0194f9f3279d7a9d63df36373c56
      filesize: 4.6G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571  
  - id: alspacdcs:df7a4f81-5f57-4d95-b07a-fe2c81147827
    name: 1_freeze
    data_distributions:
    - id: alspacdcs:a9048d61-ccb4-4570-be15-66e1f7f5edd3
      name: .vcf.gz.csi
      md5sum: 8015cf7a7c445913ab20e26c82e391d6
      filesize: 165K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:0f331c08-9209-4523-bdc2-ba664fbdf290
      name: .vcf.gz
      md5sum: c57904a609267af336527b89c6b7d352
      filesize: 28G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:d469118b-f62c-41ef-a686-2de4db94b534
    name: 2_freeze
    data_distributions:
    - id: alspacdcs:6f05c026-1ede-4518-b911-89e3ae372eaf
      name: 2_freeze.vcf.gz.csi
      md5sum: 7614c0827ffd3d6b9330b06e17c63b70
      filesize: 177K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:a3a71754-a81a-4b31-8c81-8581ca14d256
      name: 2_freeze.vcf.gz
      md5sum: 2931880238de0df38880a1a23cc2572c
      filesize: 30G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:3364d4fb-3423-4e1d-8704-6006e15be96c
    name: 3_freeze
    data_distributions:
    - id: alspacdcs:dcae2bd9-658a-4d04-b5bb-fc0c74f9f463
      name: 3_freeze.vcf.gz.csi
      md5sum: ba133388d8dd77faa69f2516f54371b1
      filesize: 146K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:c1a8bda7-6453-49f8-852f-4f1445d87940
      name: 3_freeze.vcf.gz
      md5sum: e30f7e0ab96014e4e3c47005f61537d2
      filesize: 25G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:8a7521a6-30bf-400e-9608-f16c98e1f174
    name: 4_freeze
    data_distributions:
    - id: alspacdcs:aaa62301-361f-4779-83e8-8ab6a135a61b
      name: 4_freeze.vcf.gz.csi
      md5sum: 28e7addb16dd0405872b09489880b887
      filesize: 139K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:526084e4-2c2d-4723-ba10-b6d26364fd99
      name: 4_freeze.vcf.gz
      md5sum: ecb5afbf8da9d64d016ba768807a8744
      filesize: 24G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571  
  - id: alspacdcs:81b2e40b-f29e-4390-ac7a-dd90f38e69c5
    name: 5_freeze
    data_distributions:
    - id: alspacdcs:6ca7b22f-f416-439f-ade3-5ec00672c6f6
      name: 5_freeze.vcf.gz.csi
      md5sum: bd20d10d9cd6ba10d4a60ffb80e91466
      filesize: 132K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:f67ce6d2-ee1c-4f2e-9841-0f76374f4969
      name: 5_freeze.vcf.gz
      md5sum: df1fb13feb38cdca9e0f8f1e97675be9
      filesize: 23G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:8bceb3b2-0bed-4844-b3a8-49b9ab225a21
    name: 6_freeze
    data_distributions:
    - id: alspacdcs:c29c3da4-6401-4d0a-b2bc-9a8466071b79
      name: 6_freeze.vcf.gz.csi
      md5sum: de346032233062928343da7b881b451e
      filesize: 125K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:8f4a5481-c05f-4bb8-9c88-50da342a517e
      name: 6_freeze.vcf.gz
      md5sum: bba432e85f3fddfa10a6629e55b84ca9
      filesize: 22G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:2dd0e39f-ea09-4148-bc79-7b6d0f53e18f
    name: 7_freeze
    data_distributions:
    - id: alspacdcs:86afafd8-47db-4bd6-bf56-44349a0048cf
      name: 7_freeze.vcf.gz.csi
      md5sum: 212200545dbbbdadcb9d46250b29909b
      filesize: 116K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:a282b5fe-4737-46e7-918e-2c1cb556d7e7
      name: 7_freeze.vcf.gz
      md5sum: 8290747625ec7d1a049771cac46cf508
      filesize: 20G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:d109d0e3-17a3-4a16-ae0b-820fce0ac7c8
    name: 8_freeze
    data_distributions:
    - id: alspacdcs:e4908d8d-5be0-4f8c-9b6c-eaf7699803a1
      name: 8_freeze.vcf.gz.csi
      md5sum: e89d4b8718cf80caf269ad710439e420
      filesize: 106K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:9df5ee5c-6804-4fce-a5cc-22acdaed6583
      name: 8_freeze.vcf.gz
      md5sum: dae9706dcaa2ef486df7273a4625cd07
      filesize: 20G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:3ccc6e8a-4199-4691-be97-6e1772badab0
    name: 9_freeze
    data_distributions:
    - id: alspacdcs:0829f3ad-fdfd-4dfc-a955-deb0cb015de1
      name: 9_freeze.vcf.gz.csi
      md5sum: 06983116ea48e1f1476d1284d67a708e
      filesize: 86K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:bfbd9db9-1df0-491d-85fe-0d491464f8fe
      name: 9_freeze.vcf.gz
      md5sum: 67f7dfeff3eeb33b9201c504492b3036
      filesize: 15G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
  - id: alspacdcs:46c814b7-fa5a-4dae-a039-b3f0a8cfc81d
    name: X_freeze
    data_distributions:
    - id: alspacdcs:49783fc3-d727-45ff-9f82-37b7fe28e9c0
      name: X_freeze.vcf.gz.csi
      md5sum: 206d0f2365bdc9b128c8dad17b38039a
      filesize: 110K
      filetype: .csi
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
    - id: alspacdcs:be2bfe88-5a33-4046-9f36-db90b1fa108c
      name: X_freeze.vcf.gz
      md5sum: cfd01b886762f1a53e6b928a0718f005
      filesize: 11G
      filetype: vcf.bgz
      belongs_to_container: alspacdcs:90e90672-c949-4ac1-bd68-62a40b9f6571
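
Each entry in `data_distributions` records an `md5sum` and a human-readable `filesize`, which can be used to confirm that a large download (for example a multi-gigabyte `*_freeze.vcf.gz`) arrived intact. The Python sketch below is illustrative rather than an ALSPAC-provided tool: the helper names are hypothetical, the entry is assumed to have been read into a dict with the same keys as the YAML, and reading `filesize` as a rounded binary-unit value (as printed by `ls -lh`) is an assumption.

```python
import hashlib
import os

# Binary unit multipliers. Treating catalogue sizes such as "12G" or "59K" as
# rounded, ls -h style values is an assumption; the catalogue does not say.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_filesize(text: str) -> float:
    """Convert a size string like '9.4G' or '118k' to an approximate byte count."""
    text = text.strip().upper()
    if text[-1] in _UNITS:
        return float(text[:-1]) * _UNITS[text[-1]]
    return float(text)  # no unit suffix: plain bytes


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-gigabyte VCFs need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_distribution(path: str, entry: dict, tolerance: float = 0.1) -> dict:
    """Compare a local file with one data_distributions entry (md5sum, filesize).

    The size check only allows for rounding (default +/-10%); the md5 check is exact.
    """
    expected_size = parse_filesize(entry["filesize"])
    actual_size = os.path.getsize(path)
    return {
        "md5_ok": md5_of_file(path) == entry["md5sum"].lower(),
        "size_ok": abs(actual_size - expected_size) <= tolerance * expected_size,
    }
```

For example, `check_distribution("15_freeze.vcf.gz.csi", {"md5sum": "add4c019bdb6588fa7a215ee326984b5", "filesize": "59K"})` should report both checks as `True` for an intact copy of that index file. A size mismatch is only a hint (it can also point to an error in the catalogue entry itself); the md5sum comparison is the exact check.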

6. Epigenetic Data

6.1. DNA methylation - 450k - G0 mothers + G1 (dnam_450_g0m_g1)

6.1.1. Description

This dataset contains Illumina Infinium HumanMethylation450K BeadChip array data on G1 mothers at two timepoints (pregnancy and middle age), G1 participants at 5 timepoints and G0 participants at three timepoints (birth, childhood and adolescence).

This dataset was generated as part of the Accessible Resource for Integrated Epigenomics Studies (http://www.ariesepigenomics.org.uk/). This dataset is superseded by dnam_epic450_g0_g1.

6.1.2. Methodology

Associated publication: https://doi.org/10.1093/ije/dyv072

Associated R package: https://github.com/MRCIEU/aries

6.1.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:dnam_450_g0m_g1_2016-05-03_f3
name: >-
  DNA methylation - 450k - G0 mothers + G1 version 2016-05-03 Freeze 3
description: >-
  This is the third freeze of the 2016-05-03 version of
  dnam_450_g0m_g1 dataset.

freeze_size: 18G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_dnam_450_g0m_g1/releases/tag/Freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:dnam_450_g0m_g1_2016-05-03_f2
freeze_of_alspac_dataset_version: alspacdcs:dnam_450_g0m_g1_2016-05-03
freeze_of_named_alspac_dataset: alspacdcs:dnam_450_g0m_g1


has_containers:
  - id: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972
    name: data
    description: A dir/folder containing the data files
  - id: alspacdcs:71babe5d-8096-4b81-badd-f092a285d9da
    name: betas
    description: A dir/folder containing the beta files
    belongs_to_container: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972
  - id: alspacdcs:fd31262e-5dcb-48a3-a5b0-5b295110094b
    name: control_matrix
    description: A dir/folder containing the control matrix files 
    belongs_to_container: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972
  - id: alspacdcs:0928ae7e-6d94-47ce-9890-a8350bcd46aa
    name: derived
    description: A dir/folder containing the derived data (e.g. Cell count predictions)
    belongs_to_container: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972
  - id: alspacdcs:e877f56c-6174-4427-a8ba-333b5632d85a
    name: cellcounts
    description: A dir/folder containing the cell count predictions
    belongs_to_container: alspacdcs:0928ae7e-6d94-47ce-9890-a8350bcd46aa
  - id: alspacdcs:8b59a158-94d0-4244-a779-f4695ceb3d9a
    name: cord
    description: >-
      A dir/folder containing the cell count predictions
      for cord blood.
    belongs_to_container: alspacdcs:e877f56c-6174-4427-a8ba-333b5632d85a
  - id: alspacdcs:3279aec3-c6c3-4d04-809e-94eadc51c0c8
    name: andrews-and-bakulski
    description: >-
      A dir/folder containing the cell count predictions made
      with the andrews-and-bakulski cord blood reference.
    belongs_to_container: alspacdcs:8b59a158-94d0-4244-a779-f4695ceb3d9a

  - id: alspacdcs:8abae404-42a9-452a-9a26-7f6c8eed5c6b
    name: gervinandlyle
    description: >-
      A dir/folder containing the cell count predictions made
      with the gervinandlyle cord blood reference.
    belongs_to_container: alspacdcs:8b59a158-94d0-4244-a779-f4695ceb3d9a

  - id: alspacdcs:021ad3f5-6e32-42c0-91c6-f996a9b6e62b
    name: gse68456
    description: >-
      A dir/folder containing the cell count predictions made
      with the gse68456 cord blood reference.
    belongs_to_container: alspacdcs:8b59a158-94d0-4244-a779-f4695ceb3d9a
  - id: alspacdcs:ea167030-d783-46c5-b8d5-3cbd9431f396
    name: houseman
    description: >-
      A dir/folder containing the cell count predictions made
      with the Houseman algorithm.
    belongs_to_container: alspacdcs:e877f56c-6174-4427-a8ba-333b5632d85a
  - id: alspacdcs:6e79ad66-78a5-4102-a071-7c259151d0af
    name: detection_p_values
    description: A dir/folder containing the matrix of detection p-values
    belongs_to_container: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972
  - id: alspacdcs:4e32e07e-181d-46d2-b134-71ee5f6bd53e
    name: qc.objects_all
    description: >-
      A dir/folder containing the samples extracted from
      LIMS that have not been cleaned.
    belongs_to_container: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972
  - id: alspacdcs:5a4d2e29-aa60-493e-a33c-7bcb63be8088
    name: qc.objects_clean
    description: A dir/folder containing the cleaned samples from LIMS.
    belongs_to_container: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972
  - id: alspacdcs:3275bc26-4695-43b8-915e-4bbc4d13018f
    name: samplesheet
    description: A dir/folder containing the manifest file from LIMS.
    belongs_to_container: alspacdcs:ea27b439-5647-4656-b3dd-568437a9d972

has_parts:
  - id: alspacdcs:5e9f67ac-4ddd-4535-9991-ea99fa112d45
    name: betas
    description: >-
      Normalized betas using functional normalization.
      We used 10 PCs of the control matrix to regress out technical
      variation. Slide was regressed out as a random effect before
      normalization.
      CpGs are in rows and samples in columns.
    data_distributions:
      - id: alspacdcs:8cace59a-0aef-4977-95eb-d6ef0bccc8b6
        name: data.Robj
        description: >-
          R data object for the normalized beta data.
        md5sum: f28327f68c4286c3e0ae721020f55f49
        filesize: 17G
        filetype: .Robj
        belongs_to_container: alspacdcs:71babe5d-8096-4b81-badd-f092a285d9da
        number_of_participants:
  - id: alspacdcs:1100e73c-c40d-41d8-943b-978a155fbc5e
    name: control matrix
    description: >-
      The 850 control probes are summarized in 42 control types.
      These probes can roughly be divided into negative control probes
      (613), probes intended for between-array normalization (186)
      and the remainder (49), which are designed for quality
      control, including assessing the
      bisulfite conversion rate. None of these probes are designed
      to measure a biological signal.
      The summarized control probes can be used as surrogates for
      unwanted variation and are used for the functional
      normalization.
      Samples are in rows and the 42 control types are in columns.
    data_distributions:
      - id: alspacdcs:eaef665f-213a-40e7-9c4b-65dfc1955623
        name: data.txt
        description: >-
          Plain text file of the control matrix.
        md5sum: 443369530a5d75fdac8c1cac2fe45e15
        filesize: 1.8M
        filetype: .txt
        belongs_to_container: alspacdcs:fd31262e-5dcb-48a3-a5b0-5b295110094b
        number_of_participants:

  - id: alspacdcs:e87e270b-b4e9-45c8-85a9-80489d1a99d3
    name: andrews and bakulski cord cell counts
    description: >-
      Cell counts in cord blood predicted using the cord blood reference published in
      Bakulski et al. 2016 (PMID: 27019159). This reference has been
      implemented in meffil. In this text file, samples are in rows and cell types in columns.
    data_distributions:
      - id: alspacdcs:c2b9acf3-09f5-4d38-8d66-26a37c8f804c
        name: data.txt
        description: >-
          Plain text file of cell counts in cord blood predicted using the Bakulski et al. reference.
        md5sum: 79b04868cc502a1a34ade01958f22790
        filesize: 118K
        filetype: .txt
        belongs_to_container: alspacdcs:3279aec3-c6c3-4d04-809e-94eadc51c0c8
        number_of_participants: 912

  - id: alspacdcs:d059b0ff-65c1-4e39-8d4a-b129b5898811
    name: gervin and lyle cord cell counts
    description: >-
      Cell counts in cord blood predicted using the GervinandLyle cord blood reference
      (unpublished). This reference has been implemented in meffil.
      Samples are in rows and cell types in columns.
    data_distributions:
      - id: alspacdcs:877af9e8-ecfb-46d6-8767-a76ee4c68b2c
        name: data.txt
        description: >-
          Plain text file of cell counts predicted using the GervinandLyle
          cord blood reference.
        md5sum: 0d8535330ac6e12e7f3c5a5f3f30e600
        filesize: 100K
        filetype: .txt
        belongs_to_container: alspacdcs:8abae404-42a9-452a-9a26-7f6c8eed5c6b
        number_of_participants: 912

  - id: alspacdcs:8cc1141b-13da-483a-aa47-2b3ca5b7b1c1
    name: gse68456 cord cell counts
    description: >-
      Cell counts in cord blood predicted using the cord blood reference published in
      de Goede et al. (PMID: 26366232). This reference has been implemented in meffil.
      Samples are in rows and cell types in columns.
    data_distributions:
      - id: alspacdcs:4df00b08-2234-4231-a408-c17f64f8e75d
        name: data.txt
        description: >-
          Plain text file containing cell counts predicted using the de Goede et al. cord blood reference.
        md5sum: 837e1e40bf27d8f6bd1a402f016b798e
        filesize: 120K
        filetype: .txt
        belongs_to_container: alspacdcs:021ad3f5-6e32-42c0-91c6-f996a9b6e62b
        number_of_participants: 912

  - id: alspacdcs:1153615c-a3d4-4bdf-a294-293994144626
    name: houseman cell counts
    description: >-
      Cell counts estimated using the Houseman algorithm implemented in
      meffil (PMID: 22568884). Samples are in rows and cell types in columns.
    data_distributions:
      - id: alspacdcs:4b430991-4329-415c-8781-9f12e7944359
        name: data.txt
        description: >-
          Text file of the cell counts calculated using the Houseman algorithm.
        md5sum: 2792f7708e710536c069b05c0192c57d
        filesize: 569K
        filetype: .txt
        belongs_to_container: alspacdcs:ea167030-d783-46c5-b8d5-3cbd9431f396
        number_of_participants: 4843

  - id: alspacdcs:0225c24c-a4c6-4c29-a791-71ee7049f899
    name: detection p values
    description: >-
      This matrix shows the detection p-values for each sample and
      each CpG and is extracted from the IDAT files using the "meffil.load.detection.pvalues"
      function in meffil. CpGs are in rows and samples in columns.
    data_distributions:
      - id: alspacdcs:284e4a48-0ec9-4988-bbc9-55c752e94145
        name: data.Robj
        description: >-
          R object file for the detection p-values matrix.
        md5sum: 5b3445d77c5f212dcd10b1645aca7632
        filesize: 418M
        filetype: .Robj
        belongs_to_container: alspacdcs:6e79ad66-78a5-4102-a071-7c259151d0af
        number_of_participants:

  - id: alspacdcs:183a7d3b-16c9-427c-a2ff-ff4f303bdad6
    name: qc objects all
    description: >-
      This object contains samples extracted from LIMS that have not been
      cleaned; it was used to carry out the data cleaning.
      All data processing has been conducted using Meffil.
      Meffil uses the illuminaio R package to parse Illumina IDAT files
      into a meffil object called qc.objects. All meffil functions
      (QC summary, functional normalization and post-normalization QC summary)
      operate on the qc.objects or norm.objects. Specifically, the qc.objects contain
      raw control probe intensities, poor-quality probes based on
      detection p-values and number of beads, predicted sex, predicted
      cell counts and a samplesheet with batch variables.
      In addition, copy number variation can be extracted. This object is a list of individuals.
    data_distributions:
      - id: alspacdcs:32c99449-9401-4ccc-8806-1476a535acae
        name: data.Robj
        description: >-
          R data file of the qc objects.
        md5sum: 4e754e357d16b507650a5c5f56621dd3
        filesize: 497M
        filetype: .Robj
        belongs_to_container: alspacdcs:4e32e07e-181d-46d2-b134-71ee5f6bd53e
        number_of_participants:

  - id: alspacdcs:76972370-cdf4-4887-b4a5-14fe31236813
    name: qc objects clean
    description: >-
      All data processing has been conducted using Meffil. Meffil uses the
      illuminaio R package to parse Illumina IDAT files into a meffil
      object called norm.objects. All meffil functions (QC summary,
      functional normalization and post-normalization QC summary) operate on the norm.objects.
      Specifically, the norm.objects contain raw control probe
      intensities, quantile distributions of the raw intensities, poor-quality
      probes based on detection p-values and number of beads,
      predicted sex, predicted cell counts and a samplesheet with batch
      variables. In addition, copy number variation can be extracted. This object is a list of individuals.
    data_distributions:
      - id: alspacdcs:5f2b149e-73dd-44b9-ab15-58d8ffded660
        name: data.Robj
        description: >-
          R object file of the cleaned qc objects.
        md5sum: c69ed033e28ea6f822a85c165cf78b83
        filesize: 659M
        filetype: .Robj
        belongs_to_container: alspacdcs:5a4d2e29-aa60-493e-a33c-7bcb63be8088
        number_of_participants:

  - id: alspacdcs:cfd86d55-286a-42cf-86af-ac72ffce4893
    name: samplesheet
    description: >-
      Manifest file containing columns extracted directly from LIMS, plus
      age, sex, aln, timepoint, timecode and sampletype columns, a genotypeQC
      column to identify samples to remove for population stratification and a
      duplicate.rm column to identify duplicates to remove.
      Samples are in rows and variables in columns.
    data_distributions:
      - id: alspacdcs:727cb669-bda3-44c7-adac-57f67f53eb41
        name: data.Robj
        description: >-
          R data object of the manifest file.
        md5sum: f47ad58a27ebd89d3fe3c81d25b4dc08
        filesize: 100K
        filetype: .Robj
        belongs_to_container: alspacdcs:3275bc26-4695-43b8-915e-4bbc4d13018f
        number_of_participants: 4843
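
Within a freeze document, the `has_containers` entries form a directory tree through their `belongs_to_container` links, and each data distribution points at its directory in the same way. A minimal Python sketch for resolving a distribution's relative path follows. It assumes the YAML has already been parsed into plain dicts and that container `name` values correspond directly to directory names on disk; the catalogue implies this but does not state it, and the function names are hypothetical.

```python
from typing import Dict, List


def container_path(container_id: str, containers: Dict[str, dict]) -> List[str]:
    """Return directory names from the top-level container down to container_id."""
    parts: List[str] = []
    seen = set()
    current = container_id
    while current is not None:
        if current in seen:  # guard against a cycle in hand-edited YAML
            raise ValueError(f"cycle in belongs_to_container at {current}")
        seen.add(current)
        node = containers[current]
        parts.append(node["name"])
        # Top-level containers (e.g. "data") have no belongs_to_container key.
        current = node.get("belongs_to_container")
    return parts[::-1]


def distribution_path(distribution: dict, containers: List[dict]) -> str:
    """Build a '/'-separated relative path for one data_distributions entry."""
    by_id = {c["id"]: c for c in containers}
    dirs = container_path(distribution["belongs_to_container"], by_id)
    return "/".join(dirs + [distribution["name"]])
```

Using the containers listed in 6.1.3, the Houseman cell count file (`belongs_to_container: alspacdcs:ea167030-d783-46c5-b8d5-3cbd9431f396`) would resolve to `data/derived/cellcounts/houseman/data.txt`, and the andrews-and-bakulski file to `data/derived/cellcounts/cord/andrews-and-bakulski/data.txt`.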

6.2. DNA methylation - EPIC & 450k - G0 + G1 (dnam_epic450_g0_g1)

6.2.1. Description

This dataset contains methylation data collected from both G0 and G1 on two arrays at different timepoints. This dataset supersedes dnam_450_g0m_g1.

It includes Illumina Infinium HumanMethylation450K BeadChip array data on G1 mothers at two timepoints (pregnancy and middle age), G1 participants at 5 timepoints and G0 participants at three timepoints (birth, childhood and adolescence). It also includes Infinium MethylationEPIC v1.0 data on 2721 G1 individuals at 2 timepoints.

This dataset was generated as part of the Accessible Resource for Integrated Epigenomics Studies (http://www.ariesepigenomics.org.uk/).

6.2.2. Methodology

Preprocessing and quality control for this dataset were conducted using Meffil.

Associated publications:

Associated R packages:

6.2.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:dnam_epic450_g0_g1_2022-7-13_f3
name: >-
  DNA methylation - EPIC & 450k - G0 + G1 version 2022-7-13 Freeze 3
description: >-
  This is the third freeze of the 2022-7-13 version of the dnam_epic450_g0_g1
  dataset, which was first introduced in freeze 2.

freeze_size: 137G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_dnam_epic450_g0_g1/releases/tag/Freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: 2
freeze_of_alspac_dataset_version: alspacdcs:dnam_epic450_g0_g1_2022-7-13
freeze_of_named_alspac_dataset: alspacdcs:dnam_epic450_g0_g1

has_containers:
  - id: alspacdcs:fb5112cd-dab1-4616-be19-507aedb071cb
    name: data
    description: A dir/folder containing the data files
  - id: alspacdcs:300c72c2-a2aa-44c1-b8a3-ef89140e65ce
    name: betas
    description: A dir/folder containing the beta files
    belongs_to_container: alspacdcs:fb5112cd-dab1-4616-be19-507aedb071cb
  - id: alspacdcs:15cf4ce4-d704-4d08-85f1-4d6d0fd02792
    name: control_matrix
    description: A dir/folder containing the control matrix files 
    belongs_to_container: alspacdcs:fb5112cd-dab1-4616-be19-507aedb071cb
  - id: alspacdcs:d20f9a3c-b6e4-4b41-a302-621192b12124
    name: derived
    description: A dir/folder containing the derived data (e.g. cell count predictions and DNA methylation age estimates)
    belongs_to_container: alspacdcs:fb5112cd-dab1-4616-be19-507aedb071cb
  - id: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
    name: cellcounts
    description: A dir/folder containing the cell count predictions
    belongs_to_container: alspacdcs:d20f9a3c-b6e4-4b41-a302-621192b12124

  - id: alspacdcs:37ad6257-6c83-4467-a299-98268099f09a
    name: detection_p_values
    description: A dir/folder containing the matrix of detection p-values
    belongs_to_container: alspacdcs:fb5112cd-dab1-4616-be19-507aedb071cb

  - id: alspacdcs:5ba9bdfa-dd6b-40fd-aeaf-a43b55186c56
    name: samplesheet
    description: A dir/folder containing the sample identification matrices.
    belongs_to_container: alspacdcs:fb5112cd-dab1-4616-be19-507aedb071cb


has_parts:
  - id: alspacdcs:8bc24e6e-7577-43dd-91a8-a298822f568c
    name: betas
    description: >-
      Normalized betas using functional normalization.
      We used 10 PCs of the control matrix to regress out technical
      variation. Slide was regressed out as a random effect before
      normalization. CpGs are in rows and samples in columns.
    data_distributions:
      - id: alspacdcs:bff01858-14f6-4d31-be4a-cc5b0da20327
        name: 450.gds
        description: >-
          GDS file of the normalized beta data for the 450 array only.
        md5sum: 02e9b3cdda39d3476bfce111f5935f93
        filesize: 22G
        filetype: .gds
        belongs_to_container: alspacdcs:300c72c2-a2aa-44c1-b8a3-ef89140e65ce
        number_of_participants: 5927
      - id: alspacdcs:598ac47f-876d-472a-b6b5-c35bc8101a5f
        name: common.gds
        description: >-
          GDS file of the normalized beta data for both the EPIC and 450 arrays.
        md5sum: 26a3ccd7c99f8074522295d649f277bf
        filesize: 30G
        filetype: .gds
        belongs_to_container: alspacdcs:300c72c2-a2aa-44c1-b8a3-ef89140e65ce
        number_of_participants: 8670
      - id: alspacdcs:955e38f2-bf05-42e9-a15a-0b3091bb5066
        name: epic.gds
        description: >-
          GDS file of the normalized beta data for the EPIC array only.
        md5sum: 2433412ede73c7bb85eee51763c6797b
        filesize: 18G
        filetype: .gds
        belongs_to_container: alspacdcs:300c72c2-a2aa-44c1-b8a3-ef89140e65ce
        number_of_participants: 2743

  - id: alspacdcs:7306504c-e329-43c6-a65b-428c8e3fd6bf
    name: control_matrix
    description: >-
      The 850 control probes are summarized in 42 control types.
      These probes can roughly be divided into negative control probes
      (613), probes intended for between array normalization (186)
      and the remainder (49), which are designed for quality
      control, including assessing the
      bisulfite conversion rate. None of these probes are designed
      to measure a biological signal.
      The summarized control probes can be used as surrogates for
      unwanted variation and are used for the functional
      normalization.
      Samples are in rows and the 42 control types are in columns.
    data_distributions:
      - id: alspacdcs:1e616a92-34b9-4607-898d-062d5bd735ca
        name: 450.txt
        description: >-
          Plain text file of the control matrix for the 450 array only.
        md5sum: 9e6aa62498c5bb7493f7512e274056ba
        filesize: 2.1M
        filetype: .txt
        belongs_to_container: alspacdcs:15cf4ce4-d704-4d08-85f1-4d6d0fd02792
        number_of_participants:
      - id: alspacdcs:3506af8d-8d94-4a2d-936e-9bcf306027ca
        name: common.txt
        description: >-
          Plain text file of the control matrix for both the EPIC and 450 arrays.
        md5sum: f3d58c03dafcd4fc10292fc1338d34f7
        filesize: 3.2M
        filetype: .txt
        belongs_to_container: alspacdcs:15cf4ce4-d704-4d08-85f1-4d6d0fd02792
        number_of_participants:
      - id: alspacdcs:5877e6be-2c2b-4dc8-b534-3989b175bee0
        name: epic.txt
        description: >-
          Plain text file of the control matrix for the EPIC array only.
        md5sum: ea20f22f63bf9855a0e159945cbc10e3
        filesize: 1010K
        filetype: .txt
        belongs_to_container: alspacdcs:15cf4ce4-d704-4d08-85f1-4d6d0fd02792
        number_of_participants:

  - id: alspacdcs:b73ed28a-7219-49b5-94b8-e39d2bbda6f2
    name: DNA methylation age
    description: >-
      DNA methylation age estimates derived from this dataset.
      Further information on this data and its usage can be found
      in `dnamage.html` and `dnamage.md` within the docs
      dir/folder.
    data_distributions:
      - id: alspacdcs:a3963582-7e06-433b-8694-77821324cc5d
        name: dnamage.csv
        description: >-
          A CSV file containing DNA methylation age estimates for samples in this dataset.
        md5sum: 668bfe2a3c713801eb1b02e920eab964
        filesize: 12M
        filetype: .csv
        belongs_to_container: alspacdcs:d20f9a3c-b6e4-4b41-a302-621192b12124
        number_of_participants: 8192

  - id: alspacdcs:453b4986-d992-4525-98a9-051b55a9296e
    name: cell counts
    description: >-
      These files contain cell counts estimated with the Houseman deconvolution
      algorithm (PMID: 22568884) using a variety of cell type references.
      In each file, samples correspond to rows and cell types to columns.
    data_distributions:
      - id: alspacdcs:c39f3a33-8b19-414f-a8bc-d193847314d7
        name: andrews and bakulski cord blood.txt
        description: >-
          Cord blood cell count estimates derived using the Bakulski et al. 2016 reference
          (PMID 27019159; https://bioconductor.org/packages/release/data/experiment/html/FlowSorted.CordBlood.450k.html).
          This reference has been implemented in meffil. Cell counts are estimated for B cells,
          CD4+ T cells, CD8+ T cells, granulocytes, monocytes, natural killer cells and
          nucleated red blood cells. In this text file, samples are in rows and cell types in columns.
        md5sum: 33c69aa8e50deb28355dcb82d01c7510
        filesize: 114K
        filetype: .txt
        belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
        number_of_participants: 914
      - id: alspacdcs:ae1ac6fc-baf0-4098-aa0a-76f499119645
        name: gervin and lyle cord blood.txt
        description: >-
          Cord blood cell count estimates derived using the Gervin et al. 2019
          reference (PMID 31455416; GEO accession GSE127824). Cell counts are
          estimated for B cells, CD4+ T cells, CD8+ T cells, granulocytes, monocytes
          and natural killer cells. This reference has been implemented in meffil.
          In this text file, samples are in rows and cell types in columns.
        md5sum: 099c4cf9bd4ecfee91c19c3c2d2b6f70
        filesize: 100K
        filetype: .txt
        belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
        number_of_participants: 914
      - id: alspacdcs:d0d978ae-82b9-4820-89b0-698a325e208a
        name: cord blood gse68456.txt
        description: >-
          Cord blood cell count estimates derived using the de Goede et al. 2015 reference
          (PMID 26366232; GEO accession GSE68456). Cell counts are estimated for B cells, CD4+ T cells,
          CD8+ T cells, granulocytes, monocytes, natural killer cells and nucleated red blood cells.
          This reference has been implemented in meffil. In this text file, samples are in rows and
          cell types in columns.
        md5sum: 941f8a9ce1289ab5baaf10fb29bd8941
        filesize: 130K
        filetype: .txt
        belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
        number_of_participants: 914
      - id: alspacdcs:1026dca3-8263-443f-9ae5-f07401987e2f
        name: blood gse35069 complete.txt
        description: >-
          Cell counts in peripheral blood predicted using the peripheral blood reference published in
          Reinius et al. 2012 (PMID: 22848472). Same as 'blood gse35069.txt' but replaces granulocytes
          with eosinophils and neutrophils. This reference has been implemented in meffil.
          In this text file, samples are in rows and cell types in columns.
        md5sum: c8c1b071dfe501f54d59bd12757a2de0
        filesize: 1.2M
        filetype: .txt
        belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
        number_of_participants: 8671
      - id: alspacdcs:4fa62702-1eaa-4485-94dc-87c3d84c439b
	name: blood gse35069.txt
	description: >-
	  Blood cell count estimates derived using the Reinius et al. 2012 reference 
	  (PMID 22848472; GEO accession GSE35069). Cell counts estimated for b-cells,
	  cd4+ t cells, cd8+ t cells, granulocytes, monocytes, and natural killer cells.
	  In this text file, samples are in rows and cell types in columns.
	md5sum: ba915880238cdec1f71b681e4b756d02
	filesize: 1021K
	filetype: .txt
	belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
	number_of_participants:  8671
      - id: alspacdcs:588856a5-0614-4475-a2ae-f0b22f51326c
	name: blood idoloptimized epic.txt
	description: >-
	  Cell counts in peripheral blood predicted using the cell type reference from Bioconductor 
	  package FlowSorted.Blood.EPIC. This reference has been implemented in meffil. In this text file,
	  samples are in rows and cell types in columns.
	md5sum: 0919908374891d1eb61c1cb4e12d8679
	filesize: 347K
	filetype: .txt
	belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
	number_of_participants: 2744
      - id: alspacdcs:5265d49a-b7b2-4907-9c4b-d8b71a7ba516
	name: blood idoloptimized.txt
	description: >-
	  Cell counts in peripheral blood predicted using the cell type reference from Bioconductor 
	  package FlowSorted.Blood.EPIC but restricted to the IDOLOptimizedCpGs450klegacy CpG sites. 
	  This reference has been implemented in meffil. In this text file, samples are in rows and 
	  cell types in columns.
	md5sum: 5a906d07d8c5c8ad4d85c0b436412666
	filesize: 1.1M
	filetype: .txt
	belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
	number_of_participants: 8671 
      - id: alspacdcs:bd4f6a06-5c39-4817-be00-18f920ef463c
	name: combined cord blood.txt
	description: >-
	  Cord blood cell count estimates derived using the Bakulski et al., Gervin et al., de Goede et al.,
	  and Lin et al. references (https://bioconductor.org/packages/release/data/experiment/html/FlowSorted.CordBloodCombined.450k.html)
	  for CpG sites selected using the IDOL algorithm and optimized for the Illumina Infinium 
	  HumanMethylation450 Beadchip. Cell counts estimated for b-cells, cd4+ t cells, cd8+ t cells,
	  granulocytes, monocytes, natural killer cells and nucleated red blood cells.
	  In this text file, samples are in rows and cell types in columns.
	md5sum: 7cbcf72ca00012d17d22ff6d21b7575c
	filesize: 129K
	filetype: .txt
	belongs_to_container: alspacdcs:05b2cd5c-e6ab-455d-a66e-153571d40a4f
	number_of_participants: 914

  - id: alspacdcs:82233c09-cafd-4128-8bdb-e00ae7d86dd5
    name: detection p values
    description: >-
       This matrix contains the detection p-values for each sample and
       each CpG, extracted from the IDAT files using the meffil function
       "meffil.load.detection.pvalues". CpGs are in rows and samples in columns.
    data_distributions:
      - id: alspacdcs:284e4a48-0ec9-4988-bbc9-55c752e94145
	name: 450.gds
	description: >-
	  R object file for the detection p values matrix for the 450 array only.
	md5sum: 1c437226b2aab0c00aed7098e739f49d
	filesize: 22G
	filetype: .gds
	belongs_to_container: alspacdcs:37ad6257-6c83-4467-a299-98268099f09a
	number_of_participants: 5927
      - id: alspacdcs:4c208bea-3511-11ee-be56-0242ac120002
	name: common.gds
	description: >-
	  R object file for the detection p values matrix for both EPIC and 450 arrays.
	md5sum: 538ffd8177ecb8adcba5095a7d5f75c0
	filesize: 30G
	filetype: .gds
	belongs_to_container: alspacdcs:37ad6257-6c83-4467-a299-98268099f09a
	number_of_participants: 8670
      - id: alspacdcs:51c397c2-3511-11ee-be56-0242ac120002
	name: epic.gds
	description: >-
	  R object file for the detection p values matrix for the EPIC array only.
	md5sum: 542967aac7f77f2c2c8208df37283f29
	filesize: 18G
	filetype: .gds
	belongs_to_container: alspacdcs:37ad6257-6c83-4467-a299-98268099f09a
	number_of_participants: 2743

  - id: alspacdcs:19ee6082-058b-446f-b424-76e75b641766
    name: samplesheet
    description: >-
      Manifest files containing columns extracted directly from LIMS, plus
      age, sex, omics ID, timepoint, timecode and sampletype columns, genotype
      columns that report sample mismatches, and a duplicate.rm column for
      removing duplicates. Samples are in rows and variables in columns.
    data_distributions:
      - id: alspacdcs:fc5413e9-cc79-4cbd-8ca9-ee722ee18c0d
	name: samplesheet-450.csv
	description: >-
	  CSV manifest file for the 450 array only.
	md5sum: 9410525be519472de354134626192864
	filesize: 2.2M
	filetype: .csv
	belongs_to_container: alspacdcs:5ba9bdfa-dd6b-40fd-aeaf-a43b55186c56
	number_of_participants: 5927              
      - id: alspacdcs:a5f905c8-6c47-4168-92f5-1bd9bfa59a6e
	name: samplesheet-common.csv
	description: >-
	  CSV manifest file for both the EPIC and 450 arrays. This is a duplicate of samplesheet.csv.
	md5sum: 164b6418a8c6a0f5dbea3da5255b1b96
	filesize: 3.2M
	filetype: .csv
	belongs_to_container: alspacdcs:5ba9bdfa-dd6b-40fd-aeaf-a43b55186c56
	number_of_participants: 8670
      - id: alspacdcs:376cb854-a0ee-4735-bcac-108795a3b9c3
	name: samplesheet-epic.csv
	description: >-
	  CSV manifest file for the EPIC array only.
	md5sum: f5a8ba932af085c0a9e1603afc94d23d
	filesize: 1.1M
	filetype: .csv
	belongs_to_container: alspacdcs:5ba9bdfa-dd6b-40fd-aeaf-a43b55186c56
	number_of_participants: 2743  
      - id: alspacdcs:38c5b752-447c-4270-874e-fbf60825a0bc
	name: samplesheet.csv
	description: >-
	  CSV manifest file for both the EPIC and 450 arrays. This is a duplicate of samplesheet-common.csv.
	md5sum: 164b6418a8c6a0f5dbea3da5255b1b96 # should be the same as samplesheet-common.csv
	filesize: 3.2M
	filetype: .csv
	belongs_to_container: alspacdcs:5ba9bdfa-dd6b-40fd-aeaf-a43b55186c56
	number_of_participants: 8670
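Each distribution above lists an md5sum, which can be used to check that a downloaded file is complete and uncorrupted. On Linux, `md5sum <file>` prints the digest for comparison by eye. A minimal Python equivalent using only the standard library follows; `md5_of_file` and `verify_md5` are hypothetical helper names. Files are read in chunks so that multi-gigabyte distributions such as common.gds need not fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks.

    Reading in chunks keeps memory use flat even for multi-gigabyte
    files such as the .gds distributions.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel stops at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Hypothetical helper: True if the file matches the catalogue md5sum."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify_md5("samplesheet.csv", "164b6418a8c6a0f5dbea3da5255b1b96")` should return True for an intact copy; samplesheet.csv and samplesheet-common.csv share this checksum because they are duplicates.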

7. Gene Expression Data

7.1. Gene expression - array - G1 (ge_ht12_g1)

7.1.1. Description

Two QC'd versions of the data are available in this release: one produced by David Evans for Bryois et al. (2014), and one produced by Gibran Hemani for the Molgenis eQTL-mapping meta-analysis. A version without QC is also available. Details of the QC'd versions are given below.

7.1.2. Methodology

Bryois:

  • LCLs (lymphoblastoid cell lines) from unrelated individuals were grown under identical conditions and the cells were frozen in RNAlater. RNA was extracted using an RNeasy extraction kit (Qiagen) and amplified using the Illumina TotalPrep-96 RNA Amplification kit (Ambion). Expression profiling of the samples, each with two technical replicates, was performed using Illumina Human HT-12 V3 BeadChips (Illumina Inc), which include 48,804 probes; 200 ng of total RNA was processed according to the protocol supplied by Illumina. Raw data were imported into the Illumina BeadStudio software and probes with fewer than three beads present were excluded. Log2-transformed expression signals were then normalized with quantile normalization of the replicates of each individual, followed by quantile normalization across all individuals. We restricted our analysis to 23,935 probes tagging genes annotated in Ensembl. Principal component analysis was performed on 931 individuals, and 62 individuals with principal component 1 or 2 greater than one standard deviation of the population were excluded from further analysis. See http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004461 for full details.

Molgenis:

  • Genetic outliers (individuals who were clear outliers on the first two genetic principal components) were removed. Each probe was quantile normalised and then log2 transformed. The data were then adjusted for the first four genetic MDS components and for expression principal components (excluding those that had genetic associations), and scaled to have mean 0 and variance 1. See https://github.com/molgenis/systemsgenetics/wiki/eQTL-mapping-analysis-cookbook for full details.
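Both pipelines rely on quantile normalisation, which forces every sample to share the same distribution of values. The sketch below is a minimal pure-Python illustration of the standard technique, not the code used to produce either release: it assumes a small in-memory matrix with probes in rows and samples in columns, and breaks ties by position rather than averaging them as fuller implementations do. Each sample's values are ranked, the mean at each rank is taken across samples, and every value is replaced by the mean for its rank. A separate log2 helper is included because the two pipelines apply log2 at different stages (before normalisation for Bryois, after it for Molgenis).

```python
import math


def quantile_normalise(matrix):
    """Quantile-normalise the columns (samples) of a probes x samples matrix.

    matrix: list of rows, one per probe, each with one value per sample.
    Ties are broken by position, a simplification of full implementations.
    """
    n_probes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # For each sample, probe indices ordered from smallest to largest value
    orders = [sorted(range(n_probes), key=col.__getitem__) for col in columns]
    # Mean value at each rank, taken across all samples
    rank_means = [
        sum(columns[j][orders[j][r]] for j in range(n_samples)) / n_samples
        for r in range(n_probes)
    ]
    result = [[0.0] * n_samples for _ in range(n_probes)]
    for j in range(n_samples):
        for r, i in enumerate(orders[j]):
            result[i][j] = rank_means[r]
    return result


def log2_matrix(matrix):
    """Element-wise log2; assumes all values are strictly positive."""
    return [[math.log2(x) for x in row] for row in matrix]
```

After normalisation every column holds the same set of values, so differences between samples are carried only by the ordering of probes within each sample.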

7.1.3. Freeze Docs

# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema

id: alspacdcs:ge_ht12_g1_2015-11-02_f3
name: Gene expression - array - G1 release version 2015-11-02 freeze 3
description: >-
  This is the third freeze of the 2015-11-02 version of the
  ge_ht12_g1 dataset, which has .csv distributions of the data rather than
  .Rdata files so that it is easier to use across different data
  science software and languages.

freeze_size: 2.6G
linker_file_md5sum: b528acad88cd1697129a7cd59aa14ada
woc_file_md5sum: cf9249c306e766a8689f78197e1f5f25
all_individuals_to_exclude_md5sum: 7faad74aeebaba4ed71aac783414d75b
git_tag: https://github.com/alspac/dataset_ge_ht12_g1/releases/tag/freeze3
is_current_freeze: true
freeze_number: 3
freeze_date: 2023-09-13
previous_freeze: alspacdcs:ge_ht12_g1_2015-11-02_f2
freeze_of_alspac_dataset_version: alspacdcs:ge_ht12_g1_2015-11-02
freeze_of_named_alspac_dataset: alspacdcs:ge_ht12_g1
has_parts:
  - id: alspacdcs:ge_ht12_g1_2015-11-02_bryosis_f3
    name: Bryosis data
    description: Dataset part for the Bryosis data in ge_ht12_g1 version 2015-11-02 freeze3
    data_distributions:
      - id: alspacdcs:272eb7917f3ef253f0b65f7b01d35574_bryosis.csv
	name: bryosis.csv
	description: >-
	  The freeze 3 csv version of the Bryois data.
	  IDs in columns and Illumina probe IDs in rows.
	  This is the normalised data used in Bryois et al. 2014.
	  The mapping from probe IDs to genes is given
	  in raw.csv.
	md5sum: 272eb7917f3ef253f0b65f7b01d35574
	filesize: 742M
	filetype: .csv
	number_of_participants: 947
	number_of_gene_expression_probe_values: 48630
  - id: alspacdcs:ge_ht12_g1_2015-11-02_molgenis_f3
    name: Molgenis
    description: >-
      Dataset part for the Molgenis data in ge_ht12_g1 version 2015-11-02 freeze3
    data_distributions:
      - id: alspacdcs:d7a6826fe6b4d3a0c853ec7eaa8e55e6_molgenis.csv
	name: molgenis.csv
	description: >-
	  The freeze 3 csv version of the molgenis data.
	  IDs in columns and Illumina probe IDs in rows.
	  Normalised data following the molgenis pipeline,
	  found at
	  https://github.com/molgenis/systemsgenetics/wiki/eQTL-mapping-analysis-cookbook.
	  The mapping from probe IDs to genes is given
	  in raw.csv.
	md5sum: d7a6826fe6b4d3a0c853ec7eaa8e55e6
	filesize: 752M
	filetype: .csv
	number_of_participants: 879
	number_of_gene_expression_probe_values: 48630
  - id: alspacdcs:ge_ht12_g1_2015-11-02_raw_f3
    name: Raw
    description: Dataset part for the raw data in ge_ht12_g1 version 2015-11-02 freeze3
    data_distributions: 
      - id: alspacdcs:1e559d3a25a10f1f11387325981882a8_raw.csv
	name: raw.csv
	description: >-
	  The freeze 3 csv version of the raw ge data.
	  IDs in columns and probes in rows. Two columns per
	  individual, with one column for average signal and one column
	  for average number of beads.
	  Presumably this is a file generated by the Illumina GenomeStudio
	  software.
	md5sum: 1e559d3a25a10f1f11387325981882a8
	filesize: 1.1G
	filetype: .csv
	number_of_participants: 994 # the table has two columns per individual, so it is wider than this
	number_of_gene_expression_probe_values: 48630
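The Bryois methodology excludes probes with fewer than three beads, and raw.csv carries the bead counts needed to apply a similar filter. A minimal Python sketch follows, under assumptions that should be checked against the real header: a fixed number of leading annotation columns (probe ID and any gene mapping), followed by exactly two columns per individual in the order average signal, average bead count. `mask_low_bead_signals` is a hypothetical helper that streams the file row by row, since raw.csv is over 1 GB, and replaces low-bead signals with None rather than dropping whole probes.

```python
import csv


def mask_low_bead_signals(handle, n_annotation_cols=1, min_beads=3):
    """Yield (annotation, signals) rows from a raw.csv-style table.

    Assumes the first n_annotation_cols columns are annotation (e.g. probe
    ID) and the remaining columns alternate (average signal, average bead
    count) per individual. A signal becomes None when its bead count is
    below min_beads.
    """
    reader = csv.reader(handle)
    header = next(reader)
    if (len(header) - n_annotation_cols) % 2 != 0:
        raise ValueError("expected two columns per individual after the annotation")
    for row in reader:
        annotation = row[:n_annotation_cols]
        values = row[n_annotation_cols:]
        signals = []
        # values[0::2] are signals, values[1::2] the matching bead counts
        for signal, beads in zip(values[0::2], values[1::2]):
            signals.append(float(signal) if float(beads) >= min_beads else None)
        yield annotation, signals
```

Typical use would be `with open("raw.csv", newline="") as f:` followed by a loop over `mask_low_bead_signals(f)`, after confirming the column layout from the header.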

8. Omics tips

8.1. Introduction

This section is a guide to using 'Omics datasets. It explains which software to use and describes common file formats. It's a good starting point for beginners and helpful for problem-solving.

8.2. Disclaimer

Some information is copied or reworded from software documentation. Check the original documentation alongside this guide for up-to-date information. Note that some links may no longer work.

8.3. Operating systems

You can use ALSPAC data with any operating system, but Unix-like systems such as macOS, Linux or BSD are more convenient given the size and complexity of the data. We recommend using the command line and writing scripts in languages such as Bash, R, Python or Perl; many online resources are available for learning these tools. Use free/libre and open-source software where possible.

8.4. Key Omics software

8.4.1. Plink

Plink is a tool for performing quality control and whole genome association analysis of genetic data.

8.4.2. SNPTest

SNPTest is a tool for performing whole genome association analysis of genetic data.

8.4.3. BoltLmm

BoltLmm (BOLT-LMM) is a tool for genome-wide association analysis of genetic data using linear mixed models. It is recommended for analyses of more than 5,000 samples, and its methods automatically account for population substructure.

8.4.4. Qctools

A tool for quality control of genetic data. It is also useful for inspecting and modifying .gen, .bgen and .vcf files, among others (see the File types section below).

8.4.5. SAMTOOLS

Samtools is a suite of tools for working with high-throughput sequencing data, such as SAM, BAM and CRAM files.

8.4.6. VCFTOOLS

A separate suite of tools (not part of samtools) for working with VCF files.

8.4.7. BCFTOOLS

This is part of the samtools project and allows users to manipulate VCF and BCF files.

8.5. File types

In a Unix environment the suffix (extension) of a file name has no special meaning to the operating system, unlike on Windows, which uses it to determine the file type. On Unix it is simply part of the file name, which people use to distinguish file formats. The following is a non-exhaustive list of file types you may encounter while using ALSPAC Omics data.

8.5.1. .gen

This is an 'Oxford' data format for genetic data. The .gen file is a plain text file, which means that standard Unix command line tools such as 'head' or 'less' can be used to inspect the data.

The .gen (genotype) file stores data in a one-line-per-SNP format. The first 5 entries of each line are the SNP ID, the RS ID of the SNP, the base-pair position of the SNP, the allele coded A and the allele coded B. The SNP ID can be used to denote the chromosome number of each SNP. The next three numbers on the line are the probabilities of the three genotypes AA, AB and BB at the SNP for the first individual in the cohort. The next three numbers are the genotype probabilities for the second individual, the next three are for the third individual, and so on. The order of individuals in the genotype file should match the order of the individuals in the sample file (see below). The probabilities need not sum to 1, to allow for the possibility of a NULL genotype call; the format therefore allows for genotype uncertainty. This genotype file format is the same as that produced by the genotype calling algorithm CHIAMO.

NOTE: We recommend that you arrange SNPs in base-pair order in the genotype files. This is required if you want to use the files with IMPUTE and will make viewing the output of SNPTEST somewhat easier.

For example, suppose you want to create a genotype file for 2 individuals at 5 SNPs whose genotypes are:

SNP 1 AA AA
SNP 2 GG GT
SNP 3 CC CT
SNP 4 CT CT
SNP 5 AG GG

The correct genotype file would look like this:

SNP1 rs1 1000 A C 1 0 0 1 0 0
SNP2 rs2 2000 G T 1 0 0 0 1 0
SNP3 rs3 3000 C T 1 0 0 0 1 0
SNP4 rs4 4000 C T 0 1 0 0 1 0
SNP5 rs5 5000 A G 0 1 0 0 0 1
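Because .gen is plain text, it can also be read with a few lines of code. The sketch below is a minimal Python example, assuming whitespace-separated fields laid out exactly as described above (five leading columns, then three probabilities per individual); `parse_gen_line` and `most_likely_genotypes` are hypothetical helper names, not part of any Oxford or ALSPAC tool. Decoding the five example lines this way recovers the genotype table given earlier. For real genome-wide files, qctool is the better choice.

```python
def parse_gen_line(line):
    """Split one .gen line into SNP metadata and per-individual probabilities."""
    fields = line.split()
    snp_id, rsid, position, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError(f"{snp_id}: probability count is not a multiple of 3")
    # Group into (P(AA), P(AB), P(BB)) triples, one per individual
    triples = [tuple(probs[i:i + 3]) for i in range(0, len(probs), 3)]
    return {"snp_id": snp_id, "rsid": rsid, "position": int(position),
            "allele_a": allele_a, "allele_b": allele_b, "probs": triples}


def most_likely_genotypes(record):
    """Most probable genotype string per individual.

    Returns None for an individual whose three probabilities are all zero
    (a NULL call); exact ties go to the first genotype in AA, AB, BB order.
    """
    a, b = record["allele_a"], record["allele_b"]
    labels = (a + a, a + b, b + b)
    calls = []
    for triple in record["probs"]:
        top = max(triple)
        calls.append(None if top == 0 else labels[triple.index(top)])
    return calls
```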

8.5.2. .bgen

A binary version of a .gen file. It cannot be inspected visually on the command line. .bgen files are used because they greatly improve the speed and storage efficiency of software handling large amounts of Omics data. The full details of the file format are given at: https://www.well.ox.ac.uk/~gav/bgen_format/ .bgen files are normally used with tools such as qctool and SNPTEST. There is also a library for reading .bgen files into R: https://bitbucket.org/gavinband/bgen/wiki/rbgen

8.5.3. .sample

The .sample file is paired with either .gen or .bgen files. It contains non-genetic information about the samples. It is a plain text file that can be inspected with standard Unix command line tools.

Please note that the sample file format changed with the release of SNPTEST v2. Specifically, the way in which covariates and phenotypes are coded on the second line of the header has changed. The sample file has three parts: (a) a header line detailing the names of the columns in the file, (b) a line detailing the types of variables stored in each column, and (c) a line for each individual detailing the information for that individual. Here is an example of the start of a sample file for reference:

ID_1 ID_2 missing cov_1 cov_2 cov_3 cov_4 pheno1 bin1
0 0 0 D D C C P B
1 1 0.007 1 2 0.0019 -0.008 1.233 1
2 2 0.009 1 2 0.0022 -0.001 6.234 0
3 3 0.005 1 2 0.0025 0.0028 6.121 1
4 4 0.007 2 1 0.0017 -0.011 3.234 1
5 5 0.004 3 2 -0.012 0.0236 2.786 0

The header line: this line needs a minimum of three entries. The first three entries should always be ID_1, ID_2 and missing, denoting that the first three columns contain the first ID, the second ID and the missing data proportion of each individual. Additional entries on this line should be the names of covariates or phenotypes that are included in the file. In the above example, there are 4 covariates named cov_1, cov_2, cov_3 and cov_4, a continuous phenotype named pheno1 and a binary phenotype named bin1. NOTE: All phenotypes should appear after the covariates in this file. The second line of the file details the type of variables included in each column. The first three entries of this line should be set to 0. Subsequent entries in this line for covariates and phenotypes should be specified according to the following rules:

D Discrete covariate (coded using positive integers)
C Continuous covariates
P Continuous Phenotype
B Binary Phenotype (0 = Controls, 1 = Cases)

The remainder of the file should consist of a line for each individual containing the information specified by the entries of the header line (see example above). Separate the entries of the sample file with spaces, not tabs, because spaces are the expected delimiter.

Missing values: it is possible to specify missing values for covariates and phenotypes. In SNPTEST v1 the recommended, and default, missing value was -9, although the -missing_code option allowed other numeric values to be used as the missing code. In SNPTEST v2 the behaviour of the -missing_code option has changed so that it now takes a comma-separated list of values, each of which is treated as missing when encountered in the sample file(s). By default, missing values are now denoted by the two-character string "NA".
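The layout rules above can be checked mechanically. A minimal Python sketch, assuming the SNPTEST v2 layout described here and treating only the string "NA" as missing by default (other codes can be passed via `missing_values`, mirroring the comma-separated list accepted by -missing_code); `read_sample_file` is a hypothetical helper. It is deliberately lenient about tabs versus spaces because `str.split()` accepts both, so it should not be used to confirm that a file is space-delimited. Values are returned as strings for the caller to convert.

```python
def read_sample_file(text, missing_values=("NA",)):
    """Parse SNPTEST v2 .sample text into (header, types, rows).

    Each row is a dict mapping column name to its string value, with any
    value listed in missing_values replaced by None.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, types, body = lines[0], lines[1], lines[2:]
    if header[:3] != ["ID_1", "ID_2", "missing"]:
        raise ValueError("first three columns must be ID_1 ID_2 missing")
    if types[:3] != ["0", "0", "0"] or len(types) != len(header):
        raise ValueError("second line must start with 0 0 0 and match the header")
    kinds = types[3:]
    if any(t not in {"D", "C", "P", "B"} for t in kinds):
        raise ValueError("covariate/phenotype types must be D, C, P or B")
    # All phenotypes (P, B) must come after all covariates (D, C)
    first_pheno = next((i for i, t in enumerate(kinds) if t in {"P", "B"}), len(kinds))
    if any(t in {"D", "C"} for t in kinds[first_pheno:]):
        raise ValueError("phenotypes must appear after covariates")
    rows = []
    for fields in body:
        if len(fields) != len(header):
            raise ValueError(f"wrong number of fields for individual {fields[0]}")
        rows.append({name: (None if value in missing_values else value)
                     for name, value in zip(header, fields)})
    return header, types, rows
```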

8.5.4. .ped

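A PLINK format file in plain text that can be viewed with standard tools. It contains genetic variant data, with one line per individual: family and individual IDs, parental IDs, sex and phenotype, followed by the genotype calls. https://www.cog-genomics.org/plink/1.9/formats#ped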

8.5.5. .map

A PLINK format file in plain text. It contains information about variants, one line per variant: chromosome, variant ID, genetic position and base-pair position. https://www.cog-genomics.org/plink/1.9/formats#map

8.5.6. .bed

A PLINK format file that is a binary equivalent of a .ped file. It is smaller and faster to process but is not easily viewable or editable. https://www.cog-genomics.org/plink/1.9/formats#bed

8.5.7. .bim

A PLINK format file, similar to a .map file, that is used with binary .bed files; it also lists the two alleles of each variant. https://www.cog-genomics.org/plink/1.9/formats#bim

8.5.8. .fam

A plain text format that contains sample information for plink binary files. https://www.cog-genomics.org/plink/1.9/formats#fam

8.5.9. .csv

A plain text format in which fields are separated by commas (comma-separated values).

8.5.10. .vcf

VCF is a flexible file format for storing different types of genetic variants. It is a plain text format that can be inspected on the command line with standard Unix tools. However, VCF files are often very large, and specific tools such as 'vcftools' are useful for working with them. SNPs are most commonly stored in these files, but other variants such as copy number variations can also be stored. The format is described at: https://en.wikipedia.org/wiki/Variant_Call_Format
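To make the column layout concrete, here is a minimal Python reader for uncompressed VCF text. It assumes the standard layout (meta-information lines starting with ##, a #CHROM header line, then tab-separated columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT and one column per sample) and extracts only the GT field; `read_vcf_genotypes` and `alt_allele_count` are hypothetical names. For a bgzip-compressed .vcf.gz the same function can be fed `gzip.open(path, "rt")`; for whole-genome files, vcftools or bcftools will be far faster.

```python
def read_vcf_genotypes(lines):
    """Return (sample_names, records) from VCF text lines.

    Each record is (CHROM, POS, ID, REF, ALT, {sample: GT string or None}).
    """
    samples = []
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        fmt = fields[8].split(":") if len(fields) > 8 else []
        gt_index = fmt.index("GT") if "GT" in fmt else None
        genotypes = {}
        for name, value in zip(samples, fields[9:]):
            parts = value.split(":")
            ok = gt_index is not None and gt_index < len(parts)
            genotypes[name] = parts[gt_index] if ok else None
        records.append((chrom, int(pos), vid, ref, alt, genotypes))
    return samples, records


def alt_allele_count(gt):
    """Count non-reference alleles in a GT such as '0/1' or '1|1'.

    Returns None if any allele is missing ('.').
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")
```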

8.5.11. .bcf

This is a binary version of a VCF file. It cannot be read with standard text tools on the command line, but can be used with the genomic tools mentioned in this document.

8.5.12. .tar.gz

This is a standard Unix file format for bundling and compressing a set of files, similar to a .zip file. It is made by first bundling a set of files into a .tar file (sometimes called a tarball), which is then compressed using gzip (GNU zip). https://en.wikipedia.org/wiki/Tar_(computing) https://en.wikipedia.org/wiki/Gzip
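On the command line, `tar -tzf file.tar.gz` lists the contents and `tar -xzf file.tar.gz` extracts them. The same can be done from Python with the standard-library tarfile module. This sketch uses hypothetical helper names and applies the "data" extraction filter where the running Python supports it, which refuses unsafe members such as those that would be written outside the destination directory.

```python
import tarfile


def list_tarball(path):
    """Return the member names of a .tar.gz archive without extracting it."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


def extract_tarball(path, destination):
    """Extract a .tar.gz archive into the destination directory."""
    with tarfile.open(path, "r:gz") as archive:
        # The "data" filter exists in Python 3.12+ and recent security releases
        if hasattr(tarfile, "data_filter"):
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)
```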

8.5.13. .enc

This file extension is a convention indicating that the file is encrypted. You will need the password that was used to encrypt the data in order to decrypt the files. https://en.wikipedia.org/wiki/OpenSSL

8.6. Variant/SNP ids

There are many types of genetic variation. A common type is a single nucleotide polymorphism (SNP). Others include copy number variations.

Variants can be specified by chromosome and position relative to a specific build of the human reference genome. They can also be given a reference SNP (rs) cluster identifier.

  • Chr:Location
  • Rs ids

8.7. Overview of Imputation reference panels

SNP array data frequently contain hundreds of thousands of variants. However, because of linkage disequilibrium it is possible to estimate the genotypes of many more SNPs for an individual. This estimation procedure is called imputation, and it works by combining an individual's SNP array data with a large reference population of sequenced data. In this way it is possible to obtain accurate estimates of millions of SNP genotypes for an individual without the cost of fully sequencing each person. ALSPAC has pre-run the imputation process using three different imputation panels.

8.7.1. Panels

  1. TOPmed

    A reference panel that is upcoming for ALSPAC and will provide the most SNPs.

  2. HRC

    This is the most recent reference panel currently available in ALSPAC, and our data contain circa 40 million SNPs.

  3. 1000 Genomes

    This is the previous generation reference panel which is still widely used in ALSPAC studies. There are some SNPs that appear in this panel that are not in the HRC panel.

  4. Hapmap

    This was the first widely used imputation panel.

8.8. SNP data types from imputation.

SNPs that have been imputed can be stored and analysed in different formats, which suit different types of analysis. For example, an analysis could assume an additive effect of the minor allele, or it could assume a recessive or dominant effect.

  • Best guess. The data are presented as 0, 1 or 2 to represent how many copies of the minor allele a person has at that position. The best guess is derived from the genotype probabilities calculated during imputation.
  • Dosage. Imputation gives the probability that the person has 0, 1 or 2 copies of the minor allele, e.g. 0.1, 0.2, 0.7; these sum to one across the three possibilities (for each SNP and each individual). A dosage is usually summarised as the expected number of minor alleles, P(1) + 2 × P(2), a continuous value between 0 and 2 (1.6 in this example).
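The relationship between the two representations can be made concrete with the 0.1, 0.2, 0.7 example. The sketch below is illustrative: `best_guess` returns the most probable minor-allele count, and `expected_dosage` returns the expected count P(1) + 2 × P(2), the continuous value usually meant by "dosage" in association software. The optional `threshold`, which leaves uncertain calls missing, is an assumption for illustration; each tool applies its own rule and default when making hard calls.

```python
def best_guess(probs, threshold=None):
    """Most probable minor-allele count (0, 1 or 2) from (P0, P1, P2).

    If threshold is given, return None when the top probability falls
    below it, leaving uncertain calls missing (illustrative rule).
    """
    top = max(probs)
    if threshold is not None and top < threshold:
        return None
    return list(probs).index(top)


def expected_dosage(probs):
    """Expected minor-allele count 0*P0 + 1*P1 + 2*P2, between 0 and 2."""
    _, p1, p2 = probs
    return p1 + 2 * p2
```

For the example above, `best_guess((0.1, 0.2, 0.7))` gives 2 while `expected_dosage` gives approximately 1.6, so the dosage keeps the 30% chance that the true count is lower.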

8.9. SNP Statistics

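You can generate statistics on your SNP data using the program qctool. Per-variant statistics (-snp-stats) include the imputation information (info) scores, and per-sample statistics (-sample-stats) can be generated in a similar way. For example:

qctool -g example.bgen -snp-stats -osnp snp-stats.txt
qctool -g example.bgen -s example.sample -sample-stats -osample sample-stats.txt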
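To show what an info score measures, here is a sketch of the IMPUTE-style information measure computed from genotype probabilities, using the formula as we understand it: with e_i = P1 + 2·P2 and f_i = P1 + 4·P2 for individual i, and θ = Σe_i / 2N, info = 1 − Σ(f_i − e_i²) / (2Nθ(1 − θ)). It equals 1 when every genotype is called with certainty and falls as uncertainty grows. qctool's implementation may differ in detail (for example in its handling of missing data), so treat this as illustrative and use qctool's output in analyses; `impute_info` is a hypothetical name.

```python
def impute_info(prob_triples):
    """IMPUTE-style information measure for one SNP.

    prob_triples: one (P0, P1, P2) tuple per individual, where Pk is the
    probability of carrying k copies of the coded allele.
    Returns 1.0 for monomorphic SNPs, where the ratio is undefined.
    """
    n = len(prob_triples)
    # e: expected allele count; f: expected squared allele count
    e = [p1 + 2 * p2 for _, p1, p2 in prob_triples]
    f = [p1 + 4 * p2 for _, p1, p2 in prob_triples]
    theta = sum(e) / (2 * n)  # estimated allele frequency
    if theta <= 0 or theta >= 1:
        return 1.0
    # f - e^2 is each individual's variance in allele count
    excess_variance = sum(fi - ei * ei for ei, fi in zip(e, f))
    return 1 - excess_variance / (2 * n * theta * (1 - theta))
```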

8.10. Best practice

8.10.1. GWAS

We recommend you follow the steps outlined in the following paper when performing GWAS: Marees, Andries T., et al. "A tutorial on conducting genome‐wide association studies: Quality control and statistical analysis." International journal of methods in psychiatric research 27.2 (2018): e1608. https://doi.org/10.1002/mpr.1608

8.10.2. Phewas

We recommend you follow the steps outlined in the following paper when performing Phewas: Millard, L., Davies, N., Timpson, N. et al. MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization. Sci Rep 5, 16645 (2015). https://doi.org/10.1038/srep16645

8.10.3. Methylation

The following paper describes the methylation data available in ALSPAC: Relton, Caroline L., et al. "Data resource profile: accessible resource for integrated epigenomic studies (ARIES)." International journal of epidemiology 44.4 (2015): 1181-1190.

8.11. Population stratification

Population stratification occurs when an observed genetic association is due to differences in ancestry between subgroups of the population (for example, by geography) rather than to the variant itself. Not taking this into account can lead to biased estimates of effects. One common way to account for it is to calculate principal components of the genetic data and then include these as covariates in any models. Principal components can be generated using plink or other tools.

For more information about how to do this in plink see: https://www.cog-genomics.org/plink/1.9/strat

Another common method used to account for population substructure is linear mixed modelling, for example using the BOLT-LMM software tool.

https://data.broadinstitute.org/alkesgroup/BOLT-LMM/

8.12. Common tasks

Here we provide links to webpages with instructions, or brief example code, for completing common tasks using the software described above (see Key Omics software):

  • Extract some SNPs from a bgen data file and convert to plain text.

https://www.well.ox.ac.uk/~gav/qctool_v2/documentation/examples/filtering_variants.html

  • Extract some SNPs from bed data:

http://zzz.bwh.harvard.edu/plink/dataman.shtml

plink --bfile mydata --chr 2 --from-kb 5000 --to-kb 10000 --make-bed --out mydata_region
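The same region can also be selected by reading the .bim file directly, which is handy when you want to inspect or edit the variant list first. A minimal Python sketch, assuming the six whitespace-separated .bim columns (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2), chromosome codes without a "chr" prefix, and inclusive region bounds; `variants_in_region` is a hypothetical helper.

```python
def variants_in_region(bim_lines, chrom, from_kb, to_kb):
    """Yield IDs of variants on chrom with from_kb*1000 <= position <= to_kb*1000.

    bim_lines: lines of a .bim file (chromosome, variant ID, cM,
    base-pair position, allele 1, allele 2).
    """
    start, end = from_kb * 1000, to_kb * 1000
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if fields[0] == str(chrom) and start <= int(fields[3]) <= end:
            yield fields[1]
```

The IDs can be written one per line to a file (for example a hypothetical region_snps.txt) and passed to PLINK with `--extract region_snps.txt` alongside `--bfile`, `--make-bed` and `--out`.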

  • Reading .bgen and .sample oxford files in plink

Plink can read .bgen files, but it is strict about the column types in the accompanying data.sample file. You may need to remove or retype columns before plink will read a data.sample file. For more info see:

https://www.cog-genomics.org/plink/2.0/input

To make a new sample file that keeps only the first three columns, you can use the Unix command: 'cut -f 1,2,3 -d " " data.sample > data2.sample'
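`cut` selects columns by position. If you would rather keep columns by name, for example to drop covariates whose types plink rejects, a minimal Python sketch follows. It assumes a space-delimited .sample file with the two header lines described earlier, always keeps ID_1, ID_2 and missing, and uses the hypothetical helper name `select_sample_columns`.

```python
def select_sample_columns(text, keep):
    """Keep only the named columns of .sample text, preserving both header lines.

    ID_1, ID_2 and missing are always kept, in that order, followed by
    the columns listed in keep.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    fixed = ["ID_1", "ID_2", "missing"]
    wanted = fixed + [c for c in keep if c not in fixed]
    absent = [c for c in wanted if c not in header]
    if absent:
        raise KeyError(f"columns not in header: {absent}")
    idx = [header.index(c) for c in wanted]
    # Join with single spaces: the .sample format expects spaces, not tabs
    return "".join(" ".join(fields[i] for i in idx) + "\n" for fields in rows)
```

For example, reading data.sample into a string, passing it with `keep=["pheno1"]` and writing the result to data2.sample gives a file with just the ID, missing and pheno1 columns.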

8.13. Courses

Working with 'Omics data can be complicated, but there are many excellent resources available to help you learn how to do this. There are both paid in-person courses and free online courses.

Details of paid courses offered by the University of Bristol can be found here: https://www.bristol.ac.uk/medical-school/study/short-courses/ In addition, a number of free online courses are summarised here: https://www.mooc-list.com/tags/bioinformatics

8.14. Further sources of help

8.14.1. Stack exchange

Stack Exchange is an online Q&A community that is divided into different sub-communities. The first and best known is Stack Overflow, one of the best places on the Internet to ask questions about programming. Other useful sites include bioinformatics https://bioinformatics.stackexchange.com/, maths https://mathoverflow.net/ and statistics https://stats.stackexchange.com/.

8.14.2. Bio-stars

Biostars is a bioinformatics community Q&A website: https://www.biostars.org/

8.14.3. Mailing lists

Individual products and projects often have a mailing list. For example, to get help using SNPTEST you can ask on its mailing list: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html#contact

8.14.4. AI tools

AI tools such as ChatGPT can be useful for understanding how to work with omics data.

8.14.5. Ask ALSPAC

If you cannot find the answer to your question, or you think there is something wrong with your data, please contact the alspac-omics@bristol.ac.uk mailbox and we will do our best to help you.

Author: ALSPAC Omics team

Created: 2023-12-04 Mon 11:41
