LieberInstitute/Brain_INTERACT

This repository contains code to predict DNA methylation regulatory variants in specific brain cell types.

Hardware requirements

The INTERACT package requires only a standard computer with GPUs and enough RAM to support in-memory operations.

Software requirements

OS Requirements

This package is supported on Linux and has been tested on Rocky Linux 9.2.

Python Dependencies

INTERACT mainly depends on the following Python packages.
PyTorch
apex
numpy
scipy
scikit-learn
pandas
loompy
json (part of the Python standard library)
h5py
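
Before running any of the steps below, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch, not part of the repository; the helper name `missing_modules` is hypothetical, and the import names assume the usual conventions (PyTorch is imported as `torch`, scikit-learn as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages listed above. PyTorch is imported as
# "torch" and scikit-learn as "sklearn"; json ships with Python itself.
REQUIRED_MODULES = ["torch", "apex", "numpy", "scipy", "sklearn",
                    "pandas", "loompy", "json", "h5py"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located; it does not verify versions or GPU support.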

Usage

1. Pretraining

1.1 Calculate DNAm levels for CpG and CH sites in pseudo-bulk for cell types.

Example

Construct DNAm levels and 2kb CpG-centered DNA sequences for CpG sites in chromosome 1 for L23-IT.

$ python run_CG_methyl.py L23-IT chr1
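
To illustrate what a "2kb CpG-centered DNA sequence" might look like, here is a minimal sketch that cuts a fixed-width window around each CpG cytosine. It is not the code in `run_CG_methyl.py`. The function names, the 0-based coordinates, placing the cytosine at index `width // 2`, and padding chromosome ends with `N` are all assumptions, since the README does not specify these details.

```python
def centered_window(chrom_seq, c_pos, width=2000, pad="N"):
    """Return a `width`-bp window whose index width // 2 is the base at c_pos.

    c_pos is the 0-based position of the cytosine. Positions that fall
    outside the chromosome are filled with `pad` (an assumption; the
    README does not say how chromosome ends are handled).
    """
    if not 0 <= c_pos < len(chrom_seq):
        raise ValueError("c_pos is outside the sequence")
    start = c_pos - width // 2
    end = start + width
    left_pad = pad * max(0, -start)
    right_pad = pad * max(0, end - len(chrom_seq))
    return left_pad + chrom_seq[max(0, start):min(end, len(chrom_seq))] + right_pad


def cpg_positions(chrom_seq):
    """Yield 0-based positions of the C in every CG dinucleotide (forward strand)."""
    seq = chrom_seq.upper()
    pos = seq.find("CG")
    while pos != -1:
        yield pos
        pos = seq.find("CG", pos + 1)


# Toy sequence with a small window so the result is easy to inspect.
toy = "AACGTTTCGA"
for pos in cpg_positions(toy):
    print(pos, centered_window(toy, pos, width=6))
```

The same windowing would apply to CH contexts by searching for the relevant trinucleotide instead of `CG`.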

Example

Construct DNAm levels and 2kb CAC-centered DNA sequences for CAC sites in chromosome 1 for L23-IT.

$ python run_CAC_methyl.py L23-IT chr1

Example

Construct DNAm levels and 2kb CAG-centered DNA sequences for CAG sites in chromosome 1 for L23-IT.

$ python run_CAG_methyl.py L23-IT chr1

Example

Construct DNAm levels and 2kb CAT-centered DNA sequences for CAT sites in chromosome 1 for L23-IT.

$ python run_CAT_methyl.py L23-IT chr1

Example

Construct DNAm levels and 2kb CTC-centered DNA sequences for CTC sites in chromosome 1 for L23-IT.

$ python run_CTC_methyl.py L23-IT chr1

1.2 Construct DNAm levels for CpG and CH sites for excitatory neuronal, inhibitory neuronal or glial cells by merging their corresponding cell types.

Example

Merge DNAm levels and 2kb CpG-centered DNA sequences for CpG sites in chromosome 1 for excitatory neuronal cells.

$ python run_methylation.py CG chr1

Example

Merge DNAm levels and 2kb CAC-centered DNA sequences for CAC sites in chromosome 1 for excitatory neuronal cells.

$ python run_methylation.py CAC chr1

Example

Merge DNAm levels and 2kb CAG-centered DNA sequences for CAG sites in chromosome 1 for excitatory neuronal cells.

$ python run_methylation.py CAG chr1

Example

Merge DNAm levels and 2kb CAT-centered DNA sequences for CAT sites in chromosome 1 for excitatory neuronal cells.

$ python run_methylation.py CAT chr1

Example

Merge DNAm levels and 2kb CTC-centered DNA sequences for CTC sites in chromosome 1 for excitatory neuronal cells.

$ python run_methylation.py CTC chr1
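
The README does not describe how `run_methylation.py` merges cell types. One common way to build a pseudo-bulk level for a broader class is to pool the methylated and total read counts per site across the member cell types and then divide. The sketch below shows that idea only; the function name, the input layout, the second cell-type label, and the coordinates are hypothetical.

```python
from collections import defaultdict


def merge_pseudobulk(per_cell_type_counts):
    """Pool per-site read counts across cell types and return DNAm levels.

    per_cell_type_counts maps a cell-type name to a dict of
    site -> (methylated_reads, total_reads). This layout is hypothetical.
    Sites with zero pooled coverage are dropped.
    """
    pooled = defaultdict(lambda: [0, 0])
    for site_counts in per_cell_type_counts.values():
        for site, (mc, cov) in site_counts.items():
            pooled[site][0] += mc
            pooled[site][1] += cov
    return {site: mc / cov for site, (mc, cov) in pooled.items() if cov > 0}


# Made-up counts; "other_excitatory_type" is a placeholder label.
counts = {
    "L23-IT": {("chr1", 10468): (3, 4), ("chr1", 10470): (0, 2)},
    "other_excitatory_type": {("chr1", 10468): (1, 4)},
}
levels = merge_pseudobulk(counts)
print(levels)
```

Pooling counts weights each cell type by its coverage; averaging per-cell-type levels instead would weight them equally, which is a different design choice.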

1.3 Pretrain INTERACT model

Example

Pretrain the INTERACT model with DNAm levels for CpG sites from excitatory neurons using four GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_methylation_regression \
        --exp_name multask_methylation_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CG/MulTask/Reference \
        --output_dir ./outputs/MulTask/snmC-seq3_CG/Excitatory \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json

Example

Pretrain the INTERACT model with DNAm levels for CAC sites from excitatory neurons using four GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CAC/MulTask \
        --output_dir ./outputs/MulTask/snmC-seq3_CAC/Excitatory \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json

Example

Pretrain the INTERACT model with DNAm levels for CAG sites from excitatory neurons using four GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CAG/MulTask \
        --output_dir ./outputs/MulTask/snmC-seq3_CAG/Excitatory \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json

Example

Pretrain the INTERACT model with DNAm levels for CAT sites from excitatory neurons using four GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CAT/MulTask \
        --output_dir ./outputs/MulTask/snmC-seq3_CAT/Excitatory \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json

Example

Pretrain the INTERACT model with DNAm levels for CTC sites from excitatory neurons using four GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CTC/MulTask \
        --output_dir ./outputs/MulTask/snmC-seq3_CTC/Excitatory \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json
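
The five pretraining commands above differ only in the task name and the context-specific directories. As a convenience, the following sketch builds the argument list for one context. The helper name `pretrain_command` is hypothetical; every flag and path is copied from the examples above, and nothing here is run automatically.

```python
def pretrain_command(context, gpus=(0, 1, 2, 3)):
    """Return (env, argv) for pretraining on one cytosine context.

    Flags and paths are copied from the README examples; only the task
    name and the context-specific directories vary.
    """
    if context == "CG":
        task = "multask_methylation_regression"
        data_dir = "./datasets/snmQTL/snmC-seq3_CG/MulTask/Reference"
    elif context in ("CAC", "CAG", "CAT", "CTC"):
        task = "multask_CH_regression"
        data_dir = f"./datasets/snmQTL/snmC-seq3_{context}/MulTask"
    else:
        raise ValueError(f"unknown context: {context}")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus)}
    argv = [
        "python3", "-m", "torch.distributed.launch", "main.py",
        "transformer", task,
        "--exp_name", task,
        "--learning_rate", "0.000176",
        "--batch_size", "128",
        "--data_dir", data_dir,
        "--output_dir", f"./outputs/MulTask/snmC-seq3_{context}/Excitatory",
        "--warmup_steps", "10000",
        "--gradient_accumulation_steps", "1",
        "--fp16", "--local_rank", "0",
        "--nproc_per_node", str(len(gpus)),
        "--model_config_file", "./config/config.json",
    ]
    return env, argv


env, argv = pretrain_command("CAC")
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

To actually launch a job, the result could be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)` from the repository root.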

2. Training

2.1 Fine-tune pretrained INTERACT models for each cell type.

Example

Fine-tune the INTERACT model with DNAm levels for CpG sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_methylation_regression \
        --exp_name multask_methylation_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CG/MulTask/Cell_type/L23-IT \
        --output_dir ./outputs/MulTask/snmC-seq3_CG/L23-IT \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CG/Excitatory

Example

Fine-tune the INTERACT model with DNAm levels for CAC sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CAC/Cell_type/L23-IT \
        --output_dir ./outputs/MulTask/snmC-seq3_CAC/L23-IT \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAC/Excitatory

Example

Fine-tune the INTERACT model with DNAm levels for CAG sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CAG/Cell_type/L23-IT \
        --output_dir ./outputs/MulTask/snmC-seq3_CAG/L23-IT \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAG/Excitatory

Example

Fine-tune the INTERACT model with DNAm levels for CAT sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CAT/Cell_type/L23-IT \
        --output_dir ./outputs/MulTask/snmC-seq3_CAT/L23-IT \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAT/Excitatory

Example

Fine-tune the INTERACT model with DNAm levels for CTC sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
        --exp_name multask_CH_regression \
        --learning_rate 0.000176 \
        --batch_size 128 \
        --data_dir ./datasets/snmQTL/snmC-seq3_CTC/Cell_type/L23-IT \
        --output_dir ./outputs/MulTask/snmC-seq3_CTC/L23-IT \
        --warmup_steps 10000 \
        --gradient_accumulation_steps 1 \
        --fp16 --local_rank 0 \
        --nproc_per_node 4 \
        --model_config_file ./config/config.json \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CTC/Excitatory

3. Prediction

3.1 Extract DNA sequences for mutations with reference allele or variation allele

3.1.1 CpG-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT

python run_variant.py CG L23-IT chr1

3.1.2 CAC-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT

python run_variant.py CAC L23-IT chr1

3.1.3 CAG-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT

python run_variant.py CAG L23-IT chr1

3.1.4 CAT-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT

python run_variant.py CAT L23-IT chr1

3.1.5 CTC-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT

python run_variant.py CTC L23-IT chr1
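
The idea behind this step is to produce two versions of each sequence window: one carrying the reference allele and one carrying the alternative allele. The sketch below shows this for a single-nucleotide variant. It is not the code in `run_variant.py`; the function name, the 0-based offset convention, and the restriction to single-base alleles are assumptions made for illustration.

```python
def ref_and_alt_sequences(window, offset, ref, alt):
    """Return (reference, alternative) versions of a sequence window.

    window: DNA string around the cytosine of interest
    offset: 0-based index of the variant inside the window
    ref, alt: single-base alleles (only SNVs are handled in this sketch)
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-nucleotide variants")
    if window[offset].upper() != ref.upper():
        raise ValueError(
            f"window has {window[offset]!r} at offset {offset}, expected {ref!r}"
        )
    return window, window[:offset] + alt + window[offset + 1:]


# Toy window: a T>C variant one base into the sequence.
ref_seq, alt_seq = ref_and_alt_sequences("TTACGGA", offset=1, ref="T", alt="C")
print(ref_seq, alt_seq)
```

Checking the reference base before substituting guards against coordinate or strand mismatches between the variant file and the genome.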

3.2 Predict DNAm levels from DNA sequences with reference allele or variation allele using one GPU.

3.2.1 Predict DNAm levels for CpG sites from DNA sequences with reference allele or variation allele using one GPU.

Example

Predict DNAm levels of CpG sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CG_mQTL_prediction \
        --exp_name CG_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/CG/L23-IT/reference \
        --output_dir ./outputs/mQTL/CG_mQTL/L23-IT/reference \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CG/L23-IT \
        --split chr1

Example

Predict DNAm levels of CpG sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CG_mQTL_prediction \
        --exp_name CG_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/CG/L23-IT/variation \
        --output_dir ./outputs/mQTL/CG_mQTL/L23-IT/variation \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CG/L23-IT \
        --split chr1

3.2.2 Predict DNAm levels for CAC sites from DNA sequences with reference allele or variation allele using one GPU.

Example

Predict DNAm levels of CAC sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CAC/L23-IT/reference \
        --output_dir ./outputs/mQTL/plus_CAC_mQTL/L23-IT/reference \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAC/L23-IT \
        --split chr1

Example

Predict DNAm levels of CAC sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CAC/L23-IT/variation \
        --output_dir ./outputs/mQTL/plus_CAC_mQTL/L23-IT/variation \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAC/L23-IT \
        --split chr1

3.2.3 Predict DNAm levels for CAG sites from DNA sequences with reference allele or variation allele using one GPU.

Example

Predict DNAm levels of CAG sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CAG/L23-IT/reference \
        --output_dir ./outputs/mQTL/plus_CAG_mQTL/L23-IT/reference \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAG/L23-IT \
        --split chr1

Example

Predict DNAm levels of CAG sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CAG/L23-IT/variation \
        --output_dir ./outputs/mQTL/plus_CAG_mQTL/L23-IT/variation \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAG/L23-IT \
        --split chr1

3.2.4 Predict DNAm levels for CAT sites from DNA sequences with reference allele or variation allele using one GPU.

Example

Predict DNAm levels of CAT sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CAT/L23-IT/reference \
        --output_dir ./outputs/mQTL/plus_CAT_mQTL/L23-IT/reference \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAT/L23-IT \
        --split chr1

Example

Predict DNAm levels of CAT sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CAT/L23-IT/variation \
        --output_dir ./outputs/mQTL/plus_CAT_mQTL/L23-IT/variation \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CAT/L23-IT \
        --split chr1

3.2.5 Predict DNAm levels for CTC sites from DNA sequences with reference allele or variation allele using one GPU.

Example

Predict DNAm levels of CTC sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CTC/L23-IT/reference \
        --output_dir ./outputs/mQTL/plus_CTC_mQTL/L23-IT/reference \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CTC/L23-IT \
        --split chr1

Example

Predict DNAm levels of CTC sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the fine-tuned INTERACT model.

CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
        --exp_name CH_mQTL_prediction \
        --batch_size 2048 \
        --num_workers 2 \
        --learning_rate 0.000176 \
        --warmup_steps 20000 \
        --gradient_accumulation_steps 1 \
        --data_dir ./datasets/genome_snp/plus_CTC/L23-IT/variation \
        --output_dir ./outputs/mQTL/plus_CTC_mQTL/L23-IT/variation \
        --num_train_epochs 1 \
        --from_pretrained ./outputs/MulTask/snmC-seq3_CTC/L23-IT \
        --split chr1
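
The prediction commands in section 3.2 follow a regular directory layout: CpG runs use a `CG` path component, while CH runs use a `plus_` prefix before the context name. The following sketch captures that pattern so the paths for any context and allele can be generated consistently. The helper name `prediction_paths` is hypothetical; the path patterns are taken from the examples above, and cell types other than L23-IT are assumed to follow the same layout.

```python
def prediction_paths(context, cell_type, allele):
    """Return the task name and directories used in the README examples.

    allele is "reference" or "variation". CH contexts use a "plus_"
    prefix in their dataset and output paths, as in the commands above.
    """
    if allele not in ("reference", "variation"):
        raise ValueError(f"unknown allele: {allele}")
    if context == "CG":
        task, tag = "CG_mQTL_prediction", "CG"
    elif context in ("CAC", "CAG", "CAT", "CTC"):
        task, tag = "CH_mQTL_prediction", f"plus_{context}"
    else:
        raise ValueError(f"unknown context: {context}")
    return {
        "task": task,
        "data_dir": f"./datasets/genome_snp/{tag}/{cell_type}/{allele}",
        "output_dir": f"./outputs/mQTL/{tag}_mQTL/{cell_type}/{allele}",
        "from_pretrained": f"./outputs/MulTask/snmC-seq3_{context}/{cell_type}",
    }


for allele in ("reference", "variation"):
    print(prediction_paths("CAC", "L23-IT", allele))
```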

3.3 Calculate absolute difference of DNAm levels between the two DNA sequences with reference and alternative alleles.

3.3.1 Calculate absolute difference of DNAm levels for CpG sites between the two DNA sequences with reference and alternative alleles.

Example

Calculate the absolute DNAm difference for CpG sites in chromosome 1 for L23-IT.

python Fine_mapping.py CG L23-IT chr1

3.3.2 Calculate absolute difference of DNAm levels for CAC sites between the two DNA sequences with reference and alternative alleles.

Example

Calculate the absolute DNAm difference for CAC sites in chromosome 1 for L23-IT.

python Fine_mapping.py CAC L23-IT chr1

3.3.3 Calculate absolute difference of DNAm levels for CAG sites between the two DNA sequences with reference and alternative alleles.

Example

Calculate the absolute DNAm difference for CAG sites in chromosome 1 for L23-IT.

python Fine_mapping.py CAG L23-IT chr1

3.3.4 Calculate absolute difference of DNAm levels for CAT sites between the two DNA sequences with reference and alternative alleles.

Example

Calculate the absolute DNAm difference for CAT sites in chromosome 1 for L23-IT.

python Fine_mapping.py CAT L23-IT chr1

3.3.5 Calculate absolute difference of DNAm levels for CTC sites between the two DNA sequences with reference and alternative alleles.

Example

Calculate the absolute DNAm difference for CTC sites in chromosome 1 for L23-IT.

python Fine_mapping.py CTC L23-IT chr1
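
The quantity computed in section 3.3 is the absolute difference between the DNAm level predicted for the alternative-allele sequence and the one predicted for the reference-allele sequence. The sketch below shows that arithmetic on made-up values. It is not the code in `Fine_mapping.py`; the function name, the dictionary layout, and the variant and site identifiers are hypothetical.

```python
def absolute_dnam_differences(ref_pred, alt_pred):
    """Return |alt - ref| for every key present in both prediction dicts.

    Keys are hypothetical (variant_id, site_id) pairs; values are
    predicted DNAm levels.
    """
    shared = ref_pred.keys() & alt_pred.keys()
    return {key: abs(alt_pred[key] - ref_pred[key]) for key in shared}


# Made-up predictions for two variant/site pairs.
ref_pred = {("rs1", "chr1:10468"): 0.75, ("rs2", "chr1:10470"): 0.25}
alt_pred = {("rs1", "chr1:10468"): 0.25, ("rs2", "chr1:10470"): 0.25}
diffs = absolute_dnam_differences(ref_pred, alt_pred)

# Rank pairs by effect size, largest first.
for key, value in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True):
    print(key, value)
```

Pairs with larger differences are the ones where the model predicts the variant has a stronger effect on methylation at that site.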
