This repository contains code to predict DNA methylation (DNAm) regulatory variants in specific brain cell types.
The INTERACT package requires only a standard computer with GPUs and enough RAM to support its in-memory operations.
The package runs on Linux and has been tested on Rocky Linux 9.2.
INTERACT mainly depends on the following Python packages.
PyTorch
apex
numpy
scipy
scikit-learn
pandas
loompy
json
h5py
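Before running the pipeline, it can help to confirm that these packages can be located in the active environment. The following is a minimal standard-library sketch and is not a script shipped with the repository. It assumes the usual import names: torch for PyTorch and sklearn for scikit-learn. find_spec only confirms that a module with a given name exists. It does not check versions, and it cannot tell whether "apex" is NVIDIA's Apex build.

```python
import importlib.util

# Import names for the packages listed above (assumed standard names).
REQUIRED_MODULES = ["torch", "apex", "numpy", "scipy", "sklearn",
                    "pandas", "loompy", "json", "h5py"]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules can be located.")
    # If PyTorch is present, also report how many CUDA devices it can see.
    if importlib.util.find_spec("torch") is not None:
        import torch
        print("Visible CUDA devices:", torch.cuda.device_count())
```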
Construct DNAm levels and 2kb CpG-centered DNA sequences for CpG sites in chromosome 1 for L23-IT.
$ python run_CG_methyl.py L23-IT chr1
Construct DNAm levels and 2kb CAC-centered DNA sequences for CAC sites in chromosome 1 for L23-IT.
$ python run_CAC_methyl.py L23-IT chr1
Construct DNAm levels and 2kb CAG-centered DNA sequences for CAG sites in chromosome 1 for L23-IT.
$ python run_CAG_methyl.py L23-IT chr1
Construct DNAm levels and 2kb CAT-centered DNA sequences for CAT sites in chromosome 1 for L23-IT.
$ python run_CAT_methyl.py L23-IT chr1
Construct DNAm levels and 2kb CTC-centered DNA sequences for CTC sites in chromosome 1 for L23-IT.
$ python run_CTC_methyl.py L23-IT chr1
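The scripts above pair each site's DNAm level with a 2kb sequence centered on that site. The sketch below shows one way such windows could be built. It is not taken from run_*_methyl.py, and it makes several assumptions:
- Only the plus strand is scanned.
- The window is centered on the first cytosine of the motif.
- The window is the half-open interval [center - 1000, center + 1000).
- Sequences are one-hot encoded in A, C, G, T order.

Motifs must be passed in upper case ("CG" for CpG).

```python
import numpy as np

WINDOW = 2000  # "2kb" window length; the exact boundary convention is an assumption
BASES = "ACGT"


def motif_sites(seq, motif):
    """0-based start positions of every (possibly overlapping) occurrence of motif on the plus strand."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def centered_window(seq, center, width=WINDOW):
    """Return the width-long slice whose middle base (index width // 2) is seq[center].

    Returns None when the window would run past either end of the sequence.
    """
    half = width // 2
    if center - half < 0 or center + half > len(seq):
        return None
    return seq[center - half:center + half]


def one_hot(seq):
    """Encode DNA as a (len(seq), 4) float32 array in A, C, G, T order; N and other symbols stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = index.get(base)
        if j is not None:
            encoded[i, j] = 1.0
    return encoded


if __name__ == "__main__":
    # Toy chromosome: one CAC site flanked by 1.5 kb of A on each side.
    chrom = "A" * 1500 + "CAC" + "A" * 1500
    for pos in motif_sites(chrom, "CAC"):
        window = centered_window(chrom, pos)
        if window is not None:
            print(pos, len(window), window[WINDOW // 2:WINDOW // 2 + 3], one_hot(window).shape)
```

Sites closer than 1kb to either chromosome end are skipped here. Padding with N is an equally plausible choice.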
1.2 Construct DNAm levels for CpG and CH sites (CH: a cytosine followed by A, C, or T) for excitatory neurons, inhibitory neurons, or glial cells by merging the corresponding cell types.
Merge DNAm levels and 2kb CpG-centered DNA sequences for CpG sites in chromosome 1 for excitatory neuronal cells.
$ python run_methylation.py CG chr1
Merge DNAm levels and 2kb CAC-centered DNA sequences for CAC sites in chromosome 1 for excitatory neuronal cells.
$ python run_methylation.py CAC chr1
Merge DNAm levels and 2kb CAG-centered DNA sequences for CAG sites in chromosome 1 for excitatory neuronal cells.
$ python run_methylation.py CAG chr1
Merge DNAm levels and 2kb CAT-centered DNA sequences for CAT sites in chromosome 1 for excitatory neuronal cells.
$ python run_methylation.py CAT chr1
Merge DNAm levels and 2kb CTC-centered DNA sequences for CTC sites in chromosome 1 for excitatory neuronal cells.
$ python run_methylation.py CTC chr1
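A common way to merge DNAm levels across cell types is to pool read counts per site before taking the ratio. The sketch below assumes per-cell-type tables of methylated (mc) and total (cov) read counts keyed by site. That input layout is hypothetical, and the sketch is not necessarily what run_methylation.py does.

```python
def merge_cell_types(per_cell_type):
    """Pool methylated (mc) and total (cov) read counts per site across cell types.

    per_cell_type: iterable of dicts, one per cell type, mapping a site key such
    as (chrom, pos) to an (mc, cov) tuple. Returns a dict mapping each site to its
    pooled DNAm level sum(mc) / sum(cov), skipping sites with zero total coverage.
    """
    totals = {}
    for table in per_cell_type:
        for site, (mc, cov) in table.items():
            prev_mc, prev_cov = totals.get(site, (0, 0))
            totals[site] = (prev_mc + mc, prev_cov + cov)
    return {site: mc / cov for site, (mc, cov) in totals.items() if cov > 0}


if __name__ == "__main__":
    # Two hypothetical cell types with made-up counts.
    cell_type_a = {("chr1", 10): (3, 4), ("chr1", 20): (0, 2)}
    cell_type_b = {("chr1", 10): (1, 4)}
    print(merge_cell_types([cell_type_a, cell_type_b]))
    # prints {('chr1', 10): 0.5, ('chr1', 20): 0.0}
```

Pooling counts weights each cell type by its coverage at a site. Averaging the per-cell-type levels would instead weight every cell type equally.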
Pretrain the INTERACT model with DNAm levels for CpG sites from excitatory neurons using four GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_methylation_regression \
--exp_name multask_methylation_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CG/MulTask/Reference \
--output_dir ./outputs/MulTask/snmC-seq3_CG/Excitatory \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json
Pretrain the INTERACT model with DNAm levels for CAC sites from excitatory neurons using four GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CAC/MulTask \
--output_dir ./outputs/MulTask/snmC-seq3_CAC/Excitatory \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json
Pretrain the INTERACT model with DNAm levels for CAG sites from excitatory neurons using four GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CAG/MulTask \
--output_dir ./outputs/MulTask/snmC-seq3_CAG/Excitatory \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json
Pretrain the INTERACT model with DNAm levels for CAT sites from excitatory neurons using four GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CAT/MulTask \
--output_dir ./outputs/MulTask/snmC-seq3_CAT/Excitatory \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json
Pretrain the INTERACT model with DNAm levels for CTC sites from excitatory neurons using four GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CTC/MulTask \
--output_dir ./outputs/MulTask/snmC-seq3_CTC/Excitatory \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json
Finetune the INTERACT model with DNAm levels for CpG sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_methylation_regression \
--exp_name multask_methylation_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CG/MulTask/Cell_type/L23-IT \
--output_dir ./outputs/MulTask/snmC-seq3_CG/L23-IT \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json \
--from_pretrained ./outputs/MulTask/snmC-seq3_CG/Excitatory
Finetune the INTERACT model with DNAm levels for CAC sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CAC/Cell_type/L23-IT \
--output_dir ./outputs/MulTask/snmC-seq3_CAC/L23-IT \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAC/Excitatory
Finetune the INTERACT model with DNAm levels for CAG sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CAG/Cell_type/L23-IT \
--output_dir ./outputs/MulTask/snmC-seq3_CAG/L23-IT \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAG/Excitatory
Finetune the INTERACT model with DNAm levels for CAT sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CAT/Cell_type/L23-IT \
--output_dir ./outputs/MulTask/snmC-seq3_CAT/L23-IT \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAT/Excitatory
Finetune the INTERACT model with DNAm levels for CTC sites from L23-IT using four GPUs, starting from the pretrained excitatory neuron model.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch main.py transformer multask_CH_regression \
--exp_name multask_CH_regression \
--learning_rate 0.000176 \
--batch_size 128 \
--data_dir ./datasets/snmQTL/snmC-seq3_CTC/Cell_type/L23-IT \
--output_dir ./outputs/MulTask/snmC-seq3_CTC/L23-IT \
--warmup_steps 10000 \
--gradient_accumulation_steps 1 \
--fp16 --local_rank 0 \
--nproc_per_node 4 \
--model_config_file ./config/config.json \
--from_pretrained ./outputs/MulTask/snmC-seq3_CTC/Excitatory
3.1.1 CpG-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT
python run_variant.py CG L23-IT chr1
3.1.2 CAC-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT
python run_variant.py CAC L23-IT chr1
3.1.3 CAG-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT
python run_variant.py CAG L23-IT chr1
3.1.4 CAT-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT
python run_variant.py CAT L23-IT chr1
3.1.5 CTC-centered DNA sequence extraction for mutations in chromosome 1 using reference and alternative alleles in L23-IT
python run_variant.py CTC L23-IT chr1
3.2.1 Predict DNAm levels for CpG sites from DNA sequences with the reference or variation allele using one GPU.
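The reference and variation inputs scored in this step come from section 3.1. For a single-nucleotide variant, the two sequences differ only at the variant position. The sketch below shows how such a pair could be formed from a reference window. It assumes the variant is given as a 0-based offset within the window together with its reference and alternative bases. This is a hypothetical interface, not the input format of run_variant.py, and indels are out of scope.

```python
def make_allele_pair(ref_window, offset, ref_allele, alt_allele):
    """Return (reference, variation) sequences for a single-nucleotide variant.

    ref_window: reference DNA window (e.g. a 2kb site-centered sequence).
    offset: 0-based position of the variant inside the window (hypothetical input).
    Raises ValueError if the window does not carry the expected reference base.
    """
    if ref_window[offset].upper() != ref_allele.upper():
        raise ValueError(
            f"expected {ref_allele} at offset {offset}, found {ref_window[offset]}")
    alt_window = ref_window[:offset] + alt_allele + ref_window[offset + 1:]
    return ref_window, alt_window


def motif_at(seq, start, motif):
    """True if motif occurs at position start, e.g. to check whether a CpG survives the variant."""
    return seq[start:start + len(motif)].upper() == motif.upper()


if __name__ == "__main__":
    # Toy window with a CpG at positions 4-5; the variant turns its G into A.
    ref, alt = make_allele_pair("AAAACGAAAA", 5, "G", "A")
    print(ref, alt, motif_at(ref, 4, "CG"), motif_at(alt, 4, "CG"))
```

Checking the reference base first catches coordinate or strand mismatches before any prediction is run.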
Predict DNAm levels of CpG sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CG_mQTL_prediction \
--exp_name CG_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/CG/L23-IT/reference \
--output_dir ./outputs/mQTL/CG_mQTL/L23-IT/reference \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CG/L23-IT \
--split chr1
Predict DNAm levels of CpG sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CG_mQTL_prediction \
--exp_name CG_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/CG/L23-IT/variation \
--output_dir ./outputs/mQTL/CG_mQTL/L23-IT/variation \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CG/L23-IT \
--split chr1
3.2.2 Predict DNAm levels for CAC sites from DNA sequences with the reference or variation allele using one GPU.
Predict DNAm levels of CAC sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CAC/L23-IT/reference \
--output_dir ./outputs/mQTL/plus_CAC_mQTL/L23-IT/reference \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAC/L23-IT \
--split chr1
Predict DNAm levels of CAC sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CAC/L23-IT/variation \
--output_dir ./outputs/mQTL/plus_CAC_mQTL/L23-IT/variation \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAC/L23-IT \
--split chr1
3.2.3 Predict DNAm levels for CAG sites from DNA sequences with the reference or variation allele using one GPU.
Predict DNAm levels of CAG sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CAG/L23-IT/reference \
--output_dir ./outputs/mQTL/plus_CAG_mQTL/L23-IT/reference \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAG/L23-IT \
--split chr1
Predict DNAm levels of CAG sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CAG/L23-IT/variation \
--output_dir ./outputs/mQTL/plus_CAG_mQTL/L23-IT/variation \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAG/L23-IT \
--split chr1
3.2.4 Predict DNAm levels for CAT sites from DNA sequences with the reference or variation allele using one GPU.
Predict DNAm levels of CAT sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CAT/L23-IT/reference \
--output_dir ./outputs/mQTL/plus_CAT_mQTL/L23-IT/reference \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAT/L23-IT \
--split chr1
Predict DNAm levels of CAT sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CAT/L23-IT/variation \
--output_dir ./outputs/mQTL/plus_CAT_mQTL/L23-IT/variation \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CAT/L23-IT \
--split chr1
3.2.5 Predict DNAm levels for CTC sites from DNA sequences with the reference or variation allele using one GPU.
Predict DNAm levels of CTC sites in chromosome 1 from DNA sequences with the reference allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CTC/L23-IT/reference \
--output_dir ./outputs/mQTL/plus_CTC_mQTL/L23-IT/reference \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CTC/L23-IT \
--split chr1
Predict DNAm levels of CTC sites in chromosome 1 from DNA sequences with the variation allele for L23-IT using the finetuned INTERACT model.
CUDA_VISIBLE_DEVICES=0 python3 main.py transformer CH_mQTL_prediction \
--exp_name CH_mQTL_prediction \
--batch_size 2048 \
--num_workers 2 \
--learning_rate 0.000176 \
--warmup_steps 20000 \
--gradient_accumulation_steps 1 \
--data_dir ./datasets/genome_snp/plus_CTC/L23-IT/variation \
--output_dir ./outputs/mQTL/plus_CTC_mQTL/L23-IT/variation \
--num_train_epochs 1 \
--from_pretrained ./outputs/MulTask/snmC-seq3_CTC/L23-IT \
--split chr1
3.3 Calculate the absolute difference in DNAm levels between the two DNA sequences with reference and alternative alleles.
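For each site, the quantity described here is Δ = |ŷ_ref − ŷ_alt|. In this formula, ŷ_ref is the DNAm level predicted from the reference-allele sequence and ŷ_alt is the level predicted from the variation-allele sequence. The following is a minimal NumPy sketch of that step. It assumes both prediction outputs have already been loaded as arrays aligned site by site; how Fine_mapping.py reads and aligns them is not shown.

```python
import numpy as np


def abs_dnam_difference(ref_pred, alt_pred):
    """Per-site absolute difference |ref - alt| between two aligned prediction vectors."""
    ref_pred = np.asarray(ref_pred, dtype=float)
    alt_pred = np.asarray(alt_pred, dtype=float)
    if ref_pred.shape != alt_pred.shape:
        raise ValueError("reference and variation predictions must be aligned site by site")
    return np.abs(ref_pred - alt_pred)


if __name__ == "__main__":
    # Made-up predicted DNAm levels for three sites.
    ref = [0.82, 0.10, 0.45]
    alt = [0.20, 0.12, 0.45]
    print(abs_dnam_difference(ref, alt))
```

Taking the absolute value treats gains and losses of methylation alike. Keep the signed difference instead if the direction of the effect matters.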
3.3.1 Calculate the absolute difference in DNAm levels for CpG sites between the two DNA sequences with reference and alternative alleles.
Calculate the absolute DNAm difference for CpG sites in chromosome 1 for L23-IT.
python Fine_mapping.py CG L23-IT chr1
3.3.2 Calculate the absolute difference in DNAm levels for CAC sites between the two DNA sequences with reference and alternative alleles.
Calculate the absolute DNAm difference for CAC sites in chromosome 1 for L23-IT.
python Fine_mapping.py CAC L23-IT chr1
3.3.3 Calculate the absolute difference in DNAm levels for CAG sites between the two DNA sequences with reference and alternative alleles.
Calculate the absolute DNAm difference for CAG sites in chromosome 1 for L23-IT.
python Fine_mapping.py CAG L23-IT chr1
3.3.4 Calculate the absolute difference in DNAm levels for CAT sites between the two DNA sequences with reference and alternative alleles.
Calculate the absolute DNAm difference for CAT sites in chromosome 1 for L23-IT.
python Fine_mapping.py CAT L23-IT chr1
3.3.5 Calculate the absolute difference in DNAm levels for CTC sites between the two DNA sequences with reference and alternative alleles.
Calculate the absolute DNAm difference for CTC sites in chromosome 1 for L23-IT.
python Fine_mapping.py CTC L23-IT chr1
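The five runs above produce absolute differences per sequence context. To compare variants across contexts, one simple option is to score each variant by its largest absolute difference over all the sites it affects. The sketch below does this over (variant_id, context, abs_diff) records. Both the record layout and the max-based score are illustrative assumptions, not the scoring used by Fine_mapping.py.

```python
def rank_variants(records, top=None):
    """Rank variants by their largest absolute DNAm difference across sites and contexts.

    records: iterable of (variant_id, context, abs_diff) tuples, e.g. pooled from
    the CG, CAC, CAG, CAT and CTC outputs (hypothetical layout).
    Returns (variant_id, max_abs_diff, context_of_max) tuples, largest first.
    """
    best = {}
    for variant_id, context, diff in records:
        # Keep the first context seen when two contexts tie for the maximum.
        if variant_id not in best or diff > best[variant_id][0]:
            best[variant_id] = (diff, context)
    ranked = sorted(((v, d, c) for v, (d, c) in best.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked if top is None else ranked[:top]


if __name__ == "__main__":
    # Made-up variant IDs and differences.
    records = [("rs1", "CG", 0.05), ("rs1", "CAC", 0.30), ("rs2", "CG", 0.12)]
    print(rank_variants(records))
    # prints [('rs1', 0.3, 'CAC'), ('rs2', 0.12, 'CG')]
```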