Beijing Supercomputing Center Tutorial

Notes on Using the Supercomputer

List the loaded modules

module list

(base) [xxxx ]$ module list
No Modulefiles Currently Loaded.

Search the installed modules

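module avail
# Note: `module avail | grep anaconda` often filters nothing, because many module implementations print this list to stderr.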

--------------------------------------------------------- /usr/share/Modules/modulefiles ---------------------------------------------------------
dot module-git module-info modules null use.own

------------------------------------------------------------- /data/apps/modulefiles -------------------------------------------------------------
alphafold/2.0.0 gmp/6.2.1
alphafold/2.0.0_20210827 go/1.17.5
alphafold/2.0.1 gromacs/2019.4
alphafold/2.1.1 gromacs/2020.5-plumed2
alphafold/2.2.0 gromacs/2020.6
alphafold/ParallelFold-2.0.1 gromacs/2021.1
amber/Amber21_openmpi gromacs/2021.2
anaconda/2020.11 gromacs/2021.2-Parallel
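
To narrow the listing to a single package, the output can be merged from stderr into stdout before filtering. The sketch below assumes a module system that prints `module avail` to stderr; `find_module` is a hypothetical helper name, not a command provided by the cluster.

```shell
# find_module: hypothetical helper, not part of the module system.
# Prints every available module whose name contains the given pattern.
find_module() {
    # 2>&1 merges stderr into stdout so the pipe sees the listing;
    # tr puts each whitespace-separated name on its own line before grep filters.
    module avail 2>&1 | tr -s ' \t' '\n' | grep -- "$1"
}

# Usage on the cluster (after the function is defined):
#   find_module anaconda
```

With the listing shown above, the result should include anaconda/2020.11.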

Load the anaconda module

module load anaconda/2020.11

Create an environment with the loaded anaconda

conda create -n py37 python=3.7

Load the cuda module

module load cuda/9.2

Activate the new environment

conda activate py37

Install the Python dependencies

conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=9.2 -c pytorch -y
conda install scikit-learn -y
conda install pandas -y
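
Before moving on, it can help to confirm that the installed PyTorch build imports cleanly and reports whether it can see a GPU. A minimal sketch, assuming the py37 environment is active; `check_torch` is a hypothetical helper name. On a login node without a GPU, the second value is expected to be False.

```shell
# check_torch: hypothetical helper, assumes the py37 environment is active.
check_torch() {
    # Prints the PyTorch version and whether CUDA is usable from this process.
    python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
}

# Usage:
#   check_torch
```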

Debug the Python script

python xxx.py

Write the run script

#!/bin/bash
# Saved as submmitNormal.sh, the file passed to sbatch when submitting.

module load anaconda/2020.11 && \
module load cuda/9.2 && \
source activate py37 && \
python xxx.py > run.log
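
The script above sends only standard output to run.log, so Python tracebacks written to standard error end up in the Slurm output file instead. If a single log is preferred, both streams can be redirected together. A minimal sketch; `run_logged` is a hypothetical helper name.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and writes both stdout and stderr to LOGFILE.
run_logged() {
    log_file=$1
    shift
    "$@" > "$log_file" 2>&1
}

# Usage inside the job script, replacing the last line:
#   run_logged run.log python xxx.py
```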

Submit to a compute node

rm -rf slurm-*.out && \
sbatch --gpus=1 ./submmitNormal.sh
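
To reuse the job ID in later commands such as scancel, sbatch can print it in a machine-readable form with its `--parsable` option. A minimal sketch; `submit_job` is a hypothetical helper name, and the script path is the one used above.

```shell
# submit_job SCRIPT
# Hypothetical helper: submits SCRIPT with one GPU and prints only the job ID.
submit_job() {
    # --parsable makes sbatch print "jobid" (or "jobid;cluster") instead of a sentence.
    job_out=$(sbatch --parsable --gpus=1 "$1") || return 1
    # Keep only the part before the first ';'.
    printf '%s\n' "${job_out%%;*}"
}

# Usage on the cluster:
#   jobid=$(submit_job ./submmitNormal.sh)
#   scancel "$jobid"
```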

Check the job status

parajobs
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
176118 gpu submmitN scv4458 R 1:42:56 1 g0006
GPU utilization for job 176118:
index, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB],memory.used
7, 13 %, 2 %, 32510 MiB, 32128 MiB, 382 MiB
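
parajobs appears to be a site-specific command. Where it is unavailable, Slurm's standard squeue command lists jobs with the same column header as shown above, though without the GPU utilization lines. A minimal sketch; `my_jobs` is a hypothetical helper name.

```shell
# my_jobs [USER]
# Hypothetical helper: lists the queued and running jobs of one user.
my_jobs() {
    # -u restricts the listing to one user's jobs; defaults to the current user.
    squeue -u "${1:-$(id -un)}"
}

# Usage:
#   my_jobs            # own jobs
#   my_jobs scv4458    # jobs of the named user
```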

View the run log

tail -n 100 -f slurm-xxx.out
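
When several slurm-*.out files are present, the newest one can be picked by modification time rather than by typing the job ID. A minimal sketch, assuming the logs are in the current directory; `latest_slurm_log` is a hypothetical helper name.

```shell
# latest_slurm_log: hypothetical helper.
# Prints the most recently modified slurm-*.out file in the current directory.
latest_slurm_log() {
    # ls -t sorts by modification time, newest first; head keeps the first entry.
    ls -t slurm-*.out 2>/dev/null | head -n 1
}

# Usage:
#   tail -n 100 -f "$(latest_slurm_log)"
```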

Cancel a job

scancel [jobid]