SGE: job submission and monitoring with qsub/qstat/qdel/qhost

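qsub is the most stable low-level job submission tool: it sends a script to a compute node of the cluster and runs it there.
Note that only login nodes are allowed to submit jobs. Compute nodes have no permission to submit and can only execute, so never nest a qsub call inside a script you submit; it will fail with an error. (Strictly speaking this depends on which machines the administrator has configured as submit hosts, but it holds on a typical setup.)
Below is the submission command I use most often: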

qsub -cwd -l vf=5g -P PROJECT_NAME -q QUEUE_NAME job_script.sh

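Let me go through the options one by one:
-cwd: short for current working directory. The job runs from the directory you submitted it in, so the log files are written there as well. Without -cwd, output goes to the user's home directory by default. If you want to choose the output directory yourself, use the -wd option and the logs are written to the directory you specify.
-l: takes resource=value and states the resources the job needs. Here we pass an estimated memory requirement, vf=5g; there is usually no need to specify a CPU count. Note that in practice this is of limited use: few clusters strictly enforce a user's memory usage, so vf mostly affects how quickly your job gets scheduled. Some people exploit this and request as little memory as possible in order to get scheduled sooner. This part is really just an honor system.
-P: large organizations are split into teams and projects, and each project has to specify its project name, mainly so that compute usage can be tallied and billed later. For the job itself this option makes no practical difference.
-q: specifies the queue name, and this one really matters. A queue is a group of machines: each queue contains only certain compute nodes, so whichever queue you submit to, you can only use the compute resources of the nodes in that queue.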
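The same options can also be stored in the job script itself, since qsub reads lines beginning with `#$` (the default directive prefix, which the -C option can change) as if they had been given on the command line. The following is only a minimal sketch: `job.sh`, `PROJECT_NAME` and `QUEUE_NAME` are placeholders rather than names from a real cluster, and the snippet only writes the script and prints the command to run instead of submitting anything.

```shell
#!/bin/bash
# Sketch: write a job script whose "#$" lines carry the same options as
# "qsub -cwd -l vf=5g -P ... -q ...". All names below are placeholders.
cat > job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -l vf=5g
#$ -P PROJECT_NAME
#$ -q QUEUE_NAME

# The actual work of the job goes here.
echo "running on $(hostname) in $(pwd)"
EOF

# With the options embedded in the script, the command line stays short.
echo "submit with: qsub job.sh"
```

Keeping the options in the script means the resource request travels with the job, which makes it easier to resubmit later with exactly the same settings.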
To be continued~
qsub -help

OGS/GE 2011.11p1
usage: qsub [options]
   [-a date_time]                        request a start time
   [-ac context_list]                    add context variable(s)
   [-ar ar_id]                           bind job to advance reservation
   [-A account_string]                   account string in accounting record
   [-b y[es]|n[o]]                       handle command as binary
   [-binding [env|pe|set] exp|lin|str]   binds job to processor cores
   [-c ckpt_selector]                    define type of checkpointing for job
   [-ckpt ckpt-name]                     request checkpoint method
   [-clear]                              skip previous definitions for job
   [-cwd]                                use current working directory
   [-C directive_prefix]                 define command prefix for job script
   [-dc simple_context_list]             delete context variable(s)
   [-dl date_time]                       request a deadline initiation time
   [-e path_list]                        specify standard error stream path(s)
   [-h]                                  place user hold on job
   [-hard]                               consider following requests "hard"
   [-help]                               print this help
   [-hold_jid job_identifier_list]       define jobnet interdependencies
   [-hold_jid_ad job_identifier_list]    define jobnet array interdependencies
   [-i file_list]                        specify standard input stream file(s)
   [-j y[es]|n[o]]                       merge stdout and stderr stream of job
   [-js job_share]                       share tree or functional job share
   [-jsv jsv_url]                        job submission verification script to be used
   [-l resource_list]                    request the given resources
   [-m mail_options]                     define mail notification events
   [-masterq wc_queue_list]              bind master task to queue(s)
   [-notify]                             notify job before killing/suspending it
   [-now y[es]|n[o]]                     start job immediately or not at all
   [-M mail_list]                        notify these e-mail addresses
   [-N name]                             specify job name
   [-o path_list]                        specify standard output stream path(s)
   [-P project_name]                     set job's project
   [-p priority]                         define job's relative priority
   [-pe pe-name slot_range]              request slot range for parallel jobs
   [-q wc_queue_list]                    bind job to queue(s)
   [-R y[es]|n[o]]                       reservation desired
   [-r y[es]|n[o]]                       define job as (not) restartable
   [-sc context_list]                    set job context (replaces old context)
   [-shell y[es]|n[o]]                   start command with or without wrapping <loginshell> -c
   [-soft]                               consider following requests as soft
   [-sync y[es]|n[o]]                    wait for job to end and return exit code
   [-S path_list]                        command interpreter to be used
   [-t task_id_range]                    create a job-array with these tasks
   [-tc max_running_tasks]               throttle the number of concurrent tasks (experimental)
   [-terse]                              tersed output, print only the job-id
   [-v variable_list]                    export these environment variables
   [-verify]                             do not submit just verify
   [-V]                                  export all environment variables
   [-w e|w|n|v|p]                        verify mode (error|warning|none|just verify|poke) for jobs
   [-wd working_directory]               use working_directory
   [-@ file]                             read commandline input from file
   [{command|-} [command_args]]

account_string        account_name
complex_list          complex[,complex,...]
context_list          variable[=value][,variable[=value],...]
ckpt_selector         `n' `s' `m' `x' <interval>
date_time             [[CC]YY]MMDDhhmm[.SS]
job_identifier_list   {job_id|job_name|reg_exp}[,{job_id|job_name|reg_exp},...]
jsv_url               [script:][username@]path
mail_address          username[@host]
mail_list             mail_address[,mail_address,...]
mail_options          `e' `b' `a' `n' `s'
working_directory     path
path_list             [host:]path[,[host:]path,...]
file_list             [host:]file[,[host:]file,...]
priority              -1023 - 1024
resource_list         resource[=value][,resource[=value],...]
simple_context_list   variable[,variable,...]
slot_range            [n[-m]|[-]m] - n,m > 0
task_id_range         task_id['-'task_id[':'step]]
variable_list         variable[=value][,variable[=value],...]
wc_cqueue             wildcard expression matching a cluster queue
wc_host               wildcard expression matching a host
wc_hostgroup          wildcard expression matching a hostgroup
wc_qinstance          wc_cqueue@wc_host
wc_qdomain            wc_cqueue@wc_hostgroup
wc_queue              wc_cqueue|wc_qdomain|wc_qinstance
wc_queue_list         wc_queue[,wc_queue,...]
ar_id                 advance reservation id
max_running_tasks     maximum number of simultaneously running tasks
exp                   explicit:<socket>,<core>[:...]
lin                   linear:<amount>[:<socket>,<core>]
str                   striding:<amount>:<stepsize>[:<socket>,<core>]
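Since nesting qsub inside a job script does not work, a multi-step pipeline can instead be chained from the login node using three options from the help output: -N (job name), -terse (print only the job id) and -hold_jid (wait for another job). The function below is a hedged sketch rather than a tested recipe: `submit_chain` and the job names `step1` and `step2` are made up for illustration, and the two script paths are whatever you pass in.

```shell
#!/bin/bash
# Sketch: submit two jobs from the login node so that the second one waits
# for the first. submit_chain, step1 and step2 are hypothetical names.
submit_chain() {
    local first_script="$1" second_script="$2" first_id

    # -terse makes qsub print only the job id, which is easy to capture.
    first_id=$(qsub -terse -cwd -N step1 "$first_script") || return 1

    # The second job is held until the job with id $first_id has finished.
    qsub -cwd -N step2 -hold_jid "$first_id" "$second_script"
}

# Example call on a login node (not executed here):
#   submit_chain step1.sh step2.sh
```

Both submissions happen on the login node, so no compute node ever needs to call qsub itself.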
