GLAD common variables and default directory structure
02-05-2016, 11:14 AM (This post was last modified: 07-19-2018 11:19 AM by rayluk.)
Post: #1
GLAD common variables and default directory structure
GLAD uses and exports a number of variables for its various users. Under the default configuration, GLAD is organized into a default directory structure. This post summarizes both.

Common variables

Unless otherwise specified, the variables mentioned here are GLAD configurable variables. Their values can be changed through the GLAD configuration system, but they should not be overridden as environment variables in the execution environment.
GLAD users can configure them through the GLAD configuration system, in particular via $glad_pipeline_config.

Variables and their default values:

Variables that both installation and runtime programs need to understand:
$glad_user: username of the GLAD user. Default: 'gene'
$glad_root: root of the GLAD installation, containing symbolic links to the DFS, the installed GLAD tools, etc. Default: ~$glad_user/glad
$glad_base: the base directory GLAD is installed into. Default: $glad_root/store/$glad_user/glad/
$glad_portal: IP address of the GLAD portal, the node from which users operate the GLAD cluster. Default: the IP address in $glad_thinker/conf/scheduler
$sage_user: username of the Sage user, under which Sage is installed. Default: 'sage'. Changing this variable is not supported in the current implementation.
$sage_portal: IP address of the Sage portal. It is set in config.sh when Sage is installed.
$dstore_mp: the mount point of D-store. Default: '/thinker/dstore'.
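The defaults above build on one another ($glad_root depends on $glad_user, $glad_base on both). As a minimal sketch, assuming a POSIX shell and that a default is applied only when the variable is unset (this is not the actual glad-common.sh logic), the chain could be resolved like this. The getent lookup and the /home/$glad_user fallback are assumptions of the sketch.

```shell
# Sketch only: apply the documented defaults when a variable is unset.
# Resolution order, getent lookup and /home fallback are assumptions.
glad_user=${glad_user:-gene}

if [ -z "${glad_root:-}" ]; then
  # ~$glad_user cannot be written directly: tilde expansion happens
  # before parameter expansion, so look the home directory up instead.
  glad_home=$(getent passwd "$glad_user" 2>/dev/null | cut -d: -f6)
  glad_root=${glad_home:-/home/$glad_user}/glad   # fallback is hypothetical
fi

glad_base=${glad_base:-$glad_root/store/$glad_user/glad}
sage_user=sage                          # changing it is not supported
dstore_mp=${dstore_mp:-/thinker/dstore}

printf 'glad_root=%s\nglad_base=%s\n' "$glad_root" "$glad_base"
```

Applying defaults with `${var:-default}` keeps any value already set by the configuration system, so only genuinely unset variables fall back to the documented defaults.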

Environment variables exported on the portal of a GLAD installation:
$glad_base
$glad_root
$gr: alias to $glad_root. Read-only; should be the same as $glad_root.
$gb: alias to $glad_base. Read-only; should be the same as $glad_base.
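A minimal sketch of what the portal environment amounts to, assuming the two paths have already been resolved (the fallback values below are placeholders for the sketch). Marking $gr and $gb with `readonly` is one way to enforce the documented read-only behaviour; whether GLAD itself does it this way is an assumption.

```shell
# Sketch only: placeholder values stand in for the resolved configuration.
glad_root=${glad_root:-/home/gene/glad}
glad_base=${glad_base:-$glad_root/store/gene/glad}
export glad_root glad_base

# $gr and $gb mirror the two paths above and cannot be reassigned.
gr=$glad_root
gb=$glad_base
export gr gb
readonly gr gb
```

Because the aliases are exported, child processes (including scripts started from the portal shell) see the same values.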

Variables used only at runtime by GLAD programs:
$glad_result_dir: directory for storing result files. Default: $glad_base/results/$USER/
$glad_share_dir: a shared directory for data and config files used by programs. Default: $glad_base/share/$USER
$glad_keep_work_files: if set to true, intermediate files are not deleted.
$glad_size: the concurrency level. Technically, it equals the number of containers used to execute tasks in Data Thinker.
$glad_work_dir: directory for storing intermediate data on the local node only. Default: $glad_root/fastdata/$USER
$glad_efs_root: root directory of EFS. Its value is fixed: /thinker/bin/ephemeral/.
$glad_efs_dir: the per-user EFS directory for storing fast intermediate data on the local node. Its value is fixed: $glad_efs_root/$USER.
$glad_normal_threads: number of threads used by external programs such as bwa and GATK, if they support such an option. Default: 16
$human_genome: the default reference genome database. Default: /thinker/dstore/world/gene/sapiens/reference/hututa/glad-ref-r1/v2/human_g1k_v37.fa
$database_snp: the default SNP database. Default: /thinker/dstore/world/gene/sapiens/reference/hututa/glad-ref-r1/v2/dbsnp_137.vcf
$glad_prog_bwa: the BWA program used by the `glad bwamem` command. Default: $dfs_mp/world/software/bwa/run/bwa-0.7.13
$glad_bwa_cmd: the command for running bwa
$glad_gatk_cmd: the command for running GATK
$glad_java_jar_cmd: the command for running a jar package with java
$glad_prog_picard: the jar file of the Picard tool
$glad_prog_blast{x,n,p}: the BLAST tools.
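The per-user runtime directories follow directly from $glad_base, $glad_root and $USER. A minimal sketch, assuming placeholder roots under /tmp/glad-demo (hypothetical) when the real ones are unset; cleanup_work_files is a hypothetical helper showing how a program could honour $glad_keep_work_files.

```shell
# Sketch only: /tmp/glad-demo/* are hypothetical stand-ins for real paths.
glad_root=${glad_root:-/tmp/glad-demo/root}
glad_base=${glad_base:-/tmp/glad-demo/base}
USER=${USER:-$(id -un)}

glad_result_dir=${glad_result_dir:-$glad_base/results/$USER}
glad_share_dir=${glad_share_dir:-$glad_base/share/$USER}
glad_work_dir=${glad_work_dir:-$glad_root/fastdata/$USER}
# Fixed values; trailing slash dropped to avoid "//" in $glad_efs_dir.
glad_efs_root=/thinker/bin/ephemeral
glad_efs_dir=$glad_efs_root/$USER
glad_normal_threads=${glad_normal_threads:-16}
glad_keep_work_files=${glad_keep_work_files:-false}

# Hypothetical helper: delete an intermediate directory unless
# $glad_keep_work_files is "true".
cleanup_work_files() {
  if [ "$glad_keep_work_files" = "true" ]; then
    echo "keeping intermediate files in $1" >&2
  else
    rm -rf -- "$1"
  fi
}
```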

$optlevel: controls the optimization level. Default: 0. If $optlevel is 0 or empty, no special optimizations are applied. If it is non-zero, some optimizations may be enabled, depending on whether $optlevel is high enough for each of them.
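A minimal sketch of the threshold check described above, treating an unset or empty $optlevel as 0. The helper name optlevel_at_least and the level-2 example optimization are hypothetical; the levels required by actual GLAD optimizations are not listed here.

```shell
# Sketch only: succeed if $optlevel is at least the level given in $1.
# An unset or empty $optlevel counts as 0, i.e. no special optimization.
optlevel_at_least() {
  [ "${optlevel:-0}" -ge "$1" ]
}

# Example: an optimization that (hypothetically) requires level 2.
if optlevel_at_least 2; then
  echo "optlevel ${optlevel}: enabling the level-2 optimization"
fi
```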

$sage_atp_staff_space_start: the space of the first staff ATP container of Sage.
$sage_atp_staff_space_cnt: the number of spaces of the staff ATP containers of Sage.
$sage_atp_head_space: the space of the head ATP container of Sage.
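Read together, the two staff variables describe a range of spaces. A minimal sketch, assuming the staff containers are numbered contiguously starting at $sage_atp_staff_space_start (an assumption; the numbering scheme is not stated above). The function name is hypothetical.

```shell
# Sketch only: print the space numbers of Sage's staff ATP containers,
# assuming they are numbered contiguously from $sage_atp_staff_space_start.
sage_atp_staff_spaces() {
  i=0
  while [ "$i" -lt "$sage_atp_staff_space_cnt" ]; do
    echo $((sage_atp_staff_space_start + i))
    i=$((i + 1))
  done
}
```

For example, with sage_atp_staff_space_start=10 and sage_atp_staff_space_cnt=3 the function prints 10, 11 and 12, one per line. The head container's space ($sage_atp_head_space) is separate and not part of this range.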

Variables used only during installation:
See http://tab.d-thinker.org/showthread.php?tid=5508

Experimental or internal variables
These variables are in the experimental stage or for internal use, and they may be removed in the future. Use them with caution.
$glad_data_pmax: data-parallel maximum, i.e. the maximum data parallelism. Typically the task count is at most this value.
$glad_mode: GLAD work mode. Possible values: "confident" or "deterministic". Default: confident.
$glad_bingo_set_done_file: if True, `glad bingo` creates an empty <result>.done file for each result <result>. Default: False.
$glad_sage_space_start: Sage ATP start space number.
$glad_sage_space_cnt: Sage ATP space count.
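A minimal sketch of how a script might validate $glad_mode and reproduce the documented effect of $glad_bingo_set_done_file. Both helpers are hypothetical and are not part of `glad bingo`; only the literal value "True" is treated as enabled, matching the values given above.

```shell
# Sketch only: reject values of $glad_mode other than the two documented ones.
check_glad_mode() {
  case "${glad_mode:-confident}" in
    confident|deterministic) return 0 ;;
    *) echo "unsupported glad_mode: $glad_mode" >&2; return 1 ;;
  esac
}

# Sketch only: create an empty <result>.done next to result <result>
# when $glad_bingo_set_done_file is True.
mark_result_done() {
  if [ "${glad_bingo_set_done_file:-False}" = "True" ]; then
    : > "$1.done"
  fi
}
```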


Default directory structure

Top directory layout:
Under $glad_root/:
-- store: symbolic link to $dfs_mp
-- fastdata: symbolic link to $glad_work_dir
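A minimal sketch of creating this top-level layout. The function name is hypothetical, and the link targets are passed as arguments, so the sketch does not fix any particular paths; under the defaults, `store` would point at $dfs_mp.

```shell
# Sketch only: create the two top-level links under a GLAD root.
# Usage (hypothetical): glad_link_top_dirs ROOT STORE_TARGET FASTDATA_TARGET
glad_link_top_dirs() {
  mkdir -p -- "$1" || return 1
  # -n: replace an existing link instead of descending into its target.
  ln -sfn -- "$2" "$1/store" || return 1
  ln -sfn -- "$3" "$1/fastdata"
}
```

The `-n` flag matters on reruns: without it, `ln` would follow the existing `store` link and create a nested link inside the DFS directory instead of replacing the link itself.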

Organization of GLAD files, the files installed into $glad_base, in the deployed cluster:

$glad_base: where GLAD files are installed
-- bin/ # scripts and binary executables; previously stored under PCAWG2-Tools/
-- conf/ # configuration files; previously under NGS_Tools_Conf/
-- results/ # result files
-- examples/ # examples; previously under glad-example/
-- NGS_Tools/ # third-party tools only
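A minimal sketch that creates the subdirectories listed above under a given $glad_base. The helper name is hypothetical; the real installer may create these differently or populate them directly.

```shell
# Sketch only: create the documented subdirectories under a $glad_base.
glad_make_base_layout() {
  for d in bin conf results examples NGS_Tools; do
    mkdir -p -- "$1/$d" || return 1
  done
}
```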

Common directories for accessing/storing files:
-- $dstore_mp/r3data/ # if $dfs=dstore, stores files of long-term value; 3 copies on disk; reliable and highly fault-tolerant.
-- $dstore_mp/r1data/ # if $dfs=dstore, stores files users are currently working on; 1 copy on disk; economical in disk usage.
-- $dstore_mp/world/ # stores shared files from various sources, such as reference genomes; 2 copies; balances disk usage and fault tolerance.

Directories of result files:
After execution, a pipeline generates one or more directories (referred to as the pipeline's result files) in $glad_result_dir. If two pipelines use the same directory under $glad_result_dir, files written later overwrite previously generated ones. Users and pipeline writers should be aware of this and handle it.
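One way a pipeline writer could guard against the overwrite problem is to claim a dedicated directory before writing. A minimal sketch: claim_result_dir and the pipeline names are hypothetical. It relies on `mkdir` without `-p` failing when the directory already exists; that is atomic on a local POSIX file system, but whether the same guarantee holds on D-store is an assumption.

```shell
# Sketch only: claim a per-pipeline directory under $glad_result_dir and
# refuse to reuse one that already exists (mkdir without -p fails then).
claim_result_dir() {
  mkdir -p -- "$glad_result_dir" || return 1
  if ! mkdir -- "$glad_result_dir/$1" 2>/dev/null; then
    echo "result directory already in use: $glad_result_dir/$1" >&2
    return 1
  fi
  printf '%s\n' "$glad_result_dir/$1"
}
```

A pipeline would call, say, `dir=$(claim_result_dir my-pipeline) || exit 1` and then write only under `$dir`.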

---
20161107/gl: Add glad_mode
20161021/yxj: Add notation section
20161005/gl: Add glad sage space vars
20161004/gl: Add glad_share_dir
20160915/zma: Updated main post to reflect recent changes.
20160727/gl: Add experimental variables
20160725/gl: Add $glad_size
20160717/gl: Add $glad_prog_picard, etc.
20160708/gl: Add $glad_portal
20160321/gl: Make some variable names bold.
20160320/gl: Make glad files bold.
20160320/gl: Add optlevel and glad_keep_tmp_files.
20160630/yxj: Add more configurable variables.
03-03-2016, 02:18 PM
Post: #2
RE: GLAD common variables and default directory structure
Adding @zma to the cc list.
03-03-2016, 02:22 PM
Post: #3
RE: GLAD common variables and default directory structure
I am wondering whether I need to export these variables by writing them into a file such as /etc/profile, whether GLAD sets them itself, or whether I need to set each variable one by one as required when installing GLAD (for example, creating a user named 'gene' according to the instructions). Could @zma clarify this?
03-03-2016, 04:40 PM
Post: #4
RE: GLAD common variables and default directory structure
GLAD can be installed so that it is shared by various users; the method is clear now.

I need some time to integrate it into GLAD.

The current GLAD implementation installs GLAD under a single user (e.g. gene). There is no need to set up additional profiles in /etc/profile etc.
03-08-2016, 01:42 PM
Post: #5
RE: GLAD common variables and default directory structure
According to this post, $glad_root is ~$glad_user/glad. However, when I run glad-install-ref.sh, $glad_root turns out to be /mnt. The script is located in /home/gene/glad/store/gene/glad/bin, where the directory 'store' links to /mnt/dstore as required above, so the worldrefdb path becomes '/mnt/store/world/gene/sapiens/reference/hututa/glad-ref-r1/v1/Ref.tgz', which is wrong. Is there anything I am missing?

@zma, please take a look. (I think you may need to check how glad-common.sh obtains $glad_root.)
03-08-2016, 03:12 PM
Post: #6
RE: GLAD common variables and default directory structure
(03-08-2016 01:42 PM)xwcwt Wrote:  the worldrefdb path becomes '/mnt/store/world/gene/sapiens/reference/hututa/glad-ref-r1/v1/Ref.tgz', it is wrong, Is there anything I miss?

There was a bug in glad-install-ref.sh: it dereferenced the symbolic link before sourcing glad-common.sh.

Fixed in glad. Please try again.
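The failure above comes down to the difference between the logical path (symbolic links kept) and the physical path (links resolved). The sketch below is not the actual glad-install-ref.sh or glad-common.sh code; root_from_bin_dir is a hypothetical helper that recovers $glad_root by stripping the /store/$glad_user/glad/bin suffix, which only works on the logical path. From the physical path /mnt/dstore/gene/glad/bin, going up four levels instead lands in /mnt, which matches the value reported in the previous post.

```shell
# Logical path: keeps symbolic links such as $glad_root/store.
logical_dir() { (cd -L "$1" && pwd -L); }
# Physical path: resolves every symbolic link, as the buggy script did.
physical_dir() { (cd -P "$1" && pwd -P); }

# Hypothetical helper: given $glad_root/store/<user>/glad/bin, print $glad_root.
# $1: bin directory, $2: GLAD user name
root_from_bin_dir() {
  case "$1" in
    */store/"$2"/glad/bin) printf '%s\n' "${1%"/store/$2/glad/bin"}" ;;
    *) return 1 ;;
  esac
}
```

With the layout from the post, `root_from_bin_dir "$(logical_dir /home/gene/glad/store/gene/glad/bin)" gene` prints /home/gene/glad, while the physical path no longer contains the `store` component at all, so the suffix cannot be matched.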
03-09-2016, 12:33 AM
Post: #7
RE: GLAD common variables and default directory structure
Some variables contain directory names. We must make clear which directories are in the DFS, which are not, and which may or may not be. If a directory is not necessarily in the DFS, we cannot write content to it once and assume the content is visible to all tasks. This ambiguity has caused many problems in the split execution of bwamem.
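A minimal sketch of a check a script could run before relying on cross-node visibility, assuming D-store is mounted at $dstore_mp and that anything physically under that mount is visible to all tasks (both assumptions). Unlike the $glad_root derivation, here resolving symbolic links is exactly what is wanted, so that $glad_root/store/... is recognized as DFS. The function name is hypothetical.

```shell
# Sketch only: succeed if directory $1 physically lies under $dstore_mp.
# Symbolic links (e.g. $glad_root/store) are resolved before comparing.
is_on_dstore() {
  phys=$(cd -P "$1" 2>/dev/null && pwd -P) || return 1
  mp=$(cd -P "$dstore_mp" 2>/dev/null && pwd -P) || return 1
  case "$phys/" in
    "$mp"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A directory such as $glad_work_dir, which is documented as local to a node, would fail this check, signalling that content written there must not be assumed visible to other tasks.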
03-09-2016, 04:30 PM
Post: #8
RE: GLAD common variables and default directory structure
Should we also record the relevant environment variables here, for example glad_keep_tmp_files and optlevel?
03-10-2016, 10:44 AM
Post: #9
RE: GLAD common variables and default directory structure
(03-09-2016 04:30 PM)lingu Wrote:  Let's also record the relevant envars here? for example, glad_keep_tmp_files and optlevel.

Sounds good.
03-14-2016, 05:44 PM
Post: #10
RE: GLAD common variables and default directory structure
Changes for multi-GLAD installation: $glad_base changes from $glad_root/store/gene/glad/ to $glad_root/store/$glad_user/glad