Build and deploy Data Thinker
02-08-2016, 11:40 AM (This post was last modified: 01-27-2017 10:35 AM by lingu.)
Post: #1
Build and deploy Data Thinker
You can set up a Data Thinker (D-thinker) system with the dt code tree. Usually, you can rely on the instructions in http://d-thinker.org/doc/configuration-manual.html to build a thinker, but in some cases those instructions can be slightly outdated. This thread provides discussion and the latest information about building and deploying a D-thinker system, or, simply, a thinker.

The following instructions assume you can access the dt code tree using cod. If you do not have access yet, send a request to the D-thinker team together with a public key for authentication. The request should include at least your name and your contact information. When the request is approved, we will send you a username for accessing the code tree.

A D-thinker system comprises one or more nodes, and each node can be a server, a normal PC, or a virtual machine. One node is designated as the portal of D-thinker. The portal is the computer that executes the commands to start a D-thinker program and create tasks. Other nodes run tasks created by the portal. The default setup installs everything (portal programs, D-thinker components) on one node, so that node serves as the portal and also runs tasks. You can change the configuration to expand the system to as many nodes as you like. The largest configurations we have tested have 1024 virtual machines or 160 physical servers.

1. Set up Cod so that you can obtain the dt software.

2. Get D-thinker system components.
D-thinker has a freely available Community version, which is publicly available for read-only access with git. Developers of Data Thinker should use cod to get read/write access.

Code:
cod read dt

If the ground directory (sometimes specified by the environment variable GROUND_DIR) is ~/forest, the cod command creates and stores the relevant files in ~/forest/d-thinker/dt. Note that the up-to-date tested code is kept in the 'rc' branch, not the master branch.
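
To make the path rule above concrete, here is a small sketch that prints where the dt tree is expected to land for a given ground directory. It assumes GROUND_DIR falls back to ~/forest when unset (the thread only uses ~/forest as an example), and dt_tree_path is a made-up helper name, not part of cod.

```shell
#!/bin/sh
# Hypothetical helper (not part of cod): print the directory where
# "cod read dt" is expected to place the dt tree.
# Assumption: GROUND_DIR falls back to ~/forest when it is unset.
dt_tree_path() {
    ground="${GROUND_DIR:-$HOME/forest}"
    printf '%s\n' "$ground/d-thinker/dt"
}

dt_tree_path
```

If the tree turns out to be a plain git checkout, something like git -C "$(dt_tree_path)" rev-parse --abbrev-ref HEAD should report which branch is checked out, which is one way to confirm that you are on 'rc'.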

Three other repositories are also needed: bin, dt-common, and utilib.
Code:
cod read bin
cod read dt-common
cod read utilib

3. Setting up SSH login without password to the portal
To install d-thinker easily, you need to set up SSH login to the portal without entering a password, so that the installation process does not prompt you for one.

By default, the portal is 127.0.0.1 (localhost). Therefore, you need to set it up for 127.0.0.1. The instructions are here. If your portal is not 127.0.0.1, please change the destination in the steps correspondingly.
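
For readers without the linked instructions at hand, the usual OpenSSH approach is to authorize your own public key on the portal. The sketch below is only an illustration under that assumption: authorize_key is a hypothetical helper name, and the key file names are the OpenSSH defaults rather than anything D-thinker prescribes.

```shell
#!/bin/sh
# Sketch: append a public key to an authorized_keys file, but only once.
# authorize_key is a hypothetical helper name, not part of D-thinker.
authorize_key() {
    pub_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -q quiet, -x whole-line match, -F fixed string: skip if already present
    grep -qxF -e "$(cat "$pub_file")" "$auth_file" || cat "$pub_file" >> "$auth_file"
}

# Typical use on the portal (commented out so the sketch has no side effects):
#   [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh 127.0.0.1 true    # answer "yes" once so the host key is recorded
```

Calling the helper twice with the same key leaves a single entry, so it is safe to rerun.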



4. Build and deploy the portal: Build D-thinker software from the current tc2 code tree, and deploy the D-thinker software to a node serving as the portal.

Change the working directory to the dt code tree, e.g.,
Code:
cd ~/forest/d-thinker/dt
Perform the installation.

Code:
make install

This builds and installs the d-thinker components in ~/think on the portal, which is localhost (127.0.0.1) by default. After installation, the deploy utility runs a few tests to verify the correctness of the system.

You can fine-tune the behavior of 'make install' by specifying options.

If you would like to install D-thinker software in a specific directory other than ~/think, you can use dt/release/bin/deploy with the --base <base_dir> option to change the installation location. The following example installs the D-thinker software in ~/voltaire on the computer with IP 10.0.1.200.
Code:
release/bin/deploy --base voltaire 10.0.1.200

5. Log in to HOST and configure Data Thinker.

Code:
cd ~/think/bin


Edit set-env.sh, which is located in ~/think/conf/, to match the system configuration (e.g., the path to the directory for d-thinker executables).


6. Set up environment variables.

Now the environment variables are set when the thinker runs.

Code:
source ~/think/bin/set-env.sh


7. Set up the public/private key pair in your environment so that you can ssh to the VPC's hosts as user l0 without using a password. Usually you just need to add your public key to the file ~/.ssh/authorized_keys on the VPC hosts. Make sure the private key used is ~/think/conf/key on the portal and that its access mode is 600. Refer to standard SSH documentation for more information on configuring password-less authentication.

As an example, the following commands configure the keys in the simplest form on a single server <VPC>.
Code:
cp ~/.ssh/id_rsa ~/think/conf/key
chmod 600 ~/think/conf/key
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

On particular systems, the commands for configuring the keys can differ from the ones above.

8. Now you can run a program in the D-thinker. For example, the Alphabet program alphabet.bin is installed at ~/think/apps/examples/alphabet.bin. You can use the following command to run it in the thinker:

Code:
dt run -n 1 ~/think/apps/examples/alphabet.bin

The Alphabet program should print a list of the letters in the English alphabet and then exit. This means the thinker is set up and working. You can run other compiled programs in this thinker.

Have fun!

--------
20170107/zli: remove the section about "set up environment variables"
20170107/zli: add a section about "Setting up SSH login without password to the portal"
20170106/zli: change "cod clone" to be "cod read"
20161230/zli: remove obsolete instructions
20161227/zli: add installation process for necessary repositories
20160503/yxj: fix some obsolete names and improve readability.
02-26-2016, 12:23 AM
Post: #2
RE: Build and deploy Data Thinker
Note that the portal needs to be able to ssh to 127.0.0.1 without a password, and this must already have been done at least once so that no interactive yes/no question appears. Otherwise, the installation may fail.
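
A quick way to check this condition before running make install might look like the sketch below. It relies on the standard OpenSSH options BatchMode and ConnectTimeout; can_ssh_noninteractive is a hypothetical helper name, not something shipped with D-thinker.

```shell
#!/bin/sh
# Sketch: succeed only if ssh to the host works without any prompt.
# BatchMode=yes disables password prompts and host-key confirmation
# questions, so ssh fails instead of asking; ConnectTimeout bounds the wait.
# can_ssh_noninteractive is a hypothetical helper name.
can_ssh_noninteractive() {
    host=${1:-127.0.0.1}
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

# Example (commented out; it needs a running sshd on the portal):
#   can_ssh_noninteractive 127.0.0.1 || echo "fix SSH before running make install" >&2
```

If the check fails only because the host key is unknown, logging in once by hand and answering "yes" is normally enough.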
02-26-2016, 12:55 AM
Post: #3
RE: Build and deploy Data Thinker
In the current implementation, deploy calls nconfig.sh without the --portal option, and nconfig.sh treats the portal as 127.0.0.1 if the option is not provided.

This is okay for now, because the portal value is only used for connections, and during deployment nconfig.sh is invoked on the portal.

However, this leaves a potential bug: if the portal value is recorded somewhere in the configuration file, the wrong value is used.
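
One way to keep this kind of silent fallback visible is to warn whenever the default is used. The sketch below is purely illustrative and is not the real nconfig.sh: resolve_portal is a made-up name, and the accepted option spellings (--portal VALUE and --portal=VALUE) are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, not the real nconfig.sh: resolve the portal
# address from a --portal option and report when the default is used.
resolve_portal() {
    portal=""
    while [ $# -gt 0 ]; do
        case $1 in
            --portal)
                portal=${2:-}
                shift
                if [ $# -gt 0 ]; then shift; fi
                ;;
            --portal=*)
                portal=${1#--portal=}
                shift
                ;;
            *)
                shift
                ;;
        esac
    done
    if [ -z "$portal" ]; then
        # Make the fallback visible instead of silently assuming localhost.
        echo "warning: --portal not given, assuming 127.0.0.1" >&2
        portal=127.0.0.1
    fi
    printf '%s\n' "$portal"
}

resolve_portal --portal 10.0.1.200   # prints 10.0.1.200
```

With a warning like this, a configuration file that ends up recording 127.0.0.1 for a remote portal would at least leave a trace in the deploy output.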
05-03-2016, 01:24 PM
Post: #4
RE: Build and deploy Data Thinker
In step 3:

Code:
xinjie@devmac0e0:~/forest/d-thinker/dt
$ make install
Thinker IPs in "ips.psudo" with portal = 127.0.0.1
release/bin/deploy --newme --ips="ips.psudo" 127.0.0.1

>> Begin building D-thinker for IPs on the portal 127.0.0.1
make[1]: Entering directory `/home/xinjie/forest/d-thinker/dt/release'
rm -f vpc mem_home scheduler nrc mem_test
make[1]: Leaving directory `/home/xinjie/forest/d-thinker/dt/release'
Built vpc
Built scheduler
Built mem_home
Built nrc
Finished building in release mode.
Built mem_test
Finished building Unit Tests.
Finished building D-thinker in directory /home/xinjie/forest/d-thinker/dt/release.
/home/xinjie/forest/d-thinker/dt/library/stdlib
In file included from ./src/libcrt.c0:1:0:
./src/malloc.c:1:20: fatal error: malloc.h: No such file or directory
#include "malloc.h"
                    ^
compilation terminated.
make[1]: *** [all] Error 1
make: *** [install] Error 251

malloc.h is in ./dt/library/stdlib, not in ./dt/library/stdlib/src.
Should I export a "CC0_PATH"-like environment variable? If so, we should add this step to the guide above, or add it to a suitable set-env script.
Or maybe the makefile of stdlib should add the header path. But I am not sure whether dcc supports the -I option like g++.
05-03-2016, 01:38 PM
Post: #5
RE: Build and deploy Data Thinker
(05-03-2016 01:24 PM)YU_Xinjie Wrote:  In step 3:


This is a bug.

Fixed in

a2065793b8517e6fbd826a9a7507452f7428bce0

Building stdlib does not require external tools or libraries, as it contains the files it needs.

BTW: the environment variables are used for building other programs: http://tab.d-thinker.org/showthread.php?tid=5701&pid=1891
05-03-2016, 02:26 PM
Post: #6
RE: Build and deploy Data Thinker
lingu: I fixed some obsolete names and improved the readability of this guide. Please check and apply the changes.
05-03-2016, 03:34 PM
Post: #7
RE: Build and deploy Data Thinker
(05-03-2016 02:26 PM)YU_Xinjie Wrote:  lingu: I fixed some obsolete names and improved the readability of this guide. Please check and apply the changes.

Thanks for the reminder. I rectified one edit, but need to think more about the others. Some parts, such as the sourcing of set-env.sh, are automatic now. We'll take this opportunity to optimize the deploy process further.
05-04-2016, 01:11 PM
Post: #8
RE: Build and deploy Data Thinker
I tried to use 'deploy 10.16.0.248' to update DT on my own dev PC, and a fatal error happened:
Code:
bash sources set-env.sh but no think_base is defined

After that I could not open gnome-terminal, and after I logged out I could not log in again.

I had to log in as root, go to my home directory, and modify .bashrc so that it no longer executes set-env.sh. Then it worked.
05-05-2016, 09:51 AM (This post was last modified: 05-05-2016 09:52 AM by YU_Xinjie.)
Post: #9
RE: Build and deploy Data Thinker
(05-04-2016 01:11 PM)xwcwt Wrote:  I tried to use 'deploy 10.16.0.248' to update DT on my own dev PC, and a fatal error happened:
Code:
bash sources set-env.sh but no think_base is defined

After that I could not open gnome-terminal, and after I logged out I could not log in again.

I had to log in as root, go to my home directory, and modify .bashrc so that it no longer executes set-env.sh. Then it worked.

It seems that the "source" command runs in .bashrc's shell environment:
Code:
source filename [arguments]
              Read  and  execute  commands  from filename in the current shell environment and return the exit status of the last command executed from filename.
I guess this results in the return value of the source command becoming the return value of .bashrc when it is not zero. That is, if set-env.sh exits with a non-zero status, .bashrc also exits with a non-zero status.

But if you run a script directly in .bashrc, the script runs in its own sub-shell environment, which does not exit .bashrc by default without 'set -o errexit'.

One solution is to wrap "source set-env.sh" in .bashrc like this:
Code:
. ~/think/bin/set-env-template.sh | xargs echo
If we want to make this more friendly to users, we can use another script, set-bash.sh, to wrap the above command and ask users to "source set-bash.sh" in .bashrc.
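
The behaviour discussed above can be reproduced with a throwaway script. A sourced script runs in the calling shell, so an exit inside it ends that shell. Sourcing inside a subshell (or as part of a pipeline, whose elements bash also runs in subshells) contains the exit, but any variables the script sets are lost as well. The sketch below uses a temporary file as a stand-in for set-env.sh; DEMO_VAR is a made-up name.

```shell
#!/bin/bash
# Sketch: show what an "exit" inside a sourced script does to the caller.
demo=$(mktemp)                       # throwaway stand-in for set-env.sh
printf 'DEMO_VAR=1\nexit 1\n' > "$demo"

# 1. Sourced directly: the exit ends the calling shell, so the echo never runs.
direct=$(bash -c '. "$1"; echo "still alive"' demo "$demo") || true

# 2. Sourced inside a subshell: the caller survives, but DEMO_VAR is lost.
wrapped=$(bash -c '( . "$1" ); echo "still alive, DEMO_VAR=${DEMO_VAR:-unset}"' demo "$demo")

echo "direct:  '$direct'"
echo "wrapped: '$wrapped'"
rm -f "$demo"
```

The first line of output shows an empty string and the second shows "still alive, DEMO_VAR=unset", which suggests that a subshell or pipeline wrapper protects the login shell but cannot be used to actually set the environment.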
05-05-2016, 10:56 AM
Post: #10
RE: Build and deploy Data Thinker
(05-05-2016 09:51 AM)YU_Xinjie Wrote:  

set-env.sh is no longer sourced in the user's .bashrc in the latest DT.

But some users still have that line in their .bashrc. We should at least make sure that set-env.sh does not break users' work environments.

When set-env.sh is sourced from a user's ~/.bashrc, $0 is "bash".

One method is that if $0 == "bash", set-env.sh prints a warning message and then quits directly without doing anything. But I have not dug deeper to test whether and how it would work.
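
A rough, untested-against-DT sketch of such a guard is shown below. It uses return rather than exit, since return leaves a sourced file without ending the shell that sourced it. It also matches "-bash", which is what $0 looks like in a login shell. The temporary file stands in for set-env.sh and GUARD_PASSED is a made-up variable.

```shell
#!/bin/bash
# Hypothetical guard for the top of a script such as set-env.sh.
# Assumption: when sourced from ~/.bashrc, $0 is "bash" (or "-bash"
# for a login shell), as observed in this thread.
guard=$(mktemp)
cat > "$guard" <<'EOF'
case "${0##*/}" in
    bash|-bash)
        echo "warning: do not source this script from .bashrc" >&2
        return 0    # leave the sourced file without ending the caller
        ;;
esac
GUARD_PASSED=1
EOF

# Simulate a shell whose $0 is "bash" sourcing the guarded script.
result=$(bash -c '. "$1"; echo "shell survived, GUARD_PASSED=${GUARD_PASSED:-unset}"' bash "$guard" 2>/dev/null)
echo "$result"
rm -f "$guard"
```

Here the simulated shell prints "shell survived, GUARD_PASSED=unset": the rest of the script was skipped and the shell kept running.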

@lingu: the change that broke Wentao's work environment seems to have been applied by you in commit 1885160f4c0369d97cb84bdb99d225ba9ca9ada7. Please take a look.