String matching
10-15-2014, 02:32 PM (This post was last modified: 01-05-2018 10:54 AM by zma.)
Post: #1
String matching
This program matches a set of k strings (set A) to a set of m reference strings (set B).

You can find one copy of the implementation under apps/string-matching in DT ( http://tab.d-thinker.org/showthread.php?tid=2433 ).

#The input and output#

The strings in A and B have IDs. k is usually relatively small and m is usually relatively large.

The input is two files, one for A and one for B. In each file, each line holds one string, preceded by its ID.

The output is a list of pairs <ID1, ID2>, one pair per line. ID1 is the ID of a string in set A, ID2 is the ID of a string in set B, and the two strings are identical.

For example, the set B (example-input/reference.txt):

0 ATCG
1 ACCGGT
2 GTTTG
3 ACCCGGT
4 ACTCCC
5 ATCCG
6 GTCCG
7 TTCCG

the set A (example-input/match.txt):

0 ACG
1 GTTTG
2 TTCCG

The results will be:

1,2
2,7

For more details, please check the README file in the code tree.
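
The matching described above is exact string equality, so it can be sketched outside DT with a lookup table built from set B. The following is only an illustration, not the DT implementation: `match_ids` is a hypothetical helper name, and the sketch assumes that each line is `ID STRING`, that the reference file is non-empty, and that no string appears twice in set B.

```shell
# Hypothetical helper (not part of the DT code tree): print "<ID1>,<ID2>"
# for every string in set A that also occurs in set B.
# Assumptions: lines look like "ID STRING", the reference file is non-empty,
# and each string occurs at most once in set B.
match_ids() {
    local reference_file=$1   # set B
    local match_file=$2       # set A
    awk '
        NR == FNR { ref[$2] = $1; next }      # first file: index set B by string
        ($2 in ref) { print $1 "," ref[$2] }  # second file: report exact matches
    ' "$reference_file" "$match_file"
}
```

Calling `match_ids example-input/reference.txt example-input/match.txt` on the two example files above should print the pairs 1,2 and 2,7. Indexing set B by string keeps the work roughly linear in k + m.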
01-04-2018, 07:51 PM
Post: #2
RE: String matching
I think we should move this thread out of this board.
01-05-2018, 10:50 AM
Post: #3
RE: String matching
Moved to 'Programming'.
01-05-2018, 10:54 AM
Post: #4
RE: String matching
@gyz: this program is a little bit old. The building scripts do not work any more.

One issue I see is that it uses the ${CC0_INC} variable, which is not set in a default DT installation ( http://tab.d-thinker.org/showthread.php?tid=2553 ). The fix may be to use $think_base/... instead of ${CC0_INC}.

Please fix this and make this program build and run.

Then please add a test target `make test` for this program in its dir. The `runncheck` program can be useful for writing the test.
01-05-2018, 12:31 PM
Post: #5
RE: String matching
I'm working on it.
But it seems my configuration is not set up correctly yet. When I run "make", I get "Permission denied".
By the way, I cannot access these links now:
http://tab.d-thinker.org/showthread.php?tid=5678&acti
http://tab.d-thinker.org/showthread.php?tid=5686&acti
I have contacted Pingshan.
01-05-2018, 12:44 PM
Post: #6
RE: String matching
(01-05-2018 12:31 PM)gyz Wrote:  I'm working on it.
But it seems my configuration is not set up correctly yet. When I run "make", I get "Permission denied".

If you have tried and still can't figure out the problem, ask other engineers for help, providing details ( http://tab.d-thinker.org/showthread.php?tid=8070 ).
01-09-2018, 06:16 PM (This post was last modified: 01-10-2018 12:14 AM by gyz.)
Post: #7
RE: String matching
Goals and Designs

1. Make build.bash, the build script of the 'string-matching' program, work

Because $think_base is set to $HOME/think by the default DT installation process, make this replacement in build.bash:
Code:
((${CC0_INC}->$think_base/library/stdlib))/strsplitter.bash $file >$processed

2. Add a test target `make test` for this program in its dir, using 'runncheck' in /thinker/bin
How to run:
Code:
make test

Similar to build.bash, we can create test.bash to be invoked from the Makefile.
Make this change in the Makefile:
Code:
((->test:
        @./test.bash))

Similar to example-input, we create a dir named example-output that contains match1-output and match2-output, the expected outputs used for verification.

test.bash could be:
Code:
make clean
make

runncheck "dt run match1" match1-output
if fail; then echo and exit 1

runncheck "dt run match2" match2-output
if fail; then echo and exit 1

echo "Passed"

3. Update the relevant info files
There are several problems in the README:
a. "am" in cmds should be replaced by "dt"
b.
Quote:You may use the same reference set to match other strings set, such as:
am run ((am-match.c0.bin->sm-match.c0.bin)) example-input/match2.txt
c. update the path
Quote:You can check the overall output under: ~/l0/l0.tmp/result-4/stdout-* . Here, the '4' in "result-4" is the number of VPCs and may be difference for a different system configuration.
d. add info about "make test"

@zma pls review it.
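
For item 1, a possible variation (not part of the proposal above) is to fall back to the new path only when ${CC0_INC} is unset or empty, so that older setups that still export it keep working. A minimal sketch, assuming $think_base is set as described; `resolve_stdlib_dir` is a hypothetical helper name:

```shell
# Hypothetical helper for build.bash: prefer ${CC0_INC} when it is set and
# non-empty, otherwise use the stdlib dir under $think_base.
# Assumption: $think_base is exported by the default DT installation.
resolve_stdlib_dir() {
    printf '%s\n' "${CC0_INC:-$think_base/library/stdlib}"
}
```

build.bash could then call `"$(resolve_stdlib_dir)"/strsplitter.bash $file >$processed` in place of the hard-coded variable.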
01-09-2018, 09:22 PM (This post was last modified: 01-09-2018 09:26 PM by YU_Xinjie.)
Post: #8
RE: String matching
(01-09-2018 06:16 PM)gyz Wrote:  Goals

1. Make building scripts of 'string-matching' program work

2. Add a test target `make test` for this program in its dir by using 'runncheck'

3. Update some relevant info files

Design

1. To make build.bash work: because $think_base is set to $HOME/think by the default DT installation process, we can make this change:
Code:
function compile() {
    file=$1
    bin=$2
    args=$3
    processed="strsplitted-$file"
    log="${bin}.log"
    ((${CC0_INC}->$think_base/library/stdlib))/strsplitter.bash $file >$processed
    echo "cc0 $args $processed -o $bin >$log"
    cc0 $args $processed -o $bin -g >$log
    rm -f $processed
}

2. Similar to build.bash, we can create test.bash to hold the test.
In the dir '$HOME/forest/d-thinker/dt/apps/string-matching', change the Makefile to:
Code:
all:
        @./build.bash
clean:
        rm -f sm-load-reference.c0.bin sm-match.c0.bin
((->
test:
        @./test.bash
))

Similar to example-input, we then create a dir named example-output, which contains the 2 expected output files used to check the test results: match1-output & match2-output

In addition, test.bash could be:
Code:
make clean
make
echo "Testing..."

runncheck "dt run sm-match.c0.bin ./example-input/match.txt | tail -n 4 | head -n 2" ./example-output/match1-output
if (( $? )); then
        echo "...test failed: match.txt"
        exit 1
fi

runncheck "dt run sm-match.c0.bin ./example-input/match2.txt | tail -n 5 | head -n 3" ./example-output/match2-output
if (( $? )); then
        echo "...test failed: match2.txt"
        exit 1
fi

echo "...test passed"

How to run:
Code:
make test

3. There are several problems in the README:
a. "am" in commands should be replaced by "dt"
b.
Quote:You may use the same reference set to match other strings set, such as:

am run ((am-match.c0.bin->sm-match.c0.bin)) example-input/match2.txt
c. the path should be updated
Quote:You can check the overall output under: ~/l0/l0.tmp/result-4/stdout-* . Here, the '4' in "result-4" is the number of VPCs and may be difference for a different system configuration.
d. we can add info about "make test"

@zma pls review it.

1. We usually avoid large proposals: they scare reviewers, are hard to review, and waste others' time.
If there are a lot of changes, try your best to split them into separate proposals.
Around 10 lines is a suitable size for one proposal.

2. We usually have a driving example for each proposal, which guides the design thinking so that we do not get lost. It is somewhat similar to your "Goal".

3. We should use pseudocode rather than real code in the design.
Pseudocode is easier to read and avoids distracting reviewers.

You can refer to http://tab.d-thinker.org/showthread.php?tid=8781&pid=4690 for an example of a proposal.
01-10-2018, 12:16 AM
Post: #9
RE: String matching
Thanks, Xinjie.
I have re-edited it.
01-23-2018, 11:39 AM
Post: #10
RE: String matching
(01-09-2018 06:16 PM)gyz Wrote:  Goals and Designs


Looks good.

For `runncheck "dt run match2" match2-output`, the raw `dt run` STDOUT may need to be filtered with grep, since some runtime info is also written to STDOUT; keep only the useful lines for the comparison.
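
One way to do this filtering is sketched below, under the assumption that result lines consist only of two integer IDs separated by a comma (as in the example output) and that runtime messages never have that shape. `keep_result_pairs` is a hypothetical name:

```shell
# Hypothetical filter: read `dt run` output on stdin and keep only lines of
# the form "<ID1>,<ID2>". Assumes runtime info lines never match this shape.
keep_result_pairs() {
    grep -E '^[0-9]+,[0-9]+$'
}
```

The same pattern could be placed directly in the command string passed to `runncheck`, for example `dt run sm-match.c0.bin ./example-input/match.txt | grep -E '^[0-9]+,[0-9]+$'`, which avoids depending on fixed `tail`/`head` offsets. Note that grep exits with status 1 when no line matches.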