Artifacts for "Continuous Model Validation Using Reference Attribute Grammars"
==============================================================================

### Authors

-   Johannes Mey <johannes.mey@tu-dresden.de>
-   Carl Mai <carl.mai@tu-dresden.de>
-   René Schöne <rene.schoene@tu-dresden.de>
-   Görel Hedin <gorel.hedin@cs.lth.se>
-   Emma Söderberg <emma.soderberg@cs.lth.se>
-   Thomas Kühn <thomas.kuehn3@tu-dresden.de>
-   Niklas Fors <niklas.fors@cs.lth.se>
-   Jesper Öqvist <jesper.oqvist@cs.lth.se>
-   Uwe Aßmann <uwe.assmann@tu-dresden.de>

### Introduction

The paper discusses the use of reference attribute grammars (RAGs) for
model validation and presents two specific contributions. First, it
discusses the differences between models and the trees specified by
reference attribute grammars, in particular non-containment references,
and presents a manual, yet optimised, method to efficiently bridge these
differences. Second, it proposes an extension of RAG grammar
specifications to model non-containment references automatically. The
proposed modelling techniques are compared to state-of-the-art modelling
tools using a benchmarking framework for continuous model validation,
the *Train Benchmark*.

### Structure of the Supplementary Artifacts

The artifacts are structured in four parts:

-   A standalone example of the non-containment references preprocessor
    (relational-rags-0.2.3.zip)
-   Benchmark code to reproduce the measurements, including all relevant
    source codes
    -   as a zip file (ModelValidationWithRAGs.zip)
    -   as a docker container (trainbenchmark-docker.tar)
-   Full collection of all measurement data and diagrams mentioned in
    the paper (paper-results.zip)

### General Remarks on the Presented Listings and Measurements

For reasons of readability and simplicity, there are some minor naming
differences between the source code and the measured result data. Most
importantly, the three presented JastAdd implementation variants are
named differently in the code and in the diagrams.

The following table shows how the terminology used in the paper relates
to that used in the code.

  Name used in paper and result data   Name used in source code
  ------------------------------------ --------------------------
  Name Lookup                          jastadd-namelookup
  Intrinsic References                 jastadd-intrinsic
  Grammar Extension                    jastadd-relast
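When scripting over the result data, this correspondence can be kept in a
small lookup table. The following is an illustrative Python sketch using
only the names from the table above; the variable names are our own:

```python
# Mapping between the names used in the paper/result data and those used
# in the source code (taken from the table above).
PAPER_TO_CODE = {
    "Name Lookup": "jastadd-namelookup",
    "Intrinsic References": "jastadd-intrinsic",
    "Grammar Extension": "jastadd-relast",
}

# Reverse mapping, e.g. for labelling result files with paper terminology.
CODE_TO_PAPER = {code: paper for paper, code in PAPER_TO_CODE.items()}

print(CODE_TO_PAPER["jastadd-relast"])  # Grammar Extension
```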

The Grammar Extension Preprocessor *RelAst*
-------------------------------------------

To transform the extended grammar, we provide a preprocessor for
JastAdd. The preprocessor, including its source code, is provided in the
`preprocessor` subdirectory.

It is used as follows:

-   Build the preprocessor
    -   `./gradlew build jar`
    -   copy the jar
        `cp build/libs/relational-rags-0.2.3.jar relast-compiler.jar`
-   Run the preprocessor on the Train Benchmark grammar (the output is
    written to standard output):
    -   inspect the input grammar: `cat examples/TrainBenchmark.relast`
    -   `java -jar relast-compiler.jar examples/TrainBenchmark.relast`
-   Run preprocessor and write output to files:
    -   `java -jar relast-compiler.jar examples/TrainBenchmark.relast --file`
    -   `cat examples/TrainBenchmarkGen.ast`
    -   `cat examples/TrainBenchmarkGen.jadd`

The Train Benchmark
-------------------

### Structure of the Train Benchmark

The benchmark can measure different scenarios, specified by
configurations with several kinds of parameters:

1.  **Input Data:** There are two types of input data used in the
    benchmark, the `inject` and the `repair` data set. The former
    contains *valid* models, i.e., models that do not contain any of
    the faults that are supposed to be found by the presented queries.
    The latter, `repair`, contains models that already contain faults.
2.  **Queries:** The queries are used to find the aforementioned faults.
    For each fault, there are two queries: *repair*, to find the fault,
    and *inject*, to find places where a fault can be injected.
3.  **Transformations:** The transformations performed by the benchmark
    are, again, two sets: *inject* and *repair* transformations.
4.  **Transformation Strategies:** The benchmark does not perform the
    operation on all matches. The strategy *fixed* performs the
    transformation on a given number of matches, while the
    *proportional* strategy performs them on a given percentage of all
    matches.

These settings are defined in a *benchmark scenario*, which can be
edited before running the benchmark.
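The two transformation strategies can be sketched as follows. This is a
minimal Python sketch of the selection logic described above; the
function name and signature are illustrative, not the benchmark's actual
API:

```python
def select_matches(matches, strategy, amount):
    """Select the matches to transform (illustrative sketch).

    *fixed*: transform a given number of matches.
    *proportional*: transform a given percentage of all matches.
    """
    if strategy == "fixed":
        return matches[:amount]
    if strategy == "proportional":
        count = len(matches) * amount // 100  # amount is a percentage
        return matches[:count]
    raise ValueError(f"unknown strategy: {strategy}")

matches = list(range(10))
print(select_matches(matches, "fixed", 3))          # 3 matches
print(select_matches(matches, "proportional", 50))  # 50% of all matches
```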

### Measurement Data

The result data is stored in the directory
[paper-results/](paper-results/). This directory contains two
subdirectories:

-   [measurements](paper-results/measurements) contains two directories.
    The [individual](paper-results/measurements/individual) subdirectory
    contains the measurements for individual queries for both the
    *inject* and *repair* scenario. The
    [all-queries](paper-results/measurements/all-queries) subdirectory
    contains the same data for a run including all queries in
    sequence. Both directories contain files with time measurement data
    (starting with `times`) and the numbers of matches (starting with
    `matches`). Each file name contains information on the tool used,
    the query, and the size of the model.
-   [diagrams](paper-results/diagrams) contains the same subdirectories,
    containing diagrams with the respective measurements. The diagrams
    are generated from the same data as in the paper, but enlarged for
    better readability.
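If you want to process the measurement files programmatically, the
naming scheme can be parsed along these lines. This is only a sketch:
the exact separator, field order, and file extension are assumptions
(the example query name `RouteSensor` is one of the Train Benchmark
queries), so adapt the pattern to the actual file names:

```python
import re

# Hypothetical naming scheme: <kind>-<tool>-<query>-<size>.csv, where
# <kind> is "times" or "matches" as described above.
PATTERN = re.compile(
    r"(?P<kind>times|matches)-(?P<tool>.+)-(?P<query>\w+)-(?P<size>\d+)\.csv"
)

def parse_result_file(name):
    """Extract kind, tool, query, and model size from a result file name."""
    match = PATTERN.fullmatch(name)
    return match.groupdict() if match else None

print(parse_result_file("times-jastadd-relast-RouteSensor-32.csv"))
```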

**Please Note:** The measurements were conducted using a timeout for the
whole run. If a run was not completed, no individual times of the steps
appear in the measurements and diagrams. Thus, some tools do not have
measurements for all problem sizes.

### The Source Code

For this publication, we tried to modify the source code of the
benchmark itself as little as possible. As a consequence, the code base
is unfortunately rather large and hard to navigate. The following
section points to the parts relevant for this paper.

The benchmark is structured in modules, some of which form the code of
the benchmark, some are provided by the contesting tools, and some are
related to required model serializations. There are some naming
conventions:

-   Tool-related modules are in directories starting with
    `trainbenchmark-tool`.
-   Model serialization-related modules start with
    `trainbenchmark-generator`.
-   All other modules are core modules of the benchmark.
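These naming conventions can be captured in a small classifier, e.g. for
scripting over the module directories (an illustrative Python sketch;
the category labels are our own):

```python
def classify_module(directory_name):
    """Classify a benchmark module by its directory name, following the
    naming conventions described above (illustrative sketch)."""
    if directory_name.startswith("trainbenchmark-tool"):
        return "tool"
    if directory_name.startswith("trainbenchmark-generator"):
        return "generator"  # model serialization
    return "core"

print(classify_module("trainbenchmark-tool-jastadd-relast-base"))  # tool
```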

The JastAdd-based solutions use a preprocessor to generate Java files
for each presented variant. Each JastAdd configuration must be presented
to the benchmark as a separate tool. Thus, there are two directories for
each variant: one for the batch processing mode and one for the
incremental mode. Because these two modes share almost all of their
source code, a third directory stores this shared code. Finally, there
is a directory for code shared between all JastAdd variants. These are
the important directories:

-   [JastAdd with Name
    Lookup](trainbenchmark/trainbenchmark-tool-jastadd-namelookup-base)
    -   [Grammar](trainbenchmark/trainbenchmark-tool-jastadd-namelookup-base/src/main/jastadd/train.ast)
    -   [Queries](trainbenchmark/trainbenchmark-tool-jastadd-namelookup-base/src/main/jastadd/queries)
    -   [Transformations](trainbenchmark/trainbenchmark-tool-jastadd-namelookup-base/src/main/java/de/tudresden/inf/st/train/jastadd/transformations)
-   [JastAdd with Intrinsic
    References](trainbenchmark/trainbenchmark-tool-jastadd-intrinsic-base)
    -   [Grammar](trainbenchmark/trainbenchmark-tool-jastadd-intrinsic-base/src/main/jastadd/train.ast)
    -   [Queries](trainbenchmark/trainbenchmark-tool-jastadd-intrinsic-base/src/main/jastadd/queries)
    -   [Transformations](trainbenchmark/trainbenchmark-tool-jastadd-intrinsic-base/src/main/java/de/tudresden/inf/st/train/jastadd/transformations)
-   [JastAdd with Grammar
    Extension](trainbenchmark/trainbenchmark-tool-jastadd-relast-base)
    -   [(Extended)
        Grammar](trainbenchmark/trainbenchmark-tool-jastadd-relast-base/src/main/jastadd/Train.relast)
    -   [Queries](trainbenchmark/trainbenchmark-tool-jastadd-relast-base/src/main/jastadd/queries)
    -   [Transformations](trainbenchmark/trainbenchmark-tool-jastadd-relast-base/src/main/java/de/tudresden/inf/st/train/jastadd/transformations)
-   [Common JastAdd
    Code](trainbenchmark/trainbenchmark-tool-jastadd-base)

### Reproducing the Measurements

**[Please Note: Reproducing the graphs as presented in the paper and
supplied here takes a very long time, depending on the utilized
hardware. We strongly suggest running the benchmark with a smaller
maximum problem size, fewer repetitions, and a shorter
timeout.]{style="color:red"}** Most results of the benchmark are
observable with a more restricted setup as well. In the following, we
suggest ways to run the benchmark at different sizes. Note that running
the benchmark requires a significant amount of disk space (up to 10GB
when running the full benchmark).

There are several options to reproduce the measurements. We provide a
prepared Docker image that can be run directly. Alternatively, it is, of
course, also possible to simply run the provided Gradle build scripts.
However, since the benchmark imposes some software requirements,
particularly for creating the diagrams using R, we strongly suggest
running the Docker variant.

#### Running the Benchmark with Docker

##### Loading the Docker Image

-   Variant 1 (*recommended*): Load the provided Docker image
    -   Prerequisites: An installation of Docker in the `PATH`
    -   Steps:
        -   Unpack the provided archive and open a terminal in the
            extracted directory
        -   `docker load --input trainbenchmark-docker.tar`
-   Variant 2: Build the Docker image from the provided Dockerfile
    -   Prerequisites: An installation of Docker in the `PATH`
    -   Steps:
        -   Unpack the provided archive and open a terminal in the
            extracted directory
        -   `docker build -t trainbenchmark .`

##### Running the Docker Image

-   `docker run -it -v "$PWD"/docker-results:/trainbenchmark/results:Z -v "$PWD"/docker-diagrams:/trainbenchmark/diagrams:Z trainbenchmark`
-   This makes the results and diagrams available outside the container
    in the directories `docker-results` and `docker-diagrams`
    respectively
-   Once running, a command prompt is opened and some information is
    displayed
-   Follow the instructions below

#### Running the Benchmark Directly

-   For a standard run, use one of the following commands:

  Name     Command          Minimum size   Maximum size   Timeout   Runs
  -------- ---------------- -------------- -------------- --------- ------
  Small    `./run_small`    1              32             60s       1
  Medium   `./run_medium`   1              64             10min     5
  Full     `./run_full`     1              512            15min     10

-   For a custom run:
    -   run `./gradlew preprocess` to generate the grammar from the
        extended grammar specification
    -   run `./gradlew build shadowJar -x test`
    -   configure the scripts by running
        `./scripts/configure.sh 1 <MAXSIZE> <TIMEOUT in s> <REPETITIONS>`
        -   Where MAXSIZE is one of 2, 4, 8, 16, 32, 64, 128, 256, 512,
            or 1024. The larger sizes use **a lot of** disk space!
    -   run `./gradlew generate`
    -   run the benchmark
        -   run `./gradlew individualInjectBenchmark` for the *inject*
            scenarios
        -   run `./gradlew individualRepairBenchmark` for the *repair*
            scenarios
    -   Plot the diagrams for the current run:
        `./gradlew plotIndividual`
-   The resulting data and diagrams are placed in the `results` and the
    `diagrams` folder, respectively
    -   When running with Docker, the data is also available in
        `docker-results` and `docker-diagrams` on the host machine.
