UHPC Design-of-Experiments and Analysis Code
============================================

This archive contains the R script and Python notebooks used to design, augment,
and analyze the experiments that produced the publicly available dataset:

  Rezazadeh, F., Dürrbaum, A., Abrishambaf, A., Zimmermann, G., & Kroll, A.
  Mechanical properties of ultra-high performance concrete (UHPC).
  Dataset, DOI: 10.48662/daks-56 (DaKS – University of Kassel research data repository, 2025).

The dataset and this code bundle are intended to be used together: the dataset
provides the measurements from 150 UHPC batches, and this archive documents how
the design of experiments (DoE) and the main descriptive statistics were
generated.

-------------------------------------------------------------------------------
1. Archive contents
-------------------------------------------------------------------------------

After unzipping, the folder `Design_of_Experiments_Code/` has the following structure:

  DoE_code/
  ├─ R/
  │   └─ doe_modeling_phase_lhs_augmentation.R
  ├─ python/
  │   ├─ doe_modeling_phase_processing.ipynb
  │   └─ eda_level_distribution_and_descriptive_stats.ipynb
  └─ README.txt  (this file)

1.1 R script
------------

File: `R/doe_modeling_phase_lhs_augmentation.R`

This script implements the modeling-phase design augmentation used when creating
the UHPC dataset (DOI: 10.48662/daks-56).

Steps:

  1. Loads the Taguchi L50 screening design, with factor levels coded as
     integers in {1, 2, 3, 4, 5}.
  2. Normalizes each factor (column) to the interval [0, 1].
  3. Calls `optAugmentLHS()` from the R package `lhs` to add 50 additional
     experiments in an S-optimal Latin hypercube fashion.
  4. Writes the augmented normalized design to Excel.

The script does NOT rescale back to discrete levels; this is performed in the
Python notebook `doe_modeling_phase_processing.ipynb`.
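
The normalization in step 2 can be illustrated with a short sketch. It is written
in Python rather than R so that it can be run without the `lhs` package, and it
assumes plain column-wise min-max scaling based on each column's observed minimum
and maximum. The function name `normalize_columns` and the toy design are
hypothetical and are not taken from the script.

```python
def normalize_columns(design):
    """Min-max scale each column of a row-major design to [0, 1].

    Assumption: each column is scaled by its own observed minimum and
    maximum; a constant column is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*design))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in design:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Toy design with two factors coded 1..5 (not the real L50 array).
toy_design = [[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]]
toy_normalized = normalize_columns(toy_design)
print(toy_normalized[1])  # [0.25, 0.75]
```

With levels 1 to 5 this reduces to (level - 1) / 4, so level 3 maps to 0.5.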

1.2 Python notebooks
--------------------

File: `python/doe_modeling_phase_processing.ipynb`

This notebook post-processes the augmented design from the R script and prepares
a complete modeling-phase design:

  1. Rescales the normalized augmented design from [0, 1] back to the coded
     levels {1, 2, 3, 4, 5} and rounds to the nearest integer.
  2. Checks for duplicate experiments by comparing the original Taguchi L50
     design with the augmented design (exact row matches).
  3. Combines the original 50 screening experiments with the newly added
     modeling-phase experiments to form the final design (levels 1–5).
  4. Maps coded levels to real physical values according to the factor-level definitions
     used in the UHPC experiments.
  5. Writes intermediate and final designs to Excel, including:
       - augmented design in [0,1],
       - augmented design in levels {1–5},
       - tables with/without duplicates,
       - final design with real-valued factors.

This notebook is the bridge between the R-based design augmentation and the
final, physically interpretable UHPC factor settings.
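
Steps 1 to 4 can be sketched in a few lines of dependency-free Python. This is
an illustration only, not the notebook's code (which, per section 2.2, relies on
`pandas` and `scikit-learn`). All function names, the factor names `factor_a`
and `factor_b`, the physical values, and the rounding of exact ties upward are
assumptions made for the example.

```python
import math

LEVEL_MIN, LEVEL_MAX = 1, 5

# Hypothetical level-to-value table with placeholder numbers; the real
# factor-level definitions are those used in the notebook.
LEVEL_VALUES = {
    "factor_a": {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 50.0},
    "factor_b": {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4, 5: 0.5},
}


def to_levels(normalized_rows):
    """Rescale [0, 1] values to 1..5 and round to the nearest integer.

    Assumption: exact ties (x.5) are rounded up.
    """
    span = LEVEL_MAX - LEVEL_MIN
    return [
        tuple(int(math.floor(v * span + LEVEL_MIN + 0.5)) for v in row)
        for row in normalized_rows
    ]


def drop_duplicates(original_rows, new_rows):
    """Keep only new rows that do not exactly match an original row."""
    seen = {tuple(row) for row in original_rows}
    return [row for row in new_rows if tuple(row) not in seen]


def to_physical(rows, factor_names):
    """Map coded levels to physical values, one dict per experiment."""
    return [
        {name: LEVEL_VALUES[name][level] for name, level in zip(factor_names, row)}
        for row in rows
    ]


# Toy data, not the real designs.
original = [(1, 1), (2, 3), (5, 5)]
augmented_normalized = [(0.0, 0.0), (0.3, 0.8), (0.9, 0.6)]

augmented_levels = to_levels(augmented_normalized)       # step 1
new_unique = drop_duplicates(original, augmented_levels)  # step 2
final_design = original + new_unique                      # step 3
final_physical = to_physical(final_design, ["factor_a", "factor_b"])  # step 4
print(final_physical[3])  # {'factor_a': 20.0, 'factor_b': 0.4}
```

Rounding continuous Latin hypercube points to five discrete levels can make two
points collapse onto the same experiment, which is why a duplicate check follows
the rescaling.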

---

File: `python/eda_level_distribution_and_descriptive_stats.ipynb`

This notebook performs exploratory data analysis (EDA):

  1. Level distributions:
       - Counts how often each coded level (e.g., 1–5) is used for each discrete
         factor across all 150 experiments (screening, modeling, and
         domain-optimization phases).
  2. Descriptive statistics:
       - Computes mean, median, standard deviation, minimum, and maximum for
         selected inputs and outputs.
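
Both analyses can be sketched with the Python standard library. This is a minimal
illustration, not the notebook's code. The function names, factor names, and toy
numbers are hypothetical, and the use of the sample standard deviation (divisor
n - 1) is an assumption.

```python
from collections import Counter
import statistics


def level_counts(rows, factor_names):
    """Count how often each coded level occurs for each factor."""
    return {
        name: Counter(row[i] for row in rows)
        for i, name in enumerate(factor_names)
    }


def describe(values):
    """Return the five descriptive statistics for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample std (n - 1); an assumption
        "min": min(values),
        "max": max(values),
    }


# Toy data, not taken from the dataset.
toy_rows = [(1, 2), (1, 3), (2, 3), (5, 3)]
toy_counts = level_counts(toy_rows, ["factor_a", "factor_b"])
toy_stats = describe([150.0, 160.0, 170.0, 180.0])

print(dict(toy_counts["factor_a"]))  # {1: 2, 2: 1, 5: 1}
print(toy_stats["mean"])             # 165.0
```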

-------------------------------------------------------------------------------
2. Required software and libraries
-------------------------------------------------------------------------------

2.1 R environment
-----------------

Requirements:

  - Packages:
      * `lhs`       – Latin hypercube samples (contains `optAugmentLHS()`, GPL-3).
      * `readxl`    – import Excel files.
      * `openxlsx`  – export Excel files.

Install the required R packages with:

  install.packages(c("lhs", "readxl", "openxlsx"))

The R script uses these packages but does not redistribute their code. Please
refer to each package’s CRAN page for license details.

2.2 Python environment
----------------------

Requirements:

  - Packages:
      * `pandas`       – data frames and Excel I/O.
      * `numpy`        – numerical utilities.
      * `scikit-learn` – `MinMaxScaler` for rescaling.
      * `openpyxl`     – Excel reader/writer backend.

Install with:

  pip install pandas numpy scikit-learn openpyxl

These are external dependencies with their own licenses (as specified in their
documentation) and are not included in this archive.
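
Before opening the notebooks, a quick check such as the following can confirm
that the packages are importable. It is a convenience sketch, not part of the
archive; the helper name `missing_modules` is hypothetical. Note that
`scikit-learn` is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the required packages; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "sklearn", "openpyxl"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
print("Missing:", ", ".join(missing) if missing else "none")
```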

-------------------------------------------------------------------------------
3. References for the R design-augmentation function
-------------------------------------------------------------------------------

The R script uses the `optAugmentLHS()` function from the R package `lhs`. The
following references are cited in the script header:

1. `lhs` R package documentation (CRAN):
   https://cran.r-project.org/web/packages/lhs/lhs.pdf
2. Stein, M. Large sample properties of simulations using Latin hypercube
   sampling. Technometrics 29, 143–151, 
   DOI: 10.1080/00401706.1987.10488205 (1987).
3. Carnell, R. `optaugmentlhs.r` [R script]. GitHub repository:
   `bertcarnell/lhs` (2019). Part of the `lhs` package (GPL-3).
   Accessed 12 February 2025.

-------------------------------------------------------------------------------
NOTE: Reuse of this database is unlimited with retention of copyright notice.