Introduction

MoDeVa is a comprehensive tool designed to support both model development and model validation. It provides the following capabilities:

  • Building conceptually sound models: High-quality data, proper feature selection, interpretable model development, performance optimization and benchmarking.

  • Conducting rigorous outcome analysis: Identification of model weaknesses, output reliability through uncertainty quantification, model robustness against noise, resilience against distribution drift, and fairness analysis.

  • Performing ongoing monitoring: Continuous monitoring and improvement of deployed models to meet evolving business needs.

Designed for use by data scientists, MoDeVa offers flexible workflow options to meet diverse user needs, ranging from advanced customization to streamlined automation:

  • High Code: Power user library with scikit-learn-like APIs for full flexibility and control.

  • Low Code: Menu-driven panels within Jupyter notebooks for ease of use.

  • Pipeline: Chain-of-task execution for streamlined production with automated report generation.

Key Modules

Conceptual Soundness

MoDeVa ensures models are developed on a solid foundation with advanced capabilities for data quality, feature selection, and interpretable model development.

  • Data Quality & Suitability:

    • Data summary statistics and missing value treatment.

    • Exploratory data analysis (1D, 2D, 3D plots, multivariate correlations).

    • Outlier detection techniques (e.g., Isolation Forest, CBLOF).

    • Data drift testing (Population Stability Index (PSI), Wasserstein Distance (WD)).

  • Variable Selection & Causality:

    • Univariate feature selection (correlation-based methods).

    • XGB permutation-based feature importance.

    • Randomized conditional independence tests.

  • Interpretable Model Development:

    • A rich suite of inherently interpretable modeling frameworks, including:

      • FANOVA and GAMI-Net with main effects and interactions.

      • Locally interpretable Deep ReLU Neural Networks.

      • Neural-Tree architectures.

      • Mixture-of-Experts (MoE) models.

    • Hyperparameter optimization techniques for fine-tuning models.

  • Interpretability & Explainability:

    • Inherent Interpretability: Direct interpretations of GLM, GAMI-Net, and MoE approaches.

    • Post-Hoc Explanations:

      • Permutation Feature Importance (PFI), H-statistics.

      • Partial Dependence Plots (PDP), Accumulated Local Effects (ALE).

      • LIME (Local Interpretable Model-Agnostic Explanations), SHAP (SHapley Additive exPlanations).
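To make one of the post-hoc explanation methods above concrete, the following is a minimal, library-free sketch of Permutation Feature Importance (PFI). It is not MoDeVa's API: the function names `mse`, `permutation_importance`, and `toy_model`, along with the toy data, are assumptions made purely for illustration. The idea is to shuffle one feature column at a time and record how much the model's error increases; a feature the model does not rely on should produce no increase.

```python
import random

def mse(model, X, y):
    """Mean squared error of the model's predictions on rows X."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy data (assumed for illustration): the target depends on feature 0 only.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]

def toy_model(row):
    # Hypothetical "fitted" model that matches the data-generating rule.
    return 3.0 * row[0]

scores = permutation_importance(toy_model, X, y)
print(scores)  # feature 0 shows a positive score; feature 1 shows zero
```

Averaging over several shuffles reduces the variance of the estimate. In practice the scores are usually computed on held-out data so that they reflect what the model relies on when generalizing.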

Outcome Analysis

MoDeVa enables detailed testing and analysis of model outcomes to identify model weaknesses, ensure reliability, and enhance robustness.

  • Model Performance:

    • Comprehensive performance metrics.

    • Performance analysis on sliced segments.

    • Residual analysis and overfit detection.

  • Reliability Testing:

    • Split conformal prediction for regression models.

    • Full conformal prediction for classification models.

    • Reliability diagnostics with slicing techniques.

  • Robustness Testing:

    • Evaluate performance degradation under covariate noise perturbation.

    • Perturbation strategies (raw-scale, quantile-scale).

    • Robustness diagnostics by slicing techniques.

  • Resilience Testing:

    • Evaluate performance degradation under distribution shifts.

    • Distribution shift scenarios: worst-sample, worst-cluster.

    • Diagnostics to pinpoint vulnerabilities.

  • Fairness Analysis:

    • Fairness metrics for classification and regression.

    • Fairness analysis on sliced segments.

    • Bias mitigation strategies to ensure alignment with regulatory requirements.
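As an illustration of the reliability testing listed above, split conformal prediction for regression can be sketched in a few lines of plain Python. This is a generic sketch of the standard technique, not MoDeVa's implementation: the helper names `conformal_quantile` and `split_conformal_interval`, the toy model, and the simulated data are all assumptions. Absolute residuals on a held-out calibration set are sorted, and a finite-sample-corrected quantile of them becomes the half-width of the prediction interval.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank in the sorted scores
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def split_conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    """Symmetric prediction interval around model(x_new)."""
    scores = [abs(t - model(x)) for x, t in zip(X_cal, y_cal)]
    q = conformal_quantile(scores, alpha)
    center = model(x_new)
    return center - q, center + q

def toy_model(x):
    # Hypothetical fitted regressor, assumed for illustration.
    return 2.0 * x

def simulate(rng, n):
    # Toy data: y = 2x + Gaussian noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(0)
X_cal, y_cal = simulate(rng, 1000)    # calibration split
X_test, y_test = simulate(rng, 1000)  # fresh data to check coverage

covered = 0
for x, t in zip(X_test, y_test):
    lo, hi = split_conformal_interval(toy_model, X_cal, y_cal, x, alpha=0.1)
    covered += lo <= t <= hi
coverage = covered / len(y_test)
print(f"empirical coverage: {coverage:.3f}")  # target is about 1 - alpha
```

The coverage guarantee is marginal and assumes the calibration and test points are exchangeable. The interval width is also constant across inputs here, which is one reason slicing-based reliability diagnostics are useful for finding regions where the intervals are too narrow or too wide.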

Special Features

  • User-Friendly Low-code Panels for Jupyter Notebook Users:

    Simplified panels designed for seamless interaction, providing intuitive interfaces for data exploration, model development, and outcome analysis with minimal coding effort.

  • Interactive Visualizations:

    Powerful statistical graphics and dashboards for dynamic data exploration, model analytics, and actionable insights.

  • Experiment Tracking:

    Leverage MLflow to manage datasets and maintain a detailed registry of experiments, ensuring reproducibility and efficient model lifecycle management.

  • Integration and Extensibility:

    Seamless integration of external models into MoDeVa workflows for consistent validation and streamlined operations.

    Additionally, MoDeVa offers APIs for integration with external MLOps pipelines and other enterprise systems, ensuring compatibility with diverse infrastructures.