Visualisation

Visualisation (`payn.Visualisation.Visualisation`)

The `payn.Visualisation` module provides plotting tools for inspecting data distributions and interpreting hyperparameter optimization results. It also wraps Optuna’s matplotlib backend so that the full set of applicable optimization plots is generated in a single call.

Yield Distribution Analysis: The `plot_yield_bins` method draws up to two histogram layers to visualize the spread of the target variable (yield). When `bin_size` is given, it plots a frequency distribution over the 0 to 100 % range; when `positive_threshold` is given, it adds a color-coded two-bin split (Negative in red, Positive in green) at that threshold. Together these allow rapid visual assessment of dataset imbalance and class separability.
Automated Artifact Generation: Upon completion of an optimization study, the system auto-generates key diagnostic plots, including Optimization History (convergence tracking), Hyperparameter Importance (f-ANOVA), and Parallel Coordinate plots.
Pairwise Interaction Analysis: To reveal complex dependencies between hyperparameters, the module automatically iterates through all parameter pairs to generate Contour and Slice plots.
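MLflow Integration: When `log_to_mlflow` is enabled and a logger is available, generated plots are logged as artifacts to the associated MLflow run, creating a permanent visual record of the experiment’s search phase. The local `optuna_visualizations` staging directory is removed afterwards.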
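
The pairwise iteration described above can be sketched with `itertools.combinations`. This is a minimal illustration, not the module's own code: the helper name `pairwise_plot_paths` and the parameter names `lr`, `depth`, and `l2` are hypothetical, while the file naming pattern and the `optuna_visualizations` directory follow the source listing of `plot_optuna_study`.

```python
import os
from itertools import combinations


def pairwise_plot_paths(plot_name, params, output_dir="optuna_visualizations"):
    """Return one output path per unordered pair of hyperparameters.

    Hypothetical helper: it reproduces only the naming scheme, not the plotting.
    """
    return [
        os.path.join(output_dir, f"{plot_name}_{'_'.join(pair)}.png")
        for pair in combinations(params, 2)
    ]


# Three parameters give 3 * 2 / 2 = 3 unordered pairs, hence three files.
paths = pairwise_plot_paths("contour_plot", ["lr", "depth", "l2"])
for path in paths:
    print(path)
```

Because every unordered pair is plotted, a study with n parameters yields n(n-1)/2 images per plot type, which is worth keeping in mind for large search spaces.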

A class for visualizing data, distributions, metrics, and model performance in the PAYN project.

Attributes:

Name	Type	Description
`data`	`DataFrame`	The dataset to visualize.
`logger`	`Optional[Logger]`	Logger instance for tracking experiments.

Source code in payn\Visualisation\visualisation.py

class Visualisation:
    """
    A class for visualizing data, distributions, metrics, and model performance in the PAYN project.

    Attributes:
        data (pd.DataFrame): The dataset to visualize.
        logger (Optional[Logger]): Logger instance for tracking experiments.
    """

    def __init__(self, data: pd.DataFrame, logger: Optional[Logger] = None) -> None:
        """
        Initialize the Visualisation class with the dataset.

        Args:
            data (pd.DataFrame): The dataset to visualize.
            logger (Logger, optional): Instance of Logger class for logging purposes.
        """
        self.data = data
        self.logger = logger

    def plot_yield_bins(self, yield_column: str, bin_size: Optional[int] = None, positive_threshold: Optional[float] = None, title: Optional[str] = None):
        """
        Plot yield bins and optionally compare with a positive threshold.

        Args:
            yield_column (str): The name of the yield column in the dataset.
            bin_size (int, optional): The number of equal-width bins spanning 0 to 100 for the yield distribution. If not specified, only positive and negative bins will be plotted.
            positive_threshold (float, optional): The threshold separating negative and positive data.
            title (str, optional): The title of the plot.
        """
        # Extract the yield data
        yield_data = self.data[yield_column]

        # Set up the figure
        fig, ax = plt.subplots(figsize=(10, 6))

        if bin_size is not None:
            # Calculate bins for the yield distribution
            distribution_bins = np.linspace(0, 100, bin_size + 1)
            n, bins, patches = ax.hist(yield_data, bins=distribution_bins, alpha=0.6, edgecolor='black', label='Yield Distribution')

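            # Apply color map, coloring each bar by the position of its left edge
            # (plt.get_cmap replaces plt.cm.get_cmap, which newer Matplotlib releases removed)
            colormap = plt.get_cmap('viridis')
            for patch, value in zip(patches, bins):
                patch.set_facecolor(colormap(value / 100))

            # Add frequency values on top of each bar for yield distribution
            # (ax.hist returns float counts, so cast to int for clean labels)
            for i in range(len(n)):
                ax.text(bins[i] + (bins[i + 1] - bins[i]) / 2, n[i] + 0.5, str(int(n[i])),
                        ha='center', fontsize=8, color='black')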

        if positive_threshold is not None:
            # Define two bins: one for negative and one for positive data
            bins = [0, positive_threshold, 100]
            counts, edges = np.histogram(yield_data, bins=bins)

            # Plot the histogram with two adjacent bins behind the yield distribution
            ax.bar(edges[:-1], counts, width=np.diff(edges), align='edge', alpha=0.3, color=['red', 'green'], edgecolor='black',
                   label=['Negative Data', 'Positive Data'], zorder=1)

            # Add frequency values on top of each bar for the positive/negative data
            for i in range(len(counts)):
                ax.text(edges[i] + (edges[i + 1] - edges[i]) / 2, counts[i] + 0.5, str(counts[i]),
                        ha='center', fontsize=10, color='black', zorder=2)

        # Title and labels
        if title:
            ax.set_title(title)
        else:
            ax.set_title('Yield Distribution and Positive/Negative Data Separation')
        ax.set_xlabel('Yield (%)')
        ax.set_ylabel('Frequency')
        ax.legend()

        plt.tight_layout()
        plt.show()

    def plot_optuna_study(self, study: optuna.Study, log_to_mlflow: bool = False):
        """
        Generate and log a comprehensive set of visualizations for an Optuna study using Matplotlib.

        Args:
            study (optuna.Study): The Optuna study to visualize.
            log_to_mlflow (bool): If True, log the visualizations to MLflow as artifacts.
        """
        visualizations: Dict[str, Callable]= {
            "contour_plot": optuna.visualization.matplotlib.plot_contour,
            "edf_plot": optuna.visualization.matplotlib.plot_edf,
            "hypervolume_history": optuna.visualization.matplotlib.plot_hypervolume_history,
            "intermediate_values": optuna.visualization.matplotlib.plot_intermediate_values,
            "optimization_history": optuna.visualization.matplotlib.plot_optimization_history,
            "parallel_coordinate": optuna.visualization.matplotlib.plot_parallel_coordinate,
            "param_importances": optuna.visualization.matplotlib.plot_param_importances,
            "pareto_front": optuna.visualization.matplotlib.plot_pareto_front,
            "rank_plot": optuna.visualization.matplotlib.plot_rank,
            "slice_plot": optuna.visualization.matplotlib.plot_slice,
            "terminator_improvement": optuna.visualization.matplotlib.plot_terminator_improvement,
            "timeline_plot": optuna.visualization.matplotlib.plot_timeline,
        }

        # Directory to save plots locally before logging to MLflow
        output_dir = "optuna_visualizations"
        os.makedirs(output_dir, exist_ok=True)

        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=ExperimentalWarning)

            for plot_name, plot_function in visualizations.items():
                try:
                    if plot_name in ["contour_plot", "slice_plot"]:
                        # Ensure parameters exist in the study
                        params = list(study.best_params.keys())
                        if len(params) < 2:
                            # print(f"Skipping {plot_name}: Requires at least two parameters in the study.")
                            continue

                        # Plot all combinations of parameter pairs
                        for param_pair in combinations(params, 2):
                            plt.figure(figsize=(10, 6))
                            plot_function(study, params=list(param_pair))

                            # Adjust legend placement for slice_plot
                            if plot_name == "slice_plot":
                                # Adjust legend placement to avoid overlapping with the color bar
                                legend = plt.legend(loc='upper left', bbox_to_anchor=(0.15, 1), borderaxespad=0.) #1.15, 1
                                # Reduce the layout tightness to accommodate the legend placement
                                # plt.tight_layout(rect=[0, 0, 0.85, 1])
                                # plt.figure() # figsize=(10, 6)

                            # Save the plot
                            plot_path = os.path.join(output_dir, f"{plot_name}_{'_'.join(param_pair)}.png")
                            #plt.tight_layout()
                            plt.savefig(plot_path)
                            plt.close()

                            # Log the artifact to MLflow if required
                            if log_to_mlflow and self.logger:
                                self.logger.log_image_to_mlflow(plot_path)

                            # print(f"Successfully generated and saved: {plot_name} for {param_pair}")

                    elif plot_name == "pareto_front" or plot_name == "hypervolume_history":
                        # Multi-objective specific plots
                        if len(study.directions) < 2:
                            # print(f"Skipping {plot_name}: Applicable only for multi-objective studies.")
                            continue
                        plt.figure() # figsize=(10, 6)
                        if plot_name == "hypervolume_history":
                            reference_point = [100] * len(study.directions)
                            plot_function(study, reference_point=reference_point)
                        else:
                            plot_function(study)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        # plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name}")

                    elif plot_name == "intermediate_values":
                        # Ensure study includes pruning to utilize intermediate values
                        if not any(t.intermediate_values for t in study.trials):
                            # print(f"Skipping {plot_name}: No intermediate values found in the study.")
                            continue
                        plt.figure() # figsize=(10, 6)
                        plot_function(study)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        # plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name}")

                    elif plot_name == "rank_plot":
                        plt.figure() # figsize=(10, 6)
                        plt.rcParams["figure.figsize"] = (10, 6)
                        plot_function(study)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                    elif plot_name == "parallel_coordinate":
                        plt.figure()  # You can set figsize=(10, 6) if needed
                        plot_function(study)
                        # Rotate the axis labels so long parameter names stay readable
                        plt.xticks(rotation=45, ha='right')

                        # Adjust layout, then save the plot
                        plt.tight_layout()
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                    else:
                        # General case for all other plots
                        plt.figure() # figsize=(10, 6)
                        plot_function(study)

                        # Add legend
                        plt.legend()
                        # plt.tight_layout()
                        # ##del
                        # plt.show()
                        # plt.close()

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        # plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name}")

                except Exception as e:
                    print(f"Error generating {plot_name}: {e}")
                    plt.close()

        # Clean up temporary files only when they were actually logged to MLflow
        if log_to_mlflow and self.logger:
            shutil.rmtree(output_dir, ignore_errors=True)

`__init__(data, logger=None)`

Initialize the Visualisation class with the dataset.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	The dataset to visualize.	required
`logger`	`Logger`	Instance of Logger class for logging purposes.	`None`

Source code in payn\Visualisation\visualisation.py

def __init__(self, data: pd.DataFrame, logger: Optional[Logger] = None) -> None:
    """
    Initialize the Visualisation class with the dataset.

    Args:
        data (pd.DataFrame): The dataset to visualize.
        logger (Logger, optional): Instance of Logger class for logging purposes.
    """
    self.data = data
    self.logger = logger

`plot_optuna_study(study, log_to_mlflow=False)`

Generate and log a comprehensive set of visualizations for an Optuna study using Matplotlib.

Parameters:

Name	Type	Description	Default
`study`	`Study`	The Optuna study to visualize.	required
`log_to_mlflow`	`bool`	If True, log the visualizations to MLflow as artifacts.	`False`
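
The explicit skip conditions in the listing below can be summarized in a small, library-free sketch. The function `applicable_plots` and its three arguments are hypothetical stand-ins for properties of a study (number of tuned parameters, number of objectives, whether any trial reported intermediate values); the plot names are the dictionary keys used by `plot_optuna_study`. It mirrors only the explicit guards: a plot that passes them may still fail at runtime, in which case the method's `try`/`except` prints the error and moves on.

```python
PLOT_NAMES = [
    "contour_plot", "edf_plot", "hypervolume_history", "intermediate_values",
    "optimization_history", "parallel_coordinate", "param_importances",
    "pareto_front", "rank_plot", "slice_plot", "terminator_improvement",
    "timeline_plot",
]


def applicable_plots(n_params, n_objectives, has_intermediate_values):
    """Return the plot names that pass the explicit guards (hypothetical helper)."""
    skipped = set()
    if n_params < 2:
        # Contour and slice plots are drawn per parameter pair.
        skipped |= {"contour_plot", "slice_plot"}
    if n_objectives < 2:
        # Pareto front and hypervolume history are multi-objective only.
        skipped |= {"pareto_front", "hypervolume_history"}
    if not has_intermediate_values:
        # Requires trials that reported intermediate values (e.g. with pruning).
        skipped.add("intermediate_values")
    return [name for name in PLOT_NAMES if name not in skipped]


# A single-parameter, single-objective study without intermediate values.
print(applicable_plots(n_params=1, n_objectives=1, has_intermediate_values=False))
```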

Source code in payn\Visualisation\visualisation.py

def plot_optuna_study(self, study: optuna.Study, log_to_mlflow: bool = False):
    """
    Generate and log a comprehensive set of visualizations for an Optuna study using Matplotlib.

    Args:
        study (optuna.Study): The Optuna study to visualize.
        log_to_mlflow (bool): If True, log the visualizations to MLflow as artifacts.
    """
    visualizations: Dict[str, Callable]= {
        "contour_plot": optuna.visualization.matplotlib.plot_contour,
        "edf_plot": optuna.visualization.matplotlib.plot_edf,
        "hypervolume_history": optuna.visualization.matplotlib.plot_hypervolume_history,
        "intermediate_values": optuna.visualization.matplotlib.plot_intermediate_values,
        "optimization_history": optuna.visualization.matplotlib.plot_optimization_history,
        "parallel_coordinate": optuna.visualization.matplotlib.plot_parallel_coordinate,
        "param_importances": optuna.visualization.matplotlib.plot_param_importances,
        "pareto_front": optuna.visualization.matplotlib.plot_pareto_front,
        "rank_plot": optuna.visualization.matplotlib.plot_rank,
        "slice_plot": optuna.visualization.matplotlib.plot_slice,
        "terminator_improvement": optuna.visualization.matplotlib.plot_terminator_improvement,
        "timeline_plot": optuna.visualization.matplotlib.plot_timeline,
    }

    # Directory to save plots locally before logging to MLflow
    output_dir = "optuna_visualizations"
    os.makedirs(output_dir, exist_ok=True)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ExperimentalWarning)

        for plot_name, plot_function in visualizations.items():
            try:
                if plot_name in ["contour_plot", "slice_plot"]:
                    # Ensure parameters exist in the study
                    params = list(study.best_params.keys())
                    if len(params) < 2:
                        # print(f"Skipping {plot_name}: Requires at least two parameters in the study.")
                        continue

                    # Plot all combinations of parameter pairs
                    for param_pair in combinations(params, 2):
                        plt.figure(figsize=(10, 6))
                        plot_function(study, params=list(param_pair))

                        # Adjust legend placement for slice_plot
                        if plot_name == "slice_plot":
                            # Adjust legend placement to avoid overlapping with the color bar
                            legend = plt.legend(loc='upper left', bbox_to_anchor=(0.15, 1), borderaxespad=0.) #1.15, 1
                            # Reduce the layout tightness to accommodate the legend placement
                            # plt.tight_layout(rect=[0, 0, 0.85, 1])
                            # plt.figure() # figsize=(10, 6)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}_{'_'.join(param_pair)}.png")
                        #plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name} for {param_pair}")

                elif plot_name == "pareto_front" or plot_name == "hypervolume_history":
                    # Multi-objective specific plots
                    if len(study.directions) < 2:
                        # print(f"Skipping {plot_name}: Applicable only for multi-objective studies.")
                        continue
                    plt.figure() # figsize=(10, 6)
                    if plot_name == "hypervolume_history":
                        reference_point = [100] * len(study.directions)
                        plot_function(study, reference_point=reference_point)
                    else:
                        plot_function(study)

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    # plt.tight_layout()
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                    # print(f"Successfully generated and saved: {plot_name}")

                elif plot_name == "intermediate_values":
                    # Ensure study includes pruning to utilize intermediate values
                    if not any(t.intermediate_values for t in study.trials):
                        # print(f"Skipping {plot_name}: No intermediate values found in the study.")
                        continue
                    plt.figure() # figsize=(10, 6)
                    plot_function(study)

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    # plt.tight_layout()
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                    # print(f"Successfully generated and saved: {plot_name}")

                elif plot_name == "rank_plot":
                    plt.figure() # figsize=(10, 6)
                    plt.rcParams["figure.figsize"] = (10, 6)
                    plot_function(study)

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                elif plot_name == "parallel_coordinate":
                    plt.figure()  # You can set figsize=(10, 6) if needed
                    plot_function(study)
                    # Rotate the axis labels so long parameter names stay readable
                    plt.xticks(rotation=45, ha='right')

                    # Adjust layout, then save the plot
                    plt.tight_layout()
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                else:
                    # General case for all other plots
                    plt.figure() # figsize=(10, 6)
                    plot_function(study)

                    # Add legend
                    plt.legend()
                    # plt.tight_layout()
                    # ##del
                    # plt.show()
                    # plt.close()

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    # plt.tight_layout()
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                    # print(f"Successfully generated and saved: {plot_name}")

            except Exception as e:
                print(f"Error generating {plot_name}: {e}")
                plt.close()

    # Clean up temporary files only when they were actually logged to MLflow
    if log_to_mlflow and self.logger:
        shutil.rmtree(output_dir, ignore_errors=True)

`plot_yield_bins(yield_column, bin_size=None, positive_threshold=None, title=None)`

Plot yield bins and optionally compare with a positive threshold.

Parameters:

Name	Type	Description	Default
`yield_column`	`str`	The name of the yield column in the dataset.	required
`bin_size`	`int`	The number of equal-width bins spanning 0 to 100 for the yield distribution. If not specified, only positive and negative bins will be plotted.	`None`
`positive_threshold`	`float`	The threshold separating negative and positive data.	`None`
`title`	`str`	The title of the plot.	`None`
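
To make the two parameters concrete, here is a minimal, dependency-free sketch of the binning arithmetic. In the source listing, `bin_size` is passed to `np.linspace(0, 100, bin_size + 1)`, so it acts as a number of bins, and the positive/negative split uses `np.histogram` with edges `[0, positive_threshold, 100]`. The helpers `bin_edges` and `split_counts` are hypothetical and only imitate those two calls in plain Python; the sample yields are made up for illustration.

```python
def bin_edges(bin_size, low=0.0, high=100.0):
    """Edges of `bin_size` equal-width bins, like np.linspace(low, high, bin_size + 1)."""
    width = (high - low) / bin_size
    return [low + i * width for i in range(bin_size + 1)]


def split_counts(yields, positive_threshold, low=0.0, high=100.0):
    """Count negative and positive yields using np.histogram's edge rules.

    The first bin is half-open [low, threshold), the last is closed
    [threshold, high]; values outside [low, high] are not counted.
    """
    negative = sum(1 for y in yields if low <= y < positive_threshold)
    positive = sum(1 for y in yields if positive_threshold <= y <= high)
    return negative, positive


yields = [5.0, 19.9, 20.0, 55.0, 100.0, 120.0]  # illustrative values only
print(bin_edges(4))                               # [0.0, 25.0, 50.0, 75.0, 100.0]
print(split_counts(yields, positive_threshold=20.0))  # (2, 3)
```

Two consequences follow from these edge rules: a yield exactly equal to the threshold is counted as positive, and values outside the 0 to 100 range (such as 120.0 above) are dropped from both layers.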

Source code in payn\Visualisation\visualisation.py

def plot_yield_bins(self, yield_column: str, bin_size: Optional[int] = None, positive_threshold: Optional[float] = None, title: Optional[str] = None):
    """
    Plot yield bins and optionally compare with a positive threshold.

    Args:
        yield_column (str): The name of the yield column in the dataset.
        bin_size (int, optional): The number of equal-width bins spanning 0 to 100 for the yield distribution. If not specified, only positive and negative bins will be plotted.
        positive_threshold (float, optional): The threshold separating negative and positive data.
        title (str, optional): The title of the plot.
    """
    # Extract the yield data
    yield_data = self.data[yield_column]

    # Set up the figure
    fig, ax = plt.subplots(figsize=(10, 6))

    if bin_size is not None:
        # Calculate bins for the yield distribution
        distribution_bins = np.linspace(0, 100, bin_size + 1)
        n, bins, patches = ax.hist(yield_data, bins=distribution_bins, alpha=0.6, edgecolor='black', label='Yield Distribution')

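        # Apply color map, coloring each bar by the position of its left edge
        # (plt.get_cmap replaces plt.cm.get_cmap, which newer Matplotlib releases removed)
        colormap = plt.get_cmap('viridis')
        for patch, value in zip(patches, bins):
            patch.set_facecolor(colormap(value / 100))

        # Add frequency values on top of each bar for yield distribution
        # (ax.hist returns float counts, so cast to int for clean labels)
        for i in range(len(n)):
            ax.text(bins[i] + (bins[i + 1] - bins[i]) / 2, n[i] + 0.5, str(int(n[i])),
                    ha='center', fontsize=8, color='black')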

    if positive_threshold is not None:
        # Define two bins: one for negative and one for positive data
        bins = [0, positive_threshold, 100]
        counts, edges = np.histogram(yield_data, bins=bins)

        # Plot the histogram with two adjacent bins behind the yield distribution
        ax.bar(edges[:-1], counts, width=np.diff(edges), align='edge', alpha=0.3, color=['red', 'green'], edgecolor='black',
               label=['Negative Data', 'Positive Data'], zorder=1)

        # Add frequency values on top of each bar for the positive/negative data
        for i in range(len(counts)):
            ax.text(edges[i] + (edges[i + 1] - edges[i]) / 2, counts[i] + 0.5, str(counts[i]),
                    ha='center', fontsize=10, color='black', zorder=2)

    # Title and labels
    if title:
        ax.set_title(title)
    else:
        ax.set_title('Yield Distribution and Positive/Negative Data Separation')
    ax.set_xlabel('Yield (%)')
    ax.set_ylabel('Frequency')
    ax.legend()

    plt.tight_layout()
    plt.show()
