Visualisation

Visualisation (payn.Visualisation.Visualisation)

The payn.Visualisation module provides plotting tools for inspecting data distributions and interpreting hyperparameter optimization results. It also wraps Optuna’s matplotlib backend so that a full set of optimization plots can be generated for a study in a single call.

  • Yield Distribution Analysis: The plot_yield_bins function draws a two-layer histogram of the target variable (yield). It overlays a frequency distribution with a color-coded two-bin split (Negative vs. Positive) at a user-defined threshold, which makes dataset imbalance and class separation easy to assess at a glance.
  • Automated Artifact Generation: Calling plot_optuna_study on a completed optimization study generates diagnostic plots, including Optimization History (convergence tracking), Hyperparameter Importance (fANOVA), and Parallel Coordinate plots. Plots that do not apply to the study, such as multi-objective plots for a single-objective study, are skipped.
  • Pairwise Interaction Analysis: To reveal dependencies between hyperparameters, the module iterates through every pair of parameters and generates a Contour plot and a Slice plot for each pair.
  • MLflow Integration: When log_to_mlflow is True and a logger was supplied, the generated plots are logged as artifacts to the associated MLflow run, which keeps a visual record of the experiment’s search phase. The local optuna_visualizations directory is removed afterwards.
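
The pairwise iteration described above can be sketched with itertools.combinations. This is an illustrative sketch, not the module's own code: the helper name pairwise_plot_paths and the sample parameter names are assumptions, while the filename pattern follows the one used in plot_optuna_study.

```python
import os
from itertools import combinations
from typing import List, Sequence


def pairwise_plot_paths(params: Sequence[str], plot_name: str,
                        output_dir: str = "optuna_visualizations") -> List[str]:
    """Return one output path per unordered pair of hyperparameters."""
    paths = []
    for pair in combinations(params, 2):
        # Same naming scheme as the source: <plot_name>_<param_a>_<param_b>.png
        paths.append(os.path.join(output_dir, f"{plot_name}_{'_'.join(pair)}.png"))
    return paths


# Hypothetical hyperparameter names, used only for illustration.
example_params = ["learning_rate", "max_depth", "n_estimators"]
contour_paths = pairwise_plot_paths(example_params, "contour_plot")
for path in contour_paths:
    print(path)
```

A study with n parameters yields n(n-1)/2 figures per plot type, so the number of contour and slice plots grows quadratically with the size of the search space.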

A class for visualizing data, distributions, metrics, and model performance in the PAYN project.

Attributes:

Name     Type               Description
data     DataFrame          The dataset to visualize.
logger   Optional[Logger]   Logger instance for tracking experiments.

Source code in payn\Visualisation\visualisation.py
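# Imports required by the class. The import of the project's own Logger class
# is not shown in the source, so it is left out here.
import os
import shutil
import warnings
from itertools import combinations
from typing import Callable, Dict, Optional

import matplotlib.pyplot as plt
import numpy as np
import optuna
import pandas as pd
from optuna.exceptions import ExperimentalWarning


class Visualisation: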
    """
    A class for visualizing data, distributions, metrics, and model performance in the PAYN project.

    Attributes:
        data (pd.DataFrame): The dataset to visualize.
        logger (Optional[Logger]): Logger instance for tracking experiments.
    """

    def __init__(self, data: pd.DataFrame, logger: Optional[Logger] = None) -> None:
        """
        Initialize the Visualisation class with the dataset.

        Args:
            data (pd.DataFrame): The dataset to visualize.
            logger (Logger, optional): Instance of Logger class for logging purposes.
        """
        self.data = data
        self.logger = logger

    def plot_yield_bins(self, yield_column: str, bin_size: Optional[int] = None, positive_threshold: Optional[float] = None, title: Optional[str] = None):
        """
        Plot yield bins and optionally compare with a positive threshold.

        Args:
            yield_column (str): The name of the yield column in the dataset.
            bin_size (int, optional): The number of equal-width bins spanning 0 to 100 for the yield distribution. If not specified, only positive and negative bins will be plotted.
            positive_threshold (float, optional): The threshold separating negative and positive data.
            title (str, optional): The title of the plot.
        """
        # Extract the yield data
        yield_data = self.data[yield_column]

        # Set up the figure
        fig, ax = plt.subplots(figsize=(10, 6))

        if bin_size is not None:
            # Calculate bins for the yield distribution
            distribution_bins = np.linspace(0, 100, bin_size + 1)
            n, bins, patches = ax.hist(yield_data, bins=distribution_bins, alpha=0.6, edgecolor='black', label='Yield Distribution')

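            # Apply color map (plt.get_cmap replaces plt.cm.get_cmap, which was
            # deprecated in Matplotlib 3.7 and later removed)
            colormap = plt.get_cmap('viridis')
            for patch, value in zip(patches, bins):
                patch.set_facecolor(colormap(value / 100))

            # Add frequency values on top of each bar for yield distribution
            for i in range(len(n)):
                # ax.hist returns float counts; cast to int so labels read "5" rather than "5.0"
                ax.text(bins[i] + (bins[i + 1] - bins[i]) / 2, n[i] + 0.5, str(int(n[i])),
                        ha='center', fontsize=8, color='black')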

        if positive_threshold is not None:
            # Define two bins: one for negative and one for positive data
            bins = [0, positive_threshold, 100]
            counts, edges = np.histogram(yield_data, bins=bins)

            # Plot the histogram with two adjacent bins behind the yield distribution
            ax.bar(edges[:-1], counts, width=np.diff(edges), align='edge', alpha=0.3, color=['red', 'green'], edgecolor='black',
                   label=['Negative Data', 'Positive Data'], zorder=1)

            # Add frequency values on top of each bar for the positive/negative data
            for i in range(len(counts)):
                ax.text(edges[i] + (edges[i + 1] - edges[i]) / 2, counts[i] + 0.5, str(counts[i]),
                        ha='center', fontsize=10, color='black', zorder=2)

        # Title and labels
        if title:
            ax.set_title(title)
        else:
            ax.set_title('Yield Distribution and Positive/Negative Data Separation')
        ax.set_xlabel('Yield (%)')
        ax.set_ylabel('Frequency')
        ax.legend()

        plt.tight_layout()
        plt.show()

    def plot_optuna_study(self, study: optuna.Study, log_to_mlflow: bool = False):
        """
        Generate and log a comprehensive set of visualizations for an Optuna study using Matplotlib.

        Args:
            study (optuna.Study): The Optuna study to visualize.
            log_to_mlflow (bool): If True, log the visualizations to MLflow as artifacts.
        """
        visualizations: Dict[str, Callable]= {
            "contour_plot": optuna.visualization.matplotlib.plot_contour,
            "edf_plot": optuna.visualization.matplotlib.plot_edf,
            "hypervolume_history": optuna.visualization.matplotlib.plot_hypervolume_history,
            "intermediate_values": optuna.visualization.matplotlib.plot_intermediate_values,
            "optimization_history": optuna.visualization.matplotlib.plot_optimization_history,
            "parallel_coordinate": optuna.visualization.matplotlib.plot_parallel_coordinate,
            "param_importances": optuna.visualization.matplotlib.plot_param_importances,
            "pareto_front": optuna.visualization.matplotlib.plot_pareto_front,
            "rank_plot": optuna.visualization.matplotlib.plot_rank,
            "slice_plot": optuna.visualization.matplotlib.plot_slice,
            "terminator_improvement": optuna.visualization.matplotlib.plot_terminator_improvement,
            "timeline_plot": optuna.visualization.matplotlib.plot_timeline,
        }

        # Directory to save plots locally before logging to MLflow
        output_dir = "optuna_visualizations"
        os.makedirs(output_dir, exist_ok=True)

        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=ExperimentalWarning)

            for plot_name, plot_function in visualizations.items():
                try:
                    if plot_name in ["contour_plot", "slice_plot"]:
                        # Ensure parameters exist in the study
                        params = list(study.best_params.keys())
                        if len(params) < 2:
                            # print(f"Skipping {plot_name}: Requires at least two parameters in the study.")
                            continue

                        # Plot all combinations of parameter pairs
                        for param_pair in combinations(params, 2):
                            plt.figure(figsize=(10, 6))
                            plot_function(study, params=list(param_pair))

                            # Adjust legend placement for slice_plot
                            if plot_name == "slice_plot":
                                # Adjust legend placement to avoid overlapping with the color bar
                                legend = plt.legend(loc='upper left', bbox_to_anchor=(0.15, 1), borderaxespad=0.) #1.15, 1
                                # Reduce the layout tightness to accommodate the legend placement
                                # plt.tight_layout(rect=[0, 0, 0.85, 1])
                                # plt.figure() # figsize=(10, 6)

                            # Save the plot
                            plot_path = os.path.join(output_dir, f"{plot_name}_{'_'.join(param_pair)}.png")
                            #plt.tight_layout()
                            plt.savefig(plot_path)
                            plt.close()

                            # Log the artifact to MLflow if required
                            if log_to_mlflow and self.logger:
                                self.logger.log_image_to_mlflow(plot_path)

                            # print(f"Successfully generated and saved: {plot_name} for {param_pair}")

                    elif plot_name == "pareto_front" or plot_name == "hypervolume_history":
                        # Multi-objective specific plots
                        if len(study.directions) < 2:
                            # print(f"Skipping {plot_name}: Applicable only for multi-objective studies.")
                            continue
                        plt.figure() # figsize=(10, 6)
                        if plot_name == "hypervolume_history":
                            reference_point = [100] * len(study.directions)
                            plot_function(study, reference_point=reference_point)
                        else:
                            plot_function(study)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        # plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name}")

                    elif plot_name == "intermediate_values":
                        # Ensure study includes pruning to utilize intermediate values
                        if not any(t.intermediate_values for t in study.trials):
                            # print(f"Skipping {plot_name}: No intermediate values found in the study.")
                            continue
                        plt.figure() # figsize=(10, 6)
                        plot_function(study)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        # plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name}")

                    elif plot_name == "rank_plot":
                        plt.figure() # figsize=(10, 6)
                        plt.rcParams["figure.figsize"] = (10, 6)
                        plot_function(study)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required (previously missing, so the
                        # file was removed during cleanup without being logged)
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                    elif plot_name == "parallel_coordinate":
                        plt.figure()  # You can set figsize=(10, 6) if needed
                        plot_function(study)
                        # Rotate tick labels so long parameter names stay readable
                        plt.xticks(rotation=45, ha='right')

                        # Adjust layout, then save the plot
                        plt.tight_layout()
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required (previously missing)
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                    else:
                        # General case for all other plots
                        plt.figure() # figsize=(10, 6)
                        plot_function(study)

                        # Add legend
                        plt.legend()
                        # plt.tight_layout()
                        # ##del
                        # plt.show()
                        # plt.close()

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}.png")
                        # plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name}")

                except Exception as e:
                    print(f"Error generating {plot_name}: {e}")
                    plt.close()

        # Clean up temporary files if logged to MLflow
        if log_to_mlflow:
            shutil.rmtree(output_dir, ignore_errors=True)

__init__(data, logger=None)

Initialize the Visualisation class with the dataset.

Parameters:

Name     Type        Description                                      Default
data     DataFrame   The dataset to visualize.                        required
logger   Logger      Instance of Logger class for logging purposes.   None
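
How the optional logger interacts with saving and cleanup can be sketched as follows. This is a minimal illustration under stated assumptions: RecordingLogger is a hypothetical stand-in that exposes only the log_image_to_mlflow(path) method the class calls, save_and_log is an invented helper, and empty placeholder files take the place of real figures.

```python
import os
import shutil
import tempfile
from typing import List, Optional


class RecordingLogger:
    """Hypothetical stand-in exposing the one method the class calls."""

    def __init__(self) -> None:
        self.logged: List[str] = []

    def log_image_to_mlflow(self, path: str) -> None:
        # A real logger would upload the file; here only the file name is recorded.
        self.logged.append(os.path.basename(path))


def save_and_log(names: List[str], logger: Optional[RecordingLogger],
                 log_to_mlflow: bool) -> str:
    """Write placeholder files, log them if requested, then clean up."""
    output_dir = tempfile.mkdtemp(prefix="plots_")
    for name in names:
        path = os.path.join(output_dir, f"{name}.png")
        with open(path, "wb") as handle:
            handle.write(b"")  # placeholder standing in for a saved figure
        # Same guard as in plot_optuna_study: log only if asked and a logger exists.
        if log_to_mlflow and logger:
            logger.log_image_to_mlflow(path)
    # Cleanup depends on log_to_mlflow alone, as in the source.
    if log_to_mlflow:
        shutil.rmtree(output_dir, ignore_errors=True)
    return output_dir


recorder = RecordingLogger()
removed_dir = save_and_log(["edf_plot", "timeline_plot"], recorder, log_to_mlflow=True)
kept_dir = save_and_log(["edf_plot"], None, log_to_mlflow=False)
print(recorder.logged, os.path.exists(removed_dir), os.path.exists(kept_dir))
shutil.rmtree(kept_dir, ignore_errors=True)  # tidy up the sketch's own files
```

Because cleanup is keyed on log_to_mlflow alone, requesting MLflow logging without supplying a logger would remove the saved plots without logging them anywhere.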
Source code in payn\Visualisation\visualisation.py
def __init__(self, data: pd.DataFrame, logger: Optional[Logger] = None) -> None:
    """
    Initialize the Visualisation class with the dataset.

    Args:
        data (pd.DataFrame): The dataset to visualize.
        logger (Logger, optional): Instance of Logger class for logging purposes.
    """
    self.data = data
    self.logger = logger

plot_optuna_study(study, log_to_mlflow=False)

Generate and log a comprehensive set of visualizations for an Optuna study using Matplotlib.

Parameters:

Name            Type    Description                                               Default
study           Study   The Optuna study to visualize.                            required
log_to_mlflow   bool    If True, log the visualizations to MLflow as artifacts.   False
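
The skip rules that plot_optuna_study applies before drawing can be summarised in a small sketch. The function applicable_plots and its arguments are assumptions made for illustration; the plot names and the three conditions are taken from the source below.

```python
from typing import List

# Plot names in the order used by the visualizations dictionary in the source.
ALL_PLOTS = [
    "contour_plot", "edf_plot", "hypervolume_history", "intermediate_values",
    "optimization_history", "parallel_coordinate", "param_importances",
    "pareto_front", "rank_plot", "slice_plot", "terminator_improvement",
    "timeline_plot",
]


def applicable_plots(n_params: int, n_objectives: int,
                     has_intermediate_values: bool) -> List[str]:
    """Return the plot names that would not be skipped for such a study."""
    selected = []
    for name in ALL_PLOTS:
        # Contour and slice plots are drawn per parameter pair.
        if name in ("contour_plot", "slice_plot") and n_params < 2:
            continue
        # Pareto front and hypervolume history need several objectives.
        if name in ("pareto_front", "hypervolume_history") and n_objectives < 2:
            continue
        # Intermediate values exist only if trials reported them.
        if name == "intermediate_values" and not has_intermediate_values:
            continue
        selected.append(name)
    return selected


single_objective = applicable_plots(n_params=3, n_objectives=1,
                                    has_intermediate_values=False)
print(single_objective)
```

The sketch covers only the explicit skip conditions. Whether each remaining Optuna plot function succeeds on a given study is a separate matter, which the method handles by catching the exception and printing an error.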
Source code in payn\Visualisation\visualisation.py
def plot_optuna_study(self, study: optuna.Study, log_to_mlflow: bool = False):
    """
    Generate and log a comprehensive set of visualizations for an Optuna study using Matplotlib.

    Args:
        study (optuna.Study): The Optuna study to visualize.
        log_to_mlflow (bool): If True, log the visualizations to MLflow as artifacts.
    """
    visualizations: Dict[str, Callable]= {
        "contour_plot": optuna.visualization.matplotlib.plot_contour,
        "edf_plot": optuna.visualization.matplotlib.plot_edf,
        "hypervolume_history": optuna.visualization.matplotlib.plot_hypervolume_history,
        "intermediate_values": optuna.visualization.matplotlib.plot_intermediate_values,
        "optimization_history": optuna.visualization.matplotlib.plot_optimization_history,
        "parallel_coordinate": optuna.visualization.matplotlib.plot_parallel_coordinate,
        "param_importances": optuna.visualization.matplotlib.plot_param_importances,
        "pareto_front": optuna.visualization.matplotlib.plot_pareto_front,
        "rank_plot": optuna.visualization.matplotlib.plot_rank,
        "slice_plot": optuna.visualization.matplotlib.plot_slice,
        "terminator_improvement": optuna.visualization.matplotlib.plot_terminator_improvement,
        "timeline_plot": optuna.visualization.matplotlib.plot_timeline,
    }

    # Directory to save plots locally before logging to MLflow
    output_dir = "optuna_visualizations"
    os.makedirs(output_dir, exist_ok=True)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ExperimentalWarning)

        for plot_name, plot_function in visualizations.items():
            try:
                if plot_name in ["contour_plot", "slice_plot"]:
                    # Ensure parameters exist in the study
                    params = list(study.best_params.keys())
                    if len(params) < 2:
                        # print(f"Skipping {plot_name}: Requires at least two parameters in the study.")
                        continue

                    # Plot all combinations of parameter pairs
                    for param_pair in combinations(params, 2):
                        plt.figure(figsize=(10, 6))
                        plot_function(study, params=list(param_pair))

                        # Adjust legend placement for slice_plot
                        if plot_name == "slice_plot":
                            # Adjust legend placement to avoid overlapping with the color bar
                            legend = plt.legend(loc='upper left', bbox_to_anchor=(0.15, 1), borderaxespad=0.) #1.15, 1
                            # Reduce the layout tightness to accommodate the legend placement
                            # plt.tight_layout(rect=[0, 0, 0.85, 1])
                            # plt.figure() # figsize=(10, 6)

                        # Save the plot
                        plot_path = os.path.join(output_dir, f"{plot_name}_{'_'.join(param_pair)}.png")
                        #plt.tight_layout()
                        plt.savefig(plot_path)
                        plt.close()

                        # Log the artifact to MLflow if required
                        if log_to_mlflow and self.logger:
                            self.logger.log_image_to_mlflow(plot_path)

                        # print(f"Successfully generated and saved: {plot_name} for {param_pair}")

                elif plot_name == "pareto_front" or plot_name == "hypervolume_history":
                    # Multi-objective specific plots
                    if len(study.directions) < 2:
                        # print(f"Skipping {plot_name}: Applicable only for multi-objective studies.")
                        continue
                    plt.figure() # figsize=(10, 6)
                    if plot_name == "hypervolume_history":
                        reference_point = [100] * len(study.directions)
                        plot_function(study, reference_point=reference_point)
                    else:
                        plot_function(study)

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    # plt.tight_layout()
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                    # print(f"Successfully generated and saved: {plot_name}")

                elif plot_name == "intermediate_values":
                    # Ensure study includes pruning to utilize intermediate values
                    if not any(t.intermediate_values for t in study.trials):
                        # print(f"Skipping {plot_name}: No intermediate values found in the study.")
                        continue
                    plt.figure() # figsize=(10, 6)
                    plot_function(study)

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    # plt.tight_layout()
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                    # print(f"Successfully generated and saved: {plot_name}")

                elif plot_name == "rank_plot":
                    plt.figure() # figsize=(10, 6)
                    plt.rcParams["figure.figsize"] = (10, 6)
                    plot_function(study)

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required (previously missing, so the
                    # file was removed during cleanup without being logged)
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                elif plot_name == "parallel_coordinate":
                    plt.figure()  # You can set figsize=(10, 6) if needed
                    plot_function(study)
                    # Rotate tick labels so long parameter names stay readable
                    plt.xticks(rotation=45, ha='right')

                    # Adjust layout, then save the plot
                    plt.tight_layout()
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required (previously missing)
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                else:
                    # General case for all other plots
                    plt.figure() # figsize=(10, 6)
                    plot_function(study)

                    # Add legend
                    plt.legend()
                    # plt.tight_layout()
                    # ##del
                    # plt.show()
                    # plt.close()

                    # Save the plot
                    plot_path = os.path.join(output_dir, f"{plot_name}.png")
                    # plt.tight_layout()
                    plt.savefig(plot_path)
                    plt.close()

                    # Log the artifact to MLflow if required
                    if log_to_mlflow and self.logger:
                        self.logger.log_image_to_mlflow(plot_path)

                    # print(f"Successfully generated and saved: {plot_name}")

            except Exception as e:
                print(f"Error generating {plot_name}: {e}")
                plt.close()

    # Clean up temporary files if logged to MLflow
    if log_to_mlflow:
        shutil.rmtree(output_dir, ignore_errors=True)

plot_yield_bins(yield_column, bin_size=None, positive_threshold=None, title=None)

Plot yield bins and optionally compare with a positive threshold.

Parameters:

Name                 Type    Description                                                                                                                                            Default
yield_column         str     The name of the yield column in the dataset.                                                                                                           required
bin_size             int     The number of equal-width bins spanning 0 to 100 for the yield distribution. If not specified, only positive and negative bins will be plotted.      None
positive_threshold   float   The threshold separating negative and positive data.                                                                                                   None
title                str     The title of the plot.                                                                                                                                 None
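
The two binning schemes used by plot_yield_bins can be illustrated without any plotting. This is a pure-Python sketch under stated assumptions: distribution_edges and split_counts are invented helper names, the sample yields are made up, and the interval rules mirror NumPy's histogram convention (every bin is half-open except the last, which includes its right edge).

```python
from typing import List, Sequence, Tuple


def distribution_edges(bin_size: int) -> List[float]:
    """Equal-width bin edges on [0, 100]; bin_size is the number of bins."""
    return [100 * i / bin_size for i in range(bin_size + 1)]


def split_counts(yields: Sequence[float],
                 positive_threshold: float) -> Tuple[int, int]:
    """Count values in [0, threshold) and in [threshold, 100]."""
    negative = sum(1 for y in yields if 0 <= y < positive_threshold)
    positive = sum(1 for y in yields if positive_threshold <= y <= 100)
    return negative, positive


sample_yields = [5.0, 12.5, 20.0, 48.0, 75.0, 100.0]  # made-up values
edges = distribution_edges(4)
counts = split_counts(sample_yields, positive_threshold=20.0)
print(edges)
print(counts)
```

A yield exactly equal to the threshold lands in the positive bin, and values outside the 0 to 100 range are not counted in either bin.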
Source code in payn\Visualisation\visualisation.py
def plot_yield_bins(self, yield_column: str, bin_size: Optional[int] = None, positive_threshold: Optional[float] = None, title: Optional[str] = None):
    """
    Plot yield bins and optionally compare with a positive threshold.

    Args:
        yield_column (str): The name of the yield column in the dataset.
        bin_size (int, optional): The number of equal-width bins spanning 0 to 100 for the yield distribution. If not specified, only positive and negative bins will be plotted.
        positive_threshold (float, optional): The threshold separating negative and positive data.
        title (str, optional): The title of the plot.
    """
    # Extract the yield data
    yield_data = self.data[yield_column]

    # Set up the figure
    fig, ax = plt.subplots(figsize=(10, 6))

    if bin_size is not None:
        # Calculate bins for the yield distribution
        distribution_bins = np.linspace(0, 100, bin_size + 1)
        n, bins, patches = ax.hist(yield_data, bins=distribution_bins, alpha=0.6, edgecolor='black', label='Yield Distribution')

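        # Apply color map (plt.get_cmap replaces plt.cm.get_cmap, which was
        # deprecated in Matplotlib 3.7 and later removed)
        colormap = plt.get_cmap('viridis')
        for patch, value in zip(patches, bins):
            patch.set_facecolor(colormap(value / 100))

        # Add frequency values on top of each bar for yield distribution
        for i in range(len(n)):
            # ax.hist returns float counts; cast to int so labels read "5" rather than "5.0"
            ax.text(bins[i] + (bins[i + 1] - bins[i]) / 2, n[i] + 0.5, str(int(n[i])),
                    ha='center', fontsize=8, color='black')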

    if positive_threshold is not None:
        # Define two bins: one for negative and one for positive data
        bins = [0, positive_threshold, 100]
        counts, edges = np.histogram(yield_data, bins=bins)

        # Plot the histogram with two adjacent bins behind the yield distribution
        ax.bar(edges[:-1], counts, width=np.diff(edges), align='edge', alpha=0.3, color=['red', 'green'], edgecolor='black',
               label=['Negative Data', 'Positive Data'], zorder=1)

        # Add frequency values on top of each bar for the positive/negative data
        for i in range(len(counts)):
            ax.text(edges[i] + (edges[i + 1] - edges[i]) / 2, counts[i] + 0.5, str(counts[i]),
                    ha='center', fontsize=10, color='black', zorder=2)

    # Title and labels
    if title:
        ax.set_title(title)
    else:
        ax.set_title('Yield Distribution and Positive/Negative Data Separation')
    ax.set_xlabel('Yield (%)')
    ax.set_ylabel('Frequency')
    ax.legend()

    plt.tight_layout()
    plt.show()