
Pygbm’s API and default values are likely to change in future versions, without any deprecation cycle.

Gradient Boosting Estimators

Gradient Boosting decision trees for classification and regression.

class pygbm.gradient_boosting.GradientBoostingClassifier(loss='auto', learning_rate=0.1, max_iter=100, max_leaf_nodes=31, max_depth=None, min_samples_leaf=20, l2_regularization=0.0, max_bins=256, scoring='neg_log_loss', validation_split=0.1, n_iter_no_change=5, tol=1e-07, verbose=0, random_state=None)[source]

Scikit-learn compatible Gradient Boosting Tree for classification.

  • loss ({'auto', 'binary_crossentropy', 'categorical_crossentropy'}, optional(default='auto')) – The loss function to use in the boosting process. ‘binary_crossentropy’ (also known as logistic loss) is used for binary classification and generalizes to ‘categorical_crossentropy’ for multiclass classification. ‘auto’ will automatically choose either loss depending on the nature of the problem.
  • learning_rate (float, optional(default=0.1)) – The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaf values. Use 1 for no shrinkage.
  • max_iter (int, optional(default=100)) – The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification. For multiclass classification, n_classes trees per iteration are built.
  • max_leaf_nodes (int or None, optional(default=31)) – The maximum number of leaves for each tree. If None, there is no maximum limit.
  • max_depth (int or None, optional(default=None)) – The maximum depth of each tree. The depth of a tree is the number of nodes to go from the root to the deepest leaf.
  • min_samples_leaf (int, optional(default=20)) – The minimum number of samples per leaf.
  • l2_regularization (float, optional(default=0)) – The L2 regularization parameter. Use 0 for no regularization.
  • max_bins (int, optional(default=256)) – The maximum number of bins to use. Before training, each feature of the input array X is binned into at most max_bins bins, which allows for a much faster training stage. Features with a small number of unique values may use fewer than max_bins bins. Must be no larger than 256.
  • scoring (str or callable or None, optional (default='neg_log_loss')) – Scoring parameter to use for early stopping (see sklearn.metrics for available options). If None, no early stopping is done.
  • validation_split (int or float or None, optional(default=0.1)) – Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the whole training data.
  • n_iter_no_change (int, optional (default=5)) – Used to determine when to “early stop”. The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)th-to-last one, up to some tolerance.
  • tol (float or None, optional (default=1e-7)) – The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.
  • verbose (int, optional(default=0)) – The verbosity level. If not zero, print some information about the fitting process.
  • random_state (int, np.random.RandomState instance or None, optional(default=None)) – Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. See scikit-learn glossary.
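
The early-stopping rule described by n_iter_no_change and tol can be sketched in plain Python. This is only an illustration, not pygbm's code: the helper name should_stop is hypothetical, and the sketch assumes that higher scores are better and that the reference score is the one just before the last n_iter_no_change scores.

```python
def should_stop(scores, n_iter_no_change=5, tol=1e-7):
    """Return True when none of the last `n_iter_no_change` scores beats
    the reference score by more than `tol` (hypothetical helper)."""
    # Not enough history yet: we need the reference score plus the window.
    if len(scores) < n_iter_no_change + 1:
        return False
    tol = 0.0 if tol is None else tol
    # Assumption: the reference is the score just before the window.
    reference = scores[-(n_iter_no_change + 1)] + tol
    recent = scores[-n_iter_no_change:]
    return not any(score > reference for score in recent)


# Scores plateau after the second iteration, so fitting would stop.
plateau = [-0.70, -0.50, -0.50, -0.50, -0.50, -0.50, -0.50]
# Scores keep improving, so fitting would continue.
improving = [-0.70, -0.60, -0.50, -0.40, -0.30, -0.20, -0.10]

print(should_stop(plateau))    # True
print(should_stop(improving))  # False
```

Raising tol raises the bar an iteration has to clear to count as an improvement, which is why a higher tolerance makes early stopping more likely.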


>>> from sklearn.datasets import load_iris
>>> from pygbm import GradientBoostingClassifier
>>> X, y = load_iris(return_X_y=True)
>>> clf = GradientBoostingClassifier().fit(X, y)
>>> clf.score(X, y)
fit(X, y)

Fit the gradient boosting model.

  • X (array-like, shape=(n_samples, n_features)) – The input samples.
  • y (array-like, shape=(n_samples,)) – Target values.


Return type:self



get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any

predict(X)

Predict classes for X.

Parameters:X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned.
Returns:y – The predicted classes.
Return type:array, shape (n_samples,)
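
To make the notion of pre-binned data concrete, the following sketch maps one feature column to small integer bin indices, which is the kind of data a uint8 array would hold. It is an assumption-laden illustration rather than pygbm's binning code: the helper names find_thresholds and bin_column are hypothetical, and the quantile-based choice of thresholds is an assumption.

```python
from bisect import bisect_left


def find_thresholds(values, max_bins=256):
    """Pick at most `max_bins - 1` thresholds for one feature column."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bins:
        # Few unique values: one bin per value, split at the midpoints.
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    # Otherwise use evenly spaced quantiles of the sorted column (assumption).
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(i * n) // max_bins] for i in range(1, max_bins)]
    return sorted(set(cuts))


def bin_column(values, thresholds):
    """Map each value to its bin index; indices fit in uint8 when there
    are at most 256 bins."""
    return [bisect_left(thresholds, v) for v in values]


column = [0.5, 2.0, 2.0, 7.5, 0.5, 9.0]
thresholds = find_thresholds(column, max_bins=256)
binned = bin_column(column, thresholds)
print(thresholds)  # [1.25, 4.75, 8.25]
print(binned)      # [0, 1, 1, 2, 0, 3]
```

A column with only four distinct values uses four bins here, far fewer than the max_bins limit.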

predict_proba(X)

Predict class probabilities for X.

Parameters:X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned.
Returns:p – The class probabilities of the input samples.
Return type:array, shape (n_samples, n_classes)
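
Class probabilities of this shape are the input to a log-loss score such as 'neg_log_loss', the scoring default in the class signature. The sketch below computes the two cross-entropy losses named by the loss parameter and checks that the categorical form reduces to the binary one when there are two classes. The function names are hypothetical and the code is an illustration, not pygbm's loss implementation.

```python
import math


def binary_crossentropy(y_true, p_positive):
    """Mean logistic loss for labels in {0, 1} and P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_positive):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def categorical_crossentropy(y_true, proba):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for y, row in zip(y_true, proba):
        total += -math.log(row[y])
    return total / len(y_true)


y = [0, 1, 1, 0]
p1 = [0.1, 0.8, 0.6, 0.3]          # P(class 1) for each sample
proba = [[1 - p, p] for p in p1]   # shape (n_samples, n_classes)

bce = binary_crossentropy(y, p1)
cce = categorical_crossentropy(y, proba)
print(math.isclose(bce, cce))  # True
```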
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

  • X (array-like, shape = (n_samples, n_features)) – Test samples.
  • y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True labels for X.
  • sample_weight (array-like, shape = [n_samples], optional) – Sample weights.

Returns:score – Mean accuracy of self.predict(X) with respect to y.
Return type:float
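
As a small illustration of the metric, the sketch below computes mean accuracy for single-label targets and subset accuracy for multi-label targets. The helper mean_accuracy is hypothetical and ignores sample_weight.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of samples whose prediction equals the true label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Single-label case: 3 of 4 predictions are correct.
single = mean_accuracy([0, 1, 2, 1], [0, 1, 1, 1])
print(single)  # 0.75

# Multi-label case: a sample counts only if its whole label set matches.
labels_true = [(1, 0, 1), (0, 1, 0)]
labels_pred = [(1, 0, 1), (0, 1, 1)]
subset = mean_accuracy(labels_true, labels_pred)
print(subset)  # 0.5
```

The second sample gets two of its three labels right but still scores zero, which is what makes subset accuracy a harsh metric.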



set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type:self
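
The <component>__<parameter> naming can be illustrated with a toy class. This is a simplified sketch of the convention only: the Component class is hypothetical and is neither scikit-learn's nor pygbm's implementation.

```python
class Component:
    """Hypothetical stand-in for an estimator that stores its parameters."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for name, value in params.items():
            # Split "<component>__<parameter>" at the first double underscore.
            head, sep, tail = name.partition("__")
            if sep:
                # Nested name: forward the remainder to the sub-object.
                self.params[head].set_params(**{tail: value})
            else:
                self.params[head] = value
        return self  # returning self allows chained calls


pipeline = Component(
    scale=Component(with_mean=True),
    clf=Component(learning_rate=0.1, max_iter=100),
)
pipeline.set_params(clf__learning_rate=0.05).set_params(scale__with_mean=False)

print(pipeline.params["clf"].params)    # {'learning_rate': 0.05, 'max_iter': 100}
print(pipeline.params["scale"].params)  # {'with_mean': False}
```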
class pygbm.gradient_boosting.GradientBoostingRegressor(loss='least_squares', learning_rate=0.1, max_iter=100, max_leaf_nodes=31, max_depth=None, min_samples_leaf=20, l2_regularization=0.0, max_bins=256, scoring='neg_mean_squared_error', validation_split=0.1, n_iter_no_change=5, tol=1e-07, verbose=0, random_state=None)[source]

Scikit-learn compatible Gradient Boosting Tree for regression.

  • loss ({'least_squares'}, optional(default='least_squares')) – The loss function to use in the boosting process.
  • learning_rate (float, optional(default=0.1)) – The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaf values. Use 1 for no shrinkage.
  • max_iter (int, optional(default=100)) – The maximum number of iterations of the boosting process, i.e. the maximum number of trees.
  • max_leaf_nodes (int or None, optional(default=31)) – The maximum number of leaves for each tree. If None, there is no maximum limit.
  • max_depth (int or None, optional(default=None)) – The maximum depth of each tree. The depth of a tree is the number of nodes to go from the root to the deepest leaf.
  • min_samples_leaf (int, optional(default=20)) – The minimum number of samples per leaf.
  • l2_regularization (float, optional(default=0)) – The L2 regularization parameter. Use 0 for no regularization.
  • max_bins (int, optional(default=256)) – The maximum number of bins to use. Before training, each feature of the input array X is binned into at most max_bins bins, which allows for a much faster training stage. Features with a small number of unique values may use fewer than max_bins bins. Must be no larger than 256.
  • scoring (str or callable or None, optional (default='neg_mean_squared_error')) – Scoring parameter to use for early stopping (see sklearn.metrics for available options). If None, no early stopping is done.
  • validation_split (int or float or None, optional(default=0.1)) – Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the whole training data.
  • n_iter_no_change (int, optional (default=5)) – Used to determine when to “early stop”. The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)th-to-last one, up to some tolerance.
  • tol (float or None, optional (default=1e-7)) – The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.
  • verbose (int, optional (default=0)) – The verbosity level. If not zero, print some information about the fitting process.
  • random_state (int, np.random.RandomState instance or None, optional (default=None)) – Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. See scikit-learn glossary.
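
How learning_rate and max_iter interact can be seen in a toy least-squares booster. The sketch below is not pygbm's histogram-based tree builder: it uses single-split stumps on one feature, and the names fit_stump, boost and mse are hypothetical.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on a 1-D feature (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def boost(x, y, learning_rate=0.1, max_iter=100):
    """Each stump fits the current residuals; its leaf values are scaled
    by `learning_rate` before being added to the predictions."""
    baseline = sum(y) / len(y)
    predictions = [baseline] * len(y)
    for _ in range(max_iter):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        predictions = [
            pi + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, pi in zip(x, predictions)
        ]
    return predictions


def mse(y, predictions):
    """Mean squared error on the training data."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]

# With only 5 iterations, no shrinkage fits the training data more closely.
error_no_shrinkage = mse(y, boost(x, y, learning_rate=1.0, max_iter=5))
error_shrunk = mse(y, boost(x, y, learning_rate=0.1, max_iter=5))
print(error_no_shrinkage < error_shrunk)  # True
```

A small learning rate therefore usually calls for a larger max_iter, since each tree contributes only a fraction of its fitted leaf values.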


>>> from sklearn.datasets import load_boston
>>> from pygbm import GradientBoostingRegressor
>>> X, y = load_boston(return_X_y=True)
>>> est = GradientBoostingRegressor().fit(X, y)
>>> est.score(X, y)
fit(X, y)

Fit the gradient boosting model.

  • X (array-like, shape=(n_samples, n_features)) – The input samples.
  • y (array-like, shape=(n_samples,)) – Target values.


Return type:self



get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any

predict(X)

Predict values for X.

Parameters:X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned.
Returns:y – The predicted values.
Return type:array, shape (n_samples,)
score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

  • X (array-like, shape = (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
  • y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True values for X.
  • sample_weight (array-like, shape = [n_samples], optional) – Sample weights.

Returns:score – R^2 of self.predict(X) with respect to y.
Return type:float
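
The definition of R^2 given above can be checked directly. The helper r_squared is a hypothetical plain-Python version of the formula and ignores sample_weight.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, computed as 1 - u/v."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - u / v


y_true = [3.0, 5.0, 7.0, 9.0]

print(r_squared(y_true, y_true))                # 1.0 (perfect predictions)
print(r_squared(y_true, [6.0] * 4))             # 0.0 (always predicts the mean)
print(r_squared(y_true, [9.0, 7.0, 5.0, 3.0]))  # -3.0 (worse than the mean)
```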



set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Return type:self