Warning

Pygbm’s API and default values are likely to change in future versions, without any deprecation cycle.

Gradient Boosting Estimators

Gradient Boosting decision trees for classification and regression.

class pygbm.gradient_boosting.GradientBoostingClassifier(loss='auto', learning_rate=0.1, max_iter=100, max_leaf_nodes=31, max_depth=None, min_samples_leaf=20, l2_regularization=0.0, max_bins=256, scoring=None, validation_split=0.1, n_iter_no_change=5, tol=1e-07, verbose=0, random_state=None)[source]

Scikit-learn compatible Gradient Boosting Tree for classification.

Parameters:
  • loss ({'auto', 'binary_crossentropy', 'categorical_crossentropy'}, optional(default='auto')) – The loss function to use in the boosting process. ‘binary_crossentropy’ (also known as logistic loss) is used for binary classification and generalizes to ‘categorical_crossentropy’ for multiclass classification. ‘auto’ will automatically choose either loss depending on the nature of the problem.
  • learning_rate (float, optional(default=0.1)) – The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaf values. Use 1 for no shrinkage.
  • max_iter (int, optional(default=100)) – The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification. For multiclass classification, n_classes trees per iteration are built.
  • max_leaf_nodes (int or None, optional(default=31)) – The maximum number of leaves for each tree. If None, there is no maximum limit.
  • max_depth (int or None, optional(default=None)) – The maximum depth of each tree. The depth of a tree is the number of nodes to go from the root to the deepest leaf.
  • min_samples_leaf (int, optional(default=20)) – The minimum number of samples per leaf.
  • l2_regularization (float, optional(default=0)) – The L2 regularization parameter. Use 0 for no regularization.
  • max_bins (int, optional(default=256)) – The maximum number of bins to use. Before training, each feature of the input array X is binned into at most max_bins bins, which allows for a much faster training stage. Features with a small number of unique values may use less than max_bins bins. Must be no larger than 256.
  • scoring (str or callable or None, optional (default=None)) – Scoring parameter to use for early stopping (see sklearn.metrics for available options). If None, early stopping is checked w.r.t. the loss value.
  • validation_split (int or float or None, optional(default=0.1)) – Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data.
  • n_iter_no_change (int or None, optional (default=5)) – Used to determine when to “early stop”. The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)-th-to-last one, up to some tolerance. If None or 0, no early stopping is done.
  • tol (float or None, optional (default=1e-7)) – The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.
  • verbose (int, optional(default=0)) – The verbosity level. If not zero, print some information about the fitting process.
  • random_state (int, np.random.RandomState instance or None, optional(default=None)) – Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. See the scikit-learn glossary.

Examples

>>> from sklearn.datasets import load_iris
>>> from pygbm import GradientBoostingClassifier
>>> X, y = load_iris(return_X_y=True)
>>> clf = GradientBoostingClassifier().fit(X, y)
>>> clf.score(X, y)
0.97...
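
As an illustration of the early-stopping parameters described above, here is a minimal sketch; the dataset and every parameter value are arbitrary choices for demonstration, not recommendations:

>>> from sklearn.datasets import make_classification
>>> from pygbm import GradientBoostingClassifier
>>> X, y = make_classification(n_samples=1000, random_state=0)
>>> clf = GradientBoostingClassifier(
...     max_iter=200,          # upper bound; early stopping may build fewer trees
...     scoring='accuracy',    # early-stop on this metric instead of the loss
...     validation_split=0.2,  # hold out 20% of the training data
...     n_iter_no_change=10,   # stop after 10 iterations without improvement
...     tol=1e-4,
...     random_state=0,
... ).fit(X, y)
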
fit(X, y)

Fit the gradient boosting model.

Parameters:
  • X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the prediction methods (predict, predict_proba) will only accept pre-binned data as well.
  • y (array-like, shape=(n_samples,)) – Target values.
Returns:

self

Return type:

object
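
The pre-binned workflow mentioned above can be sketched as follows, assuming the internal bin mapper is importable as pygbm.binning.BinMapper (the import path is an assumption based on the project layout). Once fitted on uint8 data, the estimator expects pre-binned data at prediction time as well:

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from pygbm import GradientBoostingClassifier
>>> from pygbm.binning import BinMapper  # assumed import path
>>> X, y = load_iris(return_X_y=True)
>>> X_binned = BinMapper(max_bins=256).fit_transform(X)
>>> X_binned.dtype == np.uint8  # pre-binned: fit will skip the binning step
True
>>> clf = GradientBoostingClassifier().fit(X_binned, y)
>>> y_pred = clf.predict(X_binned)  # predict must receive pre-binned data too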

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
predict(X)[source]

Predict classes for X.

Parameters:X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the estimator must have been fitted with pre-binned data.
Returns:y – The predicted classes.
Return type:array, shape (n_samples,)
predict_proba(X)[source]

Predict class probabilities for X.

Parameters:X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the estimator must have been fitted with pre-binned data.
Returns:p – The class probabilities of the input samples.
Return type:array, shape (n_samples, n_classes)
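
Continuing the iris example above, a quick sketch of the output: one row per sample, one column per class, each row summing to one:

>>> import numpy as np
>>> proba = clf.predict_proba(X)
>>> proba.shape
(150, 3)
>>> bool(np.allclose(proba.sum(axis=1), 1.0))
True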
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Test samples.
  • y (array-like, shape = (n_samples,) or (n_samples, n_outputs)) – True labels for X.
  • sample_weight (array-like, shape = (n_samples,), optional) – Sample weights.
Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
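
Since this is the standard scikit-learn classifier score inherited from the base classes, it should match accuracy_score computed on the predictions; a quick sketch, reusing the fitted iris model from above:

>>> from sklearn.metrics import accuracy_score
>>> clf.score(X, y) == accuracy_score(y, clf.predict(X))
True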

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
Return type:object
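
A short sketch of the nested <component>__<parameter> syntax; the pipeline and the step name 'gbm' are arbitrary choices for illustration:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from pygbm import GradientBoostingClassifier
>>> pipe = Pipeline([('scale', StandardScaler()),
...                  ('gbm', GradientBoostingClassifier())])
>>> pipe = pipe.set_params(gbm__learning_rate=0.05, gbm__max_iter=50)
>>> pipe.get_params()['gbm__max_iter']
50
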
class pygbm.gradient_boosting.GradientBoostingRegressor(loss='least_squares', learning_rate=0.1, max_iter=100, max_leaf_nodes=31, max_depth=None, min_samples_leaf=20, l2_regularization=0.0, max_bins=256, scoring=None, validation_split=0.1, n_iter_no_change=5, tol=1e-07, verbose=0, random_state=None)[source]

Scikit-learn compatible Gradient Boosting Tree for regression.

Parameters:
  • loss ({'least_squares'}, optional(default='least_squares')) – The loss function to use in the boosting process.
  • learning_rate (float, optional(default=0.1)) – The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaf values. Use 1 for no shrinkage.
  • max_iter (int, optional(default=100)) – The maximum number of iterations of the boosting process, i.e. the maximum number of trees.
  • max_leaf_nodes (int or None, optional(default=31)) – The maximum number of leaves for each tree. If None, there is no maximum limit.
  • max_depth (int or None, optional(default=None)) – The maximum depth of each tree. The depth of a tree is the number of nodes to go from the root to the deepest leaf.
  • min_samples_leaf (int, optional(default=20)) – The minimum number of samples per leaf.
  • l2_regularization (float, optional(default=0)) – The L2 regularization parameter. Use 0 for no regularization.
  • max_bins (int, optional(default=256)) – The maximum number of bins to use. Before training, each feature of the input array X is binned into at most max_bins bins, which allows for a much faster training stage. Features with a small number of unique values may use less than max_bins bins. Must be no larger than 256.
  • scoring (str or callable or None, optional (default=None)) – Scoring parameter to use for early stopping (see sklearn.metrics for available options). If None, early stopping is checked w.r.t. the loss value.
  • validation_split (int or float or None, optional(default=0.1)) – Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data.
  • n_iter_no_change (int or None, optional (default=5)) – Used to determine when to “early stop”. The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)-th-to-last one, up to some tolerance. If None or 0, no early stopping is done.
  • tol (float or None, optional (default=1e-7)) – The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: higher tolerance means that it will be harder for subsequent iterations to be considered an improvement upon the reference score.
  • verbose (int, optional (default=0)) – The verbosity level. If not zero, print some information about the fitting process.
  • random_state (int, np.random.RandomState instance or None, optional (default=None)) – Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. See the scikit-learn glossary.

Examples

>>> from sklearn.datasets import load_boston
>>> from pygbm import GradientBoostingRegressor
>>> X, y = load_boston(return_X_y=True)
>>> est = GradientBoostingRegressor().fit(X, y)
>>> est.score(X, y)
0.92...
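
As a sketch of the early-stopping options described above (all values are arbitrary): passing n_iter_no_change=None disables early stopping entirely, while validation_split=None scores on the training data instead of a held-out split:

>>> from pygbm import GradientBoostingRegressor
>>> est = GradientBoostingRegressor(n_iter_no_change=None)  # no early stopping
>>> est = GradientBoostingRegressor(
...     scoring='neg_mean_absolute_error',  # any scikit-learn scorer string
...     validation_split=None,              # early-stop on the training data
...     n_iter_no_change=5,
... )
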
fit(X, y)

Fit the gradient boosting model.

Parameters:
  • X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the prediction method (predict) will only accept pre-binned data as well.
  • y (array-like, shape=(n_samples,)) – Target values.
Returns:

self

Return type:

object

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
predict(X)[source]

Predict values for X.

Parameters:X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the estimator must have been fitted with pre-binned data.
Returns:y – The predicted values.
Return type:array, shape (n_samples,)
score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
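
For concreteness, a small worked example of this formula with hypothetical values (plain numpy, no fitted model involved):

>>> import numpy as np
>>> y_true = np.array([3.0, -0.5, 2.0, 7.0])
>>> y_pred = np.array([2.5, 0.0, 2.0, 8.0])
>>> u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
>>> v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
>>> 1 - u / v
0.948...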

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
  • y (array-like, shape = (n_samples,) or (n_samples, n_outputs)) – True values for X.
  • sample_weight (array-like, shape = (n_samples,), optional) – Sample weights.
Returns:

score – R^2 of self.predict(X) w.r.t. y.

Return type:

float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
Return type:object