Warning

Pygbm’s API and default values are likely to change in future versions, without any deprecation cycle.

# Gradient Boosting Estimators

Gradient Boosting decision trees for classification and regression.

class pygbm.gradient_boosting.GradientBoostingClassifier(loss='auto', learning_rate=0.1, max_iter=100, max_leaf_nodes=31, max_depth=None, min_samples_leaf=20, l2_regularization=0.0, max_bins=256, scoring=None, validation_split=0.1, n_iter_no_change=5, tol=1e-07, verbose=0, random_state=None)

Scikit-learn compatible Gradient Boosting Tree for classification.

Parameters:

loss ({'auto', 'binary_crossentropy', 'categorical_crossentropy'}, optional (default='auto')) – The loss function to use in the boosting process. ‘binary_crossentropy’ (also known as logistic loss) is used for binary classification and generalizes to ‘categorical_crossentropy’ for multiclass classification. ‘auto’ chooses between the two depending on the nature of the problem.

learning_rate (float, optional (default=0.1)) – The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaf values. Use 1 for no shrinkage.

max_iter (int, optional (default=100)) – The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification. For multiclass classification, n_classes trees are built per iteration.

max_leaf_nodes (int or None, optional (default=31)) – The maximum number of leaves for each tree. If None, there is no maximum limit.

max_depth (int or None, optional (default=None)) – The maximum depth of each tree. The depth of a tree is the number of nodes to go from the root to the deepest leaf.

min_samples_leaf (int, optional (default=20)) – The minimum number of samples per leaf.

l2_regularization (float, optional (default=0)) – The L2 regularization parameter. Use 0 for no regularization.

max_bins (int, optional (default=256)) – The maximum number of bins to use. Before training, each feature of the input array X is binned into at most max_bins bins, which allows for a much faster training stage. Features with a small number of unique values may use fewer than max_bins bins. Must be no larger than 256.

scoring (str or callable or None, optional (default=None)) – Scoring parameter to use for early stopping (see sklearn.metrics for available options). If None, early stopping is checked with respect to the loss value.

validation_split (int or float or None, optional (default=0.1)) – Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data.

n_iter_no_change (int or None, optional (default=5)) – Used to determine when to “early stop”. The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)th-to-last one, up to some tolerance. If None or 0, no early stopping is done.

tol (float or None, optional (default=1e-7)) – The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: a higher tolerance makes it harder for subsequent iterations to be considered an improvement upon the reference score.

verbose (int, optional (default=0)) – The verbosity level. If not zero, print some information about the fitting process.

random_state (int, np.random.RandomState instance or None, optional (default=None)) – Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. See scikit-learn glossary.
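The early-stopping rule controlled by n_iter_no_change and tol can be sketched in a few lines of plain Python. This is only an illustration, not pygbm's code: the function name `should_stop` is hypothetical, higher scores are assumed to be better, and the reference score is assumed to be the one recorded just before the last n_iter_no_change scores.

```python
def should_stop(scores, n_iter_no_change=5, tol=1e-7):
    """Return True when none of the recent scores beats the reference score.

    Assumptions (not taken from pygbm): higher scores are better, and the
    reference is the score just before the last `n_iter_no_change` scores.
    """
    if not n_iter_no_change:
        return False  # None or 0 disables early stopping
    if len(scores) < n_iter_no_change + 1:
        return False  # not enough history to compare against yet
    tol = 0.0 if tol is None else tol
    # A recent score only counts as an improvement if it exceeds reference + tol.
    reference = scores[-(n_iter_no_change + 1)] + tol
    recent = scores[-n_iter_no_change:]
    return not any(score > reference for score in recent)
```

Under these assumptions a larger tol raises the bar that a recent score must clear, so stopping happens sooner, which matches the description of tol above.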

Examples

>>> from sklearn.datasets import load_iris
>>> from pygbm import GradientBoostingClassifier
>>> X, y = load_iris(return_X_y=True)
>>> clf = GradientBoostingClassifier().fit(X, y)
>>> clf.score(X, y)
0.97...

fit(X, y)

Fit the gradient boosting model.

Parameters:

X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the prediction methods (predict, predict_proba) will only accept pre-binned data as well.

y (array-like, shape=(n_samples,)) – Target values.

Returns: self

Return type: object
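To make the idea of pre-binned data more concrete, here is a minimal sketch of how one feature column could be mapped to small integer bin indices. It is not pygbm's binning code: the helper names `bin_thresholds` and `to_bins` are hypothetical, and cutting at evenly spaced ranks is just one plausible way to get at most max_bins bins.

```python
from bisect import bisect_left


def bin_thresholds(values, max_bins=256):
    """Return sorted cut points that split `values` into at most `max_bins` bins."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bins:
        # Few unique values: one bin per value, with cuts at the midpoints.
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    ordered = sorted(values)
    n = len(ordered)
    # Many unique values: cut at evenly spaced ranks (approximate quantiles).
    cuts = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    return sorted(cuts)


def to_bins(values, thresholds):
    """Map each value to the index of the bin it falls into."""
    return [bisect_left(thresholds, v) for v in values]


feature = [0.1, 0.1, 2.5, 7.0, 7.0, 7.0]
thresholds = bin_thresholds(feature)
binned = to_bins(feature, thresholds)  # integers in [0, max_bins - 1]
```

With at most 256 bins, every index fits in an unsigned 8-bit integer, which is presumably why pre-binned input is recognized by its np.uint8 dtype.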
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns: params – Parameter names mapped to their values.

Return type: mapping of string to any

predict(X)

Predict classes for X.

Parameters:

X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the estimator must have been fitted with pre-binned data.

Returns: y – The predicted classes.

Return type: array, shape (n_samples,)

predict_proba(X)

Predict class probabilities for X.

Parameters:

X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the estimator must have been fitted with pre-binned data.

Returns: p – The class probabilities of the input samples.

Return type: array, shape (n_samples, n_classes)
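As a point of reference for the two loss names accepted by the classifier, the sketch below computes both cross-entropies from class probabilities of the kind predict_proba returns. The function names are illustrative and the code is written from the textbook definitions, not from pygbm's internals. It assumes integer class labels starting at 0 and probabilities strictly between 0 and 1.

```python
import math


def binary_crossentropy(y_true, p_positive):
    """Mean logistic loss, given the predicted probability of class 1."""
    total = sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(y_true, p_positive)
    )
    return -total / len(y_true)


def categorical_crossentropy(y_true, proba):
    """Mean negative log-probability assigned to the true class of each sample."""
    total = sum(math.log(row[y]) for y, row in zip(y_true, proba))
    return -total / len(y_true)


y = [1, 0]
proba = [[0.2, 0.8], [0.7, 0.3]]           # one row per sample, one column per class
p_positive = [row[1] for row in proba]     # probability of class 1
binary = binary_crossentropy(y, p_positive)
categorical = categorical_crossentropy(y, proba)
```

With two classes the two values coincide, which is the sense in which the binary loss generalizes to the categorical one.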
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.

Parameters:

X (array-like, shape=(n_samples, n_features)) – Test samples.

y (array-like, shape=(n_samples,) or (n_samples, n_outputs)) – True labels for X.

sample_weight (array-like, shape=(n_samples,), optional) – Sample weights.

Returns: score – Mean accuracy of self.predict(X) with respect to y.

Return type: float
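The quantity returned by score can be written out directly. The following is a plain-Python sketch of mean accuracy for single-label targets, with optional sample weights treated as a weighted average. The function name `mean_accuracy` is hypothetical and the code is not taken from scikit-learn or pygbm.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correctly predicted labels, optionally weighted per sample."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the samples whose prediction matches the true label.
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)
```

For example, three correct predictions out of four give 0.75 when all samples carry equal weight.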
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
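The `<component>__<parameter>` naming can be illustrated with a toy class. `ToyEstimator` is purely hypothetical and much simpler than scikit-learn's real implementation. It only shows how a double underscore could route a value to a nested object.

```python
class ToyEstimator:
    """Hypothetical stand-in that stores its parameters in a dict."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "component__parameter" at the first double underscore.
            name, sep, sub_key = key.partition("__")
            if sep:
                # Delegate the remainder to the nested object.
                self.params[name].set_params(**{sub_key: value})
            else:
                self.params[name] = value
        return self


inner = ToyEstimator(learning_rate=0.1, max_iter=100)
outer = ToyEstimator(clf=inner, verbose=0)
result = outer.set_params(clf__learning_rate=0.05, verbose=1)
```

After the call, only learning_rate on the nested object and verbose on the outer object have changed, and the outer object itself is returned.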
class pygbm.gradient_boosting.GradientBoostingRegressor(loss='least_squares', learning_rate=0.1, max_iter=100, max_leaf_nodes=31, max_depth=None, min_samples_leaf=20, l2_regularization=0.0, max_bins=256, scoring=None, validation_split=0.1, n_iter_no_change=5, tol=1e-07, verbose=0, random_state=None)

Scikit-learn compatible Gradient Boosting Tree for regression.

Parameters:

loss ({'least_squares'}, optional (default='least_squares')) – The loss function to use in the boosting process.

learning_rate (float, optional (default=0.1)) – The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaf values. Use 1 for no shrinkage.

max_iter (int, optional (default=100)) – The maximum number of iterations of the boosting process, i.e. the maximum number of trees.

max_leaf_nodes (int or None, optional (default=31)) – The maximum number of leaves for each tree. If None, there is no maximum limit.

max_depth (int or None, optional (default=None)) – The maximum depth of each tree. The depth of a tree is the number of nodes to go from the root to the deepest leaf.

min_samples_leaf (int, optional (default=20)) – The minimum number of samples per leaf.

l2_regularization (float, optional (default=0)) – The L2 regularization parameter. Use 0 for no regularization.

max_bins (int, optional (default=256)) – The maximum number of bins to use. Before training, each feature of the input array X is binned into at most max_bins bins, which allows for a much faster training stage. Features with a small number of unique values may use fewer than max_bins bins. Must be no larger than 256.

scoring (str or callable or None, optional (default=None)) – Scoring parameter to use for early stopping (see sklearn.metrics for available options). If None, early stopping is checked with respect to the loss value.

validation_split (int or float or None, optional (default=0.1)) – Proportion (or absolute size) of training data to set aside as validation data for early stopping. If None, early stopping is done on the training data.

n_iter_no_change (int or None, optional (default=5)) – Used to determine when to “early stop”. The fitting process is stopped when none of the last n_iter_no_change scores are better than the (n_iter_no_change - 1)th-to-last one, up to some tolerance. If None or 0, no early stopping is done.

tol (float or None, optional (default=1e-7)) – The absolute tolerance to use when comparing scores. The higher the tolerance, the more likely we are to early stop: a higher tolerance makes it harder for subsequent iterations to be considered an improvement upon the reference score.

verbose (int, optional (default=0)) – The verbosity level. If not zero, print some information about the fitting process.

random_state (int, np.random.RandomState instance or None, optional (default=None)) – Pseudo-random number generator to control the subsampling in the binning process, and the train/validation data split if early stopping is enabled. See scikit-learn glossary.
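To show how learning_rate and max_iter interact under a least-squares loss, here is a deliberately tiny boosting loop on a single feature, using one-split trees (stumps). It is a teaching sketch under simplifying assumptions, not pygbm's histogram-based algorithm. The names `fit_stump`, `boost` and `predict_one` are hypothetical, and the input needs at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Find the single split that best fits the residuals in the least-squares sense."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def boost(xs, ys, learning_rate=0.1, max_iter=100):
    """Fit an additive model: baseline + learning_rate * sum of stump outputs."""
    baseline = sum(ys) / len(ys)
    predictions = [baseline] * len(ys)
    stumps = []
    for _ in range(max_iter):
        # For least squares, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        # Shrink each leaf value by the learning rate before adding it.
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return baseline, learning_rate, stumps


def predict_one(model, x):
    """Evaluate the fitted additive model at a single feature value."""
    baseline, learning_rate, stumps = model
    return baseline + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )
```

With learning_rate=1 a single stump already fits a clean step function exactly, whereas with learning_rate=0.1 each iteration removes only a tenth of the remaining residual, so more iterations are needed to reach the same fit.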

Examples

>>> from sklearn.datasets import load_boston
>>> from pygbm import GradientBoostingRegressor
>>> X, y = load_boston(return_X_y=True)
>>> est = GradientBoostingRegressor().fit(X, y)
>>> est.score(X, y)
0.92...

fit(X, y)

Fit the gradient boosting model.

Parameters:

X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the prediction method (predict) will only accept pre-binned data as well.

y (array-like, shape=(n_samples,)) – Target values.

Returns: self

Return type: object

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns: params – Parameter names mapped to their values.

Return type: mapping of string to any

predict(X)

Predict values for X.

Parameters:

X (array-like, shape=(n_samples, n_features)) – The input samples. If X.dtype == np.uint8, the data is assumed to be pre-binned and the estimator must have been fitted with pre-binned data.

Returns: y – The predicted values.

Return type: array, shape (n_samples,)

score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameters:

X (array-like, shape=(n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix instead, shape=(n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y (array-like, shape=(n_samples,) or (n_samples, n_outputs)) – True values for X.

sample_weight (array-like, shape=(n_samples,), optional) – Sample weights.

Returns: score – R^2 of self.predict(X) with respect to y.

Return type: float
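The definition of R^2 given above translates directly into code. This plain-Python sketch follows the formula (1 - u/v) for a single output without sample weights. The function name `r_squared` is illustrative, and the sketch does not handle a constant y_true, for which v is zero.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - u/v, for one-dimensional targets."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares between targets and predictions.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares of the targets around their mean.
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - u / v
```

A perfect prediction gives 1.0, always predicting the mean of y_true gives 0.0, and predictions worse than that constant give a negative value.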
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self