Stree
- class stree.Stree(C: float = 1.0, kernel: str = 'linear', max_iter: int = 100000, random_state: Optional[int] = None, max_depth: Optional[int] = None, tol: float = 0.0001, degree: int = 3, gamma='scale', split_criteria: str = 'impurity', criterion: str = 'entropy', min_samples_split: int = 0, max_features=None, splitter: str = 'random', multiclass_strategy: str = 'ovo', normalize: bool = False)
Bases: BaseEstimator, ClassifierMixin
Estimator based on binary trees of SVM nodes. It accepts sample_weight in fit, so it can be used inside sklearn boosting methods. Inheriting from BaseEstimator provides the get_params and set_params methods; inheriting from ClassifierMixin sets the attribute _estimator_type to “classifier”.
Parameters
- C : float, optional
Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. By default 1.0.
- kernel : str, optional
Specifies the kernel type to be used in the algorithm. It must be one of ‘liblinear’, ‘linear’, ‘poly’ or ‘rbf’. ‘liblinear’ uses the [liblinear](https://www.csie.ntu.edu.tw/~cjlin/liblinear/) library and the others use the [libsvm](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) library through scikit-learn. By default “linear”.
- max_iter : int, optional
Hard limit on iterations within the solver, or -1 for no limit. By default 1e5.
- random_state : int, optional
Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. By default None.
- max_depth : int, optional
Specifies the maximum depth of the tree. By default None.
- tol : float, optional
Tolerance for the stopping criterion. By default 1e-4.
- degree : int, optional
Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels. By default 3.
- gamma : str, optional
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If gamma=‘scale’ (default) is passed, it uses 1 / (n_features * X.var()) as the value of gamma; if ‘auto’, it uses 1 / n_features. By default “scale”.
- split_criteria : str, optional
Decides (only for multiclass classification) which column (class) to use to split the dataset in a node. ‘max_samples’ is incompatible with the ‘ovo’ multiclass_strategy. By default “impurity”.
- criterion : str, optional
The function to measure the quality of a split (only used if max_features != num_features). Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. By default “entropy”.
- min_samples_split : int, optional
The minimum number of samples required to split an internal node; 0 (the default) means no minimum.
- max_features : int, float or str, optional
The number of features to consider when looking for the split. If int, max_features features are considered at each split. If float, max_features is a fraction and int(max_features * n_features) features are considered at each split. If “auto”, max_features=sqrt(n_features). If “sqrt”, max_features=sqrt(n_features). If “log2”, max_features=log2(n_features). If None, max_features=n_features. By default None.
- splitter : str, optional
The strategy used to choose the feature set at each node (only used if max_features < num_features). Supported strategies are: “best”: sklearn SelectKBest is used in every node to choose the max_features best features. “random”: the algorithm generates 5 candidates and chooses the best (max. info. gain) of them. “trandom”: the algorithm generates only one random combination. “mutual”: chooses the best features w.r.t. their mutual information with the label. “cfs”: applies Correlation-based Feature Selection. “fcbf”: applies Fast Correlation-Based Filter. By default “random”.
- multiclass_strategy : str, optional
Strategy to use with multiclass datasets: “ovo” (one versus one) or “ovr” (one versus rest). By default “ovo”.
- normalize : bool, optional
Whether standardization of features should be applied at each node, using the samples that reach it. By default False.
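Two of the parameter rules above, max_features and gamma, reduce to small calculations. The sketch below writes them as plain functions. The helper names `resolve_max_features` and `resolve_gamma` are hypothetical and do not belong to Stree's API. It also assumes scikit-learn-style conventions: counts are truncated to int and clamped to at least 1, and gamma falls back to 1.0 when X is constant. Stree itself may round or guard differently.

```python
import math
from statistics import pvariance
from typing import Optional, Sequence, Union


def resolve_max_features(max_features: Optional[Union[int, float, str]], n_features: int) -> int:
    """Hypothetical helper: map a max_features setting to a feature count."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            # Assumption: truncate to int and keep at least one feature.
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"unknown max_features: {max_features!r}")
    if isinstance(max_features, float):
        # A float is read as a fraction of the available features.
        return max(1, int(max_features * n_features))
    return int(max_features)


def resolve_gamma(gamma: Union[str, float], X: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: compute the kernel coefficient described for gamma."""
    n_features = len(X[0])
    if gamma == "scale":
        # X.var() in NumPy is the population variance over all entries of X.
        var = pvariance([v for row in X for v in row])
        # Assumption: fall back to 1.0 for constant data, as scikit-learn does.
        return 1.0 / (n_features * var) if var != 0 else 1.0
    if gamma == "auto":
        return 1.0 / n_features
    return float(gamma)
```

With 16 features, "sqrt" and "log2" both give 4 and 0.5 gives 8. For X = [[1, 2], [3, 4]], "auto" gives 0.5 and "scale" gives 1 / (2 * 1.25) = 0.4.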
Attributes
- classes_ : ndarray of shape (n_classes,)
The class labels.
- n_classes_ : int
The number of classes.
- n_iter_ : int
Maximum number of iterations of the classifier.
- depth_ : int
Maximum depth of the tree.
- n_features_ : int
The number of features when fit is performed.
- n_features_in_ : int
Number of features seen during fit.
- max_features_ : int
Number of features to use in the hyperplane computation.
- tree_ : Node
Root of the tree.
- X_ : ndarray
Points to the input dataset.
- y_ : ndarray
Points to the input labels.
References
R. Montañana, J. A. Gámez, J. M. Puerta, “STree: a single multi-class oblique decision tree based on support vector machines,” Lecture Notes in Artificial Intelligence (LNAI), vol. 12882, 2021.
- __predict_class(X: array) → array
Compute the predicted class for the samples in X. Returns the number of samples of each class in the corresponding leaf node.
Parameters
- X : np.array
Array of samples
Returns
- np.array
Array of shape (n_samples, n_classes) with the number of samples of each class in the corresponding leaf node
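The returned array holds one row per sample and one column per class. The sketch below shows how that matrix can be assembled from the class histogram of the leaf each sample lands in. It assumes each leaf stores a `collections.Counter` of training labels. That representation and the name `leaf_count_matrix` are hypothetical; Stree keeps this information inside its own Snode objects.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def leaf_count_matrix(leaf_counts: Sequence[Counter], classes: Sequence[Hashable]) -> List[List[int]]:
    """Build an (n_samples, n_classes) matrix of per-class sample counts.

    leaf_counts[i] is the (hypothetical) class histogram of the leaf reached by sample i.
    """
    # Counter returns 0 for absent keys, so classes missing from a leaf count as 0.
    return [[counts[c] for c in classes] for counts in leaf_counts]
```

For example, with classes [0, 1], a sample whose leaf holds three class-0 and one class-1 training samples gets the row [3, 1].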
- _more_tags() → dict
Required by sklearn to supply features of the classifier; makes the labels array mandatory.
Returns
- dict
The required tags.
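Scikit-learn versions before 1.6 collect estimator tags through `_more_tags`; version 1.6 introduced `__sklearn_tags__` instead. The sketch below shows the shape of such a hook on a hypothetical class. It assumes that "makes the labels array mandatory" maps to scikit-learn's `requires_y` tag. This is an assumption and is not confirmed by this page.

```python
class SketchClassifier:
    """Hypothetical estimator that only demonstrates the tag hook."""

    def _more_tags(self) -> dict:
        # Assumption: "requires_y" is the scikit-learn (pre-1.6) tag stating
        # that fit() must receive the labels array y.
        return {"requires_y": True}
```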
- _train(X: ndarray, y: ndarray, sample_weight: ndarray, depth: int, title: str) → Optional[Snode]
Recursive function to split the original dataset into predictor nodes (leaves)
Parameters
- X : np.ndarray
Samples dataset
- y : np.ndarray
Samples labels
- sample_weight : np.ndarray
Weight of each sample; it rescales C per sample.
- depth : int
Current depth in the tree
- title : str
Description of the node
Returns
- Optional[Snode]
binary tree
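The recursion above can be sketched in plain Python. The sketch makes several simplifying assumptions. Stree splits each node with an SVM hyperplane; here that split is swapped for an axis-aligned cut at each feature's mean, chosen by information gain (entropy). sample_weight is omitted. The stopping rules are a pure node, the max_depth limit (with the root at depth 1) and min_samples_split. `Node`, `best_split` and `train` are hypothetical names, not Stree internals.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2
from typing import Hashable, List, Optional, Sequence, Tuple


@dataclass
class Node:
    title: str
    counts: Counter                  # class histogram of the samples that reach this node
    feature: Optional[int] = None    # None while the node is a leaf
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[Hashable]) -> Optional[Tuple[int, float]]:
    """Stand-in for the SVM hyperplane: cut each feature at its mean and keep
    the cut with the highest positive information gain."""
    best, best_gain = None, 0.0
    parent = entropy(y)
    for j in range(len(X[0])):
        thr = sum(row[j] for row in X) / len(X)
        left = [lab for row, lab in zip(X, y) if row[j] <= thr]
        right = [lab for row, lab in zip(X, y) if row[j] > thr]
        if not left or not right:
            continue
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if parent - child > best_gain:
            best, best_gain = (j, thr), parent - child
    return best


def train(X, y, depth, title, max_depth=None, min_samples_split=0) -> Node:
    node = Node(title=title, counts=Counter(y))
    stop = (
        len(node.counts) == 1                               # pure node
        or (max_depth is not None and depth >= max_depth)   # depth limit reached
        or len(y) < min_samples_split                       # too few samples to split
    )
    split = None if stop else best_split(X, y)
    if split is None:
        return node  # leaf: predictions come from node.counts
    node.feature, node.threshold = split
    mask = [row[node.feature] <= node.threshold for row in X]
    for side, keep in (("left", True), ("right", False)):
        Xs = [row for row, m in zip(X, mask) if m == keep]
        ys = [lab for lab, m in zip(y, mask) if m == keep]
        setattr(node, side, train(Xs, ys, depth + 1, f"{title}-{side}", max_depth, min_samples_split))
    return node
```

On X = [[0], [1], [2], [3]] with y = [0, 0, 1, 1], the root is cut at 1.5 and both children are pure leaves. With max_depth=1 the root itself stays a leaf.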
- check_predict(X) → array
Checks predict and predict_proba preconditions. If input X is not an np.array, convert it to one.
Parameters
- X : np.ndarray
Array of samples
Returns
- np.array
Array of samples
Raises
- ValueError
If the number of features of X is different from the number of features in the training data
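A minimal sketch of the feature-count precondition follows. It uses plain lists instead of np.array, so it runs without NumPy. It takes the training width as an explicit `n_features_in` argument rather than reading it from a fitted model. The function mirrors the documented behaviour (coerce X, raise ValueError on a width mismatch) but is a hypothetical stand-in, not Stree's code.

```python
from typing import Iterable, List, Sequence


def check_predict(X: Iterable[Sequence[float]], n_features_in: int) -> List[List[float]]:
    """Coerce X into a list of rows and verify its width matches the training data."""
    rows = [list(row) for row in X]  # accept tuples, lists or any iterable of rows
    for row in rows:
        if len(row) != n_features_in:
            raise ValueError(f"Expected {n_features_in} features but got {len(row)}")
    return rows
```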
- fit(X: ndarray, y: ndarray, sample_weight: Optional[array] = None) → Stree
Build the tree based on the dataset of samples and its labels
Returns
- Stree
Itself, so that calls can be chained: fit().predict() …
Raises
- ValueError
if C < 0
- ValueError
if max_depth < 1
- ValueError
if all samples have 0 or negative weights
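The three error conditions listed for fit can be expressed as one validation step. The sketch below checks them in the order listed above, using the documented thresholds: C < 0, max_depth < 1, and no strictly positive weight. `validate_fit_args` is a hypothetical helper. Its check order and error messages are assumptions and are not taken from Stree.

```python
from typing import Optional, Sequence


def validate_fit_args(C: float, max_depth: Optional[int], sample_weight: Optional[Sequence[float]]) -> None:
    """Raise ValueError for the documented fit() preconditions."""
    if C < 0:
        raise ValueError(f"C must not be negative, got {C}")
    if max_depth is not None and max_depth < 1:
        raise ValueError(f"max_depth must be at least 1, got {max_depth}")
    if sample_weight is not None and all(w <= 0 for w in sample_weight):
        raise ValueError("All samples have zero or negative weights")
```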
- nodes_leaves() → tuple
Compute the number of nodes and leaves in the built tree
Returns
- tuple
tuple with the number of nodes and the number of leaves
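Counting nodes and leaves is a plain recursive traversal. The sketch below assumes that leaves are counted among the nodes, so a root with two leaf children gives (3, 2). It uses a hypothetical minimal `Node` class in place of Stree's Snode.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary node; a leaf has no children."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def nodes_leaves(node: Optional[Node]) -> Tuple[int, int]:
    """Return (number of nodes, number of leaves) in the subtree rooted at node."""
    if node is None:
        return 0, 0
    if node.left is None and node.right is None:
        return 1, 1  # a leaf is both a node and a leaf
    left_nodes, left_leaves = nodes_leaves(node.left)
    right_nodes, right_leaves = nodes_leaves(node.right)
    return 1 + left_nodes + right_nodes, left_leaves + right_leaves
```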
- predict(X: array) → array
Predict labels for each sample in dataset passed
Parameters
- X : np.array
dataset of samples
Returns
- np.array
array of labels
Raises
- ValueError
if the dataset has an inconsistent number of features
- NotFittedError
if model is not fitted
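One plausible way to turn per-leaf class counts into labels is to pick the majority class of each sample's leaf. This page does not state how predict does it, so the sketch is an assumption. It takes an (n_samples, n_classes) count matrix, such as the one __predict_class returns, and breaks ties toward the first class, like numpy.argmax. `labels_from_counts` is a hypothetical name.

```python
from typing import Hashable, List, Sequence


def labels_from_counts(counts: Sequence[Sequence[int]], classes: Sequence[Hashable]) -> List[Hashable]:
    """Return, for each sample, the class with the most training samples in its leaf."""
    labels = []
    for row in counts:
        # max() keeps the first index among equal maxima, matching numpy.argmax.
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best])
    return labels
```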
- predict_proba(X: array) → array
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
Parameters
- X : array
Dataset of samples.
Returns
- proba : array of shape (n_samples, n_classes)
The class probabilities of the input samples.
Raises
- ValueError
if the dataset has an inconsistent number of features
- NotFittedError
if model is not fitted
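Since the predicted probability is the fraction of samples of each class in a leaf, it follows from the per-leaf count matrix by dividing each row by its total. The sketch below shows that arithmetic on plain lists. `proba_from_counts` is a hypothetical name, and the code assumes every leaf holds at least one training sample.

```python
from typing import List, Sequence


def proba_from_counts(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Turn per-leaf class counts into class probabilities (row-wise fractions)."""
    proba = []
    for row in counts:
        total = sum(row)  # assumed > 0: a leaf always holds at least one sample
        proba.append([c / total for c in row])
    return proba
```

A leaf with three class-0 samples and one class-1 sample yields [0.75, 0.25], and each row sums to 1.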