[ML] Tree Based Learning Algorithms - Decision Trees

by 유일리 2022. 11. 30.
  • Tree-based learning algorithms, a family that includes CART (Classification and Regression Trees), are a popular technique for predicting numeric and categorical outputs.
  • Tree-based methods, which include decision trees, bagging, random forests, and boosting, are considered highly effective in the space of supervised learning.
  • This is partly due to their high accuracy and their versatility: they can be used to predict both discrete and continuous outcomes.
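To make the "discrete and continuous" point concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor. The toy data (hours of sunshine, labels, and numeric values) is made up purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data (made up for illustration): one feature, hours of sunshine.
X = [[1], [2], [3], [8], [9], [10]]

# Discrete target: one class label per row.
y_class = ["stay in", "stay in", "stay in", "beach", "beach", "beach"]
# Continuous target: one number per row.
y_value = [10.0, 12.0, 11.0, 30.0, 31.0, 29.0]

clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
reg = DecisionTreeRegressor(random_state=0).fit(X, y_value)

class_prediction = clf.predict([[9.5]])[0]  # a class label
value_prediction = reg.predict([[2]])[0]    # a number
print(class_prediction, value_prediction)
```

The same tree structure serves both tasks; only the value stored in each leaf differs (a class label versus a number).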

Decision Trees

  • Decision trees build a decision structure by repeatedly splitting the data into groups, choosing at each step the variable that best separates the data into homogeneous groups. For classification, split quality is commonly measured with entropy (a measure of how mixed the classes are within a group); for numeric targets, a measure such as variance is used instead.
  • The primary appeal of decision trees is that they can be displayed graphically as a tree-like diagram.
  • Unlike an actual tree, the decision tree is displayed upside down, with the root at the top and the leaves at the bottom, or foot, of the tree.
  • Each branch represents the outcome of a decision/variable and each leaf node represents a class label, such as “Go to beach” or “Stay in.”
  • Each decision rule corresponds to the path from the root of the tree to a terminal leaf node.
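The entropy-based splitting described above can be sketched in a few lines of plain Python. The helper names `entropy` and `information_gain` and the toy labels are assumptions for illustration, not part of any library. Note also that scikit-learn's DecisionTreeClassifier uses Gini impurity by default; entropy is available through `criterion="entropy"`.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    weight_left = len(left) / len(parent)
    weight_right = len(right) / len(parent)
    children = weight_left * entropy(left) + weight_right * entropy(right)
    return entropy(parent) - children


# Hypothetical labels: an evenly mixed group of eight days.
parent = ["beach"] * 4 + ["stay in"] * 4

# A split that separates the classes perfectly.
clean_gain = information_gain(parent, ["beach"] * 4, ["stay in"] * 4)

# A split that leaves both halves as mixed as the parent.
half = ["beach", "beach", "stay in", "stay in"]
useless_gain = information_gain(parent, half, half)

print(entropy(parent), clean_gain, useless_gain)
```

A tree-building algorithm evaluates candidate splits in this way and keeps the one with the largest gain, then repeats the process within each resulting group.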

Decision Tree for “What to do today?”
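Decision rules of this kind can also be printed as text. The sketch below fits a tiny tree and prints its root-to-leaf paths with scikit-learn's `export_text`. The feature names `sunny` and `weekend` and the four rows of data are hypothetical, chosen only to echo the "Go to beach" / "Stay in" labels.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: columns are [sunny, weekend], each 0 or 1.
feature_names = ["sunny", "weekend"]
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["Go to beach", "Go to beach", "Stay in", "Stay in"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each indented line is one step on a path from the root to a leaf,
# so reading a path top to bottom gives one decision rule.
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Following the path for a sunny weekday ends in a single leaf label.
prediction = tree.predict([[1, 0]])[0]
print(prediction)
```

Because `sunny` alone separates the two labels in this toy data, the tree never needs to split on `weekend`.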

Example

Let’s use a decision tree classifier to predict the outcome of a user clicking on an advert using the advertising dataset.

advertising.csv

1-2. Import libraries / dataset

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load the dataset (adjust the path to wherever the file is stored)
df = pd.read_csv('/content/advertising.csv')

3. Convert non-numeric variables

df = pd.get_dummies(df, columns=['Country', 'City'])
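To see what this step does, here is a small sketch of `pd.get_dummies` on a two-row frame. The frame `toy` and its values are made up for illustration and are not taken from the advertising dataset.

```python
import pandas as pd

# Hypothetical two-row frame with one numeric and one text column.
toy = pd.DataFrame({"Age": [35, 31], "Country": ["Tunisia", "Nauru"]})

# Each distinct value in "Country" becomes its own indicator column,
# and the original text column is dropped.
encoded = pd.get_dummies(toy, columns=["Country"])

print(list(encoded.columns))
print(encoded)
```

A column with many distinct values produces one new column per value, so encoding columns such as 'Country' and 'City' can widen the frame considerably.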

4. Remove columns

# Drop the free-text and timestamp columns, which are not encoded for the model
del df['Ad Topic Line']
del df['Timestamp']

df.head()

5. Set X and y variables

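# Features: every column except the target
X = df.drop('Clicked on Ad', axis=1)
# Target: whether the user clicked on the ad
y = df['Clicked on Ad']

# Hold out 30% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10, shuffle=True)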

6. Set algorithm

# Fit a decision tree with scikit-learn's default settings
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

7. Evaluate

# Predict on the held-out test set
model_predict = model.predict(X_test)

# Compare the predictions with the true labels
print(confusion_matrix(y_test, model_predict))
print(classification_report(y_test, model_predict))
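As a reading aid for the two reports, the sketch below derives accuracy, precision, and recall by hand from a binary confusion matrix. It assumes scikit-learn's layout for labels 0 and 1 (rows are true classes, columns are predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`). The counts are hypothetical and are not the results of the model above; `summarize` is an illustrative helper, not a library function.

```python
def summarize(matrix):
    """Return (accuracy, precision, recall) for a 2x2 confusion matrix.

    Assumes the layout [[TN, FP], [FN, TP]]: rows are true classes,
    columns are predicted classes, with class 1 as the positive class.
    """
    (tn, fp), (fn, tp) = matrix
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)  # of predicted clicks, how many were real
    recall = tp / (tp + fn)     # of real clicks, how many were found
    return accuracy, precision, recall


# Hypothetical counts for 300 test rows (not output from the model above).
example_matrix = [[140, 10], [15, 135]]
accuracy, precision, recall = summarize(example_matrix)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

The precision and recall columns of `classification_report` are computed in the same way for each class in turn.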

 

https://github.com/erica00j/machinelearning/blob/main/decision_Tree.ipynb

 
