Python graph tree visualization

Visualizing Decision Trees with Python (Scikit-learn, Graphviz, Matplotlib)

Learn how to visualize decision trees using Matplotlib and Graphviz

Decision trees are a popular supervised learning method for a variety of reasons. They can be used for both regression and classification, they don’t require feature scaling, and they are relatively easy to interpret because you can visualize them. A visualization is not only a powerful way to understand your model but also a way to communicate how it works, so it helps to know how to create one. This tutorial covers:

  • How to Fit a Decision Tree Model using Scikit-Learn
  • How to Visualize Decision Trees using Matplotlib
  • How to Visualize Decision Trees using Graphviz (what is Graphviz, how to install it on Mac and Windows, and how to use it to visualize decision trees)
  • How to Visualize Individual Decision Trees from Bagged Trees or Random Forests

As always, the code used in this tutorial is available on my GitHub. With that, let’s get started!

How to Fit a Decision Tree Model using Scikit-Learn

In order to visualize decision trees, we first need to fit a decision tree model using scikit-learn. If this section is not clear, I encourage you to read my Understanding Decision Trees for Classification (Python) tutorial, as I go into a lot of detail on how decision trees work and how to use them.

Import Libraries

The following import statements are what we will use for this section of the tutorial.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn import tree
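The fitting step itself is not spelled out here, so below is a minimal sketch of it. It assumes the iris dataset, a 75/25 train/test split and max_depth=3. These are illustrative choices and not necessarily what the code on GitHub uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris data as NumPy arrays (illustrative dataset choice)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for evaluation; random_state fixes the split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a shallow tree; max_depth=3 is an assumption to keep plots readable
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out samples
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

The fitted clf is what the plotting functions in the following sections take as input.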

Visualize a Decision Tree in 4 Ways with Scikit-Learn and Python

A Decision Tree is a supervised algorithm used in machine learning. It uses a binary tree (each internal node has two children) to assign a target value to each data sample. The target values are stored in the tree leaves. To reach a leaf, the sample is propagated through the nodes, starting at the root node. At each node, a decision is made about which child node the sample should go to, based on the value of one of the sample’s features. Decision Tree learning is the process of finding the optimal rule in each internal node according to a selected metric.
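To make the root-to-leaf propagation concrete, the sketch below walks one sample down a fitted scikit-learn tree by hand. It uses the low-level tree_ arrays (children_left, children_right, feature, threshold) and then checks that the sample lands in the same leaf that clf.apply reports. The iris data, max_depth=3 and the helper name trace_path are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = clf.tree_

def trace_path(sample):
    """Hypothetical helper: return the node ids visited from root to leaf."""
    node = 0  # node 0 is always the root
    path = [node]
    # In scikit-learn, a leaf has children_left == children_right == -1
    while t.children_left[node] != t.children_right[node]:
        # Each internal node tests one feature against a threshold:
        # go left if the feature value <= threshold, otherwise go right
        if sample[t.feature[node]] <= t.threshold[node]:
            node = t.children_left[node]
        else:
            node = t.children_right[node]
        path.append(node)
    return path

# scikit-learn compares float32 inputs internally, so we do the same
sample = X[100].astype(np.float32)
path = trace_path(sample)
leaf = path[-1]
print("Visited nodes:", path)
print("Leaf from clf.apply:", clf.apply(sample.reshape(1, -1))[0])
print("Predicted class:", np.argmax(t.value[leaf]))
```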

The decision trees can be divided, with respect to the target values, into:

  • Classification trees, used to classify samples, that is, to assign them to a limited set of values (classes). In scikit-learn it is DecisionTreeClassifier.
  • Regression trees, used to assign numerical values within a range to samples. In scikit-learn it is DecisionTreeRegressor.
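The sketch below shows the difference on a tiny made-up 1-D dataset, which is used purely for illustration. The classifier can only output labels it saw during training. The regressor outputs numbers, namely the mean target of the leaf a sample falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy 1-D inputs (made-up data for illustration)
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_class = np.array(["small", "small", "small", "large", "large", "large"])
y_value = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

# One split is enough to separate the two groups
clf = DecisionTreeClassifier(max_depth=1).fit(X, y_class)
reg = DecisionTreeRegressor(max_depth=1).fit(X, y_value)

X_new = np.array([[0.0], [6.0], [20.0]])
print(clf.predict(X_new))  # labels taken from {"large", "small"}
print(reg.predict(X_new))  # numbers: the mean target of each leaf
```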

Decision trees are a popular tool in decision analysis. They can support decisions thanks to the visual representation of each decision.

Below I show 4 ways to visualize a Decision Tree in Python:

  • print text representation of the tree with sklearn.tree.export_text method
  • plot with sklearn.tree.plot_tree method (matplotlib needed)
  • plot with sklearn.tree.export_graphviz method (graphviz needed)
  • plot with dtreeviz package (dtreeviz and graphviz needed)

I will show how to visualize trees on classification and regression tasks.

Train Decision Tree on Classification Task

I will train a DecisionTreeClassifier on the iris dataset, using the default hyper-parameters for the classifier.

from matplotlib import pyplot as plt
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

# Prepare the data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Fit the classifier with default hyper-parameters
clf = DecisionTreeClassifier(random_state=1234)
model = clf.fit(X, y)

Exporting a Decision Tree to a text representation can be useful when working on applications without a user interface, or when we want to log information about the model to a text file. You can check the details of export_text in the sklearn docs.

text_representation = tree.export_text(clf)
print(text_representation)
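|--- feature_2 <= 2.45
|   |--- class: 0
|--- feature_2 >  2.45
|   |--- feature_3 <= 1.75
|   |   |--- feature_2 <= 4.95
|   |   |   |--- feature_3 <= 1.65
|   |   |   |   |--- class: 1
|   |   |   |--- feature_3 >  1.65
|   |   |   |   |--- class: 2
|   |   |--- feature_2 >  4.95
|   |   |   |--- feature_3 <= 1.55
|   |   |   |   |--- class: 2
|   |   |   |--- feature_3 >  1.55
|   |   |   |   |--- feature_0 <= 6.95
|   |   |   |   |   |--- class: 1
|   |   |   |   |--- feature_0 >  6.95
|   |   |   |   |   |--- class: 2
|   |--- feature_3 >  1.75
|   |   |--- feature_2 <= 4.85
|   |   |   |--- feature_1 <= 3.10
|   |   |   |   |--- class: 2
|   |   |   |--- feature_1 >  3.10
|   |   |   |   |--- class: 1
|   |   |--- feature_2 >  4.85
|   |   |   |--- class: 2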
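By default, export_text labels splits as feature_0, feature_1 and so on. The minimal sketch below passes feature_names so the output uses the real column names, and uses decimals to control how thresholds are rounded. The value decimals=1 is an illustrative choice.

```python
from sklearn import datasets, tree
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
clf = DecisionTreeClassifier(random_state=1234).fit(iris.data, iris.target)

# feature_names replaces the generic feature_i labels;
# decimals sets how many digits the thresholds are printed with
text_representation = tree.export_text(
    clf,
    feature_names=iris.feature_names,
    decimals=1,
)
print(text_representation)
```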

If you want to save it to a file, it can be done with the following code:

with open("decision_tree.log", "w") as fout:
    fout.write(text_representation)

Plot Tree with plot_tree

The plot_tree method was added to sklearn in version 0.21. It requires matplotlib to be installed. It lets us easily produce a figure of the tree without first exporting it to graphviz. More information about the plot_tree arguments is in the docs.

fig = plt.figure(figsize=(25,20))
_ = tree.plot_tree(clf,
                   feature_names=iris.feature_names,
                   class_names=iris.target_names,
                   filled=True)

Decision Tree visualization with plot_tree from sklearn

(plot_tree returns a list of annotation objects for the plot. To keep them from being printed in the notebook, I assigned the returned value to _.)

To save the figure to a .png file:

fig.savefig("decision_tree.png")
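If you run the code as a script rather than in a notebook, the sketch below may help. It assumes the non-interactive Agg backend and writes to an in-memory buffer instead of a file. It draws a depth-limited plot and keeps what plot_tree returns. The figure size and max_depth=2 are illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
from sklearn import datasets, tree
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
clf = DecisionTreeClassifier(random_state=1234).fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(12, 8))
# max_depth here only truncates the drawing; the fitted tree is unchanged
annotations = tree.plot_tree(
    clf,
    max_depth=2,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)

# Save to an in-memory PNG; pass a filename instead to write to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print(len(annotations), "text boxes drawn")
```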

Please notice that I’m using filled=True in plot_tree. When this parameter is set to True, the method colors each node according to its majority class. (It would be nice to have a legend matching classes to colors.)
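Until such a legend exists, one workaround is to read each node's majority class directly from clf.tree_.value, since the fill colors encode that class, and then print the class index-to-name mapping yourself. The sketch below does this and assumes the iris classifier from above.

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
clf = DecisionTreeClassifier(random_state=1234).fit(iris.data, iris.target)

# tree_.value has shape (n_nodes, n_outputs, n_classes); for a single-output
# classifier, argmax over the last axis gives each node's majority class
majority = np.argmax(clf.tree_.value[:, 0, :], axis=1)

# Print a text "legend": class index -> class name
for index, name in enumerate(iris.target_names):
    print(index, name)

# Majority class of the first few nodes (node 0 is the root)
for node in range(5):
    print(f"node {node}: {iris.target_names[majority[node]]}")
```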

Visualize Decision Tree with graphviz

Please make sure that you have graphviz installed (pip install graphviz installs the Python package, which also needs the Graphviz system software to render images). To plot the tree, we first need to export it to the DOT format with the export_graphviz method (link to docs). Then we can plot it in the notebook or save it to a file.

import graphviz

# DOT data
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True)

# Draw graph
graph = graphviz.Source(dot_data, format="png")
graph

# Save the graph to a PNG file (render returns the output file name)
graph.render("decision_tree_graphviz")
'decision_tree_graphviz.png'
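If the graphviz Python package is not available, here is a sketch of an alternative. With out_file=None, export_graphviz returns the DOT source as a string. You can save that string yourself and render it later with the dot command-line tool. The file name tree.dot and the rounded=True styling are illustrative choices.

```python
from pathlib import Path

from sklearn import datasets, tree
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
clf = DecisionTreeClassifier(random_state=1234).fit(iris.data, iris.target)

# With out_file=None the DOT source is returned as a string
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,  # rounded node boxes
)

# Write it to disk; render later with: dot -Tpng tree.dot -o tree.png
dot_path = Path("tree.dot")
dot_path.write_text(dot_data)
print(dot_data.splitlines()[0])
```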

Plot Decision Tree with dtreeviz Package

The dtreeviz package is available on GitHub. It can be installed with pip install dtreeviz. It requires graphviz to be installed (but you don’t need to manually convert between DOT files and images). To plot the tree, just run:

from dtreeviz.trees import dtreeviz # remember to load the package

viz = dtreeviz(clf, X, y,
               target_name="target",
               feature_names=iris.feature_names,
               class_names=list(iris.target_names))

viz

Save the visualization to a file:

viz.save("decision_tree.svg")

Visualizing the Decision Tree in Regression Task

Below, I present all 4 methods for the DecisionTreeRegressor from the scikit-learn package (in Python, of course).

from sklearn import datasets
from sklearn.tree import DecisionTreeRegressor
from sklearn import tree

# Prepare the data
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
boston = datasets.load_boston()
X = boston.data
y = boston.target

To keep the size of the tree small, I set max_depth=3.

# Fit the regressor, set max_depth = 3
regr = DecisionTreeRegressor(max_depth=3, random_state=1234)
model = regr.fit(X, y)

text_representation = tree.export_text(regr)
print(text_representation)
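|--- feature_5 <= 6.94
|   |--- feature_12 <= 14.40
|   |   |--- feature_7 <= 1.38
|   |   |   |--- value: [45.58]
|   |   |--- feature_7 >  1.38
|   |   |   |--- value: [22.91]
|   |--- feature_12 >  14.40
|   |   |--- feature_0 <= 6.99
|   |   |   |--- value: [17.14]
|   |   |--- feature_0 >  6.99
|   |   |   |--- value: [11.98]
|--- feature_5 >  6.94
|   |--- feature_5 <= 7.44
|   |   |--- feature_4 <= 0.66
|   |   |   |--- value: [33.35]
|   |   |--- feature_4 >  0.66
|   |   |   |--- value: [14.40]
|   |--- feature_5 >  7.44
|   |   |--- feature_10 <= 19.65
|   |   |   |--- value: [45.90]
|   |   |--- feature_10 >  19.65
|   |   |   |--- value: [21.90]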
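Recent scikit-learn versions no longer ship load_boston. The sketch below therefore swaps in the bundled diabetes regression dataset, which is an assumption and not the author's data. It checks what max_depth=3 guarantees: a depth of at most 3 and therefore at most 2**3 = 8 leaves, compared with a tree grown without a depth limit.

```python
from sklearn import datasets, tree
from sklearn.tree import DecisionTreeRegressor

# Diabetes data used as a stand-in for the Boston data
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

for depth in (3, None):  # None grows the tree without a depth limit
    model = DecisionTreeRegressor(max_depth=depth, random_state=1234).fit(X, y)
    print(f"max_depth={depth}: depth={model.get_depth()}, "
          f"leaves={model.get_n_leaves()}")

regr = DecisionTreeRegressor(max_depth=3, random_state=1234).fit(X, y)
print(tree.export_text(regr, feature_names=diabetes.feature_names))
```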
fig = plt.figure(figsize=(25,20))
_ = tree.plot_tree(regr,
                   feature_names=boston.feature_names,
                   filled=True)

Decision Tree Regressor visualized with plot_tree

Please notice that the color of each leaf corresponds to its predicted value.
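The color follows tree_.value. For a regression tree with the default squared-error criterion, this array holds the mean target of the training samples in each node. The sketch below checks this on the diabetes dataset, again as an assumed stand-in for the Boston data.

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeRegressor

X, y = datasets.load_diabetes(return_X_y=True)
regr = DecisionTreeRegressor(max_depth=3, random_state=1234).fit(X, y)

# Leaf index reached by each training sample
leaf_ids = regr.apply(X)

for leaf in np.unique(leaf_ids):
    leaf_mean = y[leaf_ids == leaf].mean()
    stored = regr.tree_.value[leaf, 0, 0]
    print(f"leaf {leaf}: mean target {leaf_mean:.2f}, stored value {stored:.2f}")
```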

dot_data = tree.export_graphviz(regr, out_file=None,
                                feature_names=boston.feature_names,
                                filled=True)
graphviz.Source(dot_data, format="png")

from dtreeviz.trees import dtreeviz # remember to load the package

viz = dtreeviz(regr, X, y,
               target_name="target",
               feature_names=boston.feature_names)

viz

Of the above methods, my favourite is visualizing with the dtreeviz package. I like it because:

  • it shows the distribution of the decision feature in each node (nice!)
  • it shows a legend matching classes to colors
  • it shows the distribution of the classes in each leaf for classification tasks, and the mean of the leaf’s response for regression tasks

It would be great to have the dtreeviz visualization in an interactive mode, so the user could dynamically change the depth of the tree. I’m using the dtreeviz package in my Automated Machine Learning (AutoML) Python package mljar-supervised. You can check the details of the implementation in the GitHub repository. One important thing is that in my AutoML package I don’t use decision trees with max_depth greater than 4. I added this limit to avoid overly large trees, which in my opinion lose the ability to show clearly what’s going on in the model. Below is an example of the markdown report for a Decision Tree generated by mljar-supervised.
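Short of an interactive widget, both export_text and plot_tree accept a max_depth argument that truncates only the rendering. The sketch below uses this to print the same fitted tree at several display depths without refitting. In the text output, export_text reports collapsed subtrees as truncated branches. The depths chosen are illustrative.

```python
from sklearn import datasets, tree
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
clf = DecisionTreeClassifier(random_state=1234).fit(iris.data, iris.target)

# One fitted model, several display depths
for display_depth in (1, 2, 4):
    print(f"--- display max_depth={display_depth} ---")
    print(tree.export_text(clf, feature_names=iris.feature_names,
                           max_depth=display_depth))
```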

Visualize Trees in Python

  1. Visualize Trees in Python
  2. Install the Graphviz Package
  3. Convert a Tree to a Dot File
  4. Convert a Dot File to an Image

For many different reasons, decision trees are a common supervised learning technique. The advantages of decision trees include that we can use them for both classification and regression, that they don’t require feature scaling, and that decision trees are straightforward to read.

This article demonstrates how to use Graphviz with Python to display decision trees.

Visualize Trees in Python

Graphviz, or graph visualization, is open-source software that represents structural information as diagrams of abstract graphs and networks. For example, one use of Graphviz in data science is visualizing decision trees.

Install the Graphviz Package

Graphviz must be installed to convert the dot file into an image file (PNG, JPG, SVG, etc.), and the installation steps depend on your operating system and several other factors.

If you receive the error dot: command not found, you have probably not installed the Graphviz software that provides the dot command. On Debian or Ubuntu, you can install it with:

sudo apt install graphviz 

For more installation procedures like a manual download of executable files and other versions, you may go to the official Graphviz documentation.

Once Graphviz is installed, verify that the dot.exe file is inside its installation directory, as we will need it later to generate a visualization from dot files. On Windows, the default installation path is usually C:\Program Files\Graphviz\bin.
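To check from Python whether the dot executable is reachable, here is a small sketch using the standard library's shutil.which. The Windows fallback path is the default location mentioned above and may differ on your machine. The helper name find_dot is hypothetical.

```python
import shutil
from pathlib import Path

# Default Windows location mentioned above (may differ on your machine)
WINDOWS_DEFAULT = Path(r"C:\Program Files\Graphviz\bin\dot.exe")

def find_dot():
    """Hypothetical helper: return the path to dot, or None if not found."""
    # shutil.which searches the directories listed in the PATH variable
    on_path = shutil.which("dot")
    if on_path is not None:
        return on_path
    if WINDOWS_DEFAULT.is_file():
        return str(WINDOWS_DEFAULT)
    return None

dot_path = find_dot()
print(dot_path or "dot not found: install Graphviz or add its bin folder to PATH")
```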

Convert a Tree to a Dot File

Graphviz needs the tree in dot format, so first we need a sample tree. We can refer to this article on how trees work in Python, or we can use the command below to follow along with this article.

tree = 'digraph G {Hello->World}'
with open('sample.dot', 'w') as fout:
    fout.write(tree)

The snippet of code above writes a simple two-node Hello->World tree structure to a sample.dot file, which is enough for our purposes. However, if we want a more complex tree, we can use this online tree maker, which also lets us download the tree directly in dot format.
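Instead of typing DOT by hand, you can also generate a larger tree from a Python data structure. The minimal sketch below assumes the tree is stored as a nested dict mapping each node name to its children. This is a hypothetical format chosen for illustration, and node names must be unique because DOT merges nodes with the same name. The sketch writes the same kind of digraph as the Hello->World example, to a hypothetical sample_tree.dot file.

```python
def tree_to_dot(tree, graph_name="G"):
    """Convert a nested dict {parent: {child: {...}}} into DOT source text."""
    lines = [f"digraph {graph_name} {{"]

    def add_edges(parent, children):
        for child, grandchildren in children.items():
            # Quote names so spaces and punctuation are allowed
            lines.append(f'    "{parent}" -> "{child}";')
            add_edges(child, grandchildren)

    for root, children in tree.items():
        lines.append(f'    "{root}";')  # declare roots even without children
        add_edges(root, children)
    lines.append("}")
    return "\n".join(lines)

# Hypothetical example tree: Hello has two children, World has one
sample_tree = {"Hello": {"World": {"Python": {}}, "Graphviz": {}}}
dot_source = tree_to_dot(sample_tree)

with open("sample_tree.dot", "w") as fout:
    fout.write(dot_source)
print(dot_source)
```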

Convert a Dot File to an Image

Now that we have our dot file, we will need to run it inside Graphviz using the dot application. Earlier, we asked you to take note of our Graphviz dot.exe path as we will need it in this section.

Go to the dot.exe path using the command cd inside the command prompt.

cd C:\Program Files\Graphviz\bin 

We are doing this because, most of the time, the Graphviz installation doesn’t add dot to the PATH environment variable, so we run dot.exe directly from its folder with the following command.

# Syntax: dot.exe -Tpng <input .dot file> -o <output .png file>
dot.exe -Tpng C:\PS\sample.dot -o C:\PS\sample.png

Once executed, this should produce a visual representation of our tree in PNG format.

Sample Tree From Sample.dot file

For the complete information on dot parameters, we may go to the dot command official documentation.
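The same conversion can also be scripted from Python. The minimal sketch below assumes dot is on the PATH; otherwise, pass the full path to dot.exe instead. It uses only the -T and -o options shown above and writes the two-node sample.dot first so that it is self-contained. The helper name render_dot is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def render_dot(dot_file, image_file, fmt="png"):
    """Hypothetical helper: render a DOT file with the dot CLI, or return None."""
    dot_exe = shutil.which("dot")  # assumes dot is on PATH
    if dot_exe is None:
        print("dot not found; install Graphviz first")
        return None
    # Equivalent to: dot -Tpng sample.dot -o sample.png
    subprocess.run([dot_exe, f"-T{fmt}", str(dot_file), "-o", str(image_file)],
                   check=True)
    return Path(image_file)

# Write the two-node example first so the sketch is self-contained
Path("sample.dot").write_text("digraph G {Hello->World}")
result = render_dot("sample.dot", "sample.png")
print(result)
```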

Marion specializes in anything Microsoft-related and always tries to work and apply code in an IT infrastructure.
