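A sample that performs feature selection based on random forest feature importances.
The data is randomly split into training and test sets 100 times, and accuracy is evaluated on each split.
Accuracy changes little after selection. This may be partly because the iris data set has only four features.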
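To make the selection step concrete, here is a minimal sketch of what thresholding at the median importance does on iris. The `n_estimators=100` and `random_state=0` settings are assumptions added for reproducibility, and fitting on the full data set is for illustration only; neither is part of the sample itself.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()

# Fit a forest on the full data just to obtain importances (illustration only).
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# threshold="median" keeps features whose importance is >= the median importance.
selector = SelectFromModel(clf, threshold="median", prefit=True)
mask = selector.get_support()

for name, importance, kept in zip(iris.feature_names, clf.feature_importances_, mask):
    print("{0:20s} importance={1:.3f} kept={2}".format(name, importance, kept))

X_selected = selector.transform(iris.data)
print(X_selected.shape)
```

With four features, the median lies between the second and third largest importances, so barring ties this keeps two features.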
----------------------------------------------------------------------------------------------
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
import matplotlib.pyplot as plt
import random
import numpy as np

iris = load_iris()
ids = list(range(150))  # a list, so that random.shuffle can reorder it in place
acc_list = []
for loop in range(100):
    # Random split: 100 training samples, 50 test samples.
    random.shuffle(ids)
    train_ids = ids[:100]
    test_ids = ids[100:]
    Xtrain, ytrain = iris.data[train_ids], iris.target[train_ids]
    Xtest, ytest = iris.data[test_ids], iris.target[test_ids]

    # Baseline: random forest on all features.
    clf = RandomForestClassifier()
    clf = clf.fit(Xtrain, ytrain)
    # print("Feature importances (base) = {0}".format(clf.feature_importances_))
    acc_test = np.mean(clf.predict(Xtest) == ytest)

    # Keep only the features whose importance is at least the median.
    model = SelectFromModel(clf, threshold='median', prefit=True)
    Xtrain_new = model.transform(Xtrain)
    Xtest_new = model.transform(Xtest)

    # Retrain on the selected features and evaluate again.
    clf = RandomForestClassifier()
    clf = clf.fit(Xtrain_new, ytrain)
    acc_test_new = np.mean(clf.predict(Xtest_new) == ytest)
    print("Feature importances (SelectFromModel) = {0}".format(clf.feature_importances_))
    acc_list.append([acc_test, acc_test_new])

# Mean accuracy without and with feature selection.
acc = np.array(acc_list)
print(np.mean(acc[:, 0]))
print(np.mean(acc[:, 1]))

# Blue: all features; red: selected features.
plt.plot(acc[:, 0], '-bo')
plt.plot(acc[:, 1], '-ro')
plt.grid()
plt.show()
----------------------------------------------------------------------------------------------
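One way to put a number on "little change" is to look at the paired difference in accuracy per split rather than only the two means. The following is a rough sketch under several assumptions: the helper name `compare_accuracy` is hypothetical, the splits are stratified by class with `StratifiedShuffleSplit` (the sample above shuffles without stratification), and `n_estimators=10` with a fixed seed is chosen only to keep the run fast and repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedShuffleSplit


def compare_accuracy(n_splits=100, seed=0):
    """Return an (n_splits, 2) array of [baseline, selected] test accuracies."""
    iris = load_iris()
    X, y = iris.data, iris.target
    # 100 training and 50 test samples per split, stratified by class.
    splitter = StratifiedShuffleSplit(
        n_splits=n_splits, train_size=100, test_size=50, random_state=seed
    )
    results = []
    for train_idx, test_idx in splitter.split(X, y):
        # Baseline forest on all four features.
        base = RandomForestClassifier(n_estimators=10, random_state=seed)
        base.fit(X[train_idx], y[train_idx])
        acc_base = base.score(X[test_idx], y[test_idx])

        # Select features from the fitted baseline, then retrain on them.
        selector = SelectFromModel(base, threshold="median", prefit=True)
        reduced = RandomForestClassifier(n_estimators=10, random_state=seed)
        reduced.fit(selector.transform(X[train_idx]), y[train_idx])
        acc_reduced = reduced.score(selector.transform(X[test_idx]), y[test_idx])

        results.append([acc_base, acc_reduced])
    return np.array(results)


accs = compare_accuracy()
diff = accs[:, 1] - accs[:, 0]  # selected minus baseline, per split
print("mean baseline accuracy: {0:.3f}".format(accs[:, 0].mean()))
print("mean selected accuracy: {0:.3f}".format(accs[:, 1].mean()))
print("mean paired difference: {0:.3f} (std {1:.3f})".format(diff.mean(), diff.std(ddof=1)))
```

A mean difference that is small relative to its standard deviation would be consistent with the observation that selection changes little on this data set.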