
揺動経路の記録

   


Running an exe from Python

A sample that runs the Factorization Machines binary (libFM) from Python.

-----------------------------------------------------------------------------------
# coding: utf-8
import subprocess

import pandas as pd


def write_libfm_file(path, frame, offset):
    """Write one libFM line per row: '<rating> <user_id>:1 <mov_id + offset>:1'."""
    with open(path, 'w') as fp:
        for row in frame.itertuples(index=False):
            fp.write('{0} {1}:1 {2}:1\n'.format(row.rating, row.user_id, row.mov_id + offset))


def create_data():
    """
    MovieLens 100k sample data
      https://grouplens.org/datasets/movielens/100k/
    """
    data = pd.read_csv('u.data', names=['user_id', 'mov_id', 'rating', 'time'], sep='\t')

    # First half for training, second half for testing.
    # Use integer division: len(data)/2 is a float on Python 3 and breaks iloc.
    half = len(data) // 2
    traindata = data.iloc[:half, :]
    testdata = data.iloc[half:, :]

    # Shift movie ids past the largest user id so the two id ranges do not overlap.
    # The maximum is taken over all rows so that user ids seen only in the test
    # half cannot collide with a shifted movie id.
    idmax = data.user_id.max()

    write_libfm_file('train.txt', traindata, idmax)
    write_libfm_file('test.txt', testdata, idmax)


def execute():
    # -task r: regression; predictions for test.txt are written to result.txt
    exec_cmd = r'.\libfm\libfm.exe -task r -train train.txt -test test.txt -out result.txt'
    returncode = subprocess.call(exec_cmd, shell=True)
    print(returncode)


if __name__ == '__main__':
    # create_data()
    execute()
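
The call above goes through the shell with a single command string. As a hedged alternative, the sketch below passes the command as an argument list to `subprocess.run` and captures the output along with the return code. The helper name `run_command` is hypothetical. The libFM invocation is only shown in a comment because it assumes the binary sits at the same relative path as in the sample. A stand-in command (the Python interpreter itself) is used so the sketch runs anywhere.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as an argument list and return (returncode, stdout)."""
    # Passing a list avoids shell quoting issues, so shell=True is not needed.
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout


# Hypothetical libFM invocation, mirroring the flags used in the sample:
# run_command([r'.\libfm\libfm.exe', '-task', 'r', '-train', 'train.txt',
#              '-test', 'test.txt', '-out', 'result.txt'])

# Stand-in command: ask the Python interpreter to print one line.
code, out = run_command([sys.executable, '-c', 'print("done")'])
print(code, out.strip())
```

A non-zero return code signals that the external program failed, so checking it before reading the output file is a reasonable habit.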

SQL notes

SQL basics

https://www.shift-the-oracle.com/sql/aggregate-functions/count.html

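Differences between SQL and HiveQL

With distributed processing on MapReduce in mind, it is apparently better to rewrite the first query below as the second. The usual explanation is that count(DISTINCT ...) is funneled through a single reducer, while the subquery form lets the de-duplication be spread across many reducers before the final count.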


① SELECT count(DISTINCT user_id) FROM access_log

② SELECT count(*)
FROM (
  SELECT DISTINCT user_id
  FROM access_log
) t
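
To check that the two query forms return the same count, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is not Hive, so this says nothing about MapReduce performance; it only compares results. The `path` column and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with a small, made-up access_log table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE access_log (user_id INTEGER, path TEXT)')
conn.executemany(
    'INSERT INTO access_log VALUES (?, ?)',
    [(1, '/a'), (1, '/b'), (2, '/a'), (3, '/c'), (3, '/a')],
)

# Form 1: count(DISTINCT ...) directly.
direct = conn.execute(
    'SELECT count(DISTINCT user_id) FROM access_log'
).fetchone()[0]

# Form 2: de-duplicate in a subquery, then count the rows.
nested = conn.execute(
    'SELECT count(*) FROM (SELECT DISTINCT user_id FROM access_log) t'
).fetchone()[0]

print(direct, nested)
```

One caveat: the two forms agree only when user_id has no NULL values. count(DISTINCT user_id) ignores NULL, while the subquery form counts NULL as one extra row.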







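Random forest + feature selection

A sample that performs feature selection based on random forest feature importances.
The data are randomly shuffled 100 times; each time a fresh training/test split is made and evaluated.

Accuracy barely changes with feature selection. That may be partly because the iris data set is used (it has only four features).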

----------------------------------------------------------------------------------------------

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel

import matplotlib.pyplot as plt
import random
import numpy as np

iris = load_iris()
ids = list(range(150))  # a list, so random.shuffle can reorder it in place

acc_list = []
for loop in range(100):
    # Random split: 100 samples for training, the remaining 50 for testing.
    random.shuffle(ids)
    train_ids = ids[:100]
    test_ids = ids[100:]
    Xtrain, ytrain = iris.data[train_ids], iris.target[train_ids]
    Xtest, ytest = iris.data[test_ids], iris.target[test_ids]

    # Baseline: random forest trained on all features.
    clf = RandomForestClassifier()
    clf = clf.fit(Xtrain, ytrain)
    #print("Feature importances(base) = {0}".format(clf.feature_importances_))

    acc_test = np.mean(clf.predict(Xtest) == ytest)

    # Keep only the features whose importance is at least the median importance.
    model = SelectFromModel(clf, threshold='median', prefit=True)
    Xtrain_new = model.transform(Xtrain)
    Xtest_new = model.transform(Xtest)

    # Retrain on the selected features only.
    clf = RandomForestClassifier()
    clf = clf.fit(Xtrain_new, ytrain)
    acc_test_new = np.mean(clf.predict(Xtest_new) == ytest)

    print("Feature importances(SelectFromModel) = {0}".format(clf.feature_importances_))
    acc_list.append([acc_test, acc_test_new])

acc = np.array(acc_list)
print(acc[:, 0].mean())  # mean accuracy with all features
print(acc[:, 1].mean())  # mean accuracy after feature selection

plt.plot(acc[:, 0], '-bo')  # blue: all features
plt.plot(acc[:, 1], '-ro')  # red: selected features
plt.grid()
plt.show()

----------------------------------------------------------------------------------------------
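
To make the 'median' threshold more concrete, here is a plain-Python sketch of the selection rule: keep every feature whose importance is at least the median importance. It is an illustration of the rule only, not scikit-learn's implementation. The helper name `select_by_median` is hypothetical, and the importance values are made up rather than measured.

```python
from statistics import median


def select_by_median(importances):
    """Return indices of features whose importance is >= the median importance."""
    threshold = median(importances)
    return [i for i, value in enumerate(importances) if value >= threshold]


# Made-up importances for four features (not measured values).
importances = [0.10, 0.02, 0.45, 0.43]
kept = select_by_median(importances)
print(kept)
```

With four features and no ties, this rule keeps the two most important ones, so the second forest in the sample is trained on roughly half of the original columns.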

Profile

HN:
stochaotic
Gender:
Not disclosed

Copyright ©  -- 揺動経路の記録 --  All Rights Reserved