
Ministry of Education of the Russian Federation
MOSCOW STATE INSTITUTE

OF ELECTRONICS AND MATHEMATICS (TECHNICAL UNIVERSITY)



LABORATORY WORK REPORT

Methods and Tools of Data Analysis

on the topic:


"The WEKA Data Analysis System"

Variant datamining400-57


Supervisor ______________ I. Ignatyev

signature, date


Author ______________ P. Stepuro

signature, date Group S-75


1. INTRODUCTION

2. MAIN PART

TASK 1. DATA PREPARATION

TASK 2. CLASSIFICATION

NaiveBayes

ID3

J4.8

SVM(SMO)

TASK 3.

3. CONCLUSION

4. DATA SET

INTRODUCTION
The goal of this laboratory work is data analysis. The analysis is carried out in the Weka data analysis system through its Explorer graphical interface. The assigned classification methods had to be applied to the source data set and their results compared.

Weka is written in Java and consists of a set of libraries of data-processing functions plus several graphical interfaces to those libraries. The main interface is Explorer, which exposes practically every operation the system provides. All work in this report is done in it.




MAIN PART

TASK 1: PREPARE THE SOURCE FILE IN *.arff FORMAT.
First, the table containing the data must be converted to CSV format and then modified.

The modification consists of adding metadata at the top of the file, each item on its own line: the relation name (@relation name), the attribute descriptions (@attribute name type), and @data immediately before the data itself. The data types are: numeric (numeric, real, integer), nominal (given as an enumeration of the form {i1, ..., in}), string, and date (date [date format]).

For example, the attribute capital-gain has type numeric, since it holds numeric data describing earnings. Attributes should be described as precisely as possible.

With the source file modified in this way and all attributes listed, the file can be saved in *.arff format.

@RELATION test
@ATTRIBUTE age numeric

@ATTRIBUTE workclass {Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked}

@ATTRIBUTE fnlwgt numeric

@ATTRIBUTE education {Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool}

@ATTRIBUTE education-num numeric

@ATTRIBUTE marital-status {Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse}

@ATTRIBUTE occupation {Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces}

@ATTRIBUTE relationship {Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried}

@ATTRIBUTE race {White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black}

@ATTRIBUTE sex {Female, Male}

@ATTRIBUTE capital-gain numeric

@ATTRIBUTE capital-loss numeric

@ATTRIBUTE hours-per-week numeric

@ATTRIBUTE native-country {United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands}

@ATTRIBUTE income {>50K,<=50K}
@DATA
44, Private, 376072, HS-grad, 9, Married-civ-spouse, Tech-support, Husband, White, Male, 0, 0, 45, United-States, >50K

34, Local-gov, 177675, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 55, United-States, >50K
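The conversion described in Task 1 can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the report: the function name csv_to_arff and its parameters are hypothetical, the nominal value lists are collected from the data (the report takes them from the data set description instead), and the third sample row is invented to show how a missing value marked "?" is handled.

```python
import csv
import io


def csv_to_arff(csv_text, relation, numeric_columns):
    """Convert CSV text with a header row into ARFF text.

    Columns named in numeric_columns are declared numeric; every other
    column is declared nominal with the values observed in the data.
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = [f"@RELATION {relation}", ""]
    for index, name in enumerate(header):
        if name in numeric_columns:
            lines.append(f"@ATTRIBUTE {name} numeric")
        else:
            # Distinct values in order of first appearance; "?" marks a
            # missing value in ARFF and is not a category of its own.
            values = []
            for row in data:
                if row[index] != "?" and row[index] not in values:
                    values.append(row[index])
            lines.append(f"@ATTRIBUTE {name} {{{', '.join(values)}}}")
    lines += ["", "@DATA"]
    lines += [", ".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Three columns only; the last row is an invented example with a gap.
sample = """age,workclass,income
44,Private,>50K
34,Local-gov,>50K
25,?,<=50K
"""
arff = csv_to_arff(sample, "test", numeric_columns={"age"})
print(arff)
```

Deriving the value lists from the data keeps the sketch short, but a value that never occurs in the file (for example Never-worked) would then be missing from the declaration, which is why listing the values by hand, as in the header above, is the safer choice.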


TASK 2: CLASSIFY THE SOURCE DATA WITH THE BAYESIAN METHOD, J4.8, ID3, 1R, AND SVM.

USE FILTERS AS NEEDED.

Using the Visualize All button, display the dependence of the class variable on all attributes in graphical form.



Filters are used for automatic preprocessing of the data. They fall into two groups: supervised filters, which take the class attribute into account (so they already rely on knowledge about the target, and applying them carelessly can bias the evaluation), and unsupervised filters, which use no class information and can be applied to raw, unprocessed data.



Useful filters include:

  • RemoveType, Remove: remove specified attributes, including by type; this is useful here because not every algorithm accepts every attribute type;

  • Discretize: convert a numeric attribute into a nominal one;

  • RemoveUseless: remove attributes that do not vary at all or whose values vary so much that they approach a key;

  • ReplaceMissingValues: replace missing values with the attribute mean (for nominal attributes, with the most frequent value);

  • the various *ToBinary filters: convert the nominal and numeric values of an attribute into a group of binary attributes of the form attribute=value TRUE|FALSE.
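What three of these filters do can be illustrated with a small Python sketch. It is not Weka code: the function names are hypothetical, the sample values are invented, and equal-width binning is only one of several possible discretization strategies.

```python
from collections import Counter
from statistics import mean

MISSING = None  # stand-in for the "?" marker of an ARFF file


def replace_missing(column):
    """Fill gaps with the mean (numeric column) or the mode (nominal column)."""
    present = [v for v in column if v is not MISSING]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is MISSING else v for v in column]


def discretize(column, bins):
    """Equal-width binning: map each number to a bin index 0..bins-1."""
    low, high = min(column), max(column)
    width = (high - low) / bins or 1.0  # avoid a zero width for constant columns
    return [min(int((v - low) / width), bins - 1) for v in column]


def nominal_to_binary(column, name):
    """One TRUE/FALSE indicator column per distinct value."""
    return {f"{name}={value}": [v == value for v in column]
            for value in sorted(set(column))}


hours = replace_missing([45, 55, MISSING, 40])
sex = replace_missing(["Male", MISSING, "Male", "Female"])
print(hours)
print(discretize([20, 40, 60, 80], bins=3))
print(nominal_to_binary(sex, "sex"))
```

The gap in the numeric column is filled with the mean of the three known values, and the gap in the nominal column with the most frequent value, which mirrors the behaviour described for ReplaceMissingValues.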

The classification method is selected with the Choose button. Many methods are available, but the most important are linear regression (in the functions section), naive Bayes classification (bayes), decision-tree construction (trees), and rule construction (rules). Once a method is chosen, its default parameter values can be changed.

Next, the test method and the dependent (class) variable must be chosen. The main method is cross-validation. The results can also be evaluated on the training set, on a separate supplied test set, and on a held-out part of the training set (Percentage Split).

Then the Start button is pressed. When the analysis finishes, the Output window is filled in and a new entry is added to the Result list.

In our case the test method is cross-validation. The source data set is split into a training set and a validation set; the classifier is built on the training set and checked on the validation set. With k folds this is repeated k times, so that every instance is used for validation exactly once, and the error is computed over all the validation results.
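The cross-validation procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the folds are formed by a fixed interleaving rather than by the randomized, stratified split that Weka reports, and the learner is a toy one that always predicts the most frequent training label.

```python
def k_fold_indices(n_instances, k):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    folds = [list(range(i, n_instances, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_instances) if i not in held_out]
        yield train, test


def cross_validate(instances, labels, k, fit, predict):
    """Return the fraction of misclassified instances over all k folds."""
    errors = 0
    for train, test in k_fold_indices(len(instances), k):
        model = fit([instances[i] for i in train], [labels[i] for i in train])
        errors += sum(predict(model, instances[i]) != labels[i] for i in test)
    return errors / len(instances)


# Toy learner: always predict the most frequent training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


labels = ["<=50K"] * 6 + [">50K"] * 2
error = cross_validate(list(range(8)), labels, k=4,
                       fit=fit_majority, predict=predict_majority)
print(error)
```

Because the model is always rebuilt without the instances it is then tested on, the resulting error is a fairer estimate than one measured on the training data itself.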

CLASSIFICATION WITH THE NAIVE BAYES METHOD.


The Naive Bayes method assumes that all the attributes considered are independent of one another given the class. The idea of the algorithm is to form rules whose conditions compare each independent variable with each of its possible values.

A real advantage of the method is that missing values cause no problem: when the probabilities are computed they are simply skipped for all rules and do not affect the ratio of the probabilities. Filters are therefore unnecessary here.
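The way missing values are skipped can be seen in a minimal Naive Bayes sketch for nominal attributes. It is an illustration only, not the Weka implementation: the function names and the six training rows are invented, numeric attributes (which the Weka output models with normal distributions) are left out, and every count is assumed to start at 1 (Laplace correction).

```python
from collections import Counter

MISSING = "?"


def train_naive_bayes(rows, labels, domains):
    """Count class frequencies and per-class attribute-value frequencies.

    domains[j] lists the possible values of attribute j. Every count starts
    at 1 (Laplace correction) so that no probability is ever zero.
    """
    class_counts = Counter(labels)
    value_counts = {c: [dict.fromkeys(domain, 1) for domain in domains]
                    for c in class_counts}
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value != MISSING:          # missing values are not counted
                value_counts[label][j][value] += 1
    return class_counts, value_counts


def classify(model, row):
    """Return the class with the largest prior times product of likelihoods."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n in class_counts.items():
        score = n / total
        for j, value in enumerate(row):
            if value == MISSING:          # skipped for every class alike
                continue
            counts = value_counts[c][j]
            score *= counts[value] / sum(counts.values())
        scores[c] = score
    return max(scores, key=scores.get)


# Invented training data with the attributes (sex, relationship).
rows = [("Male", "Husband"), ("Male", "Husband"), ("Female", "Wife"),
        ("Female", "Own-child"), ("Male", "Own-child"), ("Female", "Unmarried")]
labels = [">50K", ">50K", ">50K", "<=50K", "<=50K", "<=50K"]
domains = [["Female", "Male"], ["Wife", "Own-child", "Husband", "Unmarried"]]

model = train_naive_bayes(rows, labels, domains)
print(classify(model, ("Male", "Husband")))
print(classify(model, (MISSING, "Own-child")))
```

In the second call the missing sex value contributes no factor to either class score, so the decision rests on the prior and on relationship alone.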



Analysis (for the population earning more than and at most 50,000):

=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes

Relation: test

Instances: 400 (total number of instances)

Attributes: 15 (number of attributes)

age

workclass

fnlwgt

education

education-num

marital-status

occupation

relationship

race

sex

capital-gain

capital-loss

hours-per-week

native-country

income

Test mode: 10-fold cross-validation (cross-validation used to estimate the error of the algorithm)
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class >50K: Prior probability = 0.24 (24% of people have an annual income above 50K)
age: Normal Distribution. Mean = 43.6962 StandardDev = 9.9212 WeightSum = 96 Precision = 1.0892857142857142

(their mean age is 43.7 ± 9.9 years)

workclass: Discrete Estimator. Counts = 64 15 7 4 7 4 1 1 (Total = 103)

(workclass counts in the order the values are declared, each including the initial count of 1 that the estimator adds to every value: 64 Private, 15 Self-emp-not-inc, 7 Self-emp-inc, 4 Federal-gov, 7 Local-gov, 4 State-gov, 1 Without-pay, 1 Never-worked)



fnlwgt: Normal Distribution. Mean = 191285.6891 StandardDev = 95275.8814 WeightSum = 96 Precision = 1835.791878172589

(their mean fnlwgt, the census sampling weight, is 191285 ± 95275)



education: Discrete Estimator. Counts = 23 19 1 24 5 5 6 1 1 1 13 1 2 8 1 1 (Total = 112)

(education counts in the order the values are declared, each including the estimator's initial count of 1: 23 Bachelors, 19 Some-college, 1 11th, 24 HS-grad, 5 Prof-school, 5 Assoc-acdm, 6 Assoc-voc, 1 9th, 1 7th-8th, 1 12th, 13 Masters, 1 1st-4th, 2 10th, 8 Doctorate, 1 5th-6th, 1 Preschool)



education-num: Normal Distribution. Mean = 11.6875 StandardDev = 2.3466 WeightSum = 96 Precision = 1.0

(their number of years of education is 11.7 ± 2.3)



marital-status: Discrete Estimator. Counts = 79 8 10 1 2 2 1 (Total = 103)

occupation: Discrete Estimator. Counts = 7 16 2 10 19 22 2 6 10 2 6 1 5 1 (Total = 109)

relationship: Discrete Estimator. Counts = 10 3 69 16 1 3 (Total = 102)

race: Discrete Estimator. Counts = 92 3 2 2 2 (Total = 101)

sex: Discrete Estimator. Counts = 16 82 (Total = 98)

capital-gain: Normal Distribution. Mean = 1846.0461 StandardDev = 4904.5334 WeightSum = 96 Precision = 1464.6315789473683

capital-loss: Normal Distribution. Mean = 158.85 StandardDev = 573.2855 WeightSum = 96 Precision = 282.4

hours-per-week: Normal Distribution. Mean = 45.0523 StandardDev = 10.7546 WeightSum = 96 Precision = 1.9534883720930232

native-country: Discrete Estimator. Counts = 88 1 1 1 3 2 1 1 2 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 (Total = 135)
(the same analysis follows for the population earning at most 50K per year)
Class <=50K: Prior probability = 0.76
age: Normal Distribution. Mean = 36.5269 StandardDev = 13.8907 WeightSum = 304 Precision = 1.0892857142857142

workclass: Discrete Estimator. Counts = 236 22 8 4 13 10 1 1 (Total = 295)

fnlwgt: Normal Distribution. Mean = 197963.5834 StandardDev = 103982.0527 WeightSum = 304 Precision = 1835.791878172589

education: Discrete Estimator. Counts = 35 87 12 97 1 7 12 11 8 5 12 8 17 1 6 1 (Total = 320)

education-num: Normal Distribution. Mean = 9.2928 StandardDev = 2.5822 WeightSum = 304 Precision = 1.0

marital-status: Discrete Estimator. Counts = 110 50 122 15 7 6 1 (Total = 311)

occupation: Discrete Estimator. Counts = 11 48 44 30 28 22 11 30 35 10 22 4 5 1 (Total = 301)

relationship: Discrete Estimator. Counts = 15 72 93 80 16 34 (Total = 310)

race: Discrete Estimator. Counts = 260 9 4 3 33 (Total = 309)

sex: Discrete Estimator. Counts = 113 193 (Total = 306)

capital-gain: Normal Distribution. Mean = 120.4467 StandardDev = 633.8929 WeightSum = 304 Precision = 1464.6315789473683

capital-loss: Normal Distribution. Mean = 37.1579 StandardDev = 247.093 WeightSum = 304 Precision = 282.4

hours-per-week: Normal Distribution. Mean = 37.9067 StandardDev = 11.7867 WeightSum = 304 Precision = 1.9534883720930232

native-country: Discrete Estimator. Counts = 270 1 2 4 1 3 1 2 1 3 1 2 2 1 1 3 1 2 1 1 8 2 2 1 2 2 1 1 2 1 1 3 1 1 1 1 2 1 1 1 1 (Total = 339)
Time taken to build model: 0.03 seconds (time spent building the model)
=== Stratified cross-validation ===

=== Summary ===
Correctly Classified Instances 329 82.25 % (instances classified correctly)

Incorrectly Classified Instances 71 17.75 % (instances classified incorrectly)

(the percentage of correctly classified instances is the accuracy of the algorithm)

Kappa statistic 0.4616

Mean absolute error 0.1923 (error measures)

Root mean squared error 0.3929

Relative absolute error 52.5958 % (relative)

Root relative squared error 91.9981 % (root squared)

Total Number of Instances 400 (total number of instances)
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.49      0.072     0.681       0.49     0.57        >50K
0.928     0.51      0.852       0.928    0.888       <=50K
=== Confusion Matrix ===
  a   b   <-- classified as
 47  49 |   a = >50K
 22 282 |   b = <=50K
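The summary figures reported above follow directly from the confusion matrix. The Python sketch below recomputes them from the four counts using the standard definitions of accuracy, TP rate, FP rate, precision, F-measure and Cohen's kappa; the function name summarize is hypothetical.

```python
def summarize(matrix):
    """Derive accuracy, per-class rates and kappa from a 2x2 confusion matrix.

    matrix[i][j] is the number of instances of actual class i that were
    classified as class j.
    """
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(2)) / total
    per_class = []
    for c in range(2):
        other = 1 - c
        tp, fn = matrix[c][c], matrix[c][other]
        fp, tn = matrix[other][c], matrix[other][other]
        recall = tp / (tp + fn)            # TP rate
        fp_rate = fp / (fp + tn)
        precision = tp / (tp + fp)
        f_measure = 2 * precision * recall / (precision + recall)
        per_class.append((recall, fp_rate, precision, f_measure))
    # Agreement expected by chance, from the row and column totals.
    expected = sum(sum(matrix[c]) * (matrix[0][c] + matrix[1][c])
                   for c in range(2)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, per_class, kappa


# Rows: actual >50K, actual <=50K (counts taken from the matrix above).
accuracy, per_class, kappa = summarize([[47, 49], [22, 282]])
print(f"accuracy = {accuracy:.4f}, kappa = {kappa:.4f}")
for name, rates in zip([">50K", "<=50K"], per_class):
    print(name, " ".join(f"{r:.3f}" for r in rates))
```

The matrix also shows where the errors lie: 49 of the 96 instances of class >50K are assigned to <=50K, which is why the TP rate for >50K is only about 0.49 although the overall accuracy is 82.25 %.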


CLASSIFICATION WITH THE ID3 METHOD.


This method builds a decision tree. To classify the data, the set of objects from the training sample is split into subsets whose objects belong to the same class. To build the tree, the independent variable on which each internal node is split must be chosen correctly. ID3 chooses the variable for which, after the split, one of the classes is as probable as possible in each subset, that is, the attribute with the largest information gain.
The algorithm accepts only nominal attributes in the input set and requires that there be no missing values. We therefore apply the filters RemoveType (to remove attributes of type numeric) and ReplaceMissingValues (to replace missing values with the attribute mean, or with the most frequent value for nominal attributes).
For higher accuracy, the results were evaluated with the training-set option (testing on the training set itself), which tends to give an optimistic estimate.
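The attribute selection step of ID3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the Weka Id3 class: the function names are hypothetical, the four training rows are invented, and only the choice of the split attribute is shown, not the recursive construction of the tree.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one nominal attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """ID3 splits a node on the attribute with the largest information gain."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))


# Invented rows: education separates the classes perfectly, sex not at all.
rows = [
    {"education": "Masters", "sex": "Male"},
    {"education": "Masters", "sex": "Female"},
    {"education": "11th", "sex": "Male"},
    {"education": "11th", "sex": "Female"},
]
labels = [">50K", ">50K", "<=50K", "<=50K"]
print(best_attribute(rows, labels))
```

A branch for which no training instance exists cannot be assigned a class in this scheme, which corresponds to the leaves printed as null in the tree produced by Weka.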
Analysis:

=== Run information ===


Scheme: weka.classifiers.trees.Id3

Relation: test-weka.filters.unsupervised.attribute.Normalize-weka.filters.unsupervised.attribute.RemoveType-Tnumeric-weka.filters.unsupervised.attribute.ReplaceMissingValues

Instances: 400

Attributes: 9

workclass

education

marital-status

occupation

relationship

race


sex

native-country

income

Test mode: 10-fold cross-validation


=== Classifier model (full training set) ===
Id3

education = Bachelors

| relationship = Wife

| | occupation = Tech-support: null

| | occupation = Craft-repair: null

| | occupation = Other-service: <=50K

| | occupation = Sales: null

| | occupation = Exec-managerial: null

| | occupation = Prof-specialty: >50K

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: null

| | occupation = Adm-clerical: null

| | occupation = Farming-fishing: null

| | occupation = Transport-moving: null

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: null

| | occupation = Armed-Forces: null

| relationship = Own-child: <=50K

| relationship = Husband

| | occupation = Tech-support: null

| | occupation = Craft-repair: >50K

| | occupation = Other-service: null

| | occupation = Sales: >50K

| | occupation = Exec-managerial

| | | native-country = United-States

| | | | workclass = Private: >50K

| | | | workclass = Self-emp-not-inc: >50K

| | | | workclass = Self-emp-inc: null

| | | | workclass = Federal-gov: null

| | | | workclass = Local-gov: >50K

| | | | workclass = State-gov: null

| | | | workclass = Without-pay: null

| | | | workclass = Never-worked: null

| | | native-country = Cambodia: null

| | | native-country = England: null

| | | native-country = Puerto-Rico: null

| | | native-country = Canada: null

| | | native-country = Germany: <=50K

| | | native-country = Outlying-US(Guam-USVI-etc): null

| | | native-country = India: null

| | | native-country = Japan: >50K

| | | native-country = Greece: >50K

| | | native-country = South: null

| | | native-country = China: null

| | | native-country = Cuba: null

| | | native-country = Iran: null

| | | native-country = Honduras: null

| | | native-country = Philippines: null

| | | native-country = Italy: null

| | | native-country = Poland: null

| | | native-country = Jamaica: null

| | | native-country = Vietnam: null

| | | native-country = Mexico: null

| | | native-country = Portugal: null

| | | native-country = Ireland: null

| | | native-country = France: null

| | | native-country = Dominican-Republic: null

| | | native-country = Laos: null

| | | native-country = Ecuador: null

| | | native-country = Taiwan: null

| | | native-country = Haiti: null

| | | native-country = Columbia: null

| | | native-country = Hungary: null

| | | native-country = Guatemala: null

| | | native-country = Nicaragua: null

| | | native-country = Scotland: null

| | | native-country = Thailand: null

| | | native-country = Yugoslavia: null

| | | native-country = El-Salvador: null

| | | native-country = Trinadad&Tobago: null

| | | native-country = Peru: null

| | | native-country = Hong: null

| | | native-country = Holand-Netherlands: null

| | occupation = Prof-specialty

| | | workclass = Private

| | | | native-country = United-States: >50K

| | | | native-country = Cambodia: null

| | | | native-country = England: null

| | | | native-country = Puerto-Rico: null

| | | | native-country = Canada: >50K

| | | | native-country = Germany: null

| | | | native-country = Outlying-US(Guam-USVI-etc): null

| | | | native-country = India: null

| | | | native-country = Japan: null

| | | | native-country = Greece: null

| | | | native-country = South: null

| | | | native-country = China: null

| | | | native-country = Cuba: null

| | | | native-country = Iran: null

| | | | native-country = Honduras: null

| | | | native-country = Philippines: null

| | | | native-country = Italy: null

| | | | native-country = Poland: null

| | | | native-country = Jamaica: null

| | | | native-country = Vietnam: null

| | | | native-country = Mexico: null

| | | | native-country = Portugal: null

| | | | native-country = Ireland: null

| | | | native-country = France: null

| | | | native-country = Dominican-Republic: null

| | | | native-country = Laos: null

| | | | native-country = Ecuador: null

| | | | native-country = Taiwan: null

| | | | native-country = Haiti: null

| | | | native-country = Columbia: null

| | | | native-country = Hungary: null

| | | | native-country = Guatemala: null

| | | | native-country = Nicaragua: null

| | | | native-country = Scotland: null

| | | | native-country = Thailand: null

| | | | native-country = Yugoslavia: null

| | | | native-country = El-Salvador: null

| | | | native-country = Trinadad&Tobago: null

| | | | native-country = Peru: null

| | | | native-country = Hong: null

| | | | native-country = Holand-Netherlands: null

| | | workclass = Self-emp-not-inc: >50K

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: null

| | | workclass = State-gov: >50K

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: null

| | occupation = Adm-clerical: >50K

| | occupation = Farming-fishing: null

| | occupation = Transport-moving: <=50K

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: null

| | occupation = Armed-Forces: null

| relationship = Not-in-family

| | occupation = Tech-support: <=50K

| | occupation = Craft-repair: null

| | occupation = Other-service: <=50K

| | occupation = Sales: null

| | occupation = Exec-managerial: <=50K

| | occupation = Prof-specialty

| | | sex = Female: <=50K

| | | sex = Male

| | | | workclass = Private: <=50K

| | | | workclass = Self-emp-not-inc: null

| | | | workclass = Self-emp-inc: null

| | | | workclass = Federal-gov: null

| | | | workclass = Local-gov: null

| | | | workclass = State-gov: <=50K

| | | | workclass = Without-pay: null

| | | | workclass = Never-worked: null

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: <=50K

| | occupation = Adm-clerical: <=50K

| | occupation = Farming-fishing: null

| | occupation = Transport-moving: null

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: null

| | occupation = Armed-Forces: null

| relationship = Other-relative: <=50K

| relationship = Unmarried: <=50K

education = Some-college

| relationship = Wife

| | occupation = Tech-support: null

| | occupation = Craft-repair: <=50K

| | occupation = Other-service: null

| | occupation = Sales: null

| | occupation = Exec-managerial: null

| | occupation = Prof-specialty: >50K

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: >50K

| | occupation = Adm-clerical: <=50K

| | occupation = Farming-fishing: null

| | occupation = Transport-moving: null

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: null

| | occupation = Armed-Forces: null

| relationship = Own-child

| | marital-status = Married-civ-spouse: null

| | marital-status = Divorced

| | | occupation = Tech-support: null

| | | occupation = Craft-repair: <=50K

| | | occupation = Other-service: null

| | | occupation = Sales: null

| | | occupation = Exec-managerial: null

| | | occupation = Prof-specialty: null

| | | occupation = Handlers-cleaners: null

| | | occupation = Machine-op-inspct: null

| | | occupation = Adm-clerical: >50K

| | | occupation = Farming-fishing: null

| | | occupation = Transport-moving: null

| | | occupation = Priv-house-serv: null

| | | occupation = Protective-serv: null

| | | occupation = Armed-Forces: null

| | marital-status = Never-married: <=50K

| | marital-status = Separated: <=50K

| | marital-status = Widowed: null

| | marital-status = Married-spouse-absent: null

| | marital-status = Married-AF-spouse: null

| relationship = Husband

| | occupation = Tech-support

| | | workclass = Private: <=50K

| | | workclass = Self-emp-not-inc: >50K

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: null

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Craft-repair

| | | workclass = Private: <=50K

| | | workclass = Self-emp-not-inc: <=50K

| | | workclass = Self-emp-inc: <=50K

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: <=50K

| | | workclass = State-gov: >50K

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Other-service: <=50K

| | occupation = Sales: <=50K

| | occupation = Exec-managerial

| | | workclass = Private: >50K

| | | workclass = Self-emp-not-inc: <=50K

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: null

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Prof-specialty: null

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: >50K

| | occupation = Adm-clerical: >50K

| | occupation = Farming-fishing

| | | workclass = Private: >50K

| | | workclass = Self-emp-not-inc: null

| | | workclass = Self-emp-inc: <=50K

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: null

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Transport-moving

| | | workclass = Private: <=50K

| | | workclass = Self-emp-not-inc: null

| | | workclass = Self-emp-inc: >50K

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: null

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv

| | | workclass = Private: <=50K

| | | workclass = Self-emp-not-inc: null

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: >50K

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Armed-Forces: null

| relationship = Not-in-family

| | occupation = Tech-support: <=50K

| | occupation = Craft-repair

| | | sex = Female: >50K

| | | sex = Male

| | | | marital-status = Married-civ-spouse: null

| | | | marital-status = Divorced: >50K

| | | | marital-status = Never-married: <=50K

| | | | marital-status = Separated: <=50K

| | | | marital-status = Widowed: null

| | | | marital-status = Married-spouse-absent: null

| | | | marital-status = Married-AF-spouse: null

| | occupation = Other-service: <=50K

| | occupation = Sales

| | | marital-status = Married-civ-spouse: null

| | | marital-status = Divorced: <=50K

| | | marital-status = Never-married

| | | | race = White: >50K

| | | | race = Asian-Pac-Islander: null

| | | | race = Amer-Indian-Eskimo: null

| | | | race = Other: null

| | | | race = Black: <=50K

| | | marital-status = Separated: null

| | | marital-status = Widowed: >50K

| | | marital-status = Married-spouse-absent: null

| | | marital-status = Married-AF-spouse: null

| | occupation = Exec-managerial: null

| | occupation = Prof-specialty: <=50K

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: <=50K

| | occupation = Adm-clerical

| | | marital-status = Married-civ-spouse: null

| | | marital-status = Divorced: null

| | | marital-status = Never-married: <=50K

| | | marital-status = Separated: null

| | | marital-status = Widowed: <=50K

| | | marital-status = Married-spouse-absent: >50K

| | | marital-status = Married-AF-spouse: null

| | occupation = Farming-fishing: <=50K

| | occupation = Transport-moving: <=50K

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: <=50K

| | occupation = Armed-Forces: null

| relationship = Other-relative: <=50K

| relationship = Unmarried: <=50K

education = 11th: <=50K

education = HS-grad

| marital-status = Married-civ-spouse

| | occupation = Tech-support: >50K

| | occupation = Craft-repair

| | | native-country = United-States

| | | | workclass = Private

| | | | | relationship = Wife: <=50K

| | | | | relationship = Own-child: null

| | | | | relationship = Husband: <=50K

| | | | | relationship = Not-in-family: null

| | | | | relationship = Other-relative: null

| | | | | relationship = Unmarried: null

| | | | workclass = Self-emp-not-inc: null

| | | | workclass = Self-emp-inc: <=50K

| | | | workclass = Federal-gov: null

| | | | workclass = Local-gov: null

| | | | workclass = State-gov: null

| | | | workclass = Without-pay: null

| | | | workclass = Never-worked: null

| | | native-country = Cambodia: null

| | | native-country = England: null

| | | native-country = Puerto-Rico: null

| | | native-country = Canada: null

| | | native-country = Germany: null

| | | native-country = Outlying-US(Guam-USVI-etc): null

| | | native-country = India: null

| | | native-country = Japan: null

| | | native-country = Greece: null

| | | native-country = South: null

| | | native-country = China: null

| | | native-country = Cuba: null

| | | native-country = Iran: null

| | | native-country = Honduras: null

| | | native-country = Philippines: null

| | | native-country = Italy: null

| | | native-country = Poland: null

| | | native-country = Jamaica: null

| | | native-country = Vietnam: null

| | | native-country = Mexico: >50K

| | | native-country = Portugal: null

| | | native-country = Ireland: null

| | | native-country = France: null

| | | native-country = Dominican-Republic: null

| | | native-country = Laos: null

| | | native-country = Ecuador: null

| | | native-country = Taiwan: null

| | | native-country = Haiti: null

| | | native-country = Columbia: null

| | | native-country = Hungary: null

| | | native-country = Guatemala: null

| | | native-country = Nicaragua: null

| | | native-country = Scotland: null

| | | native-country = Thailand: null

| | | native-country = Yugoslavia: null

| | | native-country = El-Salvador: null

| | | native-country = Trinadad&Tobago: null

| | | native-country = Peru: null

| | | native-country = Hong: null

| | | native-country = Holand-Netherlands: null

| | occupation = Other-service

| | | relationship = Wife: >50K

| | | relationship = Own-child: null

| | | relationship = Husband: <=50K

| | | relationship = Not-in-family: null

| | | relationship = Other-relative: null

| | | relationship = Unmarried: null

| | occupation = Sales

| | | relationship = Wife: null

| | | relationship = Own-child: >50K

| | | relationship = Husband: <=50K

| | | relationship = Not-in-family: null

| | | relationship = Other-relative: null

| | | relationship = Unmarried: null

| | occupation = Exec-managerial

| | | workclass = Private

| | | | relationship = Wife: <=50K

| | | | relationship = Own-child: null

| | | | relationship = Husband: >50K

| | | | relationship = Not-in-family: null

| | | | relationship = Other-relative: null

| | | | relationship = Unmarried: null

| | | workclass = Self-emp-not-inc: <=50K

| | | workclass = Self-emp-inc: >50K

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: >50K

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Prof-specialty: null

| | occupation = Handlers-cleaners

| | | native-country = United-States: >50K

| | | native-country = Cambodia: null

| | | native-country = England: null

| | | native-country = Puerto-Rico: null

| | | native-country = Canada: null

| | | native-country = Germany: null

| | | native-country = Outlying-US(Guam-USVI-etc): null

| | | native-country = India: null

| | | native-country = Japan: null

| | | native-country = Greece: null

| | | native-country = South: null

| | | native-country = China: null

| | | native-country = Cuba: null

| | | native-country = Iran: null

| | | native-country = Honduras: null

| | | native-country = Philippines: null

| | | native-country = Italy: null

| | | native-country = Poland: null

| | | native-country = Jamaica: null

| | | native-country = Vietnam: null

| | | native-country = Mexico: <=50K

| | | native-country = Portugal: null

| | | native-country = Ireland: null

| | | native-country = France: null

| | | native-country = Dominican-Republic: null

| | | native-country = Laos: null

| | | native-country = Ecuador: null

| | | native-country = Taiwan: null

| | | native-country = Haiti: null

| | | native-country = Columbia: null

| | | native-country = Hungary: null

| | | native-country = Guatemala: null

| | | native-country = Nicaragua: null

| | | native-country = Scotland: null

| | | native-country = Thailand: null

| | | native-country = Yugoslavia: null

| | | native-country = El-Salvador: null

| | | native-country = Trinadad&Tobago: null

| | | native-country = Peru: null

| | | native-country = Hong: null

| | | native-country = Holand-Netherlands: null

| | occupation = Machine-op-inspct

| | | workclass = Private: <=50K

| | | workclass = Self-emp-not-inc: <=50K

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: null

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Adm-clerical: >50K

| | occupation = Farming-fishing: <=50K

| | occupation = Transport-moving

| | | relationship = Wife: <=50K

| | | relationship = Own-child: null

| | | relationship = Husband: <=50K

| | | relationship = Not-in-family: null

| | | relationship = Other-relative: null

| | | relationship = Unmarried: null

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: >50K

| | occupation = Armed-Forces: null

| marital-status = Divorced

| | occupation = Tech-support: null

| | occupation = Craft-repair

| | | workclass = Private: <=50K

| | | workclass = Self-emp-not-inc: >50K

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: >50K

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Other-service: <=50K

| | occupation = Sales: <=50K

| | occupation = Exec-managerial

| | | workclass = Private: >50K

| | | workclass = Self-emp-not-inc: <=50K

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: null

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | occupation = Prof-specialty: null

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: >50K

| | occupation = Adm-clerical: <=50K

| | occupation = Farming-fishing: null

| | occupation = Transport-moving: <=50K

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: null

| | occupation = Armed-Forces: null

| marital-status = Never-married: <=50K

| marital-status = Separated: <=50K

| marital-status = Widowed: <=50K

| marital-status = Married-spouse-absent: null

| marital-status = Married-AF-spouse: null

education = Prof-school: >50K

education = Assoc-acdm

| occupation = Tech-support: null

| occupation = Craft-repair: >50K

| occupation = Other-service: null

| occupation = Sales: <=50K

| occupation = Exec-managerial: >50K

| occupation = Prof-specialty: <=50K

| occupation = Handlers-cleaners: null

| occupation = Machine-op-inspct: >50K

| occupation = Adm-clerical

| | marital-status = Married-civ-spouse: >50K

| | marital-status = Divorced: null

| | marital-status = Never-married: <=50K

| | marital-status = Separated: null

| | marital-status = Widowed: <=50K

| | marital-status = Married-spouse-absent: null

| | marital-status = Married-AF-spouse: null

| occupation = Farming-fishing: null

| occupation = Transport-moving: <=50K

| occupation = Priv-house-serv: null

| occupation = Protective-serv: <=50K

| occupation = Armed-Forces: null

education = Assoc-voc

| relationship = Wife: <=50K

| relationship = Own-child: <=50K

| relationship = Husband

| | occupation = Tech-support: >50K

| | occupation = Craft-repair: <=50K

| | occupation = Other-service: <=50K

| | occupation = Sales: null

| | occupation = Exec-managerial: >50K

| | occupation = Prof-specialty: null

| | occupation = Handlers-cleaners: null

| | occupation = Machine-op-inspct: <=50K

| | occupation = Adm-clerical: null

| | occupation = Farming-fishing: null

| | occupation = Transport-moving: >50K

| | occupation = Priv-house-serv: null

| | occupation = Protective-serv: >50K

| | occupation = Armed-Forces: null

| relationship = Not-in-family: <=50K

| relationship = Other-relative: <=50K

| relationship = Unmarried: <=50K

education = 9th: <=50K

education = 7th-8th: <=50K

education = 12th: <=50K

education = Masters

| occupation = Tech-support: >50K

| occupation = Craft-repair

| | marital-status = Married-civ-spouse: <=50K

| | marital-status = Divorced: null

| | marital-status = Never-married: >50K

| | marital-status = Separated: null

| | marital-status = Widowed: <=50K

| | marital-status = Married-spouse-absent: null

| | marital-status = Married-AF-spouse: null

| occupation = Other-service: null

| occupation = Sales

| | workclass = Private: >50K

| | workclass = Self-emp-not-inc: >50K

| | workclass = Self-emp-inc: null

| | workclass = Federal-gov: null

| | workclass = Local-gov: null

| | workclass = State-gov: null

| | workclass = Without-pay: null

| | workclass = Never-worked: null

| occupation = Exec-managerial: >50K

| occupation = Prof-specialty

| | relationship = Wife: >50K

| | relationship = Own-child: null

| | relationship = Husband

| | | workclass = Private: >50K

| | | workclass = Self-emp-not-inc: >50K

| | | workclass = Self-emp-inc: null

| | | workclass = Federal-gov: null

| | | workclass = Local-gov: <=50K

| | | workclass = State-gov: null

| | | workclass = Without-pay: null

| | | workclass = Never-worked: null

| | relationship = Not-in-family: <=50K

| | relationship = Other-relative: <=50K

| | relationship = Unmarried: null

| occupation = Handlers-cleaners: null

| occupation = Machine-op-inspct: null

| occupation = Adm-clerical: <=50K

| occupation = Farming-fishing: null

| occupation = Transport-moving: >50K

| occupation = Priv-house-serv: null

| occupation = Protective-serv: <=50K

| occupation = Armed-Forces: null

education = 1st-4th: <=50K

education = 10th

| occupation = Tech-support: null

| occupation = Craft-repair: <=50K

| occupation = Other-service: <=50K

| occupation = Sales: <=50K

| occupation = Exec-managerial: null

| occupation = Prof-specialty: null

| occupation = Handlers-cleaners: <=50K

| occupation = Machine-op-inspct: <=50K

| occupation = Adm-clerical: null

| occupation = Farming-fishing: <=50K

| occupation = Transport-moving: >50K

| occupation = Priv-house-serv: null

| occupation = Protective-serv: null

| occupation = Armed-Forces: null

education = Doctorate: >50K

education = 5th-6th: <=50K

education = Preschool: null

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         274               68.5    %
Incorrectly Classified Instances        70               17.5    %
Kappa statistic                          0.3739
Mean absolute error                      0.2285
Root mean squared error                  0.4468
Relative absolute error                 74.3816 %
Root relative squared error            115.472  %
UnClassified Instances                  56               14      %
Total Number of Instances              400
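
The 56 unclassified instances are consistent with the `null` leaves in the tree above: a branch that received no training instances has no class label, so an instance that reaches it gets no prediction. The sketch below illustrates this with a small fragment of the printed tree, copied by hand into nested Python structures. The encoding and the `classify` helper are assumptions made for illustration only, not Weka's internal representation.

```python
# Fragment of the ID3 tree printed above, encoded by hand (illustrative only).
# An inner node is a pair (attribute name, {attribute value: subtree}).
# A leaf is a class label, or None for a leaf that Weka prints as "null".
TREE = ("education", {
    "Prof-school": ">50K",
    "Assoc-acdm": ("occupation", {
        "Tech-support": None,
        "Craft-repair": ">50K",
        "Sales": "<=50K",
        "Adm-clerical": ("marital-status", {
            "Married-civ-spouse": ">50K",
            "Divorced": None,
            "Never-married": "<=50K",
        }),
    }),
    "Preschool": None,
})


def classify(tree, instance):
    """Walk the tree; return a class label, or None if no prediction is possible."""
    node = tree
    while isinstance(node, tuple):
        attribute, branches = node
        # A value that is missing from the fragment is treated like a null leaf.
        node = branches.get(instance.get(attribute))
    return node


if __name__ == "__main__":
    person = {"education": "Assoc-acdm", "occupation": "Tech-support"}
    print(classify(TREE, person))
```

An instance for which `classify` returns `None` corresponds to one counted under "UnClassified Instances" in the summary.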

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.455     0.105     0.556       0.455    0.5         >50K
0.895     0.545     0.851       0.895    0.872       <=50K

=== Confusion Matrix ===

   a   b   <-- classified as
  35  42 |   a = >50K
  28 239 |   b = <=50K
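
The figures above can be checked directly from the confusion matrix. The following is a minimal sketch, not part of Weka: the function name `summarize` and the layout of its result are chosen here for illustration. It assumes that the percentages are taken over all 400 instances, while the kappa statistic and the per-class rates use only the 344 instances that received a prediction; under that assumption the reported values are reproduced.

```python
# Confusion matrix as reported: rows are actual classes, columns are predictions.
CLASSES = [">50K", "<=50K"]
MATRIX = [[35, 42],
          [28, 239]]
TOTAL_INSTANCES = 400  # includes the instances that received no prediction


def summarize(matrix, total_instances):
    """Recompute the summary and per-class figures from a confusion matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)            # classified instances
    correct = sum(matrix[i][i] for i in range(k))
    row_sums = [sum(row) for row in matrix]        # actual counts per class
    col_sums = [sum(col) for col in zip(*matrix)]  # predicted counts per class

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = correct / n
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    per_class = []
    for i in range(k):
        tp = matrix[i][i]
        recall = tp / row_sums[i]                  # identical to the TP rate
        precision = tp / col_sums[i]
        fp_rate = (col_sums[i] - tp) / (n - row_sums[i])
        f_measure = 2 * precision * recall / (precision + recall)
        per_class.append({"tp_rate": recall, "fp_rate": fp_rate,
                          "precision": precision, "f_measure": f_measure})

    return {
        "correct_pct": 100 * correct / total_instances,
        "incorrect_pct": 100 * (n - correct) / total_instances,
        "unclassified": total_instances - n,
        "kappa": kappa,
        "per_class": per_class,
    }


if __name__ == "__main__":
    result = summarize(MATRIX, TOTAL_INSTANCES)
    print(round(result["kappa"], 4))
    for name, stats in zip(CLASSES, result["per_class"]):
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

Note that 68.5 % and 17.5 % do not add up to 100 % because the remaining 14 % of the instances were left unclassified. Among the 344 classified instances, 274 were correct.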

Result:

Wife
- with occupation Other-service, the person earns <=50K
- with occupation Prof-specialty, the person earns >50K

Own-child: earns <=50K

Husband
- Craft-repair and Sales workers earn >50K
-- born in the United States
--- self-employed, private-sector, and government workers earn >50K
-- born in Germany: earn <=50K
-- born in Japan or Greece: earn >50K
- with occupation Prof-specialty
-- private-sector
--- born in the United States or Canada: earns >50K
