
OneR: the simplest method

It turns out that very simple association rules, involving just one attribute in the condition part, often work disgustingly well in practice with real-world data. Suppose that in the weather data you wish to predict the value of play. The idea of the OneR (one-attribute-rule) algorithm is to find the single attribute whose rules make the fewest prediction errors. For example, consider outlook:

  if outlook = sunny    then play = no  .. makes 2 errors in 5 records
  if outlook = overcast then play = yes .. makes 0 errors in 4 records
  if outlook = rainy    then play = yes .. makes 2 errors in 5 records
for a total of 4 errors in 14 cases. Likewise,
  if humidity = high   then play = no  .. makes 3 errors in 7 records
  if humidity = normal then play = yes .. makes 1 error  in 7 records
also for a total of 4 errors in 14 cases. The other two attributes, temperature and windy, each make 5 errors, so the OneR algorithm chooses at random between outlook and humidity as the one decisive attribute. The algorithm is:
   For each attribute A:
     For each value V of that attribute, create a rule:
       1. count how often each class appears among records with A=V
       2. find the most frequent class, c
       3. make a rule "if A=V then class=c"
     Calculate the total error rate of these rules

   Pick the attribute whose rules produce the lowest error rate
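The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the author's code. It assumes the weather data is the standard 14-record nominal dataset (Quinlan; Witten & Frank), which the text uses but does not list. The function names `attribute_rules` and `one_r` are hypothetical. The sketch reproduces the error counts worked out above and breaks ties between equally good attributes at random.

```python
import random
from collections import Counter, defaultdict

# Assumption: the standard 14-record nominal weather data (not listed in the text).
# Columns: outlook, temperature, humidity, windy, play (class is the last column).
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
WEATHER = [
    ("sunny",    "hot",  "high",   "false", "no"),
    ("sunny",    "hot",  "high",   "true",  "no"),
    ("overcast", "hot",  "high",   "false", "yes"),
    ("rainy",    "mild", "high",   "false", "yes"),
    ("rainy",    "cool", "normal", "false", "yes"),
    ("rainy",    "cool", "normal", "true",  "no"),
    ("overcast", "cool", "normal", "true",  "yes"),
    ("sunny",    "mild", "high",   "false", "no"),
    ("sunny",    "cool", "normal", "false", "yes"),
    ("rainy",    "mild", "normal", "false", "yes"),
    ("sunny",    "mild", "normal", "true",  "yes"),
    ("overcast", "mild", "high",   "true",  "yes"),
    ("overcast", "hot",  "normal", "false", "yes"),
    ("rainy",    "mild", "high",   "true",  "no"),
]


def attribute_rules(rows, a):
    """Build one rule per value of attribute column `a`.

    Returns ({value: majority_class}, total_errors) for that attribute.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[a]][row[-1]] += 1             # 1. class counts for each value V
    rules, errors = {}, 0
    for value, by_class in counts.items():
        cls, hits = by_class.most_common(1)[0]   # 2. most frequent class c
        rules[value] = cls                       # 3. rule "if A=V then class=c"
        errors += sum(by_class.values()) - hits  # records this rule gets wrong
    return rules, errors


def one_r(rows, n_attrs, rng=None):
    """Return (attribute_index, rules, errors) for the attribute with fewest errors.

    Ties between attributes are broken at random, as the text describes.
    """
    rng = rng or random.Random()
    scored = [(a, *attribute_rules(rows, a)) for a in range(n_attrs)]
    fewest = min(errors for _, _, errors in scored)
    best = [s for s in scored if s[2] == fewest]
    return rng.choice(best)


if __name__ == "__main__":
    n = len(WEATHER)
    for a, name in enumerate(ATTRIBUTES):
        _, errors = attribute_rules(WEATHER, a)
        print(f"{name}: {errors} errors in {n} records")
    a, rules, errors = one_r(WEATHER, len(ATTRIBUTES))
    for value, cls in rules.items():
        print(f"if {ATTRIBUTES[a]} = {value} then play = {cls}")
    print(f"error rate {errors}/{n}")
```

Running it should report 4 errors in 14 records for outlook and humidity and 5 for temperature and windy. It then prints the rules for whichever of the two tied attributes was drawn. Passing a seeded `random.Random` as `rng` makes that choice repeatable. Within a single value, `Counter.most_common` breaks class ties (e.g. temperature = hot, 2 yes against 2 no) by first occurrence, which does not change the error count.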

Peter Ross,, x4437
This version: 2000-10-30