An Overview of Enterprise Miner
Enterprise Miner v.4.1 is a product that SAS introduced with version 8. It
provides a variety of analytical tools, such as neural networks, that support
data mining and extend traditional forecasting models. Data mining is an
analytical approach used to support critical business decisions by analyzing
enormous amounts of data in order to discover relationships and previously
unknown patterns in the data. Enterprise Miner is a powerful product that is
now available within the SAS software.
The Enterprise Miner (EM) SEMMA data mining methodology is specifically
designed for handling enormous data sets in preparation for subsequent data
analysis. In SAS Enterprise Miner, the abbreviation SEMMA stands for Sampling,
Exploring, Modifying, Modeling, and Assessing large amounts of data. Among the
data mining tasks, neural network modeling falls under predictive modeling,
i.e., regression or classification modeling. In regression modeling, the aim is
to build a model that predicts the values of one variable from a set of known
values of other variables. In classification modeling, the difference is that
the variable to be predicted is categorical; it is predicted from a set of
known quantitative variables.
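The distinction between the two kinds of predictive modeling can be
illustrated with a minimal sketch. It is written in plain Python rather than
SAS, and everything in it is an assumption made for illustration: the toy data,
the function names, and the choice of a least-squares line for regression and
a nearest-class-mean rule for classification. It is not how Enterprise Miner
fits its models.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def fit_class_means(xs, labels):
    """Mean input value for each class label."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: mean(values) for label, values in groups.items()}


def classify(x, class_means):
    """Assign x to the class whose mean input value is closest."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))


# Hypothetical toy data: one quantitative input variable.
xs = [1.0, 2.0, 3.0, 4.0]

# Regression: the target is numeric (here it follows y = 1 + 2x exactly).
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
predicted_value = a + b * 5.0

# Classification: the target is categorical.
labels = ["low", "low", "high", "high"]
class_means = fit_class_means(xs, labels)
predicted_class = classify(3.6, class_means)
```

In the regression case the prediction is a number on a continuous scale; in
the classification case it is one of the class labels seen in the training
data.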
Purpose of Writing this Book
One reason for writing this book is that there is not much written literature
on neural network modeling using SAS Enterprise Miner. The book takes a
step-by-step approach to designing a neural network process flow diagram in
SAS Enterprise Miner and to using the SAS neural network procedure, PROC
NEURAL. It also explains the various statements and options of the NEURAL
procedure. Numerous examples, taken from various SAS literature, illustrate the
complex neural network designs and optimization techniques used in network
modeling. They compare the forecasting results of neural network and
traditional regression techniques and explain the SAS modeling results. The
book’s introduction is a brief overview of traditional regression modeling and
the various statistical assumptions that must be satisfied.
Highlights of this Book
Chapter 2 discusses basic model building and the various modeling assumptions
that need to be satisfied. These assumptions, in order of importance, are
independence, equal variance, and normality of the modeling terms, and they
must be satisfied in both traditional regression modeling and neural network
designs. However, it should be noted that some neural network modelers ignore
these important assumptions. The chapter explains the various diagnostic
statistics used to identify outliers and influential data points that have a
profound effect on the modeling results. Finally, it explains the various
goodness-of-fit statistics used to determine the best linear combination of
input variables from the pool of all possible combinations of input variables
to the regression model.
Chapter 3 explains the neural network design and its various configuration
settings. The chapter first explains a simple perceptron design for a
binary-valued target variable. Next, it discusses neural network designs and
their configurations: the layers, weights, combination functions, transfer
functions, objective or error functions, and optimization techniques that are
used. The optimization techniques covered include the various line search and
grid search techniques. Numerical examples follow in order to simplify the
numerous optimization techniques that are applied in calculating the neural
network weight estimates and finding the smallest error of the neural network
model. A numerical example is presented of the backpropagation algorithm,
which is typically used in a multilayer perceptron (MLP) neural network
design. The chapter explains the similarity between multiple regression
parameter estimates and neural network weight estimates. Pruning techniques
used in pre-processing the model are discussed, leading to general strategies
for interpreting important input variables in the neural network model and for
constructing a well-designed neural network model. The chapter concludes with
a brief summary of the advantages and disadvantages of a neural network design.
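As a rough illustration of a perceptron for a binary-valued target, the
following sketch trains a single unit with the classic perceptron learning
rule. It is not taken from the book and does not use PROC NEURAL. The logical
AND data, the step transfer function, the learning rate, and all function
names are assumptions chosen to keep the example small.

```python
def step(z):
    """Step transfer function: returns 1 when z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def predict(inputs, weights, bias):
    """Linear combination function followed by the step transfer function."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return step(z)


def train_perceptron(rows, targets, rate=1.0, epochs=50):
    """Perceptron learning rule; stops early once an epoch has no errors."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in zip(rows, targets):
            delta = target - predict(inputs, weights, bias)
            if delta != 0:
                errors += 1
                # Move the weights toward the misclassified case.
                bias += rate * delta
                weights = [w + rate * delta * x
                           for w, x in zip(weights, inputs)]
        if errors == 0:
            break
    return weights, bias


# Hypothetical data: the target is 1 only when both inputs are 1 (logical AND).
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(rows, targets)
predictions = [predict(r, weights, bias) for r in rows]
```

Because the AND data are linearly separable, the rule settles on weights that
classify all four cases correctly. A single perceptron cannot do this for
targets that are not linearly separable, which is one motivation for the
multilayer designs.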
Chapter 4 first explains both the SAS neural network procedure, called the
NEURAL procedure, and the SAS data mining regression procedure, called the
DMREG procedure, along with their various option statements. The chapter then
displays diagrams of the neural network architecture for a couple of the
modeling comparison examples presented later in the book, so that the reader
can graphically understand the neural network configuration between the
various layers and the weight estimates associated with these layers. These
are followed by SAS output
listings from the En