
Radial basis function networks

  • One approach to function approximation that is closely related to distance-weighted regression and also to artificial neural networks is learning with radial basis functions
  • In this approach, the learned hypothesis is a function of the form

        f̂(x) = w0 + Σ(u=1..k) wu Ku(d(xu, x))        ...(1)

  • Here each xu is an instance from X, and the kernel function Ku(d(xu, x)) is defined so that it decreases as the distance d(xu, x) increases.
  • k is a user-provided constant that specifies the number of kernel functions to be included.
  • Although f̂ is a global approximation to f(x), the contribution from each of the Ku(d(xu, x)) terms is localized to a region near the point xu.

 A common choice is to make each function Ku(d(xu, x)) a Gaussian function centred at the point xu with some variance 𝜎u²:

        Ku(d(xu, x)) = e^(−d²(xu, x) / (2𝜎u²))

  • The functional form of equ(1) can approximate any function with arbitrarily small error, provided a sufficiently large number k of such Gaussian kernels and provided the width 𝜎u² of each kernel can be specified separately.
  • The function given by equ(1) can be viewed as describing a two-layer network, where the first layer of units computes the values of the various Ku(d(xu, x)) and the second layer computes a linear combination of these first-layer unit values, as in the sketch below.
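
The computation in equ(1) can be made concrete with a short sketch. The following Python is a minimal illustration, assuming Euclidean distance for d; all centre, width, and weight values are made-up examples, not taken from the text.

    import numpy as np

    def rbf_predict(x, centers, sigmas, weights, w0):
        """Evaluate equ(1): f̂(x) = w0 + Σu wu Ku(d(xu, x)) with Gaussian kernels."""
        # First layer: one Gaussian kernel activation per hidden unit u
        d2 = np.sum((centers - x) ** 2, axis=1)          # squared distances d²(xu, x)
        activations = np.exp(-d2 / (2 * sigmas ** 2))    # Ku(d(xu, x))
        # Second layer: linear combination of the first-layer unit values
        return w0 + weights @ activations

    # Illustrative values (assumptions): k = 3 hidden units over a 2-D instance space
    centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])  # the points xu
    sigmas = np.array([0.5, 0.5, 0.5])                        # the widths 𝜎u
    weights = np.array([1.0, -0.5, 2.0])                      # the weights wu
    print(rbf_predict(np.array([1.0, 0.5]), centers, sigmas, weights, w0=0.1))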

[Figure: Example radial basis function (RBF) network]

Given a set of training examples of the target function, RBF networks are typically trained in a two-stage process.

  1. First, the number k of hidden units is determined, and each hidden unit u is defined by choosing the values of xu and 𝜎u² that define its kernel function Ku(d(xu, x)).
  2. Second, the weights wu are trained to maximize the fit of the network to the training data, using the global error criterion

        E = ½ Σ(x∈D) (f(x) − f̂(x))²

     where D is the set of training examples.

Because the kernel functions are held fixed during this second stage, fitting the linear weight values wu reduces to an ordinary linear regression problem, which can be solved very efficiently, as in the sketch below.
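
Below is a minimal sketch of this second stage, reusing the Gaussian kernels and squared Euclidean distance from the sketch above; the helper name fit_output_weights is hypothetical. With the kernels fixed, minimizing E in the weights is a linear least-squares problem.

    import numpy as np

    def fit_output_weights(X, y, centers, sigmas):
        """Stage 2: with the kernels held fixed, fit w0 and the wu by least squares."""
        # Design matrix: a column of ones (for w0) plus one kernel activation per unit
        d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        Phi = np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2 * sigmas ** 2))])
        # Minimizing E = ½ Σ (f(x) − f̂(x))² over the weights is linear least squares
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return w[0], w[1:]  # w0, then the weights wu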



Several alternative methods have been proposed for choosing an appropriate number of hidden units or, equivalently, kernel functions.

  • One approach is to allocate a Gaussian kernel function for each training example (xi, f(xi)), centring this Gaussian at the point xi. Each of these kernels may be assigned the same width 𝜎². Given this approach, the RBF network learns a global approximation to the target function in which each training example (xi, f(xi)) can influence the value of f̂ only in the neighbourhood of xi.
  • A second approach is to choose a set of kernel functions that is smaller than the number of training examples, for example by centring the kernels on clusters of training instances. This approach can be much more efficient than the first, especially when the number of training examples is large; a clustering-based sketch follows this list.
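
The sketch below is one way to realise the second approach: choose the centres xu by clustering the training inputs, give every kernel the same width, and then fit the weights with fit_output_weights from the earlier sketch. The use of k-means here is an illustrative assumption; the text does not prescribe a particular clustering method.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_rbf_network(X, y, k, sigma):
        """Choose k kernel centres by clustering, then fit the linear weights."""
        centers = KMeans(n_clusters=k, n_init=10).fit(X).cluster_centers_  # the xu
        sigmas = np.full(k, sigma)  # one shared width for every kernel
        w0, weights = fit_output_weights(X, y, centers, sigmas)
        return centers, sigmas, weights, w0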

Summary

  • Radial basis function networks provide a global approximation to the target function, represented by a linear combination of many local kernel functions.
  • The value for any given kernel function is non-negligible only when the input x falls into the region defined by its particular centre and width. Thus, the network can be viewed as a smooth linear combination of many local approximations to the target function.
  • One key advantage to RBF networks is that they can be trained much more efficiently than feedforward networks trained with BACKPROPAGATION.
