The intelligence of machines and the branch of computer science which aims to create it

Artificial Intelligence Journal




Related Topics: Artificial Intelligence Journal, Java Developer Magazine


Programming Neural Networks in Java


Computers can perform many operations a lot faster than humans. However, there are many tasks in which the computer falls considerably short. One such task is the interpretation of graphic information. A preschool child can easily tell the difference between a cat and a dog, but this simple problem confounds today's computers.

In the field of computer science, artificial intelligence attempts to give computers human abilities. One of the primary means of doing so is the neural network, of which the human brain is the ultimate example. The human brain consists of a network of tens of billions of interconnected neurons. These are individual cells that can process small amounts of information and then activate other neurons to continue the process. The term neural network, as it's normally used, is actually a misnomer: computers only simulate a neural network, so artificial neural network would be more accurate. Most publications, however, simply say neural network.

This article shows how to construct a neural network in Java, although neural networks can be constructed in almost any programming language. Most publications about neural networks use such computer languages as C, C++, Lisp, or Prolog. Java is actually quite effective as a neural network programming language. This article shows you a simple yet practical neural network that can recognize handwritten letters, and describes the implementation of a neural network in a small sample program. (All sample programs and source code for this article can be downloaded from www.sys-con.com/java/sourcec.cfm.)

Recognizing Letters
Using the sample program (shown in Figure 1) you can see a neural network in action. For ease of distribution, the class files are packaged into a single JAR file named OCR.jar. To run the program, use the following command (assuming you're in the same directory as the JAR file). Some systems may allow you to simply double-click the JAR file.

java -classpath OCR.jar MainEntry

When the letter-recognition program begins, there's no data loaded initially. A training file must be used that contains the shapes of the letters. An example training file (sample.dat) is preloaded with the 26 capital letters. To see the program work, click the "Load" button, which loads the sample.dat file. Now 26 letter patterns are in memory and the network must be trained. Click the "Begin Training" button; now the network is ready to recognize characters. Draw any capital letter you like and click "Recognize"; the program should now recognize the letter.

Training the Sample Program
Maybe my handwriting is considerably different from most people's. (My first grade teacher would certainly say so.) What if you want to train the program specifically for your handwriting? To replace a letter that's already defined, you must first select that letter and delete it by pressing the "Delete" button. Now draw the character you wish to train the program for. If you'd like to see this letter downsampled before you add it, click the "Downsample" button. If you're happy with your letter, click the "Add" button to add it to the training set. To save a copy of your newly created letters, click the "Save" button and they'll be written to the sample.dat file.

Once you've entered all the letters you want, you must now "train" the neural network. Up to this point you've simply provided a training set of letters known as input patterns. With these input patterns, you're now ready to train the network, which could take a lot of time. However, since only one drawing sample per letter is allowed, this process will be completed in a matter of seconds. A small popup will be displayed when training is complete. When you save, only the character patterns are saved. If you load these same patterns later, you must retrain the network.

You'll now be shown how this example program is constructed and how you can create similar programs. The file MainEntry.java contains the Swing application; its code does little more than place the components at their correct locations.

The three areas this article focuses on are downsampling, training, and recognition. Downsampling, an algorithm used to reduce the resolution of the letters being drawn, is used for character recognition and training, so we'll address this topic first.

Downsampling the Image
All images are downsampled before being used, which prevents the neural network from being confused by size and position. The drawing area is large enough so you could draw a letter in several different sizes. By downsampling the image to a consistent size, it won't matter how large you draw the letter, as the downsampled image will always remain a consistent size. This section shows how this is done.

When you draw an image, the program first draws a box around the boundary of your letter. This allows the program to eliminate all the white space around your letter. This process is done inside the "downSample" method of the Entry class (Entry.java). As you draw a character, this character is also drawn onto the "entryImage" instance variable of the Entry object. To crop this image and eventually downsample it, we must grab the bit pattern of the image. This is done using the PixelGrabber class:

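int w = entryImage.getWidth(this);
int h = entryImage.getHeight(this);

// Grab the whole image as an array of packed RGB ints (forceRGB = true)
PixelGrabber grabber = new PixelGrabber(entryImage,0,0,w,h,true);
grabber.grabPixels(); // blocks until done; can throw InterruptedException
pixelMap = (int[])grabber.getPixels();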

After this code completes, the pixelMap variable, which is an array of int values, contains the bit pattern of the image. The next step is to crop the image and remove any white space. Cropping is implemented by dragging four imaginary lines inward from the top, left, bottom, and right sides of the image. Each line stops as soon as it crosses an actual pixel, so the lines snap to the outer edges of the drawn letter. The hLineClear and vLineClear methods each accept a parameter that indicates the line to scan and return true if that line is clear. The program calls hLineClear and vLineClear repeatedly until the lines cross the outer edges of the letter. The horizontal line method (hLineClear) is shown here.

protected boolean hLineClear(int y)
{
    int w = entryImage.getWidth(this);
    for ( int i=0;i<w;i++ ) {
        if ( pixelMap[(y*w)+i] !=-1 )
            return false;
    }
    return true;
}

As you can see, the horizontal line method accepts a y coordinate that specifies the horizontal line to check. The program then loops through each x coordinate on that row, checking for any pixel values. The value of -1 indicates white, so it's ignored. The "findBounds" method uses "hLineClear" and "vLineClear" to calculate the four edges. The beginning of this method is shown here:

protected void findBounds(int w,int h)
{
    // top line: first row, scanning downward, that contains a pixel
    for ( int y=0;y<h;y++ ) {
        if ( !hLineClear(y) ) {
            downSampleTop=y;
            break;
        }
    }
    // bottom line: first row, scanning upward, that contains a pixel
    for ( int y=h-1;y>=0;y-- ) {
        if ( !hLineClear(y) ) {
            downSampleBottom=y;
            break;
        }
    }
You can see how the program calculates the top and bottom lines of the cropping rectangle. To calculate the top line, the program starts at 0 and continues to the bottom of the image. As soon as the first nonclear line is found, the program establishes this as the top of the clipping rectangle. The same process, only in reverse, is carried out to determine the bottom of the image. The processes to determine the left and right boundaries are carried out in the same way.
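The article shows only hLineClear and the top and bottom scans. As a rough, self-contained sketch of the whole cropping step, the four edges could be found as follows. The class name CropBoundsSketch, the body of vLineClear, and the array-based method signatures are assumptions made for illustration; they are not the code in Entry.java.

```java
import java.util.Arrays;

/** Sketch: find the bounding box of the non-white pixels in a packed pixel array. */
class CropBoundsSketch {
    static final int WHITE = -1; // 0xFFFFFFFF, opaque white as a packed int

    /** True if row y contains only white pixels. */
    static boolean hLineClear(int[] pixels, int w, int y) {
        for (int x = 0; x < w; x++) {
            if (pixels[y * w + x] != WHITE) return false;
        }
        return true;
    }

    /** True if column x contains only white pixels (assumed mirror of hLineClear). */
    static boolean vLineClear(int[] pixels, int w, int h, int x) {
        for (int y = 0; y < h; y++) {
            if (pixels[y * w + x] != WHITE) return false;
        }
        return true;
    }

    /** Returns {top, bottom, left, right}, or null if the image is blank. */
    static int[] findBounds(int[] pixels, int w, int h) {
        int top = 0;
        while (top < h && hLineClear(pixels, w, top)) top++;
        if (top == h) return null; // nothing was drawn

        // At least one pixel exists, so the remaining scans must stop.
        int bottom = h - 1;
        while (hLineClear(pixels, w, bottom)) bottom--;
        int left = 0;
        while (vLineClear(pixels, w, h, left)) left++;
        int right = w - 1;
        while (vLineClear(pixels, w, h, right)) right--;
        return new int[] {top, bottom, left, right};
    }

    public static void main(String[] args) {
        int w = 6, h = 5;
        int[] pixels = new int[w * h];
        Arrays.fill(pixels, WHITE);
        pixels[1 * w + 2] = 0xFF000000; // black pixel at x=2, y=1
        pixels[3 * w + 4] = 0xFF000000; // black pixel at x=4, y=3
        System.out.println(Arrays.toString(findBounds(pixels, w, h))); // prints [1, 3, 2, 4]
    }
}
```

Returning null for a blank image is a design choice of this sketch; the article does not say how the sample program handles an empty drawing.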

Now that the image has been cropped, it must be downsampled. This involves taking the image from a larger resolution to a 5x7 resolution. To reduce an image to 5x7, think of an imaginary grid being drawn over the high-resolution image. This divides the image into rectangular sections, five across and seven down. If any pixel in a section is filled, the corresponding pixel in the 5x7 downsampled image is also filled. Most of the work done by this process is accomplished inside the "downSampleQuadrant" method shown here.

protected boolean downSampleQuadrant(int x,int y)
{
    int w = entryImage.getWidth(this);
    int startX = (int)(downSampleLeft+(x*ratioX));
    int startY = (int)(downSampleTop+(y*ratioY));
    int endX = (int)(startX + ratioX);
    int endY = (int)(startY + ratioY);

    for ( int yy=startY;yy<=endY;yy++ ) {
        for ( int xx=startX;xx<=endX;xx++ ) {
            int loc = xx+(yy*w);

            if ( pixelMap[ loc ]!= -1 )
                return true;
        }
    }

    return false;
}

The "downSampleQuadrant" method accepts the section number that should be calculated. First the starting and ending x and y coordinates must be calculated. To calculate the first x coordinate for the specified section, first the "downSampleLeft" is used; this is the left side of the cropping rectangle. Then x is multiplied by "ratioX", the ratio of how many pixels make up each section. This allows us to determine where to place "startX". The starting y position, "startY", is calculated by similar means. Next the program loops through every x and y covered by the specified section. If even one pixel is determined to be filled, the method returns true, which indicates that this section should be considered filled.

The "downSampleQuadrant" method is called in succession for each section in the image. The results are stored in the "SampleData" class, a wrapper class that contains a 5x7 array of Boolean values. It's this structure that forms the input to both training and character recognition.
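To make the grid idea concrete, here is a minimal sketch of downsampling an already-cropped bitmap to a small Boolean grid. It is not the article's Entry.java code: the class DownsampleSketch, the boolean[][] input, and the half-open cell ranges (the article's loop uses inclusive bounds) are all assumptions chosen to keep the example self-contained.

```java
/** Sketch: reduce a cropped boolean bitmap to a small grid such as 5x7. */
class DownsampleSketch {

    /** True if any pixel in the half-open region [x0,x1) x [y0,y1) is filled. */
    static boolean anyFilled(boolean[][] src, int x0, int y0, int x1, int y1) {
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                if (src[y][x]) return true;
            }
        }
        return false;
    }

    /** Exclusive end of cell i; every cell covers at least one pixel. */
    static int cellEnd(int i, int cells, double ratio, int size, int start) {
        int end = (i == cells - 1) ? size : (int) ((i + 1) * ratio);
        return Math.max(start + 1, Math.min(size, end));
    }

    /** src is indexed [y][x]; the result is indexed [row][col]. */
    static boolean[][] downsample(boolean[][] src, int cols, int rows) {
        int h = src.length;
        int w = src[0].length;
        double ratioX = (double) w / cols; // source pixels per grid column
        double ratioY = (double) h / rows; // source pixels per grid row
        boolean[][] out = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            int y0 = (int) (r * ratioY);
            int y1 = cellEnd(r, rows, ratioY, h, y0);
            for (int c = 0; c < cols; c++) {
                int x0 = (int) (c * ratioX);
                int x1 = cellEnd(c, cols, ratioX, w, x0);
                // One filled pixel is enough to fill the whole cell.
                out[r][c] = anyFilled(src, x0, y0, x1, y1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[][] src = new boolean[14][10]; // 10 pixels wide, 14 high
        src[0][0] = true;   // top-left corner
        src[13][9] = true;  // bottom-right corner
        for (boolean[] row : downsample(src, 5, 7)) {
            StringBuilder sb = new StringBuilder();
            for (boolean filled : row) sb.append(filled ? '#' : '.');
            System.out.println(sb);
        }
    }
}
```

Because a single filled pixel fills its cell, thin strokes survive the reduction, at the cost of making dense drawings look blockier.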

Neural Network Recognition
There are many types of neural networks, and most are named after their creators. I'll be using a Kohonen neural network, a two-level network (see Figure 2). The downsampled character pattern drawn by the user is fed to the input neurons. There's one input neuron for every pixel in the downsampled image. Because the downsampled image is a 5x7 grid, there are 35 input neurons.

Through the output neurons, the neural network communicates which letter it thinks the user drew. The number of output neurons always matches the number of unique letters provided. Since 26 letters were provided in the sample, there are 26 output neurons. If this program were modified to support multiple samples per letter, there would still be 26 output neurons.

In addition to input and output neurons, there are also connections between the individual neurons. These connections are not all equal. Each is assigned a weight, which is ultimately the only factor that determines what the network will output for a given input pattern. To determine the total number of connections, multiply the number of input neurons by the number of output neurons. A neural network with 26 output neurons and 35 input neurons would have a total of 910 connection weights. The training process is dedicated to finding the correct values for these weights.
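The arithmetic above can be checked with a few lines of Java. This is only an illustrative sketch; the class name ConnectionCountSketch and the [output][input] array layout are assumptions, not the layout used by KohonenNetwork.java.

```java
/** Sketch: one weight per (input neuron, output neuron) pair. */
class ConnectionCountSketch {

    static int connectionCount(int inputNeurons, int outputNeurons) {
        return inputNeurons * outputNeurons;
    }

    public static void main(String[] args) {
        int inputNeurons = 5 * 7;  // one per pixel of the downsampled image
        int outputNeurons = 26;    // one per letter

        // A weight matrix with one row per output neuron.
        double[][] weights = new double[outputNeurons][inputNeurons];

        System.out.println(connectionCount(inputNeurons, outputNeurons)); // prints 910
        System.out.println(weights.length * weights[0].length);          // prints 910
    }
}
```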

The recognition process begins when the user draws a character and then clicks the "Recognize" button. First the letter is downsampled to a 5x7 image. This image must be copied from its two-dimensional array to an array of doubles that will be fed to the input neurons.

entry.downSample();

double input[] = new double[5*7];
int idx=0;
SampleData ds = sample.getData();
for ( int y=0;y<ds.getHeight();y++ ) {
    for ( int x=0;x<ds.getWidth();x++ ) {
        input[idx++] = ds.getData(x,y)?.5:-.5;
    }
}

This code does the conversion. Neurons require floating-point input, so the program feeds the network the value 0.5 for a filled (black) pixel and -0.5 for an empty (white) pixel. This array of 35 values is fed to the input neurons by passing the input array to the Kohonen network's "winner" method. This returns the index of the output neuron that won (one of 26 in the sample), which is stored in the "best" integer.

int best = net.winner ( input , normfac , synth ) ;
char map[] = mapNeurons();

JOptionPane.showMessageDialog(this, " " + map[best]
+ " (Neuron #" + best + " fired)",
"That Letter Is",
JOptionPane.PLAIN_MESSAGE);

Knowing the winning neuron is not too helpful because it doesn't show you which letter was recognized. To line up the neurons with their recognized letters, each letter image the network was trained from must be fed into the network and the winning neuron determined. For example, if you were to feed the training image for "J" into the neural network, and the winning neuron were neuron #4, you would know that it's the one that had learned to recognize J's pattern. This is done by calling the "mapNeurons" method, which returns an array of characters. The index of each array element corresponds to the neuron number that recognizes that character.
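The mapping step described above might look roughly like the following sketch. The article does not list mapNeurons, so everything here is an assumption: the class MapNeuronsSketch, the plain dot-product winner, the array-based signature, and the '?' placeholder for neurons that no training letter wins.

```java
import java.util.Arrays;

/** Sketch: label each output neuron with the training letter it wins for. */
class MapNeuronsSketch {

    /** Index of the output neuron whose weights give the largest dot product. */
    static int winner(double[] input, double[][] weights) {
        int win = 0;
        double biggest = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < weights.length; i++) {
            double out = 0.0;
            for (int j = 0; j < input.length; j++) {
                out += input[j] * weights[i][j];
            }
            if (out > biggest) {
                biggest = out;
                win = i;
            }
        }
        return win;
    }

    /** map[n] is the letter whose training sample makes neuron n win. */
    static char[] mapNeurons(double[][] weights, double[][] samples, char[] letters) {
        char[] map = new char[weights.length];
        Arrays.fill(map, '?'); // neurons that never win stay unlabeled
        for (int s = 0; s < samples.length; s++) {
            map[winner(samples[s], weights)] = letters[s];
        }
        return map;
    }

    public static void main(String[] args) {
        double[][] weights = { {0, 1}, {1, 0}, {-1, -1} }; // three output neurons
        double[][] samples = { {1, 0}, {0, 1} };           // training inputs
        char[] letters = { 'A', 'B' };
        System.out.println(new String(mapNeurons(weights, samples, letters))); // prints BA?
    }
}
```

In the example, the sample for 'A' makes neuron 1 win and the sample for 'B' makes neuron 0 win, so a later winner index of 1 would be reported as 'A'.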

Most of the actual work performed by the neural network is done in the winner method. The first thing the winner method does is normalize the inputs and calculate the output values of each output neuron. The output neuron with the largest output value is considered the winner. First the "biggest" variable is set to a very large negative number to indicate there's no winner yet.

biggest = -1.E30;
for ( i=0 ; i<outputNeuronCount; i++ ) {
    optr = outputWeights[i];
    // Weighted sum of the inputs, scaled by the normalization factor
    output[i] = dotProduct (input , optr ) * normfac[0]
        + synth[0] * optr[inputNeuronCount] ;
    // Remap from bipolar (-1,1) to (0,1)
    output[i] = 0.5 * (output[i] + 1.0) ;
    if ( output[i] > biggest ) {
        biggest = output[i] ;
        win = i ;
    }
}

Each output neuron's output value is calculated by taking the dot product of the input values and that output neuron's weights. The dot product is calculated by multiplying each input neuron's value by the weight between that input neuron and the output neuron, and summing the results. These weights were determined during training, which is discussed in the next section. The output is kept, and if it's the largest output so far, that neuron is set as the "winning" neuron.
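For readers who want to experiment without the full KohonenNetwork class, the winner selection can be sketched on its own. This version assumes the simplest normalization (dividing the input by its vector length) and omits the synth term, so it may differ from what KohonenNetwork.java actually does; the class name WinnerSketch is also an assumption.

```java
/** Sketch: pick the output neuron with the largest normalized dot product. */
class WinnerSketch {

    static double dotProduct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** weights[i] holds the weights from every input neuron to output neuron i. */
    static int winner(double[] input, double[][] weights) {
        // Assumed normalization: scale the input to unit length.
        double length = Math.sqrt(dotProduct(input, input));
        double normfac = (length > 0.0) ? 1.0 / length : 1.0;

        int win = -1;
        double biggest = Double.NEGATIVE_INFINITY; // no winner yet
        for (int i = 0; i < weights.length; i++) {
            double out = dotProduct(input, weights[i]) * normfac;
            out = 0.5 * (out + 1.0); // remap (-1,1) to (0,1)
            if (out > biggest) {
                biggest = out;
                win = i;
            }
        }
        return win;
    }

    public static void main(String[] args) {
        double[][] weights = { {1, 0}, {0, 1}, {-1, 0} };
        System.out.println(winner(new double[] {0.1, 0.9}, weights));  // prints 1
        System.out.println(winner(new double[] {-0.5, 0.1}, weights)); // prints 2
    }
}
```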

As you can see, getting the results from a neural network is a quick process. Actually determining the weights of the neurons is the complex portion of this process. Training the neural network is discussed in the following section.

How the Neural Network Learns
Learning is the process of selecting a neuron weight matrix that will correctly recognize input patterns. A Kohonen neural network learns by constantly evaluating and optimizing a weight matrix. To do this, a starting weight matrix must be determined. This matrix is chosen by selecting random numbers. Of course, this is a terrible choice for a weight matrix, but it gives a starting point to optimize from.
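A random starting matrix of this kind could be generated as in the sketch below. The range of the random values, the scaling of each neuron's weights to unit length, the fixed seed, and the class name RandomWeightsSketch are all assumptions made for illustration; the article does not show how KohonenNetwork.java initializes its weights.

```java
import java.util.Random;

/** Sketch: random starting weights, one row per output neuron. */
class RandomWeightsSketch {

    static double[][] randomWeights(int outputNeurons, int inputNeurons, long seed) {
        Random rnd = new Random(seed); // seeded so runs are repeatable
        double[][] weights = new double[outputNeurons][inputNeurons];
        for (double[] row : weights) {
            double sumOfSquares = 0.0;
            for (int j = 0; j < inputNeurons; j++) {
                row[j] = 2.0 * rnd.nextDouble() - 1.0; // uniform in [-1, 1)
                sumOfSquares += row[j] * row[j];
            }
            // Assumed extra step: scale each neuron's weights to unit length.
            double length = Math.sqrt(sumOfSquares);
            if (length > 0.0) {
                for (int j = 0; j < inputNeurons; j++) {
                    row[j] /= length;
                }
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        double[][] weights = randomWeights(26, 35, 42L);
        System.out.println(weights.length + " x " + weights[0].length); // prints 26 x 35
    }
}
```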

Once the initial random weight matrix is created, the training can begin. First the weight matrix is evaluated to determine what its current error level is. This error is determined by how well the training input (the letters that you created) maps to the output neurons. The error is calculated by the "evaluateErrors" method of the KohonenNetwork class. If the error level is low, say below 10%, the process is complete.

When the user clicks the "Begin Training" button, the training process begins with the following code:

int inputNeuron = MainEntry.DOWNSAMPLE_HEIGHT*
    MainEntry.DOWNSAMPLE_WIDTH;
int outputNeuron = letterListModel.size();

This calculates the number of input and output neurons. First, the number of input neurons is determined from the size of the downsampled image. Since the height is 7 and the width is 5, the number of input neurons will be 35. The number of output neurons matches the number of characters the program has been given.

This is the part of the program that could be modified if you want it to accept and train from more than one sample per letter. For example, if you wanted to accept four samples per letter, you'd have to make sure that the output neuron count remained 26, even though 104 input samples were provided to train with (4 for each of the 26 letters).
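One way to keep the output neuron count at 26 with several samples per letter is to count distinct letters rather than samples. The sketch below illustrates only that counting step; the class MultiSampleSketch and the idea of storing one label per sample in a char array are hypothetical and not part of the sample program.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: size the output layer by distinct letters, not by sample count. */
class MultiSampleSketch {

    /** sampleLetters[i] is the letter that training sample i represents. */
    static int outputNeuronCount(char[] sampleLetters) {
        Set<Character> distinct = new HashSet<>();
        for (char letter : sampleLetters) {
            distinct.add(letter);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        int samplesPerLetter = 4;
        char[] sampleLetters = new char[26 * samplesPerLetter];
        for (int i = 0; i < sampleLetters.length; i++) {
            sampleLetters[i] = (char) ('A' + i / samplesPerLetter);
        }
        System.out.println(sampleLetters.length);             // prints 104
        System.out.println(outputNeuronCount(sampleLetters)); // prints 26
    }
}
```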

Now that the size of the neural network has been determined, the training set and neural network must be constructed. The training set is constructed to hold the correct number of "samples." This will be the 26 letters provided.

TrainingSet set = new TrainingSet(inputNeuron,outputNeuron);
set.setTrainingSetCount(letterListModel.size());

Next, the downsampled input images are copied to the training set; this is repeated for all 26 input patterns.

for ( int t=0;t<letterListModel.size();t++ ) {
    int idx=0;
    SampleData ds = (SampleData)letterListModel.getElementAt(t);
    for ( int y=0;y<ds.getHeight();y++ ) {
        for ( int x=0;x<ds.getWidth();x++ ) {
            set.setInput(t,idx++,ds.getData(x,y)?.5:-.5);
        }
    }
}

Finally the neural network is constructed and the training set is assigned, so the "learn" method can be called. This will adjust the weight matrix until the network is trained.

net = new KohonenNetwork(inputNeuron,outputNeuron,this);
net.setTrainingSet(set);
net.learn();

The learn method loops with no fixed iteration count, stopping only when the error becomes acceptable. Because this program has only one sample per output neuron, it's unlikely to take more than one iteration. When the number of training samples matches the output neuron count, training occurs very quickly.

n_retry = 0 ;
for ( iter=0 ; ; iter++ ) {

A method, "evaluateErrors", is called to evaluate how well the current weights are working. This is determined by looking at how well the training data spreads across the output neurons. If many output neurons are activated for the same training pattern, then the weight set is not a good one. An error rate is calculated, based on how well the training sets are spreading across the output neurons.

evaluateErrors ( rate , learnMethod, won ,
    bigerr , correc , work ) ;

Once the error is determined, we must see if it is below the best error we've seen so far. If it is, this error is copied to the best error, and the neuron weights are also preserved.

totalError = bigerr[0] ;

if ( totalError < best_err ) {
best_err = totalError ;
copyWeights ( bestnet , this ) ;
}

The total number of winning neurons is then calculated, allowing us to determine whether too few output neurons were activated. In addition, if the error is below the accepted quit error (10%), the training stops.

winners = 0 ;
for ( i=0;i<won.length;i++ )
    if ( won[i]!=0 )
        winners++;
if ( bigerr[0] < quitError )
    break ;

If there is not an acceptable number of winners, one neuron is forced to win.

if ( (winners <outputNeuronCount) &&
     (winners <train.getTrainingSetCount()) ) {
    forceWin ( won ) ;
    continue ;
}

Now that the first weight matrix has been evaluated, it's adjusted based on its error. The adjustment is slight, based on the correction that was calculated when the error was determined. This two-step process of evaluating the error and adjusting the weight matrix continues until the error falls below 10%.

adjustWeights ( rate , learnMethod , won , bigcorr, correc ) ;

This is the process by which a neural network is trained. The method for adjusting the weights and calculating the error is shown in the KohonenNetwork.java file.
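Since adjustWeights is not listed in the article, the following sketch shows only the generic textbook idea behind a Kohonen-style update: nudge the winning neuron's weights a small step toward the input pattern. It is an assumption-laden illustration, not the method in KohonenNetwork.java; the class name KohonenUpdateSketch, the moveToward helper, and the learning rate of 0.5 are all hypothetical.

```java
/** Sketch: generic Kohonen-style update of one neuron's weights. */
class KohonenUpdateSketch {

    /** Moves each weight a fraction (rate) of the way toward the input value. */
    static void moveToward(double[] weights, double[] input, double rate) {
        for (int j = 0; j < weights.length; j++) {
            weights[j] += rate * (input[j] - weights[j]);
        }
    }

    /** Euclidean distance, used here only to show that the update helps. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] weights = {0.0, 0.0};
        double[] input = {1.0, -1.0};
        for (int iter = 0; iter < 5; iter++) {
            moveToward(weights, input, 0.5);
            // The distance to the input halves on every pass.
            System.out.println(distance(weights, input));
        }
    }
}
```

A small rate makes each adjustment slight, which matches the article's description of repeating the evaluate-and-adjust cycle until the error is low enough.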

Conclusion
The example presented here is very modular. The neural network Java files contained in this example are KohonenNetwork.java, Network.java, and TrainingSet.java. These files are not specific to character recognition. The other files are responsible for the user interface and downsampling. One limitation mentioned in the article is that only one drawing can be defined per character. The underlying Kohonen network classes would easily support multiple drawings per character; this is something that could be added to the user interface with a few more classes.

Neural networks provide an efficient way of performing certain operations that would otherwise be very difficult. Consider how a character recognition program would work without neural networks. You'd likely find yourself writing complex routines that traced outlines, analyzed angles, and did other graphical analysis. Neural networks should be considered anytime complex patterns must be recognized. These patterns don't need to be graphical in nature. Any form of data that can have patterns is a candidate for a neural network solution.

More Stories By Jeff Heaton

Jeff Heaton is the author of “Programming Spiders, Bots and Aggregators in Java” by Sybex. Jeff can be contacted through his website at http://www.jeffheaton.com.


Most Recent Comments
Stratos 08/10/04 11:22:05 AM EDT

Personally, I think it is a good example of neural net application, combining theory and practice.
And those who have searched through the Internet for practical examples know how difficult it is to find one.
Well done, Jeff.
Your article has been a perfect introduction for me...

Patrick Nelson 05/24/04 05:28:14 PM EDT

Just reading your article, and also the AI section of your website.

I was confused, because you repeatedly refer to a "setTrainingSet" that doesn't exist. Or does it?

ocean 05/12/02 09:02:00 PM EDT

not bad