Jan 20, 2023
Detect Text Sentiment With A Neural Network In C#
In this article I’m going to build an app that can automatically detect the sentiment of an English text.
I actually tried this before with a 1-dimensional convolutional neural network. That approach worked quite well with a final accuracy of 86%, but unfortunately my solution started overfitting right away.
A much better way to analyze English text is by using a specialized type of recurrent neural network called an LSTM network.
All recurrent neural networks have an internal state (a type of memory) that helps them make sense of written language. But an LSTM network actually has two types of memory: long-term and short-term memory. That makes the network well-suited to process language.
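For intuition, here's a minimal sketch of a single LSTM time step in plain C#. It's a simplified scalar version with made-up weights, just to show how the cell (long-term) state and hidden (short-term) state interact; the real layer operates on vectors with separate learned weight matrices and biases:

using System;

class LstmStepSketch
{
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // one simplified LSTM step: c is the long-term (cell) state, h the short-term (hidden) state
    static (double c, double h) Step(double x, double c, double h,
        double wf, double wi, double wo, double wc) // made-up scalar weights
    {
        var forget = Sigmoid(wf * (x + h));      // how much old memory to keep
        var input = Sigmoid(wi * (x + h));       // how much new information to store
        var output = Sigmoid(wo * (x + h));      // how much memory to expose
        var candidate = Math.Tanh(wc * (x + h)); // the new candidate memory

        var cNew = forget * c + input * candidate; // update long-term memory
        var hNew = output * Math.Tanh(cNew);       // derive short-term memory
        return (cNew, hNew);
    }

    static void Main()
    {
        double c = 0, h = 0;
        foreach (var x in new[] { 0.5, -1.0, 0.25 }) // a toy input sequence
            (c, h) = Step(x, c, h, 0.8, 0.6, 0.9, 0.7);
        Console.WriteLine($"cell={c:F3}, hidden={h:F3}");
    }
}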
So how will an LSTM network do? Will it perform better than the 1-dimensional convolutional network?
Let’s find out!
I will use the same IMDB Movie Dataset again: a dataset with 25,000 positive movie reviews and 25,000 negative movie reviews. The reviews look like this:
Sweet sweet data!
The datafile is not a text file but a binary file, because the movie reviews have already been preprocessed: each word has been converted to an index number in a dictionary, the words have been sorted in reverse order, and the reviews have been padded with zeroes so that each one is exactly 500 numbers long.
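To make that concrete, here's a minimal sketch of the encoding scheme, using a made-up dictionary (the actual IMDB word indices are different):

using System;
using System.Collections.Generic;
using System.Linq;

class EncodingSketch
{
    static void Main()
    {
        // hypothetical word-to-index dictionary; the real IMDB indices differ
        var dictionary = new Dictionary<string, float>
        {
            ["this"] = 14, ["movie"] = 20, ["was"] = 16, ["great"] = 84
        };
        var review = new[] { "this", "movie", "was", "great" };
        var sequenceLength = 500;

        // look up each word, then pad with zeroes at the front to the fixed sequence length
        var indices = review.Select(w => dictionary[w]);
        var padded = Enumerable.Repeat(0f, sequenceLength - review.Length)
            .Concat(indices)
            .ToArray();

        Console.WriteLine($"Encoded length: {padded.Length}"); // 500
    }
}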
I will build an LSTM network that reads in these 500-word sequences and then predicts for each review whether it is positive or negative.
Let’s get started.
I will need to build a new application from scratch:
$ dotnet new console -o LstmDemo
$ cd LstmDemo
I will also copy the dataset file imdb_data.zip into this folder, because the code I’m going to type next will expect it here.
Now I’ll install the following packages:
$ dotnet add package CNTK.GPU
$ dotnet add package XPlot.Plotly
$ dotnet add package FSharp.Core
The CNTK.GPU library is Microsoft’s Cognitive Toolkit that can train and run deep neural networks. And XPlot.Plotly is an awesome plotting library based on Plotly. That library is designed for F#, so I also need to pull in the FSharp.Core library.
The CNTK.GPU package will train and run deep neural networks using my GPU. I’ll need an NVidia GPU and Cuda graphics drivers for this to work.
If you don’t have an NVidia GPU or suitable drivers, the library will fall back and use the CPU instead. This will work but training neural networks will take significantly longer.
CNTK is a low-level tensor library for building, training, and running deep neural networks. The code to build a deep neural network can get a bit verbose, so I’ve developed a little wrapper called CNTKUtil that will help you write code faster.
I’ll download the CNTKUtil files in a new CNTKUtil folder at the same level as my project folder, so I can create a project reference like this:
$ dotnet add reference ..\CNTKUtil\CNTKUtil.csproj
Now I’m ready to start writing code. I’ll edit the Program.cs file with Visual Studio Code and add the following code:
using System.IO.Compression;
using System;
using System.IO;
using System.Linq;
using CNTK;
using CNTKUtil;
using XPlot.Plotly;
using System.Collections.Generic;
namespace LstmDemo
{
    /// <summary>
    /// The main program class.
    /// </summary>
    public class Program
    {
        // filename of the zipped data set
        private static string dataPath = Path.Combine(Environment.CurrentDirectory, "imdb_data.zip");

        /// <summary>
        /// The main program entry point.
        /// </summary>
        /// <param name="args">The command line parameters.</param>
        static void Main(string[] args)
        {
            // check the compute device
            Console.WriteLine("Checking compute device...");
            Console.WriteLine($" Using: {NetUtil.CurrentDevice.AsString()}");

            // unpack archive
            if (!File.Exists("x_train_imdb.bin"))
            {
                ZipFile.ExtractToDirectory(dataPath, ".");
            }

            // load training and test data
            Console.WriteLine("Loading data files...");
            var sequenceLength = 500;
            var training_data = DataUtil.LoadBinary<float>("x_train_imdb.bin", 25000, sequenceLength);
            var training_labels = DataUtil.LoadBinary<float>("y_train_imdb.bin", 25000);
            var testing_data = DataUtil.LoadBinary<float>("x_test_imdb.bin", 25000, sequenceLength);
            var testing_labels = DataUtil.LoadBinary<float>("y_test_imdb.bin", 25000);

            // the rest of the code goes here...
        }
    }
}
The code uses File.Exists and ZipFile.ExtractToDirectory to extract the dataset files from the zipfile if that hasn’t been done yet. Then I call DataUtil.LoadBinary to load the training and testing data into memory. Note the sequenceLength variable that indicates that we’re working with movie reviews that have been padded to a length of 500 words.
I now have 25,000 movie reviews ready for training and 25,000 movie reviews ready for testing. Each review has been encoded with each word converted into a numerical dictionary index, and the reviews have been padded with zeroes so that they’re all 500 floats long.
Now I need to tell CNTK what shape the input data has that I’ll train the neural network on, and what shape the output data of the neural network will have:
// build features and labels
var features = NetUtil.Var(new int[] { 1 }, CNTK.DataType.Float);
var labels = NetUtil.Var(new int[] { 1 }, CNTK.DataType.Float,
    dynamicAxes: new List<CNTK.Axis>() { CNTK.Axis.DefaultBatchAxis() });

// the rest of the code goes here...
You might be surprised to see that first Var method call where I specify a tensor size of one. But remember that the LSTM network is a recurrent neural network that reads a sequence of data. During each time iteration I provide only a single sequence element to the network, and that is just one single number.
The second Var method tells CNTK that I want my neural network to output a single float value. But because this is a recurrent neural network, I have to specify that I want to use the default batch axis.
My next step is to design the neural network. I’m going to build the following network:
This network uses a single LSTM layer to process the movie reviews, and a single dense layer to classify the results into a positive or negative prediction.
Here’s how to build this neural network:
// build the network
var lstmUnits = 32;
var network = features
    .OneHotOp(10000, true)
    .Embedding(32)
    .LSTM(lstmUnits, lstmUnits)
    .Dense(1, CNTKLib.Sigmoid)
    .ToNetwork();
Console.WriteLine("Model architecture:");
Console.WriteLine(network.ToSummary());

// the rest of the code goes here...
Note how I’m first calling OneHotOp to convert each word into a one-hot encoded vector with 10,000 elements. I then call Embedding to embed these vectors in a 32-dimensional space. The call to LSTM adds an LSTM layer with 32 compute elements, and the final Dense call sets up a classifier layer with a single node using Sigmoid activation.
Then I use the ToSummary method to output a description of the architecture of the neural network to the console.
Now I need to decide which loss function to use to train the neural network, and how I am going to track the prediction error of the network during each training epoch.
For this assignment I’ll use BinaryCrossEntropy as the loss function because it’s the standard metric for measuring binary classification loss.
I will track the error with the BinaryClassificationError metric. This is the fraction of predictions (between 0 and 1) that the model gets wrong. An error of 0 means the predictions are correct all the time, and an error of 1 means the predictions are wrong all the time.
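To see exactly what these two metrics compute, here’s a minimal plain-C# sketch (not CNTK code) that evaluates binary cross-entropy and classification error on a handful of predictions:

using System;
using System.Linq;

class MetricsSketch
{
    static void Main()
    {
        var predictions = new[] { 0.9, 0.2, 0.7, 0.4 }; // sigmoid outputs of the network
        var labels = new[] { 1.0, 0.0, 0.0, 0.0 };      // ground-truth sentiment

        // binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)), averaged over the batch
        var loss = predictions.Zip(labels, (p, y) =>
            -(y * Math.Log(p) + (1 - y) * Math.Log(1 - p))).Average();

        // classification error: fraction of predictions on the wrong side of 0.5
        var error = predictions.Zip(labels, (p, y) =>
            (p >= 0.5 ? 1.0 : 0.0) == y ? 0.0 : 1.0).Average();

        Console.WriteLine($"loss={loss:F3}, error={error:F3}"); // error = 0.25
    }
}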
// set up the loss function and the classification error function
var lossFunc = CNTKLib.BinaryCrossEntropy(network.Output, labels);
var errorFunc = NetUtil.BinaryClassificationError(network.Output, labels);

// the rest of the code goes here...
Next I will need to decide which algorithm to use to train the neural network. There are many possible algorithms derived from Gradient Descent that I can use here.
For this assignment I’m going to use the AdamLearner. You can learn more about the Adam algorithm here: https://machinelearningmastery.com/adam...
// set up a learner
var learner = network.GetAdamLearner(
    learningRateSchedule: (0.001, 1),
    momentumSchedule: (0.9, 1),
    unitGain: true);

// the rest of the code goes here...
These configuration values are a good starting point for many machine learning scenarios, but you can tweak them if you like to try and improve the quality of your predictions.
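For intuition about what the AdamLearner does under the hood, here’s a minimal sketch of a single Adam update for one parameter. The learning rate and momentum match the values above; beta2 and epsilon are the usual textbook defaults, not values taken from the CNTK call:

using System;

class AdamSketch
{
    // one Adam update step for a single parameter (illustrative only)
    static double Update(double w, double grad, ref double m, ref double v, int t,
        double lr = 0.001, double beta1 = 0.9, double beta2 = 0.999, double eps = 1e-8)
    {
        m = beta1 * m + (1 - beta1) * grad;        // moving average of gradients (momentum)
        v = beta2 * v + (1 - beta2) * grad * grad; // moving average of squared gradients
        var mHat = m / (1 - Math.Pow(beta1, t));   // bias-corrected estimates
        var vHat = v / (1 - Math.Pow(beta2, t));
        return w - lr * mHat / (Math.Sqrt(vHat) + eps);
    }

    static void Main()
    {
        double w = 0.5, m = 0, v = 0;
        for (int t = 1; t <= 3; t++)               // a few steps on a fixed toy gradient
            w = Update(w, 0.2, ref m, ref v, t);
        Console.WriteLine($"w after 3 steps: {w:F4}");
    }
}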
We’re almost ready to train. My final step is to set up a trainer and an evaluator for calculating the loss and the error during each training epoch:
// set up a trainer and an evaluator
var trainer = network.GetTrainer(learner, lossFunc, errorFunc);
var evaluator = network.GetEvaluator(errorFunc);

// train the model
Console.WriteLine("Epoch\tTrain\tTrain\tTest");
Console.WriteLine("\tLoss\tError\tError");
Console.WriteLine("-----------------------------");

// the rest of the code goes here...
The GetTrainer method sets up a trainer which will track the loss and the error for the training partition. And GetEvaluator will set up an evaluator that tracks the error in the test partition.
Now I’m finally ready to start training the neural network!
I’ll add the following code:
var maxEpochs = 10;
var batchSize = 128;
var loss = new double[maxEpochs];
var trainingError = new double[maxEpochs];
var testingError = new double[maxEpochs];
var batchCount = 0;
for (int epoch = 0; epoch < maxEpochs; epoch++)
{
    // training and testing code goes here...
}

// show final results
var finalError = testingError[maxEpochs - 1];
Console.WriteLine();
Console.WriteLine($"Final test error: {finalError:0.00}");
Console.WriteLine($"Final test accuracy: {1 - finalError:0.00}");

// plotting code goes here...
I am training the network for 10 epochs using a batch size of 128. During training I’ll track the loss and errors in the loss, trainingError and testingError arrays.
Once training is done, I show the final testing error on the console. This is the fraction of mistakes the network makes when predicting review sentiment.
Note that the error and the accuracy are related: accuracy = 1 - error. So I also report the final accuracy of the neural network.
Here’s the code to train the neural network. This code should go inside the for loop:
// train one epoch on batches
loss[epoch] = 0.0;
trainingError[epoch] = 0.0;
batchCount = 0;
training_data.Batch(batchSize, (data, begin, end) =>
{
    // get the current batch
    var featureBatch = features.GetSequenceBatch(sequenceLength, training_data, begin, end);
    var labelBatch = labels.GetBatch(training_labels, begin, end);

    // train the network on the batch
    var result = trainer.TrainBatch(
        new[] {
            (features, featureBatch),
            (labels, labelBatch)
        },
        false
    );
    loss[epoch] += result.Loss;
    trainingError[epoch] += result.Evaluation;
    batchCount++;
});

// show results
loss[epoch] /= batchCount;
trainingError[epoch] /= batchCount;
Console.Write($"{epoch}\t{loss[epoch]:F3}\t{trainingError[epoch]:F3}\t");

// testing code goes here...
The Batch() call splits the data up into a collection of 128-record batches. The second argument to Batch() is a function that will be called for every batch.
Inside the batch function I first call GetSequenceBatch to get a feature batch containing 500-word sequences, and then I call GetBatch to get a corresponding label batch. Then I call TrainBatch to train the neural network on these two batches of training data.
The TrainBatch method returns the loss and error, but only for training on the 128-record batch. So I simply add up all these values and divide them by the number of batches in the dataset. That gives me the average loss and error for the predictions on the training partition during the current epoch, and I report this to the console.
So now I know the training loss and error for one single training epoch. The next step is to test the network by making predictions about the data in the testing partition and calculate the testing error.
This code should go inside the epoch loop and right below the training code:
// test one epoch on batches
testingError[epoch] = 0.0;
batchCount = 0;
testing_data.Batch(batchSize, (data, begin, end) =>
{
    // get the current batch for testing
    var featureBatch = features.GetSequenceBatch(sequenceLength, testing_data, begin, end);
    var labelBatch = labels.GetBatch(testing_labels, begin, end);

    // test the network on the batch
    testingError[epoch] += evaluator.TestBatch(
        new[] {
            (features, featureBatch),
            (labels, labelBatch)
        }
    );
    batchCount++;
});
testingError[epoch] /= batchCount;
Console.WriteLine($"{testingError[epoch]:F3}");
I again call Batch to get a batch of testing records, and GetSequenceBatch and GetBatch to get the feature and label batches. But note that I’m now providing the testing_data and testing_labels arrays.
I call TestBatch to test the neural network on the 128-record test batch. The method returns the error for the batch, and I again add up the errors for each batch and divide by the number of batches.
That gives me the average error in the neural network predictions on the test partition for this epoch.
After training completes, the training and testing errors for each epoch will be available in the trainingError and testingError arrays. Let’s use XPlot to plot the training and testing accuracy curves (computed as 1 - error) so I can check for overfitting:
// plot the accuracy graph
var chart = Chart.Plot(
    new []
    {
        new Graph.Scatter()
        {
            x = Enumerable.Range(0, maxEpochs).ToArray(),
            y = trainingError.Select(v => 1 - v),
            name = "training",
            mode = "lines+markers"
        },
        new Graph.Scatter()
        {
            x = Enumerable.Range(0, maxEpochs).ToArray(),
            y = testingError.Select(v => 1 - v),
            name = "testing",
            mode = "lines+markers"
        }
    }
);
chart.WithOptions(new Layout.Layout()
{
    yaxis = new Graph.Yaxis()
    {
        rangemode = "tozero"
    }
});
chart.WithXTitle("Epoch");
chart.WithYTitle("Accuracy");
chart.WithTitle("Movie Review Sentiment");

// save chart
File.WriteAllText("chart.html", chart.GetHtml());
This code creates a Plot with two Scatter graphs. The first one plots 1 - trainingError, which is the training accuracy, and the second one plots 1 - testingError, which is the testing accuracy.
Finally, I use File.WriteAllText to write the plot to disk as an HTML file.
I am now ready to build the app. I’ll start with the CNTKUtil library:
$ cd ../CNTKUtil
$ dotnet build -o bin/Debug/netcoreapp3.0 -p:Platform=x64
This will build the CNTKUtil project. Note how I am specifying the x64 platform because the CNTK library requires a 64-bit build.
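If you don’t want to pass -p:Platform=x64 on every build, the platform can also be set once in the project file. A minimal sketch (standard MSBuild properties; your csproj layout may differ):

<PropertyGroup>
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>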
Now I can do the same in the LstmDemo folder:
$ cd ../LstmDemo
$ dotnet build -o bin/Debug/netcoreapp3.0 -p:Platform=x64
This will build the app. Note how I’m again specifying the x64 platform.
Now I can run the app:
$ dotnet run
The app will create the neural network, load the dataset, train the network on the data, and create a plot of the training and testing errors for each epoch.
Here’s what the running app looks like in my terminal window:
Note the compute device: CNTK has correctly detected my NVidia GeForce GTX 1060 graphics adapter and is using it to train the neural network.
Also note that the LSTM network is quite large, with over 300k trainable parameters!
The app will write the training plot to disk in a new file called chart.html and it looks like this:
The accuracy curves look sort of okay. Both are a bit jittery but the curves stick close together and there doesn’t seem to be any overfitting. The final accuracy is 0.86 on training and 0.83 on testing.
This is a great result and very close to what I got earlier with my 1-dimensional convolutional network. And this time the network is not overfitting, so I could train for more epochs to try and get a better result.
Deep Learning With C# And CNTK
This code example is part of my online training course Deep Learning with C# and CNTK that teaches developers how to build machine learning applications in C# with Microsoft's CNTK deep learning library.
I made this training course after I had completed a course on Tensorflow and Python. I started wondering if it would be possible to use C# code to weave neural network layers together to create advanced deep learning applications.
After a bit of research, I discovered the Microsoft CNTK library. Despite the fact that this library has reached end of life and is no longer supported by Microsoft, it can still be used to build advanced neural architectures.
So please check out this course if you like. It will get you up to speed on deep learning with C# and CNTK, and covers regression, classification, neural networks, convolutional and recurrent networks (CNNs and RNNs), transfer learning, fine-tuning, data augmentation, and much more.