Ensembling is one approach that improves the performance of a neural network
by combining a number of independent neural networks, usually by either
averaging or summing up their individual outputs. We modify this ensembling
approach by training the sub-networks concurrently instead of independently.
This concurrent training of sub-networks leads them to cooperate with each
other, and we refer to them as a “cooperative ensemble”. Meanwhile, the
mixture-of-experts approach improves neural network performance by dividing
a given dataset among its sub-networks, called “experts”, using a gating
network that assigns a specialization to each expert. We improve on these
ways of combining a group of neural networks by using a k-Winners-Take-All
(kWTA) activation function as the combination method for the outputs of the
sub-networks in the ensemble. We refer to this proposed model as “kWTA
ensemble neural networks” (kWTA-ENN).
With the kWTA activation function, the losing neurons of the sub-networks are
inhibited while the winning neurons are retained. This results in sub-networks
having some form of specialization but also sharing knowledge with one another.
We compare our approach against the cooperative ensemble and the
mixture-of-experts, using a feed-forward neural network with one hidden layer
of 100 neurons as the sub-network architecture. Our approach outperforms the
baseline models, reaching the following test
accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST,
91.56% on KMNIST, and 95.97% on WDBC.

Improving Neural Network Performance through Cooperative Ensemble and kWTA Activation Function

Ensembling is a popular technique used to enhance the performance of neural networks by combining the outputs of multiple independent models. Traditionally, ensembling involves training these models independently and then averaging or summing their outputs. However, in this article, we introduce a novel approach called “cooperative ensemble” where the sub-networks are trained concurrently, leading to cooperation among them.

Training the sub-networks concurrently in the cooperative ensemble lets them work together: because they are optimized jointly against a shared combined output, each sub-network can leverage the others’ strengths while compensating for their weaknesses, rather than being trained in isolation and only merged at inference time.
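To make this concrete, below is a minimal PyTorch-style sketch of a cooperative ensemble. The class names `SubNetwork` and `CooperativeEnsemble` are ours, introduced only for illustration, and the hidden-layer sizes are placeholders. The key point is that the loss is computed on the averaged output, so a single backward pass updates every sub-network at once, which is what makes the training concurrent rather than independent.

```python
import torch
import torch.nn as nn


class SubNetwork(nn.Module):
    """A small feed-forward sub-network with a single hidden layer."""

    def __init__(self, in_features, hidden, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)


class CooperativeEnsemble(nn.Module):
    """Averages the sub-networks' logits; a single loss on the averaged
    output back-propagates through every member, so all sub-networks are
    trained concurrently rather than independently."""

    def __init__(self, num_members, in_features, hidden, num_classes):
        super().__init__()
        self.members = nn.ModuleList(
            SubNetwork(in_features, hidden, num_classes)
            for _ in range(num_members)
        )

    def forward(self, x):
        # Stack member outputs: (num_members, batch, num_classes), then average.
        outputs = torch.stack([m(x) for m in self.members], dim=0)
        return outputs.mean(dim=0)
```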

The second approach we build on is the traditional mixture-of-experts, in which the dataset is divided among the sub-networks, known as “experts”, each of which specializes in a certain part of the data. A gating network assigns this specialization: for every input, it decides how much each expert contributes, and therefore how the data are distributed among the experts.
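For comparison, here is an equally minimal sketch of a dense (soft-gated) mixture-of-experts. It reuses the `SubNetwork` class from the previous sketch; the soft-gating formulation is an assumption made for illustration and may differ from the exact gating used in the paper.

```python
class MixtureOfExperts(nn.Module):
    """A gating network produces a weight for each expert per input;
    the model output is the weighted sum of the experts' outputs."""

    def __init__(self, num_experts, in_features, hidden, num_classes):
        super().__init__()
        self.experts = nn.ModuleList(
            SubNetwork(in_features, hidden, num_classes)
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(in_features, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, num_experts, num_classes)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)          # weighted combination
```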

Expanding on these ideas, we use a k-Winners-Take-All (kWTA) activation function as the method for combining the outputs of the sub-networks in the ensemble. For a given input, the kWTA function retains the winning neurons, the k strongest activations in the combined output, and inhibits the losing ones. Each sub-network thus develops a form of specialization, since only some of its neurons win for any particular input, while knowledge is still shared because the winners can come from any sub-network. This combination of an activation-function mechanism with ensembling is what defines our proposed model, “kWTA ensemble neural networks” (kWTA-ENN).
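The sketch below shows one plausible way to implement this, assuming the sub-networks’ outputs are summed before kWTA is applied and that “inhibiting” a losing neuron means zeroing it out. The `kwta` helper, the `KWTAEnsemble` class, and the choice of summation are our assumptions for illustration, not necessarily the paper’s exact formulation.

```python
def kwta(x, k):
    """Keep the k largest activations in each row and zero out the rest.
    Zeroing is one common way to 'inhibit' the losing neurons."""
    topk_values, _ = torch.topk(x, k, dim=-1)
    threshold = topk_values[..., -1:]  # value of the k-th largest activation
    return torch.where(x >= threshold, x, torch.zeros_like(x))


class KWTAEnsemble(nn.Module):
    """Combines the sub-networks' outputs (here by summation, an assumption
    made for illustration) and applies kWTA, so only the winning neurons of
    the combined output contribute to the prediction."""

    def __init__(self, num_members, in_features, hidden, num_classes, k):
        super().__init__()
        self.members = nn.ModuleList(
            SubNetwork(in_features, hidden, num_classes)
            for _ in range(num_members)
        )
        self.k = k

    def forward(self, x):
        combined = torch.stack([m(x) for m in self.members], dim=0).sum(dim=0)
        return kwta(combined, self.k)
```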

In our experiments, we compared the performance of our kWTA-ENN model with both the cooperative ensemble and mixture-of-experts approaches, using a feed-forward neural network with one hidden layer of 100 neurons as the sub-network architecture. Our approach achieved higher test accuracies than both baselines on the benchmark datasets.
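As a usage illustration, a training loop for such an ensemble on flattened 28x28 MNIST images might look like the sketch below. The hyperparameters (number of members, k, learning rate, batch size) are placeholders, not the paper’s settings.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Standard MNIST loader; values below are illustrative placeholders.
train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

model = KWTAEnsemble(num_members=4, in_features=784, hidden=100,
                     num_classes=10, k=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for images, labels in train_loader:
    logits = model(images.view(images.size(0), -1))  # flatten each 28x28 image
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```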

On the MNIST dataset, our kWTA-ENN model achieved an impressive test accuracy of 98.34%. Similarly, on the Fashion-MNIST dataset, it attained 88.06% accuracy. The performance on the KMNIST dataset was also noteworthy at 91.56%. Finally, our model achieved a test accuracy of 95.97% on the WDBC dataset.

In conclusion, combining concurrent (cooperative) ensemble training with the kWTA activation function proves effective in improving neural network performance, outperforming both the cooperative ensemble and mixture-of-experts baselines. With its ability to balance specialization and knowledge sharing among sub-networks, the kWTA-ENN model holds promise for a wide range of machine learning applications.

Read the original article