By Yohei Kawaguchi
Research & Development Group, Hitachi, Ltd.
A shortage of skilled workers has become a serious societal issue, creating a need for automated machinery maintenance. Sound-monitoring systems that check a machine's operational condition ("machine health" hereafter) are a promising way to support or replace skilled maintenance technicians, who listen to the sounds a machine makes and judge its overall condition. To be practical, such a system needs an "anomalous sound detection" technology that reliably distinguishes anomalous sounds from normal ones.
In this post, we present a method for detecting anomalous sounds in real environments such as factories. In such environments, the observed sound signal contains reverberation and background noise, which degrade detection performance. The detection method must therefore be robust against both. Two types of architecture can provide that robustness: "modular architecture" and "end-to-end architecture." A modular architecture consists of front-end modules and back-end modules that work independently: the front-end modules perform acoustic-signal-processing procedures such as "denoising," while the back-end modules perform classification. An end-to-end architecture is a single deep neural network with high representation power that can be optimized as a whole by using training data. However, it is unsuitable for sound-monitoring systems because a sufficient amount of training data cannot be obtained. In addition, assuring the quality of an end-to-end architecture is more difficult than assuring the quality of a modular architecture. For these reasons, the modular architecture is the more suitable choice for anomalous sound detection.
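The separation between front-end and back-end modules can be illustrated with a minimal Python sketch. All names here (subtract_noise_floor, distance_to_normal, build_detector) are hypothetical, and both modules are deliberately simple stand-ins: the front-end is a plain spectral subtraction with a known noise floor, and the back-end is a squared distance to a "normal" template rather than a trained classifier. It is not the implementation used in our system; it only shows how independently built modules are chained.

```python
from typing import Callable, List, Sequence

Spectrum = List[float]


def subtract_noise_floor(noise_floor: Sequence[float]) -> Callable[[Sequence[float]], Spectrum]:
    """Hypothetical front-end: subtract a known noise floor from a magnitude spectrum."""
    def front_end(spectrum: Sequence[float]) -> Spectrum:
        # Clip at zero so the cleaned magnitudes stay non-negative.
        return [max(x - n, 0.0) for x, n in zip(spectrum, noise_floor)]
    return front_end


def distance_to_normal(normal_template: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Hypothetical back-end: squared distance to a template of the normal sound."""
    def back_end(spectrum: Sequence[float]) -> float:
        return sum((x - t) ** 2 for x, t in zip(spectrum, normal_template))
    return back_end


def build_detector(front_end, back_end, threshold: float) -> Callable[[Sequence[float]], bool]:
    """Chain a front-end and a back-end; each can be replaced or tested on its own."""
    def detect(spectrum: Sequence[float]) -> bool:
        return back_end(front_end(spectrum)) > threshold
    return detect


# Toy data (assumed values, for illustration only).
detector = build_detector(
    subtract_noise_floor([1.0, 1.0, 1.0]),
    distance_to_normal([2.0, 0.0, 1.0]),
    threshold=1.0,
)
normal_is_flagged = detector([3.0, 1.0, 2.0])     # normal sound plus noise floor
anomalous_is_flagged = detector([3.0, 4.0, 2.0])  # extra energy in the second bin
```

Because the two modules only share the spectrum format, either one can be swapped or verified separately, which is the quality-assurance advantage described above.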
To cope with reverberation and background noise, our proposed architecture uses a front-end ensemble consisting of a blind-dereverberation (BD) algorithm and several anomalous-sound-extraction (ASE) algorithms (see Figure 1). Dereverberation and denoising can each be expected to improve detection performance. However, every algorithm has its own strengths and weaknesses, so no single front-end algorithm can be chosen as the "best" in all conditions. We therefore apply an ensemble of multiple front-end algorithms. For BD, we use the algorithm proposed by Togami et al. *1, which does not rely on training and is therefore suitable for unknown anomalous sounds. For ASE, we run multiple algorithms in parallel, based on non-negative matrix factorization (NMF) *2, non-negative matrix underapproximation (NMU) *3, and non-negative novelty extraction (NNE) *4, so that they complement each other.
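To give a rough intuition for NMF-based extraction, the following Python sketch holds a set of spectral bases for the normal sound fixed, fits non-negative activations to an observed frame with the standard multiplicative update for the Euclidean NMF objective, and treats whatever the normal bases cannot explain as the extracted component. This is a simplified illustration under our own assumptions, not the algorithms of *2, *3, or *4; the function names and the toy bases are hypothetical.

```python
def fit_activations(bases, frame, n_iter=50, eps=1e-12):
    """Fit non-negative activations h so that sum_k h[k] * bases[k] approximates frame.

    The bases are held fixed; only h is updated, using the multiplicative
    update rule for the Euclidean NMF objective.
    """
    n_bins = len(frame)
    h = [1.0] * len(bases)
    for _ in range(n_iter):
        recon = [sum(hk * b[f] for hk, b in zip(h, bases)) for f in range(n_bins)]
        h = [
            hk * sum(b[f] * frame[f] for f in range(n_bins))
            / (sum(b[f] * recon[f] for f in range(n_bins)) + eps)
            for hk, b in zip(h, bases)
        ]
    return h


def extract_anomalous_component(bases, frame):
    """Return the non-negative part of the frame that the normal bases do not explain."""
    h = fit_activations(bases, frame)
    recon = [sum(hk * b[f] for hk, b in zip(h, bases)) for f in range(len(frame))]
    return [max(x - r, 0.0) for x, r in zip(frame, recon)]


# Toy example (assumed values): two bases for the normal sound, four frequency bins.
NORMAL_BASES = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
normal_frame = [2.0, 3.0, 2.0, 3.0]     # exactly representable by the normal bases
anomalous_frame = [2.0, 3.0, 7.0, 3.0]  # extra energy in the third bin

normal_residual = extract_anomalous_component(NORMAL_BASES, normal_frame)
anomalous_residual = extract_anomalous_component(NORMAL_BASES, anomalous_frame)
```

In this toy case the residual of the normal frame is practically zero, while the anomalous frame leaves a clear residual in the third bin. Note that the normal bases also absorb part of the anomalous energy, which is one reason why different extraction constraints (as in NMU and NNE) can behave differently and complement each other.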
We experimentally evaluated the improvement obtained by applying the proposed method to anomaly detection for automated machines that repeat a fixed series of tasks (see Figure 2). Table 1 lists the "area under the curve" (AUC) for each autoencoder and each ensemble. In the table, the labels (A) to (H) correspond to the autoencoders in Figure 1. Comparing "w/o BD" with "w/ BD" in Table 1 reveals that BD improves the performance of anomalous sound detection. Likewise, comparing "w/o ASE" with the other ASE rows in Table 1 reveals that the ASE algorithms also improve the performance. In addition, when the input signal-to-noise ratio (SNR) is -10 dB or -15 dB, the AUCs of the proposed ensemble are higher than those of the best single combination of BD and ASE, whereas those of the ensembles (A)-(H), (E)-(H), and (B)-(D) are relatively low. These results indicate that the front-end algorithms in the proposed ensemble complement each other in noisy cases.
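For readers unfamiliar with the metric, the sketch below computes the AUC as the probability that a randomly chosen anomalous clip receives a higher anomaly score than a randomly chosen normal clip, and combines detectors by averaging their scores. The averaging rule, the function names, and all numbers are assumptions made for illustration; they are neither the combination rule of our system nor the data behind Table 1.

```python
def auc(scores, labels):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_scores(score_lists):
    """Hypothetical ensemble rule: average the per-clip scores of several detectors."""
    return [sum(clip_scores) / len(clip_scores) for clip_scores in zip(*score_lists)]


# Toy scores (made up): 0 = normal clip, 1 = anomalous clip.
labels = [0, 0, 0, 1, 1, 1]
scores_a = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3]  # detector A misranks clips 3 and 6
scores_b = [0.2, 0.7, 0.1, 0.4, 0.8, 0.9]  # detector B misranks clips 2 and 4

auc_a = auc(scores_a, labels)
auc_b = auc(scores_b, labels)
auc_ensemble = auc(average_scores([scores_a, scores_b]), labels)
```

In this constructed example the two detectors make different mistakes, so the averaged score separates the two classes better than either detector alone. Real detectors are not guaranteed to behave this way; the example only illustrates what "complementing each other" means in terms of AUC.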
Our results indicate that the proposed method, which combines the BD algorithm, the ASE algorithms, and an ensemble architecture, significantly improves detection performance and mitigates the problem of reverberation and background noise. The proposed method, namely an ensemble of dereverberation and anomalous sound extraction, is therefore promising for making anomalous sound detection more practical.
*1 M. Togami, Y. Kawaguchi, R. Takeda, Y. Obuchi, and N. Nukaga, "Optimized speech dereverberation from probabilistic perspective for time varying acoustic transfer function," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1369–1380, July 2013.
*2 D.D. Lee and H.S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, Oct. 1999.
*3 M. Tepper and G. Sapiro, "Nonnegative matrix underapproximation for robust multiple model fitting," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nov. 2017, pp. 655–663.
*4 Y. Kawaguchi, T. Endo, K. Ichige, and K. Hamada, "Non-negative novelty extraction: A new non-negativity constraint for NMF," in Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Sept. 2018, pp. 256–260.