Theoretical work

Over the last three years the main direction of my research has been the investigation of overfitting phenomena and the estimation of the amount of information contained in the feature description of an object. Below is a short description of the results obtained so far and some directions for future work.

Convex stabilizer. It is clear that using several classifiers to solve a pattern recognition task increases the reliability of the result and makes the final algorithm less prone to overfitting. Since a direct estimate of the degree of overfitting of each method is unavailable, another way of detecting overtraining is needed. The gradient of the posterior probability estimate is proposed as a possible means. The aggregation scheme is required to classify all objects of the control sample correctly and to have maximum stability at the current point. This leads to a convex combination of the estimates returned by the initial algorithms, where the coefficients of the combination depend on the location of the given object with respect to the objects of the control sample, on the local effectiveness of the corresponding algorithm, and on its local stability. Detailed descriptions in English and Russian can be downloaded here.
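To make the idea of object-dependent convex weights concrete, here is a minimal Python sketch. It assumes scikit-learn-style base models with predict and predict_proba, and it weights each base classifier only by its accuracy on the nearest control objects; the gradient-based local stability term described above is omitted, so this is an illustration rather than the actual convex stabilizer.

    import numpy as np

    def convex_combination_predict(x, base_models, X_control, y_control, k=15):
        """Combine base classifiers with object-dependent convex weights.

        Simplified sketch: the weight of each base classifier is its accuracy
        on the k control objects nearest to x, normalized so that the weights
        are nonnegative and sum to one.  The full scheme also uses the local
        stability (gradient of the posterior estimate), which is omitted here.
        """
        # Find the k control objects closest to the object being classified.
        idx = np.argsort(np.linalg.norm(X_control - x, axis=1))[:k]
        # Local effectiveness of each base classifier on those objects.
        weights = np.array(
            [np.mean(m.predict(X_control[idx]) == y_control[idx]) for m in base_models]
        )
        if weights.sum() > 0:
            weights = weights / weights.sum()
        else:
            weights = np.full(len(base_models), 1.0 / len(base_models))
        # Convex combination of the posterior probability estimates.
        probs = np.array([m.predict_proba(x[None, :])[0] for m in base_models])
        return weights @ probs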

Software system for designing fuzzy expert systems. Detailed information about it can be found here.

Universal system of pattern recognition and forecasting. This work has been carried out since 2002 by a team of researchers from the Computing Centre of the Russian Academy of Sciences and the firm Solutions Ltd., under the direction of academician Yuri Zhuravlev and Dr. Vladimir Ryazanov. The main goal is to unite the different approaches to pattern recognition that exist at present in one unified software system. Special attention was paid to different schemes of classifier fusion, which make it possible to combine the best properties of each algorithmic family and obtain the best possible result. The work on this project is now nearly finished. More detailed information can be found on the site of Solutions Ltd. This work was supported by the Russian Foundation for Basic Research (grants 02-01-08007, 02-01-00558, 03-01-00580) and the Foundation for Assistance to Small Innovative Enterprises (contract 1680p/3566).

Regularization of machine learning procedures. The task of training a pattern recognition algorithm is ill-posed from the mathematical point of view. In particular, the absence of requirements on the uniqueness and stability of the solution leads to unstable classification and overfitting. At the same time, mathematics offers well-known methods for solving such ill-posed problems. Regularization methods allow one to modify the problem so that it becomes well-posed. The obtained solution, generally speaking, will not be strictly correct from a theoretical point of view, but it will be significantly more valuable in practice. It seems obvious that when we train a pattern recognition algorithm, we may tolerate the misclassification of some part of the objects if the resulting solution has better generalization capability. By changing the quality functional (adding a regularizer to it), we relax the demand to minimize the error rate on the training sample and also pay attention to stable classification, in the sense of robustness to small changes of the objects' coordinates.
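As an illustration of adding a stability term to the quality functional, here is a small Python sketch for a logistic model. The penalty weight lam and the perturbation size eps are illustrative assumptions, and the exact form of the regularizer differs from the one studied in my work.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def regularized_loss(w, X, y, lam=0.1, eps=1e-2, rng=None):
        """Log-loss on the training sample plus a stability penalty.

        The penalty measures how much the predicted probabilities change when
        the objects' coordinates are perturbed slightly; lam and eps are
        illustrative knobs, not values taken from the original work.
        """
        rng = np.random.default_rng(0) if rng is None else rng
        p = sigmoid(X @ w)
        log_loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        # Stability term: sensitivity of the outputs to small input perturbations.
        X_pert = X + eps * rng.standard_normal(X.shape)
        instability = np.mean((sigmoid(X_pert @ w) - p) ** 2)
        return log_loss + lam * instability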
In 2004 I carried out some research together with Dmitry Kropotov and Igor Tolstov, a fourth-year student at Moscow State University, on regularizing the decision tree method. The experiments showed that regularization based on the stability principle in many cases outperforms alternative ways of controlling generalization. In the near future I plan to use the concept of Bayesian regularization (in the spirit of the Relevance Vector Machine), and consequently the idea of stability with respect to the classifier's parameters, for further study. I would also like to extend the maximum evidence principle to other algorithmic families (e.g., to use it for kernel selection in SVM). Some of the results already achieved can be found in the "Articles" section.
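For comparison, the sketch below shows a standard, purely complexity-based way of regularizing decision trees (cost-complexity pruning in scikit-learn). It is not the stability-based regularizer mentioned above; it is only one of the alternative generalization-control baselines against which such a regularizer can be compared.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Toy data standing in for a real recognition task.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # ccp_alpha penalizes the size of the tree rather than its stability,
    # so this is only a baseline, not the method discussed in the text.
    for alpha in (0.0, 0.005, 0.02):
        tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
        score = cross_val_score(tree, X, y, cv=5).mean()
        print(f"ccp_alpha={alpha:.3f}  cv accuracy={score:.3f}")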

Probabilistic filtration of signals. One of the most important tasks arising in digital signal processing is signal filtration, i.e. the detection and removal of noise with minimal distortion of the useful part of the signal. Most low-pass filtration methods rely on some kind of smoothing of the signal, which distorts local peculiarities that are sometimes the most important for further analysis. Probabilistic filtration, based on estimating the likelihood of the points' presence in the signal, treats the signal as a random process and allows this distortion to be avoided. The parts of the signal identified as noise are removed from further consideration, and the filtered signal is reconstructed there by interpolation. The parts that are free of noise remain unchanged and hence keep all their details. The method was developed and successfully applied to drug intoxication detection from pupillograms (signals showing the size of the pupil after a light flash). The work was done in collaboration with Iritech Inc., which kindly allowed part of the obtained results to be published.
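A rough Python sketch of this scheme is given below: samples whose deviation from a local model is too unlikely are flagged as noise and rebuilt by interpolation, while the remaining samples are left untouched. The local-median model, window size, and threshold are simplifying assumptions and do not reproduce the actual likelihood estimates used in the pupillogram work.

    import numpy as np

    def filter_signal(t, x, window=11, z_thresh=4.0):
        """Flag low-likelihood samples and rebuild them by interpolation.

        Each sample is compared with a local median; samples whose robust
        z-score exceeds z_thresh are treated as noise and re-estimated by
        linear interpolation, while the remaining samples are kept unchanged.
        """
        half = window // 2
        padded = np.pad(x, half, mode="edge")
        local_median = np.array(
            [np.median(padded[i:i + window]) for i in range(len(x))]
        )
        resid = x - local_median
        mad = np.median(np.abs(resid)) + 1e-12
        noise = np.abs(resid) / (1.4826 * mad) > z_thresh   # robust z-score
        clean = ~noise
        filtered = x.copy()
        # Reconstruct only the noisy samples; clean samples keep all details.
        filtered[noise] = np.interp(t[noise], t[clean], x[clean])
        return filtered, noise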

Estimation of the minimal possible rate of classification errors. This is one of the directions of my future work. Currently, alas, there is only a conceptual basis for further research. Presumably, for any recognition task there exists a maximal possible level of correct recognition: you cannot jump over your own head. If the features are not informative enough, we fundamentally cannot achieve 100% correct answers on objects drawn from the universal set. A simple example: if we try to separate two one-dimensional normal samples with means 0 and 2 and unit variance, we will not recognize more than about 84% of the objects correctly under any conditions. Unfortunately, the methods of mathematical statistics require many prior assumptions which cannot be checked in practice when the samples are small, the dimension is high, and so on. At the same time, practical experience shows that using sufficiently rich families of classifiers of different natures leads to approximately the same level of errors. This is indirect confirmation that for any task the approximate error rate can be estimated independently of the algorithms that will be used. I think methods of information theory could help here. If you have any thoughts on this subject, I would be very grateful if you shared them with me.
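The two-Gaussian example can be verified directly. The short Python snippet below computes the Bayes error for equiprobable classes N(0, 1) and N(2, 1), assuming the standard likelihood-ratio decision rule, which here reduces to a threshold at the midpoint x = 1.

    from scipy.stats import norm

    # Two equiprobable classes: N(0, 1) and N(2, 1).  The Bayes-optimal rule
    # compares the likelihoods, i.e. thresholds at x = 1 (midpoint of the means).
    threshold = 1.0
    error = norm.cdf(threshold - 2.0)      # P(x < 1 | class with mean 2)
    # By symmetry this also equals P(x > 1 | class with mean 0).
    accuracy = 1.0 - error
    print(f"Bayes error = {error:.3f}, best possible accuracy = {accuracy:.3f}")
    # Prints roughly 0.159 and 0.841: about 84% correct answers is the ceiling,
    # no matter which classifier is used.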
