To reduce the size of datasets generated from scientific experiments, computer programmers use algorithms that find and extract the principal features, those that capture the data's most salient statistical properties. Such algorithms, however, cannot be applied directly to very large volumes of data.
Doctoral student Reza Oftadeh, who is advised by Dr. Dylan Shell, developed an algorithm applicable to large datasets. It is a useful machine-learning tool because it can extract and directly order features from most salient to least.
“There are many ad hoc ways to extract these features using machine-learning algorithms, but we now have a fully rigorous theoretical proof that our model can find and extract these prominent features from the data simultaneously, doing so in one pass of the algorithm,” said Oftadeh.
To make a more intelligent algorithm, the researchers propose adding a new cost function to the network, one that provides the exact location of the features, directly ordered by their relative importance. Once this cost function is incorporated, their method processes data more efficiently and can be fed bigger datasets to perform classic data analysis.
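The article does not spell out the researchers' cost function, so the sketch below is not their method. As a point of reference for what "features ordered from most salient to least" means, here is a minimal example of classic principal component analysis computed with a singular value decomposition, assuming NumPy is available. The function name `ordered_principal_features` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def ordered_principal_features(data, k):
    """Return the top-k principal directions and their variances.

    Classic PCA via SVD (an illustrative stand-in, not the researchers'
    algorithm). Rows of the returned matrix are directions in feature
    space, ordered from most to least salient.
    """
    # Center each feature so the SVD captures variance, not the mean.
    centered = data - data.mean(axis=0)
    # NumPy returns singular values in descending order, so the rows
    # of vt are already sorted by how much variance they explain.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values**2 / (data.shape[0] - 1)
    return vt[:k], variances[:k]


# Synthetic data (an assumption for illustration): 500 samples whose
# three features have standard deviations of roughly 5, 2, and 0.5.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

components, variances = ordered_principal_features(data, k=2)
```

Here `components[0]` points along the direction of greatest variance and `components[1]` along the next. This full decomposition is the kind of classic analysis that becomes costly on very large datasets, which is the limitation the article describes.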
This research was funded by the National Science Foundation and a U.S. Army Research Office Young Investigator Award.