Since sequences can represent such diverse data, the subsequence matching problem is relevant for several interesting applications. An example is sign spotting. Here, the input sequence is a video of a specific sign from some sign language, e.g., American Sign Language (ASL). The database contains sign language videos. The goal is to identify spots in the database videos where the input sign occurs.
Another example application is query-by-humming. Imagine having a melody in your mind without remembering (or ever having known) what song it is from. In a query-by-humming system, the user provides that melody as input by simply humming it into the computer's microphone. The system contains a large database of melodies, and it identifies the database melodies that contain parts similar to what the user hummed.
In this project, our goal was to design methods, some generally applicable and some tailored to more specific types of data, that can improve the state of the art in subsequence matching. Improving the state of the art means improving both accuracy and efficiency in such systems, so that the system identifies the correct results more often and delivers those results to the user faster.
We have introduced an embedding-based framework for subsequence matching in time-series databases that improves the efficiency of processing subsequence matching queries under the Dynamic Time Warping (DTW) distance measure. This framework partially reduces subsequence matching to vector matching, using an embedding that maps each query sequence to a vector and each database time series to a sequence of vectors. The database embedding is computed offline, as a preprocessing step. At runtime, given a query object, an embedding of that object is computed online. Relatively few areas of interest are efficiently identified in the database sequences by comparing the embedding of the query with the database vectors. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm.
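The refinement step, exact DTW-based subsequence matching, can be expressed as a dynamic program with a free start position. The sketch below is a minimal illustration for scalar time series with an absolute-difference local cost, not the full framework (it omits the embedding-based filtering step):

```python
def subsequence_dtw(query, series):
    # D[i][j]: cost of the best warping of query[:i+1] onto a
    # subsequence of series ending at position j. The match may
    # start anywhere, so row 0 carries no accumulated prefix cost.
    m, n = len(query), len(series)
    INF = float("inf")
    D = [[INF] * n for _ in range(m)]
    for j in range(n):
        D[0][j] = abs(query[0] - series[j])
    for i in range(1, m):
        for j in range(n):
            best = D[i - 1][j]                    # repeat series element
            if j > 0:
                best = min(best,
                           D[i][j - 1],           # repeat query element
                           D[i - 1][j - 1])       # advance both
            D[i][j] = abs(query[i] - series[j]) + best
    end = min(range(n), key=lambda j: D[m - 1][j])
    return D[m - 1][end], end  # cost and end position of best match
```

A traceback from the end position would additionally recover the start of the matching subsequence.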
We have developed a method called reference-based string alignment for efficient subsequence matching of strings, under the edit distance and the Smith-Waterman similarity measure. Our method operates under the assumption that the query and its optimal subsequence match do not differ by more than 15% (as a fraction of the query length). We have obtained state-of-the-art indexing performance, as measured on a DNA dataset where the database contains a sequence of over 23 million letters. Our method outperforms BLAST for query lengths ranging from 40 to 10,000, and the improvement is about two orders of magnitude for query lengths of 10,000.
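For reference, the Smith-Waterman similarity measure itself is computed with a simple quadratic dynamic program for local alignment. The scoring parameters below (match +2, mismatch -1, gap -1) are purely illustrative, not the ones used in our experiments:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1].
    # Scores are clamped at 0, which is what makes the alignment local.
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match / substitution
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best
```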
We have developed a novel subsequence matching method that can be used for identifying occurrences of gestures and signs in video databases and in streaming video. The key novelty of this method is that it does not require accurate hand detection. Instead of assuming that a hand detector module detects the hands correctly at each frame, we make the milder assumption that the hand detector outputs, at each frame, a set of candidate hand locations (e.g., 10 or 20 candidate locations). Feature vectors are extracted from each of those locations. Given a gesture model, our method identifies the sequence of candidate feature vectors (selecting one candidate from each frame) that best matches the model. Although the number of sequences of candidate feature vectors is exponential in the length of the video, our algorithm finds the best matching sequence in polynomial time, using dynamic programming.
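A simplified sketch of such a dynamic program is shown below. For illustration it assumes a frame-synchronous model (one model frame per video frame, no time warping) and hypothetical `dist` and `trans` cost functions supplied by the caller; the full method also handles time warping:

```python
def best_candidate_sequence(model, candidates, dist, trans):
    # Viterbi-style DP: pick one candidate per frame so that the total
    # observation cost (distance to the model frame) plus transition
    # cost between consecutive picks is minimized.
    #   model:      list of T model feature vectors
    #   candidates: list of T lists of candidate feature vectors
    T = len(model)
    cost = [dist(model[0], c) for c in candidates[0]]
    back = []
    for t in range(1, T):
        new_cost, choices = [], []
        for c in candidates[t]:
            prev = min(range(len(candidates[t - 1])),
                       key=lambda k: cost[k] + trans(candidates[t - 1][k], c))
            choices.append(prev)
            new_cost.append(cost[prev]
                            + trans(candidates[t - 1][prev], c)
                            + dist(model[t], c))
        cost = new_cost
        back.append(choices)
    # Trace back the best sequence of candidate indices.
    k = min(range(len(cost)), key=lambda i: cost[i])
    total = cost[k]
    path = [k]
    for choices in reversed(back):
        k = choices[k]
        path.append(k)
    return list(reversed(path)), total
```

The running time is polynomial (linear in the number of frames, quadratic in the number of candidates per frame), even though the number of possible candidate sequences is exponential.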
We have defined a novel metric measure for time series, called MSM (move-split-merge). This metric uses as building blocks three fundamental operations: Move, Split, and Merge, which can be applied in sequence to transform any time series into any other time series. A Move operation changes the value of a single element, a Split operation converts a single element into two consecutive elements, and a Merge operation merges two consecutive elements into one. Each operation has an associated cost, and the MSM distance between two time series is defined to be the cost of the cheapest sequence of operations that transforms the first time series into the second one.
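The MSM distance can be computed with a quadratic dynamic program. The sketch below assumes a single cost parameter c shared by Split and Merge, with Move costing the absolute difference between the old and new values:

```python
def msm_distance(x, y, c=1.0):
    # Cost of a Split/Merge that introduces value `new` next to its
    # neighbor `xi` in the same series, aligned against `yj` in the
    # other series: c alone when `new` lies between the two neighbors,
    # otherwise c plus the distance to the nearer neighbor.
    def split_merge_cost(new, xi, yj):
        if min(xi, yj) <= new <= max(xi, yj):
            return c
        return c + min(abs(new - xi), abs(new - yj))

    m, n = len(x), len(y)
    D = [[0.0] * n for _ in range(m)]
    D[0][0] = abs(x[0] - y[0])
    for i in range(1, m):
        D[i][0] = D[i - 1][0] + split_merge_cost(x[i], x[i - 1], y[0])
    for j in range(1, n):
        D[0][j] = D[0][j - 1] + split_merge_cost(y[j], x[0], y[j - 1])
    for i in range(1, m):
        for j in range(1, n):
            D[i][j] = min(
                D[i - 1][j - 1] + abs(x[i] - y[j]),                    # Move
                D[i - 1][j] + split_merge_cost(x[i], x[i - 1], y[j]),  # Split/Merge in x
                D[i][j - 1] + split_merge_cost(y[j], x[i], y[j - 1]))  # Split/Merge in y
    return D[m - 1][n - 1]
```

For example, transforming [4, 4, 4] into [4] requires two Merge operations, so with c = 1 the distance is 2.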
We have defined a property of distance measures that we call "consistency", which is satisfied by a diverse set of metrics, such as the Euclidean distance, ERP, the Levenshtein distance, and the Fréchet distance. We have proposed an indexing method that can be applied on top of any consistent metric. An attractive feature of this method is that it can be applied on top of several different metrics, including metrics that may be proposed in the future, as long as they satisfy the consistency property. This is in contrast to several existing methods (including some of our own) that work only with specific distance measures.
We have developed a computer vision method for detecting falls in videos captured using a depth camera (such as the popular Kinect camera from Microsoft). Fall detection can be seen as a specific application within the subsequence matching framework, as it involves identifying time series subsequences of interest (i.e., corresponding to falls) within a larger sequence (the entire video), where the start, end, and length of these subsequences of interest are not known a priori. The key distinguishing features of our method are that it requires only a single camera and that it does not require the training examples to be captured from the same viewpoint as the test sequences. For example, in our method, moving the camera or turning it to a somewhat different viewpoint does not require capturing new training data or recalibrating our system in any way. This is in contrast to several existing vision-based methods, where moving a camera often requires time-consuming recalibration of the system (which non-technical users cannot perform) and often also requires capturing new training data from the new camera position and viewpoint.