Video content analysis through active learning

AUTHOR(S)
PUBLICATION DATE

2007

ABSTRACT

Advances in compression techniques, the decreasing cost of storage, and high-speed transmission have made it easier to create, store and distribute videos. As a consequence, videos are now used in many application areas. The growing amount of video data deployed in today's applications not only reveals the importance of video as a multimedia data type but has also created the need for efficient management of video data. This need has opened new research areas, such as the indexing and retrieval of video with respect to its spatio-temporal, visual and semantic contents. This thesis presents work towards a unified framework for semi-automated video indexing and interactive retrieval. To create an efficient index, a set of representative key frames that capture and encapsulate the entire video content is selected. This is achieved by first segmenting the video into its constituent shots and then selecting an optimal number of frames between the identified shot boundaries. We first developed an automatic segmentation (shot boundary detection) algorithm; to avoid hand-tuned parameters and thresholds, we explore a supervised classification method. We adopted an SVM classifier because it can operate in very high-dimensional feature spaces (using the kernel trick) while keeping strong generalization guarantees from few training examples. We thoroughly evaluated combinations of features and kernels on the whole data set, and we evaluate the performance of our classifier with different kernel functions. Our experiments strictly follow the TRECVID evaluation protocol, and we present the results obtained for the TRECVID 2006 shot extraction task. Thanks to the kernel-based SVM classifier, our method achieves good results while handling a large number of features. The next step after segmentation is key frame extraction.
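The shot boundary pipeline described above (a feature per frame transition, a kernel classifier, a comparison of kernels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: frames are assumed to be flat lists of 8-bit gray levels, the per-transition feature is a hypothetical 4-bin histogram difference, and a kernel perceptron stands in for the SVM solver (it uses the same kernel-based decision function but does not maximize the margin; a real setup would use an SVM library). The three kernels are examples of the kind one might compare, and all names (`hist_difference`, `KernelPerceptron`, `detect_cuts`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def histogram(frame: Sequence[int], bins: int = 4, levels: int = 256) -> List[float]:
    """Normalized gray-level histogram of a frame given as a flat list of pixel values."""
    counts = [0.0] * bins
    for p in frame:
        counts[p * bins // levels] += 1.0
    return [c / len(frame) for c in counts]


def hist_difference(f1: Sequence[int], f2: Sequence[int], bins: int = 4) -> List[float]:
    """Hypothetical feature for one frame transition: per-bin absolute histogram difference."""
    return [abs(a - b) for a, b in zip(histogram(f1, bins), histogram(f2, bins))]


# Candidate kernels to compare; illustrative choices, not necessarily those of the thesis.
def linear_kernel(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def chi2_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    # Exponential chi-square kernel, often used with histogram features.
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * d)


class KernelPerceptron:
    """Kernel perceptron: a simple stand-in for an SVM solver (same kernel trick, no margin)."""

    def __init__(self, kernel: Kernel):
        self.kernel = kernel
        self.X: List[Vector] = []
        self.y: List[int] = []
        self.alpha: List[int] = []

    def decision(self, x: Vector) -> float:
        # f(x) = sum_i alpha_i * y_i * K(x_i, x), over examples with alpha_i > 0
        return sum(a * t * self.kernel(xi, x)
                   for a, t, xi in zip(self.alpha, self.y, self.X) if a)

    def fit(self, X: Sequence[Vector], y: Sequence[int], epochs: int = 100) -> "KernelPerceptron":
        self.X, self.y, self.alpha = list(X), list(y), [0] * len(X)
        for _ in range(epochs):
            mistakes = 0
            for i, (x, t) in enumerate(zip(self.X, self.y)):
                if t * self.decision(x) <= 0:  # misclassified or on the boundary
                    self.alpha[i] += 1
                    mistakes += 1
            if mistakes == 0:  # every training transition is now separated
                break
        return self

    def predict(self, x: Vector) -> int:
        return 1 if self.decision(x) > 0 else -1


def detect_cuts(frames: Sequence[Sequence[int]], clf: KernelPerceptron) -> List[int]:
    """Indices i such that a shot boundary is predicted between frames i-1 and i."""
    return [i + 1 for i in range(len(frames) - 1)
            if clf.predict(hist_difference(frames[i], frames[i + 1])) == 1]
```

Trained on a handful of labelled transitions (+1 for a cut, -1 otherwise), `detect_cuts` returns the index of the first frame of each new shot. Passing `chi2_kernel` or `linear_kernel` instead of `gaussian_kernel` to `KernelPerceptron` is how a kernel comparison would be run in this sketch.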
Key frames are selected to minimize representational redundancy while still portraying the content of each shot, i.e., an optimal number of frames is selected between the identified shot boundaries. We propose an interactive video retrieval system, RETINVID, based on RETIN, a content-based image retrieval search engine. When applied to indexing, the goal of active learning is to significantly reduce the number of key frames the user must annotate. We use active learning to aid in the semantic labeling of video databases. The learning approach proposes sample key frames of a video to the user for annotation and updates the database with the new annotations. It then uses its accumulated knowledge to propagate the labels to the rest of the database, after which it proposes new sample key frames for the user to annotate. The sample key frames are selected based on their ability to increase the knowledge gained by the system. We chose an active learning approach because of its capacity to retrieve complex categories, specifically through the use of kernel functions. Active learning can overcome the lack of training data, class imbalance and the large size of the feature vectors. We report an experiment on the TRECVID 2005 benchmark for the high-level task.
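The annotate, propagate and propose loop described above can be sketched as follows, under several assumptions: key frames are already reduced to feature vectors, the system's model is a simple Gaussian-kernel vote over the annotations rather than the classifier used in RETIN/RETINVID, and the "ability to increase the knowledge gained" is approximated by uncertainty sampling (proposing the key frames whose score is closest to zero). The function names and the `oracle` callback, which stands in for the user, are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, gamma: float) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def relevance(x: Vector, pool: Sequence[Vector], labeled: Dict[int, int], gamma: float) -> float:
    """Kernel-weighted vote of the annotations so far (+1 relevant, -1 not relevant)."""
    return sum(label * gaussian_kernel(pool[j], x, gamma) for j, label in labeled.items())


def active_annotation(
    pool: Sequence[Vector],
    oracle: Callable[[int], int],
    seeds: Dict[int, int],
    rounds: int,
    batch: int = 1,
    gamma: float = 10.0,
) -> Tuple[List[int], List[int]]:
    """Return (order in which key frames were proposed, propagated labels for the pool)."""
    labeled = dict(seeds)
    queried: List[int] = []
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: propose the key frames whose score is closest to 0.
        unlabeled.sort(key=lambda i: abs(relevance(pool[i], pool, labeled, gamma)))
        for i in unlabeled[:batch]:
            labeled[i] = oracle(i)  # the user annotates the proposed key frame
            queried.append(i)
    # Propagation: user labels are kept; every other key frame gets the sign of its score.
    labels = [labeled[i] if i in labeled
              else (1 if relevance(pool[i], pool, labeled, gamma) > 0 else -1)
              for i in range(len(pool))]
    return queried, labels
```

On a toy one-dimensional pool with one seed annotation per class, two rounds of queries pick the frames nearest the class boundary, and the remaining frames are labelled by propagation; `batch` controls how many key frames are proposed per round.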

SUBJECT(S)

Computing: theses. Automatic indexing: theses. Information retrieval systems: theses.
