Departamento de Ciencias de la Computación
Browsing Departamento de Ciencias de la Computación by Description "Tesis de maestría"
Now showing 1 - 7 of 7
Item: Análisis del dolor crónico en pacientes adultos mediante la exploración espacio-temporal de las expresiones faciales (Universidad Católica San Pablo, 2021)
Mauricio Condori, Manasses Antoni; Camara Chavez, Guillermo
This thesis focuses on quantifying the pain of adult patients (aged 25 to 65) by learning facial expressions with deep learning techniques. Its main contribution is to consider the full response cycle to a stimulus applied to the painful area, so that pain peaks can be assessed across a whole sequence rather than only frame by frame. The shoulder-pain-expression database from McMaster University is used because it is the benchmark favored by the state of the art. The database poses several challenges, such as class imbalance and data-collection errors. The video sequences are split into fragments, and data-balancing policies are then applied. Pre-processing includes resizing, illumination normalization, and face handling (detection, segmentation, and frontalization). A CNN extracts per-frame (spatial) features and an RNN processes them (temporally) to infer the patient's pain level. The results surpass the state of the art both per frame (MAE: 0.4798, MSE: 0.5801, PCC: 0.7076, ICC: 0.5829, ACC: 0.8921) and per sequence (MAE: 0.4772, MSE: 0.6030, PCC: 0.8281, ICC: 0.7542, ACC: 0.8777).
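The spatial-then-temporal pipeline described above can be illustrated with a minimal PyTorch sketch. The backbone choice, hidden sizes, and class names here are assumptions for illustration, not the thesis's actual configuration.

    # Minimal sketch of a CNN (spatial) + RNN (temporal) pain estimator.
    # Assumes a ResNet-18 backbone; all sizes/names are illustrative.
    import torch
    import torch.nn as nn
    from torchvision import models

    class PainSequenceModel(nn.Module):
        def __init__(self, hidden=256, pain_levels=6):
            super().__init__()
            backbone = models.resnet18(weights=None)    # per-frame extractor
            backbone.fc = nn.Identity()                 # keep 512-d features
            self.cnn = backbone
            self.rnn = nn.GRU(512, hidden, batch_first=True)
            self.head = nn.Linear(hidden, pain_levels)  # pain level per frame

        def forward(self, clips):                       # clips: (B, T, 3, H, W)
            b, t = clips.shape[:2]
            feats = self.cnn(clips.flatten(0, 1))       # (B*T, 512) spatial
            out, _ = self.rnn(feats.view(b, t, -1))     # temporal modeling
            return self.head(out)                       # (B, T, pain_levels)

Peak analysis over a sequence then reduces to inspecting the per-frame outputs along the time axis, which matches the abstract's frame-level versus sequence-level evaluation.
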
Item: Deep learning models for spatial prediction of fine particulate matter (Universidad Católica San Pablo, 2023)
Colchado Soncco, Luis Ernesto; Ochoa Luna, Jose Eduardo
Studies indicate that air pollutant concentrations affect human health. Fine Particulate Matter (PM2.5) is the most dangerous pollutant in particular, because it is linked to cardiovascular and respiratory diseases, among others. Governments must therefore monitor and control pollutant concentrations, and many have implemented air quality monitoring (AQM) networks. However, AQM stations are usually spatially sparse due to their high implementation and maintenance costs, leaving large areas without any pollution measurement. Numerical models that simulate the diffusion and reaction processes of air pollutants have been proposed to infer their spatial distribution, but they often require an extensive inventory of data and variables, as well as high-end computing hardware. In this research, we propose two deep learning models. The first is a generative model, a Conditional Generative Adversarial Network (cGAN), to which we add a loss based on the predicted observation and the k nearest neighbor stations to smooth the randomness of adversarial learning; this variant, called Spatial-learning cGAN (cGANSL), achieved better spatial-prediction performance. To interpolate PM2.5 at a location, cGANSL and classical methods such as Inverse Distance Weighting (IDW) must select the k nearest stations by straight-line distance, a selection that may discard more distant neighbors carrying valuable information. The second proposed model is therefore a neural network with an attention-based layer. It uses a recently proposed attention layer to build a structured graph over the AQM stations, where each station is a graph node, and weights the k nearest neighbors of each node with attention kernels. The learned attention layer generates a transformed feature representation for an unobserved location, which a neural network then processes to infer the pollutant concentration. Using data from the Beijing AQM network, meteorological conditions, and satellite products such as the vegetation index (NDVI) and a proxy for human activity or population based on the Nighttime Light (NTL) product, cGANSL outperformed IDW, Ordinary Kriging (OK), and the attention-based neural network. In this experiment, spatial prediction models that select the k nearest neighbors performed well, likely because Beijing's AQM stations are highly correlated with one another. However, on data from the Sao Paulo AQM network, where stations are weakly correlated, the attention-based neural network outperformed IDW, OK, and cGANSL. Moreover, the normalized attention weights computed by our attention model showed that, in some cases, the attention given to the nearest nodes is independent of their spatial distances. The attention model is thus more flexible, since it can learn to interpolate PM2.5 concentration levels from the available AQM data and some context information. Finally, we found that NDVI and NTL are strongly related to the air pollutant concentrations predicted by the attention model.
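A minimal sketch of the attention-weighted neighbor interpolation idea follows, in PyTorch. The scoring function, layer names, and dimensions are assumptions; the thesis's exact attention kernel and graph construction may differ.

    # Attention over the K nearest AQM stations: the unobserved location
    # learns its own neighbor weights instead of fixed inverse-distance ones.
    import torch
    import torch.nn as nn

    class StationAttention(nn.Module):
        def __init__(self, feat_dim, hidden=64):
            super().__init__()
            self.q = nn.Linear(feat_dim, hidden)   # query: target context
            self.k = nn.Linear(feat_dim, hidden)   # keys: neighbor stations
            self.out = nn.Linear(feat_dim, 1)      # regress PM2.5
            self.scale = hidden ** 0.5

        def forward(self, target_feats, neighbor_feats):
            # target_feats: (B, F); neighbor_feats: (B, K, F)
            scores = torch.einsum('bh,bkh->bk',
                                  self.q(target_feats), self.k(neighbor_feats))
            attn = torch.softmax(scores / self.scale, dim=-1)   # (B, K)
            mixed = torch.einsum('bk,bkf->bf', attn, neighbor_feats)
            return self.out(mixed).squeeze(-1), attn

Returning the normalized weights alongside the prediction mirrors the abstract's observation that attention to nearby stations is sometimes independent of spatial distance.
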
Item: Detección automática personalizada de la intensidad del dolor de expresiones faciales en video usando multitask learning (Universidad Católica San Pablo, 2023)
Quispe Pinares, Jefferson; Camara Chavez, Guillermo
Deep learning methods have achieved impressive results in several complex tasks, such as estimating pain from facial expressions in video (sequences of frames). Pain is difficult to estimate because it is subjective and depends on each person's individual characteristics, yet estimating it is important for clinical assessment. This research proposes automatic pain-intensity estimation in two stages: 1) a frame-level approach using a Convolutional Neural Network (CNN) with transfer learning from a model pre-trained on faces and a spatial attention module, followed by sequential models based on a Recurrent Neural Network (RNN) for a more precise pain estimate; 2) estimation of the Visual Analog Score (VAS) and other pain scales through personalized Multitask Learning (MTL), combining the frame-level outputs of the first stage with an individual's personal characteristics, which yields strong sequence-level pain results. Using the MTL approach to personalize the estimates, by solving multiple related tasks over groups of similar people, provides substantial improvements in VAS prediction. The gain in accuracy over non-personalized models is notable: the PSPI+PF Personalized Multitask model reaches an MAE of 2.25 and an ICC of 0.47, while the PSPI (GT) Personalized Multitask model, trained on ground-truth data, reaches an MAE of 2.17 and an ICC of 0.51.

Item: Multimodal unconstrained people recognition with face and ear images using deep learning (Universidad Católica San Pablo, 2023)
Ramos Cooper, Solange Griselly; Camara Chavez, Guillermo
Multibiometric systems rely on the idea of combining multiple biometric methods into a single process that leads to a more reliable and accurate system. Combining two different biometric traits, such as face and ear, is advantageous and complementary when using 2D images taken under uncontrolled conditions. In this work, we investigate several approaches to fusing information from face and ear images so as to recognize people more accurately than either method alone. We leverage the maturity of face recognition research to build, first, a truly multimodal database of ear and face images called the VGGFace-Ear dataset; second, a model that describes ear images with high generalization, called the VGGEar model; and finally, to explore fusion strategies at two levels of a common recognition pipeline: feature level and score level. Experiments on the UERC dataset show, first, an improvement of around 7% over state-of-the-art methods in the ear recognition field. Second, fusing face and ear information raises recognition rates from 79% (unimodal face) and 82% (unimodal ear) to 94% under the Rank-1 metric.

Item: Polyp image segmentation with polyp2seg (Universidad Católica San Pablo, 2023)
Mandujano Cornejo, Vittorino; Montoya Zegarra, Javier Alexander
Colorectal cancer (CRC) is the third most common type of cancer worldwide. It can be prevented by screening the colon and detecting polyps that might become malignant, so an accurate diagnosis of polyps in colonoscopy images is crucial for CRC prevention. Computational techniques, known as Computer-Aided Diagnosis, make screening easier to deploy widely and improve the early recognition of potentially cancerous tissue. In this work, we propose a novel hybrid deep learning architecture for polyp image segmentation named Polyp2Seg. The model adopts a transformer architecture as its encoder to extract multi-hierarchical features. A novel Feature Aggregation Module (FAM) then progressively merges the multi-level encoder features, adding semantic information to better localise polyps. Next, a Multi-Context Attention Module (MCAM) removes noise and other artifacts while incorporating a multi-scale attention mechanism to further improve polyp detection. Quantitative and qualitative experiments on five challenging datasets, against five state-of-the-art methods, demonstrate that our method significantly improves polyp segmentation accuracy under different evaluation metrics, achieving a new state of the art on most of the datasets.
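The progressive multi-level fusion idea behind a module like FAM can be sketched as follows, in PyTorch. The channel sizes, fusion rule, and class name are invented for the demonstration and are not Polyp2Seg's actual design.

    # Sketch of progressive feature aggregation: deep, semantic features are
    # upsampled and fused into shallower, higher-resolution ones.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureAggregation(nn.Module):
        def __init__(self, channels=(64, 128, 320, 512), out_ch=64):
            super().__init__()
            self.reduce = nn.ModuleList(nn.Conv2d(c, out_ch, 1)
                                        for c in channels)
            self.fuse = nn.Conv2d(out_ch, out_ch, 3, padding=1)

        def forward(self, feats):                  # list, shallow -> deep
            feats = [r(f) for r, f in zip(self.reduce, feats)]
            merged = feats[-1]
            for f in reversed(feats[:-1]):         # merge deep into shallow
                merged = F.interpolate(merged, size=f.shape[-2:],
                                       mode='bilinear', align_corners=False)
                merged = self.fuse(merged + f)     # add semantics to fine maps
            return merged                          # high-res, semantic map
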
Item: Priority sampling and visual attention for self-driving car (Universidad Católica San Pablo, 2023)
Flores Benites, Victor; Mora Colque, Rensso Victor Hugo
End-to-end methods simplify the development of self-driving models by using a single network that learns the human driving style from examples. However, these models suffer from problems such as distributional shift, causal confusion, and high variance. To address them, we propose two techniques. First, we propose the priority sampling algorithm, which biases training-sample selection towards observations that are still unknown to the model. Priority sampling employs a trade-off strategy that incentivizes the training algorithm to keep exploring the whole dataset. Our results show a reduction of the error in the control signals across all the models studied, and we show evidence that the algorithm limits overtraining on noisy training samples. Second, we propose a model based on the theory of visual attention (Bundesen, 1990) that selects relevant visual information to build an optimal representation of the environment. The model employs two selection mechanisms: spatial attention, which selects regions whose visual encoding is similar to the contextual encoding, and feature-based attention, which selects disentangled features carrying information useful for routine driving. We also encourage the model to recognize new sources of visual information by adding a bottom-up input. Results on the CoRL-2017 benchmark (Dosovitskiy et al., 2017) show that the spatial attention mechanism recognizes regions relevant to the driving task, and that the model builds disentangled features with low cosine similarity but high representation similarity. Finally, we report performance improvements over traditional end-to-end models.
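The exploration trade-off in priority sampling can be sketched in a few lines of NumPy. The error-proportional priority, the epsilon mix, and the function name are assumptions; the thesis's exact priority definition may differ.

    # Draw a batch biased toward high-error (poorly learned) examples, mixed
    # with a uniform term so every sample keeps a chance of being revisited.
    import numpy as np

    def priority_sample(errors, batch_size, eps=0.1, rng=None):
        rng = rng or np.random.default_rng()
        pri = errors / errors.sum()                 # bias to unknown samples
        uni = np.full_like(pri, 1.0 / len(pri))     # exploration term
        probs = (1 - eps) * pri + eps * uni         # trade-off strategy
        return rng.choice(len(errors), size=batch_size,
                          replace=False, p=probs)

    # usage (per_sample_loss is an assumed array of current training errors):
    # indices = priority_sample(per_sample_loss, batch_size=32)

The uniform mixing term is what keeps noisy, persistently high-error samples from monopolizing training, consistent with the overtraining claim above.
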
Item: Weakly supervised spatiotemporal violence detection in surveillance video (Universidad Católica San Pablo, 2023)
Choqueluque Roman, David Gabriel; Camara Chavez, Guillermo
Violence detection in surveillance video is an important task for preventing social and personal security issues. Traditional surveillance systems need a human operator to monitor a large number of cameras, which leads to missed detections and false positives. To address this, researchers have in recent years proposed computer vision-based methods to detect violent actions. Violence detection can be considered a sub-task of action recognition, yet it has been investigated far less: although many action recognition works target human behavior analysis, only a few CCTV-based surveillance methods analyze violent actions. In the violence detection literature, most methods treat the problem as a classification task, labeling a short video as violent or non-violent. Only a few tackle it as a spatiotemporal detection task, in which the method must detect violent actions both spatially and temporally; we assume the scarcity of such methods is due to the exorbitant cost of annotating current violence datasets at frame level. In this work, we propose a spatiotemporal violence detection method trained with a weakly supervised approach that uses only video-level labels. Our proposal uses a deep learning model following a Fast-RCNN-style architecture (Girshick, 2015) extended temporally. The method starts by generating spatiotemporal proposals, called action tubes, by leveraging a pre-trained person detector and motion appearance; an action tube is defined as a set of temporally related bounding boxes that enclose and track a person performing an action. A video with its action tubes is then fed to the model to extract spatiotemporal features, and finally we train a tube classifier based on multiple-instance learning (Liu et al., 2012). Spatial localization relies on the pre-trained person detector and on motion regions extracted from dynamic images (Bilen et al., 2017), where a dynamic image summarizes the movement of a set of frames into a single image; temporal localization is handled by the action tubes, which group spatial regions over time. We evaluate the proposed method on four publicly available datasets: Hockey Fight, RWF-2000, RLVSD, and UCFCrime2Local. Our proposal achieves accuracy scores of 97.3%, 88.71%, and 92.88% for violence detection on Hockey Fight, RWF-2000, and RLVSD, respectively, very close to the state-of-the-art methods, while also detecting spatial locations in video frames. To validate the spatiotemporal results, we use the UCFCrime2Local dataset, on which the proposed approach reduces the spatiotemporal localization error to 31.92%, demonstrating its feasibility for detecting and tracking violent actions.
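The multiple-instance learning step, where each video is a bag of action tubes and only the video-level label is known, can be sketched as follows in PyTorch. The tube feature extractor is abstracted away, and all names and the max-pooled bag loss are illustrative assumptions.

    # MIL over action tubes: the highest-scoring tube in a video must agree
    # with the weak, video-level violent/non-violent label.
    import torch
    import torch.nn as nn

    class MILTubeClassifier(nn.Module):
        def __init__(self, feat_dim=512):
            super().__init__()
            self.score = nn.Linear(feat_dim, 1)        # violence score / tube

        def forward(self, tube_feats):                 # (B, N_tubes, F)
            return self.score(tube_feats).squeeze(-1)  # (B, N_tubes)

    def mil_loss(tube_scores, video_labels):
        # Only the most violent tube is asked to explain the video label.
        bag_scores = tube_scores.max(dim=1).values
        return nn.functional.binary_cross_entropy_with_logits(
            bag_scores, video_labels.float())

Under this weak supervision, spatial localization falls out for free: at inference, the per-tube scores indicate which tracked person regions are violent, without any frame-level annotation.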