Conferences / Workshops
Learning Cross-modal Embeddings for Cooking Recipes and Food Images
A. Salvador*, N. Hynes*, Y. Aytar, J. Marín, F. Ofli, I. Weber, A. Torralba (* denotes equal contribution)
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media
E. Kocabey, M. Camurcu, F. Ofli, Y. Aytar, J. Marín, A. Torralba, I. Weber
International AAAI Conference on Web and Social Media (ICWSM), 2017
Javier Marín, David Vázquez, Antonio M. López, Jaume Amores, Bastian Leibe
IEEE International Conference on Computer Vision (ICCV), 2013
@InProceedings{Jmarin2013,
author = "Javier Marin, David Vázquez, Antonio M. López, Jaume Amores, Bastian Leibe",
title = "Random Forests of Local Experts for Pedestrian Detection",
booktitle = "IEEE International Conference on Computer Vision",
year = "2013",
}
Pedestrian detection is one of the most challenging tasks in computer vision and has received considerable attention in recent years. Recently, some authors have shown the advantages of combining part/patch-based detectors in order to cope with the large variability of poses and the existence of partial occlusions. In this paper, we propose a pedestrian detection method that efficiently combines multiple local experts by means of a Random Forest ensemble. The proposed method works with rich block-based representations such as HOG and LBP, in such a way that the same features are reused by the multiple local experts, so no extra computational cost is incurred with respect to a holistic method. Furthermore, we demonstrate how to integrate the proposed approach with a cascaded architecture in order to achieve not only high accuracy but also acceptable efficiency. In particular, the resulting detector operates at five frames per second on a laptop. We tested the proposed method on well-known challenging datasets such as Caltech, ETH, Daimler, and INRIA. The proposed method consistently ranks among the top performers on all the datasets, either as the best method or within a small margin of the best one.
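The core mechanism, an ensemble of local experts that all reuse one shared block-based descriptor, can be sketched in a few lines of Python (NumPy and scikit-learn). Random arrays stand in for HOG/LBP block features, the experts are LinearSVC classifiers over random block subsets, and all sizes are illustrative assumptions; this is a minimal sketch of the idea, not the paper's actual Random Forest construction.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_samples, n_blocks, block_dim = 1000, 105, 36      # e.g. a 64x128 window of HOG blocks
X = rng.normal(size=(n_samples, n_blocks * block_dim))  # stand-in for HOG/LBP features
y = rng.integers(0, 2, size=n_samples)              # pedestrian vs. background labels

experts = []
for _ in range(30):                                 # each local expert sees a few blocks only
    blocks = rng.choice(n_blocks, size=10, replace=False)
    cols = np.concatenate([np.arange(b * block_dim, (b + 1) * block_dim) for b in blocks])
    experts.append((LinearSVC(dual=False).fit(X[:, cols], y), cols))

def ensemble_score(X_new):
    # Soft vote: average the margins of all local experts; the block features
    # are computed once and merely re-indexed, so no extra feature cost.
    return np.mean([clf.decision_function(X_new[:, cols]) for clf, cols in experts], axis=0)

print(ensemble_score(X[:3]))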
Jiaolong Xu, David Vázquez, Antonio M. López, Javier Marín, Daniel Ponsa
IEEE Intelligent Vehicles Symposium (IV), 2013
@InProceedings{xvl2013a,
author = {Jiaolong Xu and David Vázquez and Antonio M. López and Javier Marín and Daniel Ponsa},
title = {Learning a Multiview Part-based Model in Virtual World for Pedestrian Detection},
booktitle = {IEEE Intelligent Vehicles Symposium},
year = {2013}
}
State-of-the-art deformable part-based models based on latent SVM have shown excellent results on human detection. In this paper, we propose to train a multiview deformable part-based model with automatically generated part examples from virtual-world data. The method is efficient because: (i) the part detectors are trained with precisely extracted virtual examples, so no latent learning is needed; (ii) the multiview pedestrian detector enhances the performance of the pedestrian root model; and (iii) a top-down approach is used for part detection, which reduces the search space. We evaluate our model on the Daimler and Karlsruhe Pedestrian Benchmarks with the publicly available Caltech pedestrian detection evaluation framework, and the result outperforms the state-of-the-art latent SVM V4.0 on both average miss rate and speed (our detector is ten times faster).
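The top-down part search in point (iii) is easy to illustrate: evaluate each part detector only in a small neighbourhood around its expected location relative to a detected root window, instead of over the whole image. The toy sketch below uses random score maps and hypothetical anchor offsets; it is a minimal illustration of the search-space reduction, not the authors' detector.

import numpy as np

rng = np.random.default_rng(1)
root_scores = rng.normal(size=(60, 80))             # dense root-detector score map
part_scores = [rng.normal(size=(60, 80)) for _ in range(4)]  # one map per part

ry, rx = np.unravel_index(root_scores.argmax(), root_scores.shape)  # root hypothesis

def best_part(score_map, anchor, radius=5):
    # Search a part only inside a small window around its expected anchor
    # (relative to the root), instead of over the whole score map.
    h, w = score_map.shape
    y0 = int(np.clip(anchor[0] - radius, 0, h - 1)); y1 = int(np.clip(anchor[0] + radius, y0 + 1, h))
    x0 = int(np.clip(anchor[1] - radius, 0, w - 1)); x1 = int(np.clip(anchor[1] + radius, x0 + 1, w))
    return score_map[y0:y1, x0:x1].max()

anchors = [(ry - 8, rx), (ry, rx - 4), (ry, rx + 4), (ry + 8, rx)]  # hypothetical part offsets
total = root_scores[ry, rx] + sum(best_part(m, a) for m, a in zip(part_scores, anchors))
print(total)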
David Vázquez, Antonio M. López, Daniel Ponsa, Javier Marín
NIPS Domain Adaptation Workshop: Theory and Application (NIPSW), 2011
@InProceedings{VLP2011b,
author = {David Vázquez and Antonio M. López and Daniel Ponsa and Javier Marín},
title = {Cool world: domain adaptation of virtual and real worlds for human detection using active learning},
booktitle = {NIPS Domain Adaptation Workshop: Theory and Application},
year = {2011}
}
Image-based human detection is of paramount interest for different applications. The most promising human detectors rely on discriminatively learnt classifiers, i.e., trained with labelled samples. However, labelling is a manually intensive task, especially in cases like human detection where it is necessary to provide at least bounding boxes framing the humans for training. To overcome this problem, in Marin et al. we proposed the use of a virtual world where the labels of the different objects are obtained automatically. This means that the human models (classifiers) are learnt using the appearance of realistic computer graphics. Later, these models are used for human detection in images of the real world. The results of this technique are surprisingly good. However, they are not always as good as the classical approach of training and testing with data coming from the same camera and the same type of scenario. Accordingly, in Vazquez et al. we cast the problem as one of supervised domain adaptation. In doing so, we assume that a small amount of manually labelled samples from real-world images is required. To collect these labelled samples we use an active learning technique. Thus, ultimately our human model is learnt from the combination of virtual- and real-world labelled samples which, to the best of our knowledge, had not been done before. Here, we term such a combined space cool world. In this extended abstract we summarize our proposal and include quantitative results from Vazquez et al. showing its validity.
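A minimal sketch of this train-query-retrain loop, under heavy assumptions: the data is synthetic, an "oracle" stands in for the human annotator, and uncertainty is measured by the distance to a linear SVM's decision boundary (margin sampling, one common active learning criterion; the paper's own query strategy may differ).

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X_virtual = rng.normal(0.0, 1.0, size=(2000, 64))   # source domain: labels come for free
y_virtual = rng.integers(0, 2, size=2000)
X_real = rng.normal(0.3, 1.2, size=(500, 64))       # shifted target domain, unlabelled
oracle = lambda idx: rng.integers(0, 2, size=len(idx))  # stand-in for the human annotator

X_pool, y_pool = X_virtual, y_virtual
queried = np.zeros(len(X_real), dtype=bool)
for _ in range(5):                                  # a few querying rounds
    clf = LinearSVC(dual=False).fit(X_pool, y_pool)
    margins = np.abs(clf.decision_function(X_real))
    margins[queried] = np.inf                       # never re-query a sample
    ask = np.argsort(margins)[:20]                  # the most uncertain real samples
    queried[ask] = True
    X_pool = np.vstack([X_pool, X_real[ask]])       # "cool world": virtual + real pool
    y_pool = np.concatenate([y_pool, oracle(ask)])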
David Vázquez, Antonio M. López, Daniel Ponsa, Javier Marín
International Conference on Multimodal Interaction (ICMI), 2011
@InProceedings{VLP2011a,
author = {David Vázquez and Antonio M. López and Daniel Ponsa and Javier Marín},
title = {Virtual Worlds and Active Learning for Human Detection},
booktitle = {13th International Conference on Multimodal Interaction},
year = {2011}
}
Image-based human detection is of paramount interest due to its potential applications in fields such as advanced driving assistance, surveillance and media analysis. However, even detecting non-occluded standing humans remains a subject of intensive research. The most promising human detectors rely on classifiers developed in the discriminative paradigm, i.e., trained with labelled samples. However, labelling is a manually intensive step, especially in cases like human detection where it is necessary to provide at least bounding boxes framing the humans for training. To overcome this problem, some authors have proposed the use of a virtual world where the labels of the different objects are obtained automatically. This means that the human models (classifiers) are learnt using the appearance of rendered images, i.e., using realistic computer graphics. Later, these models are used for human detection in images of the real world. The results of this technique are surprisingly good. However, they are not always as good as the classical approach of training and testing with data coming from the same camera, or similar ones. Accordingly, in this paper we address the challenge of using a virtual world for gathering (while playing a videogame) a large amount of automatically labelled samples (virtual humans and background) and then training a classifier that performs as well, on real-world images, as one trained on manually labelled real-world samples. To do so, we cast the problem as one of domain adaptation, assuming that a small amount of manually labelled samples from real-world images is required. To collect these labelled samples we propose a non-standard active learning technique. Therefore, ultimately our human model is learnt from the combination of virtual- and real-world labelled samples (Fig. 1), which had not been done before. We present quantitative results showing that this approach is valid.
Javier Marín, David Vázquez, David Gerónimo, Antonio M. López
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
@InProceedings{MVG2010,
author = {Javier Marín and David Vázquez and David Gerónimo and Antonio M. López},
title = {Learning Appearance in Virtual Scenarios for Pedestrian Detection},
booktitle = {23rd IEEE Conference on Computer Vision and Pattern Recognition},
year = {2010}
}
Detecting pedestrians in images is a key functionality to avoid vehicle-to-pedestrian collisions. The most promising detectors rely on appearance-based pedestrian classifiers trained with labelled samples. This paper addresses the following question: can a pedestrian appearance model learnt in virtual scenarios work successfully for pedestrian detection in real images? (Fig. 1). Our experiments suggest a positive answer, which is a new and relevant conclusion for research in pedestrian detection. More specifically, we record training sequences in virtual scenarios and then learn appearance-based pedestrian classifiers using HOG and linear SVM. We test these classifiers on a publicly available dataset provided by Daimler AG for pedestrian detection benchmarking, which contains real-world images acquired from a moving car. We compare the obtained result with that of a classifier learnt from samples coming from real images. The comparison reveals that, although the virtual samples were not specially selected, virtual- and real-world based training give rise to classifiers of similar performance.
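The classifier pipeline the abstract names, HOG descriptors fed to a linear SVM, looks roughly as follows. Random arrays stand in for virtual-world training crops and real-world test windows, and the HOG parameters shown are common Dalal-Triggs-style settings, not necessarily the paper's exact configuration.

import numpy as np
from skimage.feature import hog                     # requires scikit-image
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)

def describe(windows):
    # 128x64 detection windows -> HOG descriptors.
    return np.array([hog(w, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for w in windows])

virtual_crops = rng.random(size=(200, 128, 64))     # stand-in for rendered training crops
labels = rng.integers(0, 2, size=200)               # pedestrian vs. background
real_crops = rng.random(size=(50, 128, 64))         # stand-in for real-world test windows

clf = LinearSVC(dual=False).fit(describe(virtual_crops), labels)
scores = clf.decision_function(describe(real_crops))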
Journals
Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
J. Marín*, A. Biswas*, F. Ofli, N. Hynes, A. Salvador, Y. Aytar, I. Weber, A. Torralba (* denotes equal contribution)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
David Vázquez, Antonio M. López, Javier Marín, Daniel Ponsa and David Gerónimo
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014
@Article{Dvazquez2014,
author = "Vázquez, D. and Marín, J. and López, A. M. and Ponsa, D. and Gerónimo, D.",
title = "Virtual and Real World Adaptation for Pedestrian Detection",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
volume = "36(4)",
pages = "797 - 809",
year = "2014",
}
Pedestrian detection is of paramount interest for many applications. The most promising detectors rely on discriminatively learnt classifiers, i.e., trained with annotated samples. However, the annotation step is a human-intensive and subjective task worth minimizing. By using virtual worlds we can automatically obtain precise and rich annotations. Thus, we face the question: can a pedestrian appearance model learnt in realistic virtual worlds work successfully for pedestrian detection in real-world images? Conducted experiments show that virtual-world-based training can provide excellent testing accuracy in the real world, but it can also suffer from the dataset shift problem, just as real-world-based training does. Accordingly, we have designed a domain adaptation framework, V-AYLA, in which we have tested different techniques to collect a few pedestrian samples from the target domain (real world) and combine them with the many examples of the source domain (virtual world) in order to train a domain-adapted pedestrian classifier that will operate in the target domain. V-AYLA reports the same detection accuracy as training with many human-provided pedestrian annotations and testing with real-world images of the same domain. To the best of our knowledge, this is the first work demonstrating adaptation of virtual and real worlds for developing an object detector.
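One simple way to combine many source-domain (virtual) samples with a few target-domain (real) ones, in the spirit of what V-AYLA explores, is to train a single classifier on the concatenated sets while upweighting the scarce target samples. The weighting scheme below is an illustrative assumption on synthetic data, not the specific adaptation techniques tested in the paper.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
X_src = rng.normal(0.0, 1.0, size=(2000, 64))       # many virtual-world samples
y_src = rng.integers(0, 2, size=2000)
X_tgt = rng.normal(0.3, 1.2, size=(100, 64))        # few labelled real-world samples
y_tgt = rng.integers(0, 2, size=100)

X = np.vstack([X_src, X_tgt])
y = np.concatenate([y_src, y_tgt])
w = np.concatenate([np.ones(len(X_src)),            # virtual samples: weight 1
                    np.full(len(X_tgt), len(X_src) / len(X_tgt))])  # real: upweighted
clf = LinearSVC(dual=False).fit(X, y, sample_weight=w)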
Javier Marín, David Vázquez, Antonio M. López, Jaume Amores and Ludmila I. Kuncheva
IEEE Transactions on Systems, Man, and Cybernetics (Part B) (TSMCB), 2014
@Article{Jmarin2014,
author = "Marín, J. and Vázquez, D. and López, A. M. and Amores, J. and Kuncheva, L. I.",
title = "Occlusion handling via random subspace classifiers for human detection",
journal = "IEEE Transactions on Systems, Man, and Cybernetics (Part B)",
volume = "44(3)"
pages = "342-354",
year = "2014",
}
This paper describes a general method to address partial occlusions for human detection in still images. The Random Subspace Method (RSM) is chosen for building a classifier ensemble robust against partial occlusions. The component classifiers are chosen on the basis of their individual and combined performance. The main contribution of this work lies in our approach's capability to improve the detection rate when partial occlusions are present without compromising the detection performance on non-occluded data. In contrast to many recent approaches, we propose a method which does not require manual labelling of body parts, defining any semantic spatial components, or using additional data coming from motion or stereo. Moreover, the method can be easily extended to other object classes. The experiments are performed on three large datasets: the INRIA person dataset, the Daimler Multicue dataset, and a new challenging dataset, called PobleSec, in which a considerable number of targets are partially occluded. The different approaches are evaluated at the classification and detection levels for both partially occluded and non-occluded data. The experimental results show that our detector outperforms state-of-the-art approaches in the presence of partial occlusions, while offering performance and reliability similar to those of the holistic approach on non-occluded data. The datasets used in our experiments have been made publicly available for benchmarking purposes.
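The Random Subspace Method itself can be reproduced directly in scikit-learn: a BaggingClassifier with feature subsampling and no sample bootstrapping trains each member on a random subset of feature dimensions, so an occluded region corrupts only the members whose subspace covers it. This is a generic RSM sketch on placeholder data, not the paper's tuned ensemble.

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 360))                    # e.g. concatenated block descriptors
y = rng.integers(0, 2, size=1000)

rsm = BaggingClassifier(LinearSVC(dual=False),
                        n_estimators=30,
                        max_features=0.3,           # each member sees 30% of the dimensions
                        bootstrap=False)            # random subspaces, not bootstrap samples
rsm.fit(X, y)
votes = rsm.predict(X[:5])                          # majority vote over the members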
Jiaolong Xu, David Vázquez, Antonio M. López, Javier Marín and Daniel Ponsa
IEEE Transactions on Intelligent Transportation Systems (TITS), 2014
@Article{Jxu2014,
author = "Xu, J. and Vázquez, D. and López, A. M. and Marín, J. and Ponsa, D.",
title = "Learning a Part-based Pedestrian Detector in Virtual World",
journal = "IEEE Transactions on Intelligent Transportation Systems",
volume = "15(5)"
pages = "2121-2131",
year = "2014",
}
Detecting pedestrians with on-board vision systems is of paramount interest for assisting drivers to prevent vehicle-to-pedestrian accidents. The core of a pedestrian detector is its classification module, which aims at deciding if a given image window contains a pedestrian. Given the difficulty of this task, many classifiers have been proposed during the last fifteen years. Among them, the so-called (deformable) part-based classifiers, including multiview modelling, are usually top ranked in accuracy. Training such classifiers is not trivial, since proper aspect clustering and spatial part alignment of the pedestrian training samples are crucial for obtaining an accurate classifier. In this paper, we first perform automatic aspect clustering and part alignment by using virtual-world pedestrians, i.e., no human annotations are required. Second, we use a mixture-of-parts approach that allows part sharing among different aspects. Third, these proposals are integrated in a learning framework that also allows us to incorporate real-world training data to perform domain adaptation between virtual- and real-world cameras. Overall, the results obtained on four popular on-board datasets show that our proposal clearly outperforms the state-of-the-art deformable part-based detector known as latent SVM.
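The first step, automatic aspect clustering, is cheap precisely because the virtual world supplies each pedestrian's viewing angle for free. A hedged sketch: cluster the known yaw angles, circularly encoded, with k-means to form view-specific training sets; the cluster count and encoding are illustrative assumptions, not the paper's exact procedure.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
yaw = rng.uniform(0, 2 * np.pi, size=500)           # ground-truth viewpoints, free in a virtual world
angle_feats = np.column_stack([np.cos(yaw), np.sin(yaw)])  # circular encoding of the angle

aspects = KMeans(n_clusters=4, n_init=10).fit_predict(angle_feats)
# Each cluster (roughly front/back/left/right) then gets its own root and part
# models, trained directly on precisely cropped virtual examples (no latent step).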
Book chapters
Javier Marín, David Gerónimo, David Vázquez, Antonio M. López
Book chapter: Handbook of Pattern Recognition: Methods and Application, 2012
@InCollection{MGV2012,
author = {Javier Marín and David Gerónimo and David Vázquez and Antonio M. López},
title = {Pedestrian Detection: Exploring Virtual Worlds},
booktitle = {Handbook of Pattern Recognition: Methods and Application},
year = {2012}
}
This Handbook includes contributions from university educators and active research experts, and is intended to serve as a basic reference on methods and applications of pattern recognition. Its primary aim is to provide the pattern recognition community with a readable, easy-to-understand resource covering introductory, intermediate and advanced topics with equal clarity, so that it can serve equally well as a reference resource and as a classroom textbook. Contributions cover all methods, techniques and applications of pattern recognition, including: (1) statistical, structural and syntactic pattern recognition; (2) neural networks, machine learning and data mining; (3) discrete geometry and algebraic and graph-based techniques for pattern recognition; (4) face recognition, signal analysis, image coding and processing, and shape and texture analysis; (5) document processing, text and graphics recognition, and digital libraries; (6) speech recognition, music analysis and multimedia systems; (7) natural language analysis and information retrieval; (8) biometrics, biomedical pattern analysis and information systems; (9) other scientific, engineering, social and economic applications of pattern recognition; and (10) special hardware architectures and software packages for pattern recognition.
Patents
Image Processing System to Detect Object of Interest
Farzin Ghorban Rajabizadeh, Yu Su, Javier Marín Tur, Alessandro Colombo
European patent application number: EP16175330.6, filed June 2016.
Applicant: Delphi Technologies, Inc.
Demos
Javier Marín, Sebastian Ramos, David Vázquez, Antonio M. López, Jaume Amores, Bastian Leibe, Germán Ros
British Machine Vision Conference (BMVC), 2013