Extended return period until 31 January 2025

Books in the Synthesis Lectures on Computer Vision series

  • by Ian H. Jermyn
    725,-

    Statistical analysis of shapes of 3D objects is an important problem with a wide range of applications. This analysis is difficult for many reasons, including the fact that objects differ in both geometry and topology. In this manuscript, we narrow the problem by focusing on objects with fixed topology, say objects that are diffeomorphic to unit spheres, and develop tools for analyzing their geometries. The main challenges in this problem are to register points across objects and to perform analysis while being invariant to certain shape-preserving transformations. We develop a comprehensive framework for analyzing shapes of spherical objects, i.e., objects that are embeddings of a unit sphere in ℝ³, including tools for: quantifying shape differences, optimally deforming shapes into each other, summarizing shape samples, extracting principal modes of shape variability, and modeling shape variability associated with populations. An important strength of this framework is that it is elastic: it performs alignment, registration, and comparison in a single unified framework, while being invariant to shape-preserving transformations. The approach is essentially Riemannian in the following sense. We specify natural mathematical representations of surfaces of interest, and impose Riemannian metrics that are invariant to the actions of the shape-preserving transformations. In particular, they are invariant to reparameterizations of surfaces. While these metrics are too complicated to allow broad usage in practical applications, we introduce a novel representation, termed square-root normal fields (SRNFs), that transforms a particular invariant elastic metric into the standard L² metric. As a result, one can use standard techniques from functional data analysis for registering, comparing, and summarizing shapes.
Specifically, this results in: pairwise registration of surfaces; computation of geodesic paths encoding optimal deformations; computation of Karcher means and covariances under the shape metric; tangent Principal Component Analysis (PCA) and extraction of dominant modes of variability; and finally, modeling of shape variability using wrapped normal densities. These ideas are demonstrated using two case studies: the analysis of surfaces denoting human bodies in terms of shape and pose variability; and the clustering and classification of the shapes of subcortical brain structures for use in medical diagnosis. This book develops these ideas without assuming advanced knowledge in differential geometry and statistics. We summarize some basic tools from differential geometry in the appendices, and introduce additional concepts and terminology as needed in the individual chapters.

  • by Walter Scheirer & Terrance Boult
    485,-

  • by Stan Z. Li, Sergio Escalera, Guodong Guo, et al.
    485,-

    This book revises and expands upon the prior edition of Multi-Modal Face Presentation Attack Detection. The authors begin with fundamental and foundational information on face spoofing attack detection, explaining why the computer vision community has intensively studied it for the last decade. The authors also discuss why face anti-spoofing is essential for preventing security breaches in face recognition systems. In addition, the book describes the factors that make it difficult to design effective methods for face presentation attack detection. The book presents a thorough review and evaluation of current techniques and identifies those that have achieved the highest level of performance in a series of ChaLearn face anti-spoofing challenges at CVPR and ICCV. The authors also highlight directions for future research in face anti-spoofing that would lead to progress in the field. Additional analysis, new methodologies, and a more comprehensive survey of solutions are included in this new edition.

  • by Xiu-Shen Wei
    486,-

    This book provides a comprehensive overview of fine-grained image analysis (FGIA) research and modern approaches based on deep learning, spanning the full range of topics needed for designing operational fine-grained image systems. The author begins by providing detailed background information on FGIA, focusing on recognition and retrieval. The author also provides the fundamentals of convolutional neural networks to make it easier for readers to understand the technical content in the book. The book introduces the main technical paradigms, technological developments, and representative approaches of fine-grained image recognition and fine-grained image retrieval. The author covers multiple popular research topics and includes cross-domain knowledge. The book also highlights advanced applications and topics for future research.

  • by Lei Huang
    790,-

    This book presents and surveys normalization techniques with a deep analysis in training deep neural networks. In addition, the author provides technical details in designing new normalization methods and network architectures tailored to specific tasks. Normalization methods can improve the training stability, optimization efficiency, and generalization ability of deep neural networks (DNNs) and have become basic components in most state-of-the-art DNN architectures. The author provides guidelines for elaborating, understanding, and applying normalization methods. This book is ideal for readers working on the development of novel deep learning algorithms and/or their applications to solve practical problems in computer vision and machine learning tasks. The book also serves as a resource for researchers, engineers, and students who are new to the field and need to understand and train DNNs.

  • by Gabriela Csurka
    630,-

    Transfer learning (TL), and in particular domain adaptation (DA), has emerged as an effective solution to overcome the burden of annotation, exploiting the unlabeled data available from the target domain together with labeled data or pre-trained models from similar, yet different source domains.

  • by Michael Teutsch
    701,-

    Human visual perception is limited to the visual-optical spectrum. Machine vision is not. Cameras sensitive to the different infrared spectra can enhance the abilities of autonomous systems and visually perceive the environment in a holistic way. Relevant scene content can be made visible especially in situations, where sensors of other modalities face issues like a visual-optical camera that needs a source of illumination. As a consequence, not only human mistakes can be avoided by increasing the level of automation, but also machine-induced errors can be reduced that, for example, could make a self-driving car crash into a pedestrian under difficult illumination conditions. Furthermore, multi-spectral sensor systems with infrared imagery as one modality are a rich source of information and can provably increase the robustness of many autonomous systems. Applications that can benefit from utilizing infrared imagery range from robotics to automotive and from biometrics to surveillance. In this book, we provide a brief yet concise introduction to the current state-of-the-art of computer vision and machine learning in the infrared spectrum. Based on various popular computer vision tasks such as image enhancement, object detection, or object tracking, we first motivate each task starting from established literature in the visual-optical spectrum. Then, we discuss the differences between processing images and videos in the visual-optical spectrum and the various infrared spectra. An overview of the current literature is provided together with an outlook for each task. Furthermore, available and annotated public datasets and common evaluation methods and metrics are presented. In a separate chapter, popular applications that can greatly benefit from the use of infrared imagery as a data source are presented and discussed. Among them are automatic target recognition, video surveillance, or biometrics including face recognition. 
Finally, we conclude with recommendations for well-fitting sensor setups and data processing algorithms for certain computer vision tasks. We address this book to prospective researchers and engineers new to the field but also to anyone who wants to get introduced to the challenges and the approaches of computer vision using infrared images or videos. Readers will be able to start their work directly after reading the book supported by a highly comprehensive backlog of recent and relevant literature as well as related infrared datasets including existing evaluation frameworks. Together with consistently decreasing costs for infrared cameras, new fields of application appear and make computer vision in the infrared spectrum a great opportunity to face today's scientific and engineering challenges.

  • by Rameswar Panda
    371,-

    Person re-identification is the problem of associating observations of targets in different non-overlapping cameras. Most of the existing learning-based methods have resulted in improved performance on standard re-identification benchmarks, but at the cost of time-consuming and tediously labeled data. Motivated by this, learning person re-identification models with limited to no supervision has drawn a great deal of attention in recent years. In this book, we provide an overview of some of the literature in person re-identification, and then move on to focus on some specific problems in the context of person re-identification with limited supervision in multi-camera environments. We expect this to lead to interesting problems for researchers to consider in the future, beyond the conventional fully supervised setup that has been the framework for a lot of work in person re-identification. Chapter 1 starts with an overview of the problems in person re-identification and the major research directions. We provide an overview of the prior works that align most closely with the limited supervision theme of this book. Chapter 2 demonstrates how global camera network constraints in the form of consistency can be utilized for improving the accuracy of camera pair-wise person re-identification models and also selecting a minimal subset of image pairs for labeling without compromising accuracy. Chapter 3 presents two methods that hold the potential for developing highly scalable systems for video person re-identification with limited supervision. In the one-shot setting where only one tracklet per identity is labeled, the objective is to utilize this small labeled set along with a larger unlabeled set of tracklets to obtain a re-identification model. Another setting is completely unsupervised without requiring any identity labels.
The temporal consistency in the videos allows us to infer about matching objects across the cameras with higher confidence, even with limited to no supervision. Chapter 4 investigates person re-identification in dynamic camera networks. Specifically, we consider a novel problem that has received very little attention in the community but is critically important for many applications where a new camera is added to an existing group observing a set of targets. We propose two possible solutions for on-boarding new camera(s) dynamically to an existing network using transfer learning with limited additional supervision. Finally, Chapter 5 concludes the book by highlighting the major directions for future research.

  • by Jun Wan
    419,-

    For the last ten years, face biometric research has been intensively studied by the computer vision community. Face recognition systems have been used in mobile, banking, and surveillance systems. For face recognition systems, face spoofing attack detection is a crucial stage that could cause severe security issues in government sectors. Although effective methods for face presentation attack detection have been proposed so far, the problem is still unsolved due to the difficulty in the design of features and methods that can work for new spoofing attacks. In addition, existing datasets for studying the problem are relatively small which hinders the progress in this relevant domain. In order to attract researchers to this important field and push the boundaries of the state of the art on face anti-spoofing detection, we organized the Face Spoofing Attack Workshop and Competition at CVPR 2019, an event part of the ChaLearn Looking at People Series. As part of this event, we released the largest multi-modal face anti-spoofing dataset so far, the CASIA-SURF benchmark. The workshop reunited many researchers from around the world and the challenge attracted more than 300 teams. Some of the novel methodologies proposed in the context of the challenge achieved state-of-the-art performance. In this manuscript, we provide a comprehensive review on face anti-spoofing techniques presented in this joint event and point out directions for future research on the face anti-spoofing field.

  • by Kristin J. Dana
    798,-

    Visual pattern analysis is a fundamental tool in mining data for knowledge. Computational representations for patterns and texture allow us to summarize, store, compare, and label in order to learn about the physical world. Our ability to capture visual imagery with cameras and sensors has resulted in vast amounts of raw data, but using this information effectively in a task-specific manner requires sophisticated computational representations. We enumerate specific desirable traits for these representations: (1) intraclass invariance to support recognition; (2) illumination and geometric invariance for robustness to imaging conditions; (3) support for prediction and synthesis to use the model to infer continuation of the pattern; (4) support for change detection to detect anomalies and perturbations; and (5) support for physics-based interpretation to infer system properties from appearance. In recent years, computer vision has undergone a metamorphosis with classic algorithms adapting to new trends in deep learning. This text provides a tour of algorithm evolution including pattern recognition, segmentation and synthesis. We consider the general relevance and prominence of visual pattern analysis and applications that rely on computational models.

  • by Michael Felsberg
    660,-

    Under the title "Probabilistic and Biologically Inspired Feature Representations," this text collects a substantial amount of work on the topic of channel representations. Channel representations are a biologically motivated, wavelet-like approach to visual feature descriptors: they are local and compact, they form a computational framework, and the represented information can be reconstructed. The first property is shared with many histogram- and signature-based descriptors, the latter property with the related concept of population codes. In their unique combination of properties, channel representations become a visual Swiss army knife: they can be used for image enhancement, visual object tracking, as 2D and 3D descriptors, and for pose estimation. In the chapters of this text, the framework of channel representations will be introduced, its attributes will be elaborated, and further insight into its probabilistic modeling and algorithmic implementation will be given. Channel representations are a useful toolbox to represent visual information for machine learning, as they establish a generic way to compute popular descriptors such as HOG, SIFT, and SHOT. Even in an age of deep learning, they provide a good compromise between hand-designed descriptors and a-priori structureless feature spaces as seen in the layers of deep networks.

  • by Ha Quang Minh
    798,-

    Covariance matrices play important roles in many areas of mathematics, statistics, and machine learning, as well as their applications. In computer vision and image processing, they give rise to a powerful data representation, namely the covariance descriptor, with numerous practical applications. In this book, we begin by presenting an overview of the finite-dimensional covariance matrix representation approach of images, along with its statistical interpretation. In particular, we discuss the various distances and divergences that arise from the intrinsic geometrical structures of the set of Symmetric Positive Definite (SPD) matrices, namely Riemannian manifold and convex cone structures. Computationally, we focus on kernel methods on covariance matrices, especially using the Log-Euclidean distance. We then show some of the latest developments in the generalization of the finite-dimensional covariance matrix representation to the infinite-dimensional covariance operator representation via positive definite kernels. We present the generalization of the affine-invariant Riemannian metric and the Log-Hilbert-Schmidt metric, which generalizes the Log-Euclidean distance. Computationally, we focus on kernel methods on covariance operators, especially using the Log-Hilbert-Schmidt distance. Specifically, we present a two-layer kernel machine, using the Log-Hilbert-Schmidt distance and its finite-dimensional approximation, which reduces the computational complexity of the exact formulation while largely preserving its capability. Theoretical analysis shows that, mathematically, the approximate Log-Hilbert-Schmidt distance should be preferred over the approximate Log-Hilbert-Schmidt inner product and, computationally, it should be preferred over the approximate affine-invariant Riemannian distance. Numerical experiments on image classification demonstrate significant improvements of the infinite-dimensional formulation over the finite-dimensional counterpart.
Given the numerous applications of covariance matrices in many areas of mathematics, statistics, and machine learning, just to name a few, we expect that the infinite-dimensional covariance operator formulation presented here will have many more applications beyond those in computer vision.
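
    For orientation, the Log-Euclidean distance named in the description is commonly written as below for two SPD matrices. The symbols A, B, and d_logE are notation chosen for this note, not taken from the book.

```latex
d_{\mathrm{logE}}(A, B) = \left\lVert \log(A) - \log(B) \right\rVert_F ,
\qquad A, B \in \mathrm{Sym}^{++}(n)
```

    Here log is the principal matrix logarithm and the norm is the Frobenius norm. The Log-Hilbert-Schmidt distance described above extends this idea from matrices to covariance operators.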

  • by Tat-Jun Chin & David Suter
    725,-

    Outlier-contaminated data is a fact of life in computer vision. For computer vision applications to perform reliably and accurately in practical settings, the processing of the input data must be conducted in a robust manner. In this context, the maximum consensus robust criterion plays a critical role by allowing the quantity of interest to be estimated from noisy and outlier-prone visual measurements. The maximum consensus problem refers to the problem of optimizing the quantity of interest according to the maximum consensus criterion. This book provides an overview of the algorithms for performing this optimization. The emphasis is on the basic operation or "inner workings" of the algorithms, and on their mathematical characteristics in terms of optimality and efficiency. The applicability of the techniques to common computer vision tasks is also highlighted. By collecting existing techniques in a single article, this book aims to trigger further developments in this theoretically interesting and practically important area.
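
    As a rough illustration of the maximum consensus criterion described above, the sketch below fits a 2D line by trying every line through a pair of points and keeping the one that agrees with the most points within a tolerance. The function name `max_consensus_line`, the tolerance `eps`, and the sample points are hypothetical choices for this note. Searching only lines through point pairs is a simple hypothesize-and-verify heuristic, not one of the book's algorithms, and it is not guaranteed to find the globally optimal consensus set.

```python
from itertools import combinations


def max_consensus_line(points, eps):
    """Return the pair-defined line with the largest consensus set.

    The line is returned as normalized coefficients (a, b, c) of
    a*x + b*y + c = 0, together with the list of inlier points.
    """
    best_line = None
    best_inliers = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        # Implicit form of the line through the two sampled points.
        a, b = y2 - y1, x1 - x2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # coincident points do not define a line
        c = -(a * x1 + b * y1)
        # Consensus set: points whose distance to the line is at most eps.
        inliers = [
            (x, y) for x, y in points
            if abs(a * x + b * y + c) / norm <= eps
        ]
        if len(inliers) > len(best_inliers):
            best_line = (a / norm, b / norm, c / norm)
            best_inliers = inliers
    return best_line, best_inliers


# Five points on the diagonal y = x plus two outliers (made-up data).
points = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (1, 3), (3, 0)]
line, inliers = max_consensus_line(points, eps=0.1)
print(len(inliers), "inliers:", inliers)
```

    The two off-diagonal points are left out of the consensus set, so they have no influence on the returned line. A least-squares fit over all seven points would instead be pulled toward them.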

  • by Walter J. Scheirer
    651,-

    A common feature of many approaches to modeling sensory statistics is an emphasis on capturing the "average." From early representations in the brain, to highly abstracted class categories in machine learning for classification tasks, central-tendency models based on the Gaussian distribution are a seemingly natural and obvious choice for modeling sensory data. However, insights from neuroscience, psychology, and computer vision suggest an alternate strategy: preferentially focusing representational resources on the extremes of the distribution of sensory inputs. The notion of treating extrema near a decision boundary as features is not necessarily new, but a comprehensive statistical theory of recognition based on extrema is only now just emerging in the computer vision literature. This book begins by introducing the statistical Extreme Value Theory (EVT) for visual recognition. In contrast to central-tendency modeling, it is hypothesized that distributions near decision boundaries form a more powerful model for recognition tasks by focusing coding resources on data that are arguably the most diagnostic features. EVT has several important properties: strong statistical grounding, better modeling accuracy near decision boundaries than Gaussian modeling, the ability to model asymmetric decision boundaries, and accurate prediction of the probability of an event beyond our experience. The second part of the book uses the theory to describe a new class of machine learning algorithms for decision making that are a measurable advance beyond the state-of-the-art. This includes methods for post-recognition score analysis, information fusion, multi-attribute spaces, and calibration of supervised machine learning algorithms.

  • by Margrit Betke
    504,-

    In the human quest for scientific knowledge, empirical evidence is collected by visual perception. Tracking with computer vision takes on the important role to reveal complex patterns of motion that exist in the world we live in. Multi-object tracking algorithms provide new information on how groups and individual group members move through three-dimensional space. They enable us to study in depth the relationships between individuals in moving groups. These may be interactions of pedestrians on a crowded sidewalk, living cells under a microscope, or bats emerging in large numbers from a cave. Being able to track pedestrians is important for urban planning; analysis of cell interactions supports research on biomaterial design; and the study of bat and bird flight can guide the engineering of aircraft. We were inspired by this multitude of applications to consider the crucial component needed to advance a single-object tracking system to a multi-object tracking system: data association. Data association in the most general sense is the process of matching information about newly observed objects with information that was previously observed about them. This information may be about their identities, positions, or trajectories. Algorithms for data association search for matches that optimize certain match criteria and are subject to physical conditions. They can therefore be formulated as solving a "constrained optimization problem": the problem of optimizing an objective function of some variables in the presence of constraints on these variables. As such, data association methods have a strong mathematical grounding and are valuable general tools for computer vision researchers. This book serves as a tutorial on data association methods, intended for both students and experts in computer vision. We describe the basic research problems, review the current state of the art, and present some recently developed approaches.
The book covers multi-object tracking in two and three dimensions. We consider two imaging scenarios involving either single cameras or multiple cameras with overlapping fields of view, and requiring across-time and across-view data association methods. In addition to methods that match new measurements to already established tracks, we describe methods that match trajectory segments, also called tracklets. The book presents a principled application of data association to solve two interesting tasks: first, analyzing the movements of groups of free-flying animals and second, reconstructing the movements of groups of pedestrians. We conclude by discussing exciting directions for future research.
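
    The constrained-optimization view of data association described above can be sketched with a toy example. The objective is the total Euclidean distance between existing track positions and new detections. The constraint is that each track receives exactly one detection and no detection is used twice. The function name `associate` and the coordinates are hypothetical, and the brute-force search over permutations is chosen only for clarity. It is not the book's method, and practical systems typically use polynomial-time assignment solvers such as the Hungarian algorithm.

```python
from itertools import permutations
from math import dist


def associate(tracks, detections):
    """Match each track to a distinct detection, minimizing total distance.

    Assumes len(detections) >= len(tracks). Returns a list of
    (track_index, detection_index) pairs and the total cost.
    """
    best_cost = float("inf")
    best_perm = None
    # Each permutation assigns detection perm[i] to track i, so the
    # one-to-one constraint holds by construction.
    for perm in permutations(range(len(detections)), len(tracks)):
        cost = sum(dist(tracks[i], detections[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Last known track positions and new detections (made-up data).
tracks = [(0.0, 0.0), (5.0, 5.0)]
detections = [(5.2, 4.9), (0.1, -0.1), (9.0, 9.0)]
matches, cost = associate(tracks, detections)
print(matches, round(cost, 3))
```

    In this example track 0 is paired with detection 1 and track 1 with detection 0, while the third detection stays unmatched and could start a new track.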

  • by Margrit Betke
    872,-

    Because circular objects are projected to ellipses in images, ellipse fitting is a first step for 3-D analysis of circular objects in computer vision applications. For this reason, the study of ellipse fitting began as soon as computers came into use for image analysis in the 1970s, but it is only recently that optimal computation techniques based on the statistical properties of noise were established. These include renormalization (1993), which was then improved as FNS (2000) and HEIV (2000). Later, further improvements, called hyperaccurate correction (2006), HyperLS (2009), and hyper-renormalization (2012), were presented. Today, these are regarded as the most accurate fitting methods among all known techniques. This book describes these algorithms as well as implementation details and applications to 3-D scene analysis. We also present general mathematical theories of statistical optimization underlying all ellipse fitting algorithms, including rigorous covariance and bias analyses and the theoretical accuracy limit. The results can be directly applied to other computer vision tasks including computing fundamental matrices and homographies between images. This book can serve not simply as a reference of ellipse fitting algorithms for researchers, but also as learning material for beginners who want to start computer vision research. The sample program codes are downloadable from the website: https://sites.google.com/a/morganclaypool.com/ellipse-fitting-for-computer-vision-implementation-and-applications.

  • by Kenichi Kanatani
    725,-

    Modeling data from visual and linguistic modalities together creates opportunities for better understanding of both, and supports many useful applications. Examples of dual visual-linguistic data include images with keywords, video with narrative, and figures in documents. We consider two key task-driven themes: translating from one modality to another (e.g., inferring annotations for images) and understanding the data using all modalities, where one modality can help disambiguate information in another. The multiple modalities can either be essentially semantically redundant (e.g., keywords provided by a person looking at the image), or largely complementary (e.g., meta data such as the camera used). Redundancy and complementarity are two endpoints of a scale, and we observe that good performance on translation requires some redundancy, and that joint inference is most useful where some information is complementary. Computational methods discussed are broadly organized into ones for simple keywords, ones going beyond keywords toward natural language, and ones considering sequential aspects of natural language. Methods for keywords are further organized based on localization of semantics, going from words about the scene taken as a whole, to words that apply to specific parts of the scene, to relationships between parts. Methods going beyond keywords are organized by the linguistic roles that are learned, exploited, or generated. These include proper nouns, adjectives, spatial and comparative prepositions, and verbs. More recent developments in dealing with sequential structure include automated captioning of scenes and video, alignment of video and text, and automated answering of questions about scenes depicted in images.

  • by Kobus Barnard
    651,-

    Background subtraction is a widely used concept for detection of moving objects in videos. In the last two decades there has been a lot of development in designing algorithms for background subtraction, as well as wide use of these algorithms in various important applications, such as visual surveillance, sports video analysis, motion capture, etc. Various statistical approaches have been proposed to model scene backgrounds. The concept of background subtraction also has been extended to detect objects from videos captured from moving cameras. This book reviews the concept and practice of background subtraction. We discuss several traditional statistical background subtraction models, including the widely used parametric Gaussian mixture models and non-parametric models. We also discuss the issue of shadow suppression, which is essential for human motion analysis applications. This book discusses approaches and tradeoffs for background maintenance. This book also reviews many of the recent developments in the background subtraction paradigm. Recent advances in developing algorithms for background subtraction from moving cameras are described, including motion-compensation-based approaches and motion-segmentation-based approaches. For links to the videos to accompany this book, please see sites.google.com/a/morganclaypool.com/backgroundsubtraction/ Table of Contents: Preface / Acknowledgments / Figure Credits / Object Detection and Segmentation in Videos / Background Subtraction from a Stationary Camera / Background Subtraction from a Moving Camera / Bibliography / Author's Biography

  • by Ahmed Elgammal
    578,-

    In its early years, the field of computer vision was largely motivated by researchers seeking computational models of biological vision and solutions to practical problems in manufacturing, defense, and medicine. For the past two decades or so, there has been an increasing interest in computer vision as an input modality in the context of human-computer interaction. Such vision-based interaction can endow interactive systems with visual capabilities similar to those important to human-human interaction, in order to perceive non-verbal cues and incorporate this information in applications such as interactive gaming, visualization, art installations, intelligent agent interaction, and various kinds of command and control tasks. Enabling this kind of rich, visual and multimodal interaction requires interactive-time solutions to problems such as detecting and recognizing faces and facial expressions, determining a person's direction of gaze and focus of attention, tracking movement of the body, and recognizing various kinds of gestures. In building technologies for vision-based interaction, there are choices to be made as to the range of possible sensors employed (e.g., single camera, stereo rig, depth camera), the precision and granularity of the desired outputs, the mobility of the solution, usability issues, etc. Practical considerations dictate that there is not a one-size-fits-all solution to the variety of interaction scenarios; however, there are principles and methodological approaches common to a wide range of problems in the domain. While new sensors such as the Microsoft Kinect are having a major influence on the research and practice of vision-based interaction in various settings, they are just a starting point for continued progress in the area. 
In this book, we discuss the landscape of history, opportunities, and challenges in this area of vision-based interaction; we review the state-of-the-art and seminal works in detecting and recognizing the human body and its components; we explore both static and dynamic approaches to "looking at people" vision problems; and we place the computer vision work in the context of other modalities and multimodal applications. Readers should gain a thorough understanding of current and future possibilities of computer vision technologies in the context of human-computer interaction.

  • by Matthew Turk
    475,-

    As networks of video cameras are installed in many applications like security and surveillance, environmental monitoring, disaster response, and assisted living facilities, among others, image understanding in camera networks is becoming an important area of research and technology development. There are many challenges that need to be addressed in the process. Some of them are listed below:
    - Traditional computer vision challenges in tracking and recognition, robustness to pose, illumination, occlusion, clutter, recognition of objects, and activities;
    - Aggregating local information for wide area scene understanding, like obtaining stable, long-term tracks of objects;
    - Positioning of the cameras and dynamic control of pan-tilt-zoom (PTZ) cameras for optimal sensing;
    - Distributed processing and scene analysis algorithms;
    - Resource constraints imposed by different applications like security and surveillance, environmental monitoring, disaster response, assisted living facilities, etc.
    In this book, we focus on the basic research problems in camera networks, review the current state-of-the-art and present a detailed description of some of the recently developed methodologies. The major underlying theme in all the work presented is to take a network-centric view whereby the overall decisions are made at the network level. This is sometimes achieved by accumulating all the data at a central server, while at other times by exchanging decisions made by individual cameras based on their locally sensed data. Chapter One starts with an overview of the problems in camera networks and the major research directions. Some of the currently available experimental testbeds are also discussed here. One of the fundamental tasks in the analysis of dynamic scenes is to track objects.
Since camera networks cover a large area, the systems need to be able to track over such wide areas where there could be both overlapping and non-overlapping fields of view of the cameras, as addressed in Chapter Two: Distributed processing is another challenge in camera networks and recent methods have shown how to do tracking, pose estimation and calibration in a distributed environment. Consensus algorithms that enable these tasks are described in Chapter Three. Chapter Four summarizes a few approaches on object and activity recognition in both distributed and centralized camera network environments. All these methods have focused primarily on the analysis side given that images are being obtained by the cameras. Efficient utilization of such networks often calls for active sensing, whereby the acquisition and analysis phases are closely linked. We discuss this issue in detail in Chapter Five and show how collaborative and opportunistic sensing in a camera network can be achieved. Finally, Chapter Six concludes the book by highlighting the major directions for future research. Table of Contents: An Introduction to Camera Networks / Wide-Area Tracking / Distributed Processing in Camera Networks / Object and Activity Recognition / Active Sensing / Future Research Directions

  • by Amit Roy-Chowdhury
    475,-

    Being able to recover the shape of 3D deformable surfaces from a single video stream would make it possible to field reconstruction systems that run on widely available hardware without requiring specialized devices. However, because many different 3D shapes can have virtually the same projection, such monocular shape recovery is inherently ambiguous. In this survey, we will review the two main classes of techniques that have proved most effective so far: template-based methods, which rely on establishing correspondences with a reference image in which the shape is already known, and non-rigid structure-from-motion techniques, which exploit points tracked across the sequences to reconstruct a completely unknown shape. In both cases, we will formalize the approach, discuss its inherent ambiguities, and present the practical solutions that have been proposed to resolve them. To conclude, we will suggest directions for future research. Table of Contents: Introduction / Early Approaches to Non-Rigid Reconstruction / Formalizing Template-Based Reconstruction / Performing Template-Based Reconstruction / Formalizing Non-Rigid Structure from Motion / Performing Non-Rigid Structure from Motion / Future Directions

  • by Matthieu Salzmann
    475,-

    Face detection, because of its vast array of applications, is one of the most active research areas in computer vision. In this book, we review various approaches to face detection developed in the past decade, with more emphasis on boosting-based learning algorithms. We then present a series of algorithms that are empowered by the statistical view of boosting and the concept of multiple instance learning. We start by describing a boosting learning framework that is capable of handling billions of training examples. It differs from traditional bootstrapping schemes in that no intermediate thresholds need to be set during training, yet the total number of negative examples used for feature selection remains constant and focused (on the poorly performing ones). A multiple instance pruning scheme is then adopted to set the intermediate thresholds after boosting learning. This algorithm generates detectors that are both fast and accurate. We then present two multiple instance learning schemes for face detection: multiple instance learning boosting (MILBoost) and winner-take-all multiple category boosting (WTA-McBoost). MILBoost addresses the uncertainty in accurately pinpointing the location of the object being detected, while WTA-McBoost addresses the uncertainty in determining the most appropriate subcategory label for multiview object detection. Both schemes can resolve the ambiguity of the labeling process and reduce outliers during training, which leads to improved detector performance. In many applications, a detector trained with generic data sets may not perform optimally in a new environment. We propose detector adaptation, which is a promising solution for this problem. We present an adaptation scheme based on the Taylor expansion of the boosting learning objective function, and we propose to store the second-order statistics of the generic training data for future adaptation. We show that with a small amount of labeled data in the new environment, the detector's performance can be greatly improved. We also present two interesting applications where boosting learning was applied successfully. The first application is face verification for filtering and ranking image/video search results on celebrities. We present boosted multi-task learning (MTL), yet another boosting learning algorithm that extends MILBoost with a graphical model. Since the available number of training images for each celebrity may be limited, learning individual classifiers for each person may cause overfitting. MTL jointly learns classifiers for multiple people by sharing a few boosting classifiers in order to avoid overfitting. The second application addresses the need for speaker detection in conference rooms. The goal is to find who is speaking, given a microphone array and a panoramic video of the room. We show that by combining audio and visual features in a boosting framework, we can determine the speaker's position very accurately. Finally, we offer our thoughts on future directions for face detection. Table of Contents: A Brief Survey of the Face Detection Literature / Cascade-based Real-Time Face Detection / Multiple Instance Learning for Face Detection / Detector Adaptation / Other Applications / Conclusions and Future Work

  • by Salman Khan
    798,-

    Computer vision has become increasingly important and effective in recent years due to its wide-ranging applications in areas as diverse as smart surveillance and monitoring, health and medicine, sports and recreation, robotics, drones, and self-driving cars. Visual recognition tasks, such as image classification, localization, and detection, are the core building blocks of many of these applications, and recent developments in Convolutional Neural Networks (CNNs) have led to outstanding performance in these state-of-the-art visual recognition tasks and systems. As a result, CNNs now form the crux of deep learning algorithms in computer vision. This self-contained guide will benefit those who seek both to understand the theory behind CNNs and to gain hands-on experience with the application of CNNs in computer vision. It provides a comprehensive introduction to CNNs, starting with the essential concepts behind neural networks: training, regularization, and optimization of CNNs. The book also discusses a wide range of loss functions, network layers, and popular CNN architectures, reviews the different techniques for the evaluation of CNNs, and presents some popular CNN tools and libraries that are commonly used in computer vision. Further, this text describes and discusses case studies that are related to the application of CNNs in computer vision, including image classification, object detection, semantic segmentation, scene understanding, and image generation. This book is ideal for undergraduate and graduate students, as no prior background knowledge in the field is required to follow the material, as well as for new researchers, developers, engineers, and practitioners who are interested in gaining a quick understanding of CNN models.
