Video Compression - 3D Objects
English materials |
Authors | Article Title | Description | Rating |
Ioannis Kompatsiaris and Michael Gerassimos Strintzis | Spatiotemporal Segmentation and Tracking of Objects for Visualization of Videoconference Image Sequences |
Abstract—In this paper, a procedure is described for the segmentation, content-based coding, and visualization of videoconference image sequences. First, image sequence analysis is used to estimate the shape and motion parameters of the person facing the camera. A spatiotemporal filter, taking into account the intensity differences between consecutive frames, is applied in order to separate the moving person from the static background. The foreground is segmented into a number of regions in order to identify the face. For this purpose, we propose the novel procedure of the K-Means with connectivity constraint algorithm as a general segmentation algorithm combining several types of information, including intensity, motion, and compactness. In this algorithm, the use of spatiotemporal regions is introduced, since a number of frames are analyzed simultaneously and, as a result, the same region is present in consecutive frames. Based on this information, a 3-D ellipsoid is adapted to the person’s face using an efficient and robust algorithm. The rigid 3-D motion is estimated next using a least median of squares approach. Finally, a Virtual Reality Modeling Language (VRML) file is created containing all the above information; this file may be viewed by using any VRML 2.0 compliant browser. RAR 391 KB |
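The core of the segmentation step above is K-means over joint feature vectors. A minimal sketch follows, assuming plain K-means on (intensity, x, y) tuples; the actual KMCC algorithm adds motion features and an explicit compactness/connectivity term, and all names and values here are illustrative only.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """points: list of (intensity, x, y) tuples; returns cluster labels."""
    random.seed(seed)
    centers = random.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center in the joint feature space,
        # so spatially close pixels of similar intensity group together
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        # update step: recompute each center as the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

# two well-separated pixel groups: a dark patch and a bright patch
pixels = [(10, x, y) for x in range(3) for y in range(3)] + \
         [(200, x, y) for x in range(8, 11) for y in range(8, 11)]
labels = kmeans(pixels, k=2)
print(len(set(labels)))  # 2 clusters found
```

Because position enters the distance, pixels of similar intensity that are far apart tend to land in different clusters, which is the intuition behind the compactness constraint.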
|
Candemir Toklu, A. Murat Tekalp, and A. Tanju Erdem | Semi-Automatic Video Object Segmentation in the Presence of Occlusion |
Abstract—We describe a semi-automatic approach for segmenting a video sequence into spatio-temporal video objects in the presence of occlusion. The motion and shape of each video object are represented by a 2-D mesh. Assuming that the boundary of an object of interest is interactively marked on some keyframes, the proposed method finds the boundary of the object in all other frames automatically by tracking the 2-D mesh representation of the object in both forward and backward directions. A key contribution of the proposed method is automatic detection of covered and uncovered regions at each frame, and assignment of pixels in the uncovered regions to the object or background based on color and motion similarity. Experimental results are presented on two MPEG-4 test sequences, and the resulting segmentations are evaluated both visually and quantitatively. RAR 209 KB |
|
Işıl Celasun and A. Murat Tekalp | Optimal 2-D Hierarchical Content-Based Mesh Design and Update for Object-Based Video |
Abstract—Representation of video objects (VOs) using hierarchical 2-D content-based meshes for accurate tracking and level of detail (LOD) rendering has been previously proposed, where a simple suboptimal hierarchical mesh design algorithm was employed. However, it was concluded that the performance of the tracking and rendering very much depends on how well each level of the hierarchical mesh structure fits the VO under consideration. To this effect, this paper proposes an optimized design of hierarchical 2-D content-based meshes with a shape-adaptive simplification and temporal update mechanism for object-based video. Particular contributions of this work are: 1) analysis of the optimal number of nodes for the initial fine level-of-detail mesh design; 2) adaptive shape simplification across hierarchy levels; 3) optimization of the interior-node decimation method to remove only a maximal independent set to preserve Delaunay topology across hierarchy levels for better bitrate versus quality performance; and 4) a mesh-update mechanism which serves to temporally update the 2-D dynamic mesh in case of occlusion due to 3-D motion and self-occlusion. The proposed optimized and temporally updated hierarchical mesh representation can be applied in object-based video coding, retrieval, and manipulation. RAR 3564 KB |
|
Toshiaki Fujii, Tadahiko Kimoto, and Masayuki Tanimoto | A New Flexible Acquisition System of Ray-Space Data for Arbitrary Objects |
Abstract—Conventional ray-space acquisition systems require very precise mechanisms to control the small movement of cameras or objects. Most of them adopt a camera with a gantry or a turntable. Although they are good for acquiring the ray space of small objects, they are not suitable for ray-space acquisition of very large structures, such as a building, tower, etc. This paper proposes a new ray-space acquisition system which consists of a camera and a 3-D position and orientation sensor. It is not only a compact and easy-to-handle system, but is also free from limitations of size or shape, in principle. It can obtain any ray-space data as long as the camera is located within the coverage of the 3-D sensor. This paper describes our system and its specifications. Experimental results are also presented. RAR 212 KB |
|
Jin Liu, David Przewozny, and Siegmund Pastoor | Layered Representation of Scenes Based on Multiview Image Analysis |
Abstract—This paper describes a novel cooperative procedure for the segmentation of multiview image sequences exploiting multiple sources of information. Compared to other approaches, no a priori information is needed about the structure and the arrangement of objects in the scene. Three cameras in a particular unsymmetrical set-up are used in the system. The color distribution and the object contours in the constituent 2-D images, the disparity information in stereo image pairs, as well as motion information in subsequent images, are analyzed and evaluated in a cooperative procedure to get reliable segmentation results. The scene is decomposed into a variable number of depth layers, with each layer showing a subset of the segmented regions. The layered representation can be used in a variety of applications. In this paper, the application aims at synthesizing 3-D images for enhanced telepresence allowing the user to “look around” in natural scenes (intermediate views for interactive displays). In another application, 3-D images showing a natural depth-of-focus are synthesized in order to improve viewing comfort with 3-D displays. RAR 435 KB |
|
Atsushi Marugame, Akio Yamada, and Mutsumi Ohta | Focused Object Extraction with Multiple Cameras |
Abstract—This paper describes a novel framework for object extraction from images utilizing multiple cameras. Focused regions in images and disparities of point correspondences among multiple images are 3-D clues for the extraction. We examine the extraction of focused objects from images by these automatically acquired clues. Edges in images captured by the cameras are detected, and disparities of the edges in focused regions become the clues, called disparity keys. A focused object is extracted from an image as a set of edge intervals with the disparity keys. The falsely extracted parts can be detected by discontinuous contours of the object and recovered by contour morphing. Some experimental results under different conditions demonstrate the effectiveness and robustness of the proposed method. The method can be applied to image synthesis methods, such as Synthesis/Natural Hybrid Coding (SNHC), and to object-scalable coding in MPEG-4. RAR 968 KB |
|
Sila Ekmekci | Encoding and Reconstruction of Incomplete 3-D Video Objects |
Abstract—A new approach for compact representation, MPEG-4 encoding, and reconstruction of video objects captured by an uncalibrated system of multiple cameras is presented. The method is based on the incomplete 3-D (I3D) technique, which was initially investigated for stereo video objects captured by parallel cameras. Non-overlapping portions of the object are extracted from the reference views, each view having the corresponding portion with the highest resolution. This way, the redundancy of the initial multiview data is reduced. The areas which are extracted from the reference views are denoted as areas of interest. The output of the analysis stage, i.e., the areas of interest and the corresponding parts of the disparity fields, are encoded in the MPEG-4 bitstream. Disparity fields define the correspondence relations between the reference views. The view synthesis is performed by disparity-oriented reprojection of the areas of interest into the virtual view plane and can be seen as an intermediate postprocessing stage between the decoder and the scene compositor. This work performs the extension from parallel stereo views to arbitrarily configured multiviews with new analysis and synthesis algorithms. Moreover, a two-way interaction is built between the analysis and reconstruction stages, which provides the tradeoff between the final image quality and the amount of data transmitted. The focus is on a low-complexity solution enabling online processing capability while preserving the MPEG-4 compatibility of the I3D representation. It is finally shown that our method yields quite convincing results despite the minimal data used and the approximations involved. RAR 321 KB |
|
Peter Eisert, Eckehard Steinbach, and Bernd Girod | Automatic Reconstruction of Stationary 3-D Objects from Multiple Uncalibrated Camera Views |
Abstract—A system for the automatic reconstruction of real-world objects from multiple uncalibrated camera views is presented. The camera position and orientation for all views, the 3-D shape of the rigid object, as well as the associated color information, are recovered from the image sequence. The system proceeds in four steps. First, the internal camera parameters describing the imaging geometry are calibrated using a reference object. Second, an initial 3-D description of the object is computed from two views. This model information is then used in a third step to estimate the camera positions for all available views using a novel linear 3-D motion and shape estimation algorithm. The main feature of this third step is the simultaneous estimation of 3-D camera-motion parameters and object shape refinement with respect to the initial 3-D model. The initial 3-D shape model exhibits only a few degrees of freedom and the object shape refinement is defined as flexible deformation of the initial shape model. Our formulation of the shape deformation allows the object texture to slide on the surface, which differs from traditional flexible body modeling. This novel combined shape and motion estimation using sliding texture considerably improves the calibration data of the individual views in comparison to fixed-shape model-based camera-motion estimation. Since the shape model used for model-based camera-motion estimation is only approximate, a volumetric 3-D reconstruction process is initiated in the fourth step that combines the information from all views simultaneously. The recovered object consists of a set of voxels with associated color information that describes even fine structures and details of the object. New views of the object can be rendered from the recovered 3-D model, which has potential applications in virtual reality or multimedia systems and the emerging field of video coding using 3-D scene models. RAR 1321 KB |
|
Gian Luca Foresti | Object Recognition and Tracking for Remote Video Surveillance |
Abstract—In this paper, a system for real-time object recognition and tracking for remote video surveillance is presented. In order to meet real-time requirements, a unique feature, i.e., the statistical morphological skeleton, which achieves low computational complexity, accuracy of localization, and noise robustness, has been considered for both object recognition and tracking. Recognition is obtained by comparing an analytical approximation of the skeleton function extracted from the analyzed image with that obtained from model objects stored in a database. Tracking is performed by applying an extended Kalman filter to a set of observable quantities derived from the detected skeleton and other geometric characteristics of the moving object. Several experiments are shown to illustrate the validity of the proposed method and to demonstrate its usefulness in video-based applications. RAR 1347 KB |
|
Ebroul Izquierdo and Xiaohua Feng | Modeling Arbitrary Objects Based on Geometric Surface Conformity |
Abstract—In this paper, we address the problem of efficient and flexible modeling of arbitrary three-dimensional (3-D) objects and the accurate tracking of the generated model. These goals are reached by combining available multiview image analysis tools with a straightforward 3-D modeling method, which exploits well-established techniques from both computer vision and computer graphics, improving and combining them with new strategies. The basic idea of the technique presented is to use feature points and relevant edges in the images as nodes and edges of an initial two-dimensional wire grid. The method is adaptive in the sense that an initial rough surface approximation is progressively refined at the locations where the triangular patches do not approximate the surface accurately. The approximation error is measured according to the distance of the model to the object surface, taking into account the reliability of the depth estimated from the stereo image analysis. Once the initial wireframe is available, it is deformed and updated from frame to frame according to the motion of the object points chosen to be nodes. At the end of this process we obtain a temporally consistent 3-D model, which accurately approximates the visible object surface and reflects the physical characteristics of the surface with as few planar patches as possible. The performance of the presented methods is confirmed by several computer experiments. RAR 1081 KB |
|
Jens-Rainer Ohm and Karsten Müller | Incomplete 3-D Multiview Representation of Video Objects |
Abstract—This paper introduces a new form of representation for three-dimensional (3-D) video objects. We have developed a technique to extract disparity and texture data from video objects that are captured simultaneously with multiple-camera configurations. For this purpose, we derive an “area of interest” (AOI) for each of the camera views, which represents an area on the video object’s surface that is best visible from this specific camera viewpoint. By combining all AOIs, we obtain the video object plane as an unwrapped surface of a 3-D object, containing all texture data visible from any of the cameras. This texture surface can be encoded like any 2-D video object plane, while the 3-D information is contained in the associated disparity map. It is then possible to reconstruct different viewpoints from the texture surface by simple disparity-based projection. The merits of the technique are efficient multiview encoding of single video objects and support for viewpoint adaptation functionality, which is desirable in mixing natural and synthetic images. We have performed experiments with the MPEG-4 video verification model, where the disparity map is encoded by use of the tools provided for grayscale alpha data encoding. Due to its simplicity, the technique is suitable for applications that require real-time viewpoint adaptation toward video objects. RAR 669 KB |
|
Joo-Hee Moon, Ji-Heon Kweon, and Hae-Kwang Kim | Boundary Block-Merging (BBM) Technique for Efficient Texture Coding of Arbitrarily Shaped Object |
Abstract—We present an efficient texture coding method which enhances the coding efficiency of conventional discrete cosine transform (DCT) with padding techniques for arbitrarily shaped objects in object-based video coding where shape information is provided. The BBM (boundary block-merging) technique is applied to the boundary macroblocks of 16 × 16 pixels of a VOP (video object plane) which consist of both background and object pixels. A macroblock consists of four subblocks of 8 × 8 pixels. For boundary subblocks consisting of object and background pixels, padding is performed in the background region. For a pair of padded boundary subblocks in a macroblock whose alignment belongs to a predefined set, one subblock is rotated 180° and merged into another one if object pixels do not overlap. After merging, the boundary macroblock is coded using conventional DCT coding. The merging process reduces the number of subblocks to be DCT coded, and high correlation between adjacent subblocks makes the number of DCT coding bits small. Experimentation has been done on various test sequences under different test conditions, and verifies significant coding efficiency improvement: reduction of coding bits for luminance boundary blocks by 5.7–11.9% at the same PSNR values compared with the padding-based DCT without BBM. RAR 1443 KB |
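The merging test above is easy to picture in code. A hedged sketch follows, using toy 4x4 masks instead of real 8x8 subblocks: rotate one object mask by 180 degrees and merge only if its object pixels never overlap the partner's. All names and shapes here are illustrative, not the paper's implementation.

```python
def rotate180(block):
    """Rotate a 2-D block of pixels by 180 degrees."""
    return [row[::-1] for row in block[::-1]]

def can_merge(mask_a, mask_b):
    """True if rotated mask_b's object pixels never overlap mask_a's."""
    for row_a, row_b in zip(mask_a, rotate180(mask_b)):
        if any(a and b for a, b in zip(row_a, row_b)):
            return False
    return True

# object pixels occupy the top-left corner of both subblocks; rotating the
# second one moves its pixels to the bottom-right, so the pair can merge
mask_a = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mask_b = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
# a mask whose object is already bottom-right rotates onto mask_a: no merge
mask_c = rotate180(mask_a)
print(can_merge(mask_a, mask_b), can_merge(mask_a, mask_c))  # True False
```

When the test passes, one DCT is spent on the merged block instead of two, which is where the bit savings come from.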
|
Di Zhong and Shih-Fu Chang | An Integrated Approach for Content-Based Video Object Segmentation and Retrieval |
Abstract—Object-based video data representations enable unprecedented functionalities of content access and manipulation. In this paper, we present an integrated approach using region-based analysis for semantic video object segmentation and retrieval. We first present an active system that combines low-level region segmentation with user inputs for defining and tracking semantic video objects. The proposed technique is novel in using an integrated feature fusion framework for tracking and segmentation at both region and object levels. Experimental results and extensive performance evaluation show excellent results compared to existing systems. Building upon the segmentation framework, we then present a unique region-based query system for semantic video objects. The model facilitates powerful object search, such as spatio-temporal similarity searching at multiple levels. RAR 404 KB |
|
Munchurl Kim, Jae Gark Choi, Daehee Kim, Hyung Lee, Myoung Ho Lee, Chieteuk Ahn, and Yo-Sung Ho | A VOP Generation Tool: Automatic Segmentation of Moving Objects in Image Sequences Based on Spatio-Temporal Information |
Abstract—The new MPEG-4 video coding standard enables content-based functionalities. In order to support the philosophy of the MPEG-4 visual standard, each frame of video sequences should be represented in terms of video object planes (VOP’s). In other words, video objects to be encoded in still pictures or video sequences should be prepared before the encoding process starts. Therefore, it requires a prior decomposition of sequences into VOP’s so that each VOP represents a moving object. This paper addresses an image segmentation method for separating moving objects from the background in image sequences. The proposed method utilizes the following spatio-temporal information. 1) For localization of moving objects in the image sequence, two consecutive image frames in the temporal direction are examined and a hypothesis testing is performed by comparing two variance estimates from two consecutive difference images, which results in an F-test. 2) Spatial segmentation is performed to divide each image into semantic regions and to find precise object boundaries of the moving objects. The temporal segmentation yields a change detection mask that indicates moving areas (foreground) and nonmoving areas (background), and spatial segmentation produces spatial segmentation masks. A combination of the spatial and temporal segmentation masks produces VOP’s faithfully. This paper presents various experimental results, demonstrating functionalities such as interaction with objects and reuse of content information by scene composition, which are all suitable for multimedia applications. RAR 681 KB |
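The temporal test in step 1) amounts to comparing two variance estimates with an F-statistic. A minimal sketch, assuming flattened difference-image samples; the threshold below is hypothetical, whereas the paper would derive it from the F-distribution at a chosen significance level.

```python
def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def f_test(diff1, diff2, threshold=3.0):
    """True if the variance ratio of two difference images exceeds the
    threshold, suggesting motion rather than camera noise."""
    v1, v2 = variance(diff1), variance(diff2)
    f = max(v1, v2) / max(min(v1, v2), 1e-12)
    return f > threshold

static_noise = [0, 1, -1, 0, 1, -1, 0, 1]      # camera noise only
moving_edge  = [0, 1, 25, -30, 28, -1, 0, 22]  # object pixels changed
print(f_test(static_noise, static_noise))  # False: same statistics
print(f_test(static_noise, moving_edge))   # True: variances differ
```

Pixels where the test fires would be marked in the change detection mask and later intersected with the spatial segmentation.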
|
Sotiris Malassiotis and Michael G. Strintzis | Tracking Textured Deformable Objects Using a Finite-Element Mesh |
Abstract—This paper presents an algorithm for the estimation of the motion of textured objects undergoing nonrigid deformations over a sequence of images. An active mesh model, which is a finite-element deformable membrane, is introduced in order to achieve efficient representation of global and local deformations. The mesh is constructed using an adaptive triangulation procedure that places more triangles over high detail areas. Through robust least squares techniques and modal analysis, efficient estimation of global object deformations is achieved, based on a set of sparse displacement measurements. A local warping procedure is then applied to minimize the intensity matching error between subsequent images, and thus estimate local deformations. Among the major contributions of this paper are novel techniques developed to acquire knowledge of the object dynamics and structure directly from the image sequence, even in the absence of prior intelligence regarding the scene. Specifically, a coarse-to-fine estimation scheme is first developed, which adapts the model to locally deforming features. Subsequently, principal components modal analysis is used to accumulate knowledge of the object dynamics. This knowledge is finally exploited to constrain the object deformation. The problem of tracking the model over time is addressed, and a novel motion-compensated prediction approach is proposed to facilitate this. A novel method for the determination of the dynamical principal axes of deformation is developed. The experimental results demonstrate the efficiency and robustness of the proposed scheme, which has many potential applications in the areas of image coding, image analysis, and computer graphics. RAR 627 KB |
|
Chuang Gu and Ming-Chieh Lee | Semiautomatic Segmentation and Tracking of Semantic Video Objects |
Abstract—This paper introduces a novel semantic video object extraction system using mathematical morphology and a perspective motion model. Inspired by the results from the study of the human visual system, we intend to solve the semantic video object extraction problem in two separate steps: supervised I-frame segmentation, and unsupervised P-frame tracking. First, the precise semantic video object boundary can be found using a combination of human assistance and a morphological segmentation tool. Second, the semantic video objects in the remaining frames are obtained using global perspective motion estimation and compensation of the previous semantic video object, plus boundary refinement as used for I frames. RAR 376 KB |
|
Yining Deng and B. S. Manjunath | NeTra-V: Toward an Object-Based Video Representation |
Abstract—We present here a prototype video analysis and retrieval system, called NeTra-V, that is being developed to build an object-based video representation for functionalities such as search and retrieval of video objects. A region-based content description scheme using low-level visual descriptors is proposed. In order to obtain regions for local feature extraction, a new spatio-temporal segmentation and region-tracking scheme is employed. The segmentation algorithm uses all three visual features: color, texture, and motion in the video data. A group processing scheme similar to the one in the MPEG-2 standard is used to ensure the robustness of the segmentation. The proposed approach can handle complex scenes with large motion. After segmentation, regions are tracked through the video sequence using extracted local features. The results of tracking are sequences of coherent regions, called “subobjects.” Subobjects are the fundamental elements in our low-level content description scheme, which can be used to obtain meaningful physical objects in a high-level content description scheme. Experimental results illustrating segmentation and retrieval are provided. RAR 347 KB |
|
Hiroyuki Katata, Norio Ito, and Hiroshi Kusao | Temporal-Scalable Coding Based on Image Content |
Abstract—An object-based temporal scalability codec is proposed by introducing shape coding, a new motion estimation/compensation method, weighting techniques, and background composition. The major feature of this technique is determining the frame rate of the selected objects in the motion picture individually, so that the motion of the selected region is smoother than that of the other areas. Computer simulation shows that the proposed method achieves better image quality and enables us to represent the motion of the selected objects hierarchically. RAR 355 KB |
|
Philippe Salembier, Ferran Marqués, Montse Pardàs, Josep Ramon Morros, Isabelle Corset, Sylvie Jeannin, Lionel Bouchard, Fernand Meyer, and Beatriz Marcotegui | Segmentation-Based Video Coding System Allowing the Manipulation of Objects |
Abstract—This paper presents a generic video coding algorithm allowing the content-based manipulation of objects. This manipulation is possible thanks to the definition of a spatiotemporal segmentation of the sequences. The coding strategy relies on a joint optimization, in the rate-distortion sense, of the partition definition and of the coding techniques to be used within each region. This optimization creates the link between the analysis and synthesis parts of the coder. The analysis defines the time evolution of the partition, as well as the elimination or the appearance of regions that are homogeneous either spatially or in motion. The coding of the texture as well as of the partition relies on region-based motion compensation techniques. The algorithm offers a good compromise between the ability to track and manipulate objects and the coding efficiency. RAR 1509 KB |
|
Kevin J. O’Connell | Object-Adaptive Vertex-Based Shape Coding Method |
Abstract—The paper presents a new technique for compactly representing the shape of a visual object within a scene. This method encodes the vertices of a polygonal approximation of the object’s shape by adapting the representation to the dynamic range of the relative locations of the object’s vertices and by exploiting an octant-based representation of each individual vertex. The object-level adaptation to the relative-location dynamic range provides the flexibility needed to efficiently encode objects of different sizes and with different allowed approximation distortion. At the vertex level, the octant-based representation allows coding gains for vertices closely spaced relative to the object-level dynamic range. This vertex coding method may be used with techniques which code the polygonal approximation error for further gains in coding efficiency. Results are shown which demonstrate the effectiveness of the vertex encoding method. The rate-distortion comparisons presented show that the technique’s adaptive nature allows it to operate efficiently over a wide range of rates and distortions and across a variety of input material, whereas other methods are efficient over more limited conditions. RAR 121 KB |
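The octant idea can be sketched as a reversible split of a relative vertex location (dx, dy) into a 3-bit octant index plus a major and a minor magnitude; since the minor magnitude is known to be no larger than the major one, it can be coded with fewer bits. The bit layout below is invented for illustration, not the standard's.

```python
def to_octant(dx, dy):
    """Map a relative vertex location to (octant index, major, minor)."""
    octant = (4 if dy < 0 else 0) | (2 if dx < 0 else 0) | (1 if abs(dy) > abs(dx) else 0)
    major, minor = max(abs(dx), abs(dy)), min(abs(dx), abs(dy))
    return octant, major, minor

def from_octant(octant, major, minor):
    """Invert to_octant: recover the signed relative location."""
    ax, ay = (minor, major) if octant & 1 else (major, minor)
    dx = -ax if octant & 2 else ax
    dy = -ay if octant & 4 else ay
    return dx, dy

# round-trip a few relative locations through the representation
for v in [(5, 2), (-3, 7), (4, -4), (-1, -6)]:
    assert from_octant(*to_octant(*v)) == v
print("round trip ok")
```

An encoder would then size the magnitude fields from the object-level dynamic range, which is the adaptation the abstract describes.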
|
Emmanuel Reusens, Touradj Ebrahimi, and Murat Kunt | Dynamic Coding of Visual Information |
Abstract—This paper introduces a novel approach to visual data compression. The approach, named dynamic coding, consists of an effective competition between several representation models used for describing data portions. The image data is represented as the union of several regions, each approximated by a locally appropriate representation model. The dynamic coding concept leads to attractive features such as genericness, flexibility, and openness, and is therefore particularly suited to a multimedia environment in which many types of applications are involved. Dynamic coding is a general approach to visual data compression, and many variations on the same theme may be designed. They differ by the particular procedure by which the data is segmented into objects and the local representation model selected. As an illustrative example, a video compression scheme based on the principles of dynamic coding is presented. This compression algorithm performs a joint optimization of the segmentation (restricted to a so-called generalized quadtree partition) together with the representation models associated with each data segment. Four representation models compete, namely fractal, motion compensation, text and graphics, and background modes. Optimality is defined with respect to a rate-distortion tradeoff, and the optimization procedure leads to a multicriterion segmentation. RAR 303 KB |
|
Mark R. Banham and James C. Brailean | A Selective Update Approach to Matching Pursuits Video Coding |
Abstract—This paper addresses an approach to video coding utilizing an iterative nonorthogonal expansion technique called “matching pursuits” (MP) in combination with a new algorithm for selecting an appropriate coding technique at each frame in a sequence. This decision algorithm is called “selective update” and is based on an estimate of the amount and type of motion occurring between coded frames in a video sequence. This paper demonstrates that the matching pursuits approach is most efficient for video coding when motion compensation results in prediction error which is well localized to the edges of moving objects. In the presence of global motion, such as panning and zooming, or in the presence of objects entering or leaving a scene, matching pursuits becomes less effective than orthogonal transform-based coding techniques like the block-based discrete cosine transform (DCT). The rate-distortion characteristics of matching pursuits and block-wise DCT coding are used to demonstrate how MP coding can be more efficient than block-wise DCT-based coding. When an appropriate combination of these nonorthogonal and orthogonal transforms is used for encoding a complete low bit-rate video sequence, improved overall compression efficiency can be achieved. Results are shown which demonstrate the effectiveness of a hybrid video codec based on this concept. RAR 304 KB |
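The matching-pursuits expansion named above is a greedy loop: pick the dictionary atom with the largest inner product against the residual, subtract its contribution, repeat. A minimal 1-D sketch follows; the tiny three-atom dictionary is invented for illustration, whereas video MP coders use large overcomplete 2-D dictionaries.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, n_terms):
    """Greedy MP decomposition: returns (atom index, coefficient) pairs
    and the final residual."""
    residual = list(signal)
    coded = []
    for _ in range(n_terms):
        # best atom = largest |inner product| with the current residual
        idx = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[idx])
        coded.append((idx, c))
        residual = [r - c * a for r, a in zip(residual, atoms[idx])]
    return coded, residual

# unit basis of R^2 plus a redundant diagonal atom
atoms = [[1.0, 0.0], [0.0, 1.0], [0.5 ** 0.5, 0.5 ** 0.5]]
coded, residual = matching_pursuit([3.0, 3.0], atoms, n_terms=1)
print(coded[0][0])  # the diagonal atom (index 2) captures the signal best
```

Only the chosen atom indices and quantized coefficients need to be transmitted, which is why MP is efficient when the prediction error is concentrated on a few edge-like structures.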
|
François Brémond and Monique Thonnat | Tracking Multiple Nonrigid Objects in Video Sequences |
Abstract—This paper presents a method to track multiple nonrigid objects in video sequences. First, we present related work on tracking methods. Second, we describe our proposed approach. We use the notion of target to represent the perception of object motion. To handle the particularities of nonrigid objects, we define a target as an individually tracked moving region or as a group of moving regions globally tracked. Then we explain how to compute the trajectory of a target and how to compute the correspondences between known targets and newly detected moving regions. In the case of an ambiguous correspondence, we define a compound target to freeze the associations between targets and moving regions until more accurate information is available. Finally, we provide an example to illustrate the way we have implemented the proposed tracking method for video-surveillance applications. RAR 845 KB |
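The correspondence step can be sketched as a nearest-centroid match between known targets and newly detected moving regions, with a flag raised when two targets are almost equally close; the paper's compound-target mechanism would then freeze that association. The distance measure, names, and ambiguity margin below are all invented for this sketch.

```python
def dist2(p, q):
    """Squared Euclidean distance between two centroids."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def match_regions(targets, regions, margin=4.0):
    """targets/regions: dicts name -> centroid;
    returns region name -> (best target, ambiguous flag)."""
    matches = {}
    for rname, rc in regions.items():
        ranked = sorted(targets, key=lambda t: dist2(targets[t], rc))
        best = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else None
        # ambiguous when the runner-up is nearly as close as the best match
        ambiguous = runner_up is not None and \
            dist2(targets[runner_up], rc) - dist2(targets[best], rc) < margin
        matches[rname] = (best, ambiguous)
    return matches

targets = {"person1": (10.0, 10.0), "person2": (40.0, 12.0)}
regions = {"blobA": (11.0, 9.0), "blobB": (25.0, 11.0)}
matches = match_regions(targets, regions)
print(matches)  # blobA matches person1 cleanly; blobB is ambiguous
```

Ambiguous regions would be held in a compound target rather than committed to either trajectory.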
|
Kiyoharu Aizawa, Kazuya Kodama, and Akira Kubota | Producing Object-Based Special Effects by Fusing Multiple Differently Focused Images |
Abstract—We propose a novel approach for producing special visual effects by fusing multiple differently focused images. This method differs from conventional image-fusion techniques because it enables us to arbitrarily generate object-based visual effects such as blurring, enhancement, and shifting. Notably, the method does not need any segmentation. Using a linear imaging model, it directly generates the desired image from multiple differently focused images. RAR 666 KB |
|
Radu S. Jasinschi, Thumpudi Naveen, Ali J. Tabatabai, and Paul Babic-Vovk | Apparent 3-D Camera Velocity—Extraction and Applications |
Abstract—In this paper, we describe a robust method for the extraction of the apparent 3-D camera velocity and 3-D scene structure information. Our method performs the extraction of the apparent 3-D camera velocity in a fully automated way, without any knowledge about 3-D scene content information as used in current methods. This has the advantage that it can be used to fully automate the generation of natural-looking virtual/augmented environments, as well as in video-database browsing. First, we describe our method for the robust extraction of 3-D parameters. This method is a combination of the eight-point method in structure-from-motion with a statistical technique to automatically select feature points in the image, irrespective of 3-D content information. Second, we discuss two applications which use the results of the 3-D parameter extraction. The first application is the generation of sprite layers using 3-D camera velocity information to represent an eight-parameter perspective image-to-sprite mapping plus 3-D scene depth information for the sprite layering. The second application is the use of 3-D camera velocity for the indexing of large video databases according to a set of seven independent types of camera motion. RAR 357 KB |
|
Chun-Jen Tsai and Aggelos K. Katsaggelos, Fellow | Sequential Construction of 3-D-Based Scene Description |
Abstract—Binocular camera systems are commonly used to construct 3-D-based scene description. However, there is a tradeoff between the length of the camera baseline and the difficulty of the matching problem and the extent of the field of view of the 3-D scene. A large baseline system provides better depth resolution than a smaller baseline system at the expense of a narrower field of view. To increase the depth resolution without increasing the difficulty of the matching problem and decreasing the field of view of the 3-D scene, a sequential 3-D-based scene description technique is proposed in this paper. Multiple small-baseline 3-D scene descriptions from a single moving camera or an array of cameras are used to sequentially construct a large baseline 3-D scene description while maintaining the field of view of a small-baseline system. A Bayesian framework using a disparity-space image (DSI) technique for disparity estimation is presented. The cost function for large baseline image matching is designed based not only on the photometric matching error, the smoothness constraint, and the ordering constraint, but also on the previous disparity estimates from smaller baseline stereo image pairs as a prior model. Texture information is registered along the scan path of the camera(s). Experimental results demonstrate the effectiveness of this technique in visual communication applications. RAR 545 KB |
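The disparity-space-image idea, with a previous smaller-baseline estimate acting as a prior, can be sketched on a single scanline. This is a heavily simplified stand-in for the paper's Bayesian framework (the function name, absolute-difference cost, and quadratic prior weight are assumptions):

```python
import numpy as np

def disparity_scanline(left, right, max_d, prev=None, lam=0.0):
    """Winner-take-all disparity along one scanline from a
    disparity-space image (DSI) of absolute differences. An optional
    previous estimate `prev` acts as a quadratic prior with weight lam."""
    n = len(left)
    dsi = np.full((max_d + 1, n), np.inf)
    for d in range(max_d + 1):
        # cost of matching left[x] against right[x - d]
        dsi[d, d:] = np.abs(left[d:] - right[:n - d])
        if prev is not None:
            dsi[d] += lam * (d - prev) ** 2
    return dsi.argmin(axis=0)
```

The paper additionally imposes smoothness and ordering constraints; a winner-take-all decision over the DSI is only the simplest possible readout.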
|
In Kyu Park, Il Dong Yun, and Sang Uk Lee | Automatic 3-D Model Synthesis from Measured Range Data |
Abstract—In this paper, we propose an algorithm to construct a 3-D surface model from a set of range data, based on a non-uniform rational B-splines (NURBS) surface-fitting technique. It is assumed that the range data consists of initially unorganized, scattered 3-D points whose connectivity is also unknown. The proposed algorithm consists of three stages: initial model approximation employing K-means clustering, hierarchical decomposition of the initial model, and construction of the NURBS surface patch network. The initial model is approximated by both polyhedral and triangular models. Then, the initial model is represented by a hierarchical graph, which is efficiently used to construct the G1 continuous NURBS patch network of the whole object. Experiments are carried out on synthetic and real range data to evaluate the performance of the proposed algorithm. It is shown that the initial model as well as the NURBS patch network are constructed automatically with tolerable computation. The modeling error of the NURBS model is reduced to 10%, compared with the initial mesh model. RAR 295 KB |
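The K-means clustering stage of the initial approximation can be illustrated with plain Lloyd iterations on a 3-D point cloud. This is a generic sketch, not the authors' code:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on an (N, 3) point cloud: a minimal
    stand-in for the clustering that seeds the initial model."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = np.linalg.norm(points[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its points (keep empty clusters)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers
```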
|
Haruo Hoshino, Fumio Okano, and Ichiro Yuyama | A Study on Resolution and Aliasing for Multi-Viewpoint Image Acquisition |
Abstract—We equate multi-viewpoint image acquisition with object sampling from different viewpoints, and calculate the resolution of multi-viewpoint camera systems. Aliasing, which occurs with the sampling, leads to depth shifting of objects. For instance, an image of a distant object may be taken as if it were near when aliasing occurs. A condition of the camera pitch free from aliasing is discussed. An appropriate prefilter for the sampling can eliminate alias-causing spatial–frequency components, even when the camera pitch is large. We analyze the characteristics of the prefilters from the aspects of depth shifting, ghosting, and waveform distortion. The experimental results show that a prefilter, which reduces ghosting, can be realized optically. For precise acquisition of multi-viewpoint images, however, a prefilter with electrical processing is needed. RAR 354 KB |
|
André Redert, Emile Hendriks, and Jan Biemond, Fellow | 3-D Scene Reconstruction with Viewpoint Adaptation on Stereo Displays |
Abstract—In this paper, we propose a generic algorithm for the geometrically correct reconstruction of 3-D scenes on stereo displays with viewpoint adaptation. This forms the basis of multiviewpoint systems, which are currently the most promising candidates for real-time implementations of 3-D visual communication systems. The reconstruction algorithm needs 3-D tracking of the viewers’ eyes with respect to the display. We analyze the effect of eye-tracking errors. A simple bound will be derived, below which reconstruction errors cannot be observed. We design a multiviewpoint system using a recently introduced image-based scene representation. The design formed the basis of the real-time multiviewpoint system that was recently built in the European PANORAMA project. Experiments with both natural and synthetic scenes show that the proposed reconstruction algorithm performs well. The experiments are performed by computer simulations and the real-time PANORAMA system. RAR 633 KB |
|
Fabio Lavagetto and Roberto Pockaj | The Facial Animation Engine: Toward a High-Level Interface for the Design of MPEG-4 Compliant Animated Faces |
Abstract—In this paper, we propose a method for implementing a high-level interface for the synthesis and animation of animated virtual faces that is in full compliance with MPEG-4 specifications. This method allows us to implement the simple facial object profile and part of the calibration facial object profile. In fact, starting from a facial wireframe and from a set of configuration files, the developed system is capable of automatically generating the animation rules suited for model animation driven by a stream of facial animation parameters. If the calibration parameters (feature points and texture) are available, the system is able to exploit this information for suitably modifying the geometry of the wireframe and for performing its animation by means of calibrated rules computed ex novo on the adapted somatics of the model. Evidence of the achievable performance is reported at the end of this paper by means of figures showing the capability of the system to reshape its geometry according to the decoded MPEG-4 facial calibration parameters and its effectiveness in performing facial expressions. RAR 1230 KB |
|
Jörgen Ahlberg and Haibo Li | Representing and Compressing Facial Animation Parameters Using Facial Action Basis Functions |
Abstract—In model-based, or semantic, coding, parameters describing the nonrigid motion of objects, e.g., the mimics of a face, are of crucial interest. The facial animation parameters (FAP’s) specified in MPEG-4 compose a very rich set of such parameters, allowing a wide range of facial motion. However, the FAP’s are typically correlated and also constrained in their motion due to the physiology of the human face. We seek here to utilize this spatial correlation to achieve efficient compression. As it does not introduce any interframe delay, the method is suitable for interactive applications, e.g., videophone and interactive video, where low delay is a vital issue. RAR 86 KB |
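A common way to exploit such spatial correlation is a learned linear basis: each FAP vector is projected onto a few basis vectors and only the coefficients are transmitted. The sketch below uses plain PCA as a stand-in for the paper's facial action basis functions (all names are illustrative, not from the paper):

```python
import numpy as np

def learn_basis(training, m):
    """Learn m basis vectors by PCA over training FAP vectors (rows).
    Returns (mean, basis) with basis of shape (m, dim)."""
    mean = training.mean(axis=0)
    _, _, Vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, Vt[:m]

def encode(fap, mean, basis):
    # m coefficients instead of dim parameters
    return basis @ (fap - mean)

def decode(coef, mean, basis):
    return mean + basis.T @ coef
```

Because each frame is encoded independently, such a scheme introduces no interframe delay, matching the low-delay property the abstract emphasizes.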
|
? | Introduction to the Special Issue on Object-Based Video Coding and Description |
RAR 136 KB |
|
Wilfried Philips | Comparison of Techniques for Intra-Frame Coding of Arbitrarily Shaped Video Object Boundary Blocks |
Abstract—This paper presents experimental results that demonstrate that the weakly separable polynomial orthonormal transform outperforms the shape-adaptive discrete cosine transform (SADCT) and the recently introduced improved SADCT with ΔDC correction, at the expense of a nonprohibitive increase in the number of computations. The peak signal-to-noise ratio (PSNR) of the O-SADCT is typically 1–2 dB better than that of the NO-SADCT for any given bit rate. Some other improvements to SADCT-like schemes are also suggested. RAR 75 KB |
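The SADCT baseline referenced above can be sketched directly: the pixels of each column of a boundary block are shifted to the top and transformed with a DCT of matching length, then the same is done along the rows. A minimal orthonormal version (hence energy-preserving, in the spirit of the O-SADCT variant; a generic sketch, not the paper's code):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2.0 / n)

def sadct(block, mask):
    """Shape-adaptive DCT of the pixels where mask is True. Returns the
    coefficient array (zeros outside the transformed support)."""
    h, w = block.shape
    tmp = np.zeros_like(block, dtype=float)
    collen = np.zeros(w, dtype=int)
    for j in range(w):                      # column pass
        col = block[mask[:, j], j]
        collen[j] = len(col)
        if len(col):
            tmp[:len(col), j] = dct_matrix(len(col)) @ col
    out = np.zeros_like(tmp)
    for i in range(h):                      # row pass over remaining support
        row = tmp[i, collen > i]
        if len(row):
            out[i, :len(row)] = dct_matrix(len(row)) @ row
    return out
```

For a fully opaque block this reduces to the ordinary separable 2-D DCT, and for any mask the transform preserves the energy of the masked pixels.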
|
Bert DeKnuydt, Stef Desmet, and Luc Van Eycken | Coding of Dynamic Texture for Mapping on 3-D Scenes |
Abstract—As the availability of powerful 3-D scene renderers grows, the demand for high visual quality 3-D scenes is increasing. Besides more detailed geometric and texture information, this presupposes the ability to map dynamic textures. This is obviously needed to model movies, computers, and TV screens but also, for example, for the landscape as seen from inside a moving vehicle or shadow and lighting effects that are not modeled separately. Downloading the complete scene to the user, before letting him interact with the scene, becomes very impractical and inefficient with huge scenes. If, as is often the case, a back channel is available, on-demand downloading allows the user to start interacting with the scene immediately. Specifically for dynamic texture, if we know the viewpoint of the user (or several users), we can code the texture taking into account the viewing conditions, i.e., coding and transmitting each part of the texture with the required resolution only. Applications that would benefit from view-dependent coding of dynamic textures include (but are not limited to) multiplayer three-dimensional (3-D) games, walkthroughs of dynamic constructions or scenes, and 3-D simulations of dynamic systems. In this paper, the feasibility of such a scheme based on an adapted optimal level allocation video codec is shown together with the huge data-rate reductions that can be achieved with it. RAR 743 KB |
|
Hideyuki Fujishima, Yusuke Takemoto, Takao Onoye, and Isao Shirakawa, Fellow | An Architecture of a Matrix-Vector Multiplier Dedicated to Video Decoding and Three-Dimensional Computer Graphics |
Abstract—An architecture of a matrix-vector multiplier (MVM) is devised, which is dedicated to MPEG-4 natural/synthetic video decoding. The MVM can perform the matrix-vector multiplication both in the inverse discrete cosine transform (IDCT) and in the geometrical transformation of three-dimensional computer graphics (3-D CG); or, specifically, it can achieve the multiplication of a 4 × 4 matrix by a four-tuple vector necessary in the one-dimensional IDCT for eight pixels and in the geometrical transformation for a point in a 3-D space. This paper describes a new architecture of this MVM and also shows the implementation result of a functional module composed of four MVM’s with the use of 440-k transistors, which can operate at 20 MHz or less. RAR 433 KB |
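The claim that one 4 × 4 datapath serves the 8-point 1-D IDCT rests on the standard even/odd split: the eight outputs follow from two 4 × 4 matrix-vector products plus a butterfly. A numerical sketch of that decomposition (illustrative, not the paper's hardware description):

```python
import numpy as np

def idct8_via_4x4(X):
    """8-point orthonormal IDCT computed as two 4x4 matrix-vector
    products over the even and odd coefficients, then a butterfly:
    x[n] = e[n] + o[n], x[7-n] = e[n] - o[n] for n = 0..3."""
    n = np.arange(4)[:, None]               # output index 0..3
    k = np.arange(8)[None, :]
    c = np.where(k == 0, 1 / np.sqrt(2), 1.0) * np.sqrt(2 / 8)
    B = c * np.cos(np.pi * (2 * n + 1) * k / 16)   # (4, 8) half basis
    E, O = B[:, 0::2], B[:, 1::2]                  # two 4x4 matrices
    e = E @ X[0::2]                                # even-coefficient part
    o = O @ X[1::2]                                # odd-coefficient part
    return np.concatenate([e + o, (e - o)[::-1]])
```

The symmetry used is cos(π(2(7−n)+1)k/16) = (−1)^k cos(π(2n+1)k/16), which is why the even part is shared and the odd part changes sign in the lower half.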
|
G. L. Foresti | A Real-Time System for Video Surveillance of Unattended Outdoor Environments |
Abstract—This paper describes a visual surveillance system for remote monitoring of unattended outdoor environments. The system, which works in real time, is able to detect, localize, track, and classify multiple objects moving in a surveilled area. The object classification task is based on a statistical morphological operator, the statistical pecstrum (called specstrum), which is invariant to translations, rotations, and scale variations, and it is robust to noise. Classification is performed by matching the specstrum extracted from each detected object with the specstra extracted from multiple views of different real object models contained in a large database. Outdoor images are used to test the system in real functioning conditions. Performance in terms of correct classification rate, false and missed alarms, viewpoint invariance, noise robustness, and processing time is evaluated. RAR 258 KB |
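The pattern spectrum ("pecstrum") underlying this descriptor records how much area each successive morphological opening removes from a binary shape. The sketch below uses an iterated 3 × 3 square as the structuring element; it is a generic illustration of the pecstrum, not the paper's statistical specstrum:

```python
import numpy as np

def _shifts(img):
    """The nine 8-neighbourhood shifts of a binary image, zero-padded."""
    p = np.pad(img, 1, constant_values=False)
    h, w = img.shape
    return [p[1 + di:1 + di + h, 1 + dj:1 + dj + w]
            for di in (-1, 0, 1) for dj in (-1, 0, 1)]

def _erode(img):
    return np.all(_shifts(img), axis=0)

def _dilate(img):
    return np.any(_shifts(img), axis=0)

def pecstrum(img, nmax):
    """PS[n] = area(opening by n erode/dilate steps)
             - area(opening by n+1 steps)."""
    areas = []
    for n in range(nmax + 1):
        op = img.copy()
        for _ in range(n):
            op = _erode(op)
        for _ in range(n):
            op = _dilate(op)
        areas.append(int(op.sum()))
    return np.array(areas[:-1]) - np.array(areas[1:])
```

For a solid square of side 5, all the mass of the spectrum concentrates at the opening size that just removes it, which makes the descriptor sensitive to object shape while being translation-invariant.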
|
A. Murat Tekalp, Yucel Altunbasak, and Gozde Bozdagi | Two- Versus Three-Dimensional Object-Based Video Compression |
Abstract—This paper compares two-dimensional (2-D) and three-dimensional (3-D) object modeling in terms of their capabilities and performance (peak signal-to-noise ratio and visual image quality) for very low bitrate video coding. We show that 2-D object-based coding with affine/perspective transformations and triangular mesh models can simulate almost all capabilities of 3-D object-based approaches using wireframe models at a fraction of the computational cost. Furthermore, experiments indicate that a 2-D mesh-based coder–decoder performs favorably compared to the new H.263 standard in terms of visual quality. RAR 479 KB |
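With a triangular mesh model, the warp of each patch is an affine map fully determined by its three vertex correspondences. A minimal sketch of solving for and applying the six affine parameters (illustrative names, not the paper's code):

```python
import numpy as np

def affine_from_triangle(src, dst):
    """Solve for the 2x3 affine matrix A mapping triangle src to dst
    (each of shape (3, 2)): [x', y'] = A @ [x, y, 1]."""
    M = np.hstack([src, np.ones((3, 1))])   # rows [x, y, 1]
    return np.linalg.solve(M, dst).T        # (2, 3)

def warp_points(A, pts):
    """Apply the affine map to an (N, 2) array of points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A.T
```

Perspective (eight-parameter) patches work the same way but need four correspondences and a homogeneous divide.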
|
Minoru Etoh, Choong Seng Boon, and Shinya Kadono | Template-Based Video Coding with Opacity Representation |
Abstract—We describe an image coding scheme based on multiple templates for interactive audio-visual (A/V) database retrieval. The image sequence to be coded consists of overlapping image planes with opacity information and depth ordering. In this method, each image plane is independently encoded to a different bit stream where each video object sequence is reconstructed from representative frames (i.e., templates) by global and local deformations. Owing to this coding scheme, the following new functionalities are supported: identification and manipulation of two-dimensional (2-D) video objects, selective decoding and browsing of visual contents, and very high compression efficiency. We have extended MPEG-1 coding software into object-based coding software. Experimental results of the proposed scheme prove the above advantages. RAR 665 KB |
|
Federico Pedersini, Augusto Sarti, and Stefano Tubaro | Visible Surface Reconstruction With Accurate Localization of Object Boundaries |
Abstract—A common limitation of many techniques for 3-D reconstruction from multiple perspective views is the poor quality of the results near the object boundaries. The interpolation process applied to “unstructured” 3-D data (“clouds” of non-connected 3-D points) plays a crucial role in the global quality of the 3-D reconstruction. In this paper, we present a method for interpolating unstructured 3-D data, which is able to perform a segmentation of such data into different data sets that correspond to different objects. The algorithm is also able to perform an accurate localization of the boundaries of the objects. The method is based on an iterative optimization algorithm. As a first step, a set of surfaces and boundary curves are generated for the various objects. Then, the edges of the original images are used to refine such boundaries as accurately as possible. Experimental results with real data are presented, demonstrating the effectiveness of the proposed algorithm. RAR 1127 KB |
|
Liang Zhang | Automatic Adaptation of a Face Model Using Action Units for Semantic Coding of Videophone Sequences |
Abstract—The topic of investigation is automatic adaptation of a face model at the beginning of a videophone sequence for implementing mimic analysis by means of action units in a semantic coder. Here, not only the face model is to be adapted to match the real face, but also initial values of action units are to be determined. In the proposed algorithm, eye and mouth features are first estimated using deformable templates. Then, the face model Candide is adapted to these estimated features in three steps, namely: 1) the global adaptation; 2) the local adaptation; and 3) the mimic adaptation. For the mimic adaptation, six action units are used and their initial values are determined. The proposed adaptation algorithm differs from previous works in the following aspects: 1) there is no restriction on the rotation for the global adaptation of the face model and 2) initial values of action units are determined as part of the mimic adaptation. The proposed algorithm has been tested on synthetic images and natural head-and-shoulder videophone sequences with a spatial resolution corresponding to CIF and a frame rate of 10 Hz. The average errors for the estimation of eye and mouth features and for the adaptation of the face model amount to 1.936 (pel) and 2.009 (pel), respectively. With this adaptation algorithm, mimic analysis for semantic coding by means of action units in the subsequent frames is realizable. RAR 482 KB |
|
Nebojsa Jojic, Jin Gu, Helen C. Shen, and Thomas S. Huang, Fellow | Computer Modeling, Analysis, and Synthesis of Dressed Humans |
Abstract—In this paper, we present computer vision techniques for building dressed human models using images. In the first part, we develop an algorithm for three-dimensional body reconstruction and texture mapping using contour, stereo, and texture information from several images and deformable superquadrics as the model parts. In the second part, we demonstrate a novel vision technique for analysis of cloth draping behavior. This technique allows for estimation of cloth model parameters, such as bending properties, but can also be used to estimate the contact points between the body and clothing in the range data of dressed humans. Combined with our body reconstruction algorithm and additional constraints on the articulation model, the detection of the garment–body contact points allows construction of a dressed human model in which even the geometry that was covered by clothing in the available data is reasonably well estimated. RAR 905 KB |
|
See also materials:
- On color spaces
- On JPEG
- On JPEG-2000
Prepared by Sergey Grishin and Dmitry Vatolin