4D TeleCast: Large Scale Dissemination of Multi-view Multi-stream 3D Contents
[2011-present] Related Publications: ICDCS'2012, MMSYS'2013
3D Tele-immersive (3DTI) systems create real-time multi-stream and multi-view 3D collaborative contents from multiple sites to allow interactive shared activities in virtual environments. Applications of 3DTI include online sports, tele-health, remote learning, and collaborative arts. In addition to the interactive participants in 3DTI environments, we envision a large number of passive viewers that (a) watch the interactive activities in 3DTI shared environments, and (b) select views of the activities at run time. To achieve this vision, we present 4D TeleCast, a novel 3D content distribution framework that provides multi-view selection. It addresses the following challenges: (1) supporting a large number of concurrent views as well as viewers, (2) preserving the unique nature of 3DTI multi-stream and multi-view dependencies, and (3) allowing dynamic viewer behavior such as view changes and large-scale simultaneous viewer arrivals or departures. We divide the problem space into two subproblems: (1) an overlay construction problem that aims to minimize the cost of distributing the 3D contents and to maximize the number of concurrent viewers, and (2) a view synchronization problem that aims to preserve the multi-stream dependencies within a view.
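The view synchronization subproblem can be illustrated with a minimal sketch (a hypothetical simplification, not the actual 4D TeleCast implementation): each frame carries a stream id and a timestamp, and a composite view frame is released only once every stream belonging to the selected view has contributed for that timestamp.

```python
from collections import defaultdict

def synchronize_view(frames, view_streams):
    """Group incoming frames by timestamp and release a composite view
    frame only when every stream in the selected view has contributed.

    frames: iterable of (stream_id, timestamp, payload) tuples
    view_streams: set of stream ids that make up the selected view
    """
    pending = defaultdict(dict)  # timestamp -> {stream_id: payload}
    released = []
    for stream_id, ts, payload in frames:
        if stream_id not in view_streams:
            continue  # frame belongs to a stream outside this view
        pending[ts][stream_id] = payload
        if set(pending[ts]) == view_streams:
            # all multi-stream dependencies for this timestamp satisfied
            released.append((ts, pending.pop(ts)))
    return released
```

In a real system the pending buffer would also need a timeout policy for late or lost frames; the sketch only shows the dependency check itself.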
OFDiff: OpenFlow Based Data Center Debugging
[2011-present]
Debugging operational data centers can be very hard due to their large scale, distributed state, and the substantial dynamism present in application characteristics. Currently there is no tool to help data center operators detect and debug changes in data center behavior across time. Such a tool is useful for debugging data center faults by comparing logs of working and non-working states. To this end, we propose OFDiff, an offline, low-overhead data center debugging tool that leverages the OpenFlow technology. Based on our previous experience using OpenFlow as a data center sensing and measurement technology, we develop techniques to debug the data center environment by detecting changes in applications and in the data center infrastructure. Furthermore, we identify common data center operations and attribute changes in the data center to these operations. Application changes that cannot be attributed to administrator operations are further analyzed to associate them with the system contexts possibly causing the deviations. We detect application changes, underlying infrastructure deviations, and data center operations from the OpenFlow control traffic alone, which makes our tool easy to use and valuable in both deployment and production phases.
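The core comparison idea can be sketched as follows (a simplified, hypothetical model of the diffing step, reducing each control-traffic snapshot to per-application flow counts):

```python
def diff_snapshots(working, broken, threshold=0.5):
    """Compare per-application flow counts from two control-traffic
    snapshots and report applications whose behavior changed.

    working, broken: dicts mapping application name -> observed flow count
    threshold: fractional change above which an application is flagged
    """
    changes = {}
    for app in set(working) | set(broken):
        before = working.get(app, 0)
        after = broken.get(app, 0)
        if before == 0 and after == 0:
            continue
        # relative change, using the larger side as the reference
        delta = abs(after - before) / max(before, after)
        if delta > threshold:
            changes[app] = (before, after)
    return changes
```

OFDiff itself works on richer signatures extracted from OpenFlow control traffic; flow counts here stand in for any per-application metric.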
DIAMOND: Bandwidth Efficient Online Correlation Based Anomaly Detection in Distributed Interactive Environments
[2010-2011] Related Publications: ISM'2010
Distributed Interactive Multimedia Environments (DIMEs) exhibit important dependency constraints between application and underlying system components over time. For example, the video frame rate and the underlying bandwidth usage have a strong performance dependency. Performance dependencies must also be considered among distributed components. Over a time span, these dependencies form correlation relationships, and violations of such correlation relationships represent collective anomalies. Users, and most specifically DIME application developers, face the problems of detecting such anomalies, localizing them, and adapting to them in real time. The current practice is to collect joint application-system metadata characterizing the behaviors of application and system components while a DIME session is running, and then analyze it offline. Our goal is to provide a framework, called DIAMOND, that allows real-time and unobtrusive collection and organization of joint application-system metadata in order to assist in finding such correlation violations in the system. DIAMOND works in four steps: (a) real-time metadata collection, (b) metadata processing to allow efficient computation of correlation constraints, (c) metadata distribution for efficient clustering of distributed metadata, and (d) anomaly detection, localization, and evolution monitoring based on violations of correlation relationships.
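A minimal sketch of the correlation-violation idea (not the actual DIAMOND algorithm) computes Pearson correlation over a sliding window of two metric series, such as frame rate and bandwidth usage, and flags windows where the usually-strong correlation breaks down:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant window carries no correlation signal
    return cov / (vx * vy) ** 0.5

def detect_correlation_anomalies(x, y, window=5, min_corr=0.8):
    """Slide a window over two metric time series and flag windows where
    the expected correlation is violated. Returns the starting indices
    of anomalous windows."""
    anomalies = []
    for start in range(len(x) - window + 1):
        r = pearson(x[start:start + window], y[start:start + window])
        if r < min_corr:
            anomalies.append(start)
    return anomalies
```

DIAMOND additionally distributes and clusters the metadata so this check scales across sites; the sketch shows only the per-pair detection step.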
RESCUE: Scaling Data Plane Logging in Large Scale Network
[2009-2010] Related Publications: MILCOM'2011
Understanding and troubleshooting wide area networks (such as military backbone networks and ISP networks) are challenging tasks due to their large, distributed, and highly dynamic nature. Building a system that can record and replay fine-grained behaviors of such networks would simplify this problem by allowing operators to recreate the sequence and precise ordering of events (e.g., packet-level forwarding decisions, route changes, failures) taking place in their networks. However, doing this at large scale seems intractable due to the vast amount of information that would need to be logged. We propose a scalable and reliable framework to monitor fine-grained data-plane behavior within a large network. We give a feasible architecture for a distributed logging facility and a tree-based data structure for log compression, and show how this logged information helps network operators detect and debug anomalous network behavior. Experimental results obtained through trace-driven simulations and Click software router experiments show that our design is lightweight in terms of processing time, memory requirements, and control overhead, yet still achieves over 99% precision in capturing network events.
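The tree-based compression can be illustrated with a sketch (a hypothetical simplification of the data structure): forwarding paths that share hop prefixes share trie nodes, so thousands of packets taking the same path cost only a handful of nodes plus a counter.

```python
class PathTrie:
    """Compress per-packet forwarding paths by sharing common prefixes:
    packets that traverse the same sequence of routers share trie nodes,
    and each node counts how many recorded paths end there."""

    def __init__(self):
        self.children = {}
        self.count = 0  # paths that end at this node

    def add(self, path):
        """Record one packet's forwarding path (a list of router ids)."""
        node = self
        for hop in path:
            node = node.children.setdefault(hop, PathTrie())
        node.count += 1

    def size(self):
        """Number of trie nodes (excluding the root): the storage cost."""
        return sum(1 + child.size() for child in self.children.values())
```

Three packets over A-B-C and one over A-B-D need only four nodes, versus twelve hop records when logged naively.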
QoS (Quality of Service) impact on QoE (Quality of Experience) in 3D Tele-immersive Interactive Environments
[2009-2012] Related Publications: MM'2009, ISM'2010, MM'2011
The past decades have witnessed a rapid growth of Distributed Interactive Multimedia Environments (DIMEs). Despite the intensity of user-involved interaction in these environments, existing evaluation frameworks remain very much system-centric. As a step toward a human-centric paradigm, we present a conceptual framework of Quality of Experience (QoE) in DIMEs to model, measure, and understand user experience and its relationship with the traditional Quality of Service (QoS) metrics. A multi-disciplinary approach is taken to build up the framework, based on theoretical results from various fields including psychology, cognitive science, sociology, and information technology. We introduce a mapping methodology to quantify the correlations between QoS and QoE, and describe our controlled and uncontrolled studies as illustrative examples. The results present the first in-depth study to model the multi-faceted QoE construct, map the QoS-QoE relationship, and capture the human-centric quality modalities in the context of DIMEs.
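The QoS-to-QoE mapping can be illustrated with a minimal example (a deliberately simplified stand-in for the mapping methodology): fit a least-squares line from paired QoS/QoE observations so that a measured QoS value predicts an expected QoE score.

```python
def fit_qos_qoe(qos, qoe):
    """Fit a least-squares line qoe ~ a * qos + b from paired
    observations, the simplest possible QoS-to-QoE mapping.

    Returns the slope a and intercept b."""
    n = len(qos)
    mx, my = sum(qos) / n, sum(qoe) / n
    num = sum((x - mx) * (y - my) for x, y in zip(qos, qoe))
    den = sum((x - mx) ** 2 for x in qos)
    a = num / den
    b = my - a * mx
    return a, b
```

The actual framework correlates multiple QoS metrics against a multi-faceted QoE construct; a single linear fit only conveys the direction of the mapping.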
We also conduct a psychophysical study that measures the perceptual thresholds of a new factor, called Color-plus-Depth Level-of-Detail, peculiar to polygon-based 3D tele-immersive video. The results demonstrate the existence of Just Noticeable Degradation and Just Unacceptable Degradation thresholds on this factor. In light of the results, we describe the design and implementation of a real-time perception-based quality adaptor for 3D tele-immersive video. Our experimental results show that the adaptation scheme can reduce resource usage while considerably enhancing the overall perceived visual quality.
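The adaptation idea can be sketched as follows (a hypothetical simplification of the adaptor, with made-up level costs and quality scores): pick the cheapest Level-of-Detail whose predicted quality stays above the Just Noticeable Degradation threshold, falling back to the best quality the bandwidth budget allows.

```python
def pick_level_of_detail(levels, available_bw, jnd_quality):
    """Choose a Color-plus-Depth Level-of-Detail under a bandwidth budget.

    levels: list of (bandwidth_cost, predicted_quality) pairs
    available_bw: current bandwidth budget
    jnd_quality: quality at the Just Noticeable Degradation threshold
    """
    feasible = [(bw, q) for bw, q in levels if bw <= available_bw]
    if not feasible:
        return None  # even the cheapest level exceeds the budget
    above_jnd = [(bw, q) for bw, q in feasible if q >= jnd_quality]
    # cheapest level whose degradation is imperceptible, otherwise
    # the feasible level with the best predicted quality
    return min(above_jnd) if above_jnd else max(feasible, key=lambda p: p[1])
```

This captures the resource-saving intuition: spending bandwidth beyond the JND threshold buys no perceived quality.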
CloudInsight: Learning Based Debugging and Troubleshooting for Virtual Cloud Data Center
[2010-2011] Related Publications: SRDS'2011
Cloud computing provides a revolutionary new computing paradigm for deploying enterprise applications and Internet services. Rather than operating their own data centers, today's cloud users run their applications on remote cloud infrastructures that are owned and managed by cloud providers. However, the cloud computing paradigm also introduces new challenges in system management. Cloud users create virtual machine instances to run their specific application logic without knowing the underlying physical infrastructure; on the other side, cloud providers manage and operate their cloud infrastructures without knowing their customers' applications. Due to this decoupled ownership of applications and infrastructures, when a problem occurs neither cloud users nor providers have the visibility to understand the whole context of the incident and solve it quickly. To this end, we propose a software solution, CloudInsight, that provides visibility through the middle virtualization layer for both cloud users and providers so they can address their problems quickly. CloudInsight automatically tracks each VM instance's configuration status and maintains its life-cycle configuration records in a configuration management database (CMDB). When a user reports a problem, our algorithms automatically analyze the CMDB to probabilistically determine the root cause and invoke a recovery process by interacting with the cloud user. Experimental results over data from the Amazon EC2 online support forum and NEC Labs' research cloud infrastructure demonstrate that our approach can effectively automate the problem troubleshooting process in cloud environments.
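The CMDB analysis can be illustrated with a toy sketch (not CloudInsight's actual algorithm): rank a VM's recent configuration changes as root-cause suspects, scoring changes that happened closer to the reported failure time higher.

```python
def rank_suspect_changes(cmdb_records, vm_id, report_time, window=3600):
    """Rank configuration changes recorded for a VM as root-cause
    suspects, using recency as a stand-in likelihood.

    cmdb_records: list of (vm, timestamp, change_description) tuples
    Returns (score, description) pairs, most suspicious first.
    """
    suspects = []
    for vm, ts, change in cmdb_records:
        age = report_time - ts
        if vm != vm_id or age < 0 or age > window:
            continue  # wrong VM, change after the report, or too old
        score = 1.0 - age / window  # newer change -> higher suspicion
        suspects.append((score, change))
    return sorted(suspects, reverse=True)
```

CloudInsight combines life-cycle configuration records with learned problem signatures rather than recency alone; the sketch shows only the ranking skeleton.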
Q-Tree: Multi-Attribute Based Query Solution for Large Scale Distributed Interactive Environments
[2008-2009] Related Publications: ICDCS'2009
Users and administrators of large distributed systems frequently need to monitor and manage their various components, data items, and resources. Though several distributed query and aggregation systems exist, the clustered structure of tele-immersive interactive frameworks, together with their time-sensitive nature and application requirements, represents a new class of systems that poses different challenges for this distributed search. Multi-attribute composite range queries are one of the key features in this class: queries are given as high-level descriptions and then transformed into multi-attribute composite range queries. Designing such a query engine with minimal traffic overhead and low service latency, over large datasets that are both static and dynamic, is a challenging task. We propose a general multi-attribute range query framework, Q-Tree, that provides efficient support for this class of systems. To serve queries efficiently, Q-Tree builds a single topology-aware tree overlay by connecting the participating nodes in a bottom-up fashion, and assigns range intervals to each node in a hierarchical manner. We show the relative strength of Q-Tree by analytically comparing it against P-Tree, P-Ring, Skip-Graph, and Chord. With fine-grained load balancing and overlay maintenance, our simulations with PlanetLab traces show that our approach can answer complex queries within a fraction of a second.
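The hierarchical interval assignment can be illustrated with a single-attribute sketch (a simplification of Q-Tree, which handles multi-attribute composite queries): each node owns a range interval, children partition the parent's interval, and a range query descends only into subtrees whose interval overlaps it.

```python
class QNode:
    """A node in a simplified range-query tree: the node owns attribute
    values in [low, high), its children partition that interval, and a
    query prunes every subtree whose interval is disjoint from it."""

    def __init__(self, low, high, values=()):
        self.low, self.high = low, high
        self.values = list(values)  # data items stored at this node
        self.children = []

    def query(self, lo, hi):
        """Return all stored values falling in the range [lo, hi)."""
        if hi <= self.low or lo >= self.high:
            return []  # disjoint interval: prune this whole subtree
        hits = [v for v in self.values if lo <= v < hi]
        for child in self.children:
            hits.extend(child.query(lo, hi))
        return hits
```

The pruning step is what keeps query traffic proportional to the answer size rather than the overlay size.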