Social Networks
Mitarbeiter: Yang Chen ,
Cong Ding ,
Xiaoming Fu ,
Sufian Hameed ,
Lei Jiao ,
David Koll ,
Wenzhong Li ,
Florian Tegeler ,
Tianyin Xu ,
Konglin Zhu
Projektpartner: University of Goettingen (Computer Science, Medical School, Courant Research Centre Evolution of Social Behaviour), MPI for Dynamics and Self-organization, Nanjing University, Tsinghua University, Deutsche Telekom Laboratories/TU Berlin, University of Cambridge, Swedish Institute of Computer Science and Technion.
We collaborate with several national and international research teams to exploit the social structures and evolutionary social behaviors of real-world individuals and online Internet users, and socially- or proximity-associated individuals, understanding the fundamental principles of epidemics, social networks, and individual behaviors.
Links: Upcoming events related to social networks research:
NetSciCom'12 @ IEEE INFOCOM, Orlando, Mar 30, 2012 SNS'12 @ ACM EuroSys, Bern, Apr 10, 2012 Interdisziplinäre Frühjahrstagung 2012 , Frankfurt, May 25-26, 2012
HotSocial'12 and SNAKDD'12 at ACM KDD, Beijing, Aug 12, 2012 WOSN'12 @ ACM SIGCOMM, Helsinki, Aug 17, 2012Social Structure Summer School , Göttingen, Sept 3-8, 2012Sunbelt Conference of INSNARelated links:
Publikationen of this project:
Disclaimer :
These papers are made available as a means to ensure timely dissemination of scholarly and technical work
on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders,
notwithstanding that they have offered their works here electronically. It is understood that all persons copying
this information will adhere to the terms and constraints invoked by each author's copyright. These works may not
be reposted without the explicit permission of the copyright holder.
2013
Optimizing Data Center Traffic of Online Social Networks ,
Lei Jiao , Jun Li , Xiaoming Fu , The 19th IEEE International Workshop on Local and Metropolitan Area Networks (LANMAN 2013), Brussels, Belgium,
Best Paper Award, April 2013.
Zusammenfassung lesen
With a huge number of users and a very large scale of data, an Online Social Network (OSN) service has to partition its data among multiple servers inside a data center. As data are often partitioned randomly, the response time in accessing the data is however unpredictable. Researchers have proposed social locality to address this concern: if a server hosts the master replica of a user's data, it must also host a replica (either master or slave) of every friend of this user, thus enabling convenient access of all of them on the same server. However, doing so comes with two overheads: the replication storage and the traffic of maintaining replica consistency. Existing work focuses on the former, but overlooks the latter that can consume considerable network resources. In this paper, we study social-locality-aware partitioning of the OSN data while meeting diverse performance goals of data center networks. We formulate the traffic optimization problem and propose a new traffic-aware data partitioning algorithm. Through the evaluations with a large-scale, real-world Twitter trace, we further show that, compared with state-of-the-art algorithms, our algorithm significantly reduces traffic without deteriorating the load balance among servers and causing extra replication storage.
jiao_lanman13.pdf [237.2 kB]
Fighting Spam Using Social GateKeepers ,
Sufian Hameed , Xiaoming Fu , Pan Hui and Nishanth Satry, Networking Science (accepted),
Springer Verlag, 2013.
Zusammenfassung lesen
We introduce LENS (LEveraging social Networking and trust to prevent Spam transmission), a novel spam protection system which leverages the recipient's social network to allow correspondence within the social network to directly pass to the mailbox of the recipient. To enable new senders to send emails, legitimate and authentic users, called Gatekeepers (GKs), are selected from outside the recipient's social circle and within pre-defined social distances. Our evaluations show that LENS provides each recipient reliable email delivery from a large fraction (up to 55% of entire user-base) of the social network; it is also effective and light-weight in accepting all the legitimate inbound emails in the real email traces. LENS imposes zero overhead for the common case of frequent and familiar senders, and remains lightweight for the general case. Our prototype implementation of LENS in Postfix/MailAvenger shows that LENS consumes up to 75% less CPU and 9% less memory as traditional solutions like SpamAssassin.
PDF [484.3 kB]
2012
Cost Optimization for Online Social Networks on Geo-Distributed Clouds ,
Lei Jiao , Jun Li , Tianyin Xu and Xiaoming Fu , the 20th IEEE International Conference on Network Protocols (ICNP 2012), Austin, Texas, USA,
November 2012.
Zusammenfassung lesen
Geo-distributed IaaS (Infrastructure-as-a-Service) clouds provide an intriguing platform to deploy Online Social Network (OSN) services. To leverage the potential of clouds, a major task of OSN providers is optimizing the monetary cost spent on cloud resource utilization while providing satisfactory Quality of Service (QoS) to OSN users. We thus study the problem of cost optimization for the dynamic OSN on multiple geo-distributed clouds over consecutive time periods, with its QoS meeting the pre-defined requirement. We model the QoS as well as the cost of an OSN, formulate the problem, and design a solution named cosplay. Our experiments with a large-scale Twitter trace show that, while always ensuring the QoS as required, cosplay can achieve superior one-time cost reduction compared with the state of the art, and can also reduce the accumulative cost significantly when continuously evaluated over 48 months with dynamics comparable to real-world OSNs.
jiao_icnp12.pdf [465.1 kB]
Predicting offline behaviors from online features -- an ego-centric dynamical network approach ,
Lianghao Dai , Jarder Luo, Xiaoming Fu , Zhichao Li, Proceedings of First ACM International Workshop on Interdisciplinary Social Networks Research (HotSocial 2012), Beijing, China, in conjunction with ACM KDD 2012,
ISBN 978-1-4503-1549-4, August 2012.
Zusammenfassung lesen
Investigating online social behaviors may help us to better understand and predict offline high risk behaviors in gay communities. But how can offline behaviors be predicted from online social networks? This article selects data from 26 online social network groups from QQ (a Chinese based messaging software) administered by gay communities of “W” city of Hubei Province, China. Based on online data mining, social network analysis, and offline semi-structural interviews, we argue that the ego-centric dynamical network analysis—an approach which combines partial network dynamics, individual features, and structure position together—can be used to derive the probabilistic features for predicting offline high risk behaviors (HRB). An example of HRB is “one night stands” (gays for one night: 419) for gay homosexuals.
PDF [1576.4 kB]
Exploring Regional and Global Population Growth in Online Social Networks (extended abstract) ,
Konglin Zhu , Wenzhong Li , Xiaoming Fu , Praxis der Netzwerkforschung 2012 (PdN 2012), Frankfurt am Main, Germany,
May 2012.
2011
GEMSTONE: Empowering Decentralized Social Networking with High Data Availability ,
Florian Tegeler , David Koll , and Xiaoming Fu , IEEE GLOBECOM 2011 - Selected Areas in Communications Symposium - Social Networks Track, Houston, TX, USA,
IEEE, December 2011.
Zusammenfassung lesen
Social networking platforms such as Facebook, MySpace, Twitter, and many others have seen an enormous increase in user population and user provided information. However, users are increasingly concerned about identity and data privacy since information is aggregated at single companies. To address this issue researchers have been investigating alternative solutions, where the users’ data such as profile information, comments and messages is stored at user-controlled nodes. Although these solutions provide a plausible means for avoiding privacy leaking in central instances, they raise a new challenge to design a cost-effective storage replica scheme which ensures a high data availability even when some users are offline. In this paper we present Gemstone, a social network platform, where the data replication scheme leverages a learning mechanism based on social relationships, online patterns of peers and user experiences. Our preliminary evaluation shows that compared to related works, it achieves higher data availability while only requiring a smaller number of data replicas.
PDF [434.9 kB]
Scaling Microblogging Services with Divergent Traffic Demands ,
Tianyin Xu , Yang Chen , Lei Jiao , Ben Y. Zhao, Pan Hui, Xiaoming Fu , ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011), Lisboa, Portugal, Lecture Notes in Computer Science 7049, pages 20-40,
Springer Verlag, December 2011.
Zusammenfassung lesen
Today’s microblogging services such as Twitter have long outgrown their initial designs as SMS-based social networks. Instead, a massive and steadily-growing user population of more than 100 million is using Twitter for everything from capturing the mood of the country to detecting earthquakes and Internet service failures. It is unsurprising that the traditional centralized client-server architecture has not scaled with user demand, leading to server overload and significant loss of availability.
In this paper, we argue that the divergence in usage models of microblogging services can best be addressed using complementary mechanisms, one that provides reliable messages between friends, and another that delivers events from popular celebrities and media outlets to their millions of followers. We present Cuckoo, a new microblogging system that offloads processing and bandwidth costs away from a small centralized server base while ensuring reliable message delivery. We use a 20-day Twitter availability measurement to guide our design, and trace-driven emulation of 30,000 Twitter users to evaluate our Cuckoo prototype. Compared to a centralized approach, Cuckoo achieves 30-50% server bandwidth savings and 50-60% CPU load reduction, all while guaranteeing reliable message delivery.
cuckoo.pdf [2593.3 kB]
Latency-Aware Data Partitioning for Geo-Replicated Online Social Networks ,
Lei Jiao , Tianyin Xu , Jun Li , Xiaoming Fu , ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011), Poster session, Lisboa, Portugal,
December 2011.
Zusammenfassung lesen
No abstract is associated with this entry. Please download the two-page poster directly. Thanks.
TAMER.pdf [156.4 kB]
Pomelo: Accurate and Decentralized Shortest-path Distance Prediction in Social Graphs ,
Zhuo Chen, Yang Chen , Cong Ding , Beixing Deng, Xing Li, ACM SIGCOMM Computer Communication Review, Volume 41, Issue 4, Pages 406-407,
ACM, October 2011.
Zusammenfassung lesen
Computing the shortest-path distances between nodes is a key problem in analyzing social graphs. Traditional methods like breadth-first search (BFS) do not scale well with graph size. Recently, a Graph Coordinate System, called Orion, has been proposed to estimate shortest-path distances in a scalable way. Orion uses a landmark-based approach, which does not take account of the shortest-path distances between non-landmark nodes in coordinate calculation. Such biased input for the coordinate system cannot characterize the graph structure well. In this paper, we propose Pomelo, which calculates the graph coordinates in a decentralized manner. Every node in Pomelo computes its shortest-path distances to both nearby neighbors and some random distant neighbors. By introducing the novel partial BFS, the computational overhead of Pomelo is tunable. Our experimental results from different representative social graphs show that Pomelo greatly outperforms Orion in estimation accuracy while maintaining the same computational overhead.
p406-chen.pdf [488.2 kB]
Understanding Graph Sampling Algorithms for Social Network Analysis ,
Tianyi Wang, Yang Chen , Zengbin Zhang, Tianyin Xu , Long Jin, Pan Hui, Beixing Deng, Xing Li, The 3rd ICDCS Workshop on Simplifying Complex Networks for Practitioners (SIMPLEX'11),
Minneapolis, MN, USA, June 2011.
Zusammenfassung lesen
Being able to keep the graph scale small while capturing the properties of the original social graph, graph sampling provides an efficient, yet inexpensive solution for social network analysis. The challenge is how to create a small, but representative sample out of the massive social graph with millions or even billions of nodes. Several sampling algorithms have been proposed in previous studies, but there lacks fair evaluation and comparison among them. In this paper, we analyze the state-ofart graph sampling algorithms and evaluate their performance on some widely recognized graph properties on directed graphs using large-scale social network datasets. We evaluate not only the commonly used node degree distribution, but also clustering coefficient, which quantifies how well connected are the neighbors of a node in a graph. Through the comparison we have found that none of the algorithms is able to obtain satisfied sampling results in both of these properties, and the performance of each algorithm differs much in different kinds of datasets.
simplex11.pdf [167.5 kB]
Albatross Sampling: Robust and Effective Hybrid Vertex Sampling for Social Graphs ,
Long Jin, Yang Chen , Pan Hui, Cong Ding , Tianyi Wang, Athanasios Vasilakos, Beixing Deng, Xing Li, The 3rd ACM Workshop on Hot Topics in Planet-scale Measurement (HotPlanet'11), Co-located with ACM MobiSys 2011,
June 2011.
Zusammenfassung lesen
Nowadays, Online Social Networks (OSNs) have become dramatically popular and the study of social graphs attracts the interests of a large number of researchers. One critical challenge is the huge size of the social graph, which makes the graph analyzing or even the data crawling incredibly time consuming, and sometimes impossible to be completed. Thus, graph sampling algorithms have been introduced to obtain a smaller subgraph which reflects the properties of the original graph well. Breadth-First Sampling (BFS) is widely used in graph sampling, but it is biased towards high-degree vertices during the process of sampling. Besides, Metropolis-Hasting Random Walk (MHRW), which is proposed to get unbiased samples of the social graph, requires the graph to be well connected. In this paper, we propose a vertex sampling algorithm, so-called Albatross Sampling (AS), which introduces random jump strategy into MHRW during the sampling process. The embedded random jump makes the sampling procedure more flexible and avoids being trapped in some locally well connected part. According to our evaluation, we find that no matter using tightly or loosely connected graphs, AS performs significantly better than MHRW and BFS. On the one hand, AS estimates the degree distribution with much lower Normalized Mean Square Error (NMSE) by consuming the same resource budget. On the other hand, to get an acceptable estimation of the degree distribution, AS requires much less resource budget.
AS_HotPlanet11.pdf [381.0 kB]
Exploring User Social Behaviors in Mobile Social Applications ,
Konglin Zhu , Pan Hui, Yang Chen , Xiaoming Fu , Wenzhong Li , 4th ACM Workshop on Social Network Systems (SNS 2011), in conjunction with ACM EuroSys 2011, Salzburg, Austria,
April 2011.
Zusammenfassung lesen
Mobile social applications are popular as the proliferating of mobile devices. Understanding user social behaviors is important to improve mobile social applications and enhance its quality of service. However, there is still lack of data for real deployment mobile social application on data analysis of human interaction and social behaviors in mobile social networks.
In this paper, we introduce the experiment methodology of deploying the Goose software in two campuses located in Germany and China respectively. Goose is a mobile social network application allows microblogging, message sending. With the help of volunteers, we collect user interaction data in the duration of 15 days. Based on the collected data, our observation reveals the following aspects of user interactions and their influences. First, user overall activities approximately match user daily life work pattern
with a slightly longer time duration and periodically appearance. Second, user encounters in mobile social network follow the heavy tail distribution in small social communities, and user interactions follow the Pareto principle, where about 20% of users make close connections to the other users. Third, communication path between a pair of mobile nodes is mostly within 6 hops, and information diffusion using an epidemic strategy demonstrates that the informed population reaches to 50% in a short term and approaches to 80% in a long term.
PDF [185.3 kB]
Cuckoo: Scaling Microblogging Services with Divergent Traffic Demands ,
Tianyin Xu , Yang Chen , Lei Jiao , Ben Y. Zhao, Pan Hui, Xiaoming Fu , Technical Report No. IFI-TB-2011-01, Institute of Computer Science, University of Goettingen, Goettingen, Germany, ISSN 1611-1044,
January 2011.
Zusammenfassung lesen
Today's microblogging services such as Twitter have long outgrown their initial designs as SMS-based social networks. Instead, a massive and steadily-growing user population of more than 100 million is using Twitter for everything from capturing the mood of the country to detecting earthquakes and Internet service failures. It is unsurprising then, that the traditional centralized client-server architecture has not scaled with user demand, leading to server overload and significant loss of availability.
In this paper, we argue that the divergence in usage models of microblogging services can best be addressed using complementary mechanisms, one that provides reliable messages between friends, and another that delivers events from popular celebrities and media outlets to their millions of followers. We present Cuckoo, a new microblogging system that offloads processing and bandwidth costs away from a small centralized server base while ensuring reliable message delivery. We use a 20-day Twitter availability measurement to guide our design, and trace-driven emulation of 30,000 Twitter users to evaluate our Cuckoo prototype. Compared to a centralized approach, Cuckoo achieves 30-50% server bandwidth savings and 50-60% CPU load reduction, all while guaranteeing reliable message delivery.
Technical Report.pdf [4259.3 kB]
2010
Development of a Mobile Social Networking Platform supporting Decentralized Data Storage optimized by Social Trust ,
David Koll , , master thesis, Center of Computational Sciences, ISSN 1612-6793, No. ZFI-MSC-2010-01, University of Goettingen, Germany,
November 2010.
Unbiased Sampling in Directed Social Graph ,
Tianyi Wang, Yang Chen , Zengbin Zhang, Peng Sun, Beixing Deng, Xing Li, ACM SIGCOMM Computer Communication Review, Volume 40, Issue 4, Pages 401-402,
ACM, October 2010.
Zusammenfassung lesen
Microblogging services, such as Twitter, are among the most important online social networks(OSNs). Different from OSNs such as Facebook, the topology of microblogging service is
a directed graph instead of an undirected graph. Recently, due to the explosive increase of population size, graph sampling has started to play a critical role in measurement and characterization studies of such OSNs. However, previous studies have only focused on the unbiased sampling of undirected social graphs. In this paper, we study the unbiased sampling algorithm for directed social graphs. Based on the traditional Metropolis-Hasting Random Walk (MHRW) algorithm, we propose an unbiased sampling method for directed social graphs(USDSG). Using this method, we get the first, to the best of our knowledge, unbiased sample of directed social graphs. Through extensive experiments comparing with the ‘ground truth’ (UNI, obtained through uniform sampling of directed graph nodes), we show that our method can achieve excellent performance in directed graph sampling and the error to UNI is less than 10%.
SocialGraph_sigcomm10.pdf [58.3 kB]
Twittering by Cuckoo -- Decentralized and Socio-Aware Online Microblogging Services ,
Tianyin Xu , Yang Chen , Xiaoming Fu , Pan Hui, ACM SIGCOMM Computer Communication Review, Volume 40, Issue 4, Pages 473-474,
ACM, October 2010.
Zusammenfassung lesen
Online microblogging services, as exemplified by Twitter, have become immensely popular during the latest years. However, current microblogging systems severely suffer from performance bottlenecks and malicious attacks due to the centralized architecture. Thus, centralized microblogging systems may threaten the scalability, reliability as well as availability of the offered services, not to mention
the high operational and maintenance cost.
This demo presents a decentralized, socio-aware microblogging system named Cuckoo. The key aspects of Cuckoo’s design is to take advantage of the inherent social relations while leveraging peer-to-peer (P2P) techniques in order to provide scalable, reliable microblogging services. The demo will show these aspects of Cuckoo and provide insights on the performance gain that decentralization and socioawareness can bring for microblogging systems.
Cuckoo_sigcomm10.pdf [257.5 kB]
APEX: A Personalization Framework to Improve Quality of Experience for DVD-like Functions in P2P VoD Applications ,
Tianyin Xu , Baoliu Ye, Qinhui Wang, Wenzhong Li , Sanglu Lu, Xiaoming Fu , the 18th IEEE International Workshop on Quality of Service (IWQoS 2010), Beijing, China,
IEEE, June 2010.
Zusammenfassung lesen
The requirement for supporting DVD-like functions raises new challenges to the design of P2P VoD systems. The uncertainty of frequent user DVD-like interactivity makes it difficult to ensure user perceived Quality of Experience (QoE) for real-time streaming services over distributed self-organized P2P overlay networks. Most existing solutions are based on the unreasonable assumption that all the users in P2P VoD systems have the same preference. Few attention has been paid to personalization, which accommodates the differences between users. In this paper, we present a video model which characterizes the personalization information for users' contents and preferences. Based on this model, we develop APEX, a practical personalization framework for P2P VoD applications. APEX makes the personalization practical by using a hybrid architecture which leverages the offline pattern mining on the server side and online collaborative filtering on the peer side. Furthermore, APEX helps peers to personalize navigation, prefetching and membership management, aiming at improving QoE for DVD-like functions by reducing the response latency and optimizing content sharing. Both theoretical analysis and comprehensive simulations show that APEX outperforms most existing schemes in terms of accumulated hit ratio, response latency, and searching efficiency.
PDF [392.2 kB]
Towards Decentralized, Socio-Aware Online Microblogging Services and Data Measurement ,
Tianyin Xu , Yang Chen , Jin Zhao , and Xiaoming Fu , In Proc. ACM MobiSys 2010 HotPlanet Workshop,
ACM, June 2010.
Zusammenfassung lesen
Online microblogging services, as exemplified by Twitter and Yammer, have become immensely popular during the latest three years. Twitter, the most successful microblogging service, has attracted more than 41.7 million users as of July 2009 and is still growing fast. However,
current microblogging systems severely suffer from performance bottlenecks and central points of failure due to their centralized architecture. Thus, centralized microblogging systems may threaten the scalability, reliability, as well as availability of the offered services, not to mention the extremely high operational and maintenance cost.
However, it is not trivial to decentralize microblogging services in a peer-to-peer fashion. The challenges first derive from the heterogeneity of the inherent online social network (OSN) features. The non-reciprocation feature of microblogging services also increases the heterogeneity. Moreover, different from traditional approaches used in centralized server based systems, an efficient, robust and scalable approach for data collection and dissemination in such distributed heterogeneous environments is desirable.
In this paper, we present a decentralized, socio-aware microblogging system named Cuckoo. The design takes advantages of the inherent social relationships while leverages P2P techniques towards scalable, reliable microblogging services. Besides, Cuckoo provides a flexible interface for data collection while circumventing unnecessary traffic on the server. We discuss the benefits that our system may bring for both service providers and end users. We also discuss the technical aspects to be considered and report our work in progress.
Cuckoo_Hotplanet10.pdf [223.9 kB]
2009
Interest-based Peer-to-Peer Group Management ,
Jun Lei , Xiaoming Fu , Second IEEE/ACM International Workshop on Future Multimedia Networking (FMN 2009), Coimbra, Portugal,
Springer LNCS, June 2009.
Zusammenfassung lesen
Peer-to-Peer systems become popular applications but suffer from insufficient resource availability which is caused by free-riders and inefficient lookup algorithms. To address the first cause, a number of recent works have focused on providing appropriate incentive mechanisms to encourage participants to contribute their resources to the P2P systems. To improve the lookup efficiency, locality-aware peer management has been introduced into the research community. However, existing proposals attempt to optimize the service performance during the data transmission period mostly after performing the neighboring lookup, which cannot address the fundamental concern of reducing lookup traffic. Besides, existing implementations select available contributors among random neighbors suggested by a specific server. Therefore, this paper proposes interest-based peer-to-peer management (IPM) protocol to facilitate the peering lookup. Our design philosophy differs from existing work that IPM is a client-only approach and can be represented as either an alternative or a complementary to the current proposals. With additional locality-awareness considerations, IPM can reduce the lookup overhead while optimizing the P2P traffic performance. The simulation results essentially state that IPM can largely improve the efficiency and reliability of P2P media distribution systems, for instance, reduces control overhead by 50% on average and reduces average packet loss rate up to 34.7%.
PDF [234.0 kB]