DK Panda from Ohio State University spoke first, giving a compressed version of a much longer talk that illustrates the benefits of high-bandwidth, low-latency interconnects for distributed Enterprise services like memcached, as well as for benchmarks like the Yahoo! Cloud Serving Benchmark (YCSB). It was interesting stuff, especially as an illustration of the converging requirements we are seeing between HPC and Enterprise.
My talk starts at about 12m30s in the video and, embarrassingly, since it was supposed to be a 10-minute talk, ends at about 29m30s. I made some general comments about the larger context of Big Data, presented results of our performance tests running Hadoop virtualized on the vSphere platform, and then ended with some comments about the role of interconnects for Big Data.
Milind Bhandarkar, EMC/Greenplum Chief Engineer, spoke next about how applications drive system development, specifically in the areas of machine learning, analytics and reporting, and visualization for Big Data. He also explained the phases seen in data science workloads: Obtain, Scrub, Explore, Model, Interpret.
The final panel speaker was Sumanta Chatterjee, VP Server Technologies at Oracle, who spoke about Oracle's view of a Big Data appliance built around an Acquire -> Organize -> Analyze workflow that uses Oracle Exa* products. He then discussed specific issues concerning Big Data and bandwidth.