A Serious Look at 10 Big Data V’s


So, which V’s represent big data’s biggest challenges? Below I list ten (including Doug Laney’s original three V’s) that I have either encountered or contributed to. Each of these V-based characterizations names a different challenge associated with the main tasks involving big data (as mentioned earlier: capture, cleaning, curation, integration, storage, processing, indexing, search, sharing, transfer, mining, analysis, and visualization).

  1. Volume: lots of data (which I have labeled “Tonnabytes” to suggest that the numerical scale at which data volume becomes challenging is domain-specific, though we all agree that we are now dealing with a “ton of bytes”).
  2. Variety: complexity, thousands or more features per data item, the curse of dimensionality, combinatorial explosion, many data types, and many data formats.
  3. Velocity: a high rate of data and information flowing into and out of our systems, in real time. Incoming!
  4. Veracity: the necessary and sufficient data to test many different hypotheses; vast training samples for rich micro-scale model building and model validation; and micro-grained “truth” about every object in your data collection, thereby empowering “whole-population analytics”.
  5. Validity: data quality, governance, and master data management (MDM) on massive, diverse, distributed, heterogeneous, “unclean” data collections.
  6. Value: the all-important V, characterizing the business value, ROI, and potential of big data to transform your organization from top to bottom (including the bottom line).
  7. Variability: dynamic, evolving, spatiotemporal data, time series, seasonal patterns, and any other type of non-static behavior in your data sources, customers, objects of study, etc.
  8. Venue: distributed, heterogeneous data from multiple platforms and from different owners’ systems, with different access and formatting requirements, in private vs. public clouds.
  9. Vocabulary: schemas, data models, semantics, ontologies, taxonomies, and other content- and context-based metadata that describe the data’s structure, syntax, content, and provenance.
  10. Vagueness: confusion over the meaning of big data (Is it Hadoop? Is it something we’ve always had? What’s new about it? What are the tools? Which tools should I use? etc.). Note: I credit Venkat Krishnamurthy (Director of Product Management at YarcData) with introducing this new “V” at the Big Data Innovation Summit in Santa Clara on June 9, 2014.
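The “curse of dimensionality” and “combinatorial explosion” listed under Variety can be made concrete with a small experiment. The Python sketch below is illustrative only and is not taken from the original list: the function name relative_contrast, the uniformly random data, and the point counts are hypothetical choices for the demonstration. It measures how the gap between the nearest and farthest points (relative to the nearest) shrinks as the number of features per item grows, and it counts how many pairwise feature interactions 1,000 features produce.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """Hypothetical helper: (max - min) / min Euclidean distance from one random
    query point to n_points random points in the unit hypercube [0, 1]^dim.
    Values near 0 mean "near" and "far" neighbors become hard to tell apart."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


# Curse of dimensionality: the contrast shrinks as features are added.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))

# Combinatorial explosion: pairwise interactions among 1,000 features.
print(math.comb(1000, 2))  # 499500
```

On typical runs the printed contrast falls sharply between 2 and 1,000 dimensions, which is one reason distance-based methods such as nearest-neighbor search tend to struggle on very wide data, while the pair count shows how quickly candidate feature interactions multiply.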


ADDRESS

650 Parliament Street, Toronto, Ontario, Canada
Phone: (416) 939-0044
Fax: (647) 720-2214
Website: http://www.datajadoo.com
Email: info@datajadoo.com

DISCLAIMER

Important: This site has been set up purely to showcase the analytics skills of Data Jadoo. All content is designed by Data Jadoo. Authors retain their own views on the topics expressed here. All images are copyrighted by their respective creators.