WHAT IS BIG DATA?
Big data is a collection of data from traditional and digital sources, inside and outside your company, that represents a source for ongoing discovery and analysis. Some people like to restrict big data to digital inputs such as web behavior and social network interactions; however, we can’t exclude traditional data derived from product transaction information, financial records and interaction channels such as the call center and point of sale. All of that is big data. In defining big data, it’s also important to understand the mix of unstructured and multi-structured data that makes up the volume of information.
Unstructured data is information that is not organized or easily interpreted by traditional databases or data models; typically, it’s text-heavy. Tweets and other social media posts are good examples of unstructured data.
Multi-structured data refers to a variety of data formats and types and can be derived from interactions between people and machines, such as web applications or social networks.
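To make the distinction concrete, here is a minimal Python sketch with three invented records: a point-of-sale row with a fixed set of fields, a free-text social post, and a web event that mixes fixed fields, nested data and free text. All field names and values are hypothetical, and the describe function is only an illustrative heuristic, not a standard classification.

```python
import json

# Structured: fixed fields with known types, fits a traditional table row.
pos_transaction = {"store_id": 17, "sku": "A-1001", "quantity": 2, "amount": 39.98}

# Unstructured: free text with no schema; meaning has to be inferred.
social_post = "Loved the new espresso machine, but delivery took two weeks!"

# Multi-structured: a web event mixing fixed fields, nested data and free text.
web_event = json.loads("""
{
  "user_id": 42,
  "page": "/checkout",
  "device": {"type": "mobile", "os": "android"},
  "comment": "coupon field is hard to find"
}
""")


def describe(record):
    """Classify a record by its shape (illustrative heuristic only)."""
    if isinstance(record, str):
        return "unstructured"
    if isinstance(record, dict):
        nested = any(isinstance(value, (dict, list)) for value in record.values())
        return "multi-structured" if nested else "structured"
    return "unknown"


for name, record in [
    ("pos_transaction", pos_transaction),
    ("social_post", social_post),
    ("web_event", web_event),
]:
    print(name, "->", describe(record))
```

In practice the boundary is rarely this clean; the point is only that each kind of record calls for different storage and analysis.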
Industry leaders such as the global analyst firm Gartner use terms such as “volume” (the amount of data), “velocity” (the speed at which information is generated and flows into the enterprise) and “variety” (the kinds of data available) to begin to frame the big data discussion. Others have focused on additional V’s, such as big data’s “veracity” and “value.”
LET’S LOOK AT THE IMPORTANT Vs OF BIG DATA
Volume: Many factors contribute to the increase in data volume: transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
Variety: Data today comes in all types of formats: structured, numeric data in traditional databases; information created from line-of-business applications; and unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.
Variability: In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage, even more so when unstructured data is involved.
Complexity: Today’s data comes from multiple sources, and it is still an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages, or your data can quickly spiral out of control.
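The linking, matching and cleansing described under Complexity can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production approach: the two systems (a CRM and a point-of-sale system), their field names and the sample records are all hypothetical, and the email address is assumed to be a reliable key once it has been normalized.

```python
import re


def normalize_email(raw):
    """Cleanse: trim whitespace and lowercase so equal addresses compare equal."""
    return raw.strip().lower()


def normalize_name(raw):
    """Cleanse: collapse runs of whitespace and apply title case."""
    return re.sub(r"\s+", " ", raw.strip()).title()


# Hypothetical records from two systems describing overlapping customers.
crm_records = [
    {"crm_id": "C1", "name": "  ada   lovelace ", "email": "Ada@Example.com "},
    {"crm_id": "C2", "name": "alan turing", "email": "alan@example.com"},
]
pos_records = [
    {"pos_id": "P9", "email": "ada@example.com", "total_spend": 120.50},
    {"pos_id": "P7", "email": "grace@example.com", "total_spend": 75.00},
]


def link_records(crm, pos):
    """Link on the normalized email; return (matched, unmatched POS records)."""
    by_email = {normalize_email(r["email"]): r for r in crm}
    matched, unmatched = [], []
    for p in pos:
        c = by_email.get(normalize_email(p["email"]))
        if c is None:
            unmatched.append(p)
        else:
            # Transform: merge both sources into one cleansed record.
            matched.append({
                "crm_id": c["crm_id"],
                "pos_id": p["pos_id"],
                "name": normalize_name(c["name"]),
                "total_spend": p["total_spend"],
            })
    return matched, unmatched


matched, unmatched = link_records(crm_records, pos_records)
print("matched:", matched)
print("unmatched:", unmatched)
```

Keeping the unmatched records visible, rather than silently dropping them, is a deliberate choice here: they are where inconsistent data across systems usually first shows up.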
BIG DATA TECHNOLOGY
Big data technology must support search, development, governance and analytics services for all data types—from transaction and application data to machine and sensor data to social, image and geospatial data, and more.
Systems: Your infrastructure must capitalize on real-time information flowing through your organization. It must be optimized for analytics to respond dynamically—with automated business processes, better agility and improved economics—to the increasing demands of big data.
Privacy: To protect your reputation and brand, your platform must comprise stringent policies and practices around privacy and data protection, safeguarding all of the data and insights on which your business relies.
Governance: The right platform instills trust, so you can act with confidence. It controls how information is created, shared, cleansed, consolidated, protected, maintained, retired and integrated within your enterprise.
Storage: To achieve economies and efficiencies, you must run certain analytics close to the data, while it is in motion. But for data you elect to store, your infrastructure must embody a defensible disposal strategy that reduces the run rate of storage, legal expense and risk.
Security: As you infuse analytics into your organization, data security becomes more central to your competitive advantage profile. Your infrastructure must have strong security measures built in to guard your organization against internal and external threats.
Cloud: To relieve the pressure that big data is placing on your IT infrastructure, you can host big data and analytics solutions in the cloud. Doing so can deliver the scalability, flexibility, expandability and economics that will provide competitive advantage into the future.
ARCHITECTURE OF BIG DATA
Big data analytics for manufacturing applications can be based on a 5C architecture (connection, conversion, cyber, cognition and configuration). Let’s look at each of the five levels:
Smart Connection: Acquiring accurate and reliable data from machines and their components is the first step in developing a cyber-physical system (CPS) application. The data might be measured directly by sensors or obtained from controllers or enterprise manufacturing systems such as ERP, MES, SCM and CMM. Two factors matter at this level. First, because the data comes in various types, a seamless and tether-free method is required to manage data acquisition and transfer the data to the central server; protocols such as MTConnect are useful here. Second, proper sensors (type and specification) must be selected.
Data-to-Information Conversion: Meaningful information has to be inferred from the data. Several tools and methodologies are currently available for this level. In recent years, extensive work has focused on developing such algorithms specifically for prognostics and health management applications. By calculating a health value, an estimated remaining useful life and similar measures, the second level of the CPS architecture brings self-awareness to machines.
Cyber: The cyber level acts as the central information hub in this architecture. Information is pushed to it from every connected machine to form the machine network. Once massive amounts of information have been gathered, specific analytics have to be used to extract additional information that provides better insight into the status of individual machines within the fleet. These analytics give machines a self-comparison ability: the performance of a single machine can be compared with and rated against the fleet, and similarities between a machine’s performance and that of previous assets (historical information) can be measured to predict the future behavior of the machinery.
Cognition: Implementing CPS at this level generates thorough knowledge of the monitored system. Proper presentation of the acquired knowledge to expert users supports correct decision-making. Because comparative information as well as individual machine status is available, decisions on the priority of tasks can be made to optimize the maintenance process. For this level, proper infographics are necessary to fully transfer the acquired knowledge to users.
Configuration: The configuration level is the feedback from cyberspace to physical space and acts as supervisory control to make machines self-configuring and self-adaptive. This stage acts as a resilience control system (RCS) that applies the corrective and preventive decisions made at the cognition level to the monitored system.
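The five levels can be walked through with a small Python sketch. Everything in it is an assumption made for illustration: the machine names, the vibration readings, the baseline and failure levels, the linear health formula and the 0.5 threshold are invented, and real prognostics and health management algorithms are considerably more involved.

```python
from statistics import mean

# Level 1 (connection): hypothetical vibration readings per machine, in mm/s.
readings = {
    "press_A": [2.1, 2.0, 2.2, 2.1],
    "press_B": [2.9, 3.1, 3.0, 3.2],
    "press_C": [4.4, 4.6, 4.5, 4.7],
}

BASELINE = 2.0  # assumed vibration level of a healthy machine
FAILURE = 5.0   # assumed vibration level at which a machine counts as failed


def health_value(samples):
    """Level 2 (conversion): map mean vibration to a health value in [0, 1]."""
    level = mean(samples)
    score = 1.0 - (level - BASELINE) / (FAILURE - BASELINE)
    return max(0.0, min(1.0, score))


def rank_fleet(fleet):
    """Level 3 (cyber): compare machines across the fleet, least healthy first."""
    scores = {name: health_value(samples) for name, samples in fleet.items()}
    return sorted(scores.items(), key=lambda item: item[1])


def recommend(score, threshold=0.5):
    """Levels 4 and 5 (cognition, configuration): turn a score into an action."""
    if score < threshold:
        return "reduce load and schedule maintenance"
    return "continue normal operation"


for name, score in rank_fleet(readings):
    print(f"{name}: health={score:.2f} -> {recommend(score)}")
```

Sorting the fleet by health value is what gives each machine the self-comparison described at the cyber level, and the ranked list is the kind of prioritized view the cognition level presents to users.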
WHAT IS CHANGING IN THE REALM OF BIG DATA?
Big data is changing the way people within organizations work together. It is creating a culture in which business and IT leaders must join forces to realize value from all data. Insights from big data can enable all employees to make better decisions—deepening customer engagement, optimizing operations, preventing threats and fraud, and capitalizing on new sources of revenue. But escalating demand for insights requires a fundamentally new approach to architecture, tools and practices.