
Linked Data Connectivity – Graphs are the Crux of the Biscuit

Apr 2, 2020
Dr. Thomas Kamps
CEO CONWEAVER GmbH

(The original version of this article was published on LinkedIn.)

In the textbook “Primary Mathematics – Teaching For Understanding” [1] the authors describe “making connections” as integral to the idea of an understanding in mathematics. “The mathematics is understood if its mental representation is part of a network of representations”. These days, as we speak about systems of systems to grasp the complexity of processes in the manufacturing industries, it seems important to me to relate these abstractions to the same concept of ‘understanding’. We can better comprehend processes and then automate them if we create a representation that is similar to the mental model developed by students learning mathematics. I would call such a representation a corporate digital brain.

The corporate brain relates business entities across processes and functions and allows product, asset, data, customer and other lifecycles to be represented in a holistic way. It stands metaphorically for a memory storing knowledge about the overall process. The brain's underlying mathematical structure is a graph. Such graphs comprising linked business entities are often called Knowledge Graphs (KGs) or Enterprise Knowledge Graphs (EKGs). We at CONWEAVER have delivered productive solutions to renowned customers such as General Motors, Bosch, Daimler and others, and these have been running successfully for more than five years now. But such solutions are still new terrain for many companies. This is reflected by the fact that Gartner classified KGs as an emerging technology only in 2018 [2]. Because of the division of labor, the major obstacle to the introduction of corporate brains is disconnected data silos. As we are not the first to have recognized this deficit, it makes sense to compare existing solutions with graph solutions. To do so, I will relate these solutions to graphs along data connectivity and consistency, the main technical features that need to be achieved to overcome the data gap.

The Business Context & the Challenge

Large manufacturers are extending their products to include intelligent services based on data. Data comprise those maintained and stored in existing applications along the lifecycles as well as those gained from the field. This affects not only the products, but also the machines that make the products because at the moment there exists no closed loop covering the entire lifecycle.

This is in contradiction to what Product Lifecycle Management (PLM) once promised: to cover the whole product lifecycle from requirements to delivery. Unfortunately, PLM is still bound to engineering processes, essentially PDM systems around the V-model. As sensors are the building blocks for higher-value services, software that analyzes the impact of field data on product development becomes more and more important. So, if OEMs (or other discrete manufacturers) want to sell services as a new business model, consistent and connected (logically linked) data across all functions are a strong requirement. Otherwise, use scenarios such as flashing software updates over the air or feedback loops from field data analysis will remain pie in the sky.

Technologies, Solutions, Approaches

What are the technologies and solutions offered so far? I would like to discuss different categories:

Data Exchange Solutions are an obvious form of data availability by which one system delivers data to another. This happens either by direct exchange of data via API or by using a middleware/resource bus that standardizes access to the connected authoring systems. The middleware, however, can only be viewed as a data provider. It does not ensure in any way that data of different application systems are logically connected or consistent. Connecting them typically happens by employing manually crafted mapping tables. Yet, based on such local mapping tables, global tasks such as traceability along the value chain are difficult to achieve. Middleware could still act as a data provider if “on-premise” solutions are favored, and such data exchange solutions might do a good job if data just need to be exchanged between two companies.
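Why local mapping tables make global traceability hard can be illustrated with a small Python sketch. The table names, keys and the helper `trace` are purely hypothetical and do not describe any particular middleware; the point is only that an end-to-end trace must chain several hand-maintained tables and breaks at the first missing entry.

```python
# Hypothetical pairwise mapping tables, each maintained by hand for one interface.
REQ_TO_PLM = {"R-1": "P-10", "R-2": "P-20"}
PLM_TO_ERP = {"P-10": "M-100"}  # the entry for P-20 was never maintained
ERP_TO_MES = {"M-100": "W-7"}


def trace(key, *tables):
    """Follow a key through a chain of mapping tables; stop at the first gap."""
    path = [key]
    for table in tables:
        key = table.get(key)
        if key is None:
            break
        path.append(key)
    return path


print(trace("R-1", REQ_TO_PLM, PLM_TO_ERP, ERP_TO_MES))  # ['R-1', 'P-10', 'M-100', 'W-7']
print(trace("R-2", REQ_TO_PLM, PLM_TO_ERP, ERP_TO_MES))  # ['R-2', 'P-20']
```

In this toy setup the second trace silently ends in the PLM system. With many systems and many tables, nobody owns the overall chain, which is the gap a decoupled graph is meant to close.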

One Storage Place Solutions aim at either storing data in one place or hosting whole applications in one place. A form that has arisen in recent years is the data lake. Data lakes are designed to store mass data of any kind in one container. The idea is to throw big data into this lake and apply AI techniques to gain valuable insights. However, it turned out that this does not work in many cases due to a problem referred to as “data preparation”, see [3]. The author, Alfrick Opidi, estimates that data preparation accounts for 80% of the challenge. The online journal IndustryWeek [4] reports that the International Institute for Analytics estimates that less than 10% of AI pilot projects have reached full-scale production. This is, by the way, recognized not only in discrete manufacturing and the process industries but also in finance, where AI techniques are used to gain insights into money laundering or other types of financial crime [5]. Data are stored in silos and are inconsistent across systems, and this prevents AI techniques (in fact, machine learning) from being applied properly. Thus Allan Frank [6] of Think New Visions considers the availability of knowledge graphs a competitive advantage in AI. From this discussion it follows that mass storage and AI alone will not achieve the desired data insights. Nevertheless, it can be taken for granted that data lakes or similar technologies will store the additional big data because there are no other established storage devices for such data. It also turns out that companies have already set up quite a few data lakes for different purposes, which again makes it more difficult to maintain cross-consistency. But there are also all the other data systems ranging over the value chain. What about them? Should these data be thrown into the lakes, too?

Cloud Solutions are an outsourcing-based service offered by vendors to move company applications into either private or public clouds. How does that relate to the required connectivity of data and the availability of business contexts? Why would data be better connected if applications were hosted in a cloud? This is not apparent. There is a discussion introduced by Oleg Shilovitsky in his LinkedIn article “Oracle Cloud PLM - Unified Data Model, Frequent Upgrades And Less Integrations?” [7] in which he quotes Avery De Marr's article "PLM - A New Hope" [8]. The citation I find interesting is the following:

Ask yourself - can you pull data from your product, your manufacturing floor, from social media and the greater web in general in order to drive quality resolution faster than your competition? Can you do it real-time? Do you have the foundation to be able to combine information from your operations into your system of record, and intelligently act on these insights? Can you do this without an army of recourses, because your system gives you these insights? PLM4.0 answers these questions.

As I understand it, this is about availability of data and timeliness. Availability of data might refer either to pulling data in order to store them in the cloud or to pulling them at query time while they remain stored somewhere else. The latter does not make a lot of sense from the point of view of a cloud vendor, though. If you pull data to save them, then you have them stored in applications in one place. That is the essential meaning of the cloud: to have the data in one spot so that you as a customer may just consume them. Thus, concerning availability of data along the value chain, the cloud argument is similar to the data lake argument: it relates to the storage place, even though data are still distributed and disconnected among different applications within this storage place. From the point of view of connectivity and consistency, what is then different compared to on-premise? As a summary of this discussion I would argue that the desired business context established by “making connections” between business objects cannot be gained if we just move the data to the same location.

Data Warehouse Based Solutions were considered for so-called “monolithic solutions”. The idea was to use this kind of technology as a general-purpose storage device that would act as the database of integrated data. Integrated data could then be visualized or redistributed to other applications. As data warehouses rely on relational table structures, things get complex very quickly if you want to extend or modify a data model to cover many use cases. This is probably the reason why such projects have often failed. The business requirements of projects changed faster than IT teams were able to implement the solutions. Due to the complexity of the relational tables in such warehouses, the human experts who managed the data models got lost. This is the reason why warehouses cannot act as global logical linking devices. They are the wrong tool for the job.

However, data warehouses also serve as purpose-specific data aggregation tools, mainly to provide business performance indicators. Data are extracted from one or more application systems, and value is added by abstracting data into higher-level units of interest. These are stored in a data warehouse and are typically offered for consumption by means of cockpits, e.g., BI applications. Mostly, such solutions are driven by specific analytic use cases, and this is reflected in the underlying data models as well as in the UIs. Abstraction is typically the hierarchical relation used to implement aggregation. Other types of semantic relations are generally difficult to implement because they again increase the complexity of the relational table structures.

All-in-One Vendor Strategy: The same holds for big vendors who promote an “all-in-one vendor” strategy that is supposed to make data gaps disappear because you buy all your solutions from only one vendor. Such strategies are business approaches rather than technological ones, which is why I call this a 'strategy' and not a solution. Typically, big vendors are actually conglomerates of smaller companies they have acquired over time. In this way, they may offer a rich variety of products and services, but at the same time they struggle with their own heterogeneity, which results in portfolio applications that do not understand one another. If this is the case, connectivity and consistency will never see the light of day. However, it may well be a sound approach for a smaller company to choose one vendor because the application portfolio is still manageable. Bigger companies are typically reluctant to do that due to risk calculations.

Graph as the Natural, Holistic Representation of Connectivity between Discrete Entities

One-storage-place solutions sometimes insinuate that availability of data in one space or hosted by one vendor is already the solution. This is unfortunately not the case, as I have argued above. In the same way, real-time availability does not replace logical connectivity and consistency. If you have disconnected and inconsistent data, you cannot gain insight from them even if they are right in front of your nose. The essential features for representing lifecycle data are logical connectivity and consistency. The ability to connect facts across the value chain is the basic tool we need to provide business context and thus the essential feature that helps close the above-mentioned feedback loop. It is of secondary importance where data are stored, just as it is of secondary importance to have them available in real time. This does not, of course, exclude cases where both real-time availability and logical connectivity/consistency are required.

How can we establish logically connected and consistent data? Looking at the complexity of current data landscapes, even in medium-sized companies, the only choice we have is to accept the world in its diversity. Consequently, logical data connectivity must be decoupled from the existing data world. The important business objects relating data in authoring systems, including data lakes, need to be abstracted and linked. The evolving structures, as I have indicated above, are then graphs (EKGs) whose nodes represent business objects and whose edges represent semantic relations between them. Such graphs are the natural representation for the digital thread relating real-world objects in manufacturing or those driving on the streets with their digital twin(s). Technically speaking, this is my interpretation of what Joe Barkai [9] means when he talks about

The connected enterprise enables each stakeholder analyze data from many relevant sources, develop solution and conduct impact analyses that address all product lifecycle functions and phases and drive better decisions as early as possible.

Many researchers such as Jens Göbel [10] argue for the necessity of a meta-PLM. For the reasons outlined above I would claim that graphs are best suited to act as the fundamental data structure for such meta-PLMs.
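To make the idea of a decoupled graph more tangible, here is a minimal Python sketch of nodes as abstracted business objects and edges as typed semantic relations. The class names, relation names and system names are assumptions chosen for illustration; this is not a description of any product or of the meta-PLM proposals cited above. Each node only keeps a reference back to the record in its authoring system, so the graph stays separate from the original data but remains connected to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BusinessObject:
    """A node: an abstraction of a record that stays in its authoring system."""
    node_id: str        # identifier inside the graph
    kind: str           # e.g. "Requirement", "Part", "TestCase"
    source_system: str  # hypothetical system name, e.g. "PLM"
    source_key: str     # key of the original record in that system


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> BusinessObject
    edges: set = field(default_factory=set)    # (source_id, relation, target_id)

    def add_node(self, obj):
        self.nodes[obj.node_id] = obj

    def link(self, source_id, relation, target_id):
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.add((source_id, relation, target_id))

    def neighbours(self, node_id, relation=None):
        """Targets of outgoing edges, optionally restricted to one relation."""
        return sorted(
            target
            for source, rel, target in self.edges
            if source == node_id and (relation is None or rel == relation)
        )


graph = KnowledgeGraph()
graph.add_node(BusinessObject("req-1", "Requirement", "RequirementsTool", "R-0815"))
graph.add_node(BusinessObject("part-7", "Part", "PLM", "P-4711"))
graph.add_node(BusinessObject("test-3", "TestCase", "TestManagement", "T-0042"))
graph.link("req-1", "satisfied_by", "part-7")
graph.link("req-1", "verified_by", "test-3")

print(graph.neighbours("req-1"))                 # ['part-7', 'test-3']
print(graph.neighbours("req-1", "verified_by"))  # ['test-3']
```

A production graph would of course live in a scalable store rather than in Python sets; the sketch only shows the shape of the data.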

Hedberg [11] notes a lack of graph-based research in PLM; by his account, this is because the majority of research is still focused on data management in manufacturing.

A quick trip into the world of music shows that the same type of connectivity is important for understanding an artist's work. It's the context!

…At first things seem unconnected, but slowly a pattern starts to emerge, as more of the pieces seem to interact…

This is a nice depiction of Frank Zappa’s “conceptual continuity” that I found on Dweezil Zappa’s web page [12]. It says that the artist’s work is to be seen in the “Frank Zappa art work context”, where the Apostrophe’ ('symbol for conjoining two words together') is the link that glues his art pieces together. The link is the “crux of the biscuit” that gives the combination a higher meaning [13]. This kind of holistic view can be applied analogously to our business context, too. If you think in graphs, a digital twin appears as a relative thing: you can define the digital twin relative to the task of the user or the insights you want to gain. A persistent EKG will enable traceability, an essential feature of systems engineering, and it might therefore trigger a discussion on how data-driven vs. model-driven development can best be achieved. How can such graphs be created, and what are the main requirements for pragmatic solutions?

Very Large Graphs – Scalability of Graph Computation

From the sheer amount of heterogeneous data stored in larger companies it is obvious that EKGs are generally very large graphs; concretely, they may soon comprise trillions of objects together with a multiple of that number in links. Consequently, automatic techniques to compute and update the graph(s) must be put in place.

There are, of course, also opportunities for the manual creation of domain ontologies, topic maps, and standardization measures using description languages such as OSLC, RDF or OWL. Nicolas Figay [14] discusses such issues in relation to PLM. However, manual editing and maintenance of such structures are laborious and require harmonization effort, and therefore, in my opinion, do not scale with company data. Moreover, the major driver of innovation concerning processes is automation, and automation requires connected lifecycles. This in turn requires the brown field as well as the field data to be included as quickly as possible. Who wants to mark the brown field with OSLC tags? And by the way, if you really want to spend the effort to implement OSLC, this will again result only in standardized local connectivity between systems. Why? Because the instance graph is not really decoupled from the authoring systems; only the description language is.

Other approaches try to deal with increasing data complexity by introducing standardized object models. In 2019, NIST researchers Thomas Hedberg et al. [11] proposed

an architecture for making connections across enterprises based on the Lifecycle Information Framework and Technology (LIFT) concept [12]. The definition of the Global Handle Registry, Intermediate Handle Registry, and Local Handle Services are the work of [35] and standardized in accordance with RFC 3650 [36], RFC 3651 [37], and RFC 3652 [38]

The basic idea is to store data by means of a global object model in which the nodes and the edges of the EKG are standardized. This approach would work well based on automatic generation techniques. Our experience in customer projects has shown, however, that while this approach makes a lot of sense from an academic point of view, it is almost impossible to achieve when confronted with the complexity of data in large companies. Tweaking the data so that it fits into a rigid standard model costs a lot of time and effort and is eventually far more complex than accepting the world in its complexity and providing solutions based on intelligent analytics.

Automated Computation of the Graph

Given the predicted size of the graph, its automatic computation from existing company data is inevitable. However, this process cannot be fully automatic. There are a number of reasons; the most important ones to me are:

  1. The creation of the semantic graph model follows an interest profile; a machine cannot decide what is interesting (at least not yet).
  2. Intellectual expertise about the data processing procedures resides in the heads of experts and needs to be turned into analytic rules first.
  3. Deciding which data to consider for the specific case or what kind of information is relevant to the user remains a human decision, etc.
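The interplay of human expertise and automatic computation can be sketched as follows. In this hypothetical Python example an expert contributes one analytic rule (how part numbers from two systems are to be compared), and the machine then applies that rule to all records to derive the links. The record layouts, the rule itself and the relation name `same_as` are assumptions made for illustration only.

```python
import re

# Hypothetical extracts from two authoring systems that spell part numbers differently.
PLM_PARTS = [
    {"id": "plm:1", "part_no": "A-100-200"},
    {"id": "plm:2", "part_no": "B-300-400"},
]
ERP_ITEMS = [
    {"id": "erp:9", "material": "a100200"},
    {"id": "erp:8", "material": "C500600"},
]


def normalize(part_number):
    """Expert rule (assumed): ignore case and any non-alphanumeric separators."""
    return re.sub(r"[^A-Z0-9]", "", part_number.upper())


def compute_links(parts, items):
    """Derive 'same_as' edges between records whose normalized numbers match."""
    index = {normalize(part["part_no"]): part["id"] for part in parts}
    links = []
    for item in items:
        match = index.get(normalize(item["material"]))
        if match is not None:
            links.append((match, "same_as", item["id"]))
    return links


print(compute_links(PLM_PARTS, ERP_ITEMS))  # [('plm:1', 'same_as', 'erp:9')]
```

The rule is written once by someone who knows the data; the linking itself can then be rerun automatically whenever the source data change.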

Graph-Based Product Lifecycle Solutions

Deloitte [15] claims that Semantic AI is the key to collective intelligence. They see a symbiotic connection between EKGs and AI technology that is needed to establish powerful AI-based decision making. This is what I argued above when I talked about the data preparation problem. EKGs provide context; they are the long-term memory of AI. However, apart from AI-based decision making there are quite a number of challenges that can be solved along the product lifecycle using Product Graphs (PGs), a specific type of EKG connecting all relevant artefacts for a traceability solution along the product lifecycle. In the remainder I will introduce, without claim of completeness, a number of applications that solve particular aspects of traceability. The important point is, though, that these applications represent tasks already being grappled with, yet with a significant amount of manual effort. This is consistent with the findings of Thomas Hedberg et al. [11], who cite a study by another NIST researcher, Gary Anderson [16], on the "Economic Impact of Technology Infrastructure for Smart Manufacturing" that points to seamless transmission of digital information as a major driver.

For design through production portion of the product lifecycle, one study found that simply transitioning from paper-based processes to (digital) model-based processes would achieve an approximate 75 percent reduction in cycle-time [2]. Further, enhanced sensing and monitoring, seamless transmission of digital information, and advances in analyzing data and trends would save manufacturers $30 Billion annually.

Traceability Solution (transparency, replicability, confirmability)

  • Impact Analysis relates to changes and their impact on subsequent processes
  • Coverage Analysis investigates how requirements are met in subsequent processes
  • Project Status Analysis uses the PG's traceability function to provide transparency on status and progress of a development project
  • Reuse of Product Components provides trace links allowing aggregations of requirements together with their connected operational artefacts to be reused for new products
  • Test Optimization allows assessment of tests based on linked requirements, source code, test cases and test results. In case of changes and errors it can be decided which tests need to be performed and which redundant tests can be avoided.
  • Certification makes it possible to prove that all requirements are met
  • Reengineering makes use of trace links to capture part of what has been learned through reverse engineering of a given system
  • Risk Management secures knowledge by linking components and reduces risk in case a team member with substantial knowledge leaves the project
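Several of the analyses listed above reduce to reachability questions on the product graph. The following Python sketch shows impact analysis and coverage analysis as a breadth-first traversal over trace links. The artefact names and the link table are invented for illustration, and a real product graph would use typed relations rather than name prefixes.

```python
from collections import deque

# Hypothetical trace links: artefact -> artefacts that depend on it downstream.
TRACE_LINKS = {
    "REQ-1": ["DESIGN-1"],
    "REQ-2": ["DESIGN-1", "DESIGN-2"],
    "REQ-3": [],
    "DESIGN-1": ["CODE-1"],
    "DESIGN-2": ["CODE-2"],
    "CODE-1": ["TEST-1"],
    "CODE-2": [],
    "TEST-1": [],
}


def impacted_by(start, links):
    """Impact analysis: everything reachable downstream of a changed artefact."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for successor in links.get(current, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def uncovered_requirements(links):
    """Coverage analysis: requirements from which no test case can be reached."""
    requirements = [name for name in links if name.startswith("REQ-")]
    return sorted(
        req
        for req in requirements
        if not any(a.startswith("TEST-") for a in impacted_by(req, links))
    )


print(sorted(impacted_by("REQ-2", TRACE_LINKS)))
# ['CODE-1', 'CODE-2', 'DESIGN-1', 'DESIGN-2', 'TEST-1']
print(uncovered_requirements(TRACE_LINKS))  # ['REQ-3']
```

Test optimization follows the same pattern: after a change to `CODE-1`, only the tests contained in `impacted_by("CODE-1", TRACE_LINKS)` need to be rerun.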

Besides traceability solutions covering predominantly engineering tasks, EKGs may very effectively be used to bridge the gap between engineering and manufacturing. In manufacturing, however, the focus is not so much on traceability but rather on production planning. Interesting research is carried out by Fraunhofer researchers Oliver Riedel [17] ("Trends towards Engineering Excellence") and Rainer Stark [18], who emphasizes

a complete digitization of product development and planning processes – so that you as a manufacturer or user can consider the later phases of your product's lifecycle at an early stage.

Besides engineering and manufacturing other subprocesses such as product portfolio planning, aftersales and procurement may significantly profit from the linked value chain.

Configurable Solutions

The data signature of a company is individual; thus, the analytics process that computes the graph must be individualized. At the same time, the semantic graph model must be easily extensible and modifiable while it is running. This requires a low-code configurable solution. Both the semantic graph model and the analytics that compute and update the graph based on the model are specified declaratively, so that the graph can be computed bottom-up from the given data. Configuration is also important because customers want to have productive solutions in weeks, not in years.

Incremental Set Up of the Graph: Graph cases relate to usage scenarios in which a set of user roles include certain information requirements that need to be met. Typically, the data to be connected for an initial case are stored in a defined number of relevant data sources. As the generation of the graph happens according to an extensible semantic data model further use cases can be added by extending the scope of the semantic model as well as the analytics. Such an incremental design of the graph guarantees quick benefits for the customer and thus acceptance by the users. In this way, relevant company knowledge can be continuously expanded to cover additional processes and lifecycles.
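As a rough illustration of such an incremental set-up, the following Python sketch treats the semantic model as plain configuration data that is extended use case by use case. The structure of the model and the helper names `is_allowed` and `extend` are assumptions for this example and are not taken from any specific platform.

```python
# Hypothetical declarative model: allowed node types and typed relations between them.
model = {
    "node_types": {"Requirement", "Part"},
    "relations": {("Requirement", "satisfied_by", "Part")},
}


def is_allowed(model, source_type, relation, target_type):
    """Check whether an edge of this shape is covered by the given model."""
    return (source_type, relation, target_type) in model["relations"]


def extend(model, node_types=(), relations=()):
    """Return a new model that adds a use case without touching the existing one."""
    return {
        "node_types": model["node_types"] | set(node_types),
        "relations": model["relations"] | set(relations),
    }


# First use case: requirements coverage in engineering.
print(is_allowed(model, "Part", "produced_on", "Machine"))  # False

# A second use case adds manufacturing; everything modelled before stays valid.
model_v2 = extend(
    model,
    node_types={"Machine"},
    relations={("Part", "produced_on", "Machine")},
)
print(is_allowed(model_v2, "Part", "produced_on", "Machine"))      # True
print(is_allowed(model_v2, "Requirement", "satisfied_by", "Part"))  # True
```

Because the extension only adds node types and relations, the graph computed for the first use case does not have to be rebuilt when the second one is introduced.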

Summary & Conclusions

In this article I have addressed the topic of logical connectivity applied to the product lifecycle by claiming that graphs – acting as a lingua franca – are the natural representation to capture corporate knowledge. I have compared existing technologies, solutions, and approaches along the critical technical features of connectivity and consistency. Of course, the solution categories cannot be separated cleanly in reality, as we find hybrid forms. Graphs, for instance, could very well be hosted and run in a cloud. This would make life much easier, e.g., from the point of view of operations. However, an EKG is a logical representation of corporate knowledge, not a storage space. The graph exists as a separate entity, decoupled from the authoring systems, but reconnected to the original data. It acts as a powerful basis for inferencing processes, which is important for the automation of business processes as one of the major drivers of innovation. Using such graphs, it is easy to model the digital thread connecting the entire product lifecycle and to enable traceability solutions of any kind. Moreover, the employment of Knowledge Graphs goes along with a methodology that allows quick realization of use cases thanks to configurable solutions. This is particularly important regarding the analytics process that needs to be modelled to compute and update the graph instance from the connected authoring systems and databases. Thus, I would like to conclude with a citation I like by Oleg Shilovitsky [19]:

Graphs are beautiful and fascinating. You can think that all successful businesses of 21st century were built on top of network effect. Graph databases hold lot of unrealized potential in the next few years as companies will be moving towards better analysis and data exploration.

Disclaimer

I’m co-founder and CEO of CONWEAVER. With our Linksphere Low Code Big Graph Platform we empower our customers to quickly and flexibly configure Knowledge Graph solutions across different data sources. My opinion can be unintentionally biased.

Bibliography

  1. https://scholar.google.de/scholar?q=Primary+Mathematics+%E2%80%93+Teaching+For+Understanding&hl=de&as_sdt=0&as_vis=1&oi=scholart
  2. https://allegrograph.com/gartner-knowledge-graphs-emerge-in-the-hypecycle/
  3. https://www.topbots.com/data-preparation-for-machine-learning/
  4. https://www.industryweek.com/technology-and-iiot/article/22028709/taking-your-ai-projects-from-pilot-to-production
  5. https://10times.com/aml-forum, Germany, September 12 2019
  6. Frank, Allan, Re-imagining Customer Engagement with AI: An Introduction to AI, https://www.capgemini.com/wp-content/uploads/2018/07/Re-imagining-Customer-Engagement-with-AI.pdf
  7. https://www.linkedin.com/pulse/oracle-cloud-plm-unified-data-model-frequent-less-oleg-shilovitsky/?trackingId=T2uD%2F6PkYpu1hBfyvrv%2Beg%3D%3D
  8. PLM - A New Hope, https://blogs.oracle.com/scm/plm-a-new-hope
  9. http://joebarkai.com/i-dont-do-plm/
  10. Göbel, Jens, http://newsletter.prostep.com/index.php?id=1816&L=1, 2019/03
  11. Hedberg, Thomas D., Bajaj, Manas, Camelio, Jaime A., "Using graphs to link data across the product lifecycle for enabling smart manufacturing digital threads", Journal of Computing and Information Science in Engineering, 19 Sep 2019, available online: https://asmedigitalcollection.asme.org/computingengineering/article-abstract/20/1/011011/975685/Using-Graphs-to-Link-Data-Across-the-Product?redirectedFrom=fulltext
  12. https://www.dweezilzappa.com/posts/1963441-the-crux-of-the-biscuit-is-the-apostrophe
  13. https://groups.google.com/forum/#!topic/alt.fan.frank-zappa/8j5OKC5r_t4
  14. Figay, Nicolas, The emerging landscape for distributed knowledge, ontology, semantic web, knowledge base, graph based technologies and standards, https://www.linkedin.com/pulse/emerging-landscape-distributed-knowledge-ontology-semantic-figay/
  15. https://www2.deloitte.com/de/de/pages/operations/articles/enterprise-knowledge-graphs.html
  16. Anderson, Gary, 2016, “The Economic Impact of Technology Infrastructure for Smart Manufacturing,” Technical Report, National Institute of Standards and Technology, Gaithersburg, MD
  17. Riedel, Oliver, "Trends towards Engineering Excellence", Talk at CONWEAVER's Linked Data Day, 2019/11/11, Darmstadt, https://www.pressebox.de/pressemitteilung/conweaver-gmbh/Linked-Data-Referent-Prof-Oliver-Riedel-ueber-Trends-towards-Engineering-Excellence/boxid/975637
  18. Stark, Rainer, https://www.ipk.fraunhofer.de/en/expertise/product-development.html
  19. http://beyondplm.com/2018/04/18/cofes-2018-plm-graph-databases-round-table/
