In my previous post, I described in a bit more detail how to perform graph analysis in the case of Node Ranking. The key tool for this in Oracle Analytics is Data Flows. The Graph Analytics step in Data Flows enables users to perform four graph analytics operations. Besides Node Ranking, these are Sub Graph, Clusters and Shortest Path.
For easier understanding and visualisation we are using the following Dolphins dataset.
The Sub Graph operation finds all nodes within a specified number of hops of a given node. In other words, if the number of hops is one, Sub Graph finds all direct neighbours of the given node. If the number of hops is two, Sub Graph returns all neighbours of the given node and all neighbours of those neighbours, and so on.
This is useful, for example, in marketing, where we can find the friends of a customer who has bought a specific product. We might assume that the customer presented that product to their friends, and it is also possible that those friends would talk about it to their own friends, who are two hops away.
To find the Sub Graph of a given node, the Sub Graph operation should be selected in a data flow:
There are only three mandatory parameters. Source Vertex is the given node; in this case, this is Dolphin 58 from the source column DOLPHIN1. DOLPHIN2 is the Destination Column. Number of Hops defines how many hops are considered in the Sub Graph.
In our case, we store only these three columns in the dataset, which is then used to visualise the subgraph as a graph.
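The hop-based definition of Sub Graph can be illustrated with a small breadth-first search. This is only a sketch of the idea, not how Oracle Analytics implements the step: the function names and the toy edge list are hypothetical, and the edges are assumed to be undirected.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, destination) pairs."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency


def sub_graph(edges, source_vertex, number_of_hops):
    """Return {node: hops} for every node within number_of_hops of source_vertex."""
    adjacency = build_adjacency(edges)
    hops = {source_vertex: 0}
    queue = deque([source_vertex])
    while queue:
        node = queue.popleft()
        if hops[node] == number_of_hops:
            continue  # hop limit reached, do not expand this node further
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Hypothetical toy edge list (assumption: NOT the real Dolphins data).
edges = [(58, 7), (58, 20), (7, 14), (14, 61), (30, 31)]
one_hop = sub_graph(edges, 58, 1)
two_hops = sub_graph(edges, 58, 2)
```

With one hop, only the direct neighbours of node 58 are returned; with two hops, the neighbours of those neighbours are added as well.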
Clusters is an operation that finds clusters in a graph. Here, a cluster is defined as a set of connected nodes: parts of the graph that have no connection with other parts of the graph are considered separate clusters.
Clusters as a graph operation is relatively easy to define.
As you can see, there are only two parameters, Source Column and Destination Column. In our case, these are DOLPHIN1 and DOLPHIN2.
The operator outputs are ClusterId (a randomly generated ID) and Node_Vertex, the node that belongs to that specific cluster.
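Since a cluster here is simply a connected part of the graph, the idea can be sketched as a connected-components search. This is an illustration under assumptions, not the Oracle Analytics implementation: the function name and toy edges are hypothetical, edges are treated as undirected, and cluster IDs are assigned sequentially rather than randomly.

```python
from collections import defaultdict


def find_clusters(edges):
    """Map every node to a cluster id; nodes share an id if they are connected."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    cluster_of = {}
    next_id = 0
    for start in sorted(adjacency):
        if start in cluster_of:
            continue  # already reached from an earlier start node
        next_id += 1
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in cluster_of:
                    cluster_of[neighbour] = next_id
                    stack.append(neighbour)
    return cluster_of


# Hypothetical toy edge list with two disconnected parts.
edges = [(1, 2), (2, 3), (4, 5)]
clusters = find_clusters(edges)
# One row per node, similar in shape to the ClusterId / Node_Vertex output.
rows = sorted((cluster_id, node) for node, cluster_id in clusters.items())
```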
The results of graph clusters identification are rather visual:
The shortest path is one of the best-known graph analytics problems. The idea is to optimise travel from point A to point B, taking into account the cost of the travel, usually defined as a weight.
In our case, the weight is equal to one, so we are focused only on finding the shortest connection between two dolphins. Otherwise, the weight can be the distance between two towns, the cost of travel between two airports, and so on.
The Shortest Path operation requires three parameters:
- starting node, Source Vertex, which is in source column DOLPHIN1 and has value 61;
- end node, Destination Vertex, which is in source column DOLPHIN2 and has value 14;
- Weight Column, which in this case is always equal to 1.
Once the operation is executed, the path between the starting and end nodes is stored in a new dataset. However, a path can only be stored as a series of steps, and these steps are stored as separate rows in the dataset. Each step has a source and a destination, so the path between A and E forms a series of steps such as A->B, B->D, D->E. In this case, three rows define the path, with sequence numbers 1 through 3 for the respective steps.
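The way a path turns into rows of steps can be sketched with Dijkstra's algorithm. This is a minimal illustration, not the algorithm Oracle Analytics is confirmed to use: the function name and the toy edges are hypothetical, edges are assumed to be undirected, and weights are assumed to be non-negative.

```python
import heapq
from collections import defaultdict


def shortest_path_steps(edges, source_vertex, destination_vertex):
    """Return the shortest path as (step, source, destination) rows.

    edges is a list of (source, destination, weight) tuples.
    An empty list is returned if no path exists.
    """
    adjacency = defaultdict(list)
    for src, dst, weight in edges:
        adjacency[src].append((dst, weight))
        adjacency[dst].append((src, weight))

    best = {source_vertex: 0}   # cheapest known cost per node
    previous = {}               # predecessor on the cheapest known path
    heap = [(0, source_vertex)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination_vertex:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for neighbour, weight in adjacency[node]:
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))

    if destination_vertex not in best:
        return []
    # Walk back from the destination, then reverse to get source -> destination.
    path = [destination_vertex]
    while path[-1] != source_vertex:
        path.append(previous[path[-1]])
    path.reverse()
    return [(step, a, b) for step, (a, b) in enumerate(zip(path, path[1:]), start=1)]


# Hypothetical toy edges; the A -> E example mirrors the one in the text.
edges = [("A", "B", 1), ("B", "D", 1), ("D", "E", 1), ("A", "C", 1), ("C", "E", 5)]
steps = shortest_path_steps(edges, "A", "E")
```

Each tuple in `steps` corresponds to one row of the output dataset: a step number, the source of that step and its destination.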
The outputs of this operation are the following attributes:
- Path_Sequence has a value of "Y" if the specific step in the path is part of the shortest path.
- Source is the source node of the specific step in the path.
- Destination is the destination node of the specific step in the path.
- Step is the step number in the path.
If the graph is filtered on the value Y in the Path_Sequence attribute, only the shortest path is displayed:
In this short series on Working with Graphs in Oracle Analytics, I think I was able to present some of the graph analytics functionality built into Oracle Analytics.
It is not much, one would say, especially if we compare this feature set with the Oracle Graph analytics available with Oracle Database. This is probably true; however, it is better than nothing, and we have been told that it is going to evolve.
What is important is that all these analytics are now available to business users and analysts through the end-user tool Oracle Analytics, which is very similar to what we can observe with Machine Learning.
The bottom line is that for some business problems, we as business users are able to try to resolve them by ourselves, without waiting for IT developers and data scientists to become available to work on our problems. I think there is a lot of value in that.
Previous posts in the series: