Source: healthcareitnews.com
I am a skeptic of machine learning. There, I’ve said it. I say this not because I don’t think that machine learning is a poor technology – it’s actually quite powerful for what it does – but because machine-learning by itself is only half a solution.
To explain this (and the relationship that graphs have to machine learning and AI), it’s worth spending a bit of time exploring what exactly machine learning does, how it works. Machine learning isn’t actually one particular algorithm or piece of software, but rather the use of statistical algorithms to analyze large amounts of data and from that construct a model that can, at a minimum, classify the data consistently. If it’s done right, the reasoning goes, it should then be possible to use that model to classify new information so that it’s consistent with what’s already known.
Many such systems make use of clustering algorithms – they take a look at data as vectors that can be described in an n-dimensional space. That is to say, there are n different facets that describe a particular thing, such as a thing’s color, shape (morphology), size, texture, and so forth. Some of these attributes can be identified by a single binary (does the thing have a tail or not), but in most cases the attributes usually range along a spectrum, such as “does the thing have an an exclusively protein-based diet (an obligate carnivore) or does its does consist of a certain percentage of grains or other plants?”. In either case, this means that it is possible to use the attribute as a means to create a number between zero and one (what mathematicians would refer to as a normal orthogonal vector).
Orthogonality is an interesting concept. In mathematics, two vectors are considered orthogonal if there exists some coordinate system in which you cannot express any information about one vector using the other. For instance, if two vectors are at right angles to one another, then there is one coordinate system where one vector aligns with the x-axis and the other with the y-axis. I cannot express any part of the length of a vector along the y axis by multiplying the length of the vector on the x-axis. In this case they are independent of one another.
This independence is important. Mathematically, there is no correlation between the two vectors – they represent different things, and changing one vector tells me nothing about any other vector. When vectors are not orthogonal, one bleeds a bit (or more than a bit) into another. One two vectors are parallel to one another, they are fully correlated – one vector can be expressed as a multiple of the other. A vector in two dimensions can always be expressed as the “sum” of two orthogonal vectors, a vector in three dimensions, can always be expressed as the “sum” of three orthogonal vectors and so forth.
If you can express a thing as a vector consisting of weighted values, this creates a space where related things will generally be near one another in an n-dimensional space. Cats, dogs, and bears are all carnivores, so in a model describing animals, they will tend to be clustered in a different group than rabbits, voles, and squirrels based upon their dietary habits. At the same time cats,, dogs and bears will each tend to cluster in different groups based upon size as even a small adult bear will always be larger than the largest cat and almost all dogs. In a two dimensional space, it becomes possible to carve out a region where you have large carnivores, medium-sized carnivores, small carnivores, large herbivores and so forth.
Machine learning (at its simplest) would recognize that when you have a large carnivore, given a minimal dataset, you’re likely to classify that as a bear, because based upon the two vectors size and diet every time you are at the upper end of the vectors for those two values, everything you’ve already seen (your training set) is a bear, while no vectors outside of this range are classified in this way.
A predictive model with only two independent vectors is going to be pretty useless as a classifier for more than a small set of items. A fox and a dog will be indistinguishable in this model, and for that matter, a small dog such as a Shitsu vs. a Maine Coon cat will confuse the heck out of such a classifier. On the flip side, the more variables that you add, the harder it is to ensure orthogonality, and the more difficult it then becomes determine what exactly is the determining factor(s) for classification, and consequently increasing the chances of misclassification. A panda bear is, anatomically and genetically, a bear. Yet because of a chance genetic mutation it is only able to reasonably digest bamboo, making it a herbivore.
You’d need to go to a very fine-grained classifier, one capable of identifying genomic structures, to identify a panda as a bear. The problem here is not in the mathematics but in the categorization itself. Categorizations are ultimately linguistic structures. Normalization functions are themselves arbitrary, and how you normalize will ultimately impact the kind of clustering that forms. When the number of dimensions in the model (even assuming that they are independent, which gets harder to determine with more variables) gets too large, then the size of hulls for clustering becomes too small, and interpreting what those hulls actually significant become too complex.
This is one reason that I’m always dubious when I hear about machine learning models that have thousands or even millions of dimensions. As with attempting to do linear regressions on curves, there are typically only a handful of parameters that typically drive most of the significant curve fitting, which is ultimately just looking for adequate clustering to identify meaningful patterns – and typically once these patterns are identified, then they are encoded and indexed.
Facial recognition, for instance, is considered a branch of machine learning, but for the most part it works because human faces exist within a skeletal structure that limits the variations of light and dark patterns of the face. This makes it easy to identify the ratios involved between eyes, nose, and mouth, chin and cheekbones, hairlines and other clues, and from that reduce this information to a graph in which the edges reflect relative distances between those parts. This can, in turn, be hashed as a unique number, in essence encoding a face as a graph in a database. Note this pattern. Because the geometry is consistent, rotating a set of vectors to present a consistent pattern is relatively simple (especially for modern GPUs).
Facial recognition then works primarily due to the ability to hash (and consequently compare) graphs in databases. This is the same way that most biometric scans work, taking a large enough sample of datapoints from unique images to encode ratios, then using the corresponding key to retrieve previously encoded graphs. Significantly, there’s usually very little actual classification going on here, save perhaps in using courser meshes to reduce the overall dataset being queried. Indeed, the real speed ultimately is a function of indexing.
This is where the world of machine learning collides with that of graphs. I’m going to make an assertion here, one that might get me into trouble with some readers. Right now there’s a lot of argument about the benefits and drawbacks of property graphs vs. knowledge graphs. I contend that this argument is moot – it’s a discussion about optimization strategies, and the sooner that we get past that argument, the sooner that graphs will make their way into the mainstream.
Ultimately, we need to recognize that the principal value of a graph is to index information so that it does not need to be recalculated. One way to do this is to use machine learning to classify, and semantics to bind that classification to the corresponding resource (as well as to the classifier as an additional resource). If I have a phrase that describes a drink as being nutty or fruity, then these should be identified as classifications that apply to drinks (specifically to coffees, teas or wines). If I come across flavors such as hazelnut, cashew or almond, then these should be correlated with nuttiness, and again stored in a semantic graph.
The reason for this is simple – machine learning without memory is pointless and expensive. Machine learning is fast facing a crisis in that it requires a lot of cycles to train, classify and report. Tie machine learning into a knowledge graph, and you don’t have to relearn all the time, and you can also reduce the overall computational costs dramatically. Furthermore, you can make use of inferencing, which are rules that can make use of generalization and faceting in ways that are difficult to pull off in a relational data system. Something is bear-like if it is large, has thick fur, does not have opposable thumbs, has a muzzle, is capable of extended bipedal movement and is omnivorous.
What’s more, the heuristic itself is a graph, and as such is a resource that can be referenced. This is something that most people fail to understand about both SPARQL and SHACL. They are each essentially syntactic sugar on top of graph templates. They can be analyzed, encoded and referenced. When a new resource is added into a graph, the ingestion process can and should run against such templates to see if they match, then insert or delete corresponding additional metadata as the data is folded in.
Additionally, one of those pieces of metadata may very well end up being an identifier for the heuristic itself, creating what’s often termed a reverse query. Reverse queries are significant because they make it possible to determine which family of classifiers was used to make decisions about how an entity is classified, and from that ascertain the reasons why a given entity was classified a certain way in the first place.
This gets back to one of the biggest challenges seen in both AI and machine learning – understanding why a given resource was classified. When you have potentially thousands of facets that may have potentially been responsible for a given classification, the ability to see causal chains can go a long way towards making such a classification system repeatable and determining whether the reason for a given classification was legitimate or an artifact of the data collection process. This is not something that AI by itself is very good at, because it’s a contextual problem. In effect, semantic graphs (and graphs in general) provide a way of making recommendations self-documenting, and hence making it easier to trust the results of AI algorithms.
One of the next major innovations that I see in graph technology is actually a mathematical change. Most graphs that exist right now can be thought of as collections of fixed vectors, entities connected by properties with fixed values. However, it is possible (especially when using property graphs) to create properties that are essentially parameterized over time (or other variables) or that may be passed as functional results from inbound edges. This is, in fact, an alternative approach to describing neural networks (both physical and artificial), and it has the effect of being able to make inferences based upon changing conditions over time.
This approach can be seen as one form of modeling everything from the likelihood of events happening given other events (Bayesian trees) or modeling complex cost-benefit relationships. This can be facilitated even today with some work, but the real value will come with standardization, as such graphs (especially when they are closed network circuits) can in fact act as trainable neuron circuits.
It is also likely that graphs will play a central role in Smart Contracts, “documents” that not only specify partners and conditions but also can update themselves transactional, can trigger events and can spawn other contracts and actions. These do not specifically fall within the mandate of “artificial intelligence” per se, but the impact that smart contracts play in business and society, in general, will be transformative at the very least.
It’s unlikely that this is the last chapter on graphs, either (though it is the last in the series about the State of the Graph). Graphs, ultimately, are about connections and context. How do things relate to one another? How are they connected? What do people know, and how do they know them. They underlie contracts and news, research and entertainment, history and how the future is shaped. Graphs promise a means of generating knowledge, creating new models, and even learning. They remind us that, even as forces try to push us apart, we are all ultimately only a few hops from one another in many, many ways.