Statistical Relational Artificial Intelligence. Luc De Raedt
Example 3.1 Education modeling In an educational domain (see, e.g., Fig. 1.2), the individuals may be students, courses, tests, or particular questions on tests. We want to condition on the observations of what tests the students took, what answers they provided, and any other pertinent information to predict what they understand.
Example 3.2 Medical diagnosis In a medical diagnosis system (see, e.g., Fig. 1.3), the individuals may be patients, treatments, drugs, tests, particular lumps on a person’s body, etc. A system can condition on a patient’s electronic health record to predict the outcome of some treatment. Having a relational model is sensible since electronic health records are not summaries, but contain a history of physical observations, treatments, tests, etc. They differ greatly in the amount of detail they contain per person.
In both of these examples the models are not about a particular patient or student, but are models that can be applied to any patient or student. The set of observations is not fixed, as for instance an electronic health record may contain an unbounded number of observations and test results about multiple lumps that may have appeared (and disappeared) on a patient.
The main property relational models exploit is that of exchangeability: those individuals about which we have the same information should be treated identically. This is the idea behind (universally quantified) variables; some statements are true for all individuals. In terms of probabilistic models, this means we can exchange the constants in any grounding, and still get the same probabilities for any proposition. This implies that before we know anything particular about any of the individuals, they all should share their probabilistic parameters. It also provides a form of symmetry that can be exploited for representations, inference, and learning. De Finetti [1974] shows how distributions on exchangeable random variables can be represented in terms of directed models with latent variables. Note that exchangeability of individuals corresponds to exchangeability of random variables only for properties (relations on single individuals), but not for relations on multiple individuals (but see [Kallenberg, 2005] for results in such situations).
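For reference, the binary case of de Finetti's representation can be written as below. This is the standard textbook statement for an infinite exchangeable sequence of binary random variables X_1, X_2, …, not a formula taken from this chapter; the symbols θ and μ are introduced here only for illustration.

```latex
P(X_1 = x_1, \dots, X_n = x_n)
  = \int_0^1 \theta^{k} (1 - \theta)^{n - k} \, d\mu(\theta),
  \qquad k = \sum_{i=1}^{n} x_i .
```

Here θ plays the role of the latent variable: conditioned on θ, the X_i are independent and identically distributed, which corresponds to a directed model in which a single latent parent points to every X_i.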
Over the years, a multitude of different languages and formalisms for probabilistic relational modeling have been devised. However, there are also some general design principles underlying this “alphabet soup,” which we will discuss first.
3.1 A GENERAL VIEW: PARAMETERIZED PROBABILISTIC MODELS
Relational probabilistic models are typically defined in terms of parameterized random variables [Poole, 2003], which are often drawn in terms of plates [Buntine, 1994, Jordan, 2010]. A parameterized random variable corresponds to a predicate or a function symbol in logic.
We use a first-order alphabet consisting of logical variables, constants, function symbols, and predicate symbols. We assume that the logical variables are typed, where the domain of the type (i.e., the set of individuals of the type) is called the population. Recall from Section 2.2 that a term is either a logical variable, a constant, or of the form f(t1, …, tk), where f is a function symbol and each ti is a term. We treat constants as function symbols with no arguments, so they need no special treatment. Each function has a range, which is a set of values. In the following we treat relations as Boolean functions (with range {true, false}).
A parameterized random variable (PRV) is of the form f(t1, …, tk) where each ti is a term and f is a function (or predicate) symbol. A random variable is a parameterized random variable that does not contain a logical variable. An atom is of the form f(t1, …, tk) = v where v is in the range of f. When the range of f is {true, false} (i.e., when f is a predicate symbol), we write f(t1, …, tk) = true as f(t1, …, tk), and f(t1, …, tk) = false as ¬f(t1, …, tk). We can build a formula from relations using the standard logical connectives. The grounding of a PRV is the set of random variables obtained by uniformly replacing each logical variable by each individual in the population corresponding to the type of the logical variable.
A lifted graphical model, also called a template-based model [Koller and Friedman, 2009], is a graphical model with parameterized random variables as nodes, and a set of factors among the parameterized random variables, called parameterized factors. A lifted model, together with an assignment of a population to each logical variable, represents its grounding: the graphical model where the set of random variables is the set of groundings of the PRVs in the model, and the factors in the ground model are the groundings of the corresponding factors of the lifted model. The details differ in the various representation languages, because what is allowed as factors varies, but the issues can be highlighted by a few examples.
Example 3.3 Consider a model of the performance of students in courses. With such a model, we could, for example, condition on the data of Fig. 1.2 to predict how students s3 and s4 will perform in course c4. Suppose we have the types student and course. The population of student is the set of all students, and the population of course is the set of all courses. The parameterized random variable grade(S, C) could denote the grade of student S in course C. For a particular student sam and course cs101, the instance grade(sam, cs101) is a random variable denoting the grade of Sam in the course cs101. The range of grade could be the set of possible grades, say {a, b, c, d, f}. Similarly, int(S) could be a parameterized random variable denoting the intelligence of student S. The PRV diff(C) could represent the difficulty of course C.
If there are n students and m courses, the grounding contains nm random variables that are instances of grade(S, C), n instances of int(S), and m instances of diff(C). Thus, there are nm + n + m random variables in the grounding.
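The counting argument above can be checked with a minimal Python sketch. The dictionaries `populations` and `prvs` and the helper functions `ground` and `ground_all` are hypothetical names introduced here for illustration; the individuals are borrowed from the grounding with three students and two courses.

```python
from itertools import product

# Hypothetical populations for each type (illustrative individuals only).
populations = {
    "student": ["sam", "chris", "kim"],
    "course": ["c1", "c2"],
}

# Each PRV is a function symbol plus the types of its logical variables.
prvs = {
    "grade": ("student", "course"),  # grade(S, C)
    "int": ("student",),             # int(S)
    "diff": ("course",),             # diff(C)
}


def ground(name, arg_types, populations):
    """Return the ground random variables of one PRV as (name, args) pairs.

    Every logical variable is replaced by every individual of its type.
    """
    domains = [populations[t] for t in arg_types]
    return [(name, args) for args in product(*domains)]


def ground_all(prvs, populations):
    """Ground every PRV of the model and collect the random variables."""
    return [rv
            for name, arg_types in prvs.items()
            for rv in ground(name, arg_types, populations)]


ground_rvs = ground_all(prvs, populations)
n = len(populations["student"])
m = len(populations["course"])

# Compare the number of ground random variables with nm + n + m.
print(len(ground_rvs), n * m + n + m)
```

With n = 3 and m = 2 both numbers are 11: six instances of grade(S, C), three of int(S), and two of diff(C).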
Figure 3.1 gives a plate representation of a directed model to predict the grades of students in courses. In this figure, S is a logical variable that denotes a student and C is a logical variable that denotes a course. Note that this figure redundantly includes the logical variables in the plates as well as arguments to the parameterized random variables. Such parameterized models represent their grounding, where there is an instance of each random variable for each assignment of an individual to a logical variable. The factors of the ground network are the groundings of the corresponding factors of the relational model. Figure 3.2 shows such a grounding where there are three students (Sam, Chris, and Kim) and two courses (c1 and c2).
Note the conditional independence assumptions in this example (derived from the independence inherent in the underlying Bayesian network): the intelligence variables of different students are independent of each other given no observations. The difficulty variables of different courses are likewise independent of each other given no observations. The grades are independent given the intelligence and the difficulty. Given no observations, a pair of grades that share a student or a course are dependent on each other, as they have a common parent, but a pair of grades about different students and courses are independent. Given observations on grades, the intelligence variables and the difficulty variables can become interdependent.
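These independence statements can be verified by brute-force enumeration on a small ground network. The sketch below is illustrative only and rests on several assumptions that are not in the text: all variables are simplified to binary values (so grade is 1 or 0 rather than a letter), int(S) and diff(C) are taken to be the parents of grade(S, C), and the probabilities in `p_int`, `p_diff`, and `p_grade` are made-up numbers.

```python
from fractions import Fraction as F
from itertools import product

students = ["sam", "chris"]
courses = ["c1", "c2"]

# Made-up parameters, shared by all individuals of a type.
p_int = F(1, 2)    # P(int(S) = 1)
p_diff = F(1, 2)   # P(diff(C) = 1)
# P(grade(S, C) = 1 | int(S), diff(C)), keyed by (int, diff).
p_grade = {(0, 0): F(1, 2), (0, 1): F(1, 10),
           (1, 0): F(9, 10), (1, 1): F(3, 5)}

# Ground random variables of the network.
variables = ([("int", s) for s in students]
             + [("diff", c) for c in courses]
             + [("grade", s, c) for s in students for c in courses])


def bern(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1 - p


def joint(world):
    """Joint probability of a full assignment: product of ground factors."""
    prob_w = F(1)
    for s in students:
        prob_w *= bern(p_int, world[("int", s)])
    for c in courses:
        prob_w *= bern(p_diff, world[("diff", c)])
    for s in students:
        for c in courses:
            parents = (world[("int", s)], world[("diff", c)])
            prob_w *= bern(p_grade[parents], world[("grade", s, c)])
    return prob_w


# All 2^8 complete assignments to the ground variables.
worlds = [dict(zip(variables, values))
          for values in product([0, 1], repeat=len(variables))]


def holds(world, assignment):
    return all(world[var] == val for var, val in assignment.items())


def prob(event, given=None):
    """Exact P(event | given) by summing the joint over matching worlds."""
    given = given or {}
    denom = sum(joint(w) for w in worlds if holds(w, given))
    numer = sum(joint(w) for w in worlds
                if holds(w, given) and holds(w, event))
    return numer / denom


def independent(a, b, given=None):
    """For binary a and b, independence holds iff P(a=1, b=1) factorizes."""
    return (prob({a: 1, b: 1}, given)
            == prob({a: 1}, given) * prob({b: 1}, given))


print(independent(("int", "sam"), ("int", "chris")))
print(independent(("grade", "sam", "c1"), ("grade", "sam", "c2")))
print(independent(("grade", "sam", "c1"), ("grade", "chris", "c2")))
print(independent(("int", "sam"), ("diff", "c1"),
                  given={("grade", "sam", "c1"): 1}))
```

Exact rational arithmetic (`Fraction`) is used so that the equality tests are not affected by floating-point rounding. The four checks correspond, in order, to two intelligence variables, two grades sharing a student, two grades sharing neither student nor course, and an intelligence and a difficulty variable after observing their common child.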
Figure 3.1: Plate representation of the grades model.