January 25, 2023 07:58 pm GMT

How does AI work? Part 1

A gentle introduction

[Image: Brain. Image credit: Seanbatty]

When most people hear the term Artificial Intelligence they think of the Terminator movies or the general notion of machines that can think the way a biological brain does. It's safe to say that we don't need to worry about robots enslaving humankind for the foreseeable future, but chances are you're still curious about what exactly AI is used for in real life and how it works. Some of the foundations of AI include:

  • Search problems: used in games, among other areas.
  • Constraint satisfaction: examples include scheduling a university's offerings across teachers, courses, rooms, and equipment, or designing a factory floor layout.
  • Logic and reasoning: planning the sequence of actions that will achieve a goal, such as a plan to take cargo shipments to their destinations efficiently.
  • Inference: using probability to answer questions given the available evidence.

Building on these foundations, we can create systems and applications for computer vision, natural language processing, voice user interfaces, and self-driving cars. Many of these domains have something in common: pattern recognition through time. And we need to do it in noisy environments, meaning we don't have access to the true information we're after (such as the word someone meant to say); instead, we can only make inferences from the data available to our sensors (such as an audio signal).

For this type of problem we want to identify some basic units that combine in sequences to form larger units. These can be sounds that form words, which in turn form sentences. Or image sequences that combine to depict sign language words and sentences. It is this last application that we'll use to dive deeper into AI. We'll see how we can go about building an American Sign Language recognizer. But first, we'll go through a simpler example.

A very widely used technique for identifying signals is something called a Hidden Markov Model, or HMM.

There are different ways to represent HMMs, so let's start with a simple one. Let's say you're spending several days inside a lab with no windows, working really hard, and you'd like to know whether it's raining. Not being able to look outside, the only evidence you have access to is whether the advisor who comes in every day has an umbrella with her. Let's designate each day as a state, which could be rainy or not. Similarly, we'll say the umbrella is the evidence: the presence or absence of an umbrella on a given day.

We'll use arrows to show when a node in our diagram influences another. For example, whether it rains today tells us something about the likelihood of rain tomorrow, and rain influences the probability that the advisor will bring an umbrella that day. So for days 1, 2, and 3 we have:

[Figure 1. HMM for rainy days]

State zero at the far left side will be useful to us for bookkeeping purposes later, so don't worry about it for now. Remember when we said our observations were noisy? The fact that it rained today doesn't guarantee that it will rain tomorrow, and the fact that it rained doesn't guarantee that the advisor will bring an umbrella either. She can forget it at home, or she could bring it on a day when it turns out it doesn't rain after all. That's where probabilities come in.

Let's now look at the probabilities. We're interested in the probability of:

  • Rain on a given day or time t,
  • Rain at the next time t+1, and
  • Rain given that it rained the previous day.

In the table below, the second row tells us that the probability that it doesn't rain (-r) tomorrow given that it rained (+r) today is 0.3, or 30%.

R_t   R_{t+1}   P(R_{t+1} | R_t)
+r    +r        0.7
+r    -r        0.3
-r    +r        0.3
-r    -r        0.7
Figure 2. Probability distribution for Rain given the previous day's conditions

Similarly, we can have the following probability distribution for the advisor bringing an umbrella given the weather conditions on day t. For example, there's a 90% probability of seeing the advisor bring an umbrella when it rained and a 10% probability of her not bringing an umbrella on a rainy day.

R_t   U_t   P(U_t | R_t)
+r    +u    0.9
+r    -u    0.1
-r    +u    0.2
-r    -u    0.8
Figure 3. Probability distribution for Umbrella given Rain
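
These two tables fully specify our model numerically. As a minimal sketch of how they might be written down in code (the dictionary names and key layout here are my own choice, not from the article), they could look like this:

```python
# Transition model, Figure 2: P(R_{t+1} | R_t), keyed by (today, tomorrow)
transition = {
    ("+r", "+r"): 0.7,  # rain today    -> rain tomorrow
    ("+r", "-r"): 0.3,  # rain today    -> no rain tomorrow
    ("-r", "+r"): 0.3,  # no rain today -> rain tomorrow
    ("-r", "-r"): 0.7,  # no rain today -> no rain tomorrow
}

# Emission (sensor) model, Figure 3: P(U_t | R_t), keyed by (rain, umbrella)
emission = {
    ("+r", "+u"): 0.9,  # rainy day -> umbrella
    ("+r", "-u"): 0.1,  # rainy day -> no umbrella
    ("-r", "+u"): 0.2,  # dry day   -> umbrella
    ("-r", "-u"): 0.8,  # dry day   -> no umbrella
}
```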

Notice how we don't really know whether it rains or not, but we can make intelligent inferences based on the available evidence, namely the presence or absence of an umbrella. Now, from the arrows in Figure 1 we can see that variables are not directly affected by all the other variables, only by the ones at the other end of an incoming arrow. This is important because we'll think of the probability of an event as something that happens given an event on which it depends. For example, the probability that there's an umbrella on day 2 given that it rained on day 2, or the probability that it rained on day 2 given that it did not rain on day 1.

As each day goes by and we see whether there is an umbrella that day, we alternate between two steps: incorporating new evidence into our knowledge and accounting for the passage of time.

Let's say we observed an umbrella on day 1 and on day 2. What's the probability that it rained on day 2? Before you keep reading, try to think about what information you would need to make that calculation.

Let's settle on a nomenclature to use. We'll call the rain nodes in Figure 1 our state variables X and the umbrella nodes our evidence variables e. They will all be indexed by day or time frame t. Also, we'll refer to our belief about variable X at time t as B(X_t) after seeing that day's evidence. Our belief before seeing the day's evidence will be B'(X_t). The day following t will of course be t+1. To wrap things up, let's clarify what we mean by a belief: the probability of an event given the available evidence.

This way, our belief about variable X at time t+1 is the probability of X at time t+1 given evidence 1 through t. This is expressed like this:

B'(X_{t+1}) = P(X_{t+1} | e_{1:t})
Belief before seeing the evidence

Feel free to zoom in or out on your browser (command +, command -) to see the formulas comfortably.

We'll be using Bayes' Theorem for calculating the probability of an event given that another event happened:

1) P(A|B) = P(B|A) P(A) / P(B)

Where did that come from? It can be derived from the conditional probability, which tells us in how many cases both events A and B happen out of the cases where B happens:

2) P(A|B) = P(A ∩ B) / P(B)

Similarly, P(B|A) = P(A ∩ B) / P(A)

which means

3) P(A ∩ B) = P(B|A) P(A)

and substituting in the expression for conditional probability yields Bayes' Theorem:

P(A|B) = P(B|A) P(A) / P(B)
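
To see Bayes' Theorem in action with our umbrella example, suppose a single day with a 50% prior chance of rain (the same assumption we will make later for day 1) on which we observe an umbrella. Using the numbers from Figure 3:

P(+r | +u) = P(+u | +r) P(+r) / P(+u) = (0.9 × 0.5) / (0.9 × 0.5 + 0.2 × 0.5) = 0.45 / 0.55 ≈ 0.818

so a single umbrella sighting already raises our belief in rain from 50% to about 82%.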

All the operations going forward are based on Bayes' Theorem: we derive the probability of an event given another and go down the path that leads us to the variables we're interested in. Figuring out this path takes some intuition, but we can also just try all the ways to play with it until we get where we want to go.

After the evidence for time frame t+1 comes in, we want the probability of X including the new evidence, that is, the evidence at t+1 given all the previous evidence.

So from Bayes' Theorem (2) now we have:

P(X_{t+1} | (e_{t+1} | e_{1:t})) = P(X_{t+1}, (e_{t+1} | e_{1:t})) / P(e_{t+1} | e_{1:t})
Probability of the variable after the evidence comes in

At this point we can see that nothing in the denominator depends on the X variable, so we can get rid of it with the understanding that this will no longer be an equality; instead, the term on the left will be proportional to the term on the right, and that's what the new ∝ symbol means:

P(X_{t+1} | (e_{t+1} | e_{1:t})) ∝ P(X_{t+1}, (e_{t+1} | e_{1:t}))
P(X_{t+1} | (e_{t+1} | e_{1:t})) ∝ P(X_{t+1}, e_{t+1} | e_{1:t})

And by Bayes' Theorem (3):

P(X_{t+1} | (e_{t+1} | e_{1:t})) ∝ P(((e_{t+1} | X_{t+1}), X_{t+1}) | e_{1:t})
P(X_{t+1} | (e_{t+1} | e_{1:t})) ∝ P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})

Now take a look at Figure 1 again. All the previous evidence is independent of the new evidence given the state variable. That is, the old evidence doesn't affect the new one except indirectly through the X variable at t+1. In other words:

e_{1:t} ⫫ e_{t+1} | X_{t+1}
Conditional independence. Note that the independence symbol ⫫ has higher precedence than the conditional symbol |

That means we can eliminate the old evidence from the expression after the conditional bar. That leaves us with our final formula for calculating the probability that it's raining today given all the evidence now available:

P(X_{t+1} | e_{1:t+1}) ∝ P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
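
In code, this evidence update might look like the sketch below: multiply the prior belief by P(e_{t+1} | X_{t+1}) and renormalize, which restores the denominator we dropped when we switched to ∝. The function and argument names are just illustrative; `emission` is the table from Figure 3 as written in the earlier sketch.

```python
def observe(prior, evidence, emission):
    """Incorporate new evidence: B(X) ∝ P(e | X) * B'(X), then renormalize.

    prior    -- dict mapping each state ("+r", "-r") to P(X_{t+1} | e_{1:t})
    evidence -- the observed symbol, e.g. "+u"
    emission -- dict mapping (state, observation) to P(e | X), as in Figure 3
    """
    unnormalized = {state: emission[(state, evidence)] * p
                    for state, p in prior.items()}
    total = sum(unnormalized.values())
    return {state: p / total for state, p in unnormalized.items()}
```

For example, observe({"+r": 0.5, "-r": 0.5}, "+u", emission) gives roughly {"+r": 0.818, "-r": 0.182}, matching the single-day Bayes' Theorem example above.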

Finally, we're ready to calculate the probability distribution over the Rain variable at day 2 when all we have is:

  • The observation that the advisor brought an umbrella on days 1 and 2.
  • The probability distribution table for Rain given Rain the day before.
  • The probability distribution table for Umbrella given Rain.

Since at t=0 we have no evidence yet, let's assume that there's a 50% chance of rain on the first day. From then on, we can use our formula to derive the probabilities as we gather evidence every day.

[Image: New evidence update]

[Image: Passage of time update]

So, rounding to three decimal places, we find that if we saw the advisor bring an umbrella on days 1 and 2, the probability that it's raining on day 2 is 0.883, or 88.3%. That implies a probability of 11.7% that it's not raining on day 2.
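
Putting the pieces together, here is a minimal end-to-end sketch of that calculation (again, all names are illustrative). The passage-of-time step uses the standard HMM prediction B'(X_{t+1}) = Σ_x P(X_{t+1} | x) B(x), i.e., we sum over the previous day's possibilities weighted by the transition table in Figure 2; the evidence step repeats the observe sketch from above for completeness.

```python
# Transition and emission tables from Figures 2 and 3 (see the earlier sketch).
TRANSITION = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3,
              ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}
EMISSION = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1,
            ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}
STATES = ["+r", "-r"]


def observe(prior, evidence, emission):
    """Evidence update: B(X) proportional to P(e | X) * B'(X), renormalized."""
    unnormalized = {s: emission[(s, evidence)] * prior[s] for s in STATES}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


def elapse_time(belief, transition):
    """Passage of time: B'(X_{t+1}) = sum over x of P(X_{t+1} | x) * B(x)."""
    return {s2: sum(transition[(s1, s2)] * belief[s1] for s1 in STATES)
            for s2 in STATES}


belief = {"+r": 0.5, "-r": 0.5}                    # assumed 50% prior for day 1
for day, umbrella in enumerate(["+u", "+u"], start=1):
    if day > 1:
        belief = elapse_time(belief, TRANSITION)   # move from one day to the next
    belief = observe(belief, umbrella, EMISSION)   # fold in that day's evidence

print(round(belief["+r"], 3))                      # prints 0.883
```

Running this sketch reproduces the 0.883 figure for rain on day 2 given an umbrella on both days.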

[Image: Prediction]

That's it! This wraps up our introduction to applications of Artificial Intelligence. Next, we'll use a slightly more complex model and probabilistic inference to recognize American Sign Language from video frame data.


Original Link: https://dev.to/santisbon/how-does-ai-work-part-1-232i
