This is an older document, but it contains what appears to be some pretty foundational material. The author, Judea Pearl[1], also wrote the famous book Causality[2].
I like my tech and math with history. Knowing where the ideas came from and what previous, wrong ideas they had to beat out is valuable in understanding them. (Plus it reminds you that maybe your new ideas could be the next thing.) But the full book (this is an excerpt) is more caught up in its time than I would have liked - a lot of it seems to be arguing against critics. You probably don't really need such an exhaustive treatment.
In contrast, Norvig's AIMA book has a chapter on the subject (bayes nets) which is confident and compact. Start there.
The ideas contained in this book and Causality have been a huge influence in my decision of what to focus on learning as a data scientist. Very much recommend it even if you feel like the book might be too dense for you. It definitely was for me, but I still got a lot out of it.
I took a step back and read Think Bayes by Downey and watched some of his YouTube videos. Then Introduction to Bayesian Statistics by Bolstad is great once you're ready to deal with probabilities. Now I'm reading:
* Building Probabilistic Graphical Models with Python (Karkera)
* Mastering Probabilistic Graphical Models Using Python (Ankan)
* Probabilistic Graphical Models: Principles and Techniques (Koller & Friedman)
Having skimmed them (I've only just started reading), I'm very excited to continue delving into them.
On the contrary, I'm a bit shocked that it's only 40 pages long (or is this a fragment of it?). This is on my to-read list since it's always being referenced one way or another on the LessWrong wiki, but I was putting off reading it because I figured it would be a bit dense.
[1]: http://bayes.cs.ucla.edu/home.htm
[2]: http://bayes.cs.ucla.edu/BOOK-2K/