Revise Conditional Probability and Bayes Theorem

Conditional probability of events A and B P(A\mid B) = \frac{P(A\cap B)}{P(B)}

Bayes Theorem

\begin{aligned}
P(A \mid B) &= \frac{P(A\cap B)}{P(B)}, \text{ if } P(B) \neq 0, \\
P(B \mid A) &= \frac{P(B\cap A)}{P(A)}, \text{ if } P(A) \neq 0, \\
\Rightarrow P(A\cap B) &=  P(A\mid B)\times P(B)=P(B\mid A)\times P(A), \\
\Rightarrow P(A \mid B) &=  \frac{P(B \mid A)  \times P(A)} {P(B)}, \text{ if } P(B) \neq 0.
\end{aligned}

A couple has two children, the older of which is a boy. What is the probability that they have two boys?

\begin{aligned}
P(B) &= Older one is boy \\
P(A) &= Two boys \\
 \\
 \\
P(A|B) &= \frac{P(B/A) P(A)}{P(B)} \\
&=\frac{P(older~one~is~boy~given~two~boys)  * P(having~two~boys)}{P(older~one~is~boy)} \\
Given Fact, \\
P(older~one~is~boy~given~two~boys) &= 1 \\
P(A|B) &= \frac{1 * 1/2*1/2}{1/2} \\
&= 1/2

\end{aligned}

A couple has two children, one of which is a boy. What is the probability that they have two boys?

\begin{aligned}
A &= having two boys\\
B &= one child is a boy\\
\\
\\
P(one~child~boy) &= 1-P(both~children~not~boys)\\
&= 1- 1/4 \\
&= 3/4\\

P(having~two~boys~given~one~child~boy) &= \frac{P(one~child~is~boy~gicen~both~children~boys) * P(both~children~boys)}{P(one~child~boy)} \\
&=\frac{1 * 1/4}{3/4} \\
&= 1/3\\
\end{aligned}

A family has two children. Given that one of the children is a boy, and that he was born on a Tuesday, what is the probability that both children are boys?

\begin{aligned}
P(BB) &= Probablity~of~having~both~boys~of~two~children \\
P(B) &= Probablity~of~having~one~boy~of~two~children\\
P(BT) &= probablity~of~having~a~boy~born~on~Tuesday \\
 \\
P(BB|BT) &= \frac{P(BT|BB) *P(BB)}{ P(BT)} \\
P(BT|BB) &=P(having~boy~born~on~Tuesday~given~both~children~are~boys) \\
P(BT|BB) &=P(Either~of~two~can~be~boy~~born~on~Tuesday~given~both~boys) \\
 \\
P(BT|BB) &= 1- P(boys~born~on~other~days)\\  
&=1-6/7*6/7 \\
&= 1-36/49 = 13/49\\
 \\
P(BB) &= 1/2*1/2 = 1/4 \\
 \\ 
P(BT) &= P(one~of~the~child~is~a~boy~born~on~Tuesday) \\
&=1-P((both~boys~born~on~non-Tuesdays) + both~girl) \\

&= 1- 13/14 * 13/14 \\
&= 1- 169 /196 = 27/196\\ 
\\

P(BB|BT) &= P(BT|BB) *P(BB)/ P(BT) \\
&= 13/49 * 1/4  /  27/196 \\
&=13/49 * 1/4 * 196 /27 \\
&= 13/27
\end{aligned}

Balls numbered 1 through 20 are placed in a bag. Three balls are drawn out of the bag without replacement. What is the probability that all the balls have odd numbers on them

Three balls drawn with out replacement

Total Probability = P(1st ball) P(2nd ball) P(3rd ball)

= 10/20 9/19 8/18

Revise Permutations and Combinations

In how many possible ways can you write 1800 as a product of 3 positive integers a, b, c ?

  • Prime Factorization of 1800 - 2^3 * 3^2 * 5^2

  • Distribute power of prime factors among digits a, b,c as asked. For example

    If we distribute 2^4 as (2,2,0) and, lets say a,b,c with powers of other prime factors 3 and 5 are (1, 9,25) ,then, with powers of 2 distributed as (2,2,0) (total 4), becomes

    (2^2 *1, 2^2 * 9, 2^0 * 25) = (4, 36,25 )= 4*36*25 = 1800

    Take another example

    Distribute as (0,3,1)

    (2^0 * 1, 2^3 * 9, 2^1 * 25) = (1, 72,25) = 1800

  2^4 can be distibuted like the followings among 3 numbers (a,b,c). Smilarly powers of other prime factors. Together the product is 1800
  4,0,0
  0,4,0
  0,0,4
  3,1,0
  3,0,1
  1,3,0
  1,0,3
  ......

  2,2,0
  2,0,2
  0,0,2
  2,1,1
  1,2,1
  1,1,2
  ...etc. etc

So problem becomes distributing power r (4 here for prime factor 2) among n (3 here) numbers.

  • Distribute r items among n people
D_(r,n)=\frac{(r+ (n-1))!}{r! (n-1)!}

Find this sequence of combinations D_(r,n) in Pascals triangle

Answer is D_{(3,3)} * D_{(2,3)} * D_{(2,3)} = 10 * 6 * 6

Ans: 360


How many ways letters in word ARMOUR can be arranged

n = 6

repeating letters = R -2 times, Combinations of them are indistinguishable so will have to reduce from total permutations

So net permutations = \frac{n!}{r!} = 720 / 2 = 360


Out of 7 consonants and 4 vowels, how many words of 3 consonants and 2 vowels can be formed ?

Number of ways 3 consonants and 2 vowels \ can be selected =

here combination is taken because order of letters not important. However order is important when you form words with these selected letters

=\frac{7!}{(7-3)! 3!} * \frac{4!}{(4-2)! 2!}  \\
=35 * 6\\
= 210 \\

Permutations of words with 3 consonants and 2 vowels

=5! = 120

So total combinations = 120 * 210 = \underline{25200}


How many 3-digit numbers can be formed from the digits 2, 3, 5, 6, 7 and 9, which are divisible by 5 and none of the digits is repeated ?

Divisible by 5 means last digit is 5.

For remaining 2 digits, total permutations without repeat =

= {^5\!P_2}

=\underline20


Evaluate permutation equation 59P3 ?

= 59 X 58 X 57

How many words can be formed from the letters of the word 'AFTER', so that the vowels never come together ?

For 5 letters, total permutations 5!

Consider A and E as one letter , ie. total 4 letters and then permutations would be 4!

AE can have two arrangements, AE and EA= 2!

So total permutations with AE and EA considered one letter = 2 * 4!

So total permutations where A and E doesn't come together

= 5! - 2*4! = 72

A box contains 4 red, 3 white and 2 blue balls. Three balls are drawn at random. Find out the number of ways of selecting the balls of different colors ?

Since there are three colors and three draws, different color should come in each draw, so total number ways three different colors can be draws in three draws = 4 3 2 = 24

In how many different ways the letters of the word "MATHEMATICS" can be arranged so that the vowels always come together ?

Total 11 letters

Vowels - AEAI. A repeats

Consonants M and T repeats

Consider vowels AEAI as one letters =V, since it is asked to find arrangements where they come together. However A repeats so, combination needs to divide by 2!

Total number of letters with AEAI together as one letter = 8. Total arrangements = 8!. However M and T repeats . So arrangements become 8!/(2!*2!)

Now with AEAI, they can have their own arrangements, still being together. So those arrangements are 4!/2! (since A repeats)

total permutations= 8! x 4!/8=7! * 4! = 120960

A box contains 2 white balls, 3 black balls and 4 red balls. In how many ways can 3 balls be drawn from the box, if at least one black ball is to be included in the draw ?

Total Combinations of 9 balls into 3= {}^9C_3

Total combination where black ball doesn't show up = {}^{(number of other balls)}C_3 = []^6C_3

Number of combinations where at least one black ball shows up = {}^9C_3 - {}^6C_3

= 84-20 = 64

How many groups of 6 persons can be formed from 8 men and 7 women ?

{}^{15}C_3

8 men entered a lounge simultaneously. If each person shakes hand with the other, then find the total no. of handshakes?

shake hand is group of 2

No order, A shakes B same as B shakes A

So {}^8C_2

In how many different ways, can the letters of the word 'INHALE' be arranged ?

INHALE has 6 letters, none repeating.

Total arrangements = 6!

In how many different ways, can the letters of the word 'BANKING' be arranged ?

BANKING has 7 letters, N repeating

Total Arrangements =

 \frac{7!}{2!}

In a meeting between two countries, each country has 12 delegates. All the delegates of one country shake hands with all delegates of the other country. Find the number of handshakes possible ?

all x all = 12 x 12

How many can straight lines be drawn from 15 non-collinear points ?

Combination of 2 points. And a line cant be drawn from a point to itself

={}^{15}C_2 - 15

=105

There are 25 students in a class. Find the number of ways in which a committee of 3 students is to be formed?

Combination as order is not important

{}^{25}C_3

In how many ways can 8 Indians and, 4 American and 4 Englishmen can be seated in a row so that all person of the same nationality sit together?

8I, 4A, 4E

Consider each group as whole and find their seating

3 groups, order important, 3P3 = 3!

In each group, you can have arrangements, order

Total =3! 8! 4! * 4!

How many different words can be formed using all the letters of the word ALLAHABAD?
(a) When vowels occupy the even positions.
(b) Both L do not occur together.

9 letters,

Repeats A -4, L - 2,

Vowels = A only, So only one vowel combination

Remaining 5 letters, and arrangements = 5!

Consider both L's together as single letter, then arrangements = 4!

So arrangements where both Ls don't occur together = 5!-4!

120 - 24= 96

Animating Logit function plot in Matplotlib


Animation package in Matplotlib makes it very easy to animate plots. Here I will walk you through animating Logit fuction plot

Steps to make an animated chart is straight forward

  • Prepare data, for example, list X and list Y
  • Decide FPS(Frames Per Second) and Bitrate of animation and initialize an mp4 video writer using ffmpeg. This writer encodes the matplotlib figure to the video file at the set FPS rate.
  • Write a function that takes the frame number as the input. Matplotlib animation calls this function. Within the function draw the plot up to what is necessary for the given frame number. For example if there are 100 data points along X axis, and the total frames chosen are 50, then for frame f, draw for X from 0 to f*2. If the FPS chose is 25 and total number of frame chose in 50, The animation would last for 2 seconds.
  • Start and record animation by calling save function

Import libraries

[code lang=python]
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.animation as animation
[/code]

Calculate X values and Y values. In Logit, probability P of dependant variable is calculated as P=\frac{1}{1+e^{-(a+bX)}}

Take a=0 and b=1.0
Range of X , -10.0 to +10.0 at 0.1 increments
Calculate Y( or P)

[code lang=python]
Xmin = -10
Xmax = 10
X=np.arange(Xmin, Xmax, 0.02)
a=0
b=1
Y= np.fromiter(( (1/(1+np.exp(-xi))) for xi in X), X.dtype)
[/code]

Initialize ffmpeg video writer with FPS and Bitrate

[code lang=python]
Writer = animation.writers['ffmpeg']
writer = Writer(fps=20, metadata=dict(artist='Me'), bitrate=1800)
[/code]

Initialize figure and plot

[code lang=python]
%matplotlib notebook
fig = plt.figure(figsize=(10,6))
plt.xlim(np.min(X), np.max(X))
plt.ylim(np.min(Y), np.max(Y))
plt.xlabel('X',fontsize=16)
plt.ylabel('Y',fontsize=16)
plt.title('Logit Func',fontsize=20)
[/code]

Define animate function

[code lang=python]
def animate(i):
p = sns.lineplot(x=X[:i*5], y=Y[:i*5], color="r")
p.tick_params(labelsize=16)
plt.setp(p.lines,linewidth=5)
[/code]

Launch animation and write video

[code lang=python]
ani = matplotlib.animation.FuncAnimation(fig, animate, frames=200, repeat=True)
ani.save('Logit-basic.mp4', writer=writer)
[/code]

Understanding Logistic Regression

Logistic regression is one of the first classification techniques that one comes across in machine learning. From its name, it even confuses the beginners as a regression technique though it is a classification technique. However the term 'regression' in its names does mean it is a regression technique. So is it Linear Regression behind the play? What is actually Logistic Regression? I came across few blogs on this topic as mentioned in the References. Here, I will share my notes and a simplified explanation to Logistic Regression.

Logistic regression is used for binary classification, however what works behind is a linear regression. Continuous range of linear regression is mapped to a near binary range of logistic regression, more precisely Logit function. Lets try to understand this

Odds ratio,

\frac{\frac{number of success}{total trials}}{\frac{number of failures}{total trials}}

Range of O:

​ = 0 when no occurrence of success

​ = \infty when all are success

Range of Success outcome:

​ = P = 0 to 1

Consider a linear regression

latex Y=a+b_iX_i -\infty \leq Y \leq +\infty  ~and -\infty \leq X \leq +\infty​

Here the outcome variable Y and feature variables Xs range from -\infty to +\infty. However in classification, outcome label Y cant't range from latex -\infty to +\infty. So we need transform the LHS to match the range in linear regression. Consider these options:

  • P=a+b_iX_i 0 \leq P \leq 1 ~and -\infty \leq X \leq +\infty. Here P is the occurrence of outcome variable which is either 0 or 1 or similar in binary classification
  • O= \frac{P}{1-P}=a+b_iX_i 0 \leq O \leq +\infty ~and -\infty \leq X \leq +\infty. Here outcome label O is Odds ratio P(yes)/P(no)
  • ln(O)= ln(\frac{P}{1-P})=a+b_iX_i -\infty \leq O \leq +\infty ~and -\infty \leq X \leq +\infty. Here outcome label is natural log of O

In the third case above, range of LHS matches the range of linear regression's RHS. This LHS is the logit function, ln(O). So we can use linear regression on RHS to estimate ln(O) which is what logistic regression is doing. To get the classification outcome label P, rearrange equation for ln(O)

  • ln(O)= ln(\frac{P}{1-P})=a+b_iX_i

  • \frac{P}{1-P} = e^{(a+b_iX_i)}

  • P=\frac{1}{1+e^{-(a+b_iX_i)}}

Plotting P vs. X with a=0, b=1


References

https://towardsdatascience.com/logit-of-logistic-regression-understanding-the-fundamentals-f384152a33d1

Treasure Hunt – Matplotlib Animation

Player starts from a marked start position(denoted 2 in matrix) and traverses possible open doors(0s) and try to reach the treasure position(4). Player will have to retreat if the door is closed(1) in the path and seek another path through an open door(0).

Positions, start(2), treasure(4), open(0) doors and closed(1) doors are chosen at random in the NxN labyrinth matrix. Choose a value of N and ratio of open to closed doors to start the simulation

import numpy as np
from matplotlib import pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
#Initial values for simulation

N=15 #size of the labyrinth (N x N matrix)
ratio= 0.7 #Ration of open doors vs closed doors

Build Labyrinth

P=2 #value denoting start position in matrix
D=4 #value denoting treasure position in matrix

#create labyrith matrix
#start with all open doors (value 0)
ar =np.zeros((1,N*N -2), dtype=int)
# add closed positions(value 1)
ar[0,int((N*N)  * ratio):] = 1
# Add start and treasure positions
ar=np.append(ar, [P,D])

#Shuffle all positions
np.random.shuffle(ar)

#Make as N x N matrix
ar = ar.reshape(N,N)
print(f"Labyrinth:\n\n {ar}")
Labyrinth:

 [[1 0 0 1 0 1 0 0 0 1 0 0 0 0 1]
 [1 1 0 0 1 1 1 0 0 0 1 0 0 0 0]
 [0 0 1 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 1 1 1 1 1 0 0 0 0 1 0]
 [1 0 1 0 1 0 0 1 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0 1 1 0 1 0]
 [0 0 1 0 0 0 1 0 0 0 0 1 0 0 0]
 [1 2 1 1 0 1 0 0 0 0 0 1 0 1 0]
 [0 0 1 0 1 0 0 0 1 0 0 0 0 0 4]
 [0 0 0 1 0 0 0 0 0 1 0 0 0 0 1]
 [0 0 1 0 0 0 0 0 1 1 0 0 0 0 0]
 [0 0 1 0 0 0 1 0 1 0 0 0 1 0 0]
 [1 0 1 0 0 1 0 0 1 0 1 0 0 0 1]
 [1 0 0 0 1 0 0 0 0 1 0 0 0 1 0]
 [1 1 0 1 1 0 0 0 1 0 0 1 0 1 0]]
# Find the start position by looking up the position of P
Ppos=(0,0)
Dpos=(0,0)
for i in range(0, ar.shape[0]):
    for j in range(0, ar.shape[1]):
        if ar[i,j] ==P:
            Ppos=(i,j)
            break

print(f"Start position: {Ppos}")
Start position: (7, 1)
#Define function to traverse path

#Find possible next positions around current pos
#Checks boundaries. Looks around in clockwise direction
def find_pos_around(pos):
    r,c=pos[0],pos[1]
    apos=[]
    if c+1 < N:
        apos.append((r,c+1))
    if r+1 < N:
        apos.append((r+1,c))
    if c-1 > -1:
        apos.append((r,c-1))
    if r-1 > -1:
        apos.append((r-1,c))
    return apos

#Is the position already visited
def is_visited(pos):
    return pos in visit_index

#Is the position a closed door
def is_obstacle(pos):
    return ar[pos[0], pos[1]] ==1

def is_treasure(pos):
    return ar[pos[0], pos[1]] ==D

#update new position
def at_new_pos(pos):
    global counter
    visit_index[pos] =counter
    counter += 1
    hotpath.append(pos)

#capture traversed path
def log_trail(pos):
    trackX.append(pos[0])
    trackY.append(pos[1])
    hX=[]
    hY=[]
    for p in hotpath:
        hX.append(p[0])
        hY.append(p[1])
    hotTrack.append([hX,hY])

#retreat from a position
def pop_pos(pos):
    hotpath.pop()

#Move to next position
def moveto_next_pos(pos,opos):
    #print(f"Now:{pos}")
    log_trail(pos)
    at_new_pos(pos)
    found =False
    if is_treasure(pos):
        found =True
    else:
        around_pos = find_pos_around(pos)
        for npos in around_pos:
            if is_visited(npos) or is_obstacle(npos):
                continue
            else:
                found=moveto_next_pos(npos, pos)
                log_trail(pos) ##back from child
                if found:
                    break
    if not found:
        pop_pos(pos)
    return found

Traverse the labyrinth

#
counter =0
visit_index={} #tracks if a position previously visited
hotpath=[] #tracks the last traversed path
trackX=[] #x coordinates of traversed positions including retreated ones; for plotting
trackY=[] #y coordinates of traversed positions including retreated ones; for plotting
hotTrack=[] #coordinates of hotpath; for plotting

#Move to start position P from None and start traversing
found=moveto_next_pos(Ppos, None)
if found:
    print(f"Voila!\n\nFound Path:\n{hotpath}")
else:
    print(f"No Way!\n\n Last Path:\n{hotpath}")
Voila!

Found Path:
[(7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (13, 2), (13, 3), (12, 3), (12, 4), (11, 4), (11, 5), (10, 5), (10, 6), (10, 7), (9, 7), (9, 6), (9, 5), (8, 5), (8, 6), (8, 7), (7, 7), (7, 8), (7, 9), (7, 10), (8, 10), (8, 11), (8, 12), (8, 13), (8, 14)]

Animate the paths

# X coordinates of labyrith
X= np.zeros((N*N,),dtype=int) 
# Y coordinates of labyrith
Y= np.zeros((N*N,),dtype=int) 
# Pos values in the labyrith, line open, closed, start and treasuree)
T= np.zeros((N*N,),dtype=int) 

#Build X, Y and T from labyrith matrix
pos=0
for i in range(0,ar.shape[0]):
    for j in range(0,ar.shape[1]):
        X[pos]=i
        Y[pos]=j
        T[pos]=ar[i,j]
        pos += 1
%%capture
#Plot the labyrinth as scatter
#Animate the traversed paths in yellow lines
#Animate the hot path in green line

#plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'

colormap = np.array([ 'g','r', 'b','c','m','c']) #color map of positions
sizemap = np.array([150,150,600,600,800]) #Size indicative of positions

def update_plot(frame):
    trailP.set_data(trackX[0:frame], trackY[0:frame])
    hotP.set_data(hotTrack[frame][0],hotTrack[frame][1])
    return trailP

fig = plt.figure(figsize=(10,10))
fig.suptitle("Treasure Hunt")
plt.scatter(X,Y, c=colormap[T], s=sizemap[T])

trailP =plt.plot([], [], color='y')[0]
hotP=plt.plot([], [], color='g', lw=3)[0]

anim=animation.FuncAnimation(fig, update_plot, 
                             frames=len(trackX), interval=500, repeat=False)
Writer = animation.writers['ffmpeg']
writer = Writer(fps=20, metadata=dict(artist='Me'), bitrate=1800)
anim.save("treasure_hunt.mp4", writer=writer)
HTML(anim.to_html5_video())