Why do we need to omit a dummy variable when estimating the impact of a categorical variable on a dependent variable?


The usual explanation for why one level of a categorical variable must be omitted under dummy coding relies on the mathematics of linear regression: including all the levels of the categorical variable along with the intercept results in perfect multicollinearity and thus there is no […]
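As a rough illustration of the multicollinearity point in this excerpt, the following minimal sketch builds a design matrix with an intercept and a dummy column for every level, then compares its rank with the version that drops one level. The data and the names `levels`, `X_full`, and `X_reduced` are made up for illustration and are not taken from the post.

```python
import numpy as np

# Hypothetical categorical variable with three levels (illustrative data only).
levels = np.array(["a", "b", "c", "a", "b", "c"])
categories = np.unique(levels)

# One 0/1 dummy column per level.
dummies = (levels[:, None] == categories[None, :]).astype(float)
intercept = np.ones((len(levels), 1))

# Intercept plus ALL dummy columns: the dummies sum to the intercept column,
# so the columns are linearly dependent.
X_full = np.hstack([intercept, dummies])

# Intercept plus all but the first dummy: the omitted level acts as the baseline.
X_reduced = np.hstack([intercept, dummies[:, 1:]])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print("full:   ", X_full.shape[1], "columns, rank", rank_full)
print("reduced:", X_reduced.shape[1], "columns, rank", rank_reduced)
```

In the full matrix the rank is lower than the number of columns, so the least-squares coefficients are not uniquely determined; dropping one dummy restores full column rank.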

What you don’t learn in grad school.


Developing robust, error-free software is one of the most important skills that everyone should learn in grad school but, for a variety of reasons, does not, or at least that is my experience. Yours may be different. I will illustrate one context where a simple software engineering idea is very useful in eliminating hard […]

A plain English explanation of the difference between Bayesian and Frequentist methods in statistics.


Context: Suppose that you are playing tennis and that, after you miss the opponent’s serve, the tennis ball lands in a bunch of shrubs behind the court. How would you decide where to start searching for the ball? There are two approaches in statistics to answer the above question: Frequentist and Bayesian. A plain English explanation […]

Deriving the Posterior Distribution for Population Variance


In an earlier post, we discussed how to derive the posterior distribution for the population mean. In this post, we will focus on deriving the posterior distribution for the variance parameter, which is used in different types of Bayesian inference. Background Context: In many Bayesian contexts (e.g., hierarchical Bayesian linear regression, hierarchical […]

Deriving the Posterior Distribution for Population Mean


Background Context: In many Bayesian contexts (e.g., hierarchical Bayesian linear regression, hierarchical Bayesian estimation of discrete choice models, etc.), the following situation arises: There are \(n\) respondents whose responses can be modeled by a set of independent variables and associated parameters. We will denote these parameters by \(\beta_i\). We assume […]

Sentiment Analysis Using Python


Processing open-ended consumer responses used to be time-consuming. The usual process involved categorizing the responses into several categories and then summarizing the themes that emerge. Natural language processing (NLP) libraries can help with the task of analyzing open-ended responses to assess consumer sentiment, aspects of the experience consumers are talking about the […]