## Why do we need to omit a dummy variable when estimating impact of a categorical variable on a dependent variable?

The usual explanation for why we need to omit one of the levels of a categorical variable when using dummy coding relies on the mathematics of linear regression: including all the levels of the categorical variable along with the intercept results in perfect multicollinearity and thus there is no […]
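As a minimal sketch of the multicollinearity point, the snippet below builds a design matrix with an intercept and one dummy column per level, then checks that the dummy columns always sum to the intercept column. The variable `levels` and its values are hypothetical illustration data, not taken from the post.

```python
# Hypothetical categorical variable with three levels.
levels = ["red", "green", "blue", "green", "red", "blue", "blue"]
categories = sorted(set(levels))

# Each row: [intercept, dummy for each level] with no level omitted.
full_rows = [[1] + [int(value == c) for c in categories] for value in levels]

# In every row the dummies sum to exactly 1, which is the intercept value,
# so the intercept column is an exact linear combination of the dummy columns.
dummies_sum_to_intercept = all(sum(row[1:]) == row[0] for row in full_rows)

# Dropping the first level's dummy (the reference level) removes that identity:
# rows belonging to the reference level now have dummies summing to 0, not 1.
reduced_rows = [[row[0]] + row[2:] for row in full_rows]
identity_survives = all(sum(row[1:]) == row[0] for row in reduced_rows)

print(dummies_sum_to_intercept, identity_survives)
```

With the full set of dummies the columns are linearly dependent, so the regression coefficients are not uniquely determined; omitting one level breaks the dependence.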

## Market Research Methods to Determine Pricing

Overview: There are many methods to determine prices for products. In this post, we will focus on three main approaches: conjoint analysis, Van Westendorp’s Price Sensitivity Meter, and the Gabor-Granger method.

## What you don’t learn in grad school.

Developing robust, error-free software is one of the most important skills that everyone should learn in grad school but, for a variety of reasons, does not; at least that is my experience. Yours may be different. I will illustrate one context where a simple software engineering idea is very useful in eliminating hard […]

## A plain English explanation of the difference between Bayesian and Frequentist methods in statistics.

Context: Suppose that you are playing tennis and that, after you miss the opponent’s serve, the ball lands in a bunch of shrubs behind the court. How would you decide where to start searching for the ball? There are two approaches in statistics to answer the above question: Frequentist and Bayesian. A plain English explanation […]

## Deriving the Posterior Distribution for Population Variance

In an earlier post, we discussed how to derive the posterior distribution for the population mean. In this post, we will focus on deriving the posterior distribution for the variance parameter, which is used in different types of Bayesian inference. Background Context: In many Bayesian contexts (e.g., hierarchical Bayesian linear regression, hierarchical […]

## Deriving the Posterior Distribution for Population Mean

Background Context: In many Bayesian contexts (e.g., hierarchical Bayesian linear regression, hierarchical Bayesian estimation of discrete choice models, etc.), the following situation arises: There are $$n$$ respondents whose responses can be modeled by a set of independent variables and associated parameters. We will denote these parameters by $$\beta_i$$. We assume […]

## Sentiment Analysis Using Python

Processing open-ended consumer responses used to be time-consuming. The usual process involved sorting each response into one of several categories and then summarizing the themes that emerged. Natural language processing (NLP) libraries can help with the task of analyzing open-ended responses to assess consumer sentiment, aspects of the experience consumers are talking about the […]

## What are confidence intervals?

Confidence intervals are one way to determine whether the proposed version in an A/B test is achieving business objectives relative to the baseline. This post discusses what confidence intervals are, how to interpret them, and the impact of sample size on their width.
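To make the sample-size effect concrete, here is a rough sketch that computes a normal-approximation (Wald) confidence interval for a conversion rate. The function name `proportion_ci` and the conversion counts are hypothetical, and the Wald interval is only one of several possible constructions; the post itself may use a different one.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, roughly 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Same observed conversion rate (20%) at two hypothetical sample sizes.
small_sample = proportion_ci(20, 100)
large_sample = proportion_ci(2000, 10000)

print(small_sample)
print(large_sample)
```

Both intervals are centered on the same observed rate, but the one from the larger sample is ten times narrower, because the half-width shrinks in proportion to the square root of the sample size.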