Module 1 - Introduction

This introductory module will familiarize you with some common data science terms.

Dataset

A dataset is a table or spreadsheet of historical information. For example, if you want to understand the weather, the dataset will contain historical weather records from the past few years: precipitation, humidity, temperature, and so on. Below is a sample snapshot of the dataset:

Sample Data

Some of the definitions in this module are based on the following snapshot. Take some time to look at the screenshot before moving forward.

[Screenshot: workbook]

Variables

A variable is like a container to which values can be assigned. Precipitation, humidity, and temperature are the variables in the example above.
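
To make the container idea concrete, here is a minimal Python sketch. It assumes a tiny weather dataset whose column names and values are made up for illustration; they are not taken from the snapshot above. Each row is one observation, and each key plays the role of a variable.

```python
# Hypothetical weather dataset: every value below is made up for illustration.
# Each dictionary is one row (one observation); each key is a variable.
dataset = [
    {"temperature": 30.5, "humidity": 40, "precipitation": 0.0},
    {"temperature": 22.0, "humidity": 85, "precipitation": 12.4},
    {"temperature": 18.5, "humidity": 90, "precipitation": 20.1},
]

# The variable names are the column headers of the table.
variables = list(dataset[0].keys())
print(variables)

# A variable is a named container: the same name holds one value per row.
temperatures = [row["temperature"] for row in dataset]
print(temperatures)
```

Reading one key across all rows gives every value assigned to that variable, which is the same as reading one column of the spreadsheet.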

Types of variables

  • Independent variable: Independent variables are not influenced by the other variables in the dataset, and they affect the outcome variable. Here the outcome variable is ‘chances of rain’. Temperature, humidity, and sunrise and sunset times are independent variables.

    note

    Another name for Independent variable is Predictor variable.

  • Dependent variable: Dependent variables are influenced by, or depend on, the independent variables. For example, the chance of rain depends on the temperature, humidity, etc. Depending on the temperature, the chances of rain may increase or decrease.

    note

    Another name for Dependent variable is Output variable, mainly because its value varies depending on the independent variables.

  • Continuous variable: A variable that holds numerical values that can fall anywhere within a range is considered continuous. Temperature, humidity, and chances of rain are considered continuous variables.

  • Categorical variable: A variable that holds non-numerical values, such as labels or categories, is considered categorical. Weather is considered a categorical variable.
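
The four variable types above can be sketched in a few lines of Python. This is only an illustration: the row, its values, and the helper `variable_type` are hypothetical, and the sketch treats any numerical value as continuous, which is a simplification.

```python
# Hypothetical row from a weather dataset; the values are made up.
row = {
    "temperature": 22.0,
    "humidity": 85,
    "weather": "cloudy",
    "chance_of_rain": 70,
}

# Assumption for this sketch: "chance_of_rain" is the outcome we want to explain,
# so it is the dependent variable and everything else is independent.
dependent = "chance_of_rain"
independent = [name for name in row if name != dependent]


def variable_type(value):
    """Label a value as continuous (numerical) or categorical (non-numerical)."""
    # bool is excluded because Python treats True/False as integers.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "continuous"
    return "categorical"


types = {name: variable_type(value) for name, value in row.items()}

print("Independent:", independent)
print("Dependent:", dependent)
print(types)
```

Note that the two classifications are separate questions: a variable is independent or dependent because of its role in the analysis, and continuous or categorical because of the kind of value it holds.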

Descriptive data

Descriptive data summarizes or describes the dataset in a meaningful way to provide insights about it. The example image describes the data overall. Below are some of the key insights that can be provided to describe the data:

  • Highest and lowest temperature
  • Whether an increase in temperature lowered the chances of rain
  • Effects of humidity on rain
  • Weather vs. chances of rain
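
The insights listed above could be computed roughly as follows. This is a minimal sketch using only the Python standard library; the dataset values are made up, and the helpers `column` and `pearson` are hypothetical names introduced for illustration. Correlation is one possible way to measure how two variables move together, and a strong correlation suggests a relationship without proving a cause.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations; every value is made up for illustration.
dataset = [
    {"temperature": 30.0, "humidity": 40, "weather": "sunny", "chance_of_rain": 10},
    {"temperature": 26.0, "humidity": 60, "weather": "cloudy", "chance_of_rain": 40},
    {"temperature": 22.0, "humidity": 80, "weather": "cloudy", "chance_of_rain": 60},
    {"temperature": 18.0, "humidity": 90, "weather": "rainy", "chance_of_rain": 90},
]


def column(name):
    """Return every value of one variable, in row order."""
    return [row[name] for row in dataset]


def pearson(xs, ys):
    """Pearson correlation: near -1 means opposite trends, near +1 the same trend."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Highest and lowest temperature
highest = max(column("temperature"))
lowest = min(column("temperature"))

# Temperature vs. chances of rain, and humidity vs. chances of rain
temp_vs_rain = pearson(column("temperature"), column("chance_of_rain"))
humidity_vs_rain = pearson(column("humidity"), column("chance_of_rain"))

# Weather vs. chances of rain: average chance of rain per weather category
by_weather = defaultdict(list)
for row in dataset:
    by_weather[row["weather"]].append(row["chance_of_rain"])
rain_by_weather = {weather: mean(values) for weather, values in by_weather.items()}

print(highest, lowest)
print(round(temp_vs_rain, 2), round(humidity_vs_rain, 2))
print(rain_by_weather)
```

In this made-up data the temperature correlation is negative and the humidity correlation is positive, so warmer rows have lower chances of rain and more humid rows have higher chances. Real data will rarely be this tidy.
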
note

Visual storytelling is one of the best ways to understand data. We will learn more about visualization later in the series.

This is the end of Module 1. We will explore each of these terms in more depth in the next lessons.