Data Analysis in Python

5 Day Course
Hands On
Code PYPDA

Book Now - 2 Delivery Methods Available:

Classroom Virtual Classroom Private Group - Virtual Self-Paced Online

Overview

The Pandas library, with its data preparation and analysis features, will be our ultimate focus. After familiarizing ourselves with its two data structures, the Series and the DataFrame, we will use the latter to read, manipulate and generally process tabular data sourced from Excel, CSV and other file formats. Before that, however, we will thoroughly familiarize ourselves with the NumPy library, not only because it is the foundation of Pandas but also because it offers powerful tools for numerical calculations and forms the basis of practically all of Python's data science libraries. We will explore its vectorized functions and basic linear algebra features, and use its random module to demonstrate sampling from different distributions.
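As a flavour of the NumPy features mentioned above, the following is a minimal sketch rather than course material. The array values, variable names and seed are illustrative assumptions.

```python
import numpy as np

# Vectorized arithmetic: operate on the whole array without an explicit loop.
celsius = np.array([0.0, 21.5, 37.0, 100.0])
fahrenheit = celsius * 9 / 5 + 32

# Basic linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Random sampling from two distributions, seeded so the run is repeatable.
rng = np.random.default_rng(seed=42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=1000)
uniform_sample = rng.uniform(low=0.0, high=1.0, size=1000)

print(fahrenheit)
print(x)
print(normal_sample.mean(), uniform_sample.mean())
```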

Statistics, at least in its descriptive form, must be an integral part of any meaningful data analysis course. So we will learn the various data classifications and the summary statistics applicable to each. We will also discuss and explore, by example, the strengths and weaknesses of the various statistical summaries.
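One commonly cited weakness is the mean's sensitivity to outliers. The sketch below illustrates this with Python's built-in statistics module; the salary figures are made up for illustration.

```python
import statistics

# Hypothetical salaries in thousands; the value appended below is an outlier.
salaries = [28, 30, 31, 33, 35]
salaries_with_outlier = salaries + [250]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(salaries_with_outlier)
median_after = statistics.median(salaries_with_outlier)

# Compare how far each summary moves once the outlier is included.
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before:.1f} -> {median_after:.1f}")
```

With these numbers the mean more than doubles while the median moves by one, which is why the median is often preferred for skewed data.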

Visualization is another vital component of data analysis. To paraphrase the old saying, a graph is worth a thousand words. In this course we will learn to choose the most appropriate visualisation for a given data set and use Matplotlib (and Seaborn) to produce bar charts, pie charts, histograms, box plots, scatter plots and line graphs.
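As a rough sketch of what this looks like in Matplotlib, the example below draws a histogram and a box plot of the same data side by side. The data is randomly generated and the names are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 200 heights in centimetres drawn from a normal distribution.
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=170, scale=10, size=200)

# Two views of the same numerical variable, side by side.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(heights, bins=20)
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("Height (cm)")
ax_box.boxplot(heights)
ax_box.set_title("Box plot")
ax_box.set_ylabel("Height (cm)")
fig.tight_layout()
```

The histogram shows the shape of the distribution, while the box plot summarises its median, quartiles and outliers.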

Finally, for our programming environment, we will use Jupyter Notebook (or JupyterLab, according to preference) on the Anaconda platform, a widely used interactive environment in Python's data science ecosystem.

We believe in learning-by-doing, so we have taken an integrated, problem-solving approach to delivering our training. The course is broken into sessions, each centred on a few related core concepts and skills. The relevant background is discussed at the beginning of the session, in a just-in-time approach. This is followed by illustrative examples, which include the introduction of library features, syntax and semantics. For the remainder, which makes up most of the session, the delegates are expected to solve relevant problems of graduated difficulty. Example solutions will be available for the delegates to take away at the end of the course.

This approach is effective as it integrates the learning of statistical theory, library features and Python language syntax, increasing retention by providing meaningful context for each. Immediate practice also helps delegates cement their understanding of concepts on which we build gradually.

Objectives

This course aims to provide the delegate with the knowledge to be able to:

  • Determine the type of data at hand and decide on the most appropriate analysis and visualisation
  • Perform numerical calculations using the Python NumPy library
  • Use Pandas to read, explore, manipulate and process tabular data from various sources, including Excel, CSV and JSON files and relational databases
  • Visualise and generally explore data using Matplotlib and Seaborn
  • Carry out descriptive statistical summaries on data in Python
  • Interpret graphs and statistical results correctly
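To give an idea of the Pandas objective above, here is a minimal sketch of reading and exploring tabular data. The CSV content and column names are hypothetical, and an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path or URL.
csv_text = """city,year,sales
Leeds,2023,120
Leeds,2024,150
York,2023,80
York,2024,95
"""

df = pd.read_csv(io.StringIO(csv_text))

# Explore: column types and summary statistics for the numeric columns.
print(df.dtypes)
print(df.describe())

# Manipulate: keep the 2024 rows and add a derived column.
latest = df[df["year"] == 2024].copy()
latest["sales_share"] = latest["sales"] / latest["sales"].sum()
print(latest)
```

Pandas offers similar readers for other sources, such as `pd.read_excel`, `pd.read_json` and `pd.read_sql`.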

Target Audience

This course will benefit anyone who requires a solid practical foundation in Data Analysis, including descriptive statistics and visualisation in Python.

Modules

Data Analysis Python (21 topics)

  • NumPy
    • Create and manipulate NumPy arrays and matrices
    • Generate random numbers from various distributions
    • Use NumPy vectorized functions
    • Read array data from various common file formats
  • Pandas
    • Understand the composition, relationship and main features of the Pandas Series and DataFrame structures
    • Read data from CSV, JSON, the web and relational databases into DataFrames and Series
    • Data cleaning and preparation
    • Data wrangling: join, combine and reshape
    • Data aggregation and group operations
    • Read CSV, Excel and other format data into Pandas DataFrame objects
    • Clean, group, manipulate and summarise tabular data using Pandas data processing features
  • Visualisation with Matplotlib (and Seaborn)
    • Plot
      • Bar, column and pie charts
      • Box plots
      • Histograms
      • Scatter plots and line plots
  • Other
    • Use Jupyter Notebook and JupyterLab with the Anaconda distribution
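The cleaning, wrangling and aggregation topics listed above might be sketched as follows. The two tables, their column names and the choice of median imputation are illustrative assumptions, not the course's prescribed method.

```python
import pandas as pd

# Hypothetical order and customer tables.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, None, 40.0, 15.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "North"],
})

# Cleaning: fill the missing amount with the median of the known amounts.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Wrangling: join each order to its customer's region.
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregation: total and mean order amount per region.
summary = merged.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```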

Statistics (15 topics)

  • Distinguish between different data types
  • Summarize Categorical and Numerical Data
  • Calculate basic descriptive statistical measures such as
    • Measures of Central Tendency:
      • Mean
      • Median
      • Mode
    • Measures of Dispersion:
      • Variance
      • Standard deviation
      • Quantiles
  • Understand the advantages and disadvantages of the various summary statistics
  • Decide on the best visual representation of any presented data
  • Understand Bivariate data and perform Correlation and basic Linear Regression
  • Produce various visual representations (or plots) of data
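For the bivariate topics above, correlation and a basic straight-line fit can be sketched with NumPy as below. The data points are invented for illustration, and a least-squares fit via `np.polyfit` is only one of several ways to do this.

```python
import numpy as np

# Hypothetical paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept

print(f"r = {r:.4f}")
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```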

Prerequisites

This course has two requirements: programming experience and mathematical knowledge.

To fulfil the programming prerequisite, our Python Programming 1 course, or its equivalent, is required. Exceptions could be made for delegates with extensive experience in a different programming language that includes object-oriented concepts.

To fulfil the mathematics prerequisite, mathematics to at least GCSE level, and ideally A-level, is required. Delegates will be expected to understand percentages, proportions and limits, and to interpret simple formulae and graphs.

Additional Learning

The courses below may help you meet the knowledge level required to take this course.

  • Python Programming 1

    This 4-day course provides delegates with the knowledge to be able to produce Python applications that exploit all core elements of the language.

    4 Day Course Hands On Training Course Code PYP1
    Classroom Virtual Classroom Private Group - Virtual Self-Paced Online

Scheduled Dates

Please select from the dates below to make an enquiry or booking.

Pricing

Different pricing structures are available, including special offers. These include early bird, late availability, multi-place, corporate volume and self-funding rates. Please arrange a discussion with a training advisor to discover your most cost-effective option.

Later scheduled dates may be available for this course.
