M20773: Analyzing Big Data with Microsoft R

3 Day Course
Hands On
Official Microsoft Curriculum
Code M20773

This course has been retired. Please view currently available Microsoft Virtualization Training Courses.



Microsoft R Server and R Client (7 topics)

  • What is Microsoft R Server
  • Using Microsoft R Client
  • The ScaleR functions
  • Lab: Exploring Microsoft R Server and Microsoft R Client
  • Using R Client in VSTR and RStudio
  • Exploring ScaleR functions
  • Connecting to a remote server

Exploring Big Data (8 topics)

  • Understanding ScaleR data sources
  • Reading data into an XDF object
  • Summarizing data in an XDF object
  • Lab: Exploring Big Data
  • Reading a local CSV file into an XDF file
  • Transforming data on input
  • Reading data from SQL Server into an XDF file
  • Generating summaries over the XDF data

Visualizing Big Data (5 topics)

  • Visualizing in-memory data
  • Visualizing big data
  • Lab: Visualizing data
  • Using ggplot to create a faceted plot with overlays
  • Using rxLinePlot and rxHistogram

Processing Big Data (6 topics)

  • Transforming Big Data
  • Managing datasets
  • Lab: Processing big data
  • Transforming big data
  • Sorting and merging big data
  • Connecting to a remote server

Parallelizing Analysis Operations (5 topics)

  • Using the RxLocalParallel compute context with rxExec
  • Using the RevoPemaR package
  • Lab: Using rxExec and RevoPemaR to parallelize operations
  • Using rxExec to maximize resource use
  • Creating and using a PEMA class

Creating and Evaluating Regression Models (7 topics)

  • Clustering Big Data
  • Generating regression models and making predictions
  • Lab: Creating a linear regression model
  • Creating a cluster
  • Creating a regression model
  • Generating data for making predictions
  • Using the models to make predictions and comparing the results

Creating and Evaluating Partitioning Models (7 topics)

  • Creating partitioning models based on decision trees
  • Testing partitioning models by making and comparing predictions
  • Lab: Creating and evaluating partitioning models
  • Splitting the dataset
  • Building models
  • Running predictions and testing the results
  • Comparing results

Processing Big Data in SQL Server and Hadoop (7 topics)

  • Using R in SQL Server
  • Using Hadoop Map/Reduce
  • Using Hadoop Spark
  • Lab: Processing big data in SQL Server and Hadoop
  • Creating a model and predicting outcomes in SQL Server
  • Performing an analysis and plotting the results using Hadoop Map/Reduce
  • Integrating a sparklyr script into a ScaleR workflow


In addition to their professional experience, students who attend this course should have:

  • Programming experience using R, and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.
  • Working knowledge of relational databases.

It is recommended that delegates review this self-paced content to gain an introduction to the R language.

