tidyverse

Manipulating Data with dplyr

Masumbuko Semba
Before a dataset can be analysed in R, it is often manipulated or transformed in various ways. For years, manipulating data in R required more programming than the analysis itself. The dplyr package has improved this dramatically: it gives programmers an intuitive vocabulary for data management and analysis tasks. Hadley Wickham [-@dplyr], the original creator of the dplyr package, refers to it as a Grammar of Data Manipulation.
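As a rough illustration of that vocabulary, the sketch below chains a few dplyr verbs on the built-in mtcars dataset. It assumes the dplyr package is installed; the 100 hp threshold and the column names `mean_mpg` and `n_cars` are chosen purely for illustration.

```r
library(dplyr)  # assumes dplyr is installed

# mtcars ships with R, so no import step is needed here
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%              # keep cars with more than 100 horsepower
  group_by(cyl) %>%                 # one group per cylinder count
  summarise(
    mean_mpg = mean(mpg),           # average fuel economy in each group
    n_cars   = n()                  # number of cars in each group
  )

print(summary_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be read from top to bottom as a pipeline.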

Importing data in R

Masumbuko Semba
You can learn R with the datasets that come with it when you install it on your machine. But sometimes you want to use real data that you or someone else has already gathered. One of the critical steps in data processing is importing data stored in a particular format into the R workspace. Data import refers to reading data from the working directory into the workspace. In this chapter you will learn how to import common file types into R.
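A minimal sketch of the idea, using only base R: the block writes a small made-up table to a temporary CSV file and reads it back with `read.csv()`. The station names and temperature values are hypothetical, and a temporary file stands in for a file in your working directory.

```r
# Create a small, made-up dataset and save it as a CSV file
csv_path <- tempfile(fileext = ".csv")
example <- data.frame(
  station     = c("A", "B", "C"),
  temperature = c(26.5, 27.1, 25.8)
)
write.csv(example, csv_path, row.names = FALSE)

# Import the file from disk into the workspace as a data frame
imported <- read.csv(csv_path)

# Inspect the structure of the imported object
str(imported)
```

For a file that already sits in the working directory, the file name alone can be passed to `read.csv()` in place of `csv_path`.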

Introduction to tidyverse

Masumbuko Semba
While the base R packages include many useful functions and data structures that you can use to accomplish a wide variety of data science tasks, the add-on tidyverse package supports a comprehensive data science workflow, as illustrated in Figure 1.

Figure 1: Schematic drawing of the data science workflow

The tidyverse is a coherent system of packages, each designed to address a specific component of the workflow. Most of the packages in the tidyverse were developed by Hadley Wickham [-@tidyverse] and many other contributors.

Vector Data in R

Masumbuko Semba
This chapter provides a brief explanation of the fundamental vector model. You will become familiar with the theory behind the vector model and the disciplines in which it predominates, before seeing its implementation in R. A vector is the most basic data structure in R: a sequence of elements of the same data type. If the elements are of different data types, they are coerced to a common type that can accommodate all the elements.
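The coercion rule can be seen in a short base R sketch. The variable names are illustrative only.

```r
# A vector whose elements all share one type
numbers <- c(1, 2, 3)
class(numbers)      # "numeric"

# Mixing numbers, text and logicals: everything becomes character,
# the only type here that can represent all three values
mixed_chr <- c(1, "a", TRUE)
class(mixed_chr)    # "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
mixed_num <- c(1, TRUE, FALSE)
class(mixed_num)    # "numeric"
mixed_num
```

Coercion happens silently, so checking `class()` after combining values is a cheap way to catch an unintended conversion.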

Data types in R

Masumbuko Semba
R is a flexible language that allows you to work with different kinds of data [@bradley]. These include integer, numeric, character, complex, date and logical. The default data type or class in R is double-precision numeric. In a nutshell, R groups all kinds of data into five categories, but we deal with only four in this book. Before proceeding, we need to clear the workspace by typing rm(list = ls()) after the prompt in the console.
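The types named above can be checked with `class()` and `typeof()` in base R. This is a small sketch with arbitrary example values.

```r
# A bare number is stored as double-precision numeric by default
default_number <- 5
class(default_number)    # "numeric"
typeof(default_number)   # "double"

# The L suffix requests an integer instead
whole_number <- 5L
class(whole_number)      # "integer"

# Other common types
class("sardine")         # "character"
class(TRUE)              # "logical"
class(2 + 3i)            # "complex"

# Dates are a class built on top of numeric storage
today <- Sys.Date()
class(today)             # "Date"
```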