Simplified Principal Component Analysis in R

Masumbuko Semba
R
Principal Component Analysis (PCA) is widely used to explore data. This technique allows you to visualize and understand how the variables in a dataset vary. PCA is therefore particularly helpful when the dataset contains many variables. It is an unsupervised learning method that helps you better understand the variability in the dataset and how different variables are related. The components in PCA represent the underlying structure in the data.
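
As a minimal sketch of the idea, the snippet below runs a PCA with base R's prcomp(). The USArrests dataset is an assumption chosen only because it ships with R; it is not the data used in the post.

```r
# Illustrative only: USArrests is a built-in dataset, not the one from the post.
# Centering and scaling put all variables on a comparable footing.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how each original variable contributes to each component
print(round(pca$rotation, 2))

# Scores: the observations expressed in component coordinates
print(head(pca$x))
```

The loadings show how the variables relate to one another, while the variance proportions indicate how many components are worth keeping.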

The Lake Victoria bathymetry

Masumbuko Semba
I was looking online for a bathymetry dataset for Lake Victoria and came across this link. It stores several bathymetry data products for Lake Victoria, among them a gridded TIFF file. This dataset was created by a team from Harvard University in 2017 (Hamilton et al. 2016). They used over 4.2 million points collected over 100 years of surveys. The point data were obtained from an Admiral Bathymetry map and from points collected in the field.

A unified Machine Learning Approach in R with tidymodels

Masumbuko Semba
tidymodels is a suite of packages that makes machine learning with R a breeze. R has many packages for machine learning, each with its own syntax and function arguments. tidymodels aims to provide a unified interface, which allows data scientists to focus on the problem they're trying to solve instead of spending time learning package syntax. tidymodels takes a modular approach, meaning that specific, smaller packages are designed to work hand in hand.

Linear and Bayesian Regression Models with tidymodels package

Masumbuko Semba
As a data scientist, you need to distinguish between regression predictive models and classification predictive models. A clear understanding of these models helps you choose the best one for a specific use case. In a nutshell, regression predictive models and classification predictive models both fall under supervised machine learning. The main difference between them is that the output variable in regression is numerical (or continuous), while that in classification is categorical (or discrete).
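
The distinction can be sketched with two small models. The example below uses base R's lm() and glm() rather than tidymodels, purely to stay dependency-free, and the built-in mtcars dataset is an assumption for illustration, not the data from the post.

```r
# Illustrative only: mtcars is a built-in dataset, not the one from the post.

# Regression: the outcome (mpg) is numeric, so the prediction is a number.
reg_fit  <- lm(mpg ~ wt, data = mtcars)
reg_pred <- predict(reg_fit, newdata = data.frame(wt = 3))

# Classification: the outcome (am: 0 = automatic, 1 = manual) is categorical,
# so the model returns a class probability that is turned into a label.
cls_fit  <- glm(am ~ wt, data = mtcars, family = binomial)
cls_prob <- predict(cls_fit, newdata = data.frame(wt = 3), type = "response")
cls_pred <- ifelse(cls_prob > 0.5, "manual", "automatic")

print(reg_pred)
print(cls_prob)
print(cls_pred)
```

The 0.5 cutoff is a common default, not a requirement; a different threshold may suit a given use case better.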

Plotting Heatmaps in R with ggplot2 and metR package

Masumbuko Semba
Heatmaps are powerful data visualization tools widely used with meteorological and oceanographic data. Heatmaps are excellent at tracking signals that move, like ocean currents. These diagrams can be used for many more types of atmospheric features. The concept is to represent a matrix of values as colors, usually organized along a gradient. This post explains how to create a heatmap of ocean current in R using geom_tile() and geom_contour_filled() from ggplot2 (Wickham 2016) and geom_contour_fill() from the metR package (Campitelli 2019).
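
A rough sketch of the matrix-to-colors idea, using only the two ggplot2 geoms named above. The built-in volcano elevation matrix stands in for ocean current data here, which is an assumption for illustration; the metR variant is left out to avoid an extra dependency.

```r
library(ggplot2)

# Illustrative only: volcano is a built-in elevation matrix, not ocean current data.
# Reshape the matrix into long format: one row per grid cell.
grid <- expand.grid(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano)))
grid$z <- as.vector(volcano)

# Heatmap: each cell is a tile colored by its value along a continuous gradient.
p_tile <- ggplot(grid, aes(x, y, fill = z)) +
  geom_tile() +
  scale_fill_viridis_c()

# Filled contours: values are binned into bands, which smooths the picture.
p_contour <- ggplot(grid, aes(x, y, z = z)) +
  geom_contour_filled()

print(p_tile)
print(p_contour)
```

geom_tile() shows every cell as is, while geom_contour_filled() trades cell-level detail for clearer bands.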