semba

Text Mining and Wordcloud in R

Masumbuko Semba
Word clouds visualize word frequencies of either a single corpus or different corpora. Although word clouds are rarely used in academic publications, they are a common way to display language data and the topics of texts, which may be thought of as their semantic content. To illustrate how to use word clouds, we are going to look at the State of Environment report issued in 2019 by the Department of Environment of the Vice President’s Office.

Access OpenStreetMap features programmatically with the osmdata package in R

Masumbuko Semba
OpenStreetMap (OSM) is a great source of spatial data. Most common programming languages have packages for downloading data from OSM. In this tutorial we are going to see how to download hospital feature data using R’s osmdata package (Padgham et al. 2017), plot it statically using ggplot2 (Wickham 2016), and explore it interactively using tmap (Tennekes 2018). This requires some knowledge of spatial data structures.

Simplified Principal Component Analysis in R

Masumbuko Semba
R
Principal Component Analysis (PCA) is widely used to explore data. This technique allows you to visualize and understand how the variables in a dataset vary. PCA is therefore particularly helpful when the dataset contains many variables. It is a method of unsupervised learning that allows you to better understand the variability in the dataset and how different variables are related. The components in PCA represent the underlying structure in the data.

The Lake Victoria bathymetry

Masumbuko Semba
I was looking online for a bathymetry dataset for Lake Victoria and I came across this link. It stores several bathymetry data products for Lake Victoria. Among these products is a gridded TIFF file. This dataset was created by a team from Harvard University in 2017 (Hamilton et al. 2016). They used over 4.2 million points collected over 100 years of surveys. The point data was obtained from an Admiral Bathymetry map and points collected in the field.

Linear and Bayesian Regression Models with the tidymodels package

Masumbuko Semba
As a data scientist, you need to distinguish between regression predictive models and classification predictive models. A clear understanding of these models helps you choose the best one for a specific use case. In a nutshell, regression predictive models and classification predictive models both fall under supervised machine learning. The main difference between them is that the output variable in regression is numerical (or continuous), while in classification it is categorical (or discrete).