Good Hands-on Introduction to Apache Spark
Anyone who wants to learn the basics of Spark is well-advised to read the book “Learning Spark”. I particularly liked that the book is very practice-oriented and that you can...
All posts in “Big data & data science” in reverse chronological order (newest first).
This article shows how to use k-d trees with Apache Spark.
The Hadoop ecosystem has grown significantly over time. “Hadoop: The Definitive Guide” provides an overview of the framework’s most important topics and projects.
In “Data Driven - Creating a Data Culture”, the authors explain what they mean by a “data culture”.
MapReduce is a “corset” and forces the developer into narrow boundaries. Therefore, it makes sense to read “MapReduce Design Patterns” to quickly learn the common tricks and techniques. It is...
The small book “NoSQL Distilled” provides a good overview of various NoSQL databases.
From 2002 to 2006, I worked at a Canadian vendor of a column-oriented database.
From 1999 to 2004, I collected information on the topic of ‘Fraud detection’ on my website. When I started this in 1999 as a research assistant at the University of...