Kamel's Notes

Code changes the world!



Hadoop MapReduce: How to Build a Reliable System from Unreliable Components

Posted on 2019-06-27 | Edited on 2019-06-28 | In Big Data
Symbols count in article: 8.4k | Reading time ≈ 8 mins.

This is a course note for Big Data Essentials: HDFS, MapReduce and Spark RDD.

Unreliable Components

There are three different unreliable components in distributed systems: nodes, links, and clocks.

Distributed computational systems are built from unreliable components: cluster nodes can break at any time because of power supply failures, disk damage, overheated CPUs, and so on.


Hadoop MapReduce Streaming Application in Python

Posted on 2019-06-27 | Edited on 2019-06-28 | In Big Data
Symbols count in article: 9.2k | Reading time ≈ 8 mins.

This is a course note for Big Data Essentials: HDFS, MapReduce and Spark RDD.

Streaming

In a Hadoop MapReduce application:

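As a hedged sketch of the streaming model (the word-count task and the file names mapper.py and reducer.py are illustrative assumptions, not taken from the post), the mapper and reducer are ordinary Python scripts that read lines from standard input and write tab-separated key/value pairs to standard output:

```python
#!/usr/bin/env python
# mapper.py (hypothetical): emit "word<TAB>1" for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))
```

The reducer then receives the mapper output already sorted by key and sums the counts of consecutive identical words:

```python
#!/usr/bin/env python
# reducer.py (hypothetical): sum the counts of consecutive identical keys
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```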

Hadoop MapReduce Application Tuning

Posted on 2019-06-27 | Edited on 2019-06-28 | In Big Data
Symbols count in article: 7.9k | Reading time ≈ 7 mins.

This is a course note for Big Data Essentials: HDFS, MapReduce and Spark RDD.

Efficient MapReduce rests on three pillars: the Combiner, the Partitioner, and the Comparator.

Combiner

To reduce the amount of I/O and network bandwidth these operations consume, you can use a combiner to squash several items into one.
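One way to picture that (the word-count setting and in-mapper aggregation are illustrative assumptions, not the post's own example): squashing repeated keys locally before they leave the mapper means far fewer records have to be sorted, spilled, and shipped across the network. In Hadoop Streaming the reducer script can often be reused as the combiner for such associative, commutative sums; the sketch below shows the equivalent local aggregation done inside a mapper.

```python
#!/usr/bin/env python
# Combiner-style local aggregation (hypothetical): instead of emitting
# "word<TAB>1" once per occurrence, squash repeats into a single
# "word<TAB>n" record, shrinking what the shuffle has to move.
import sys
from collections import defaultdict

counts = defaultdict(int)
for line in sys.stdin:
    for word in line.strip().split():
        counts[word] += 1

for word, total in counts.items():
    print("%s\t%d" % (word, total))
```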


Apache Spark: Advanced Topics

Posted on 2019-06-27 | Edited on 2019-06-28 | In Big Data
Symbols count in article: 8.6k | Reading time ≈ 8 mins.

This is a course note for Big Data Essentials: HDFS, MapReduce and Spark RDD.

Execution & Scheduling

SparkContext: When creating a Spark application, the first thing you do is create a SparkContext object, which tells Spark how to access a cluster. The context, living in your driver program, coordinates the sets of processes on the cluster that run your application.
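A minimal PySpark sketch of that flow (assuming a local installation; the application name and the toy RDD are illustrative, not from the post):

```python
from pyspark import SparkConf, SparkContext

# The SparkContext lives in the driver program and tells Spark how to
# reach a cluster; "local[*]" simply runs the workers as local threads.
conf = SparkConf().setAppName("toy-app").setMaster("local[*]")
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(10))        # the context distributes the data
print(rdd.map(lambda x: x * x).sum())  # 285, computed by the worker processes
sc.stop()                              # release the cluster resources
```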


Apache Spark: Basic Concepts

Posted on 2019-06-27 | Edited on 2019-06-28 | In Big Data
Symbols count in article: 13k | Reading time ≈ 12 mins.

This is a course note for Big Data Essentials: HDFS, MapReduce and Spark RDD.

Apache Spark

Apache Spark is a modern, distributed, fault-tolerant computation platform.

History of Apache Spark:


DFS, HDFS, Architecture, Scaling problem

Posted on 2019-06-27 | Edited on 2019-06-28 | In Big Data
Symbols count in article: 17k | Reading time ≈ 15 mins.

This is a course note for Big Data Essentials: HDFS, MapReduce and Spark RDD.

Scaling DFS

Big data storage:

  • Scale up (vertical scaling): get yourself a bigger hard drive
      • Lower latency
  • Scale out (horizontal scaling): add more nodes
      • Higher latency
      • Problem: on average, a node goes out of service about once every 3 years

-> Distributed file system
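A back-of-the-envelope check of why that failure rate forces the move (the 1000-node cluster size is a hypothetical assumption, not a figure from the post):

```python
# If each node fails on average once every 3 years, a 1000-node cluster
# still sees a failure almost every day, so the storage layer must
# tolerate node loss rather than treat it as an exception.
nodes = 1000
node_mtbf_days = 3 * 365                  # mean time between failures, per node
failures_per_day = nodes / node_mtbf_days
print("~%.1f node failures per day" % failures_per_day)  # ~0.9
```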


Machine Learning Q&A: Basic

Posted on 2019-06-17 | Edited on 2019-10-19 | In Machine Learning
Symbols count in article: 18k | Reading time ≈ 16 mins.

1. Can you state Tom Mitchell's definition of learning and discuss T, P and E?

Mitchell (1997) provides the definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”


Deep Learning Q&A Part I: UAT, Motivation

Posted on 2019-06-17 | Edited on 2019-10-19 | In Machine Learning
Symbols count in article: 3.5k | Reading time ≈ 3 mins.

Universal Approximation of neural networks

1. State the universal approximation theorem. What technique is used to prove it?

Universal approximation theorem (Hornik et al., 1989; Cybenko, 1989) states that a feedforward network with a linear output layer and at least one hidden layer with any “squashing” activation function (such as the logistic sigmoid activation function) can approximate any Borel measurable function from one finite-dimensional space to another with any desired non-zero amount of error, provided that the network is given enough hidden units.

The universal approximation theorem means that regardless of what function we are trying to learn, we know that a large MLP will be able to represent this function.

However, we are not guaranteed that the training algorithm will be able to learn that function. Even if the MLP is able to represent the function, learning can fail for two different reasons.

  1. The optimization algorithm used for training may not be able to find the value of the parameters that corresponds to the desired function.
  2. The training algorithm might choose the wrong function due to overfitting.

The universal approximation theorem says that there exists a network large enough to achieve any degree of accuracy we desire, but the theorem does not say how large this network will be.
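As a small numerical illustration of this representation-versus-learning gap (not part of the note; scikit-learn's MLPRegressor and the sine target are assumptions), a one-hidden-layer network with a squashing activation can be asked to fit a simple smooth function, while the quality of the fit still depends on the optimizer, the number of hidden units, and the training budget:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Fit a single-hidden-layer MLP (tanh is a "squashing" activation) to y = sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel()

mlp = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                   max_iter=5000, random_state=0)
mlp.fit(X, y)

# The theorem guarantees such a function is representable by a large enough
# network; how close the trained network actually gets is an empirical matter.
print("training MSE: %.4f" % np.mean((mlp.predict(X) - y) ** 2))
```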

2. What is a Borel measurable function?

Any continuous function on a closed and bounded subset of \(R^n\) is Borel measurable and therefore may be approximated by a neural network.


Machine Learning Q&A - Part IV: Clustering & Bayesian

Posted on 2019-06-17 | Edited on 2019-10-19 | In Machine Learning
Symbols count in article: 8.3k | Reading time ≈ 8 mins.

Clustering

1. Describe the k-means algorithm.

K-means clustering is a simple and elegant approach for partitioning a data set into K distinct, non-overlapping clusters.

The idea behind K-means clustering is that a good clustering is one for which the within-cluster variation is as small as possible.

The within-cluster variation for cluster \(C_k\) is a measure \(W(C_k)\) of the amount by which the observations within a cluster differ from each other.
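A common concrete choice for \(W(C_k)\) (assumed here, since the excerpt does not pin one down) is the average pairwise squared Euclidean distance within the cluster, and the K-means objective is then to minimize the total within-cluster variation over all assignments of observations to the K clusters: \[ W(C_k) = \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2, \qquad \min_{C_1,\dots,C_K} \sum_{k=1}^{K} W(C_k) \]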


Machine Learning Q&A Part III: SVM

Posted on 2019-06-17 | Edited on 2019-10-19 | In Machine Learning
Symbols count in article: 11k | Reading time ≈ 10 mins.

Support Vector Machine

1. SVM vs. Logistic Regression

SVM optimization problem: \[ \max_{\beta_0,\dots,\beta_p,\,\epsilon_1,\dots,\epsilon_n,\,M} M \quad (9.12) \\ \text{s.t.}\quad \sum_{j=1}^p \beta_j^2=1, \quad (9.13) \\ y_i(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\dots+\beta_px_{ip})\geq M(1-\epsilon_i) \quad \forall i=1,\dots,n, \quad (9.14) \\ \epsilon_i\geq 0, \quad \sum_{i=1}^n\epsilon_i \leq C. \quad (9.15) \] The criterion (9.12)–(9.15) for fitting the support vector classifier \(f(X) = \beta_0 + \beta_1X_1 + \dots + \beta_pX_p\) can be rewritten as \[ \min_{\beta_0,\dots,\beta_p}\left\{ \sum_{i=1}^n\max[0,\,1-y_if(x_i)]+\lambda\sum_{j=1}^p\beta_j^2 \right\} \]

  • λ is small: few violations of the margin are tolerated; high variance, low bias; \(\Leftrightarrow\) small \(C\);
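A hedged scikit-learn sketch of the comparison (the synthetic data set and the library calls are assumptions, not from the post); note that scikit-learn's C weights the hinge-loss term, so it behaves roughly like \(1/\lambda\) in the criterion above rather than like the budget \(C\) in (9.15):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Same linear decision function, different losses: hinge (SVM) vs. logistic.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
svm = LinearSVC(loss="hinge", C=1.0, max_iter=10000).fit(X, y)
logreg = LogisticRegression(C=1.0).fit(X, y)
print("SVM training accuracy:    %.2f" % svm.score(X, y))
print("LogReg training accuracy: %.2f" % logreg.score(X, y))
```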