Khalid Mammadov

PASSIONATE ABOUT BIG DATA, DATA PROCESSING, OPTIMISATION, AUTOMATION

Occasionally, I write about technologies that interest me on this blog.

GPU processing

Cloud Storage Ingestion Cost Estimation for Big Data using Monte Carlo Simulation over CUDA libraries and NVIDIA Tesla GPU

Installing NVIDIA Tesla K80 GPU on a workstation for Deep Learning

Apache Spark

Running Apache Spark (Client mode) and Jupyter Notebooks on Kubernetes

Running Apache Spark and Jupyter Notebooks on Kubernetes with Helm Charts

Running Spark on Kubernetes using spark-on-k8s-operator, CRDs and scheduling it from microservice [Java]

Apache Spark ML: Using Gradient Boost Classifier to predict MOT test results [Python]

Apache Spark ML: Using Random Forest Classifier to predict MOT test results [Scala]

Adding “hooks” to Apache Spark core to act on various Spark events [Scala]

Spark DataFrameWriterV2 example using SQLite [Scala]

Building latest (SNAPSHOT) Spark and running on Standalone Docker Cluster

Apache Spark Internals: Executor launch orchestration

Apache Spark Internals: Shuffle in detail

Apache Spark Internals: RDD creation behind the scenes

Apache Spark Internals: Architecture and lifecycle

Apache Arrow

Loading Delta (Parquet) files into Apache Arrow

Python

Parallel processing (Pandas example)

Deterministic in Python (lru_cache) for function optimisation

Python, Pandas, SQLAlchemy, SQL Server and Docker

Pascal’s Triangle in Python

Python file search for “unlucky” ones

Hadoop (old)

Building Inverted Index with Hadoop MapReduce (for a search engine)

Finding Min and Max FX Rates for every country using Hadoop MapReduce

Apache Hive in action

Apache Pig with examples

Apache HBase on Docker containers

Setting up single-node Hadoop on a Docker container

Distributed Hadoop cluster on Docker containers