PASSIONATE ABOUT BIG DATA, DATA PROCESSING, OPTIMISATION, AUTOMATION
Occasionally, I write about technologies that interest me on this blog.
GPU processing
Installing NVIDIA Tesla K80 GPU on a workstation for Deep Learning
Apache Spark
Running Apache Spark (Client mode) and Jupyter Notebooks on Kubernetes
Running Apache Spark and Jupyter Notebooks on Kubernetes with Helm Charts
Apache Spark ML: Using Gradient Boost Classifier to predict MOT test results [Python]
Apache Spark ML: Using Random Forest Classifier to predict MOT test results [Scala]
Adding “hooks” to Apache Spark core to act on various Spark events [Scala]
Spark DataFrameWriterV2 example using SQLite [Scala]
Building latest (SNAPSHOT) Spark and running on Standalone Docker Cluster
Apache Spark Internals: Executor launch orchestration
Apache Spark Internals: Shuffle in detail
Apache Spark Internals: RDD creation behind the scenes
Apache Spark Internals: Architecture and lifecycle
Apache Arrow
Loading Delta (Parquet) files into Apache Arrow
Python
Parallel processing (Pandas example)
Deterministic in Python (lru_cache) for function optimisation
Python, Pandas, SQLAlchemy, SQL Server and Docker
Python file search for “unlucky” ones
Hadoop (old)
Building Inverted Index with Hadoop MapReduce (for a search engine)
Finding Min and Max FX Rates for every country using Hadoop MapReduce
Apache HBase on Docker containers