#spark


This blog is the third part of Apache Spark tips sum-up learnt from my programing and debugging. Have been busy for such a long time, I have some time to carry on my personal blog. Today I’ll show you some tips about some functions in Spark SQL, and there may be some mistakes caused by my misunderstanding. Anyway, thanks if you figure out anything incorrect!

Read More


This article is about the 2nd generation Tungsten engine, which is the core project to optimize Spark performance. Compared with the 1st generation Tungsten engine, the 2nd one mainly focuses on optimizing query plan and speeding up query execution, which is a pretty aggressive goal to get orders of magnitude faster performance. Let’s take a look!

Read More


This article presents some tips of Apache Spark, which is part 2 of the series. All the tips below are based on the real problems which I met. Despite the background, the tips below are of valuable reference. I’ve tried a lot to learn about Apache Spark but can’t know the detail of every part of it. I’d appreciate it if you figure out the mistakes in this article.

Read More


Spark SQL is one of the most important components of Apache Spark and has become the fundation of Structure Streaming, ML Pipeline, GraphFrames and so on since Spark 2.0. Also, Spark SQL provides SQL queries and DataFrame/Dataset API, both of which are optimized by Catalyst. Let’s talk about Catalyst today.

Read More


This article presents the relationship between Spark RDD, DataFrame and Dataset, and talks about both the advantages and disadvantages of them. RDD is the fundamental API since the inception of Spark and DataFrame/Dataset API is also pretty popular since Spark 2.0. What’s the differences between them and how to decide which API to be imported, let’s have a quick look.

Read More


This article is about things I learned about Apache Spark recently. I’ve been struggling with Spark tuning for more than one month, including shuffle tuning, GC time tuning and so on. Honestly speaking, Apache Spark is just like an untamed horse, it could be a beauty if you tune it well while a nightmare if not. So let me show you some tips I’ve learned from my work. This article is part 1 and here we go.

Read More

Hi, all, 最近一直在研究spark tuning方面的问题,深感这是一个经验活,也是一个技术活,查阅和很多资料,在这里mark一下。

Read More

Hello,有一个月没写blog了感觉很自责,必须整起来!最近由于工作上遇到的一些调优困难,让我对Spark有些敬畏,所以集中的研究了下鬼魅玄学Spark,和大家分享一下。首先先来看看spark的基本工作流程。

Read More

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×