ORC-stored HIVE tables are very common in my daily work. After a recent deep dive into the ORC format, I realized that it has many awesome features, and that I had either never used many of them or had used them the wrong way. So I took some time to run a few small tests, and in this blog I present two of the most exciting features of ORC-stored HIVE tables. Let’s get started!
BTW, if you are not familiar with ORC, you can get a quick overview from my previous blog.

Read More


YARN, short for Yet Another Resource Negotiator, was introduced in Hadoop 2.0. Compared with MRV1 (MapReduce Version 1), YARN takes over the resource management and job scheduling responsibilities that MRV1 handled itself, and makes it possible to run non-MapReduce jobs on Hadoop, Apache Spark for example. Although more and more developers treat some rising technologies, Kubernetes for instance, as alternatives to YARN, YARN is still widely used. Let’s talk about YARN today.

Read More


Working with HIVE is part of my daily work: I use it to read data from and write data back to HDFS. HIVE supports many storage formats, such as textFile, Avro, and so on. Today we will talk about three popular formats that are widely used in the HIVE world, as well as in Spark and even the entire distributed file system world. Leaving out classical formats like textFile, SequenceFile, and RCFile doesn’t mean that they are unimportant or not good enough, so you’d better take a look at them too to help you make sense of the storage formats in HIVE.

Read More


I’ve been using HDFS as storage for almost 3 years, reading data from and writing data to it with HIVE and Spark, but I had never learned the details. Finally I had some time to watch the Big Data Essentials on Coursera, which inspired me to take a deep dive into the HDFS architecture. This blog covers so much about HDFS that it took me 3 days to sum everything up and write it down. If anything is wrong, please tell me and I’ll fix it! Let’s take a look.

Read More


This blog is the third part of my sum-up of Apache Spark tips learnt from programming and debugging. After being busy for such a long time, I finally have some time to carry on with my personal blog. Today I’ll show you some tips about some functions in Spark SQL, and there may be some mistakes caused by my misunderstanding. Anyway, thanks if you point out anything incorrect!

Read More
