ORC Stored HIVE Tips

Jun 22 2019 hadoop 5 minutes read (About 691 words)

ORC-stored HIVE tables are very common in my daily work. After a recent deep dive into the ORC format, I realized that it has many awesome features, many of which I had either never used or had used in the wrong way. So I spent some time on a few small tests, and in this blog I present two of the most exciting features of ORC-stored HIVE tables. Let’s start!
BTW, if you are not familiar with ORC, you can take a quick look at my previous blog.

YARN Architecture

Jun 11 2019 hadoop 7 minutes read (About 1113 words)

YARN, short for Yet Another Resource Negotiator, was introduced in Hadoop 2.0. Compared with MRV1 (MapReduce Version 1), YARN takes over the responsibility of resource management and job scheduling, and makes it possible for non-MapReduce jobs, Apache Spark for example, to run on Hadoop. Although some rising technologies, Kubernetes for instance, are treated as alternatives to YARN by more and more developers, YARN is still widely used. Let’s talk about YARN today.

Avro, Parquet and ORC

May 5 2019 hadoop 8 minutes read (About 1209 words)

Dealing with HIVE is part of my daily work, in which I read data from and write data back to HDFS. There are many storage formats in HIVE, such as textFile, Avro, and so on. Today we will talk about three popular formats that are widely used in the HIVE world, as well as in Spark and even the entire distributed file system world. Not covering classical formats like textFile, SequenceFile, and RCFile doesn’t mean that they are unimportant or not good enough, so you’d better take a look at them too to help you make sense of the storage formats in HIVE.

HDFS Architecture

Apr 28 2019 hadoop 12 minutes read (About 1851 words)

I’ve been using HDFS as storage for almost 3 years, reading data from and writing data to it with HIVE and Spark, but I’ve never learned the details. Finally I have some time to watch Big Data Essentials on Coursera, which inspired me to take a deep dive into the HDFS architecture. This blog covers so much about HDFS that it took me 3 days to sum it up and write it down. If anything is wrong, it would be very nice of you to tell me, and I’ll fix it! Let’s take a look.

Spark Tips Sum-up Part-3

Mar 6 2019 spark 4 minutes read (About 642 words)

This blog is the third part of my sum-up of Apache Spark tips learnt from programming and debugging. After being busy for such a long time, I finally have some time to carry on with my personal blog. Today I’ll show you some tips about some functions in Spark SQL, and there may be some mistakes caused by my misunderstanding. Anyway, thanks if you point out anything incorrect!

