What I was after was to see how easy it is to write SQL in Spark-SQL. In this micro-post I will show you how easy it is to run SQL queries against a JSON file.
For my experiment I will use my chrome_history.json file, which you can download from your Chrome browser using the www.JSON-XLS.com extension. To run the SQL query with PySpark on my laptop I will use the PyCharm IDE. After a little bit of configuration in PyCharm (setting up the environment variable SPARK_HOME), there it is: it only takes 3 lines to run a SQL query against a JSON document in Spark-SQL.
Think of the possibilities with SQL: the 'cluster' partitioning and parallelisation you can achieve.
Links:
Apache Spark: https://spark.apache.org/downloads.html
PyCharm: https://www.jetbrains.com/pycharm/