Dharmesh Kakadia's Blog

March 4, 2022

Crypto Removes Friction

There has been a lot of debate recently about crypto, either as the solution to all the problems in the world (“Bitcoin fixes this”) or as the biggest Ponzi scheme. Instead of arguing which side is correct, I want to offer an alternative lens: looking at crypto through technological progress.

All technologies remove friction

All technologies are tools for removing friction. Good and bad uses of technology are different sides of the same coin and are entirely dependent on use of technology by the peopl...

Published on March 04, 2022 16:00

March 13, 2021

What I Consumed This Week

I believe in “what you consume consumes you”, so here is my last week’s diet.

# Podcasts & Videos

Tim Urban - Exploring Ourselves on Infinite Loops. Great discussion and frameworks.

Jonathan Neman - Building the Modern Restaurant on Invest Like the Best. Good discussion on traditional businesses embracing tech and the economics of food delivery. Also, check out Doordash and Pizza Arbitrage.

Meghan Oprah Interview.

# Articles

Prediction Markets: Tales from the Election

Do Amazon ads bring in m...
Published on March 13, 2021 16:00

January 19, 2020

Internals of Spark Parser

In this post we will try to demystify the details of the Spark parser and show how we can implement a very simple language using the same parser toolkit that Spark uses.

# Intro

Apache Spark is a widely used analytics and machine learning engine, which you have probably heard of. You can use Spark with various languages - Scala, Java, Python - to perform a wide variety of tasks - streaming, ETL, SQL, ML or graph computations. Spark SQL/dataframe is one of the most popular ways to interact with Spark...

Published on January 19, 2020 16:00

December 26, 2019

Verifying links with Github actions & Awesome Bot

Recently I started using GitHub Actions to automate link checking in all of my awesome repos. I have been using awesome_bot with Travis to validate links and check for duplicates for the past 2+ years. I decided to give GitHub Actions a try with this very simple automation. GitHub Actions is very rich and can automate a lot of chores for developers. There are a number of existing actions available in the GitHub Marketplace. However, I couldn’t find one that allows me to verify links in markdown. So l...

Published on December 26, 2019 16:00

December 4, 2018

Versatile RStudio development environment on Kubernetes

R is a very versatile language for data analysis and is widely used for data science and exploration alongside Python. RStudio is a great IDE for exploring data using R. It has a lot of powerful features for writing and debugging R code, but using it on large data can be challenging due to:

Scalability
Privacy and security of data
Ability to connect R workflows with other tools (Spark, Tensorflow etc.)
Backing up the R code automatically

We solve these challenges by running RStudio o...

Published on December 04, 2018 16:00

December 1, 2018

MXNet tools in docker

How to convert MXNet model to Apple CoreML:

docker run -v "$PWD":/data --rm -it dharmeshkakadia/mxnet-coreml-tools-docker python mxnet_coreml_converter.py

For example, to convert the Squeezenet model to CoreML for use with iOS, run the following from a directory containing the Squeezenet model files (params, symbols, labels). It will generate squeezenetv11.mlmodel in the current directory.

docker run -v "$PWD":/data --rm -it dharmeshkakadia/mxnet-coreml-tools-docker python mxnet_coreml...
Published on December 01, 2018 16:00

March 31, 2018

Review - Are Ideas Getting Harder to Find?

This is a review of a recent paper Are Ideas Getting Harder to Find? by Charles I. Jones. Slides are also available.

The core of the paper answers this question using the following formula:

Economic growth = Research productivity × Number of researchers

The paper presents evidence and arguments that even though economic growth has been relatively stable over the years, there is a clear downward trend in research productivity. This is compensated by more and more people getting...
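As a rough numeric illustration of the formula above, the sketch below solves it for the number of researchers needed to hold growth constant as research productivity falls. The numbers are made-up assumptions for illustration, not figures from the paper, and the function name `researchers_needed` is hypothetical.

```python
def researchers_needed(target_growth: float, research_productivity: float) -> float:
    """Solve growth = productivity * researchers for the number of researchers."""
    if research_productivity <= 0:
        raise ValueError("research productivity must be positive")
    return target_growth / research_productivity


# Illustrative numbers only (assumptions, not data from the paper).
target_growth = 0.02  # hold economic growth at 2% per year
for productivity in (0.002, 0.001, 0.0005):  # productivity halves at each step
    n = researchers_needed(target_growth, productivity)
    print(f"productivity={productivity}: researchers needed = {n:.0f}")
```

Under this simple identity, each halving of research productivity doubles the number of researchers required to keep growth at the same level.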

Published on March 31, 2018 17:00

March 8, 2018

Automate SQL server data pipelines with Kubernetes

Kubernetes provides a great way to run modern infrastructure. SQL Server is a widely deployed database. When you combine the two, you get a robust way of running a data pipeline on a modern platform.

Data pipelines are a large part of all data infrastructure. The need to move data between different systems is almost universal, and the tools and processes that achieve this are generally referred to as a data pipeline. In this post we will see how we can leverage the Kubernetes Jobs API to build and run data pi...

Published on March 08, 2018 16:00

January 9, 2018

Write a Presto query logging plugin

Presto is a fast distributed SQL query engine for big data. I wrote a more introductory, up-and-running post a while back.

Presto users frequently [1, 2, 3, 4] want the ability to log various details regarding queries and execution information from Presto. This is very useful for operationalizing Presto in any organization. Logging query details allows a team to understand the usage of Presto, provide operational analytics and identify performance bottlenecks. If you want to know how to ac...

Published on January 09, 2018 16:00

December 24, 2017

Analyzing Azure Storage Performance

I work on the performance of big data systems at Azure HDInsight, and as part of benchmarking I often need to analyze the performance of cloud storage. The performance of the storage system plays a critical role in the performance of cloud big data systems. Even though there are public benchmarks available for these systems, it's important to measure performance for your workload. In that spirit, we will see how to leverage storage logs for benchmarking your big data workload on Azure ...

Published on December 24, 2017 16:00