Spark Streaming With Python and Kafka
Last week I wrote about using PySpark with Cassandra, showing how we can take tables out of Cassandra and easily apply arbitrary filters using DataFrames. This is great if you want to do exploratory …
A few months ago I wrote a post on Getting Started with Cassandra and Spark.
I’ve worked with Pandas for some small personal projects and found it very useful. The key feature is the data …
Just wanted to let everyone know I’m going to be doing a Google Hangout on Air on Thursday, 2 PM PT / 5 PM ET, on Python Performance Profiling. I’m going to be covering several tools and …
I’ve been messing with Apache Spark quite a bit lately. If you aren’t familiar, Spark is a general-purpose engine for large-scale data processing. Initially it comes across as simply a …
The webinar from Nov 18, Diagnosing Problems in Production, has been posted to YouTube. I’ve embedded it at the bottom of this post.
The webinar is an extended version of the talk I gave at the …
Yesterday I was pulling down some stock data from Yahoo, with the goal of building out a machine learning training set using Spark and Cassandra. If you haven’t tried Cassandra yet, it’s …
When I moved out of my last place I decided it was time for a grown-up desk. I left behind a beat-down Ikea that I had used for close to a decade; I think it has more than served its purpose. …
It’s important to be able to maximize turnover and confusion while minimizing employee retention. This is by no means an exhaustive list, but it will, without a doubt, be successful, unlike your …
Last week at the Cassandra Summit I gave a talk with Blake Eggleston on diagnosing performance problems in production. We spoke to about 300 people for about 25 minutes followed by a …