In this article, I want to share how we at StockTwits overhauled our message sharing system, taking us from frequent downtime and general slowness to lightning-fast requests and very happy users, all while allowing us to continue to scale as traffic increases. StockTwits is the largest social network dedicated to the finance community, with 1.5 million monthly active visitors. One of our primary features is the ability to view chronological streams of message posts based on specific filters.

More specifically, we need the ability to query our collection of posts and get only the posts that have a "cashtag" mentioned in them ($AAPL, $GOOG, etc.). We also need the ability to query for posts from the users that a specific user is following. This needs to be done quickly and at scale, regardless of how much our traffic grows.

The initial architecture for this use case involved heavy database queries on MySQL and a lot of complex caching with Redis. Imagine you are a user and every stream you want to look at is an individual set in Redis. If you have 100,000 followers, every time you post a message, that message has to be inserted into 100,000 individual sets in Redis so your followers can see your posts. This worked in the beginning, but it was no longer scalable as our user base continued to grow. As we added more streams, the complexity snowballed: more code, more maintenance, more points of failure. We were recently forced to rethink how our entire system was architected, and we realized it was a perfect use case for Elasticsearch.

We created two indexes: one called "messages," which has a body and a user_id field, and another called "friendships," which has a following field containing an array of the user ids a particular user is following.

user_id: id of the user who shared the message
following: array of user ids which the user is following

Since each document in Elasticsearch has an _id value, we can assign each message document the id value we have in our MySQL database for easier reference in the future. Each friendship document is better suited to using the user_id value from our database as its _id value.

This is pretty straightforward, but how do you query these indexes to get the stream of messages that a single user is following? Take a user whose user_id is 123, requesting his stream of messages from the users he follows. All we have to do is query the messages index with a GET request to the URL with the JSON query attached:

Let's take a deeper look at how this query works. We are asking our messages index to give us a filtered set of messages in which the user_id field must be equal to any of the ids that user 123 is following, similar to an IN query in SQL. Elasticsearch knows the set of following user ids for user 123 because we have specifically told it to route through the friendships index and use the following field for that set. The size field limits the result set Elasticsearch gives back, and the sort field says we want the results in descending order.

This is pretty straightforward from the outside, but there is some caching magic Elasticsearch does under the hood. Before Elasticsearch 2.0, it was up to the developer to provide caching logic for these queries. For example, if we wanted to cache the following ids that this query uses, we used to have to provide a _cache_key field with the name of the key and expire that cache manually whenever the list changed. The problem was that there was a point at which we would be over-caching and taking a performance hit, while thinking our efforts would net a performance gain. Since there is no easy way for the developer to know when they are over-caching, Elasticsearch has taken that burden off of the developer by making sure that only reused filters are cached. This is why it's important to stay up to date on Elasticsearch versions and to utilize deprecation logging, as efficiency upgrades are happening rapidly.

As you can see, the query DSL is powerful, and most filter queries will be much more trivial than this one. It's mostly a matter of structuring the query and setting up the indexes accordingly. The above approach works well for us, but we noticed our queries getting slower as our index grew, regardless of how many nodes we scaled out to. Even though this wasn't causing any problems yet, at 80,000 posts per day and growing, we knew it eventually would.
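The stream query described above can be sketched as a pre-2.0 terms-lookup filter, built here as a Python dict. The index names, the user_id and following fields, user 123, and the _cache_key mechanism come from the text; the type name, the cache key name, the size value, and the sort field are assumptions for illustration.

```python
# Hypothetical sketch of the stream query, using pre-2.0 Elasticsearch
# "filtered" query syntax with a terms lookup. Sent as the body of a
# GET request to the messages index's _search endpoint.
stream_query = {
    "query": {
        "filtered": {  # pre-2.0 filtered-query syntax
            "filter": {
                "terms": {
                    "user_id": {
                        # Terms lookup: pull the id list from the
                        # friendships document whose _id is 123.
                        "index": "friendships",
                        "type": "friendship",   # assumed type name
                        "id": "123",
                        "path": "following",
                        # Pre-2.0 manual cache handle the article mentions;
                        # the key name itself is hypothetical.
                        "_cache_key": "friendships_123",
                    }
                }
            }
        }
    },
    "size": 50,                           # limit the result set (assumed value)
    "sort": [{"id": {"order": "desc"}}],  # newest first (assumed sort field)
}
```

From 2.0 onward the _cache_key line goes away: Elasticsearch caches frequently reused filters on its own, so the manual expiry step described above is no longer needed.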
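As a rough sketch, a document in each of the two indexes might look like the following, built here as Python dicts. The field names, the index roles, and the reuse of MySQL ids as _id values come from the text; the specific id values and the message body are invented for illustration.

```python
# Hypothetical example documents for the "messages" and "friendships" indexes.
# Field names follow the article; ids and body text are made up.

# "messages" index: the _id reuses the MySQL message id for easy reference.
message_doc = {
    "_id": 42,                    # MySQL message id, reused as the _id
    "body": "Feeling bullish on $AAPL today",
    "user_id": 123,               # id of the user who shared the message
}

# "friendships" index: the _id reuses the MySQL user id, so user 123's
# follow list can be fetched directly by id.
friendship_doc = {
    "_id": 123,                   # MySQL user id, reused as the _id
    "following": [77, 456, 789],  # user ids this user is following
}
```

With this shape, fetching a user's follow list is a single get by _id on the friendships index, which is what makes the stream query's lookup cheap.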