Building a Serverless Data Pipeline

How I designed my OpenWhisk-based app to work with data efficiently and cost-effectively

As well as writing simple tutorials for serverless technology, I’ve also been enjoying using it in the applications I build inside the team. Today I thought I’d show you around one of these projects. The code is open source, and it might help to give others some ideas on how they can use this technology themselves.

The data pipeline we built with serverless functions feeds a web dashboard that our team uses to monitor Stack Overflow for tags we’re interested in answering.

To achieve this, we run some search queries against the Stack Overflow API every few minutes and store the results in our own database. If any new questions appear in the search results, a notification goes to a bot to share with a particular Slack channel.

There’s nothing in this project that couldn’t have been achieved with a server-side script in more or less any other technology stack. One reason to choose serverless for a project like this is the billing model: you are charged only for the time the actions actually spend running. Here the code sits idle for most of the time between queries, so that model works out well.

The application itself is pretty simple, which fits well with serverless. It also doesn’t matter if we encounter “cold start” times in serverless. (Cold starts are where an action that hasn’t been used for some time runs more slowly than normal, usually because it has been removed from memory and now has to be reloaded.) That’s because this part of the application is simply moving data around, updating existing records, and creating new ones. It really isn’t critical if we sometimes get the database record or notification half a second later.

Working with FaaS (that’s right — Functions-as-a-Service!) means working with fairly small components. The holy grail is to build a collection of reusable components, which isn’t always possible but is a great thing to aim for! The main points that I look out for when designing for serverless are:

For the Stack Overflow project, I created four components, grouped into two sequences:

The socron sequence:

Next the qhandler sequence operates on each of the question results retrieved:

Serverless functions run in response to an event. In this case, the event is the equivalent of a cron job. The built-in alarm trigger is configured to fire every five minutes and to pass in the tags used in the API call to Stack Overflow.

In fact, we use a whole bunch of these types of triggers on the “real” version of this application, at different frequencies and with different tags. This helps to spread out our API calls and avoid the rate limits on an external API.
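As a rough sketch of what one such trigger's configuration might look like, the helper below builds the parameters for a periodic trigger. It assumes the `cron` and `trigger_payload` parameters of the OpenWhisk alarms feed; the function name `alarmParams`, the tag values, and the schedules are hypothetical and not taken from the project.

```javascript
// Build the parameters for one periodic trigger (illustrative only).
// Assumption: the alarms feed takes a `cron` schedule and a
// `trigger_payload` object that is passed on when the trigger fires.
function alarmParams(tags, everyMinutes) {
  if (!Number.isInteger(everyMinutes) || everyMinutes < 1 || everyMinutes > 59) {
    throw new RangeError('everyMinutes must be an integer from 1 to 59');
  }
  return {
    cron: `*/${everyMinutes} * * * *`,
    trigger_payload: { tags },
  };
}

// Hypothetical tag sets, each on its own schedule to spread out API calls.
const triggers = [
  alarmParams(['openwhisk'], 5),
  alarmParams(['ibm-cloud', 'serverless'], 10),
];
```

Keeping the schedule and the payload together in one small object makes it easy to define several triggers side by side and to see at a glance how often each set of tags is queried.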

The trigger needs a rule to link it to the action or sequence that should be run:

When a rule has been changed it becomes disabled, so the code sample above includes the command to re-enable the rule.

Since the API call is asynchronous, the action wraps the request in a JavaScript Promise and returns it. When the promise settles, it is either rejected with an error or (hopefully) resolved with the data it fetched.
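The Promise pattern described here can be sketched roughly as follows. This is not the project's actual code: `makeCollector` and `fetchQuestions` are hypothetical names, and the fetcher is passed in as a stand-in for the real HTTP call so that the example runs without network access.

```javascript
// Wrap a callback-style API call in a Promise, the way an action's
// `main` function would. `fetchQuestions(tags, callback)` is a
// hypothetical stand-in for the real call to the Stack Overflow API.
function makeCollector(fetchQuestions) {
  return function main(params) {
    return new Promise((resolve, reject) => {
      fetchQuestions(params.tags, (err, questions) => {
        if (err) {
          // Rejection signals failure to whoever invoked the action.
          reject(new Error(`Stack Overflow request failed: ${err.message}`));
          return;
        }
        // The resolved object becomes the action's output.
        resolve({ questions });
      });
    });
  };
}

// Usage with a fake fetcher that returns one made-up question.
const main = makeCollector((tags, callback) => {
  callback(null, [{ question_id: 1, title: `Example question about ${tags}` }]);
});
main({ tags: 'openwhisk' }).then((result) => console.log(result.questions.length));
```

Injecting the fetcher is a choice made for this sketch only; it keeps the Promise handling separate from the HTTP details and makes the action easy to exercise locally.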

The output of this action is a data structure including a list of questions. This becomes the input to the next action.

A sequence is a chain of actions, each passing its output to the next. In this case, though, the next action needs to run once per data item rather than once for the whole list.
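One way to get a separate invocation per question is a small fan-out step, sketched below. The names `fanOut` and `invoke` are hypothetical. Inside OpenWhisk, `invoke` could be backed by the `actions.invoke` method of the openwhisk npm client, but that is an assumption for illustration rather than a description of the project's code.

```javascript
// Invoke a handler once per question instead of once per batch.
// `invoke({ name, params })` is a hypothetical function that returns
// a Promise for a single invocation.
function fanOut(invoke, actionName, questions) {
  const invocations = questions.map((question) =>
    invoke({ name: actionName, params: { question } })
  );
  // Resolves once every per-question invocation has completed.
  return Promise.all(invocations).then((results) => ({ invoked: results.length }));
}

// Usage with a stub that records calls instead of contacting a platform.
const calls = [];
const stubInvoke = (options) => {
  calls.push(options);
  return Promise.resolve({ activationId: `fake-${calls.length}` });
};
fanOut(stubInvoke, 'qhandler', [{ question_id: 1 }, { question_id: 2 }])
  .then((summary) => console.log(summary)); // logs { invoked: 2 }
```

Note that `Promise.all` rejects as soon as any single invocation fails, so a real pipeline may want to catch errors per question if one bad record should not stop the rest.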

This is the only component that needs to hit the database. The data that was written is also passed along to the final piece in the puzzle: the notifier.

A webhook is simply a POST request, and the first action had an API call in it, so this part probably isn’t a surprise! The request library is used here:
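For illustration, a notifier along these lines might look like the sketch below. It uses Node's built-in `https` module rather than the request library, purely to stay dependency-free. The function names, the `params.webhook` parameter, and the assumption that each question record carries `title` and `link` fields are all hypothetical.

```javascript
const https = require('https');

// Turn one question into a webhook payload. The `title` and `link`
// fields are assumed to be present on the record passed along by the
// previous action; `{ text }` is a simple message body.
function buildNotification(question) {
  return { text: `New question: ${question.title} ${question.link}` };
}

// POST a JSON payload to a webhook URL and resolve with the status code.
function postJson(webhookUrl, payload) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify(payload);
    const req = https.request(webhookUrl, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    }, (res) => {
      res.resume(); // discard the response body
      res.on('end', () => resolve({ statusCode: res.statusCode }));
    });
    req.on('error', reject);
    req.end(body);
  });
}

// Hypothetical action entry point; `params.webhook` is an assumed name.
function notify(params) {
  return postJson(params.webhook, buildNotification(params.question));
}
```

Splitting the payload construction from the HTTP call means the message format can be checked without sending anything, for example `buildNotification({ title: 'T', link: 'https://example.com/q/1' })`.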

There you have it: one working data pipeline, consisting of four moving parts.

Working with data in this way is a great fit for serverless. The scalable nature of the compute fits well with the distributed approaches that are already widely used with big data (think of MapReduce, for instance). The cost model makes it viable to use a platform like this on an as-needed basis, for example when importing or cleaning up a large dataset. And crucially, the technical barriers to entry are low. Most of my team are JavaScript developers, but IBM Cloud Functions also supports Python, Java, and Swift as first-class languages, so developers of all stripes can be up and running very quickly on this platform.

With this post I aimed not only to show the detail of the code but also to give a sense of how manageable serverless platforms are to develop for. Now I’m hoping that you’ll build something of your own, and let me know what you choose!
