Boring is Awesome: Exploring Manufacturing Databases with James Sewell
Join Jeremy Theocharis of United Manufacturing Hub in a riveting talk with Timescale's James Sewell. Discover key database trends, TimescaleDB's role in industrial data, and the value of proven tech in manufacturing.
In an enlightening discussion, Jeremy Theocharis, CTO of United Manufacturing Hub, had the pleasure of conversing with James Sewell, a seasoned expert in the realm of databases, currently making strides at Timescale. This interview was not just a dialogue but a journey through the dynamic and evolving landscape of database technology in the industrial sector. We delved into the depths of database management, unearthing valuable insights and expert perspectives.
- Emerging Trends in Databases: James offered his take on the latest trends influencing the database world across industries, particularly focusing on the shift towards cloud-based solutions and the nuanced debate around serverless technology.
- Programming Languages in Action: We discussed the practical applications of programming languages like Golang and Rust at Timescale and their relevance within UMH, underscoring the importance of choosing the right tools for the task.
- The Role of Developer Advocacy: James shed light on his day-to-day responsibilities as the Senior Director of Developer Advocacy at Timescale, emphasizing the importance of community engagement and developer support.
- Simplicity in Database Selection: One of the key takeaways from our conversation was the strategic importance of simplicity in selecting databases for manufacturing needs. Sometimes, the most straightforward solution is the most effective.
- TimescaleDB's Industrial Application: We explored the role of TimescaleDB in managing large-scale time-series data within industrial settings, praising its compatibility with PostgreSQL and its modern approach as an alternative to traditional historian databases.
- Efficiency and Cost Savings: James highlighted how TimescaleDB stands out in terms of efficiency and cost savings, offering a compelling option for industries looking to modernize their data management systems.
Jeremy: Hi, this is Jeremy, co-founder and CTO of the United Manufacturing Hub. We're here with James Sewell, Senior Director of Developer Advocacy at Timescale. We're going to explore his expertise in databases, his role at Timescale, and his insights into the broader IT industry. Hi, James.
James: Hey, how's it going? Good to be here.
Jeremy: Yeah, nice that you're here. So maybe to get started, what do you do outside of work? What are some of your hobbies and interests?
James: Look, I'd love to say that I've got a super interesting hobby or something that keeps me really busy when I'm not working. But in all honesty, I've got a five-year-old. Keeping up with him and doing the stuff he's into consumes a lot of my time outside of work. And he just found out about Beyblades, which I knew zero about previously.
Jeremy: They still exist?
James: They still exist, and when you're a five-year-old, you have no concept of that being a previous thing. So, I spend a lot of time battling plastic spinning tops at the moment and being alarmed at the speed of time.
Jeremy: Well, I didn't know that the Beyblade thing still exists. I thought it was just around for a short period.
James: No, it still exists, and it's brand new when you're a five-year-old.
Jeremy: Okay, let's get more into your professional journey. I looked you up, and you have, if I calculated it correctly, over 25 years of work experience in the field of databases and administration. So, I assume there must be some really fascinating stories that stand out. Could you share them with our viewers and listeners?
James: Yeah, I've got a couple. I've predominantly worked with Postgres in my professional life. That's the main database I've worked with. And I suppose the classic scenario that happens time and time again with lots of different customers in Postgres is you get someone who comes to you with a one or two terabyte database. You think, well, that's kind of weird, that doesn't match the amount of data they're talking about. And it turns out they've turned off the housekeeping, the vacuuming process, and the database just keeps on growing and growing. I saw that a couple of times. The case with the biggest differential was a database that was actually built into an appliance, and it just kept growing. When we shrank it down, they actually had about 10 gigabytes of data, but their queries had been slowing down over a period of about five years. And I suppose the other one that comes up quite often is people who've got really large datasets, but they never really thought it was going to be a large dataset. So they've never done any partitioning, they've never tuned the schema, and then their database slows down to a halt. And that's what we used to get called in for at my old company: what's going wrong? Why are my queries really slow? It's quite interesting, looking back on all of those kinds of things and thinking about how Timescale could actually be used to solve those problems. The other big one is people who just have no idea about operationalizing a database. So they put a database in, you'll come in, they'll give you a presentation on their application. It'll be amazing, all of the newest technologies, but the database is just this thing they keep in the corner. No one really knows how it works. They hope they've got backups. No one's tested them. They might have HA that worked when they did a test when they first deployed it. I think that kind of thing, those are the stories that stand out.
And I would love to say stuff like that is getting better in the database world, but I think it's probably actually getting worse, because with the advent of DevOps there are fewer and fewer DBAs in the world and more people who are really solid generalists but don't necessarily specialize in databases.
Jeremy: Who were those companies that you worked for? What industries were they in? Was it like manufacturing or was it like banking? Why would they pull in a database specialist? How large would they have to be to pull in a database specialist?
James: Yes. I mean, this was across many industries, and it was in Australia at the time. I worked with manufacturing companies, did a lot of telco work, so a lot of telecommunications kind of stuff, and worked with a lot of banking companies. I can't reveal specific companies, obviously, because they'd be very upset about that, but yeah, it was primarily manufacturing, telco, and banking.
Jeremy: So you have experience in a large set of industries. What are some of the trends that you see currently going on in the IT or database industry? How is it currently moving? What's currently happening there?
James: Yeah, across all industries. Look, I think the big trend at the moment, which obviously has been happening for a long time now, is that everyone still wants to move everything to the cloud. So, putting databases in the cloud is very topical. Serverless is still a really hot topic, even though it's fairly polarizing. Some people are very pro serverless, some people are not. And I think a really topical thing is mapping that back to databases, trying to work out what serverless might look like in a database world. Personally, I don't think it maps to databases very well because you don't really want to scale your production workload down to zero. But you obviously need to have some sort of dynamic quality. People want to be able to pay less when they're doing less and scale up when they're doing more. We're really going in that direction with Timescale at the moment with our cloud products, we're sort of all in. And so far developers have seemed to be really receptive to that kind of thing. It's not serverless, but it's got the facets of serverless that you're looking for. And I think another one in the database industry is specialization. I feel there's a database for everything now. Oh yeah. No matter how small your niche is, there's a database for it. We're a little bit different there. We don't actually think you need a database for everything. You just need one really good database that can handle many workloads. The amount of additional cognitive and operational complexity that goes into some of those databases is immense. They might scale out very well, but they also scale out the complexity of your stack, and often you see that they come from, like, a spin-off of a project at LinkedIn or Google or something like that, which is really nice on paper, but most companies don't need that. They don't have that kind of scale. People don't think about whether that's something they need to do or not.
They're trying to optimize really early by putting in this piece of technology that does a lot more than they need, and they're really trading their current resources for potential future benefits that may or may not show up. So that's the thing at the moment. I think specialization of databases is a big thing, and my message there would be: develop the application, don't develop technical debt.
Jeremy: And now getting back to the topic again, you are now working at Timescale. So with your 25 years of experience in databases, what brought you now to Timescale?
James: Yeah, so I've been working with Postgres for basically all of that time, 20 years, maybe slightly less. Before I left, I'd been at my previous job for 20 years. So a long time spent with one employer. I was the chief architect at a consultancy company in Sydney. I loved the job. I loved the technology I was working with, predominantly Postgres, Kubernetes, Rust. But I wasn't sort of driving change in those areas. I was just helping people with problems, which is fulfilling, but I wanted to get onto the product side and see what I could do. So, I always like to keep a list of companies that I'm sort of interested in, that are up and coming, that I could see myself working for. Timescale was top of that list. They were always in the PostgreSQL space, which I loved. I'd been using them from day zero on side projects and then putting them into customer workloads. Then they built a Kubernetes cloud and then they started developing hyperfunctions in Rust. So it was just a really nice parallel to what I was doing in my professional journey. I emailed Mike, our CTO, and said, what have you got going? And I think I just happened to do it around about the time when they did have a bunch of job openings, and the rest is history.
Jeremy: So the hyperfunctions are actually developed in Rust?
James: Our hyperfunctions are written in Rust, yeah.
Jeremy: Oh, nice. Yeah, we're typically a Golang company because it's much easier to learn for new developers, or for people coming in from an entirely different field. And the learning curve for Rust is just really high. When you start doing it, it's like running against a wall, and the wall never stops. After a year I'm not through it yet, but apparently you start to really love the language, because all the people I've heard from who program in Rust are really, really excited about it.
James: I used to develop in Python before Rust. And I suppose I was more at a stage in my development career where I would write... You can make an argument either way, but in reality, they were more like throwaway things or maybe a simple daemon or something like that. And for me, once I worked out how to use Rust, I think it just came at a time when it really revolutionized the way I think about programming, especially in terms of handling errors and stuff like that. I probably could have got that just from taking a step back and thinking about Python more, but now I look at all the old stuff I've written, and I'm just like... Oh my gosh, I didn't think about any of these edge cases. I have no idea what's going to happen when this fails. I have no idea what these types are, even though I'm comparing them to each other. Rust is really... I know it's not for everyone, and I know it's got a steep learning curve, but it just really suits the way I think. And it really helps that you can integrate Rust directly into Postgres because obviously, Postgres is in C, and Rust is really good at talking across the FFI boundary.
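To make James's point about error handling and types concrete, here is a minimal Rust sketch. It is an illustration added to this write-up, not code from Timescale or UMH: the function `parse_temperature`, the `ReadingError` enum, and the accepted range are all hypothetical. What it shows is that every failure mode is named in the return type, so the caller has to decide what happens for each edge case.

```rust
/// Hypothetical error type: each way a sensor reading can be rejected is spelled out.
#[derive(Debug, PartialEq)]
enum ReadingError {
    Empty,
    NotANumber(String),
    OutOfRange(f64),
}

/// Parses a temperature reading in degrees Celsius.
/// The accepted range is an assumption made for this example.
fn parse_temperature(raw: &str) -> Result<f64, ReadingError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(ReadingError::Empty);
    }
    // `parse` returns a Result, so the failure has to be handled explicitly.
    let value = trimmed
        .parse::<f64>()
        .map_err(|_| ReadingError::NotANumber(trimmed.to_string()))?;
    // NaN and infinities fail this check too, because they are not inside the range.
    if !(-273.15..=1000.0).contains(&value) {
        return Err(ReadingError::OutOfRange(value));
    }
    Ok(value)
}

fn main() {
    for raw in ["21.5", "", "abc", "5000"].iter() {
        // The compiler forces both arms to be covered.
        match parse_temperature(raw) {
            Ok(value) => println!("accepted {:?}: {} degrees", raw, value),
            Err(err) => println!("rejected {:?}: {:?}", raw, err),
        }
    }
}
```

The equivalent quick script in a dynamically typed language would often just convert the string and compare it, which is where the unconsidered edge cases James mentions tend to hide.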
Jeremy: Okay. Getting back to the topic. You started as a developer advocate and now you are Senior Director of Developer Advocacy at Timescale. So what does it mean? What is it that you do in your daily work? Because every time I ask you, you seem to be in a different place on earth. What is it that makes it so dynamic?
James: So I kind of feel like a fraud answering this question. I obviously run the developer advocacy team, but it's my first role under that umbrella. Before, I was a chief architect, and then I came here, and all of a sudden, I was in a marketing team. I report to the VP, actually, the Chief Marketing Officer now. So, developer advocacy, from my perspective, in a nutshell... We meet developers where they live and help them get to know Timescale. We want to get them excited about the product, show them what the product can do, but we do it on a very personal level. So, we speak at conferences, write blog posts, answer questions in the community, answer questions in Slack. We're part of the open-source community and constantly gather feedback from developers, bring it back to engineering and product, and then iterate. So, greater marketing talks to developers with the Timescale brand voice, whereas developer advocates talk to developers personally with their own voice. We want to understand what developers are having trouble with, be part of their lives, and make sure that our product going forward has input from those developers. So, we're kind of like their voice back into the company as well.
Jeremy: So now, to get back to the technical field. One of the books I read and that personally had a huge influence on me is "Designing Data-Intensive Applications." Do you know it?
James: Yeah, I do.
Jeremy: I'm asking because it personally helped me a lot to get an understanding of what's actually behind databases, how they work in the background, what the tools and techniques are, not brands, that power them, how new they are, how proven they are. But I can understand, it's a book of 600 pages or more, and not everyone has the time to read that. It took me like a year. So maybe, in your own words, what are some general concepts and strategies that you could give to someone who is new to this field, coming from automation for example, so that they can filter out whether a solution is good or not? Is it just fancy material or is it fundamentally sound and based on well-established best practices?
James: I'm not sure if this is the direction you're trying to take this question, but some very pragmatic advice I would give to people is to just make sure you come up with your requirements. Then evaluate the databases that you think might fit those requirements with a very clear sheet. As you said, there's a lot of hype out there. There are a lot of historical products out there. But you should really be looking at what each solution gives you and whether it can deliver on those things. Do you need each one of those things that it's going to deliver to you right now? And what new technologies or languages or frameworks does your team need to then learn in order to use this new technology you're bringing in? So, there might be the best database in the world, but if it's got some esoteric language that's going to take your developers a year to learn, it's just not pragmatic. I don't think anyone should be choosing a database based on any sort of absolute value. So, the absolute fastest, the absolute lowest storage. That's not how this world works. There's never going to be a database that outperforms everything else by so much that that's going to become a factor. There are going to be a lot of contenders. You should be looking at whether it's more cost-efficient, more performant, more stable, how long it's been around for, how you get support for it, how many developers are out there that use it, what kind of language it uses, whether that language talks to the other products you've got in your stack, all of that kind of stuff, which is quite boring. These kinds of evaluations are not an exciting thing. It's just making sure that when you tie yourself to something, you can be happy with that decision and it's not going to pull you down. You can create some insanely high-performing complex solutions, and similarly, you can get some insanely expensive ones, but you really want to ride the middle ground.
So you want something that's probably cheaper than the more expensive ones, and more stable. Often, the middle ground is more performant than you think. So what I'm trying to say is, you don't need to get absolute performance. You don't need to pay mega bucks. You need to find something that's in the middle. I think it's a very good approach to first write down the requirements and not let yourself be guided by all this hype. Like, you need a graph database because it's super hyped and it can do a few very specific things very well. I like to try to be conservative also in terms of the number of technologies that you use. That should be the first question. Do I need what this thing is selling me? Can I just model my API using REST? Do I need a graph database? Perhaps you need a graph database. Do I need the absolute best graph database out there that's going to be really complex to run? Or can I get away with bolting a graph database API on top of my existing database? And you can answer that question again and again with Postgres, but it obviously depends on what your tolerances are. I mean, there are other databases where you can do that as well. I think you can get a very long way with a relational database. And I think lots of companies don't need to look beyond that.
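One way to turn the "very clear sheet" James describes into something you can compare is a weighted score per candidate. The sketch below is an assumption about how such a sheet could look, not a method James prescribes: the `Criterion` struct, the weights, and the scores for the two unnamed candidates are all made up for illustration.

```rust
/// One row of a hypothetical evaluation sheet: a requirement and how much it matters.
struct Criterion {
    name: &'static str,
    weight: u32, // relative importance, chosen by the team
}

/// Weighted total for one candidate; `scores[i]` (0 to 5) rates it on `criteria[i]`.
/// Returns None when the scores do not line up with the sheet.
fn weighted_total(criteria: &[Criterion], scores: &[u32]) -> Option<u32> {
    if criteria.len() != scores.len() {
        return None;
    }
    Some(criteria.iter().zip(scores).map(|(c, s)| c.weight * s).sum())
}

fn main() {
    // Raw peak performance is deliberately weighted low, following the advice
    // not to choose on any single absolute value.
    let criteria = [
        Criterion { name: "cost efficiency", weight: 3 },
        Criterion { name: "stability and track record", weight: 3 },
        Criterion { name: "team already knows the query language", weight: 2 },
        Criterion { name: "raw peak performance", weight: 1 },
    ];
    // Made-up scores for two unnamed candidates.
    let candidate_a: [u32; 4] = [4, 5, 5, 3];
    let candidate_b: [u32; 4] = [2, 3, 1, 5];

    for c in criteria.iter() {
        println!("{} (weight {})", c.name, c.weight);
    }
    println!("candidate A: {:?}", weighted_total(&criteria, &candidate_a));
    println!("candidate B: {:?}", weighted_total(&criteria, &candidate_b));
}
```

The numbers matter less than the exercise: writing the weights down first makes it harder for a single headline benchmark to decide the outcome.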
Jeremy: So now, all of the principles that we talked about, could you apply them to Timescale? I think most of our viewers and listeners are already familiar with Timescale because I've written a lot of blog articles about it, and they might know it from using the United Manufacturing Hub. But maybe in your own words, what is it that you offer? And where is your sweet spot? Where would you say it really fits well into a company's architecture?
James: Yeah. So Timescale's goal in life as a company is to give developers a modern, friendly cloud database that not only performs well but scales well. We're based on Postgres, which is proven, I mean, some would say boring, stable technology. It's been around for a very long time. Our sweet spot is excelling at any workload which has a sort of high volume of data. So that could be sensors or measurements, could be stock prices, could be event streams, anything where time's important, I suppose, and you've got data coming in. That's our sweet spot at the moment. So we make that data easy to ingest, easy to manage, easy to query. And then we also help with analyzing that data and sort of life cycle management. So deciding how long you want to keep the data for, when you want to compress it, when you want to send it off to S3, all of that kind of stuff. I know I just mentioned this, but we're based on Postgres. We're not a fork of Postgres. We actually sit on top of the open source Postgres project. And we're unashamedly anti-silo. So we don't think you need another stack with another database technology to do something that Postgres can do, especially if you've already got relational databases in your stack, which most people do. You sort of just need a way to get the most out of the most popular database on the planet, which is Postgres now, according to Stack Overflow. So you don't need silos. SQL, we believe, is the best language for querying most data out there. And we believe that you should be able to join between all of your data.
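The life cycle decisions James lists (how long to keep data, when to compress it, when to send it off to S3) boil down to age thresholds. The following Rust sketch only illustrates that decision logic. It is not Timescale's implementation, and in a real deployment such rules would be configured in the database rather than written in application code. The type names and the 7, 90, and 730 day thresholds are assumptions.

```rust
/// Hypothetical life cycle policy; all thresholds are ages in days.
struct LifecyclePolicy {
    compress_after_days: u32,
    tier_after_days: u32,
    drop_after_days: u32,
}

/// Where a piece of time-series data lives at a given age.
#[derive(Debug, PartialEq)]
enum Stage {
    Hot,        // recent, uncompressed, fastest to query
    Compressed, // older, compressed in the database
    Tiered,     // archival copy in object storage such as S3
    Dropped,    // past the retention period
}

/// Picks the stage for data of the given age, checking the oldest threshold first.
fn stage_for(age_days: u32, policy: &LifecyclePolicy) -> Stage {
    if age_days >= policy.drop_after_days {
        Stage::Dropped
    } else if age_days >= policy.tier_after_days {
        Stage::Tiered
    } else if age_days >= policy.compress_after_days {
        Stage::Compressed
    } else {
        Stage::Hot
    }
}

fn main() {
    // Example thresholds only; pick values that match your own retention requirements.
    let policy = LifecyclePolicy {
        compress_after_days: 7,
        tier_after_days: 90,
        drop_after_days: 730,
    };
    for age in [1u32, 30, 365, 1000].iter() {
        println!("{} days old -> {:?}", age, stage_for(*age, &policy));
    }
}
```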
Jeremy: And what about other solutions? I know that in manufacturing, the databases are very often 90s technologies or maybe even... self-written databases. I've seen really a lot of stuff that's happening there. And what would be reasons to use something different to it? I mean, I've seen it. What's really best practice, for example, in pharmaceuticals or oil and gas? The big production lines or big parts of a production that are delivered together with SCADA systems, they actually store everything in Microsoft SQL databases. Or similar technologies.
James: It's funny, historians are almost the opposite of what I was talking about before, right? I was talking about specialization becoming very common. Historians are something that's been specialized from the start. So they're like the inverse of what I was talking about, but the inverse message remains true as well. Why would you be paying a lot of money for this product if a commodity database can do it? I know you've written a lot of articles on this. In most cases, we find that the commodity database can compress just as effectively as a historian. In a lot of cases, it compresses without loss, whereas historians actually lose data when they compress a lot of the time. You can also connect to a lot of other things. Historians are quite limited in terms of what you can get in and out, because they're this, I suppose, proprietary bolt-on layer on top of SQL Server or something else. And their whole game is to not let you get your data out, to not interact with other things. They don't need to do that. They've got the market, right? So I think being able to then connect all of your time series data through to any other product, be it OT or IT, is just a massive boon.
Jeremy: So we're including Timescale by default in the United Manufacturing Hub. So most of the audience would already know it, but when would it be beneficial for them to actually upgrade to your cloud offering? You said that you are cloud only and most historians are, I would say, on-premise only. So what would be reasons for companies to change that?
James: So, for a start, United Manufacturing Hub includes TimescaleDB, which is the open-source product that's sort of at the core of Timescale's offering. We built the Timescale cloud product around TimescaleDB, and now it includes a lot of other features that aren't included in TimescaleDB. You might want to start thinking about a cloud product for the reasons we talked about before. When the operational complexity of running a database becomes too high, then you might want to look to a cloud vendor to run that for you, like Timescale. If you want to make sure that you've got backups, high availability, read replicas, connection pools, all of those things set up with a single click. If you want to be able to look at query stats, all the kind of stuff that you would possibly be able to do by yourself, but that would be quite a lift, and you'd have to employ operations people to do that. Obviously a lot of it depends on what your company's cloud policy is. Some companies in manufacturing aren't allowed to send data to the cloud. That's fine. Often we see people wanting to send their final data or their archival copy to the cloud. That's also fine, you can do that. So I suppose it really comes down to how your particular company is able to use a cloud product. But if you can use a cloud product, using something like Timescale gives you a lot of benefits. I don't want to run through all of our features because that's not really what this is about. But we've got stuff like usage-based storage. You just pay for the storage you're using, so you don't have to overallocate 20% in case you run out of disk. You don't even have the concept of volume size anymore. You just know that you'll get charged for the amount of storage you use. We have data tiering, so you can send archival data off to Amazon S3. But then the good thing about our data tiering feature is you can still query it transparently. So you don't even have to change the query.
The data is always there. It's just slower when it comes back from your archive. We're pretty soon going to have dynamic compute, which we talked about before, where you can have a range of instance sizes almost. So you have, say, a two CPU to eight CPU or two CPU to four CPU machine, and you'll be able to float around within that range. So you won't scale to zero, but you also know that you've got headroom when you need it. And you're certainly not paying top dollar when you're sitting a bit more idle. And I suppose support: on top of that, we've got an amazing support team. If your organization doesn't specialize in running databases, then our support team will definitely be able to help you with getting into Timescale and running Timescale effectively. We answer architecture questions. We answer schema questions. We're not strictly a break-fix kind of support platform.
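Two of the ideas James describes here are simple enough to sketch in a few lines of Rust: a compute range that floats between a floor and a ceiling without ever reaching zero, and the difference between paying for used storage and provisioning a volume with 20% headroom. This is an illustrative model only. It does not describe Timescale's actual scaling logic or pricing, and `ComputeRange`, `allot`, and `provisioned_gb` are hypothetical names.

```rust
/// Hypothetical compute range, for example 2 to 8 CPUs.
struct ComputeRange {
    min_cpus: u32,
    max_cpus: u32,
}

impl ComputeRange {
    /// CPUs allotted for a given demand: never below the floor (so never zero),
    /// never above the ceiling.
    fn allot(&self, demanded_cpus: u32) -> u32 {
        demanded_cpus.clamp(self.min_cpus, self.max_cpus)
    }
}

/// Storage in GB you would provision up front if you add a safety margin.
fn provisioned_gb(used_gb: u64, headroom_percent: u64) -> u64 {
    used_gb * (100 + headroom_percent) / 100
}

fn main() {
    let range = ComputeRange { min_cpus: 2, max_cpus: 8 };
    for demand in [0u32, 3, 12].iter() {
        println!("demand {} -> {} CPUs", demand, range.allot(*demand));
    }

    // With usage-based storage you are billed for what you store;
    // with a fixed volume you size for usage plus headroom.
    let used_gb = 500;
    println!(
        "usage-based: {} GB, fixed volume with 20% headroom: {} GB",
        used_gb,
        provisioned_gb(used_gb, 20)
    );
}
```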
Jeremy: So, trying to summarize: if you have a good internet connection, you want to store data from multiple production sites and make it accessible, and you have a lot of operational overhead going on there, then it could be a way of making a trade-off. Maybe you spend a little bit more money on cloud, but then you save a lot of time on maintenance, because you reduce the effort of the entire data storage thing. What about companies that are cloud-native, but they want to run it in their cloud, their private cloud, whatever that means? How do you typically set that up?
James: Yes. We don't actually have a bring-your-own-cloud option today. So we are just our cloud product. We can do VPC peering to your cloud. What does that mean? It means that you have a secure tunnel between our cloud and your cloud, and then you store the data in Timescale, but it's not going out over the public internet to get to your operation. That's quite popular, quite a few people use that. But we don't have a bring-your-own-cloud in the strictest sense, where we operate our infrastructure within your cloud.
Jeremy: All right. I think this was a really interesting conversation where we looked at where you're coming from as a database expert, what brought you to Timescale, what Timescale can provide, and when you should use it and when you shouldn't. I hope our listeners and viewers liked it as well. And if you have any questions, feel free to put them in the comments, wherever you're listening or watching this. Feel free also to go into our Discord channel and ask questions there or contact James directly. Maybe as a last question, what would be your recommendation for listeners and viewers who are currently hearing this and want to get started with Timescale? What would be the next step for them?
James: Obviously, you can have a look at our website. If you want something a bit more interactive, which I would encourage, then you can join our Timescale Slack. We have a Timescale Slack where you can join and talk to the developer advocates, ask questions, anything goes. We can give you advice on anything to do with Postgres or Timescale or time series. I'm also lurking in your United Manufacturing Hub Discord. So, if you mention Timescale in there, or if you @ me, then I should hopefully show up fairly quickly. I'm happy to take questions in there. And obviously, if you want to try out Timescale, then go to timescale.com. It's not short, is it? Well, timescale.com, and then you click on the 'Try for Free' button. You can try out our Cloud product for 30 days. You don't need to enter a credit card. You can just give it a go and spin up instances.
Jeremy: OK, nice. Nice to have you here. Thank you.