Welcome to another episode of Tech Unhinched, where tech gets human. Joining us today is Arturus, an engineering manager and data platform lead at Kayak. With over 15 years in software and data engineering, Arturus specializes in cloud migration, scalable data platforms, and integrating advanced technologies like Trino, Spark, Kubernetes, and machine learning ops. We are excited to have you on the show today, Arturus.
Rabia, thank you for having me.
Before we get technical, I was curious what brought you into the world of data in the first place. Was there a “this is it” moment for you?
Yes, I think so. As you mentioned, I started my career in software. Well, now it’s more than 20 years ago, but from day one I found databases somewhat magical: the ability to store data and then pull things back out. I still remember my first query, a MySQL query with a bunch of SQL injection holes in it, the kind of thing you never run in production. Fast forward: I was always working around data, not even realizing that data was a discipline of its own, until around 2015, when my title actually became data engineer. That was the very clear moment where it was like, this is it. That is when I realized that you can provide so much value to the business by pulling data, and that there is much more to the field than you initially thought.
Well, Arturus, you’ve gone from coding in PHP and Java to helping shape how entire data systems work. If you had to describe that journey in one sentence, how would you sum it up?
Crazy, if I go with one word. As I mentioned, there was a stage in my life when I was working at an accounting company, and we were creating an accounting app that was built on top of a database.
This was the old days, when you had a database and then some forms on top of it, and every accounting application was built that way. I didn’t realize it then, but structured data sets changed the way I think. Even now I realize that I need structure in my life, otherwise I feel unsafe. That is just the reality. What I do find is that the experience from PHP, Java, and software engineering in general is very helpful, because the data world doesn’t really follow the majority of software engineering principles or patterns. I actually had a conversation with a friend about why. The reason you don’t see many patterns in data engineering, or these days in data science and LLMs, is that the value data engineers brought back then and still bring, and the value data scientists bring now, far outweighs the value you get from making things correct and optimal. No one really cares whether your code is optimal when your change will earn thousands or millions of dollars. So the important piece is that we have these cycles, and I know I’m going way off your question. I’m sorry about that. Let me just finish. I see a cycle that repeats over and over. Being a software engineer helped me a ton to learn the principles. Then I got into the wild west of data engineering and slowly applied those principles there. That’s what made me successful. Now I see machine learning and data science, which is again the wild west.
I’m hoping to be a part of that, bring some structure to it, and make it more successful as well.
For those who are listening, what does data engineering actually mean today, in 2025?
Oh yeah, the world is changing. It’s not only changing for data engineering. It’s changing for everyone who works in computer science or any adjacent field. AI is not a new thing. We have always had AI. If you look back, the first steps in machine learning go back something like 50 years. I’m not a big expert on those things, and these days “AI” is a bit of an overused phrase. Let’s call things what they are. If ChatGPT and OpenAI showed us something, it’s that if you take a huge amount of data and spend a ton of money, you can train a large language model, apply a bunch of transformers (that’s what the T stands for, if I understand correctly), and get something very useful.
I remember a couple of years ago, the first time I tried GPT-3, I believe, I was like, that’s stupid. Then suddenly it was like, oh my god, what is that? A little later I did more research and I was like, oh, that’s amazing. I remember showing it to my co-workers, like, look, this is amazing, and they were like, oh, you’re stupid, that’s nothing, look how bad it is. I said, it doesn’t matter how bad it is, you need to look at how great it is at what it can do. What I’m hearing now, which I’m not really happy about, is this push for, hey, AI will replace everyone. Maybe it will, maybe it won’t, I don’t know. What I do know for a fact is that tools powered by AI can improve your productivity very quickly. For my personal projects at home, I bought one of the IDEs that is powered by AI, and I use it as an advanced autocompleter. It is amazing. I remember 20 years ago people would say, oh, we write Java code in Notepad and then we compile it, we’re so great, and you’re using an IDE, you’re stupid, you don’t know how to write Java code. I was like, yeah, but I don’t need to know everything by heart, I just want to be productive. The current world of data engineering is the same thing. Many companies are pushing tools that will change everything, full automation, but I think what we should actually be looking for is tools that improve our performance, so that we can deliver much more than we ever could before. Having said that, what is very important to note is that you, as a data engineer, now need to understand more of the principles of how things work in the background.
You need to understand SQL and Python more than Snowflake or Databricks, because Snowflake and Databricks are a layer on top of the basics. If you now use OpenAI or any other tool, you can very quickly learn Databricks without learning Databricks. You can just say, hey, I want to do this thing, and it will do it. But if you don’t know what you need to do or how you need to do it, you’re screwed, even with these great tools. At least for now, none of those tools will be able to build you a pipeline if you don’t know how the pipeline should be built. So, as you see, I may be a little old-fashioned. I still believe in the power of humans and our judgment to decide what makes sense and what doesn’t, and this is what I stand behind.
From your decades of experience, Arturus, what’s one of the biggest misconceptions people have, even within tech, about data platforms or pipelines?
Right. I think today, if you speak with a lot of AI experts, or just experts in general, everybody thinks of the data platform as some magical box where you throw in any data, run something, and get data out. I hope this will answer the question, but I think this is still true.
I remember once I had a lunch conversation with an old friend who had just joined a new company, and he brought along the lead of data analytics or lead of data science, I don’t remember which. That was five or six years ago. He came to me and said, look, we have a problem. We have this person who will be the lead of data analytics, he wants something from me, and I don’t understand what he wants. So I spoke with this person, and he said, look, back at company X we had a data warehouse I could connect to, I could run queries and pull all of these insights, and here we have none. And I looked at it and said, hey man, you don’t have any data engineers to actually build all of that infrastructure for this person to do his job. I think this is the situation at some companies too. They might go and hire data scientists or AI experts, but if the underlying infrastructure or pipelines aren’t there, it’s just wasting resources on people who will never be able to deliver. That’s nothing against any of those people. It’s just not their job. Their job is to rely on the infrastructure that exists. So what I want to say is, when you’re thinking about any data project, you need to make sure you have the underlying data infrastructure, data platform, or pipeline available, whichever of those keywords you use. Is there anyone who understands how the data flows? That is the biggest thing in my head: people very often think they can buy a data scientist or a data analyst, or buy a tool, and that will fix the problem, which is not really the case. You still need an understanding of how the data needs to flow, which database is good for which workloads, and that type of thing.
Arturus, when people talk about scaling AI, most of the focus is on models and outputs. What part of the data pipeline do you think is most dangerously overlooked in all of that?
I remember seeing a meme that describes data quality and the output, and there’s the common phrase: bad data in, bad data out, no matter what you do. I think the one phrase that summarizes all of that is data quality. I had a conversation with a person who started his career when my mom was something like 10 years old. So yes, he is an elderly person. We had lunch, and I really enjoyed talking to him. He started his career in the very early ’70s or so, before I was born. We talked about data, and he said one of the first projects he worked on was with the US Postal Service, where they needed to recognize handwritten addresses and convert them into something usable, so they would know where the mail should be sent. They did that pretty well. After that he worked with documents, not even scanning them: there was a system, I don’t know its name or how it worked, where they would enter the customer’s name and address, and then you needed to match which customer it was. He gave me a very simple example. He said, look, think about it this way: you have two documents. In one it’s written “I B M” with spaces between the letters, and in the other it’s written “IBM” without spaces. You as a human understand that it’s the same company. A machine will never understand that it’s the same record. To the machine they are completely different, because it doesn’t have the knowledge we do.
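The “I B M” versus “IBM” example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names and the sample records are hypothetical, and stripping everything except letters and digits is one simple normalization rule, not the matching logic of the system described in the conversation.

```python
import re


def normalize_company_name(raw: str) -> str:
    """Return a comparison key for a company name (hypothetical helper).

    Assumption: dropping every character that is not a letter or digit and
    upper-casing the rest is enough to line up simple spelling variants.
    """
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


def same_company(a: str, b: str) -> bool:
    """Treat two names as the same record when their normalized keys match."""
    return normalize_company_name(a) == normalize_company_name(b)


# Made-up sample records, grouped by normalized key.
records = ["I B M", "IBM", "i.b.m.", "Kayak"]
groups = {}
for name in records:
    groups.setdefault(normalize_company_name(name), []).append(name)

print(same_company("I B M", "IBM"))  # a raw string comparison would say False
print(groups)
```

A rule this blunt will also merge names that only look alike once punctuation is removed, which is the data quality trade-off the conversation is pointing at: someone still has to decide which variants really are the same customer.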
And the biggest problem we encounter, and the reason data quality is important, is not that the machines are stupid. It’s because of us. It’s because we are not deterministic. We always change our minds about how we describe things and how we do things. So the machine is expecting one result, and then the crazy human changes his mind again, and the machine doesn’t know which customer to attach the record to. This is the main problem, and it’s the main thing that is overlooked right now. Why do you think we have hallucinations? The reason is that the data quality isn’t suited to these things. And credit to him for this, because I’ll tell you, I didn’t understand half of what he was saying, but he said one of the things that would make LLMs great is a semantic layer. I was like, what’s that? Basically, you need a layer that is not only used to learn to predict tokens one after another, but also adds knowledge about which tokens are associated with which. That way you improve your context, so the model doesn’t add things that are crazy or unrealistic. He gave some wild examples of that. Speaking of data quality, there’s another example I really want to add. I don’t know if you know this or not.
Our organization recently launched Kayak AI, and I spoke with the engineers at the time. One of the questions was: how can we make sure that what it writes for the user is accurate? This is something we at Kayak also recognize as an issue. What we’re basically doing is pulling all the information from our partners’ websites, trusting the quality of that data, and providing it to the user. That’s the problem, though. If the partner’s data is inaccurate, the AI model has no way to validate it. It comes back to the same principle: at the end of the day, it is we who keep the machines from having an easy ride, by constantly changing our minds and not having the data accurate or in a form that would be useful for machines. That, in my opinion, is the most overlooked thing in all of AI infrastructure: data quality, and how you deliver that data.
That makes a lot of sense. But Arturus, you’ve also seen how decisions like schema enforcement, skipping it or making it part of the process, can impact flexibility on a massive scale. So in a world where AI might start generating pipelines, and we’re not actually far from that, where should we draw the line between structure and speed?
Yeah. I like what you said, that AI can generate a pipeline. I’ll reflect back a little, since I’m a bit hyped about our Kayak AI stuff. If you ever use it, think of it this way: no matter how great it is, would you allow this chat model to book you a trip without validating that it’s what you want?
At a certain point I might, with how fast things are going.
Yeah. Right.
But today, well, I don’t know. Even in the future, I myself would want to ask my wife, hey, do you think this is the right thing? To get to the point where we could say, oh, I fully trust you, just book the trip, we need to provide more context. That’s possible in the future, but I think you still need the user in the loop. So let’s go at it this way. When we’re talking about tooling powered by AI, LLMs, agents, or anything like that, it’s very clear where the responsibility lies for the committed code or the committed pipeline. It’s on the developer, who uses the tools to improve the performance of his work but carries the responsibility that what he committed is accurate and true. Now, if we go into a world where AI is able to create everything, who takes responsibility for the truth of what is done? You could say, well, if the AI is doing that, then the AI is responsible. I can see a world where this is possible, because you build the pipeline and then you have a bunch of checks to make sure you’re not losing money and so on. But can you imagine a world where no human at all is interacting with and validating what is deployed? Maybe that future will come. Right now, I don’t think that’s the case. And the most important part is, if you have a company, I think you still want to be responsible for what you’re delivering, and you want someone to be accountable for and knowledgeable about what is delivered. I like the vibe coding memes and everything around that. I think it’s a great idea.
It’s just that what you then find is that AWS keys get committed to GitHub and now the buckets are open. You could say, well, I didn’t do that, the AI did. So should we fire the AI now? This is the thing. I don’t know what the future will be. I see much more value in companies that concentrate on providing tools that improve a person’s speed in executing a task, taking away the manual, uninteresting labor and letting the person use his mind to come up with greater ideas, to do what he wasn’t able to do because he lacked the technical skill even though he always had the imagination, rather than taking all of that away and pushing it to AI. Eventually AI will just be limited by its lack of imagination. Again, maybe the future will be different. For now, if we’re talking about a world where I can use AI to enhance my performance, I love it. If we’re talking about AI that eats all the fun from my work and leaves me the un-fun parts, then I don’t like that future.
Yeah, because I’ve seen a lot of AI-generated videos rolling out. With certain results out there, it is close to impossible to figure out whether it’s a real video or an AI-generated one, or it takes an eye for detail to pick it up. With the fires in the Arizona forest and other things happening around the world, there are fake videos out there with the most likes and the most engagement.
Yes.
What I think is very interesting and useful for the future is that even with these fake videos, what you are enabling is that people who never had the means to create something nice can now do that just by using their imagination. They can use these models to say, hey, build me this cool new thing that I was never able to build, and I think that’s good. At the same time, you will have a lot of people using and abusing it for bad things, but I guess that’s just how the world is.
Well, Arturus, while going through your career highlights, I saw that you’ve made the bold decision to move away from Snowflake, a choice that many teams might hesitate to make. What triggered that decision for you and your team? If you can, give our listeners some context.
First of all, what worked for us would not necessarily work for any other organization. This gets back to what we spoke about earlier: the data engineering world is changing in that we have so many tools to be efficient with. So what we need to concentrate on is making the right decisions and figuring out what makes sense for us. The concept of Snowflake was not making sense for us. At the end of the day, the tools we use are an opinion on how an organization should operate, and sometimes those opinions are not a good fit. We had our own ideas of how we wanted to operate, we realized that the tool didn’t work for us, and we found a different tool that did. That’s about it.
Well, Arturus, for our listeners: is it usually the cost, the performance issues, or something deeper about how data systems need to evolve? How would you put it together?
I think this is something deeper, right?
As I mentioned, if the tool doesn’t solve your problem and just causes you a headache, what’s the point of using the tool? Maybe you should change it. A good example: you can always use a pickaxe to drive a nail, but a hammer is much better at that job, and a sledgehammer is overkill. And I will repeat myself: what is important for data engineers, now and in the future, is understanding what each tool does, what type of problem you’re trying to solve, and which tool will actually solve that problem. As for the something deeper I mentioned, I just realized it’s the people you work with: the people who use the tools you brought into the organization, or the tools where someone said, hey, from now on we’ll use this. The people decide whether they want to use it or not. In general, I see two types of data organizations. There is the data organization that writes applications, so PySpark or Spark in general. That’s Databricks for you. Or you see an organization that is all for writing SQL and wants to do everything in SQL, and that would be Snowflake for you, if you want the two ends of the spectrum. Now, if you come to an organization that has always been writing SQL and say, hey, from now on we’ll be doing Databricks and Spark because it’s much more cost-effective, then yes, the technology might be more cost-effective, but the people will be completely against it, they will not have the knowledge to run those things, and you will basically just screw everything up. That’s what it comes down to.
You need to understand your organization and what your people need, and then the tools need to solve a certain problem. You shouldn’t bring the tool first and then figure out what problem it can solve. You should go the opposite way. If I may, I remember I was at a presentation where a presenter from a big company said, “Hey, we have this great, amazing new technology. We can stack CPUs in three layers, like a 3D CPU. It can have a huge amount of memory.” And people asked, okay, so what can we do with that? And the answer was, if you have some problem where you think this could help, tell us, and we will be more than interested in providing you this hardware to test it. And I thought, so this is an example where hardware was built for a problem that doesn’t exist, and then you go around saying, hey, we have this nice tool, can we figure out what problem it solves? That’s bonkers. First there should be a problem, and then the tool that fixes the problem, not the other way around.
The story doesn’t end there. That’s where Trino comes in, and I want to talk a bit about it because you mentioned somewhere that it wasn’t just a tool but a bridge, right?
Yes.
So there was a lot of analysis that went into choosing that particular tool. For folks who may not know about it, how would you explain what that tool is, and how did it click for your team when the previous one did not?
Yeah. I don’t want to go too much into details, but I think the main thing here is that Trino is open source. When it’s open source, you actually control everything. It’s both, I don’t know the English word for it, both the curse and the opposite of a curse.
Okay. So a blessing in disguise.
Yes, exactly.
So it’s both a blessing and a curse. It’s a blessing because you know everything about how it works. There is no proprietary technology in there. You’re able to see all the metrics, you’re able to see how the CPU is working, and so on, as with any open source. But it’s also a curse, because if you need to optimize for something, you’re basically all alone, or relying on the community, and it’s very hard to achieve. Trino is like that. It’s amazing in that you can see everything that is happening, but it’s also very hard to do things right. You need to have the talent available to do that, but if you have the talent, you can go and do it. If you are a small organization that doesn’t have that talent, and a small data organization shouldn’t need it, then don’t even look at Trino. Go with something pre-built, Snowflake or Databricks, whichever, it doesn’t matter, and you will be way better off. Trino is great for us because our data organization is much bigger, and when I talk about the size of a data organization, I mean the size of the data we are dealing with, not the number of people. We needed to be able to see where we are spending all our resources, and that is what Trino allows us to do, as any open source tool would. Snowflake is able to solve a problem in a certain way, but if the way it solves the problem doesn’t fit us, then it isn’t the right tool. Now, to your question: what is Trino? Trino is a virtual warehouse. What does virtual warehouse mean? Well, a.k.a. a query engine. What does query engine mean? If you think of a database, you have data storage. So on the disk there is the data storage.
Then you have the CPU and RAM, where parts of the data are pulled from storage into active memory and into the CPU, the calculations are done, and the result is spit out. This all happens in one computer. Now, with the rise of Snowflake, Databricks, and Hive (I think Hive was the first to start this), the idea was: can we scale the storage independently of the compute? What they figured out is that we actually can, and this is where S3 was very helpful. So the query engines came in and said, hey, we don’t have any storage of our own, we rely on some other storage. What we do have is CPU and RAM. You load whatever data you want, we do the calculations in memory, and we give you the results. This is what Trino is: a query engine first. Additionally, what the Trino people thought, back at Facebook, was: it already doesn’t rely on its own storage, so can we use any other tool or database as storage as well? The answer was yes. So now Trino allows you to query data from your data lake in S3, query data in Snowflake, query data in your MySQL, bring all of that together in memory close to the CPU, run the joins, run the aggregations, and deliver the data. Basically Trino says, hey, because we don’t have storage of our own and because we are open source, we allow you to use any technology out there for your storage or for your compute, and we just bring things together. That makes life so much easier.
Yeah, I think that was very nicely put for our listeners, and thank you for this thorough answer.
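The idea of a query engine that owns no storage and joins data from several systems in memory can be sketched in a few lines of plain Python. This is a toy under loud assumptions, not how Trino is implemented: the two reader functions stand in for connectors, and their names, the row layouts, and the sample values are all made up for illustration. In Trino itself the equivalent would be one SQL statement joining tables that live in different catalogs.

```python
from collections import defaultdict


def read_bookings_from_data_lake():
    """Hypothetical connector: pretend these rows come from files in S3."""
    return [
        {"user_id": 1, "amount": 120.0},
        {"user_id": 2, "amount": 80.0},
        {"user_id": 1, "amount": 40.0},
    ]


def read_users_from_mysql():
    """Hypothetical connector: pretend these rows come from a MySQL table."""
    return [
        {"user_id": 1, "country": "US"},
        {"user_id": 2, "country": "LT"},
    ]


def total_amount_by_country(bookings, users):
    """The 'engine' part: join and aggregate entirely in memory."""
    # Build a lookup table from the smaller side of the join.
    country_of = {user["user_id"]: user["country"] for user in users}
    totals = defaultdict(float)
    for booking in bookings:
        # Join on user_id, then aggregate the amount per country.
        totals[country_of[booking["user_id"]]] += booking["amount"]
    return dict(totals)


# The engine holds no data of its own; it only pulls rows and computes.
result = total_amount_by_country(
    read_bookings_from_data_lake(), read_users_from_mysql()
)
print(result)
```

Swapping either reader for another source leaves the join and aggregation untouched, which is the separation of storage and compute described in the answer above.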
Well, Arturus, while going through your career timeline and some of the work you’ve put out there, I found that you try to bring a philosophical angle to the tech side of the work. You once said that every tool has an opinion about how data should flow. Would you tell us what that means in practice and how it has shaped the way you evaluate new technologies?
I think I even used that phrase earlier today. That is what it is: an opinion. Early in my career I started seeing it with frameworks. In software engineering we have frameworks. JavaScript has a bunch of them. All the frameworks rely on the base technology, and they are just opinions on how this base technology should be used. For evaluating different technologies, let me give a very simple example. You have a car. A car is a very universal thing. However, between the UK, the US, and some other countries there is a difference in where the steering wheel is, on the left side or on the right side. Because of that, the road infrastructure is different, and the rules you learn for your driving license are different: whether traffic is left-hand or right-hand, which way you turn. It all depends on where the wheel is. If you just buy any car without understanding where it will be driven, you can be in very big trouble. And that goes for everything. If you just jump in and say, hey, we’ll buy this tool because it’s amazing, you might be in big trouble. The reality is that in computer science no application is able to change how physics works. I’m talking about that because, at the end of the day, it’s just electrical impulses in the CPU doing all of the calculations, and there is a limit to how fast they can go. If someone comes in and says, hey, this new software is now 10 times faster, it can’t be 10 times faster just by being 10 times faster. It needs to cut some corners, or make some choices where they say, well, we’ll prioritize this at the expense of that. It’s always a balancing act, and that goes for the older technologies as well. This is why it’s so important to understand the problem you’re trying to solve, to understand how the tool you’re looking at tries to solve this problem, and then to figure out what side effects you will get by using this tool. That is usually how we evaluate new technologies and how to use them.
Data platforms are the engine rooms behind AI, and as you have built one from scratch and overseen many, do you buy this idea, or do you think that we are just overselling the concept? So I think, again, it will depend on the point you’re looking from. If you’re looking into building a new LLM, a new AI model, basically building any model, you need to rely on massive, well, not necessarily massive, you can use your laptop and that’s usually enough to start somewhere, but eventually you need to rely on your data infrastructure, on your data, to be able to train the models. That’s usually what it is: tons of training data to increase the value of your model, LLMs included. If you’re using any of the already built models, you don’t need any data infrastructure whatsoever. I mean, as you said yourself, you can just write a prompt and that will generate you a video, right? For this to happen you need a lot of data to build the model, but the moment the model is done you don’t need any data infrastructure whatsoever. So again, it depends on the angle you’re coming from. If your job is to build a new model, yes, that is important. It might not necessarily be called a data platform, but you will need at least some pipelines to feed data into it. But when it comes to using a model, no, you don’t need any of that. Even if you’re building your business relying on someone else’s model, you don’t need any data infrastructure at all, unless you want to do your own analytics afterwards. Let’s go 5 years ahead. You’ve got 15 years of good experience. What if, 5 or 10 years from now, AI is able to generate pipelines, deploy them, do everything by itself? What do you think would happen to the role of data engineers?
Would we just be curators and overseers, or just go obsolete? What do you think? Yeah. So when did OpenAI release ChatGPT? I believe that was already about 2 years ago, right? That’s when it hit the news and everyone was like, oh my god. Back then OpenAI said that in the next two to three years we’ll have AGI, I think that’s the term, artificial general intelligence. Then they changed their mind about what it means to have artificial general intelligence, and then they changed their mind once again. I think it’s actually a very big problem to solve. First of all, the LLM, and this is not to diminish anything, it’s a great technology, don’t get me wrong, but you need to understand the following. You know your phone, right? When you’re typing something, it basically suggests the next word for you to type. That is an LLM in your phone. That’s what it is. That’s the basis of OpenAI, that’s the basis of ChatGPT. It’s a little bit more expanded: it’s generating the next token. However, if you think about humans and how we talk, how we communicate, we don’t know which word will come out of our mouth next; it just somehow does. So you can think of us as token generators too, word generators, spitting words out. And if I’m talking plausibly enough, you will believe that I’m telling the truth, even if I’m full of it and it’s nothing like that. The important thing here is the trust in the unit which is spitting out these words, and unless we have this trust, we will always need to oversee it.
Now, what LLMs are really, really great at is translating from human language into machine language and back. This is what we see with all the agents popping up, because it is a very useful thing. I could say, hey, I want to generate a video of, I don’t know, a kid on a skateboard. The LLM will then hit a certain agent: okay, now we need to generate a video, these are all the parameters you need, and this is all the extra information. You just push that into the agent, the agent does its own thing and spits out the results. So it’s amazing when you look at it, but if you take the agent away, the LLM on its own will not be able to do that. In the same way, an LLM by itself will not be able to drive a car. Maybe an LLM using an agent will be able to write another agent that will be able to drive the car itself. I think if artificial general intelligence were easy, we would already have it. We have been talking about it for at least 50 years. It’s a hard problem to solve. I have no idea. So, getting back to your question, I think in the next 5 years, fingers crossed, nothing changes, because otherwise all of us will be out of work. Well, with the exception of you, because that’s a different story. However, what I do believe is that there will be more tools coming out which will change the way we work and which will require us to better understand the underlying principles. We’ll just have much nicer IDEs, autocompleters and other tools, and we’ll need to be the ones who oversee what the LLMs produce.
So we would take the responsibility for pushing this code, shipping the code, shipping the product, and we would know what the product is actually doing, that it does what we asked it to do, and not trust it blindly. Arturus, assuming that AI has already sort of taken over, if you had the power, what part of data platforms or of generating pipelines would you never give away to the AI, or never want to be automated in any way? If you had control over it, what would you choose? So, let’s assume that it can do anything and it’s actually very, very good, right? Then I assume it’s dealing with people. No, I’m joking, of course. But that’s also true as well. There are two things which are very important. The first is cleaning the data, from what we messed up into the data structures and principles that would make the data what it should be. And then there is dealing with all the requests you’re getting from analysts: what about this, what about that? I mean, it takes so much time, and it’s such annoying work. I don’t do that much these days, but back in the day it was just like, come on, this is the dashboard, just go and check there. But what about that part? Then you need to write a query, and it’s like, this is what the data says. Oh, what does the data say? And it just goes on like that. So this is, I think, where AI would be beneficial, where you could say, okay, the AI can give you an answer from the data which is provided. The problem with that, again, assuming the AI is great: today we actually try to do something like that, and the problem today is that you can lie to the AI and actually get the data you want, and then
you can change the narrative about the data and basically tell any story you want. And then you have a person coming to you and saying, hey look, I checked with the AI, this is the number it gave me, explain that to me. And then you’re like, what? That’s basically the scary part. But it will be fun. If we look ahead, what’s one development in data infrastructure that truly excites you? Where do you think the next big shifts will happen? Would it be around AI or something else? I think what I’m starting to see now, and what I’m really, really impressed by, is that the IDEs, the tools we use to write code, are getting so much better by utilizing the power of LLMs. This is just amazing. So what I would like to see is more specialized tools, and what I’m very excited about is allowing humans to use the knowledge of different domains in a way they never would or could in the past. Now I can ask anything about, say, chemistry. The problem is that I don’t know how truthful the answer is. It will definitely be plausible, but I don’t know if it’s true, and this is I think the biggest problem. But nevertheless, that’s a good way to have your imagination explored. Well, Arturus, for those getting into data engineering today, in 2025, what advice would you give to help them stay adaptable and focused in a fast-moving space like this? Yeah. So I think with AI, and especially with the tools, it will be less important that you know Snowflake or Databricks or any actual tool, and more important to understand the principles, because you could then say: hey, I need to move data from this API, which is written with REST. I will be using Python. Then I want the data to be converted into CSV files or any other format.
And then this data needs to be loaded into our data warehouse, and I need to build the pipeline like that using Databricks. And that will generate the code for you. But you need to be very specific about what you want to do. So understanding the principles, not the specific tools, will be very, very important, and not only for data engineers. I think that will hold for every engineering profession: you need to get back to the basics. Well, Arturus, one of my favorite questions now, and the last question for this conversation: what is one hard-learned lesson that you’ve had in life? I think one of the things I learned the hard way, and it also reflects a little on the previous questions, is that there is a balancing act in everything. Early in my career I was like, I want to have freedom, I want to have unstructured data, if you like. I want to be able to change the schema the way I want. I don’t want to be structured and pushed into anything. And then after the fact you realize that this unstructured approach actually costs much more in speed, in performance and in other things. And you’re like, if only I had pushed myself into this more constrained world where the structure is there, I would have gotten the speed. This is what I’m noticing, and that’s sort of the life lesson: sometimes the quick wins now will hurt you down the line. So I think it’s sometimes much better to take a hit now and be much better off down the line, in the future. Yeah. No, I think that was also great advice for our listeners here. Thank you, everyone, for tuning in to Tech Unhinged. Big thanks again to Arturus for sharing such valuable insights with us. Again, thank you so much for your time, Arturus. It was a great time. Take care. Take care. [Music]
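Editor’s note: the pipeline Arturus describes near the end of the episode (pull records from a REST API with Python, convert them to CSV, load them into a warehouse) can be sketched roughly as below. This is only an illustration of the three steps, not code from the episode. The endpoint URL, table name and column names are hypothetical, and the standard-library sqlite3 module stands in for the data warehouse; it is not Databricks or any other product mentioned in the conversation.

```python
import csv
import io
import json
import sqlite3
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://example.com/api/bookings"


def fetch_records(url):
    """Extract: call a REST endpoint assumed to return a JSON array of objects."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def records_to_csv(records, columns):
    """Transform: serialize the records to CSV text with a fixed column order."""
    buffer = io.StringIO()
    # Keys not listed in `columns` are dropped; missing keys become empty strings.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def load_csv(conn, table, csv_text):
    """Load: create the table if needed and insert every CSV row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    columns = next(reader)  # header row
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    rows = list(reader)
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


def run_pipeline(conn, table, columns, fetch=fetch_records, url=API_URL):
    """Run extract, transform and load in order; return the number of rows loaded."""
    records = fetch(url)
    return load_csv(conn, table, records_to_csv(records, columns))
```

To try it without a network call, pass a stub in place of the real fetch, for example `run_pipeline(sqlite3.connect(":memory:"), "bookings", ["id", "city"], fetch=lambda url: [{"id": 1, "city": "Boston"}])`. Keeping the three steps as separate functions mirrors the point made in the conversation: the principles (extract, convert, load) stay the same even when the specific tool behind each step changes.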