The One Billion Row Challenge with Florian Engelhardt

Mathias Hansen (00:10)
Hey everyone, welcome back to Countdown to Laravel Live Denmark. Today I'm speaking with Florian Engelhardt, who is one of the speakers coming to the conference soon. Florian, how are you doing?

Florian Engelhardt (00:19)
I'm doing fine, thank you. How are you doing?

Mathias Hansen (00:21)
I'm doing great. And thanks again for taking some time out of your busy schedule to talk with us today.

Florian Engelhardt (00:27)
You're welcome.

Mathias Hansen (00:28)
So Florian, before we get into your talk, I would like to dive into your origin story. What are your earliest memories of using a computer, or programming, or things like that?

Florian Engelhardt (00:41)
I think the earliest memory of using a computer is destroying one that belonged to a friend of mine, a C64. Yes, exactly. I don't even know what happened, but it fell down on the floor and it was my fault. And my dad could repair it. So that was the time you could actually repair hardware and computers. You can't do this nowadays anymore. And then in 1993 I got my first computer. It was an IBM PS/2 286.

Mathias Hansen (00:47)
Yeah, a Commodore 64? Sweet.

Right.

Florian Engelhardt (01:12)
with MS-DOS 5 on it, and I started programming with GW-BASIC.

Mathias Hansen (01:17)
Yeah, so BASIC, yeah. Cool. So did you get into programming right away once you got access to a computer? You were like, let me make this computer do stuff.

Florian Engelhardt (01:18)
Those were other times.

Yes, back then you didn't have Facebook, the internet, whatever; all of these things just weren't there. So you have this computer running and now you have to do something with it. And it's not like there's a gazillion programs on the machine. There's just GW-BASIC, and I had a book. It was a German print; I think the title roughly translates to "BASIC for Bloody Beginners". It taught me how to let the computer do things. So the computer did things for me.

Mathias Hansen (01:48)
Hahaha

Yeah, and I bet that was a struggle too. I also remember the pre-internet days, essentially, where you didn't have Stack Overflow, you didn't have any forums or online tutorials or whatnot. So were you basically using that book as your Bible for a while there? Yeah. Do you remember what one of the first programs you wrote was? Was it games, or was it tools, utilities?

Florian Engelhardt (02:10)
Exactly.

Now the first thing I wrote was a small calculator program that I could give math homework and it would solve it. I mean it was basically super easy. I had two inputs, I gave it two numbers, I needed to choose the operation, it would just do the calculation and give you the number. That was the first basic program that I wrote for myself to double check on my math homework.

Mathias Hansen (02:24)
Yeah!

That's awesome. So at some point, you moved on to a desktop environment. Was that moving on to Windows 3 or whatever it was called back then?

Florian Engelhardt (02:49)
Yeah, it was Windows for Workgroups 3.11. You had it right. And from there on I did an upgrade to Windows 95 with disks. I think it was 30-something or 40-something floppy disks you had for the upgrade. That was horrible. You couldn't go away from the computer, because it took ages and you had to keep switching the disks.

Mathias Hansen (02:52)
Yeah, with networking, right?

Floppy disks, yeah, yeah.

Yeah, it kept demanding "insert next disk now", right?

Florian Engelhardt (03:17)
Yeah, yeah. But Windows 95, that was also the time that I discovered that the internet is a thing. Then I had a short encounter with IBM's OS X, no, OS/2, that's what it was called. That was also a fun operating system. And from there, I got back to Windows, 98 by that time, and then moved to Linux.

Mathias Hansen (03:29)
Okay?

Okay, so you're a Linux user today?

Florian Engelhardt (03:43)
I have to use a Mac. I could also choose a Linux operating system. When I joined Datadog, there was no choice. There was just a Mac, so I used the Mac. We would be able to switch to a Linux operating system now, but I really like the hardware.

Mathias Hansen (03:50)
Hmm.

Yeah, yeah, the Mac hardware. Yeah.

Florian Engelhardt (03:58)
Yeah,

the silicon, the Apple silicon chip is really powerful.

Mathias Hansen (04:03)
Yeah, I really feel it changed the game. It really took us light years into the future in terms of performance and speed and even battery consumption, right?

Florian Engelhardt (04:11)
Right.

Mathias Hansen (04:12)
But

speaking of, you mentioned how hard it is to upgrade things. I have a first-generation Mac Mini with the chip, and I just realized that I can't even upgrade the memory on that, because it's soldered into the chip, right? So I was like, it's a desktop computer, and in the past I've upgraded memory on a Mac Mini. But no, no, no, that's not happening anymore. It's a shame. It's a shame.

Florian Engelhardt (04:23)
Yes. Yes.

Yeah, it is. It was also when you could just, with a little bit of soldering, fix everything. Nowadays it's a black box.

Mathias Hansen (04:41)
Yeah, yeah, exactly.

Yeah, but everything's also getting more and more compact, right? So even for the stuff that's soldered on, you need to use reflow tools and all this stuff. It's tiny, tiny bits of solder. You can't just take your soldering iron to your computer anymore. That's not going to go anywhere. So going back to Linux a little bit, what's your favorite distro?

Florian Engelhardt (05:04)
I was using a lot of distributions. I think the distribution that taught me a lot was Slackware. Yep, that taught me a lot about operating systems. I once even did a Linux From Scratch. There's LFS, which basically is just documentation of how to build Linux from nothing. That's fun. I would not recommend using that day to day. But it was really fun and it teaches you so much about how the operating system actually works,

Mathias Hansen (05:10)
Yeah?

Florian Engelhardt (05:29)
what dependencies you have and actually also how little you need to run a computer. That was really fun.

Mathias Hansen (05:33)
Yeah, that sounds like an

awesome learning experience because knowing where everything fits in. Yeah, so what does it take to boot a computer?

Florian Engelhardt (05:37)
Yeah. Yeah. But for, for my...

You need a boot manager and your kernel and that's basically it.

Not much.

Mathias Hansen (05:47)
I mean you make it sound so simple when you say it like that.

Florian Engelhardt (05:50)
Nah, it took me ages. I think it took me an entire weekend to get the system up and running, because you have to compile everything. The main idea is also that you don't get binaries. You just get the URLs where to download stuff, and then you compile all of this, and then you make a bootable disk, and then you boot your computer with that bootable disk, and then you need to get an internet connection somehow and...

Mathias Hansen (05:55)
Yeah.

yeah, there's a whole nother level, just having internet connection, yeah.

Florian Engelhardt (06:08)
But for

Yes, but for a day to day usage I was using Fedora Linux. That was the easiest for me.

Mathias Hansen (06:17)
So, I'm not super familiar with the history here. Is Fedora... Does that have any relation to Red Hat? Do they have any... is it like a fork or something or no?

Florian Engelhardt (06:24)
Yeah, they have relations. That's

good question. It's related to Red Hat Linux, but I don't know how.

Mathias Hansen (06:31)
Okay,

yeah. There are also so many distributions out there, especially now. It's pretty insane. But a lot of them are, you know, forks of other distros with slightly different philosophies and whatnot, right? Yeah. Cool. So are you still doing any, you know, systems administration? Do you run any Linux servers these days to scratch your Linux itch, or are you mostly,

Florian Engelhardt (06:42)
Yes.

A friend of mine is a photographer and he has a Linux server running, and I'm maintaining it. Sometimes I'm SSHing into the box to start upgrades. Not really too much complicated stuff anymore.

Mathias Hansen (07:10)
Nice, nice.

Yeah. So you're doing a talk about scaling with PHP. Do you want to give us a little teaser of what your talk is about?

Florian Engelhardt (07:26)
Yeah, definitely. One and a half years ago there was a challenge in the Java community, the One Billion Row Challenge. And the main idea of that was to prove, or showcase, that Java is not IO-bound. So if you want to process one billion rows from a text file, that's not IO-bound.

Mathias Hansen (07:46)
Yeah.

Florian Engelhardt (07:48)
That was in the Java community, and some of my co-workers here at Datadog managed to get into the top 10 of the leaderboard back then, and they bragged about this, obviously, in the daily stand-up. And I was thinking, you should probably be able to do this with PHP as well, right? It's probably not as fast as Java, but we could tune it. And then I made a first naive implementation of the One Billion Row Challenge and it ran,

Mathias Hansen (07:56)
Wow.

Florian Engelhardt (08:14)
but it was not as fast. So I tuned it and that's basically the story I'm gonna tell. How to get it really really fast.
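
For readers who want a picture of what a first naive implementation might look like, here is a minimal sketch. It is written in Python rather than PHP and is not Florian's code. It assumes the input format from the public challenge description (one `station;temperature` pair per line, with min, mean and max reported per station); the function name `aggregate` and the sample data are made up for illustration.

```python
from collections import defaultdict
import io


def aggregate(lines):
    """Naive single-pass aggregation: min, mean and max per station.

    Assumes each line looks like "Hamburg;12.3" (format taken from the
    public challenge description, not from this conversation).
    """
    # station -> [min, max, sum, count]
    stats = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])
    for line in lines:
        station, _, value = line.rstrip("\n").partition(";")
        temp = float(value)
        entry = stats[station]
        if temp < entry[0]:
            entry[0] = temp
        if temp > entry[1]:
            entry[1] = temp
        entry[2] += temp
        entry[3] += 1
    # Return (min, mean, max) per station, sorted by station name.
    return {
        name: (lo, total / count, hi)
        for name, (lo, hi, total, count) in sorted(stats.items())
    }


# Tiny in-memory stand-in for the real one-billion-row file.
sample = io.StringIO("Hamburg;12.0\nOslo;-3.5\nHamburg;8.0\n")
result = aggregate(sample)
```

A version like this reads the file line by line on a single core, which is the kind of starting point that leaves a lot of room for tuning.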

Mathias Hansen (08:21)
So did you get anywhere near Java speeds? Is it somewhat comparable or?

Florian Engelhardt (08:27)
Not really, not really. Not completely near, but pretty close. I don't want to tell too much. But I think the Java implementation is running in 1.3 seconds. We're not getting there, but pretty close.

Mathias Hansen (08:31)
yeah.

That's really, really insane.

Wow. Yeah, I actually recently had a,

an experiment where I wrote some code with Go, or rather I had some PHP code to process a lot of data, and we rewrote it in Go, with a little bit of help from Claude Code to speed up the process. But I actually struggled to get Go to perform as fast as PHP. In my case, I have a lot of regular expressions running, and apparently the regular expression engine that ships with Go is not as performant as the one shipping with PHP. So I was kind of surprised that

Florian Engelhardt (09:11)
Okay.

Mathias Hansen (09:13)
I thought Go would be way faster immediately, but it required quite a few tweaks to get it close to PHP speeds.

Florian Engelhardt (09:20)
I mean, PHP, specifically for regular expressions, as you mentioned: PHP is a high-level language and it does a lot of stuff magically for you. In other languages, I know in Java, if you want to build a regular expression and match it, you actually have a class and you build your regular expression, and you can compile the regular expression down and then match it against something, which makes it faster. In PHP you don't have this. You just have preg_match and it'll magically do all of these optimizations for you. You don't need to care about this.

Mathias Hansen (09:27)
Mm-hmm.

Mm-hmm.

Florian Engelhardt (09:48)
Those are the perks of having a high-level language. You don't fall for these kinds of traps. In other languages you do. Coming from PHP, you go to another language and you're just like, yeah, let me do this regular expression, and then everyone's pointing at you: why are you not compiling this? Why are you doing this in the loop? And I'm like, well, that's not a problem in PHP.
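
The pattern described here, compiling a regular expression once instead of handing the pattern string to the matcher inside the loop, can be sketched as follows. The example is in Python, chosen for brevity, and the pattern and data are invented. Python's `re` module also keeps an internal cache of compiled patterns, so its convenience form behaves similarly to what is described for preg_match; the explicit `re.compile` form is the habit that languages without such a cache expect.

```python
import re

lines = ["Hamburg;12.0", "Oslo;-3.5", "not a measurement"]


# Convenience form: the pattern string is passed on every call.
def match_inline(rows):
    return [bool(re.match(r"^[^;]+;-?\d+\.\d$", row)) for row in rows]


# Explicit form: compile once outside the loop, reuse the compiled object.
MEASUREMENT = re.compile(r"^[^;]+;-?\d+\.\d$")


def match_compiled(rows):
    return [bool(MEASUREMENT.match(row)) for row in rows]


inline_result = match_inline(lines)
compiled_result = match_compiled(lines)
```

Both forms return the same matches; the difference only shows up as time spent per call when the loop runs over a very large number of rows.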

Mathias Hansen (09:49)
Yeah, then you can get up. Yeah.

Yeah, I'm used to just doing it just in time. Yeah, exactly. Cool, so was it really challenging to try to do this with PHP? I don't want to spoil your talk, but did you learn some new things along the way?

Florian Engelhardt (10:20)
I got

a bit into the implementation details of some PHP functions to find out why it's slow and where it's taking a lot of time.

Mathias Hansen (10:29)
So like in the actual implementation of the core PHP language or, that's sweet. So you went deep down into like, okay, why, how does this actually work under the hood?

Florian Engelhardt (10:32)
Yes. Yep. Yep.

Yeah,

especially if you do it on that scale, on one billion rows, even simple functions like substring take time, and you're going to look into why it's taking so much time, because you're spending all your time in a specific function. And then you're wondering: there's this extra argument that I can give, how does it behave? On this scale you can see a big impact. And then you learn something about it.

Mathias Hansen (11:00)
Did you have to write a bunch of benchmarks for different functions then? Like, you know, comparing different functions to each other, figuring out which ones are faster. Because there's lots of different ways of doing the same thing in the end, right?

Florian Engelhardt (11:15)
Yeah, in the end I was just running the One Billion Row Challenge benchmark with my PHP implementation and checking the time it took. At that scale you could see minutes of improvement just by running it again with a different implementation.
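
The workflow described here, swapping one implementation detail and timing the whole run again, can be approximated on a small scale as follows. This is a Python sketch with invented data and hypothetical helper names (`parse_with_split`, `parse_with_find`, `timed`); absolute timings will differ by machine and say nothing about PHP.

```python
import time


def parse_with_split(line):
    # Split on the separator; allocates a list for every row.
    station, value = line.split(";")
    return station, float(value)


def parse_with_find(line):
    # Locate the separator once and slice around it.
    pos = line.find(";")
    return line[:pos], float(line[pos + 1:])


def timed(parser, rows):
    """Run parser over all rows; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [parser(row) for row in rows]
    return results, time.perf_counter() - start


# Invented sample data, repeated to make the run measurable.
rows = ["Hamburg;12.0", "Oslo;-3.5"] * 50_000
split_results, split_seconds = timed(parse_with_split, rows)
find_results, find_seconds = timed(parse_with_find, rows)
print(f"split: {split_seconds:.3f}s  find: {find_seconds:.3f}s")
```

Checking that both variants produce identical results before comparing their timings keeps a faster but wrong implementation from winning.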

Mathias Hansen (11:21)
Right.

That's incredible. So if you tweak a teeny tiny thing and it makes a tiny difference, it adds up times one billion. So it makes a potentially big difference, right? Wow. I'm super excited to hear your talk. That's really fascinating. We work with a lot of data and a lot of it is in PHP, so being able to process things faster is always something we're working on.

Florian Engelhardt (11:38)
Yes, exactly.

You

Mathias Hansen (11:53)
Always wanna make it faster. So that's really cool. Cool. Yeah, definitely. And Florian, you live in Germany, right? So no matter where in Germany you are, you're not too far from Denmark. It's the neighboring country, after all.

Florian Engelhardt (11:56)
I'm looking forward to it.

Yeah

Yeah, but

I'm on the other end of Germany. I'm living in the southwest.

Mathias Hansen (12:12)
yeah,

that sounds really nice though, that's a really nice part of Germany to live in too.

Florian Engelhardt (12:16)
It is, yeah.

Mathias Hansen (12:17)
But have you been to Copenhagen before? No? Awesome, first time. Oh, my lights are going crazy here, I think I have ghosts in the building. Oh well, they're just turning on and off. Well, I'm not sure what you have heard about Copenhagen, but is there anything in particular you'd like to see or experience or eat while in Copenhagen?

Florian Engelhardt (12:20)
No, never been to Copenhagen.

Yes.

I've not heard or seen a lot about Copenhagen. I'd love to see the, it's called Nyhavn, those colorful buildings. Love to see that. I checked on Google Maps from the hotel to the venue. Basically I'm taking the ferry and it should pass by, right?

Mathias Hansen (12:46)
Yeah, yeah. Yeah.

Yeah, I believe so. I think Nyhavn is basically on the other side. The venue is basically on an island, right? So if you take the bus ferry across from the island and over to central Copenhagen, I think it's going to be right around Nyhavn. It goes by or it even docks over there. But that's a really lovely place to go. Of course...

It can be a little bit busy at that time of year, but it's also the prettiest time of year to go visit and enjoy. Get some good food and see all the colorful buildings.

Florian Engelhardt (13:30)
And in regards to food, something local. Happy to try out whatever you have.

Mathias Hansen (13:34)
Yeah.

We're going to figure something out for you then to try out. Yeah. Cool.

Florian Engelhardt (13:36)
Yeah, awesome.

Awesome, that sounds good.

Mathias Hansen (13:40)
Well, is there anything else you want to share while we're here?

Florian Engelhardt (13:44)
no, that's... I don't have anything else to share. I'm looking forward to meeting you all. Yeah, I'm looking forward to meeting you all at the conference.

Mathias Hansen (13:46)
Alright, we'll have to wait and Likewise! Yeah,

likewise. Well Florian, it was so great to talk to you. Have a wonderful day!

Florian Engelhardt (13:55)
Likewise.

Thank you. You too. Bye bye.

Mathias Hansen (13:58)
Thank you.
