Between the radiation, the microgravity, and the extreme constraints on power and cooling, space is just about the worst possible place to put a computer. Oh, can you guys let me in? That's better. Now, if you follow space research and exploration, you probably know that space is full of computers, like this one. So… how do they do that? Well, the short answer is: by replacing them a lot. The ISS takes regular shipments of dozens of laptops at a time, which get, and this is great, this is a direct quote, "absolutely destroyed." But not every computer can be disposable. And in 2017, the madlads at NASA, HPE, and Kioxia, who sponsored this video, collaborated to create Spaceborne 1, the first edge computing server intended to run for an extended period of time on the International Space Station. Of course, it being their first attempt, some, uh, let's say "learning" took place, and it turned out that the supercapacitors in the SSDs were prone to radiation-related failure. Who knew? But since all it was ever meant to do was run benchmarks anyway, the mission was considered a huge success, and in 2021 they launched Spaceborne 2, whose purpose was to move beyond proof of concept and explore practical applications for on-station compute, especially data analysis using AI.

But the story doesn't end there. Behind me is the new Spaceborne 2. For administrative reasons it has the same name and core specs as last time, but it just took off earlier this year, and it features more storage than ever, over 130 terabytes, which is an incredible feat when you consider the design challenge. I mean, where do they even install these things? Look up. Oh, right. I guess they don't really need a ladder, do they? No, but you might.

To illustrate why edge computing is needed on the ISS, let's look at a use case that's focused on astronaut safety. These are the EVA gloves that crew members wear during spacewalks, and according to this article from 2016, they were responsible for half of all spacesuit injuries. So, to ensure their integrity between uses, NASA requires the crew to take hundreds of photographs of them from every angle and then beam them back to Earth, where machine learning is used to analyze them for scratches or other hazards. Except for one small problem: that data transfer takes five days! But with Spaceborne? Oh, I don't know. How about 45 seconds? Not only is this a huge time saver, but with only a handful of pictures needing to go to Earth for further analysis, Spaceborne can free up a significant amount of the crew's limited network bandwidth for other, more interesting things (we'll sketch the rough arithmetic behind that in a moment).

With such obvious benefits, then, you've gotta be wondering: why did no one ever try to put a server on the ISS before? The short answer is that, after seeing how the crew laptops fared, many people thought they just plain wouldn't work. And even if they did, there were a host of other hurdles to clear. Like the launch. Okay, this is really cool. Rocket companies like SpaceX and Northrop Grumman have shake-test machines that are programmed with profiles that simulate the launch conditions of their respective rockets. And if you've seen that viral video of the machine that disassembles hard drives by vigorously shaking them, you'll know that surviving that kind of treatment is no mean feat. Well, these machines managed it, both in simulation and in the real world. They actually lifted off at the end of this January.
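Circling back to that glove-photo arithmetic for a second: here's a rough, illustrative sketch of why downlinking everything takes so long. The photo count, file size, and link availability below are assumptions made up for the example (the video only gives "hundreds of photographs," a roughly one megabit per second research link, and regular downtime), so treat the output as order-of-magnitude only.

```python
# Rough, illustrative arithmetic for the EVA-glove photo use case.
# Every input here is an assumption for the sake of example, not a NASA figure.

photos = 400                # "hundreds of photographs" -> assume 400
mb_per_photo = 8            # assumed size of one high-resolution photo, in megabytes
link_mbit_per_s = 1.0       # the ~1 Mbit/s research link mentioned later in the video
link_availability = 0.7     # assumed fraction of time the link is actually usable

total_megabits = photos * mb_per_photo * 8
raw_hours = total_megabits / link_mbit_per_s / 3600
effective_hours = raw_hours / link_availability

print(f"Raw transfer time:  {raw_hours:.1f} h")        # ~7 h
print(f"With link downtime: {effective_hours:.1f} h")  # ~10 h

# Even this optimistic sketch is a full working day of continuous transfer
# for ONE glove inspection. Add scheduling, other traffic sharing the link,
# and prioritization, and a multi-day turnaround stops sounding surprising.
# Analyzing on-station and downlinking only a handful of flagged images
# shrinks the job by a couple of orders of magnitude.
```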
Every piece of equipment must also pass an acoustic chamber test and a user-friendliness evaluation to ensure the station crew can install and manage it. And, uh, oh, here's a good one: apparently all equipment sent up to the ISS goes through what's called a white glove test, which thankfully is not what it sounds like. Basically, you put on a pair of white gloves and then you just manhandle the crap out of it. If the gloves snag or tear on anything? Yeah, that's a potential source of injury. I'm gonna need you to file that down, which, fun fact, they actually do on-site, and then repeat the test. I just hope they weren't filing any RAM sticks.

Speaking of, let's take a closer look at these machines. Because it's not just one in here. I know it was kind of the point of this whole experiment, but it still weirds me out that these are just bog-standard HPE systems that you could order on their site today. They don't even have lead armor or anything. In this case, we're looking at an Edgeline EL4000, which is a multi-blade system, and a DL360 dual-socket server. We asked why these specific machines, and the answer we got was shockingly relatable: "We sorted the HPE server catalog by depth, power draw, and GPU support, and these were the ones we were left with." All right, fair enough. As for why two different machines, well, here's the thing: in a perfect world, multiples of the same machine would have been better, but due to power constraints they chose to have one with more CPU cores for more traditional scientific applications, and one with fewer CPU cores but with a GPU for deep learning and AI.

One thing they needed for both, however, is ample storage. Kioxia generously sponsored this and brought us out here, so let's take a look at the, let's call them unique, choices that they made for their storage configurations. First up, obviously, gone is any trace of supercapacitors, so Kioxia can proudly say that their SSDs are space-ready, I guess. But what's less obvious is why they chose SAS interface drives rather than NVMe for their high-speed bulk storage. I mean, you would think, this is space-age technology, they'd want the fastest thing possible. But these drives were selected for their balance of performance, reliability, and especially power efficiency. When you're looking at a shared power budget across two servers that is less than a typical gaming rig, every single watt counts. Oh, right, and that's even under ideal conditions. At any given time, to conserve power for other priorities on the station, the team can be asked to operate in half-power mode or even to shut down entirely for large operations like docking.

So, the new DL360 server, this guy right here, gets four 30.72-terabyte PM6 enterprise drives, totaling over 120 terabytes of raw bulk storage for scientific data and for backups. Then, for application drives, we've got a really wild config. Again, they went with four drives, but this time it's their RM6 drives. Again, they're using SAS for lower power, but this time two of the drives are operating in a data-redundant mirror, and the other two are basically just chilling there, ready to be put into action in the event of a failure. Two warm spares out of a four-drive array would sound like crazy paranoia on Earth, but I assure you that in space, where bit flips from random radiation are much more common, it's perfectly reasonable. I mean, other than the overkill drive config and the 28-volt power conversion that they need to run it on the ISS, there's not much to say about this thing. It's pretty much a bog-standard server.
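Before moving on, here's a small sketch of how that drive layout adds up. The application-drive capacity and the exact RAID mechanism are assumptions (the video names the RM6 family but not the flight capacity, and doesn't say which RAID implementation manages the mirror); the point is just the arithmetic behind "two warm spares out of a four-drive array."

```python
# Illustrative sketch of the DL360 drive layout as described in the video.
# The RM6 per-drive capacity and the RAID tooling are hypothetical.

PM6_TB = 30.72      # per-drive capacity of the bulk-storage SAS SSDs (stated)
RM6_TB = 1.92       # hypothetical per-drive capacity of the application SSDs

# Bulk storage: four PM6 drives, no redundancy described for this tier.
bulk_raw = 4 * PM6_TB
print(f"Bulk storage, raw: {bulk_raw:.2f} TB")          # 122.88 TB

# Application array: four drives installed, but only two are active in a
# mirror (RAID 1); the other two sit idle as warm spares that get promoted
# if an active drive fails.
mirror_members, warm_spares = 2, 2
installed = RM6_TB * (mirror_members + warm_spares)
usable = RM6_TB                                         # a mirror holds one copy's worth
print(f"Application array: {usable:.2f} TB usable of {installed:.2f} TB installed")

# Giving up 75% of installed application capacity for redundancy looks like
# paranoia on Earth, but with radiation-induced failures more likely and no
# way to hand-deliver a replacement drive, riding out two failures without
# human intervention is exactly the point.
```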
There is one cool demo that they said we could run, though. Yeah, they offered to let us pull one of the drives out of this dummy machine and live-swap it into this running machine to show that no data loss will occur. Want to do the honors? Sure. All they asked is that you put it in bay 8. See, you can see the drives actually operating in there right now. So I shouldn't use bay 7? That works. Look at that. I mean, that's good. RM6, that would be bad if it wasn't Kioxia. Awkward. And then let's check the size. Beautiful. 3.8 terabytes. Exactly what we want. They're not the exact drives that are on the space station, but we wanted a different capacity to show you that it's working. And those are expensive. Look at that: status, rebuilding. I mean, this seems like a lot of extra steps. We could have just looked at the light. Yeah, the light. Yeah, it's going. Success! And whether you're looking for a SAS drive, an NVMe drive, high capacity or high performance, we're gonna have a bunch of Kioxia's great enterprise-grade drives linked in the description down below.

I think I'll let you take this one apart. It's appropriately Linus-sized. Okay, let's take a look at the second server that's packed into each locker. The EL4000 is a blade chassis, so the servers are basically these little slide-in cards that… yeah, I know, aren't these cute or what? Wait, pull it out. And they go in on the side, look at that, instead of from the front. Wow. And they managed to sneak four Kioxia XG6 NVMe drives into each of these blades. Well, when I say each of these blades, I should say… they had the power budget for four drives, but they didn't have the power budget for four blades. In the flight configuration of this system, they ship with just one of the four blades installed. Though, it should be noted, they do fly up a spare blade per system in the event of a failure. There's just no way that that poor locker can support both of these blades and the other server running concurrently. Let's put you away and shift our focus to the locker.

Now, obviously, there's no real up or down on the ISS as it whizzes around the Earth at around 28,000 kilometers an hour. But to improve comfort for the astronauts, they tend to mount directional items like plants in a fixed orientation, which would put our lockers, the drawers that hold our servers, in the ceiling. There are two of these lockers, each containing an identical system loadout for workload sharing and redundancy. And these lockers present some serious design challenges, starting with the fact that they use a standard that quite literally doesn't exist on Earth: EXPRESS Rack. To pack the servers in, then, HPE had to get kind of creative. They found the shortest servers they could and then they stuffed them in sideways, and they're using a combination of air cooling and water cooling. The air cooling uses a system on the ISS called the AAA, so at the back of these lockers there are two cold air supplies and two hot air returns. That handles about 20% of the cooling for the servers. Obviously, 20%, not 100%, so they're gonna need some more, and that's where this water cooling comes in. This isn't a one-to-one for how it would be deployed on the ISS. For one thing, these fittings? 3D-printed mock-ups. The real fittings are $800 a pop, if you were even allowed to buy them. This tubing? Cheap vinyl from Home Depot. The real tubing has to be made of stainless steel.
In fact, any wetted surface, so anything that comes in contact with water, is supposed to be made out of stainless steel. But we can still illustrate how it's supposed to work. So, on this side, these go into a heat exchanger much like this one. This is actually from the first-generation Spaceborne, but functionally it's the same. It pumps cold water into here, chills the air inside the system, and then takes the warm water out to be dissipated to the photovoltaic heat exchangers that are plumbed up with liquid ammonia coolant and mounted to the exterior of the station to sink that heat into space. You need these kinds of special heat exchangers because, while we think of space as cold, and we see people, you know, blasted out of airlocks and they freeze over or whatever in movies, the truth is that for traditional methods of heat dissipation you need air, and in the near vacuum of space, well, it ain't th-there… th-air? Get it? Cheesy jokes aside, the two cooling systems together are good for removing about 400 watts of heat from each locker, but that's a combined budget, so if this GPU server kicks into high gear, well, these CPUs had better just chillax for a little bit.

Now let's talk about one of my favorite subjects: networking. There are four standard RJ45 ports on the front of the… okay, I'm gonna show you on the real one. Ah, as I was saying, four ports on each locker. Two of them connect both of the ISS's internal gigabit networks to a separate redundant switch inside the locker, and then the other two links go between the two lockers at 10 gigabit. Why 10 gig? Well, because for either backups or for multi-node workloads, that is a heck of a lot better than gigabit, and the power budget didn't allow for anything faster. Cool. I guess I'm starting to notice a pattern here. Anyway, that's all pretty standard, but things become less so when you look at the station side of these network cables. This is a 37-pin military-spec locking connector. These are designed for power and data, but in this application just eight of the pins are used, and it is $220 for just this part. Now, on the space station, NASA provides these cables for you, but for testing's sake, really? Here on Earth, HPE had to make their own. Fantastic.

What's really gonna blow your mind, though, is that for all of their expensive networking, these machines do not have a normal internet connection, just a private link back to Earth that NASA not only limits to a mighty one megabit per second, but that they also encourage folks not to make full use of. Also, even now in 2024, it doesn't have 24/7 connectivity. Pretty much every hour or two there's a period of downtime that can be anywhere from as short as a few minutes to as long as 45 minutes. And that's because they have to prioritize generating enough power for the station, and when the giant solar arrays point toward the sun, they can block line of sight with the satellites that provide connectivity. Which, oh, that's a fun fact: even though the ISS orbits less than 500 kilometers from the surface of the Earth, our ping times to the ISS, and yes, we got to ping the ISS, which was pretty cool, our ping times were atrocious, reaching nearly a second as we uploaded some of the dankest memes that Earth had to offer. LTTstore.com. Now, we asked why that is, and the answer was twofold. One, it's really old. Okay? Fair enough. But also, two, the station's internet relay is in geosynchronous orbit, over 35,000 kilometers from the Earth's surface.
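That figure is worth a quick sanity check. Here's a simplified propagation-delay estimate for a geosynchronous relay path; treating each leg as the bare geostationary altitude and ignoring ground-segment routing and processing are simplifying assumptions, and real paths add more.

```python
# Back-of-the-envelope propagation delay for ISS traffic bounced through a
# geosynchronous relay. Simplifying assumption: each leg is roughly the
# geostationary altitude; ground-segment routing and processing are ignored.

C_KM_PER_S = 299_792        # speed of light
GEO_ALTITUDE_KM = 35_786    # geostationary altitude above Earth's surface

legs = 4                    # ISS -> relay -> ground, and back the same way
round_trip_s = legs * GEO_ALTITUDE_KM / C_KM_PER_S

print(f"Propagation alone: ~{round_trip_s * 1000:.0f} ms")   # ~477 ms

# Stack modem and processing delays, terrestrial routing, and the occasional
# wait for a relay to come into view on top of that, and ping times pushing
# toward a full second are about what you'd expect.
```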
And, uh… well, what I said was, "Hmm, well, there's your problem right there. It's just really far." And you might be wondering, "Well, why don't they just use Starlink?" That's a good question! Um, someday they might, but for now they don't, and HPE and the team on the ISS have to work around the constraints of the current setup. I mean, for crying out loud, it took them four years to validate that, you know, we can even just run a normal computer up here and actually expect this thing to be reliable. They can't just switch to something and go, "I don't know, I hope it works." Oh, by the way, here's another fun one: there's no API to determine whether their connection is up or down, so instead, what they do is buffer all their communications. In basic terms, that means they ping every second, and if the ping succeeds, they send data and then hope that the connection stayed up during that time. It's a pretty good system. Okay, not a perfect one, but certainly enough for us to learn a lot from the Spaceborne project.

Even though Spaceborne 2 has been in action for three years, there's still so much to learn. Two of the servers have hardware RAID cards for their drives, for example, costing both mass and power, while two of them use software RAID, which obviously doesn't add any mass, but could impact power consumption even more depending on the load. And one approach could be more or less reliable than the other. We won't know until we try, which is kind of a recurring theme here. So if you want to learn more about the Spaceborne Computer project, we're going to have some resources linked for you down below, and we're also going to have a link to some great enterprise storage options from our sponsor, Kioxia. We're truly grateful for this unique opportunity to get, realistically, as close as I ever will to the real ISS: a prop on a sound stage in LA! But hey, thanks, Kioxia, for the opportunity, and for your long-term partnership.
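As a closing footnote, here's a minimal sketch of that buffer-and-ping scheme as described above. The queue, the ping target, and the transmit function are all placeholders of our own invention, not HPE's actual software.

```python
# Minimal sketch of the "ping every second, send only if the ping succeeds"
# buffering scheme described in the video. Everything here (queue, target
# host, send function) is illustrative, not the real Spaceborne software.

import collections
import subprocess
import time

send_queue = collections.deque()          # data waiting for a usable window

def link_is_up(host: str = "ground.example") -> bool:
    """One ICMP echo with a short timeout; success means the link looks usable."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def transmit(payload: bytes) -> None:
    """Placeholder for whatever actually moves data over the NASA link."""
    pass

def pump_queue_forever() -> None:
    while True:
        if send_queue and link_is_up():
            # The link looked up a moment ago; send one item and hope it
            # stays up long enough. Higher-level retries catch the misses.
            transmit(send_queue.popleft())
        time.sleep(1)   # the video describes a once-per-second check
```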