There is no doubt that the HoloLens is a magical and compelling experience for anyone who uses it, but all onlookers see is people staring into thin air. Within weeks of getting our first HoloLens back in August 2016 we had identified this as a problem we'd like to solve. We also wanted to share our awesome mixed reality experiences with the wider world, so the first target was to produce high-quality videos we could share on the internet.

photo credit: Lars Opstad @LarsOpstad

Why build our own solution?

The HoloLens has built-in functionality for recording videos from the user's perspective called Mixed Reality Capture (MRC). This is a great tool for quick-and-dirty sharing of ideas, but we felt that the quality coming from the built-in forward-facing camera didn't do our work (or the HoloLens itself) justice. MRC also has the downside of being computationally expensive, so the frame rate gets clamped to 30 fps. If you're already pushing the hardware to the limit the frame rate can drop even further, reducing the quality of your video output.

More recently Microsoft released Spectator View, a combination of software and hardware designed to help developers create better videos of their content. It wasn't available when we started our quest for better videos, so we couldn't have used it anyway, and it still doesn't quite match our requirements: at the moment Spectator View only works with a locked-off camera, and our early tests showed that this doesn't sell the 3D nature of the content as well as a moving camera does.

The only system that meets all of our requirements is Microsoft's in-house setup, which uses a RED Dragon camera costing tens of thousands of dollars. They only have a few of these rigs and they aren't available to the public.


Solution

Our solution has been refined and improved over time, but the basic principle has remained the same. At its core is a Unity library we developed for recording a HoloLens session on the device itself. You can think of it as a replay system: during a filming session we record all the relevant information about what the user is doing, what objects have been placed where, which buttons were tapped, and so on. This data is saved to a file which can later be imported into the Unity editor, allowing us to play back everything the user did, render it out from the perspective of the real-world camera and composite it on top of the footage. The diagram below shows all the steps in the pipeline, which we cover in more detail in the following sections.

Recording HoloLens Session Interactions

This is the core of our solution. The key is knowing what the user has done and when. Our first thought was to stream the data live over the network to the Unity editor and render it out in real time, but that would have required us to write networking code for any app we wished to capture, even if it was only a single-user experience. The much simpler solution was to record the data locally on the HoloLens, which also removed a lot of complexity during the shoot (i.e. no need to have a networked computer there).
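To give a feel for what the replay data looks like, here is a minimal sketch of the kind of per-frame samples and discrete events we have in mind. The class and field names (ReplayTake, TransformSample, InteractionEvent) are hypothetical and not the actual library; Unity's JsonUtility is shown later as just one possible way to serialise a take.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

// Hypothetical replay data layout: one transform sample per tracked object per
// frame, plus discrete events such as taps or placements.
[Serializable]
public class TransformSample
{
    public string objectId;     // which tagged object this sample belongs to
    public float time;          // seconds since the take started
    public Vector3 position;    // in the HoloLens world coordinate system
    public Quaternion rotation;
}

[Serializable]
public class InteractionEvent
{
    public string objectId;
    public float time;
    public string eventName;    // e.g. "Tapped", "Placed"
}

[Serializable]
public class ReplayTake
{
    public List<TransformSample> samples = new List<TransformSample>();
    public List<InteractionEvent> events = new List<InteractionEvent>();
}
```

A take serialised this way can be copied off the device and loaded back into the editor for playback.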

Preparing a scene for recording involves tagging only the objects that respond to user input, so the system knows what to record. This lightweight approach keeps memory and performance overheads low and means we can prepare a scene (depending on complexity) in under a day. Compare that to fully networking a HoloLens app!
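As a rough illustration of the tagging step, the tag can be little more than a marker component carrying a stable ID, so the recorder knows what to sample and the editor knows which object to drive at playback time. Again, the name ReplayTarget is hypothetical:

```csharp
using UnityEngine;

// Hypothetical tag component: attach to any object the user can move, place or
// tap. The ID is used to find the matching scene object again when the take is
// played back in the Unity editor.
public class ReplayTarget : MonoBehaviour
{
    public string replayId;
}
```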

With the app prepared we're ready to start filming. Starting and stopping the HoloLens session recording on location is as simple as a voice command at the top and tail of each take. In addition to the replay data we also write out the spatial mesh of the room, which helps when matching up the real-world camera movement with the HoloLens coordinate system. A rough sketch of such a recorder follows.
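To make the recording step concrete, here is a minimal sketch of a recorder driven by voice commands. It builds on the hypothetical ReplayTake and ReplayTarget types sketched above, uses Unity's KeywordRecognizer (Windows only) for the start/stop commands, and writes each take to the device's persistent data path. The spatial mesh export is omitted, and none of this is the actual library code.

```csharp
using System.IO;
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical session recorder: "start take" / "stop take" voice commands
// toggle recording, and each frame the pose of every ReplayTarget in the scene
// is appended to the current ReplayTake (see the earlier sketches).
public class ReplayRecorder : MonoBehaviour
{
    KeywordRecognizer recognizer;
    ReplayTake currentTake;
    float takeStartTime;
    int takeNumber;

    void Start()
    {
        recognizer = new KeywordRecognizer(new[] { "start take", "stop take" });
        recognizer.OnPhraseRecognized += OnPhrase;
        recognizer.Start();
    }

    void OnPhrase(PhraseRecognizedEventArgs args)
    {
        if (args.text == "start take")
        {
            currentTake = new ReplayTake();
            takeStartTime = Time.time;
        }
        else if (args.text == "stop take" && currentTake != null)
        {
            // Persist the take so it can be copied off the device and imported
            // into the Unity editor for playback.
            string path = Path.Combine(Application.persistentDataPath,
                                       "take_" + (takeNumber++) + ".json");
            File.WriteAllText(path, JsonUtility.ToJson(currentTake));
            currentTake = null;
        }
    }

    void LateUpdate()
    {
        if (currentTake == null) return;

        // Sample every tagged object once per frame. A real implementation would
        // keep a registry rather than searching the scene each frame.
        foreach (ReplayTarget target in FindObjectsOfType<ReplayTarget>())
        {
            currentTake.samples.Add(new TransformSample
            {
                objectId = target.replayId,
                time = Time.time - takeStartTime,
                position = target.transform.position,
                rotation = target.transform.rotation
            });
        }
    }
}
```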

Filming the HoloLens Session

The filming itself is fairly straightforward, but there are a few things we do to make sure it goes smoothly.

We use a GoPro as it has a nice wide-angle lens, which we find allows us to get both the holograms and the user in shot without needing to be too far away. It also allows us to film in smaller locations when necessary.

We don't want to have to do too much rotoscoping, so we try to avoid the user moving in front of the content. To help minimise this we usually storyboard and practice the shots before filming on location.

We like to use fairly minimalist spaces as backdrops so there is less noise behind the holograms. This can make it difficult to track the camera later, so we'll often add tracking markers to the space.

Camera Tracking

Camera tracking in 3DE

This is the most time-consuming part of the process, but it's also the most standardised, so there isn't much to say about it. Camera tracking is the process of extracting the camera's movement from feature points in the image so that CG content can be composited into it. We use 3DEqualizer to perform our camera track. If you want to learn more about camera tracking in 3DE you can find some tutorials here.

The only key part of this process for us is ensuring we have real-world measurements of an object in the shot (e.g. a table, a doorway, etc.). This ensures we produce a camera track at the same scale as the real world. The motion of the camera is then exported as an FBX animation (along with a proxy of the known object from the shot), which can then be imported into Unity.

Rendering out from Unity

With our HoloLens session recording and our real-world camera track we can start to render out the footage from Unity. There are three key aspects worth calling out:

Firstly, matching up the HoloLens and imported camera coordinate systems. This can be done by eyeballing it, but where possible we line up the known-object proxy from 3DE with the spatial mesh we saved during filming, which gives us a rock-solid match. A sketch of this alignment step is shown below.
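For illustration, once the known-object proxy from the FBX has been manually matched against the same object on the saved spatial mesh, the whole imported rig can be snapped into the HoloLens coordinate system with a single rigid offset. This sketch assumes both sides share real-world scale (which the measurement step above guarantees); the component and field names are made up.

```csharp
using UnityEngine;

// Hypothetical alignment helper. 'proxyInTrackRoot' is the known-object proxy as
// exported from 3DE (a child of the imported camera rig); 'proxyInHoloWorld' is
// the same object positioned by hand on the saved spatial mesh. Applying the
// offset between the two to the rig root brings the tracked camera into the
// HoloLens coordinate system. Assumes rigid, unit-scale transforms.
public class CoordinateAligner : MonoBehaviour
{
    public Transform trackedCameraRoot;  // root of the FBX import (camera + proxy)
    public Transform proxyInTrackRoot;   // proxy object inside the FBX import
    public Transform proxyInHoloWorld;   // same object matched to the spatial mesh

    public void Align()
    {
        // Offset that carries points from the track's frame into the HoloLens frame.
        Matrix4x4 offset = proxyInHoloWorld.localToWorldMatrix *
                           proxyInTrackRoot.worldToLocalMatrix;

        Quaternion offsetRotation = Quaternion.LookRotation(offset.GetColumn(2),
                                                            offset.GetColumn(1));

        trackedCameraRoot.position = offset.MultiplyPoint3x4(trackedCameraRoot.position);
        trackedCameraRoot.rotation = offsetRotation * trackedCameraRoot.rotation;
    }
}
```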

Secondly, lens distortion. The GoPro has quite a distinctive look, and part of that look is created by lens distortion. The problem is that Unity doesn't do distorted rendering out of the box; a further problem is that the GoPro's distortion isn't a standard fisheye or barrel distortion, it is rather complex.

Fortunately, if you provide 3DE with an image sequence and choose the right lens profile it can apply the lens distortion for you. However, a distorting lens captures more than just a rectangular window onto the world. The image below illustrates this nicely: on the left is a shot of a checkerboard taken with the GoPro, and on the right is the same shot undistorted by 3DE. The undistortion process unwarps the original, making the checkerboard lines straight, but in doing so the image is stretched beyond the original image resolution. This is called overscan.

This means that in order to do the reverse (i.e. distort our renders) we need Unity to render with overscan. This is done by increasing both the resolution and the field of view being rendered, as sketched below.
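As an illustration of the overscan step, widening the vertical field of view by a factor scales tan(fov/2) by that same factor, and the render target grows to match so pixel density is preserved. The 1.2 factor and resolutions below are example values; the real numbers come from the lens profile.

```csharp
using UnityEngine;

// Hypothetical overscan setup: widen the camera's vertical FOV and enlarge the
// render target so the frame extends beyond the original image bounds, ready to
// be re-distorted later.
public class OverscanCamera : MonoBehaviour
{
    public Camera renderCamera;
    [Range(1f, 2f)] public float overscan = 1.2f;  // example factor from the lens profile
    public int baseWidth = 3840;                    // example plate resolution
    public int baseHeight = 2160;

    void Start()
    {
        // Unity's Camera.fieldOfView is the vertical FOV in degrees.
        float halfFov = renderCamera.fieldOfView * 0.5f * Mathf.Deg2Rad;

        // Scaling the image plane by 'overscan' scales tan(fov / 2) by the same factor.
        renderCamera.fieldOfView =
            2f * Mathf.Atan(Mathf.Tan(halfFov) * overscan) * Mathf.Rad2Deg;

        // Grow the render target by the same factor so pixel density is unchanged.
        renderCamera.targetTexture = new RenderTexture(
            Mathf.RoundToInt(baseWidth * overscan),
            Mathf.RoundToInt(baseHeight * overscan),
            24);
    }
}
```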

3DE does a good job of all this, but depending on shot length it could take up to 30 minutes to process each shot. To speed this up we now apply the lens distortion at render time in Unity. Rather than reimplementing the complex formulas 3DE uses to create the distortion, we wrote a plugin for 3DE which exports the lens distortion profile as a warped mesh that Unity can use to render out the images with the correct distortion. The benefit is that if we switch to a different camera or lens which requires a different distortion formula, all we need to do is export a new lens profile from 3DE.
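The Unity side of the mesh-based distortion can be surprisingly simple: the scene camera renders the holograms into an off-screen texture, and a second orthographic camera looks only at the warped grid mesh, which samples that texture through its baked-in distorted UVs. This is a hedged sketch of that setup, not our actual plugin or export format.

```csharp
using UnityEngine;

// Hypothetical Unity-side setup for mesh-based lens distortion. The warp mesh is
// a flat grid whose UVs map the undistorted, overscanned render back into the
// distorted GoPro frame. A second, orthographic camera sees only this mesh and
// produces the final distorted image.
public class MeshLensDistortion : MonoBehaviour
{
    public Camera sceneCamera;        // the overscanned camera rendering the holograms
    public Camera outputCamera;       // orthographic camera that sees only the warp mesh
    public MeshRenderer warpMesh;     // grid mesh exported with the lens profile baked in

    void Start()
    {
        // The scene camera renders off-screen; its output becomes the mesh texture.
        var rt = sceneCamera.targetTexture != null
            ? sceneCamera.targetTexture
            : new RenderTexture(Screen.width, Screen.height, 24);
        sceneCamera.targetTexture = rt;
        warpMesh.material.mainTexture = rt;

        // The output camera renders after the scene camera and only draws the mesh layer.
        outputCamera.depth = sceneCamera.depth + 1;
        outputCamera.orthographic = true;
        outputCamera.cullingMask = 1 << warpMesh.gameObject.layer;
    }
}
```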

Finally, render speed. When we first got the system running we were rendering out 3450 x 1992 images at 60 fps as PNGs. We used PNG so that we had an alpha channel and some compression to keep file sizes manageable, but Unity's PNG encoding is slow: render times could be in the order of 30 minutes just for a 30-second shot on a high-spec PC.
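For reference, this is roughly what the slow path looks like: a synchronous GPU read-back followed by CPU-side PNG encoding every frame, which is where the minutes went. The folder name is just an example.

```csharp
using System.Collections;
using System.IO;
using UnityEngine;

// Sketch of a naive capture loop: read the frame back from the GPU and encode it
// as a PNG on the CPU every frame. ReadPixels stalls the pipeline and EncodeToPNG
// is expensive, which is why render times blow up.
public class NaivePngCapture : MonoBehaviour
{
    public string outputFolder = "Capture";   // hypothetical output location
    int frameIndex;

    IEnumerator Start()
    {
        Directory.CreateDirectory(outputFolder);
        while (true)
        {
            yield return new WaitForEndOfFrame();

            // RGBA32 keeps the alpha channel needed for compositing.
            var tex = new Texture2D(Screen.width, Screen.height, TextureFormat.RGBA32, false);
            tex.ReadPixels(new Rect(0, 0, Screen.width, Screen.height), 0, 0);
            tex.Apply();

            File.WriteAllBytes(Path.Combine(outputFolder, frameIndex.ToString("D5") + ".png"),
                               tex.EncodeToPNG());
            Destroy(tex);
            frameIndex++;
        }
    }
}
```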

We looked at a few solutions on the Asset Store, but only one ticked all the boxes: AVPro Movie Capture by RenderHeads. We use a number of RenderHeads plugins, all of them super-performant native implementations, and this one is no exception. It renders out 1080p at 60 fps using a lossless codec supporting alpha in near real time. Combining the lens distortion and rendering improvements, we've taken the processing time for a 30-second shot from about an hour down to about 30 seconds. The time saved allows us to iterate more, which ultimately results in higher-quality output.

Compositing and Post

As with the filming and camera tracking, this is a standard process in your favourite compositing package. We perform marker removal and camera stabilisation, and add a little transparency and glow to emulate the look as seen through the HoloLens. Then we cut it together, and that's about it.

 

Conclusion

Spending the time developing this tech has been well worth it. Not only are we able to use this to promote our own work, but it is increasingly becoming part of our standard client offering.

More often than not our clients want to promote their use of HoloLens too, if not globally then at least internally. As well as offering this on top of a full app production, we can retrofit our solution to existing apps that have already been developed.

Real-Time

The technique outlined above is limited to offline video, but we've recently extended the system to real time, allowing a live view from a static camera into the HoloLens user's world. This piggybacks on the same lightweight approach we use for recording HoloLens sessions, but instead of writing the data to disk we stream it over the network to Unity. Similarly to Spectator View, we then composite the render and the video feed directly in Unity.
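As a rough sketch of the editor/desktop side of this streaming, pose packets from the HoloLens could be received over UDP on a background thread and applied to the matching objects each frame. The packet layout (seven floats: position plus quaternion) and port number are purely illustrative, not our actual protocol.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using UnityEngine;

// Hypothetical editor-side receiver for the live view. Pose packets streamed
// from the HoloLens arrive over UDP on a background thread; the latest pose is
// applied to the driven object on the main thread each frame.
public class LivePoseReceiver : MonoBehaviour
{
    public int port = 9050;             // example port
    public Transform trackedObject;     // scene object driven by the incoming poses

    UdpClient client;
    Thread receiveThread;
    readonly object poseLock = new object();
    Vector3 latestPosition;
    Quaternion latestRotation = Quaternion.identity;

    void Start()
    {
        client = new UdpClient(port);
        receiveThread = new Thread(ReceiveLoop) { IsBackground = true };
        receiveThread.Start();
    }

    void ReceiveLoop()
    {
        var remote = new IPEndPoint(IPAddress.Any, 0);
        try
        {
            while (true)
            {
                // Blocking receive; each datagram carries 7 floats: position + quaternion.
                byte[] data = client.Receive(ref remote);
                if (data.Length < 7 * sizeof(float)) continue;

                lock (poseLock)
                {
                    latestPosition = new Vector3(BitConverter.ToSingle(data, 0),
                                                 BitConverter.ToSingle(data, 4),
                                                 BitConverter.ToSingle(data, 8));
                    latestRotation = new Quaternion(BitConverter.ToSingle(data, 12),
                                                    BitConverter.ToSingle(data, 16),
                                                    BitConverter.ToSingle(data, 20),
                                                    BitConverter.ToSingle(data, 24));
                }
            }
        }
        catch (SocketException) { }          // socket closed on shutdown
        catch (ObjectDisposedException) { }  // let the thread exit
    }

    void Update()
    {
        lock (poseLock)
        {
            trackedObject.position = latestPosition;
            trackedObject.rotation = latestRotation;
        }
    }

    void OnDestroy()
    {
        if (client != null) client.Close();
    }
}
```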

 

This has only been used a few times in a "live" environment, so we're still learning and improving the workflow. With the release of Spectator View we are keen to compare the two real-time approaches; from what we've read there are advantages and disadvantages to both. Our main advantage is that we don't require an additional HoloLens: we've implemented our own computer vision solution to sync up the user's HoloLens world with the position of the camera.

There is much more to talk about regarding real-time filming of HoloLens content. We're exploring other camera types as well as the Spectator View rig, but we'll save that for another post.

Please get in touch if you have any questions about any of the above.
