AR Face Tracking in an Agora iOS Conference Call

Using AR face tracking to make a multi-member call far more engaging than the average conference call

AR Face Tracking is a fun way to augment a real-time virtual house party: put on virtual party masks while you play online beer pong with your buddies, or get up to other Internet shenanigans.

Setting Up Agora’s iOS SDK

Setting up Agora’s iOS SDK is really straightforward. At its core is a singleton that you pass around to enable Agora functionality. In a nutshell: you run Cocoapods on the directory, add an AppID to AppDelegate.swift, and add camera and microphone permissions (don’t forget this step!). Then, import AgoraRtcEngineKit in the view controller, add functions for setting up local and remote video, join a channel, and set up the delegate. Let’s go through the integration step by step.

Step One: Cocoapods

Navigate to your project’s root directory. (If you haven’t installed Homebrew yet, right now is a great time to start.) If you do not have Cocoapods installed, run: brew install cocoapods. If you do, run: pod init.

Next, add the following to your Podfile:

platform :ios, '9.0'
use_frameworks!
target 'Your App' do
  pod 'AgoraRtcEngine_iOS'
end

Update your local Cocoapods library:

pod update

With the pods updated, install:

pod install

Open up your app:

open YourApp.xcworkspace

Now, add camera and microphone permissions to your Info.plist: Privacy - Camera Usage Description (NSCameraUsageDescription) and Privacy - Microphone Usage Description (NSMicrophoneUsageDescription), each with a short message explaining why the app needs access.

Step Two: App Delegate & Singleton

Add a constant at the top of your AppDelegate.swift:

let AppID = ""

Add your AppID there.

Then, open your view controller file and add the following import at the top:

import AgoraRtcEngineKit

Agora uses a singleton pattern: you create the shared AgoraRtcEngineKit instance by calling sharedEngine(withAppId:delegate:), then enable interoperability with the Web SDK via enableWebSdkInteroperability(true):

func initializeAgoraEngine() {
agoraKit = AgoraRtcEngineKit.sharedEngine(withAppId: AppID, delegate: self)
agoraKit.enableWebSdkInteroperability(true)
}
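One note: the agoraKit used above is assumed to be a property on your view controller. Here is a minimal sketch of that declaration (the implicitly unwrapped optional is a common pattern in Agora’s samples, but it is an assumption here):

import UIKit
import AgoraRtcEngineKit

class VideoCallViewController: UIViewController {
    // The engine instance created in initializeAgoraEngine() and reused throughout.
    var agoraKit: AgoraRtcEngineKit!
}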

Step Three: Setting up Videos

  • Enable video:
func setupVideo() {
    agoraKit.enableVideo()
    // Default video profile is 360p.
    agoraKit.setVideoEncoderConfiguration(
        AgoraVideoEncoderConfiguration(size: AgoraVideoDimension640x360,
                                       frameRate: .fps15,
                                       bitrate: AgoraVideoBitrateStandard,
                                       orientationMode: .adaptative))
}
  • Join channel:
func joinChannel() {
    agoraKit.setDefaultAudioRouteToSpeakerphone(true)
    agoraKit.joinChannel(byToken: nil, channelId: "demoChannel1", info: nil, uid: 0) { [weak self] (sid, uid, elapsed) -> Void in
        // Joined channel "demoChannel1"
    }
    UIApplication.shared.isIdleTimerDisabled = true
}
  • Call the setup methods in viewDidAppear(_:) and present an alert to join the channel:
override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    initializeAgoraEngine()
    setupVideo()
    showJoinAlert()
}

func showJoinAlert() {
    let alertController = UIAlertController(title: nil, message: "Ready to join channel.", preferredStyle: .alert)
    let action = UIAlertAction(title: "Join", style: .destructive) { (action: UIAlertAction) in
        self.joinChannel()
    }
    alertController.addAction(action)
    present(alertController, animated: true, completion: nil)
}
  • Delegate extension:
extension VideoCallViewController: AgoraRtcEngineDelegate {
    func rtcEngine(_ engine: AgoraRtcEngineKit, firstRemoteVideoDecodedOfUid uid: UInt, size: CGSize, elapsed: Int) {
        if remoteVideo.isHidden {
            remoteVideo.isHidden = false
        }
        let videoCanvas = AgoraRtcVideoCanvas()
        videoCanvas.uid = uid
        videoCanvas.view = remoteVideo
        videoCanvas.renderMode = .hidden
        agoraKit.setupRemoteVideo(videoCanvas)
    }

    internal func rtcEngine(_ engine: AgoraRtcEngineKit, didOfflineOfUid uid: UInt, reason: AgoraUserOfflineReason) {
        self.remoteVideo.isHidden = true
    }

    func rtcEngine(_ engine: AgoraRtcEngineKit, didVideoMuted muted: Bool, byUid: UInt) {
        remoteVideo.isHidden = muted
        remoteVideoMutedIndicator.isHidden = !muted
    }
}
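The delegate extension assumes remoteVideo and remoteVideoMutedIndicator views on the view controller, and you’ll eventually want to leave the channel when the call ends. Here is a minimal sketch of both; the outlet types and the leaveChannel() helper are assumptions modeled on Agora’s quickstart samples, not code from this project:

// Assumed storyboard outlets in the body of VideoCallViewController:
@IBOutlet weak var remoteVideo: UIView!                     // hosts the remote video canvas
@IBOutlet weak var remoteVideoMutedIndicator: UIImageView!  // shown while the remote user's video is muted

// Tear down when the call ends.
func leaveChannel() {
    agoraKit.leaveChannel(nil)
    UIApplication.shared.isIdleTimerDisabled = false
}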

Conference Calls

ARKit

Connecting the Agora SDK to ARKit enables augmented reality in real-time conference calls. As the ARSCNView documentation describes: “The view automatically renders the live video feed from the device camera as the scene background.” In our implementation of AR Face Tracking, we are going to set up ARKit to track our face locally rather than remotely.

To enable AR Face Tracking for local video, have the local video setup method create an ARSCNView instead of an AgoraRtcVideoCanvas, and have the view controller check whether the device supports face tracking before running the AR session:

func setupLocalVideo(uid: UInt) {
    let videoView = ARSCNView()
    videoView.tag = Int(uid)
    videoView.backgroundColor = UIColor.orange

    // The ARSCNView renders the camera feed itself, so the local canvas setup is no longer needed:
    // let videoCanvas = AgoraRtcVideoCanvas()
    // videoCanvas.uid = uid
    // videoCanvas.view = videoView
    // videoCanvas.renderMode = .hidden
    // agoraKit.setupLocalVideo(videoCanvas)

    self.sceneView = videoView
    self.sceneView?.delegate = self

    if isARSupported {
        let configuration = ARFaceTrackingConfiguration()
        sceneView?.session.run(configuration)
    }

    let tapGesture = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
    tapGesture.numberOfTapsRequired = 1
    self.sceneView?.addGestureRecognizer(tapGesture)

    stackView.addArrangedSubview(videoView)
}
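Two things in setupLocalVideo(uid:) aren’t shown anywhere else in this post: the isARSupported check and the handleTap(_:) selector. Here is a minimal sketch of how they might look; the hit-testing logic assumes the EmojiNode helper sketched later on:

import UIKit
import ARKit
import SceneKit

extension VideoCallViewController {
    // Face tracking needs the TrueDepth camera, so gate the AR session on device support.
    var isARSupported: Bool {
        return ARFaceTrackingConfiguration.isSupported
    }

    // Hit-test the tapped point and, if an emoji feature was hit, advance it to its next option.
    @objc func handleTap(_ sender: UITapGestureRecognizer) {
        guard let sceneView = self.sceneView else { return }
        let location = sender.location(in: sceneView)
        let results = sceneView.hitTest(location, options: nil)
        if let node = results.first?.node as? EmojiNode {
            node.next()
        }
    }
}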

To get the ARSCNView to display emoji, there are a few additional steps to take. They are outlined in detail in Ray Wenderlich’s blog post.

Briefly, let’s review those steps here:

  • Import ARKit:
import ARKit 
  • Add a configuration:
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(animated)

// 1
let configuration = ARFaceTrackingConfiguration()

// 2
sceneView.session.run(configuration)
}

override func viewWillDisappear(_ animated: Bool) {
super.viewWillDisappear(animated)

// 1
sceneView.session.pause()
}
  • Add mesh mask:
// 1
extension VideoCallViewController: ARSCNViewDelegate {
// 2
func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {

// 3
guard let device = sceneView.device else {
return nil
}

// 4
let faceGeometry = ARSCNFaceGeometry(device: device)

// 5
let node = SCNNode(geometry: faceGeometry)

// 6
node.geometry?.firstMaterial?.fillMode = .lines

// 7
return node
}
}
  • Add the following code to viewDidLoad():
sceneView.delegate = self
  • Update the mesh mask:
// 1
func renderer(
_ renderer: SCNSceneRenderer,
didUpdate node: SCNNode,
for anchor: ARAnchor) {

// 2
guard let faceAnchor = anchor as? ARFaceAnchor,
let faceGeometry = node.geometry as? ARSCNFaceGeometry else {
return
}

// 3
faceGeometry.update(from: faceAnchor.geometry)
}
  • Add emoji bling:
let noseOptions = ["👃", "🐽", "💧", " "]
  • Add emoji nodes (inside renderer(_:nodeFor:), just before returning the node):
// 1
node.geometry?.firstMaterial?.transparency = 0.0

// 2
let noseNode = EmojiNode(with: noseOptions)

// 3
noseNode.name = "nose"

// 4
node.addChildNode(noseNode)

// 5
updateFeatures(for: node, using: faceAnchor)

Build and run the application. If everything works, you’ll see a mesh mask over your local video, and tapping a feature cycles it through the emoji in its options array! If you need an exact copy of the codebase, check it out on Pastebin.
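The EmojiNode class used above comes from the Ray Wenderlich tutorial and isn’t reproduced in this post. If you’d rather not pull it from there, here is a minimal sketch of what it needs to do (the plane size, font, and image-rendering details below are assumptions): it draws an emoji onto an SCNPlane, can cycle to the next emoji, and repositions itself from face-geometry vertices.

import SceneKit
import UIKit
import simd

class EmojiNode: SCNNode {
    private let options: [String]
    private var index = 0

    init(with options: [String], width: CGFloat = 0.06, height: CGFloat = 0.06) {
        self.options = options
        super.init()
        // Render the first emoji onto a small plane anchored to a face feature.
        let plane = SCNPlane(width: width, height: height)
        plane.firstMaterial?.diffuse.contents = EmojiNode.image(from: options.first ?? " ")
        plane.firstMaterial?.isDoubleSided = true
        geometry = plane
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    // Cycle to the next emoji in the options array (called from the tap handler).
    func next() {
        index = (index + 1) % options.count
        (geometry as? SCNPlane)?.firstMaterial?.diffuse.contents = EmojiNode.image(from: options[index])
    }

    // Move the node to the average position of the supplied face-geometry vertices.
    func updatePosition(for vectors: [vector_float3]) {
        guard !vectors.isEmpty else { return }
        let sum = vectors.reduce(vector_float3(repeating: 0), +)
        simdPosition = sum / Float(vectors.count)
    }

    // Rasterize an emoji string into a UIImage to use as the plane's material.
    private static func image(from text: String, size: CGSize = CGSize(width: 100, height: 100)) -> UIImage {
        let renderer = UIGraphicsImageRenderer(size: size)
        return renderer.image { _ in
            let attributes: [NSAttributedString.Key: Any] = [.font: UIFont.systemFont(ofSize: 80)]
            (text as NSString).draw(in: CGRect(origin: .zero, size: size), withAttributes: attributes)
        }
    }
}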

Additional Face Tracking Integrations for ARKit

Add new options for your eyes, mouth or head with the following arrays:

let eyeOptions = ["👁", "🌕", "🌟", "🔥", "⚽️", "🔎", " "]
let mouthOptions = ["👄", "👅", "❤️", " "]
let hatOptions = ["🎓", "🎩", "🧢", "⛑", "👒", " "]

In your renderer(_:nodeFor:), add the corresponding nodes:

let leftEyeNode = EmojiNode(with: eyeOptions)
leftEyeNode.name = "leftEye"
leftEyeNode.rotation = SCNVector4(0, 1, 0, GLKMathDegreesToRadians(180.0))
node.addChildNode(leftEyeNode)

let rightEyeNode = EmojiNode(with: eyeOptions)
rightEyeNode.name = "rightEye"
node.addChildNode(rightEyeNode)

let mouthNode = EmojiNode(with: mouthOptions)
mouthNode.name = "mouth"
node.addChildNode(mouthNode)

let hatNode = EmojiNode(with: hatOptions)
hatNode.name = "hat"
node.addChildNode(hatNode)

To look up each feature’s position in the face mesh, add an array of feature names and the corresponding vertex indices:

let features = ["nose", "leftEye", "rightEye", "mouth", "hat"]
let featureIndices = [[9], [1064], [42], [24, 25], [20]]

Update the updateFeatures(for:using:) method with the following:

// 1
for (feature, indices) in zip(features, featureIndices) {
// 2
let child = node.childNode(withName: feature, recursively: false) as? EmojiNode

// 3
let vertices = indices.map { anchor.geometry.vertices[$0] }

// 4
child?.updatePosition(for: vertices)
}
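For reference, that loop is the body of the updateFeatures(for:using:) helper called from renderer(_:nodeFor:); here is a minimal sketch of the enclosing signature, assumed to live alongside the ARSCNViewDelegate methods:

// Reposition each emoji feature node using the latest face-geometry vertices.
func updateFeatures(for node: SCNNode, using anchor: ARFaceAnchor) {
    for (feature, indices) in zip(features, featureIndices) {
        let child = node.childNode(withName: feature, recursively: false) as? EmojiNode
        let vertices = indices.map { anchor.geometry.vertices[$0] }
        child?.updatePosition(for: vertices)
    }
}

In the Ray Wenderlich tutorial this helper is also called from renderer(_:didUpdate:for:), right after faceGeometry.update(from:), so the emoji keep tracking your face as it moves.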

If you’ve implemented the code snippets from this section correctly, you should be able to add eyes, a mouth, or a hat by tapping each feature.

What’s Next?

AR Face Tracking is a great way to make a multi-member call far more engaging than the average conference call. If you’re into beauty products, clothing, or other kinds of shopping, you and your friends might enjoy trying AR makeup, hair coloring, skin smoothing, gloss, lipstick, teeth whitening, or face shaping. There are many different ways to integrate AR Face Tracking for real-time engagement.