1 year ago

#303811

eshirima

Reality Composer Extract Depth Frames

I recorded an ARSession video using Reality Composer and it is saved in a .mov file. I am able to extract the RGBA frames with a Playground script, but I cannot find a way to extract the depth data as well. The code below is what I used to extract the RGBA frames:

```swift
import AVFoundation
import CoreImage
import UIKit

let movieURL = URL(fileURLWithPath: moviePath)
let movieAsset = AVAsset(url: movieURL)

func getFirstFrame() -> UIImage {
    guard let reader = try? AVAssetReader(asset: movieAsset),
          let videoTrack = movieAsset.tracks(withMediaType: .video).first else {
        return UIImage()
    }

    // Ask the reader to decode the track into 32-bit BGRA pixel buffers.
    let outputSettings: [String: Any] = [
        kCVPixelBufferPixelFormatTypeKey as String: NSNumber(value: kCVPixelFormatType_32BGRA)
    ]
    let trackReaderOutput = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: outputSettings)

    reader.add(trackReaderOutput)
    reader.startReading()

    // Return the first decodable frame as a UIImage.
    while let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
        if let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
            return UIImage(ciImage: CIImage(cvImageBuffer: imageBuffer))
        }
    }

    return UIImage()
}
```
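If the depth frames are stored as a second video track inside the same .mov (an assumption on my part, suggested by the 640x480 dimensions in the metadata below), one idea is to point the same AVAssetReader approach at that track and request DepthFloat32 pixel buffers. A sketch of that idea; the 640x480 track-matching heuristic and the pixel format are both unverified assumptions:

```swift
import AVFoundation
import CoreImage

// Sketch, not verified: assumes the depth frames live in a second
// video track whose naturalSize is 640x480, and that AVFoundation can
// decode that track into DepthFloat32 pixel buffers.
func depthFrames(from asset: AVAsset) -> [CIImage] {
    guard let reader = try? AVAssetReader(asset: asset),
          let depthTrack = asset.tracks(withMediaType: .video)
              .first(where: { $0.naturalSize == CGSize(width: 640, height: 480) })
    else { return [] }

    let settings: [String: Any] = [
        kCVPixelBufferPixelFormatTypeKey as String:
            NSNumber(value: kCVPixelFormatType_DepthFloat32)
    ]
    let output = AVAssetReaderTrackOutput(track: depthTrack, outputSettings: settings)
    reader.add(output)
    reader.startReading()

    // Collect every decodable depth frame as a CIImage.
    var frames: [CIImage] = []
    while let sample = output.copyNextSampleBuffer() {
        if let buffer = CMSampleBufferGetImageBuffer(sample) {
            frames.append(CIImage(cvImageBuffer: buffer))
        }
    }
    return frames
}
```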

Using hachoir.metadata (in Python), the video file's metadata shows that there are depth images of dimensions 640x480 in addition to the 1920x1440 color frames.

Metadata:
- Duration: 49 sec 211 ms
- Image width: 1920 pixels
- Image width: 640 pixels
- Image height: 1440 pixels
- Image height: 480 pixels
- Creation date: 2022-02-24 21:17:45
- Last modification: 2022-02-24 21:18:34
- Comment: Play speed: 100.0%
- Comment: User volume: 100.0%
- MIME type: video/quicktime
- Endianness: Big endian
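As a first check on the AVFoundation side, it might help to enumerate the video tracks the framework actually sees, to confirm whether the 640x480 depth images show up as a separate track at all (that they do is an assumption). A small sketch:

```swift
import AVFoundation

// Convert a FourCharCode such as 'avc1' into a readable string.
func fourCC(_ code: FourCharCode) -> String {
    let bytes = [24, 16, 8, 0].map { UInt8((code >> $0) & 0xFF) }
    return String(bytes: bytes, encoding: .ascii) ?? "????"
}

// Sketch: describe every video track with its dimensions and codec,
// to see whether the 640x480 depth images are stored as a second
// video track (they could also live elsewhere in the container).
func describeVideoTracks(of asset: AVAsset) -> [String] {
    asset.tracks(withMediaType: .video).map { track in
        let subtypes = track.formatDescriptions
            .map { fourCC(CMFormatDescriptionGetMediaSubType($0 as! CMFormatDescription)) }
            .joined(separator: ",")
        return "track \(track.trackID): \(track.naturalSize) codec=\(subtypes)"
    }
}
```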

I followed this WWDC 2017 tutorial on editing images with depth, which states that in order to create a depth map we need the asset's auxiliary/metadata (around the 10:30 mark). Executing movieAsset.metadata gives me the output below, but I am not sure how to use it.

[<AVMetadataItem: 0x60000339c260, identifier=mdta/com.apple.framework.state.MOVStreamIO, keySpace=mdta, key class = __NSCFString, key=com.apple.framework.state.MOVStreamIO, commonKey=(null), extendedLanguageTag=(null), dataType=com.apple.metadata.datatype.raw-data, time={INVALID}, duration={INVALID}, startDate=(null), extras={
    dataType = 0;
    dataTypeNamespace = "com.apple.quicktime.mdta";
}, value class=__NSCFData, value length=751>, <AVMetadataItem: 0x60000339c420, identifier=mdta/com.apple.framework.state.MOVKit, keySpace=mdta, key class = __NSCFString, key=com.apple.framework.state.MOVKit, commonKey=(null), extendedLanguageTag=(null), dataType=com.apple.metadata.datatype.JSON, time={INVALID}, duration={INVALID}, startDate=(null), extras={
    dataType = 82;
    dataTypeNamespace = "com.apple.quicktime.mdta";
}, value class=__NSDictionaryI, value={
    CFBundleIdentifier = "com.apple.RealityComposer";
    OSBuildVersion = "15.3.1 (19D52)";
    ProductType = "iPad8,9";
    extrinsicsSWToW =     (
        "0.9999345541000366",
        "-0.004342526663094759",
        "0.010582538321614265",
        "-11.96997165679932",
        "0.0044228383339941502",
        "0.9999614953994751",
        "-0.0075775277800858021",
        "-4.6730286307195e-09",
        "-0.010549225844442844",
        "0.0076238368637859821",
        "0.9999153017997742",
        "-1.47887710966188e-08"
    );
    extrinsicsToJasper =     {
        "AVCaptureDeviceTypeBuiltInUltraWideCamera.1" =         (
            "0.00013852206757292151",
            "-0.9999991655349731",
            "-0.0012816637754440308",
            "12.42925262451172",
            "0.9999954104423523",
            "0.00013466033851727843",
            "0.0030126951169222593",
            "-8.268698692321777",
            "-0.0030125211924314499",
            "-0.0012820751871913671",
            "0.9999946355819702",
            "0.077870555222034454"
        );
        "AVCaptureDeviceTypeBuiltInWideAngleCamera.1" =         (
            "0.0044674728997051716",
            "-0.9999503493309021",
            "-0.0089068468660116196",
            "12.48272800445557",
            "0.9999613165855408",
            "0.004534644540399313",
            "-0.0075357104651629925",
            "3.700810432434082",
            "0.0075757256709039211",
            "-0.0088728368282318115",
            "0.9999319314956665",
            "0.1685517877340317"
        );
    };
    "hw.model" = J417AP;
    "mdta/com.apple.arkit.arkitversion" = "364.24.1";
    "mdta/com.apple.arkit.arsensordatatypeinformation" = 15;
    "mdta/com.apple.arkit.osversion" = 19D52;
    movKitVersion = "1.1.0";
    version = "151.27.0.1";
}>, <AVMetadataItem: 0x60000339c460, identifier=mdta/com.apple.recordingEnvironment, keySpace=mdta, key class = __NSCFString, key=com.apple.recordingEnvironment, commonKey=(null), extendedLanguageTag=(null), dataType=com.apple.metadata.datatype.JSON, time={INVALID}, duration={INVALID}, startDate=(null), extras={
    dataType = 82;
    dataTypeNamespace = "com.apple.quicktime.mdta";
}, value class=__NSDictionaryI, value=<identical to the MOVKit dictionary above>>]
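Even though I do not know how to turn these items into depth maps, they can at least be pulled apart programmatically: the JSON-typed items (MOVKit, recordingEnvironment) decode to dictionaries via `value`, while MOVStreamIO is raw Data whose layout I have not found documented. A sketch of summarizing them:

```swift
import AVFoundation

// Sketch: summarize each asset-level metadata item. Per the dump
// above, the JSON-typed items decode to dictionaries, while the
// MOVStreamIO item is raw Data (751 bytes here) of unknown layout.
func metadataSummary(of asset: AVAsset) -> [String] {
    asset.metadata.map { item in
        let id = item.identifier?.rawValue ?? "(no identifier)"
        switch item.value {
        case let dict as [String: Any]:
            return "\(id): dictionary, keys \(dict.keys.sorted())"
        case let data as Data:
            return "\(id): \(data.count) bytes of raw data"
        default:
            return "\(id): \(String(describing: item.value))"
        }
    }
}
```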

I am aware that the tutorial is for image editing, but if there is a way of applying it to videos I would appreciate any guidance. I am also open to different solutions/ideas.

python · ios · swift · arkit · reality-composer

0 Answers
