Reality Composer Extract Depth Frames
I recorded an ARSession video using Reality Composer and it is saved as a .mov file. I am able to extract the RGBA frames with a Playground script, but I cannot find a way to extract the depth data as well. The code below is what I use to extract the RGBA frames:
import AVFoundation
import UIKit

let movieURL = URL(fileURLWithPath: moviePath)
let movieAsset = AVAsset(url: movieURL)

func getFirstFrame() -> UIImage {
    let reader = try! AVAssetReader(asset: movieAsset)

    // Read the color video track as 32-bit BGRA pixel buffers.
    let videoTrack = movieAsset.tracks(withMediaType: .video)[0]
    let trackReaderOutput = AVAssetReaderTrackOutput(
        track: videoTrack,
        outputSettings: [String(kCVPixelBufferPixelFormatTypeKey):
                            NSNumber(value: kCVPixelFormatType_32BGRA)]
    )
    reader.add(trackReaderOutput)
    reader.startReading()

    // Return the first decodable frame as a UIImage.
    while let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
        if let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
            return UIImage(ciImage: CIImage(cvImageBuffer: imageBuffer))
        }
    }
    return UIImage()
}
Using hachoir.metadata (in Python), the video file's metadata shows that there are depth images of dimensions 640x480 (see also the track-inspection sketch after the listing):
Metadata:
- Duration: 49 sec 211 ms
- Image width: 1920 pixels
- Image width: 640 pixels
- Image height: 1440 pixels
- Image height: 480 pixels
- Creation date: 2022-02-24 21:17:45
- Last modification: 2022-02-24 21:18:34
- Comment: Play speed: 100.0%
- Comment: User volume: 100.0%
- MIME type: video/quicktime
- Endianness: Big endian
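Since hachoir reports a second 640x480 stream, I figured the depth might live in its own track inside the .mov, so I tried dumping every track's dimensions and sample format to see which one it is. This is only a sketch; it reuses the movieAsset from above, plus a small fourCC helper I wrote to make the format codes readable:

// Sketch: list every track with its dimensions and sample format, to spot
// which one (if any) is the 640x480 depth/disparity track.
func fourCC(_ code: FourCharCode) -> String {
    let bytes: [UInt8] = [UInt8((code >> 24) & 0xff), UInt8((code >> 16) & 0xff),
                          UInt8((code >> 8) & 0xff), UInt8(code & 0xff)]
    return String(bytes: bytes, encoding: .ascii) ?? String(code)
}

for track in movieAsset.tracks {
    print("track \(track.trackID): type=\(track.mediaType.rawValue) size=\(track.naturalSize)")
    for description in track.formatDescriptions {
        let desc = description as! CMFormatDescription
        print("  sample format: \(fourCC(CMFormatDescriptionGetMediaSubType(desc)))")
    }
}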
This WWDC 2017 tutorial on editing depth images states that, in order to create a depth map, we need the asset's auxiliary data/metadata (around the 10:30 timestamp).
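For reference, this is my understanding of the still-photo approach the session describes: the depth map is exposed as auxiliary data on the image source and can be wrapped in AVDepthData. It is only a sketch (photoURL is hypothetical, and this API does not seem to apply to my .mov file directly):

import ImageIO
import AVFoundation

// Sketch of the still-photo path from the session: pull the disparity auxiliary
// data out of an image file and wrap it in AVDepthData. `photoURL` is hypothetical.
func depthData(forPhotoAt photoURL: URL) -> AVDepthData? {
    guard let source = CGImageSourceCreateWithURL(photoURL as CFURL, nil),
          let auxInfo = CGImageSourceCopyAuxiliaryDataInfoAtIndex(
              source, 0, kCGImageAuxiliaryDataTypeDisparity) as? [AnyHashable: Any]
    else { return nil }
    return try? AVDepthData(fromDictionaryRepresentation: auxInfo)
}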
Running movieAsset.metadata gives me the output below, but I am not sure how to use it:
[<AVMetadataItem: 0x60000339c260, identifier=mdta/com.apple.framework.state.MOVStreamIO, keySpace=mdta, key class = __NSCFString, key=com.apple.framework.state.MOVStreamIO, commonKey=(null), extendedLanguageTag=(null), dataType=com.apple.metadata.datatype.raw-data, time={INVALID}, duration={INVALID}, startDate=(null), extras={
dataType = 0;
dataTypeNamespace = "com.apple.quicktime.mdta";
}, value class=__NSCFData, value length=751>, <AVMetadataItem: 0x60000339c420, identifier=mdta/com.apple.framework.state.MOVKit, keySpace=mdta, key class = __NSCFString, key=com.apple.framework.state.MOVKit, commonKey=(null), extendedLanguageTag=(null), dataType=com.apple.metadata.datatype.JSON, time={INVALID}, duration={INVALID}, startDate=(null), extras={
dataType = 82;
dataTypeNamespace = "com.apple.quicktime.mdta";
}, value class=__NSDictionaryI, value={
CFBundleIdentifier = "com.apple.RealityComposer";
OSBuildVersion = "15.3.1 (19D52)";
ProductType = "iPad8,9";
extrinsicsSWToW = (
"0.9999345541000366",
"-0.004342526663094759",
"0.010582538321614265",
"-11.96997165679932",
"0.0044228383339941502",
"0.9999614953994751",
"-0.0075775277800858021",
"-4.6730286307195e-09",
"-0.010549225844442844",
"0.0076238368637859821",
"0.9999153017997742",
"-1.47887710966188e-08"
);
extrinsicsToJasper = {
"AVCaptureDeviceTypeBuiltInUltraWideCamera.1" = (
"0.00013852206757292151",
"-0.9999991655349731",
"-0.0012816637754440308",
"12.42925262451172",
"0.9999954104423523",
"0.00013466033851727843",
"0.0030126951169222593",
"-8.268698692321777",
"-0.0030125211924314499",
"-0.0012820751871913671",
"0.9999946355819702",
"0.077870555222034454"
);
"AVCaptureDeviceTypeBuiltInWideAngleCamera.1" = (
"0.0044674728997051716",
"-0.9999503493309021",
"-0.0089068468660116196",
"12.48272800445557",
"0.9999613165855408",
"0.004534644540399313",
"-0.0075357104651629925",
"3.700810432434082",
"0.0075757256709039211",
"-0.0088728368282318115",
"0.9999319314956665",
"0.1685517877340317"
);
};
"hw.model" = J417AP;
"mdta/com.apple.arkit.arkitversion" = "364.24.1";
"mdta/com.apple.arkit.arsensordatatypeinformation" = 15;
"mdta/com.apple.arkit.osversion" = 19D52;
movKitVersion = "1.1.0";
version = "151.27.0.1";
}>, <AVMetadataItem: 0x60000339c460, identifier=mdta/com.apple.recordingEnvironment, keySpace=mdta, key class = __NSCFString, key=com.apple.recordingEnvironment, commonKey=(null), extendedLanguageTag=(null), dataType=com.apple.metadata.datatype.JSON, time={INVALID}, duration={INVALID}, startDate=(null), extras={
dataType = 82;
dataTypeNamespace = "com.apple.quicktime.mdta";
}, value class=__NSDictionaryI, value={
CFBundleIdentifier = "com.apple.RealityComposer";
OSBuildVersion = "15.3.1 (19D52)";
ProductType = "iPad8,9";
extrinsicsSWToW = (
"0.9999345541000366",
"-0.004342526663094759",
"0.010582538321614265",
"-11.96997165679932",
"0.0044228383339941502",
"0.9999614953994751",
"-0.0075775277800858021",
"-4.6730286307195e-09",
"-0.010549225844442844",
"0.0076238368637859821",
"0.9999153017997742",
"-1.47887710966188e-08"
);
extrinsicsToJasper = {
"AVCaptureDeviceTypeBuiltInUltraWideCamera.1" = (
"0.00013852206757292151",
"-0.9999991655349731",
"-0.0012816637754440308",
"12.42925262451172",
"0.9999954104423523",
"0.00013466033851727843",
"0.0030126951169222593",
"-8.268698692321777",
"-0.0030125211924314499",
"-0.0012820751871913671",
"0.9999946355819702",
"0.077870555222034454"
);
"AVCaptureDeviceTypeBuiltInWideAngleCamera.1" = (
"0.0044674728997051716",
"-0.9999503493309021",
"-0.0089068468660116196",
"12.48272800445557",
"0.9999613165855408",
"0.004534644540399313",
"-0.0075357104651629925",
"3.700810432434082",
"0.0075757256709039211",
"-0.0088728368282318115",
"0.9999319314956665",
"0.1685517877340317"
);
};
"hw.model" = J417AP;
"mdta/com.apple.arkit.arkitversion" = "364.24.1";
"mdta/com.apple.arkit.arsensordatatypeinformation" = 15;
"mdta/com.apple.arkit.osversion" = 19D52;
movKitVersion = "1.1.0";
version = "151.27.0.1";
}>]
I am aware that the tutorial is for image editing, but if there is a way of applying it to videos I'd appreciate any guidance. I'm also open to different solutions/ideas.
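In case it helps to see what I'm aiming for: if the depth does turn out to be stored as its own video track, I imagine it could be read with the same AVAssetReader pattern as the RGBA frames, just with a float depth pixel format. A sketch of what I mean (depthTrack is hypothetical, i.e. whichever 640x480 track actually carries the depth):

// Sketch: read float depth frames from a (hypothetical) dedicated depth track,
// mirroring the BGRA extraction above.
func readDepthFrames(from depthTrack: AVAssetTrack, in asset: AVAsset) throws -> [CVPixelBuffer] {
    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(
        track: depthTrack,
        outputSettings: [String(kCVPixelBufferPixelFormatTypeKey):
                            NSNumber(value: kCVPixelFormatType_DepthFloat16)]
    )
    reader.add(output)
    reader.startReading()

    var depthBuffers: [CVPixelBuffer] = []
    while let sampleBuffer = output.copyNextSampleBuffer() {
        if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
            depthBuffers.append(pixelBuffer)
        }
    }
    return depthBuffers
}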
Tags: python, ios, swift, arkit, reality-composer