AirPlay2 Protocol
AirPlay is a proprietary protocol stack/suite developed by Apple Inc. that allows wireless streaming between devices of audio, video, device screens, and photos, together with related metadata. Originally implemented only in Apple's software and devices, it was called AirTunes and used for audio only.
-- Wikipedia
This guide is for educational purposes only.
You don't need to follow this guide to use AirPlay with compatible devices.
This guide is not meant to incite hacking.
Protocols used | Enc/Dec Algorithm used | Audio and Video foundation used |
---|---|---|
mDNS | ed25519 | H264 |
HTTP | AES CBC | AAC |
RTP | AES CTR | ALAC |
RTSP | curve25519 | PCM |
NTP | - | - |
AirPlay discovers devices through the mDNS protocol.
On a local network, the receiving device advertises two services (the AirTunes service and the AirPlay service) by publishing 'A', 'TXT', 'PTR' and 'SRV' records.
The sender, in turn, sends an IP multicast query to locate the receiver.
The AirTunes service is used to exchange information between devices.
The AirPlay service is used to send and receive audio and video streams.
If we sniff the local network traffic with Wireshark, we can see the DNS records published by the receiver.
Filter used in Wireshark: ip.src == 192.168.1.197 && mdns
PTR: lists the services available on the receiver
TXT: describes the functionality available on the receiver (e.g. supported encryption types and similar metadata)
SRV: gives the ports of the services
TXT records contain information about the receiver's functionality. Here is a list of possible values:
Service | Key | Value | Description |
---|---|---|---|
AirTunes | txtvers | 1 | TXT record version |
AirTunes | ch | 2 | number of audio channels |
AirTunes | cn | 0,1,2,3 | audio codecs |
AirTunes | et | 0,3,5 | supported encryption types |
AirTunes | md | 0,1,2 | supported metadata types |
AirTunes | pw | false | speaker require password |
AirTunes | sr | 44100 | audio sample rate |
AirTunes | ss | 16 | audio sample size |
AirTunes | da | true | ???? |
AirTunes | sv | false | ???? |
AirTunes | ft | 0x5A7FFFF7,0x1E,0x4A7FFFF7 | available features |
AirTunes | am | AppleTV5,3 | device model |
AirTunes | pk | hex string | public key |
AirTunes | sf | 0x4 | ???? |
AirTunes | tp | UDP | supported transport (UDP, TCP) |
AirTunes | vn | 65537 | ???? |
AirTunes | vs | 220.68 | receiver version |
AirTunes | vv | 2 | ???? |
AirPlay | deviceid | 00:00:00:00:00 | mac address |
AirPlay | features | 0x5A7FFFF7,0x1E,0x4A7FFFF7 | available features |
AirPlay | flags | 20 bit hex number | bitfield of status flags |
AirPlay | model | AppleTV5,3 | device model |
AirPlay | pk | hex string | public key |
AirPlay | pi | aa072a95-0318-4ec3-b042-4992495877d3 | PublicCUAirPlayPairingIdentifier |
AirPlay | srcvers | 220.68 | receiver version |
AirPlay | vv | 2 | ???? |
Audio codec values (cn)
value | description |
---|---|
0 | PCM |
1 | Apple Lossless (ALAC) |
2 | AAC |
3 | ELD (Enhanced Low Delay) |
Encryption type values (et)
value | description |
---|---|
0 | no encryption |
1 | RSA (AirPort Express) |
3 | FairPlay |
4 | MFiSAP (3rd-party devices) |
5 | FairPlay SAPv2.5 |
Metadata type values (md)
value | description |
---|---|
0 | text |
1 | artwork |
2 | progress |
Features bit values (source)
bit | name | description |
---|---|---|
0 | Video | video supported |
1 | Photo | photo supported |
2 | VideoFairPlay | video protected with FairPlay DRM |
3 | VideoVolumeControl | volume control supported for videos |
4 | VideoHTTPLiveStreams | http live streaming supported |
5 | Slideshow | slideshow supported |
7 | Screen | mirroring supported |
8 | ScreenRotate | screen rotation supported |
9 | Audio | audio supported |
11 | AudioRedundant | audio packet redundancy supported |
12 | FPSAPv2pt5_AES_GCM | FairPlay secure auth supported |
13 | PhotoCaching | photo preloading supported |
14 | Authentication4 | Authentication type 4. FairPlay authentication |
15 | MetadataFeature1 | bit 1 of MetadataFeatures. Artwork. |
16 | MetadataFeature2 | bit 2 of MetadataFeatures. Progress. |
17 | MetadataFeature0 | bit 0 of MetadataFeatures. Text. |
18 | AudioFormat1 | support for audio format 1 |
19 | AudioFormat2 | support for audio format 2. This bit must be set for AirPlay 2 connection to work |
20 | AudioFormat3 | support for audio format 3. This bit must be set for AirPlay 2 connection to work |
21 | AudioFormat4 | support for audio format 4 |
23 | Authentication1 | Authentication type 1. RSA Authentication |
26 | HasUnifiedAdvertiserInfo | |
27 | SupportsLegacyPairing | |
30 | RAOP | RAOP is supported on this port. With this bit set you don't need the AirTunes service |
32 | IsCarPlay / SupportsVolume | Don't read the key from the pk record; it is already known |
33 | SupportsAirPlayVideoPlayQueue | |
34 | SupportsAirPlayFromCloud | |
38 | SupportsCoreUtilsPairingAndEncryption | SupportsHKPairingAndAccessControl, SupportsSystemPairing and SupportsTransientPairing implies SupportsCoreUtilsPairingAndEncryption |
40 | SupportsBufferedAudio | Bit needed for device to show as supporting multi-room audio |
41 | SupportsPTP | Bit needed for device to show as supporting multi-room audio |
42 | SupportsScreenMultiCodec | |
43 | SupportsSystemPairing | |
46 | SupportsHKPairingAndAccessControl | |
48 | SupportsTransientPairing | SupportsSystemPairing implies SupportsTransientPairing |
50 | MetadataFeature4 | bit 4 of MetadataFeatures. binary plist. |
51 | SupportsUnifiedPairSetupAndMFi | Authentication type 8. MFi authentication |
52 | SupportsSetPeersExtendedMessage | |
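A minimal Python sketch of how the 'features' TXT value can be turned into the bit numbers of the table above. It assumes that the first hex number carries bits 0-31 and the second one bits 32-63; the function names and the shortened `FEATURE_NAMES` dictionary are illustrative only.

```python
# Sketch: combine the comma-separated 'features' value into one 64-bit
# integer and list the bits that are set.
# Assumption: first hex number = bits 0-31, second hex number = bits 32-63.

FEATURE_NAMES = {  # a few entries copied from the features table
    0: "Video",
    7: "Screen",
    9: "Audio",
    30: "RAOP",
    33: "SupportsAirPlayVideoPlayQueue",
    40: "SupportsBufferedAudio",
    41: "SupportsPTP",
}

def parse_features(value):
    """Return the features bitfield as a single integer."""
    parts = [int(part, 16) for part in value.split(",")]
    low = parts[0]
    high = parts[1] if len(parts) > 1 else 0
    return (high << 32) | low

def set_bits(features):
    """Return the list of bit positions (0-63) that are set."""
    return [bit for bit in range(64) if (features >> bit) & 1]

features = parse_features("0x5A7FFFF7,0x1E")
for bit in set_bits(features):
    print(bit, FEATURE_NAMES.get(bit, "(see table)"))
```

The same helper works for a value with a single hex number, in which case the upper 32 bits are treated as zero.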
Flag bit values (source)
bit | name | description |
---|---|---|
0 | Problem has been detected | Defined in CarPlay section of MFi spec. Not seen set anywhere |
1 | Device is not configured | Defined in CarPlay section of MFi spec. Not seen set anywhere |
2 | Audio cable is attached | Defined in CarPlay section of MFi spec. Seen on AppleTV, Denon AVR, HomePod, Airport Express |
3 | PINRequired | |
6 | SupportsAirPlayFromCloud | |
7 | PasswordRequired | |
9 | OneTimePairingRequired | |
10 | DeviceWasSetupForHKAccessControl | |
11 | DeviceSupportsRelay | Shows in logs as relayable. When set iOS will connect to the device to get currently playing track. |
12 | SilentPrimary | |
13 | TightSyncIsGroupLeader | |
14 | TightSyncBuddyNotReachable | |
15 | IsAppleMusicSubscriber | Shows in logs as music |
16 | CloudLibraryIsOn | Shows in logs as iCML |
17 | ReceiverSessionIsActive | Shows in logs as airplay-receiving. Set when Apple TV is receiving anything via AirPlay. |
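The flags bitfield can be decoded in the same way. The sketch below is only an illustration: `STATUS_FLAGS` copies a subset of the table above, and `decode_flags` is a hypothetical helper name. The value 0x244 (decimal 580) is the statusFlags value that appears in the sniffed /info response in this guide.

```python
# Sketch: translate a flags bitfield into the names from the flag table.

STATUS_FLAGS = {  # subset of the flag table
    2: "Audio cable is attached",
    3: "PINRequired",
    6: "SupportsAirPlayFromCloud",
    7: "PasswordRequired",
    9: "OneTimePairingRequired",
    11: "DeviceSupportsRelay",
    17: "ReceiverSessionIsActive",
}

def decode_flags(flags):
    """Return the names of all known flags set in the bitfield."""
    return [name for bit, name in sorted(STATUS_FLAGS.items())
            if flags & (1 << bit)]

print(decode_flags(0x244))
```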
Filter used in Wireshark: (ip.src==CLIENT_IP && ip.dst==RECEIVER_IP)
When you find the correct request/response, click Analyze->Follow->TCP Stream to see the full exchange in Wireshark.
The requests and responses below are listed in chronological order.
The CSeq header confirms the order, because it increments with every request.
------ REQUEST GET /info ------
GET /info RTSP/1.0
X-Apple-ProtocolVersion: 1
Content-Length: 70
Content-Type: application/x-apple-binary-plist
CSeq: 0
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
bplist00...Yqualifier..ZtxtAirPlay..................................."
Request is 'x-apple-binary-plist', after decoding:
<plist version="1.0">
<dict>
<key>qualifier</key>
<array>
<string>txtAirPlay</string>
</array>
</dict>
</plist>
Useful information:
'x-apple-binary-plist' is a binary-encoded plist (Apple's binary property list format).
To understand how to decode the 'x-apple-binary-plist' format, read this amazing article by Christos Karaiskos.
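If you only need to read or write these bodies, Python's standard `plistlib` module already understands the binary format. The sketch below builds a body equivalent to the decoded request above and decodes it again; the exact bytes produced by `plistlib` may differ from the ones the iOS client sends.

```python
import plistlib

# Encode the same dictionary the client sends in GET /info.
request_body = plistlib.dumps({"qualifier": ["txtAirPlay"]},
                              fmt=plistlib.FMT_BINARY)

# Binary plists always start with the magic string "bplist00".
print(request_body[:8])

# plistlib.loads() detects the binary format automatically.
decoded = plistlib.loads(request_body)
print(decoded)

# Re-encode as XML to get the human-readable form shown in this guide.
print(plistlib.dumps(decoded, fmt=plistlib.FMT_XML).decode("utf-8"))
```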
------ RESPONSE GET /info ------
RTSP/1.0 200 OK
Content-Length: 1689
Content-Type: application/x-apple-binary-plist
Server: AirTunes/220.68
CSeq: 0
bplist00.......YaudioType........
.....$&(*.... .
...%')+TtypeXdisplaysTuuid_..audioInputFormatsXfeatures[refreshRate.. "..!!._..aa:54:01:af:c3:c1...dUmodel.<VheightZAppleTV2,1]sourceVersion_..keepAliveLowPower.-/123456(9;<.0!!!0.78:!=]widthPhysicalV220.68.......[overscanned[widthPixelsO. .w'...n....R^....R..h?.!....$eT.ZmacAddress...,.....\audioFormatsTname.Rvv.....Z..._..inputLatencyMicros[statusFlagsWAppleTV.. "..!!.Wdefault_.$2e388006-13ba-4041-9a67-25dd4a43d536......._..outputLatencyMicros^audioLatenciesXrotation..\heightPixelsVmaxFPSXdeviceID_..audioOutputFormats_.$e0ff8a27-6738-3d56-8a16-cc53aacee925_..keepAliveSendStatsAsBody^heightPhysical.eUwidthRpiRpk..#..8............R.C...".d...j.N.....g.....W.T...+.
.:...M...............v.i...v... .....a.m.?.P.....................j...........H.@...............>................
Response is 'x-apple-binary-plist', after decoding:
<plist version="1.0">
<dict>
<key>sdk</key>
<string>AirPlay;2.1.1-f.1</string>
<key>sourceVersion</key>
<string>377.17.24.6</string>
<key>statusFlags</key>
<integer>580</integer>
<key>pi</key>
<string>2A:1B:57:36:38:D4</string>
<key>name</key>
<string>Samsung 7 Series (43)</string>
<key>build</key>
<string>17.24.6</string>
<key>model</key>
<string>UNU7400</string>
<key>txtAirPlay</key>
<string>BWFjbD0wGmRldmljZWlkPTcwOjJBOkQ1OjI0OkIyOjkzG2ZlYXR1cmVzPTB4N0Y4QUQwLDB4MzhCQ0I0Ngdyc2Y9MHgzCmZ2PXAyMC4wLjELZmxhZ3M9MHgyNDQNbW9kZWw9VU5VNzQwMBRtYW51ZmFjdHVyZXI9U2Ftc3VuZxxzZXJpYWxOdW1iZXI9MEJQWDNTSUs5MDQ5MjBODXByb3RvdmVycz0xLjETc3JjdmVycz0zNzcuMTcuMjQuNhRwaT0yQToxQjo1NzozNjozODpENChwc2k9MDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMkExQjU3MzYzOEQ0KGdpZD0wMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0yQTFCNTczNjM4RDQGZ2NnbD0wQ3BrPWIyYmI2YzAyOGM4MjgxOTczMDU2YzYyYzNmMzk4NmFhODVjNjhhOWJhZjgzYzBiYjViMzA1NzA4NWI2MzdiZjc=</string>
<key>PTPInfo</key>
<string>OpenAVNU ArtAndLogic-aPTP-changes Commit: 17f0335 on Sep 22, 2018</string>
<key>protocolVersion</key>
<string>1.1</string>
<key>audioLatencies</key>
<array>
<dict>
<key>inputLatencyMicros</key>
<integer>0</integer>
<key>type</key>
<integer>100</integer>
<key>outputLatencyMicros</key>
<integer>0</integer>
</dict>
<dict>
<key>inputLatencyMicros</key>
<integer>0</integer>
<key>audioType</key>
<string>default</string>
<key>type</key>
<integer>100</integer>
<key>outputLatencyMicros</key>
<integer>0</integer>
</dict>
<dict>
<key>inputLatencyMicros</key>
<integer>0</integer>
<key>audioType</key>
<string>media</string>
<key>type</key>
<integer>100</integer>
<key>outputLatencyMicros</key>
<integer>0</integer>
</dict>
<dict>
<key>inputLatencyMicros</key>
<integer>0</integer>
<key>audioType</key>
<string>telephony</string>
<key>type</key>
<integer>100</integer>
<key>outputLatencyMicros</key>
<integer>0</integer>
</dict>
<dict>
<key>inputLatencyMicros</key>
<integer>0</integer>
<key>audioType</key>
<string>speechRecognition</string>
<key>type</key>
<integer>100</integer>
<key>outputLatencyMicros</key>
<integer>0</integer>
</dict>
<dict>
<key>inputLatencyMicros</key>
<integer>0</integer>
<key>audioType</key>
<string>alerts</string>
<key>type</key>
<integer>100</integer>
<key>outputLatencyMicros</key>
<integer>0</integer>
</dict>
</array>
<key>pk</key>
<data>
sHcn1vbNbgi1jt5SXsPN6qJSrZ9oP+shLviiBSRlVOc=
</data>
<key>features</key>
<integer>255521305393072848</integer>
<key>displays</key>
<array>
<dict>
<key>height</key>
<integer>1080</integer>
<key>width</key>
<integer>1920</integer>
<key>rotation</key>
<false/>
<key>widthPhysical</key>
<false/>
<key>heightPhysical</key>
<false/>
<key>widthPixels</key>
<integer>1920</integer>
<key>heightPixels</key>
<integer>1080</integer>
<key>refreshRate</key>
<integer>60</integer>
<key>features</key>
<integer>14</integer>
<key>maxFPS</key>
<integer>30</integer>
<key>overscanned</key>
<false/>
<key>uuid</key>
<string>e0ff8a27-6738-3d56-8a16-cc53aacee925</string>
</dict>
</array>
</dict>
</plist>
WARN: This response was captured with Wireshark from my Smart TV (Samsung 7 Series 43), not from an AppleTV.
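The txtAirPlay value in the response above is Base64 text. Once decoded, it appears to follow the usual DNS TXT layout: each entry is one length byte followed by a `key=value` string. The sketch below assumes that layout; `parse_txt_record` is an illustrative name, and the sample is just the first two entries of the value above.

```python
import base64

def parse_txt_record(raw):
    """Split a length-prefixed TXT blob into a {key: value} dictionary."""
    entries = {}
    pos = 0
    while pos < len(raw):
        length = raw[pos]                      # one length byte per entry
        entry = raw[pos + 1:pos + 1 + length]  # "key=value"
        key, _, value = entry.decode("utf-8", errors="replace").partition("=")
        entries[key] = value
        pos += 1 + length
    return entries

# First two entries of the txtAirPlay value from the /info response.
sample = base64.b64decode("BWFjbD0wGmRldmljZWlkPTcwOjJBOkQ1OjI0OkIyOjkz")
print(parse_txt_record(sample))
```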
------ REQUEST POST /pair-setup ------
POST /pair-setup RTSP/1.0
Content-Length: 32
Content-Type: application/octet-stream
CSeq: 1
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
...............d.......?..1...Rt
------ RESPONSE POST /pair-setup ------
RTSP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 32
Server: AirTunes/220.68
CSeq: 1
....M.r..Ek!S.......b...r...s.P3
The client (iOS device) sends this request to ask for our Ed25519 public key.
It sends a 32-byte body and we must return 32 bytes.
You can ignore the request body and return the key.
Before AppleTV returns the 32 bytes to the client, it calls the function 'FdkDecodeAudioFun8(rawData, 32, jg, out_size, 1, sessionId);' where:
- rawData: request body
- 32: body length
- jg: ???
- out_size: response length
- sessionId: id used to identify the current context (AppleTV supports up to 16 sessions)
Someone found that function in the 'libhpplayaudio.so' library.
After some analysis and a lot of assembly code they came up with this diagram: -- hkeyxif
------ REQUEST POST /pair-verify [CSeq: 2] ------
POST /pair-verify RTSP/1.0
X-Apple-PD: 1
X-Apple-AbsoluteTime: 566789538
Content-Length: 68
Content-Type: application/octet-stream
CSeq: 2
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
.....L?..Fl/...j.Z3...d.....J..s.i37...............a.......|..0...Rt
------ RESPONSE POST /pair-verify [CSeq: 2] ------
RTSP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 96
Server: AirTunes/220.68
CSeq: 2
..a..?..abme......3|
|k............r...s2d8..J
..l.dd..a.....?....F..(..+ ..7f.~.x~..|.........
------ REQUEST POST /pair-verify [CSeq: 3] ------
POST /pair-verify RTSP/1.0
X-Apple-PD: 1
X-Apple-AbsoluteTime: 566789538
Content-Length: 68
Content-Type: application/octet-stream
CSeq: 3
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
.............|.<....-..s.w...w....r...K.Lp...}.L
..Q....r_o...T.k2."
------ RESPONSE POST /pair-verify [CSeq: 3] ------
RTSP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 0
Server: AirTunes/220.68
CSeq: 3
The client (iOS device) sends a 68-byte request.
The first 4 bytes are 01 00 00 00; the first byte (01) is a flag that selects the verification step.
If the flag is 01, the remaining bytes are divided as follows [CSeq: 2]:
- 32 bytes ecdh_their
- 32 bytes ed_their
Here we must derive ecdh_shared (from our Curve25519 private key and ecdh_theirs), which is used to initialize the AES CTR 128 cipher.
This ecdh_shared is also used in the next request [CSeq: 3] to verify the client's signature.
The response is 96 bytes:
- The first 32 bytes are ecdh_ours (our public key, generated with Curve25519)
- The remaining 64 bytes are the Ed25519 signature of (ecdh_ours + ecdh_theirs), encrypted with AES CTR 128.
If the flag is 00, the remaining bytes are divided as follows [CSeq: 3]:
- 64 bytes signature
Here we need to check the signature sent by the client to make sure everything went well.
We must initialize the AES CTR 128 cipher with the ecdh_shared key and verify the signature with the Ed25519 algorithm.
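The byte layout described above can be sketched in Python as follows. This only splits the 68-byte body into its fields; the Curve25519, Ed25519 and AES CTR steps need a cryptography library and are left out. `parse_pair_verify` and the dictionary keys are illustrative names.

```python
def parse_pair_verify(body):
    """Split a 68-byte /pair-verify body according to its flag byte."""
    if len(body) != 68:
        raise ValueError("expected a 68-byte body, got %d" % len(body))
    flag = body[0]  # first of the 4 leading bytes (01 00 00 00 or 00 00 00 00)
    if flag == 1:
        # First request: the client's Curve25519 and Ed25519 public keys.
        return {"flag": 1, "ecdh_their": body[4:36], "ed_their": body[36:68]}
    if flag == 0:
        # Second request: the (encrypted) 64-byte signature to verify.
        return {"flag": 0, "signature": body[4:68]}
    raise ValueError("unknown pair-verify flag: %d" % flag)

# Usage with dummy data (not a real capture).
first = parse_pair_verify(bytes([1, 0, 0, 0]) + bytes(range(64)))
print(first["ecdh_their"].hex())
```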
Before AppleTV returns the 96 bytes to the client, it calls the function 'FdkDecodeAudioFun9(rawData, 68, jg, out_size, 1, sessionId);' where:
- rawData: request body
- 68: body length
- jg: ???
- out_size: response length
- sessionId: id used to identify the current context (AppleTV supports up to 16 sessions)
After some analysis and a lot of assembly code they came up with this diagram:
For more specific details see the source article.-- hkeyxif
------ REQUEST POST /fp-setup [CSeq: 4] ------
POST /fp-setup RTSP/1.0
X-Apple-ET: 32
Content-Length: 16
Content-Type: application/octet-stream
CSeq: 4
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
FPLY...............
------ RESPONSE POST /fp-setup [CSeq: 4] ------
RTSP/1.0 200 OK
Content-Length: 142
Server: AirTunes/220.68
Content-Type: application/octet-stream
FPLY..............D.....K.L/...........a....?....vd.J...Z....g....q...f....h..A>
SK.[.r..t..E.......O.uY............U.B.....V.@...=.u....
------ REQUEST POST /fp-setup [CSeq: 5] ------
POST /fp-setup RTSP/1.0
X-Apple-ET: 32
Content-Length: 164
Content-Type: application/octet-stream
CSeq: 5
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
FPLY..................as90d..K./......A.vv.CC.??....^l8asd([~/. ....Z......O.up.....6q>2....L...+.?.??....^l8..E.................L#....
------ RESPONSE POST /fp-setup [CSeq: 5] ------
RTSP/1.0 200 OK
Content-Length: 32
Server: AirTunes/220.68
Content-Type: application/octet-stream
FPLY..........A.l........B.....
[CSeq: 4] The client (iOS device) sends a 16-byte request.
The 5th byte must be 0x03.
The 15th byte selects which 'mode' to use.
Depending on the 15th byte, the response will be:
byte | return value |
---|---|
0x00 | 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x00,0x0f,0x9f,0x3f,0x9e,0x0a,0x25,0x21,0xdb,0xdf,0x31,0x2a,0xb2,0xbf,0xb2,0x9e,0x8d,0x23,0x2b,0x63,0x76,0xa8,0xc8,0x18,0x70,0x1d,0x22,0xae,0x93,0xd8,0x27,0x37,0xfe,0xaf,0x9d,0xb4,0xfd,0xf4,0x1c,0x2d,0xba,0x9d,0x1f,0x49,0xca,0xaa,0xbf,0x65,0x91,0xac,0x1f,0x7b,0xc6,0xf7,0xe0,0x66,0x3d,0x21,0xaf,0xe0,0x15,0x65,0x95,0x3e,0xab,0x81,0xf4,0x18,0xce,0xed,0x09,0x5a,0xdb,0x7c,0x3d,0x0e,0x25,0x49,0x09,0xa7,0x98,0x31,0xd4,0x9c,0x39,0x82,0x97,0x34,0x34,0xfa,0xcb,0x42,0xc6,0x3a,0x1c,0xd9,0x11,0xa6,0xfe,0x94,0x1a,0x8a,0x6d,0x4a,0x74,0x3b,0x46,0xc3,0xa7,0x64,0x9e,0x44,0xc7,0x89,0x55,0xe4,0x9d,0x81,0x55,0x00,0x95,0x49,0xc4,0xe2,0xf7,0xa3,0xf6,0xd5,0xba |
0x01 | 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x01,0xcf,0x32,0xa2,0x57,0x14,0xb2,0x52,0x4f,0x8a,0xa0,0xad,0x7a,0xf1,0x64,0xe3,0x7b,0xcf,0x44,0x24,0xe2,0x00,0x04,0x7e,0xfc,0x0a,0xd6,0x7a,0xfc,0xd9,0x5d,0xed,0x1c,0x27,0x30,0xbb,0x59,0x1b,0x96,0x2e,0xd6,0x3a,0x9c,0x4d,0xed,0x88,0xba,0x8f,0xc7,0x8d,0xe6,0x4d,0x91,0xcc,0xfd,0x5c,0x7b,0x56,0xda,0x88,0xe3,0x1f,0x5c,0xce,0xaf,0xc7,0x43,0x19,0x95,0xa0,0x16,0x65,0xa5,0x4e,0x19,0x39,0xd2,0x5b,0x94,0xdb,0x64,0xb9,0xe4,0x5d,0x8d,0x06,0x3e,0x1e,0x6a,0xf0,0x7e,0x96,0x56,0x16,0x2b,0x0e,0xfa,0x40,0x42,0x75,0xea,0x5a,0x44,0xd9,0x59,0x1c,0x72,0x56,0xb9,0xfb,0xe6,0x51,0x38,0x98,0xb8,0x02,0x27,0x72,0x19,0x88,0x57,0x16,0x50,0x94,0x2a,0xd9,0x46,0x68,0x8a |
0x02 | 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x02,0xc1,0x69,0xa3,0x52,0xee,0xed,0x35,0xb1,0x8c,0xdd,0x9c,0x58,0xd6,0x4f,0x16,0xc1,0x51,0x9a,0x89,0xeb,0x53,0x17,0xbd,0x0d,0x43,0x36,0xcd,0x68,0xf6,0x38,0xff,0x9d,0x01,0x6a,0x5b,0x52,0xb7,0xfa,0x92,0x16,0xb2,0xb6,0x54,0x82,0xc7,0x84,0x44,0x11,0x81,0x21,0xa2,0xc7,0xfe,0xd8,0x3d,0xb7,0x11,0x9e,0x91,0x82,0xaa,0xd7,0xd1,0x8c,0x70,0x63,0xe2,0xa4,0x57,0x55,0x59,0x10,0xaf,0x9e,0x0e,0xfc,0x76,0x34,0x7d,0x16,0x40,0x43,0x80,0x7f,0x58,0x1e,0xe4,0xfb,0xe4,0x2c,0xa9,0xde,0xdc,0x1b,0x5e,0xb2,0xa3,0xaa,0x3d,0x2e,0xcd,0x59,0xe7,0xee,0xe7,0x0b,0x36,0x29,0xf2,0x2a,0xfd,0x16,0x1d,0x87,0x73,0x53,0xdd,0xb9,0x9a,0xdc,0x8e,0x07,0x00,0x6e,0x56,0xf8,0x50,0xce |
0x03 | 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x03,0x90,0x01,0xe1,0x72,0x7e,0x0f,0x57,0xf9,0xf5,0x88,0x0d,0xb1,0x04,0xa6,0x25,0x7a,0x23,0xf5,0xcf,0xff,0x1a,0xbb,0xe1,0xe9,0x30,0x45,0x25,0x1a,0xfb,0x97,0xeb,0x9f,0xc0,0x01,0x1e,0xbe,0x0f,0x3a,0x81,0xdf,0x5b,0x69,0x1d,0x76,0xac,0xb2,0xf7,0xa5,0xc7,0x08,0xe3,0xd3,0x28,0xf5,0x6b,0xb3,0x9d,0xbd,0xe5,0xf2,0x9c,0x8a,0x17,0xf4,0x81,0x48,0x7e,0x3a,0xe8,0x63,0xc6,0x78,0x32,0x54,0x22,0xe6,0xf7,0x8e,0x16,0x6d,0x18,0xaa,0x7f,0xd6,0x36,0x25,0x8b,0xce,0x28,0x72,0x6f,0x66,0x1f,0x73,0x88,0x93,0xce,0x44,0x31,0x1e,0x4b,0xe6,0xc0,0x53,0x51,0x93,0xe5,0xef,0x72,0xe8,0x68,0x62,0x33,0x72,0x9c,0x22,0x7d,0x82,0x0c,0x99,0x94,0x45,0xd8,0x92,0x46,0xc8,0xc3,0x59 |
[CSeq: 5] The client (iOS device) sends a 164-byte request.
The 5th byte must be 0x03.
You must save these 164 bytes because they are the KeyMessage.
The next steps explain when and how to use this KeyMessage.
You must return 32 bytes to the client.
The first 12 bytes are the FairPlay header (0x46, 0x50, 0x4c, 0x59, 0x03, 0x01, 0x04, 0x00, 0x00, 0x00, 0x00, 0x14).
The remaining 20 bytes are the last 20 bytes of the request.
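A small Python sketch of the two /fp-setup steps as described above. The function names are illustrative; the four 142-byte replies from the table are not repeated here and would be looked up using the mode returned by `fp_setup_mode`.

```python
# 12-byte FairPlay header used in the second reply (values from the text).
FP_HEADER = bytes([0x46, 0x50, 0x4C, 0x59, 0x03, 0x01,
                   0x04, 0x00, 0x00, 0x00, 0x00, 0x14])

def fp_setup_mode(request):
    """[CSeq: 4] Validate the 16-byte request and return the mode (15th byte)."""
    if len(request) != 16 or request[4] != 0x03:
        raise ValueError("unexpected fp-setup request")
    return request[14]

def fp_setup_second_reply(request):
    """[CSeq: 5] Build the 32-byte reply: header + last 20 request bytes."""
    if len(request) != 164 or request[4] != 0x03:
        raise ValueError("unexpected fp-setup key message")
    return FP_HEADER + request[-20:]

# Usage with dummy data (not a real capture).
key_message = b"FPLY\x03" + bytes(159)  # keep this: it is the KeyMessage
print(len(fp_setup_second_reply(key_message)))
```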
After the handshake protocol, there will be two SETUP requests used to initialize screen Mirroring.
------ REQUEST SETUP rtsp:// [CSeq: 6] ------
SETUP rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 535
Content-Type: application/x-apple-binary-plist
CSeq: 6
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
bplist00...........
..
.................RetSeiv^timingProtocol[sessionUUIDVosName^osBuildVersion]sourceVersionZtimingPort_..isScreenMirroringSessionYosVersionTekeyXdeviceIDUmodelTnameZmacAddress. O....mD..9o.YRR.0./SNTP_.$43C10532-7CBC-419E-9BB3-528F7D6F9AE0YiPhone OSV16A404W371.4.7.. V12.0.1O.HFPLY.......<.....nT=......9..X......w.Jw9.t.v..iK.c....Tj.u..G..KL.....X_..DC:A3:F1:B2:A6:DAYiPhone9,1jT..2v.. .i.P.h.o.n.e_..DC:0C:5C:B7:D6:D8...).,.0.?.K.R.a.o.z.......................
....... .'.r......................................
------ RESPONSE SETUP rtsp:// [CSeq: 6] ------
RTSP/1.0 200 OK
Content-Length: 0
Server: AirTunes/220.68
CSeq: 6
------ REQUEST SETUP rtsp:// [CSeq: 10] ------
SETUP rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 188
Content-Type: application/x-apple-binary-plist
CSeq: 10
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
bplist00...Wstreams.........Ttype]timestampInfo_..streamConnectionID.n. .....
.TnameUSubSu.
UBePxT.
.UAfPxT.
.UBefEn.
.UEmEnc.D...6QD......!/DFLOTZ]cfloux~................................
------ RESPONSE SETUP rtsp:// [CSeq: 10] ------
RTSP/1.0 200 OK
Content-Length: 120
Content-Type: application/x-apple-binary-plist
Server: AirTunes/220.68
bplist00..l.n.....YeventPort...ZtimingPortWstreamsXdataPort....cTtype...
. .E*;
2.@....=...............................L
The first request is 'x-apple-binary-plist' data, after decoding:
// REQUEST [CSeq: 6] DECODED
<plist version="1.0">
<dict>
<key>et</key>
<integer>32</integer>
<key>eiv</key>
<data>
Ct34RID9D/KJALLCJzDWi==
</data>
<key>timingProtocol</key>
<string>NTP</string>
<key>sessionUUID</key>
<string>532E49A1-E89A-75D3-A355-426614181992</string>
<key>osName</key>
<string>iPhone OS</string>
<key>osBuildVersion</key>
<string>18A8395</string>
<key>sourceVersion</key>
<string>420.4.7</string>
<key>timingPort</key>
<integer>60373</integer>
<key>isScreenMirroringSession</key>
<true/>
<key>osVersion</key>
<string>14.4.2</string>
<key>ekey</key>
<data>
RlCFAVGFAQHFAAA7ASDDAALLvUD1C2vqRjLK9wtJY6v9AS9d5dLLzn2JSJ2ysNpS4VasdkFHlOkAusDqUeXzEoAiDLdF/5Y
</data>
<key>deviceID</key>
<string>2A:1B:57:36:38:D4</string>
<key>model</key>
<string>iPhone13,3</string>
<key>name</key>
<string>SteeBono</string>
<key>macAddress</key>
<string>DC:2A:4C:A7:B2:E4</string>
</dict>
</plist>
As you can see, the first request gives us useful information such as:
- ekey: AES Key (we have to save this data because we will need it later)
- eiv: AES IV (we have to save this data because we will need it later)
- timingPort: port used for the heartbeat (you can change it in the response)
- timingProtocol: protocol used to send timing data
- isScreenMirroringSession: boolean that indicates the type of streaming (video or audio only)
The response to the first request must be 200 OK with an 'x-apple-binary-plist' body like this:
<plist version="1.0">
<dict>
<key>eventPort</key>
<integer>52244</integer>
<key>timingPort</key>
<integer>7011</integer>
</dict>
</plist>
Here you can return the same port used by AirTunes and handle timing and event requests directly from the AirTunes service.
The receiver's response must contain the following:
- eventPort: port used by the client to send events to the receiver
The receiver's response can contain the following:
- timingPort: port used by the client to send the heartbeat to the receiver (only if you want to change the port sent by the client)
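As an illustration, the first SETUP exchange could be handled with `plistlib` roughly like this. `handle_session_setup` and the keys of the `saved` dictionary are hypothetical names, and the port numbers are simply the ones from the example response.

```python
import plistlib

def handle_session_setup(body, event_port, timing_port):
    """Keep what we need from the first SETUP request and build the reply."""
    request = plistlib.loads(body)
    saved = {
        "ekey": request["ekey"],  # encrypted AES key, needed later
        "eiv": request["eiv"],    # AES IV, needed later
        "client_timing_port": request.get("timingPort"),
        "mirroring": request.get("isScreenMirroringSession", False),
    }
    response = plistlib.dumps(
        {"eventPort": event_port, "timingPort": timing_port},
        fmt=plistlib.FMT_BINARY,
    )
    return saved, response

# Usage with a dummy request (placeholder key material, not a capture).
dummy = plistlib.dumps(
    {"ekey": b"k" * 72, "eiv": b"i" * 16, "timingPort": 60373,
     "isScreenMirroringSession": True},
    fmt=plistlib.FMT_BINARY,
)
saved, response = handle_session_setup(dummy, 52244, 7011)
print(saved["client_timing_port"], len(response))
```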
The second request is 'x-apple-binary-plist' data, after decoding:
<plist version="1.0">
<dict>
<key>streams</key>
<array>
<dict>
<key>type</key>
<integer>110</integer>
<key>timestampInfo</key>
<array>
<dict>
<key>name</key>
<string>SubSu</string>
</dict>
<dict>
<key>name</key>
<string>BePxT</string>
</dict>
<dict>
<key>name</key>
<string>AfPxT</string>
</dict>
<dict>
<key>name</key>
<string>BefEn</string>
</dict>
<dict>
<key>name</key>
<string>EmEnc</string>
</dict>
</array>
<key>streamConnectionID</key>
<integer>298347298472738472</integer>
</dict>
</array>
</dict>
</plist>
As you can see, the second request gives us two useful pieces of information:
- type: type of streaming
  - 96: Real time audio
  - 103: Buffered audio
  - 110: Screen Mirroring
  - 120: Playback
  - 130: Remote control
- streamConnectionID: id of the current connection (we have to save this data because we will need it later)
Here you must initialize the Mirroring Service on port 7020 to handle H264 data.
The response to the second request must be 200 OK with an 'x-apple-binary-plist' body like this:
<plist version="1.0">
<dict>
<key>streams</key>
<array>
<dict>
<key>dataPort</key>
<integer>7020</integer>
<key>type</key>
<integer>110</integer>
</dict>
</array>
</dict>
</plist>
The receiver's response must contain the following:
- type: type of streaming
  - 96: Real time audio
  - 103: Buffered audio
  - 110: Screen Mirroring
  - 120: Playback
  - 130: Remote control
- dataPort: port used by the client to send video streaming data to the receiver
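A possible Python sketch of the second SETUP exchange: read the stream type and streamConnectionID, then answer with the data port. The helper name `handle_mirroring_setup` is illustrative, and port 7020 is only the example port used in this guide.

```python
import plistlib

MIRRORING = 110  # stream type for Screen Mirroring

def handle_mirroring_setup(body, data_port=7020):
    """Return (streamConnectionID, binary plist reply) for a type 110 SETUP."""
    request = plistlib.loads(body)
    stream = request["streams"][0]
    if stream["type"] != MIRRORING:
        raise ValueError("not a mirroring stream: %r" % stream["type"])
    connection_id = stream["streamConnectionID"]  # needed for the AES CTR key
    response = plistlib.dumps(
        {"streams": [{"dataPort": data_port, "type": MIRRORING}]},
        fmt=plistlib.FMT_BINARY,
    )
    return connection_id, response

# Usage with a dummy request modelled on the decoded plist above.
dummy = plistlib.dumps(
    {"streams": [{"type": 110, "streamConnectionID": 298347298472738472}]},
    fmt=plistlib.FMT_BINARY,
)
connection_id, response = handle_mirroring_setup(dummy)
print(connection_id)
```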
After the handshake protocol (if you are streaming audio only) or after the mirroring data SETUP, there will be another SETUP request used to initialize audio streaming.
------ REQUEST SETUP rtsp:// [CSeq: 17] ------
SETUP rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 199
Content-Type: application/x-apple-binary-plist
CSeq: 17
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
bplist00...Wstreams........
..
.
......ZlatencyMax^redundantAudioZlatencyMinRctSspf[controlPort[usingScreen[audioFormatTtype............. ......`....(3BMPT`lx}.......................................
------ RESPONSE SETUP rtsp:// [CSeq: 17] ------
RTSP/1.0 200 OK
Content-Length: 118
Content-Type: application/x-apple-binary-plist
Server: AirTunes/220.68
bplist00..D........ ..
..ZtimingPortWstreams..hXdataPort.`Ttype[controlPort.$.
/.?,:8................................K
The request is 'x-apple-binary-plist' data, after decoding:
<plist version="1.0">
<dict>
<key>streams</key>
<array>
<dict>
<key>latencyMax</key>
<integer>3750</integer>
<key>redundantAudio</key>
<integer>2</integer>
<key>latencyMin</key>
<integer>3750</integer>
<key>ct</key>
<integer>8</integer>
<key>spf</key>
<integer>480</integer>
<key>controlPort</key>
<integer>63658</integer>
<key>usingScreen</key>
<true/>
<key>audioFormat</key>
<integer>16777216</integer>
<key>type</key>
<integer>96</integer>
</dict>
</array>
</dict>
</plist>
As you can see, this request gives us useful information such as:
- latencyMax: the maximum audio latency
- redundantAudio: redundancy when transmitting audio frames across a lossy network transport
- latencyMin: the minimum audio latency
- ct: compression type
- spf: frames per packet
- controlPort: port used to request retransmission of lost packets
- usingScreen: boolean that indicates the type of streaming (video + audio or audio only)
- audioFormat: the audio format
  - 0x0: PCM
  - 0x40000: ALAC (96 AppleLossless, 96 352 0 16 40 10 14 2 255 0 0 44100)
  - 0x400000: AAC (96 mpeg4-generic/44100/2, 96 mode=AAC-main; constantDuration=1024)
  - 0x1000000: AAC_ELD (96 mpeg4-generic/44100/2, 96 mode=AAC-eld; constantDuration=480)
- type: type of streaming
  - 96: Real time audio
  - 103: Buffered audio
  - 110: Screen Mirroring
  - 120: Playback
  - 130: Remote control
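For example, the audioFormat value 16777216 in the decoded request is 0x1000000, i.e. AAC_ELD. A short Python sketch of that lookup (the dictionary and function names are illustrative, and the input mirrors the decoded stream dictionary above):

```python
AUDIO_FORMATS = {
    0x0: "PCM",
    0x40000: "ALAC",
    0x400000: "AAC",
    0x1000000: "AAC_ELD",
}

def describe_audio_stream(stream):
    """Summarise the fields of an audio (type 96) stream dictionary."""
    return {
        "codec": AUDIO_FORMATS.get(stream["audioFormat"], "unknown"),
        "frames_per_packet": stream["spf"],
        "client_control_port": stream["controlPort"],
        "with_screen": stream.get("usingScreen", False),
    }

stream = {"latencyMax": 3750, "redundantAudio": 2, "latencyMin": 3750,
          "ct": 8, "spf": 480, "controlPort": 63658, "usingScreen": True,
          "audioFormat": 16777216, "type": 96}
print(describe_audio_stream(stream))
```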
The response to this request must be 200 OK with an 'x-apple-binary-plist' body like this:
<plist version="1.0">
<dict>
<key>streams</key>
<array>
<dict>
<key>dataPort</key>
<integer>34505</integer>
<key>controlPort</key>
<integer>40945</integer>
<key>type</key>
<integer>96</integer>
</dict>
</array>
</dict>
</plist>
Here you must return the dataPort and controlPort on your server that you wish the client to use for audio, as well as the type 96.
This is similar to the earlier type 110 response, except for the additional control port (Note the data port used for type 96 audio must be different from the one used for type 110 video).
Before we can decrypt the video stream, we need to decrypt the AES Key received during the first SETUP request.
To decrypt the AES Key, we need the KeyMessage that we have saved after receiving the CSeq 5 request.
I'd like to explain how the AES key is decrypted, but I have no idea how it's done.
Below is a small piece of code that I wrote starting from the original written in C (OmgHax):
After the SETUP, the client will start sending the encrypted H264 video stream.
Filter used in Wireshark: (ip.src==RECEIVER_IP || ip.src==SENDER_IP) && (ip.dst==RECEIVER_IP || ip.dst==SENDER_IP) && ( udp || (tcp.srcport != SENDER_AIRTUNES_PORT && tcp.dstport != RECEIVER_AIRTUNES_PORT))
In Wireshark you can see the timing packets parsed by right-clicking a UDP packet -> Decode As -> NTP.
After decrypting the AES key, we can initialize the AES CTR Decrypter.
To initialize the CTR Decrypter we need to:
- Perform a combined hash between AES Key and Ecdh Shared (result: eaesHash)
- Perform a combined hash between a concatenation of "AirPlayStreamKey" + streamConnectionId and first 16 bytes of eaesHash (result: keyHash)
- Perform a combined hash between a concatenation of "AirPlayStreamIv" + streamConnectionId and first 16 bytes of eaesHash (result: ivHash)
- Take first 16 bytes of keyHash and ivHash to extract decrypted AES Key and decrypted AES IV
Something like that:
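A minimal Python sketch of the four steps above. It rests on several assumptions that the text does not state and that should be checked against a reference implementation: the "combined hash" is taken to be SHA-512 over the plain concatenation of the two inputs, the streamConnectionId is appended as a decimal string, and the label strings are spelled exactly as in the list above. `derive_mirror_key_iv` is an illustrative name.

```python
import hashlib

def derive_mirror_key_iv(aes_key, ecdh_shared, stream_connection_id):
    """Derive the AES CTR key and IV for the mirroring stream.

    Assumptions: SHA-512, plain concatenation, decimal connection id.
    """
    # Step 1: hash the decrypted AES key together with ecdh_shared.
    eaes_hash = hashlib.sha512(aes_key + ecdh_shared).digest()
    eaes_key = eaes_hash[:16]

    # Steps 2 and 3: hash a label + connection id with the first 16 bytes.
    key_label = ("AirPlayStreamKey%d" % stream_connection_id).encode("utf-8")
    iv_label = ("AirPlayStreamIv%d" % stream_connection_id).encode("utf-8")
    key_hash = hashlib.sha512(key_label + eaes_key).digest()
    iv_hash = hashlib.sha512(iv_label + eaes_key).digest()

    # Step 4: the first 16 bytes of each hash are the AES key and IV.
    return key_hash[:16], iv_hash[:16]

# Usage with dummy inputs (zeros, not real key material).
key, iv = derive_mirror_key_iv(bytes(16), bytes(32), 298347298472738472)
print(key.hex(), iv.hex())
```

The resulting 16-byte key and IV would then be passed to an AES-128 CTR implementation from a cryptography library.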
Each packet we receive from the client consists of:
- payloadsize: size of the encrypted data
- payloadtype: type of data (0 - decrypt video data, 1 - process sps/pps)
- payloadoption: ????
- pts: data used to instantiate the H264 Codec
- other: ????
- data: mirroring data (ENCRYPTED, we can DECRYPT it with the CTR Decrypter initialized in the previous step)
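The field sizes are not given above. The Python sketch below assumes a layout used by other open-source receivers, which may not match every sender: a 128-byte header with a 4-byte little-endian payload size, a 2-byte payload type, a 2-byte payload option and an 8-byte timestamp, followed by unused or unknown bytes. Treat every offset here as an assumption; `parse_mirror_header` is an illustrative name.

```python
import struct

HEADER_SIZE = 128  # assumption: fixed-size header before the payload

def parse_mirror_header(header):
    """Unpack the leading fields of a mirroring packet header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header too short")
    # "<IHHQ" = little-endian uint32, uint16, uint16, uint64 (16 bytes).
    payload_size, payload_type, payload_option, pts = struct.unpack_from(
        "<IHHQ", header, 0)
    return {
        "payloadsize": payload_size,
        "payloadtype": payload_type,   # 0 = video data, 1 = sps/pps
        "payloadoption": payload_option,
        "pts": pts,
    }

# Usage with a dummy header (not a real capture).
dummy = struct.pack("<IHHQ", 1000, 0, 0, 123456789) + bytes(112)
print(parse_mirror_header(dummy))
```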
After some analysis and a lot of assembly code they came up with this diagram:
For more specific details see the source article.-- hkeyxif
After the SETUP, the client will start sending the encrypted audio stream.
After decrypting the AES key, we can initialize the AES CBC Decrypter.
To initialize the CBC Decrypter we need to:
- Perform a combined hash between AES Key and Ecdh Shared (result: eaesHash)
- Initialize the AES CBC Decrypter with eaesKey (the first 16 bytes of eaesHash) and the aesIV (eiv) received during the first SETUP request
Something like that:
Each packet we receive from the client consists of:
- flag: ????
- type: type of data (0x56 - decrypt audio data, 0x54 - process RTP headers with NTP time)
- seq number: sequence number of the packet
- timestamp: the timestamp
- ssrc: ????
- data: audio data (ENCRYPTED, we can DECRYPT it with the CBC Decrypter initialized in the previous step)
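The fields listed above match the standard 12-byte RTP fixed header, so a Python sketch could look like this. The field widths (1, 1, 2, 4 and 4 bytes, big-endian) and the masking of the marker bit in the type byte are assumptions based on generic RTP, not something stated in this guide; `parse_audio_packet` is an illustrative name.

```python
import struct

def parse_audio_packet(packet):
    """Split an audio packet into RTP-style header fields and payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte header")
    # ">BBHII" = big-endian: flag, type, seq number, timestamp, ssrc.
    flag, type_byte, seq, timestamp, ssrc = struct.unpack_from(">BBHII", packet, 0)
    return {
        "flag": flag,
        "type": type_byte & 0x7F,  # assumption: top bit is the RTP marker bit
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "data": packet[12:],       # still encrypted at this point
    }

# Usage with a dummy packet (not a real capture).
dummy = struct.pack(">BBHII", 0x80, 0x80 | 0x56, 7, 1234, 0) + b"payload"
print(parse_audio_packet(dummy))
```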
After some analysis and a lot of assembly code they came up with this diagram:
For more specific details see the source article.-- hkeyxif
The timing port carries an NTP-style request/response exchange between receiver and client.
Receiver -> Client
80 d2 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 83 aa 7e 80 00 00 00 f3
Client -> Receiver
80 d3 00 07 00 00 00 00 83 aa 7e 80 00 00 00 f3 83 b7 bc e9 3b d6 ea c8 83 b7 bc e9 3b e1 ae 70
The receiver sends 32 bytes.
The first 24 bytes are fixed; the last 8 bytes are the NTP transmit timestamp.
The client sends 32 bytes. The first 8 bytes are fixed; the last 24 bytes are the Origin Timestamp, Receive Timestamp and Transmit Timestamp.
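The client packet shown above can be unpacked with a few lines of Python. The sketch assumes the usual NTP timestamp format (32-bit seconds since 1900 plus a 32-bit fraction); the function names are illustrative.

```python
import struct

NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def parse_timing_reply(packet):
    """Extract the three NTP timestamps from the client's 32-byte packet."""
    if len(packet) != 32:
        raise ValueError("expected 32 bytes")
    # Skip the 8 fixed bytes, then read three (seconds, fraction) pairs.
    o_sec, o_frac, r_sec, r_frac, t_sec, t_frac = struct.unpack(">8x6I", packet)
    return {
        "origin": (o_sec, o_frac),
        "receive": (r_sec, r_frac),
        "transmit": (t_sec, t_frac),
    }

def ntp_to_unix(seconds, fraction):
    """Convert an NTP timestamp to Unix time in seconds (assumed format)."""
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32

# The "Client -> Receiver" packet from the capture above.
reply = bytes.fromhex(
    "80 d3 00 07 00 00 00 00 83 aa 7e 80 00 00 00 f3 "
    "83 b7 bc e9 3b d6 ea c8 83 b7 bc e9 3b e1 ae 70")
stamps = parse_timing_reply(reply)
print(stamps)
print(ntp_to_unix(*stamps["receive"]))
```

Note that the origin timestamp in the client's packet is the same 8 bytes the receiver sent as its transmit timestamp.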
The control port is used to receive information about RTP and to receive retransmitted audio data.
If the type is 0x56, the packet contains a retransmitted audio packet. If the type is 0x54, the packet contains information about RTP.
Here you will find other generic requests handled by the AirTunes service.
------ REQUEST GET_PARAMETER rtsp:// [CSeq: x] ------
GET_PARAMETER rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 8
Content-Type: text/parameters
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
volume
------ RESPONSE GET_PARAMETER rtsp:// [CSeq: x] ------
RTSP/1.0 200 OK
Content-Type: text/parameters
Content-Length: 13
Server: AirTunes/220.68
CSeq: x
volume: 0.0
The client makes this call when it wants to know the receiver's volume level.
The body of the request has the content type 'text/parameters'; the parameter here is 'volume'.
The response to that request must be 200 OK with a 'text/parameters' body like this:
volume: 10.0\r\n
Where 'volume' is the parameter and '10.0' is the value.
Note that the 'SET_PARAMETER' request shown below sets the volume, but it can also carry other information, for example the cover art of the album being played or the title of a song.
------ REQUEST SET_PARAMETER rtsp:// [CSeq: x] ------
SET_PARAMETER rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 20
Content-Type: text/parameters
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
volume: -12.000000
------ RESPONSE SET_PARAMETER rtsp:// [CSeq: x] ------
RTSP/1.0 200 OK
Server: AirTunes/220.68
CSeq: x
The client makes this call when it wants to change the receiver's volume level.
The body of the request has the content type 'text/parameters'; the parameter here is 'volume' and the value is '-12.000000'.
Response for that request must be 200 OK without body.
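Both volume bodies are plain 'text/parameters' text, so they can be handled with a few lines of Python. The helper names below are illustrative.

```python
def parse_text_parameters(body):
    """Parse a text/parameters body into a {name: value} dictionary."""
    params = {}
    for line in body.decode("ascii").splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        params[name.strip()] = value.strip()
    return params

def build_volume_response(volume):
    """Build the GET_PARAMETER response body for the given volume."""
    return ("volume: %s\r\n" % volume).encode("ascii")

# GET_PARAMETER: the client only names the parameter it wants.
print(parse_text_parameters(b"volume\r\n"))
# SET_PARAMETER: the client sends the new value.
print(parse_text_parameters(b"volume: -12.000000\r\n"))
# Response body for GET_PARAMETER; its length is the Content-Length.
print(build_volume_response(0.0))
```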
------ REQUEST POST /feedback [CSeq: x] ------
POST /feedback RTSP/1.0
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
------ RESPONSE POST /feedback [CSeq: x] ------
RTSP/1.0 200 OK
Server: AirTunes/220.68
CSeq: x
The client makes this call to ensure the receiver is alive.
The classic 'heartbeat'.
Response for that request must be 200 OK without body.
------ REQUEST TEARDOWN rtsp:// [CSeq: x] ------
TEARDOWN rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 69
Content-Type: application/x-apple-binary-plist
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
bplist00...Wstreams.....Ttype.`......................................
------ RESPONSE TEARDOWN rtsp:// [CSeq: x] ------
RTSP/1.0 200 OK
Connection: close
Server: AirTunes/220.68
CSeq: x
The client makes this call when it wants to stop screen mirroring and audio streaming.
The request is 'x-apple-binary-plist' data, after decoding:
<plist version="1.0">
<dict>
<key>streams</key>
<array>
<dict>
<key>type</key>
<integer>96</integer>
</dict>
</array>
</dict>
</plist>
As you can see, the client sends the type of service to be destroyed.
Type:
- 96: receiver can destroy audio service
- 110: receiver can destroy mirroring service
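A short Python sketch of that decision, again with `plistlib`. `services_to_stop` and the service labels are illustrative names.

```python
import plistlib

SERVICES = {96: "audio", 110: "mirroring"}

def services_to_stop(body):
    """Return the services named in a TEARDOWN request body."""
    request = plistlib.loads(body)
    return [SERVICES.get(stream["type"], "unknown")
            for stream in request.get("streams", [])]

# Usage with a body equivalent to the decoded request above.
body = plistlib.dumps({"streams": [{"type": 96}]}, fmt=plistlib.FMT_BINARY)
print(services_to_stop(body))
```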
Having this information will make implementing the protocol easier than expected.
It is also possible to create the client following the same logic in reverse.
Thanks to this project, I can say that I have improved my skills.
Thanks for reading,
S.