Writing a user-mode HID device driver
Prelude
I use my Wacom Intuos S tablet a lot, and I’ve been using open source drivers for it ever since I purchased it, for the pen to act as a mouse.
Given this, I had always wanted to make my own device driver for the tablet, and given my use case for it was quite simple, I knew it wouldn’t be super hard to make one. While it wasn’t super complicated, I was surprised at how much there was to learn.
So, in this post, I’ll be going through how I managed to make a HID driver for the tablet, given the very sparse documentation and little information on the topic.
Inspection
At first I wanted to figure out what kind of driver to make. There were a few options. I could make a kernel device driver, which, to me, was out of the picture not because of difficulty but because it was complete overkill for a device such as a drawing tablet. Given this driver didn’t have to be loaded at boot or at the kernel level, I decided to fall back on some user-mode options. Out of these, I was torn between two: UMDF and HID. UMDF, as opposed to KMDF, is a user-mode driver framework (hence the name) provided by the Windows Driver Kit (WDK). However, I chose HID for a couple of reasons:
- HID is a pretty good standard; it abstracts a lot of unnecessary complexity and works quite well for devices like these, along with mice, game controllers, etc.
- UMDF, as mentioned before, is provided by the WDK, which is Windows-only. If I were to port this to Linux (which is planned), UMDF would make that impractically difficult since it would amount to a major rewrite, whereas with HID there are pre-existing libraries that provide cross-platform support with little refactoring.
For this article, I will be focussing on Windows, given I used WinAPI HID functions to assist me in making the driver rather than considering cross-platform at first.
Now, it was time to figure out how the hell we make a HID driver for a tablet. Or any device for that matter. (Now I have had experience with these before, but for the sake of this, let’s assume I do not.)
MSDN provides this as their introduction to HID and as a form of “documentation.” Honestly, it’s a nightmare to navigate, but I assume they don’t expect that many people to be learning this niche unless they have some form of low-level experience.
First Steps
The first idea that came to mind was to retrieve a list of devices from the OS and somehow pick out my tablet in the process. Looking through the “helpful” MSDN docs, we see HidD_GetAttributes. At first glance, this seems great. In theory we should be able to get a PHIDD_ATTRIBUTES as an output from this function, which contains the product ID, vendor ID and version number of the device. However, this function also requires a HANDLE to the device, which we currently do not have. So how do we get a HANDLE? Sifting through MSDN even more, we find a common WinAPI function called CreateFile which, if you’ve worked with WinAPI before, you have probably used. This allows us to open the device and get a handle, as if it were a “file”.
CreateFile is defined as such:
HANDLE CreateFileA(
    [in] LPCSTR lpFileName,
    [in] DWORD dwDesiredAccess,
    [in] DWORD dwShareMode,
    [in, optional] LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    [in] DWORD dwCreationDisposition,
    [in] DWORD dwFlagsAndAttributes,
    [in, optional] HANDLE hTemplateFile
);
As you can see, the main thing we need in order to get a HANDLE to a specific device is its “lpFileName”, the device path.
Searching MSDN for “Device Path” leads to a few results; the simplest I found was PSP_DEVICE_INTERFACE_DETAIL_DATA. This struct contains a device path, and the relevant function for the struct is SetupDiGetDeviceInterfaceDetail, as seen in the footer of the struct docs.
We’re almost there. SetupDiGetDeviceInterfaceDetail is defined as such:
WINSETUPAPI BOOL SetupDiGetDeviceInterfaceDetailA(
    [in] HDEVINFO DeviceInfoSet,
    [in] PSP_DEVICE_INTERFACE_DATA DeviceInterfaceData,
    [out, optional] PSP_DEVICE_INTERFACE_DETAIL_DATA_A DeviceInterfaceDetailData,
    [in] DWORD DeviceInterfaceDetailDataSize,
    [out, optional] PDWORD RequiredSize,
    [out, optional] PSP_DEVINFO_DATA DeviceInfoData
);
Most of these arguments are quite trivial to obtain, given the further functions required are linked in the docs. Namely, SetupDiGetClassDevs provides the DeviceInfoSet argument and SetupDiEnumDeviceInterfaces provides the DeviceInterfaceData argument.
We can write some simple test code to obtain our devices.
// Needs <vector>, <cstdint>, <print> and <iostream>, plus Windows.h, SetupAPI.h and hidsdi.h (link against setupapi.lib and hid.lib)
GUID HidGUID = {};
HidD_GetHidGuid(&HidGUID);

// Build a device information set containing every present HID device interface
const HDEVINFO DeviceInfoSet = SetupDiGetClassDevs(&HidGUID, nullptr, nullptr, DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);
if (DeviceInfoSet == INVALID_HANDLE_VALUE)
{
    std::println(std::cerr, "SetupDiGetClassDevs failed");
    return -1;
}

SP_DEVICE_INTERFACE_DATA DeviceInterfaceData = {};
DeviceInterfaceData.cbSize = sizeof(SP_DEVICE_INTERFACE_DATA);

for (DWORD DeviceIndex = 0; SetupDiEnumDeviceInterfaces(DeviceInfoSet, nullptr, &HidGUID, DeviceIndex, &DeviceInterfaceData); DeviceIndex++)
{
    // First call to obtain the required buffer size (this call is expected to fail with ERROR_INSUFFICIENT_BUFFER)
    DWORD RequiredSize = 0;
    SetupDiGetDeviceInterfaceDetail(DeviceInfoSet, &DeviceInterfaceData, nullptr, 0, &RequiredSize, nullptr);
    if (RequiredSize == 0)
    {
        continue;
    }

    // Allocate memory for the device detail data; the vector frees itself on every exit path
    std::vector<std::uint8_t> DetailBuffer(RequiredSize);
    const auto DeviceInterfaceDetailData = reinterpret_cast<PSP_DEVICE_INTERFACE_DETAIL_DATA>(DetailBuffer.data());
    DeviceInterfaceDetailData->cbSize = sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA);

    if (!SetupDiGetDeviceInterfaceDetail(DeviceInfoSet, &DeviceInterfaceData, DeviceInterfaceDetailData, RequiredSize, nullptr, nullptr))
    {
        std::println(std::cerr, "SetupDiGetDeviceInterfaceDetail failed");
        continue;
    }

    // Open the device through its path, as if it were a file
    const HANDLE DeviceHandle =
        CreateFile(DeviceInterfaceDetailData->DevicePath, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr, OPEN_EXISTING, 0, nullptr);
    if (DeviceHandle == INVALID_HANDLE_VALUE)
    {
        continue;
    }

    HIDD_ATTRIBUTES HidAttributes = {};
    if (HidD_GetAttributes(DeviceHandle, &HidAttributes))
    {
        std::println("Vendor Id: {} | Product Id: {}", HidAttributes.VendorID, HidAttributes.ProductID);
    }

    CloseHandle(DeviceHandle);
}

// Release the device information set once enumeration is done
SetupDiDestroyDeviceInfoList(DeviceInfoSet);
Now this may seem like a mess, since that’s usually just how WinAPI looks, but the code isn’t super complicated. It’s just a chain of functions to get our HID device list and a handle to each device. That handle is what we need for any operation on, or communication with, the device.
Obtaining Device Data
Now that we can get a handle to any device we want, we need to find the tablet. The most intuitive way I thought of doing this was to filter by the vendor ID, given I only have one Wacom device. It’s not hard to find that the vendor ID for Wacom is 0x56A. Hence we can simply check if the vendor ID matches, and append to some form of list, or return the HANDLE.
With this device HANDLE we should also be able to get the name of the device using HidD_GetProductString. Testing this:
if (HidAttributes.VendorID == 0x56A)
{
    WCHAR ProductName[MAX_PATH] = {};
    // The third argument is the buffer length in bytes, not in characters
    if (HidD_GetProductString(DeviceHandle, ProductName, sizeof(ProductName)))
    {
        std::wprintf(L"Product Name: %ls\n", ProductName);
    }
    else
    {
        std::println(std::cerr, "HidD_GetProductString failed");
    }
}
We get the output:
> Product Name: Intuos S
This confirms we now have the correct device. So now, how do we read any form of data from it? Well, with the current information we have, this is very easy. Just as we used CreateFile to obtain a HANDLE to the device, we can use ReadFile to read any data from it.
Implementation
Now that we have our device HANDLE, we can start reading data from the device using ReadFile.
BYTE ReportBuffer[REPORT_SIZE];
DWORD BytesRead = 0;
while (ReadFile(DeviceHandle, ReportBuffer, sizeof(ReportBuffer), &BytesRead, nullptr))
{
}
However, we need REPORT_SIZE, which I have defined as the size of each report to read. To obtain this, with a little bit of digging online we can find this. From this (it was such a pain) we can find that each report should be a 192-byte array. Hence we can #define REPORT_SIZE 192.
If we output the buffer content by looping through it we get… nothing… Well, to be honest, this is expected. The device is not going to send any useful data unless it is being interacted with. When moving the pen near the device, we get a buffer as follows:
10 40 6f 17 0 f2 1a 0 0 0 0 0 0 0 0 0 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Where each byte is in hex
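For reference, a dump in this format can be produced with a small helper along these lines. This is only a sketch: FormatReportHex is a hypothetical name of my choosing (it is not a Windows API), and it assumes you want the same space-separated hex without leading zeros as the dumps in this post.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a raw report as space-separated hex bytes without leading zeros,
// matching the dumps shown in this post (e.g. "10 40 6f 17 0 f2 1a").
// FormatReportHex is a hypothetical helper name, not part of any Windows API.
std::string FormatReportHex(const std::uint8_t* Data, const std::size_t Length)
{
    std::string Output;
    char Byte[4] = {};
    for (std::size_t Index = 0; Index < Length; Index++)
    {
        // A single byte prints as at most two hex digits plus the terminator
        std::snprintf(Byte, sizeof(Byte), "%x", static_cast<unsigned int>(Data[Index]));
        if (Index != 0)
        {
            Output += ' ';
        }
        Output += Byte;
    }
    return Output;
}
```

Inside the ReadFile loop this could be called as FormatReportHex(ReportBuffer, BytesRead), so that only the bytes actually read are printed.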
Now looking at a bunch of these reports, from just hovering the pen:
10 60 ca 21 0 d7 1a 0 0 0 0 0 0 0 0 0 30 a8 5 80 3a 62 8 10 0 62 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 60 3e 23 0 ef 1a 0 0 0 0 0 0 0 0 0 30 a8 5 80 3a 62 8 10 0 62 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 60 56 24 0 e9 1a 0 0 0 0 0 0 0 0 0 35 a8 5 80 3a 62 8 10 0 62 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 60 9 25 0 cf 1a 0 0 0 0 0 0 0 0 0 34 a8 5 80 3a 62 8 10 0 62 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 60 74 25 0 b2 1a 0 0 0 0 0 0 0 0 0 38 a8 5 80 3a 62 8 10 0 62 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
It seems as if the first byte 0x10 is always constant, and so is 0x60 (we’ll come onto this one later).
The next 2 bytes, 0x74 0x25 in the last report, seem to be some form of varying value. Let’s assume this is the X coordinate.
This is followed by a 0x00 (I have no idea what it is, nor do I use it in my actual driver; it seems to be permanently stuck at 0).
Then the next 2 bytes, similar to the X coordinate, also vary when moving the pen around, so let’s assume they are the Y coordinate.
From this simple inspection we have arrived at what is basically a data structure for the report. Writing the C++ code for this can go as follows:
#pragma pack(push, 1)
struct WacomReport
{
    std::uint8_t MagicNumber;   // 0x10 through my testing
    std::uint8_t ProximityFlag; // 0x40 when reports are being fed, 0x60 when the pen is in close proximity, 0x61 when the pen is touching the tablet
    std::uint16_t X;
    std::uint8_t Unknown;       // 0x00 through my testing (not sure what this is for)
    std::uint16_t Y;
    std::uint8_t TheRest[REPORT_SIZE - 7]; // The rest of the report I don't care about
};
#pragma pack(pop)
// Catch packing mistakes at compile time: the struct must cover exactly one report
static_assert(sizeof(WacomReport) == REPORT_SIZE, "WacomReport must be exactly REPORT_SIZE bytes");
Notice that in my actual struct I have named the second byte ProximityFlag. This is because of its behavior: it switches depending on the state of the pen near the device. It is also used to detect when the pen is touching, since it switches to 0x61 when this is the case.
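To make those three observed values easier to work with, they can be wrapped in a small classifier. This is a minimal sketch: PenState and ClassifyPen are hypothetical names, and the flag values are only the ones I observed on my Intuos S, so other tablets may well report different ones.

```cpp
#include <cstdint>

// Pen states inferred from the second byte of each report.
// These names are illustrative; the values come from observation, not from a spec.
enum class PenState
{
    Idle,     // 0x40: reports are being fed
    Hovering, // 0x60: pen is in close proximity
    Touching, // 0x61: pen is touching the tablet
    Unknown   // anything else
};

constexpr PenState ClassifyPen(const std::uint8_t ProximityFlag)
{
    switch (ProximityFlag)
    {
    case 0x40:
        return PenState::Idle;
    case 0x60:
        return PenState::Hovering;
    case 0x61:
        return PenState::Touching;
    default:
        return PenState::Unknown;
    }
}
```

Keeping an explicit Unknown state means an unexpected flag value is never silently treated as a touch.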
From this we can now start parsing assumed X and Y coordinates from each report.
The code is quite simple, as follows:
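while (ReadFile(WacomDevice, ReportBuffer, sizeof(ReportBuffer), &BytesRead, nullptr))
{
    // Copy the raw bytes into the struct rather than casting the buffer pointer (needs <cstring>)
    WacomReport Report = {};
    std::memcpy(&Report, ReportBuffer, sizeof(Report));
    std::println("{}, {}", Report.X, Report.Y);
}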
And the respective output is also as follows:
9946, 6680
9578, 6640
9238, 6600
8862, 6565
8507, 6533
8131, 6509
7769, 6491
7408, 6480
7045, 6481
6711, 6490
6368, 6516
6071, 6549
5769, 6592
5505, 6649
5273, 6717
5055, 6797
4858, 6888
4703, 6996
Now obviously, I have no actual way of proving to you that these coordinates are accurate; however, when moving the pen horizontally it’s clear that our “assumed X coordinate” behaves as you’d expect, so our initial assumptions appear to have been correct.
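As a sanity check on the layout, the same fields can also be decoded from explicit byte offsets, which doesn’t depend on struct packing or on the CPU being little-endian. This is a sketch of that alternative, not what my driver does: PenSample, ReadLE16 and ParseReport are hypothetical names, and the offsets are the ones inferred above (flag at byte 1, X at bytes 2 and 3, Y at bytes 5 and 6).

```cpp
#include <cstddef>
#include <cstdint>

// The fields of a report that we actually care about.
struct PenSample
{
    std::uint8_t ProximityFlag;
    std::uint16_t X;
    std::uint16_t Y;
};

// Reads a 16-bit little-endian value starting at Data[Offset].
constexpr std::uint16_t ReadLE16(const std::uint8_t* Data, const std::size_t Offset)
{
    return static_cast<std::uint16_t>(Data[Offset] | (Data[Offset + 1] << 8));
}

// Decodes one report using the layout inferred in this post.
// Returns false if the buffer is too short or the first byte is not 0x10.
inline bool ParseReport(const std::uint8_t* Data, const std::size_t Length, PenSample& Sample)
{
    constexpr std::size_t MinimumLength = 7;
    constexpr std::uint8_t ExpectedMagic = 0x10;
    if (Length < MinimumLength || Data[0] != ExpectedMagic)
    {
        return false;
    }
    Sample.ProximityFlag = Data[1];
    Sample.X = ReadLE16(Data, 2);
    Sample.Y = ReadLE16(Data, 5);
    return true;
}
```

Feeding it the start of the last hover report from earlier (10 60 74 25 0 b2 1a) gives X = 0x2574 = 9588 and Y = 0x1ab2 = 6834, which sits in the same range as the coordinates printed above.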
Coordinate Mapping
Given we can now obtain a device handle, read the respective data buffer from the device and parse said data, it seems like we’re very close to completion. While this may be the case, we still need to figure out a crucial part of the driver. These coordinates are clearly not display coordinates, so we need some way of mapping these so-called Wacom coordinates to the screen.
I’ll show the code for it first, and then explain the reasoning behind each step and section:
Vector2i MapToScreenCoordinates(std::uint16_t DeviceX, std::uint16_t DeviceY, const std::uint32_t ScreenWidth, const std::uint32_t ScreenHeight)
{
    // Normalize each axis to [0, 1] relative to the tablet's coordinate range
    float NormalizedX = static_cast<float>(DeviceX - DEVICE_X_MIN) / (DEVICE_X_MAX - DEVICE_X_MIN);
    float NormalizedY = static_cast<float>(DeviceY - DEVICE_Y_MIN) / (DEVICE_Y_MAX - DEVICE_Y_MIN);
    // Scale the normalized proportion up to the screen dimensions
    std::uint32_t MappedX = static_cast<uint32_t>(NormalizedX * ScreenWidth);
    std::uint32_t MappedY = static_cast<uint32_t>(NormalizedY * ScreenHeight);
    return {MappedX, MappedY};
}
First, let’s define what each variable actually is:
- DeviceX and DeviceY are the raw tablet coordinates provided by the device.
- DEVICE_X_MIN and DEVICE_X_MAX represent the minimum and maximum X-coordinate range of the tablet (similarly for Y). These are obtained by extensive googling and the documentation I mentioned earlier.
Now, the math.
- (DeviceX - DEVICE_X_MIN): This converts the raw device coordinate into a value relative to the minimum. For example, if the X-coordinate ranges from 0 to 15200, this step moves the coordinate to a 0-based system.
- (DEVICE_X_MAX - DEVICE_X_MIN): This gives the total range of the tablet in the X-axis (similarly for Y).
The division of these two values normalizes the tablet coordinate to a range of [0, 1]. When DeviceX = DEVICE_X_MIN, the result is 0, and when DeviceX = DEVICE_X_MAX, the result is 1 (the same holds for Y). This converts the tablet’s raw coordinate range into a proportion, regardless of the size of the tablet.
Then we simply scale this proportion by the screen size, hence this:
std::uint32_t MappedX = static_cast<uint32_t>(NormalizedX * ScreenWidth);
std::uint32_t MappedY = static_cast<uint32_t>(NormalizedY * ScreenHeight);
After normalization, NormalizedX and NormalizedY are in the range [0, 1]. By multiplying these normalized values by the screen dimensions (ScreenWidth and ScreenHeight), the coordinates are scaled to fit the screen.
When NormalizedX = 0, the mapped X is 0 (leftmost position). When NormalizedX = 1, the mapped X is ScreenWidth (rightmost position).
We then simply use it as follows:
const Utilities::Vector2i ScreenCoordinates = Utilities::MapToScreenCoordinates(Report.X, Report.Y, ScreenWidth, ScreenHeight);
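One edge case worth knowing about: with the mapping above, a pen at the very maximum of the tablet range lands on ScreenWidth itself, which is one pixel past the last valid index (ScreenWidth - 1), and a stray out-of-range reading would map off-screen. A possible variant, sketched below with integer math, clamps the input and maps onto [0, ScreenExtent - 1]. MapAxisToScreen is a hypothetical helper, and the 0 to 15200 range used in the usage note is just the example range from earlier, so treat it as an assumption for your own tablet.

```cpp
#include <algorithm>
#include <cstdint>

// Maps one axis from the tablet's range onto pixel indices [0, ScreenExtent - 1].
// Out-of-range device values are clamped to the nearest edge.
// Returns 0 for a degenerate device range or a non-positive screen extent.
inline std::int32_t MapAxisToScreen(const std::int32_t DeviceValue, const std::int32_t DeviceMin, const std::int32_t DeviceMax,
                                    const std::int32_t ScreenExtent)
{
    if (DeviceMax <= DeviceMin || ScreenExtent <= 0)
    {
        return 0;
    }
    const std::int64_t Clamped = std::min(std::max(DeviceValue, DeviceMin), DeviceMax);
    const std::int64_t Offset = Clamped - DeviceMin;
    const std::int64_t Range = static_cast<std::int64_t>(DeviceMax) - DeviceMin;
    // 64-bit intermediate so the multiplication cannot overflow
    return static_cast<std::int32_t>(Offset * (ScreenExtent - 1) / Range);
}
```

For a 1920 pixel wide screen and a 0 to 15200 device range, a device value of 0 maps to pixel 0, 15200 maps to pixel 1919, and 7600 maps to pixel 959.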
Final Steps
Now we basically have all the components to make the driver actually work. We can use the screen coordinate mapping in the read loop to get the converted screen coordinate for every input received. We can then send that screen coordinate as a mouse movement through the Windows API (I will not be going into this, since it’s not really relevant and it’s quite trivial. If you really care that much, look into the project GitHub).
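One detail that can trip people up if they go the SendInput route: with the MOUSEEVENTF_ABSOLUTE flag, the dx and dy fields are not pixels but normalized coordinates between 0 and 65535. The helper below sketches that conversion in plain C++. ToAbsoluteCoordinate is a hypothetical name, and this is one possible approach rather than a description of what the project on GitHub does.

```cpp
#include <algorithm>
#include <cstdint>

// Converts a pixel position on one axis into the normalized 0..65535 range
// expected by SendInput when MOUSEEVENTF_ABSOLUTE is set.
// Pixel is clamped to [0, ScreenExtent - 1] first.
inline std::int32_t ToAbsoluteCoordinate(const std::int32_t Pixel, const std::int32_t ScreenExtent)
{
    if (ScreenExtent <= 1)
    {
        return 0;
    }
    const std::int64_t Clamped = std::min(std::max(Pixel, 0), ScreenExtent - 1);
    return static_cast<std::int32_t>(Clamped * 65535 / (ScreenExtent - 1));
}
```

The two results (one per axis) would go into the dx and dy fields of a MOUSEINPUT with dwFlags set to MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE before calling SendInput.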
Similarly, we can also use the ProximityFlag I talked about earlier to check whether or not the pen is touching, hence simulating a mouse button press, or in fact any button you want.
if (Report.ProximityFlag == WACOM_TOUCH_FLAG)
{
    // Hold the mouse
}
else
{
    // Release the mouse
}
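Since a report arrives for every pen sample, a branch like this runs on every single report, so it may be worth only acting when the state actually changes. Here is a minimal sketch of that idea: TouchTracker and ButtonAction are hypothetical names, and WACOM_TOUCH_FLAG is assumed to be the 0x61 value observed earlier.

```cpp
#include <cstdint>

// Assumed to match the touch value observed earlier in this post.
constexpr std::uint8_t WACOM_TOUCH_FLAG = 0x61;

enum class ButtonAction
{
    None,   // state unchanged, nothing to send
    Press,  // pen just touched the tablet
    Release // pen just lifted off the tablet
};

// Remembers the previous touch state and reports only the transitions.
class TouchTracker
{
public:
    ButtonAction Update(const std::uint8_t ProximityFlag)
    {
        const bool IsTouching = (ProximityFlag == WACOM_TOUCH_FLAG);
        if (IsTouching == WasTouching)
        {
            return ButtonAction::None;
        }
        WasTouching = IsTouching;
        return IsTouching ? ButtonAction::Press : ButtonAction::Release;
    }

private:
    bool WasTouching = false;
};
```

Calling Update(Report.ProximityFlag) once per report then yields a single Press when the pen lands and a single Release when it lifts, with None in between.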
Or something to that effect. Well, that’s basically it. If you want the full working source code or the product, feel free to check it out here. If you have any questions, feel free to contact me on my Discord, just.cabbage.
Thanks for reading if you made it this far <3