My apologies in advance if I'm missing something obvious here.
The _wordbreaks and _grapheme_breaks functions, while useful, currently return void instead of the number of breaks written to the output array. Is there a reason why it would be inappropriate to return the number of breaks (or number of clusters) in this context? I'm not opposed to scanning the result buffer to determine this information, but the second pass strikes me as unnecessary.
In my particular case, I need to split strings at grapheme boundaries based on user-supplied integers, and it would make sense to skip the operation entirely if (n >= array_units || n >= grapheme_clusters), since the split would then fall at or past the end of the string.
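
For reference, here is roughly what the second pass and the early-exit check look like on my side. This is only a sketch: BREAK_HERE, count_breaks and can_skip_split are hypothetical names of my own, not the library's. I'm also assuming the output array holds one char per code unit and that exactly one entry per grapheme cluster carries the break marker, so the marker count equals the cluster count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker value: substitute whatever constant the library
 * actually writes into the output array to mean "a break occurs here". */
#define BREAK_HERE 0

/* The second pass in question: count the entries marked as breaks.
 * Assumes one char-sized entry per code unit in the output array. */
static size_t count_breaks(const char *breaks, size_t len)
{
    assert(breaks != NULL || len == 0);

    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (breaks[i] == BREAK_HERE)
            n++;
    }
    return n;
}

/* True when splitting after n clusters would be a no-op, i.e. the
 * requested position is at or beyond the end of the string. */
static bool can_skip_split(size_t n, size_t array_units,
                           size_t grapheme_clusters)
{
    return n >= array_units || n >= grapheme_clusters;
}
```

If the break functions returned the count directly, count_breaks would disappear and the value could be passed straight to the check.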