qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v12 21/36] qapi: Tighten the regex on valid name


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v12 21/36] qapi: Tighten the regex on valid names
Date: Wed, 18 Nov 2015 19:08:26 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Eric Blake <address@hidden> writes:

> On 11/18/2015 04:51 AM, Markus Armbruster wrote:
>> Eric Blake <address@hidden> writes:
>> 
>>> We already documented that qapi names should match specific
>>> patterns (such as starting with a letter unless it was an enum
>>> value or a downstream extension).  Tighten that from a suggestion
>>> into a hard requirement, which frees up names beginning with a
>>> single underscore for qapi internal usage.
>> 
>> If we care enough about naming conventions to document them, enforcing
>> them makes only sense.
>> 
>>>                                             Also restrict things
>>> to avoid 'a__b' or 'a--b' (that is, dash and underscore must not
>>> occur consecutively).
>> 
>> I feel this is entering "foolish names" territory, where things like
>> "IcAnFiNdNaMeSeVeNlEsSrEaDaBlEtHaNStudlyCaps" live.  Catching fools
>
> IhAdToOmUcHfUnReAdInGtHaT
>
> [It's amazing that scholars can ever manage to correctly interpret old
> written languages, thanks to omitted word breaks (scriptio continua) or
> omitted vowels (such as in Hebrew)]

Indeed.

>> automatically is generally futile, they're too creative :)
>> 
>> Let's drop this part.
>
> Sure, I can do that. I'll post a fixup patch, as it will affect docs too.
>
>> 
>>> Add a new test, reserved-member-underscore, to demonstrate the
>>> tighter checking.
>>>
>>> Signed-off-by: Eric Blake <address@hidden>
>>>
>>> ---
>>> v12: new patch
>>> ---
>
>>> +incompatibly in a future release.  All names must begin with a letter,
>>> +and contain only ASCII letters, digits, dash, and underscore, where
>>> +dash and underscore cannot occur consecutively.  There are two
>
> Namely, right here.
>
>>> +# Names must be letters, numbers, -, and _.  They must start with letter,
>>> +# except for downstream extensions which must start with __RFQDN_.
>>> +# Dots are only valid in the downstream extension prefix.
>>> +valid_name = re.compile('^(__[a-zA-Z][a-zA-Z0-9.]*_)?'
>>> +                        '[a-zA-Z][a-zA-Z0-9]*([_-][a-zA-Z0-9]+)*$')
>> 
>> This regexp consists of two parts: optional __RFQDN_ prefix and the name
>> proper.
>> 
>> The latter stays simpler if we don't attempt to catch adjacent [-_].
>> 
>> The former isn't quite right.  According to RFC 822 Appendix 1 - Domain
>> Name Syntax Specification:
>> 
>> * We must accept '-' in addition to digits.
>> 
>> * We require RFQDN to start with a letter.  This is still a loose match.
>>   The tight match is "labels separated by dots", where labels consist of
>>   letters, digits and '-', starting with a letter.  I wouldn't bother
>>   checking first characters are letters at all.
>> 
>> Recommend
>> 
>>    valid_name = re.compile('^(__[a-zA-Z0-9.-]+_)?'
>>                            '[a-zA-Z][a-zA-Z0-9_-]*$')
>
> Sure, that works for me. It's tighter than what we had before, but
> looser than what I proposed so that it allows more RFQDN.  It
> potentially lets users confuse us by abusing 'foo__max' or similar, but
> I'm less worried about that abuse - it's okay to stop the blatantly
> obvious mistakes without worrying about the corner cases, if it makes
> the maintenance easier for the cases we do catch.
>
>> 
>>>
>>>
>>>  def check_name(expr_info, source, name, allow_optional=False,
>>> @@ -374,8 +376,8 @@ def check_name(expr_info, source, name, 
>>> allow_optional=False,
>>>                                  % (source, name))
>>>      # Enum members can start with a digit, because the generated C
>>>      # code always prefixes it with the enum name
>>> -    if enum_member:
>>> -        membername = '_' + membername
>>> +    if enum_member and membername[0].isdigit():
>> 
>> What's wrong with the old condition?
>> 
>>> +        membername = 'D' + membername
>
> The old code prepended a lone '_', which doesn't work any more with the
> tighter valid_name regex, so we have to use some other prefix (I picked
> 'D').
>
>>>      # Reserve the entire 'q_' namespace for c_name()
>>>      if not valid_name.match(membername) or \
>>>         c_name(membername, False).startswith('q_'):
>
> The other thing is that I realized that an enum value of 'q-int' was
> getting munged to '_q-int' which no longer gets flagged by the c_name()
> filter.  But neither would 'Dq-int' get flagged.  So limiting the
> munging to just enum values that start with a digit (where we know it
> doesn't start with q) means we don't weaken the second condition.

Aha!  Worth a comment, I think.

> I guess I need to call that out in the commit message; all the more
> reason for me to send a fixup.

Yes, please!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]