gh-53584: Prevent variable-nargs options from stealing required positional args#146513
Open
stephenfin wants to merge 2 commits intopython:mainfrom
Open
gh-53584: Prevent variable-nargs options from stealing required positional args#146513stephenfin wants to merge 2 commits intopython:mainfrom
stephenfin wants to merge 2 commits intopython:mainfrom
Conversation
Most of these are marked as expected fail for now, pending the fix. Signed-off-by: Stephen Finucane <stephen@that.guru>
… positional args
Options with nargs='?', nargs='*', or nargs='+' were greedily consuming
argument strings that should have been reserved for required positional
arguments. For example:
parser.add_argument('--foo', nargs='?')
parser.add_argument('bar')
parser.parse_args(['--foo', 'abc']) # bar got nothing
argparse works by pattern-matching, with the bulk of its logic defined
in _parse_known_args(). That method encodes each argument string as a
single character ('O' for a recognised option flag, 'A' for everything
else, and '-' for strings following '--') which gives us a resulting
string pattern (e.g. 'OAOA'). We then "consume" arguments using this
string pattern, handling both positionals (via the consume_positionals()
closure) and optionals (via consume_optional()).
The bug arises in the latter. consume_optional() processes options by
building a regex based on the option's nargs (e.g. '-*A?-*' for
nargs='?') and then runs this regex against the remainder of the string
pattern (i.e. anything following the option flag). This is done via
_match_argument(). The regex will always stop at the next option flag
('O' token) but for non-fixed nargs values like '?' and '+' may greedily
consume every positional ('A' token) up to that point.
This fix works by manipulating the string pattern as part of our
optional consumption. Any 'A' tokens that are required by remaining
positionals are masked to 'O' to prevent the regex consuming them.
Masking will only consider tokens up to the next option flag ('O') and
it both accounts for what future options can absorb (to avoid masking
more than is necessary) and ensures that at least the minimum arguments
required for the optional are actually consumed.
In addition, we also handle the parse_intermixed_args() case. In
intermixed mode, positional-typed arguments already collected to the
left of the current option are also accounted for, since they will
satisfy positionals in the second-pass parse.
Signed-off-by: Stephen Finucane <stephen@that.guru>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
argparse supports options with variable numbers of args. gh-53584 tracks a long-standing bug where options with a non-fixed number of args (
nargs='?',nargs='*', ornargs='+') will greedily consume argument strings that should have been reserved for required positional arguments. For example:argparse works by pattern-matching, with the bulk of its logic defined in
_parse_known_args(). That method encodes each argument string as a single character ('O'for a recognised option flag,'A'for everything else, and'-'for strings following--) which gives us a resulting string pattern (e.g.'OAOA'). We then "consume" arguments using this string pattern, handling both positionals (via theconsume_positionals()closure) and optionals (viaconsume_optional()).The bug arises in the latter.
consume_optional()processes options by building a regex based on the option's nargs (e.g.'-*A?-*'fornargs='?') and then runs this regex against the remainder of the string pattern (i.e. anything following the option flag). This is done via_match_argument(). The regex will always stop at the next option flag ('O'token) but for non-fixed nargs values like'?'and'+'may greedily consume every positional ('A'token) up to that point.This fix works by manipulating the string pattern as part of our optional consumption. Any
'A'tokens that are required by remaining positionals are masked to'O'to prevent the regex consuming them. Masking will only consider tokens up to the next option flag ('O') and it both accounts for what future options can absorb (to avoid masking more than is necessary) and ensures that at least the minimum arguments required for the optional are actually consumed.In addition, we also handle the
parse_intermixed_args()case. In intermixed mode, positional-typed arguments already collected to the left of the current option are also accounted for, since they will satisfy positionals in the second-pass parse.Note that this is a rather gnarly issue, and I've done my best to avoid changing the API behavior of the module without layering on too much additional complexity, in the hope that this might actually be backportable. Hopefully my proposed approach is sound but I'm happy to iterate on this if there's something I've missed or there is a better way to do this.