Evaluation complexity bounds for smooth constrained nonlinear optimization using scaled KKT conditions and high-order models

Evaluation complexity for convexly constrained optimization is considered and it is shown first that the complexity bound of O(ǫ^{-3/2}) proved by Cartis, Gould and Toint (IMAJNA 32(4) 2012, pp. 1662-1695) for computing an ǫ-approximate first-order critical point can be obtained under significantly weaker assumptions. Moreover, the result is generalized to the case where high-order derivatives are used, resulting in a bound of O(ǫ^{-(p+1)/p}) evaluations whenever derivatives of order p are available. It is also shown that the bound of O(ǫ_P^{-1/2} ǫ_D^{-3/2}) evaluations (ǫ_P and ǫ_D being primal and dual accuracy thresholds) suggested by Cartis, Gould and Toint (SINUM 53(2), 2015, pp. 836-851) for the general nonconvex case involving both equality and inequality constraints can be generalized to yield a bound of O(ǫ_P^{-1/p} ǫ_D^{-(p+1)/p}) evaluations under similarly weakened assumptions.


Introduction
In [4] and [7], we examined the worst-case evaluation complexity of finding an ǫ-approximate first-order critical point for smooth nonlinear (possibly nonconvex) optimization problems, for methods using both first and second derivatives of the objective function. The case where constraints are defined by a convex set was considered in the first of these references, while the general case (with equality and inequality constraints) was discussed in the second.
It was shown in [4] that at most O(ǫ^{-3/2}) evaluations of the objective function and its derivatives are needed to compute such an approximate critical point. This result, which is identical in order to the best known result for the unconstrained case, comes at the price of potentially restrictive technical assumptions: it was assumed that an approximate first-order critical point of a cubic model subject to the problem's constraints can be obtained for the subproblem solution in a uniformly bounded number of descent steps that is independent of ǫ, that all iterates remain in a bounded set, and that the gradient of the objective function is also Lipschitz continuous (see [4] for details). The analysis of [7] then built on the result of the convex case by first specializing it to convexly constrained nonlinear least-squares problems, and then using the resulting complexity bound in the context of a two-phase algorithm for the problem involving general constraints. If ǫ_P and ǫ_D are the primal and dual criticality thresholds, respectively, it was suggested that at most O(ǫ_P^{-1/2} ǫ_D^{-3/2}) evaluations of the objective function and its derivatives are needed to compute an approximate critical point in that case, where the Karush-Kuhn-Tucker (KKT) conditions are scaled to take the size of the Lagrange multipliers into account. Because the proof of this result is based on the bound obtained for the convex case, it suffers from the same limitations (not to mention an additional constraint on the relative sizes of ǫ_P and ǫ_D, see [7]).
More recently, Birgin, Gardenghi, Martínez, Santos and Toint [3] provided a new regularization algorithm for the unconstrained problem with two interesting features. The first is that the model decrease condition used for the subproblem solution is weaker than that used previously, and the second is that the use of problem derivatives of order higher than two is allowed, resulting in corresponding reductions in worst-case complexity. In addition, the same authors also analyzed the worst-case evaluation complexity of the general constrained optimization problem in [2], also allowing for high-order derivatives and models, in a framework inspired by that of [6, 7]. At variance with the analysis of these latter references, their analysis considers unscaled approximate first-order critical points, in the sense that such points satisfy the standard unscaled KKT conditions with accuracy ǫ_P and ǫ_D.
The first purpose of this paper is to explore the potential of the proposals made in [3] to overcome the limitations of [4], and to extend its scope by considering the use of high-order derivatives and models. A second objective is to use the resulting worst-case bounds to establish strengthened evaluation complexity bounds for the general nonlinearly constrained optimization problem in the framework of scaled KKT conditions, thereby improving [7]. The paper is thus organized in two main sections, Section 2 covering the convexly constrained case and Section 3 the case allowing general nonlinear constraints. The results obtained are finally discussed in Section 4.

Convex constraints
The first problem we wish to solve is formally described as

minimize f(x) subject to x ∈ F, (2.1)

where we assume that f : IR^n → IR is p-times continuously differentiable, bounded from below, and has Lipschitz continuous p-th derivatives. For the q-th derivative of a function h : IR^n → IR to be Lipschitz continuous on the set S ⊆ IR^n, we require that there exists a constant L_{h,q} ≥ 0 such that, for all x, y ∈ S,

‖∇_x^q h(x) − ∇_x^q h(y)‖_T ≤ (q − 1)! L_{h,q} ‖x − y‖,

where ‖·‖_T is the recursively induced Euclidean norm on the space of q-th order tensors. We also assume that the feasible set F is closed, convex and non-empty. Note that this formulation covers standard inequality (and linear equality) constrained optimization in its different forms: the set F may be defined by simple bounds, and by both polyhedral and more general convex constraints. We remark though that we are tacitly assuming here that the cost of evaluating constraint functions and their derivatives is negligible.
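As an aid to the reader, the scaled Lipschitz condition above yields the usual Taylor-remainder bound with clean constants; the following derivation is a sketch for q = p (the factor (q − 1)! in the definition is chosen precisely to absorb the factorials):

```latex
% Taylor remainder under the scaled Lipschitz condition (sketch).
% Assumption: \|\nabla_x^p f(x) - \nabla_x^p f(y)\|_T \le (p-1)!\, L_{f,p}\, \|x-y\|.
|f(x+s) - T_p(x,s)|
  \;\le\; \frac{1}{p!} \max_{\xi \in [x,\,x+s]} \|\nabla_x^p f(\xi) - \nabla_x^p f(x)\|_T \,\|s\|^p
  \;\le\; \frac{(p-1)!\, L_{f,p}}{p!}\, \|s\|^{p+1}
  \;=\; \frac{L_{f,p}}{p}\, \|s\|^{p+1}.
```

The same mechanism gives analogous bounds on the derivatives of the remainder, which is what the model-decrease arguments of this section rely on.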
The algorithm considered in this paper is iterative. Let T_p(x_k, s) be the p-th order Taylor-series approximation to f(x_k + s) at some iterate x_k ∈ IR^n, and define the local regularized model m_k(x_k + s) by adding to T_p(x_k, s) a regularization term proportional to σ_k ‖s‖^{p+1}, where σ_k > 0 is the regularization parameter. The approach used in [4] (when p = 2) seeks to define a new iterate x_{k+1} from the preceding one by computing an approximate solution of the subproblem of minimizing this model over F, using a modified version of the Adaptive Regularization with Cubics (ARC) method for unconstrained minimization. By contrast, we now examine the possibility of modifying the ARp algorithm of [3] with the aim of inheriting its interesting features. As in [4], the modification involves a suitable continuous first-order criticality measure for the constrained problem of minimizing a given function h : IR^n → IR on F. For an arbitrary x ∈ F, this criticality measure is given by

π_h(x) = ‖P_F[x − ∇_x h(x)] − x‖,

where P_F denotes the orthogonal projection onto F and ‖·‖ the Euclidean norm. It is known that x is a first-order critical point of problem (2.1) if and only if π_f(x) = 0. Also note that π_h(x) = ‖∇_x h(x)‖ whenever F = IR^n. We now describe our algorithm as the ARpCC algorithm (ARp for Convex Constraints) on the following page.
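For concreteness, the regularized model used by ARp-type methods has the following shape [3]; the precise scaling σ_k/(p+1) of the regularization term is an assumed convention here, as variants in the literature differ by factorial constants:

```latex
% Regularized p-th order model (sketch; the (p+1) scaling is an assumed convention).
m_k(x_k + s) \;=\; T_p(x_k, s) + \frac{\sigma_k}{p+1}\,\|s\|^{p+1},
\qquad
T_p(x_k, s) \;=\; f(x_k) + \sum_{\ell=1}^{p} \frac{1}{\ell!}\,\nabla_x^\ell f(x_k)[s]^{\ell}.
```

The regularization term guarantees that the model is bounded below over F, so that the step-finding subproblem is well posed even when T_p itself is unbounded.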
We first state a useful property of the ARpCC algorithm, which ensures that a fixed fraction of the iterations 1, 2, . . ., k must be either successful or very successful.
We start our worst-case analysis by formalizing our assumptions.

AS.1
The objective function f is p times continuously differentiable on an open set containing F.
[Steps of the ARpCC algorithm, displaced here by extraction: Step 2 computes a step s_k by approximately minimizing m_k(x_k + s) over s ∈ F so that (2.8) holds; Step 3 computes f(x_k + s_k) and updates the iterate and σ_k accordingly.]

AS.2
The p-th derivative of f is Lipschitz continuous on F.

AS.3
The feasible set F is closed, convex and non-empty.
The ARpCC algorithm is required to start from a feasible x_0 ∈ F, which, together with the fact that the subproblem solution in Step 2 involves minimization over F, leads to AS.3.
We now recall some simple results whose proofs can be found in [3] in the context of the original ARp algorithm.

Lemma 2.2 Suppose that AS.1-AS.3 hold. Then, for each k ≥ 0, the bounds (2.12)-(2.15) hold.
Proof. See [3] for the proofs of (2.12) and (2.13), which crucially depend on AS.1 and AS.2 being valid on the segment [x_k, x_k + s_k], i.e. on the "segment" Lipschitz condition (2.16).
Observe also that (2.2) and (2.7) ensure (2.14). Assume now that (2.17) holds. Using (2.12) and (2.14), we may then deduce that ρ_k ≥ η_2. Then iteration k is very successful, in that ρ_k ≥ η_2 and σ_{k+1} ≤ σ_k. As a consequence, the mechanism of the algorithm ensures that (2.15) holds. ✷

We now prove that, at successful iterations, the step at iteration k must be bounded below by a multiple of the p-th root of the criticality measure at iteration k + 1.
Lemma 2.3 Suppose that AS.1-AS.3 hold. Then the lower bound (2.18) holds for all k ∈ S.

Proof. Since k ∈ S, we have x_{k+1} = x_k + s_k by definition of the trial point. Observe now that (2.13) and (2.15) imply a bound on the model decrease. Combining this bound with the triangle inequality, the contractive nature of the projection and (2.8), we deduce (2.18). ✷

We now consolidate the previous results by deriving a lower bound on the objective function decrease at successful iterations.

It is important to note that the validity of this lemma does not depend on the history of the algorithm, but is only conditional on the smoothness assumption on the objective function holding along the step from x_k to x_{k+1}. We will make use of this observation in Section 3.
Our worst-case evaluation complexity results can now be proved by combining this last result with the fact that π_f(x_k) cannot be smaller than ǫ before termination.
Theorem 2.5 Suppose that AS.1-AS.3 hold and let f_low be a lower bound on f on F. Then, given ǫ > 0, the ARpCC algorithm applied to problem (2.1) needs at most O((f(x_0) − f_low) ǫ^{-(p+1)/p}) successful iterations (each involving one evaluation of f and its p first derivatives) and at most a constant multiple of that number of iterations in total to produce an iterate x_ǫ such that π_f(x_ǫ) ≤ ǫ, where the constant κ_u involved in the latter bound is given by (2.11) with σ_max defined by (2.15).
Proof. At each successful iteration we have, using Lemma 2.4, a lower bound on the decrease f(x_k) − f(x_{k+1}), where we used the fact that π_f(x_{k+1}) ≥ ǫ before termination to deduce the last inequality. Thus we deduce that, as long as termination does not occur, the total decrease f(x_0) − f(x_{k+1}) grows at least linearly with the number of successful iterations, from which the desired bound on the number of successful iterations follows. Lemma 2.1 is then invoked to compute the upper bound on the total number of iterations.

✷
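The counting argument in the proof of Theorem 2.5 can be sketched in one line; κ_s here is a generic problem-dependent constant (an assumed notation, absorbing σ_max and the Lipschitz constant L_{f,p}):

```latex
% Telescoping the per-iteration decrease over the successful iterations S (sketch).
f(x_0) - f_{\mathrm{low}}
  \;\ge\; \sum_{k \in S} \big( f(x_k) - f(x_{k+1}) \big)
  \;\ge\; |S|\, \kappa_s^{-1}\, \epsilon^{(p+1)/p}
\quad\Longrightarrow\quad
|S| \;\le\; \kappa_s \,\big( f(x_0) - f_{\mathrm{low}} \big)\, \epsilon^{-(p+1)/p}.
```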
For what follows, it is very important to note that the Lipschitz continuity of ∇_x^q f was only used (in Lemma 2.2) to ensure that (2.16) holds for all k ≥ 0.

The general constrained case
We now consider the general smooth constrained problem in the form

minimize f(x) subject to c(x) = 0 and x ∈ F, (3.1)

where c : IR^n → IR^m is sufficiently smooth and f and F are as above. Note that this formulation covers the general problem involving both equality and inequality constraints, the latter being handled using slack variables and the inclusion of the associated simple bounds in the definition of F.
Our idea is now to first apply the ARpCC algorithm to the feasibility problem (3.2) of minimizing ν(x) = ½‖c(x)‖² over F. If an approximately feasible point is found, then we may follow the spirit of [5-7] and [2] and apply the same ARpCC algorithm to approximately solve the target-following problem (3.3) in a suitable subset of F, for some monotonically decreasing sequence of "targets" t_k (k = 1, ...).
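A plausible reading of the two phases, following the framework of [6, 7]; the exact forms of r and µ below are assumptions, reconstructed from the way r(x, t) and µ(x, t) are used in the rest of this section:

```latex
% Phase 1: approximate feasibility.
\nu(x) = \tfrac{1}{2}\,\|c(x)\|^2, \qquad \min_{x \in F}\; \nu(x).
% Phase 2: track a decreasing target t_k on f while controlling infeasibility.
r(x,t) = \big\| \big( c(x),\; f(x) - t \big) \big\|, \qquad
\mu(x,t) = \tfrac{1}{2}\, r(x,t)^2, \qquad \min_{x \in F}\; \mu(x, t_k).
```

Both phases thus apply the same convexly constrained machinery of Section 2 to a least-squares-type merit function, which is why the "segment-wise" Lipschitz results below are all that is needed.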
Observe that the recomputations of π_µ(x_{k+1}, t_{k+1}) in Step 2.(b) do not require re-evaluating f(x_{k+1}) or c(x_{k+1}) or any of their derivatives.
We now complete our assumptions.

AS.4
All derivatives of f of order 0 to p are Lipschitz continuous in F.

AS.5
For each i = 1, . . ., m, the constraint function c i is p times continuously differentiable on an open set containing F.

AS.6
All derivatives of order 0 to p of each c i (i = 1, . . ., m) are Lipschitz continuous in F.

AS.7
There exist constants β ≥ ǫ_P and f_low ∈ IR such that f(x) ≥ f_low for all x ∈ C_β def= {x ∈ F | ‖c(x)‖ ≤ β}.

Assume, without loss of generality, that all Lipschitz constants implied by AS.4 and AS.6 are bounded above by L ≥ 1. Also note that the problem of finding an ǫ_P-feasible minimizer of f(x) is only meaningful if AS.7 holds. We first verify that our assumptions are sufficient to imply that ν(x) and µ(x, t) have Lipschitz continuous p-th derivatives on all segments [x_j, x_j + s_j] generated by the algorithm, allowing us to exploit the results of Section 2.

Lemma 3.1 Assume that AS.3, AS.5 and AS.6 hold. Let the iterations of the ARpCC algorithm applied to problem (3.2) be indexed by j. Then the "segment" Lipschitz condition (2.16) holds for ∇_x^q ν(x) on every segment [x_j, x_j + s_j] (j ≥ 0) generated by the ARpCC algorithm during Phase 1 and any q ∈ {1, ..., p}. If, in addition, AS.1 and AS.4 also hold, then the same conclusion holds for ∇_x^q µ(x, t) on every segment [x_j, x_j + s_j] (j ≥ 0) generated by the ARpCC algorithm during Step 2.(a) of Phase 2 and any q ∈ {1, ..., p}, the Lipschitz constant in this latter case being independent of t.

Algorithm 3.1: Adaptive Regularization using p-th order models for general constraints (ARpGC)
A constant β defining C_β, a starting point x_{-1}, a minimum regularization parameter σ_min > 0 and an initial regularization parameter σ_0 ≥ σ_min are given, as well as a constant δ ∈ (0, 1). The primal and dual tolerances 0 < ǫ_P < 1 and 0 < ǫ_D < 1 are also given.

Phase 1:
Starting from x_0 = P_F(x_{-1}), apply the ARpCC algorithm to minimize ν(x) over x ∈ F until an iterate x_1 is found satisfying (3.4).

Phase 2:
(a) Starting from x_k, apply the ARpCC algorithm to minimize µ(x, t_k) as a function of x ∈ F until an iterate x_{k+1} ∈ F is found satisfying (3.5).
(b) i. If ..., terminate with (x_ǫ, t_ǫ) = (x_{k+1}, t_k).
ii. If r(x_{k+1}, t_k) ≥ δǫ_P and f(x_{k+1}) < t_k, define t_{k+1} according to (3.7).
iii. If r(x_{k+1}, t_k) ≥ δǫ_P and f(x_{k+1}) ≥ t_k, terminate with (x_ǫ, t_ǫ) = (x_{k+1}, t_k).

Proof. Since ∇_x^q ν(x) is a sum of products of derivatives of the c_i (with suitable positive and finite coefficients {α_{ℓ,j}}), condition (2.16) is satisfied on the segment [x_j, x_j + s_j] if (i) the derivatives {∇_x^ℓ c_i}_{ℓ=1}^{q} are uniformly bounded on [x_j, x_j + s_j], and (ii) we have that (3.8) holds for some constant L_1 > 0. The first of these conditions is ensured by AS.6, which also implies that ‖∇_x^ℓ c_i(x)‖ ≤ L for ℓ ∈ {1, ..., q} (see [11, Lem. 1.2.2, p. 21]). Moreover, the first term on the right-hand side of the relevant bound is at most L² ξ ‖s_j‖ and the second at most |c_i(x_j)| L ξ ‖s_j‖. Hence (3.8) holds because the ARpCC algorithm ensures that ‖c(x_j)‖ ≤ ‖c(x_0)‖ for all j ≥ 0. As a consequence, AS.3, AS.5 and AS.6 guarantee that (2.16) holds for ∇_x^q ν.
where we have used (3.14) and ǫ_P ≤ 1 to deduce the inequality. Note that this constant is independent of t_j, as requested. ✷

We now begin our complexity analysis proper by examining the complexity of Phase 1.
Lemma 3.2 Suppose that AS.3, AS.5 and AS.6 hold. Then Phase 1 of the ARpGC algorithm terminates after at most a number of evaluations of c and its derivatives proportional to κ_c ǫ_P^{-(p+1)/p}, where κ_c is the problem-dependent constant defined in (2.20) for the function ½‖c(x)‖² corresponding to (3.2).
Proof. Let us index the iterations of the ARpCC algorithm applied to problem (3.2) by j, and assume that iteration j is successful and that ‖c(x_j)‖ ≥ δǫ_P. (3.9) If ‖c(x_{j+1})‖ ≤ ½‖c(x_j)‖, the required decrease follows at once. By contrast, if ‖c(x_{j+1})‖ > ½‖c(x_j)‖, then, using the decreasing nature of the sequence {‖c(x_j)‖}, Lemma 2.4 (which is applicable because of Lemma 3.1) and the second part of (3.4), we obtain a guaranteed decrease, where we have used (3.9) to derive the last inequality. Because of (2.20), we thus obtain from this last bound and (3.10) that the decrease in ‖c(x_j)‖ is bounded below for all j. As in Theorem 2.5, we then deduce that the number of successful iterations required for the ARpCC algorithm to produce a point x_1 satisfying (3.4) is bounded above as stated. The desired conclusion then follows by using Lemma 2.1 and adding one for the final evaluation at termination. ✷

Note that an improved complexity bound for convexly-constrained least-squares problems, and hence for Phase 1, was given in [8]. In particular, the bound in Lemma 3.2 improves whenever p is a power of 2. However, we are not aware of a way to use this better Phase 1 result to improve the complexity of Phase 2, and so we omit it here. We now partition the Phase 2 outer iterations (before that at which termination occurs) into two subsets K_+ and K_-, whose indexes are given by (3.11).

We now prove (3.17), which only occurs when r(x_{k+1}, t_k) ≤ δǫ_P, that is when (3.19) holds. From (3.6), we then have (3.20). Now, taking into account that, for ω ∈ [0, ǫ_P], the global minimum of the relevant auxiliary problem is attained at (f*, c*) = (ω, 0) and is given by ψ(f*, c*) = ǫ_P − ω (see [7, Lemma 5.2]), we obtain from (3.19) and (3.20) (setting ω = δǫ_P) that (3.17) holds. ✷

Using the results of this lemma allows us to bound the number of outer iterations in K_+.
Lemma 3.4 Suppose that AS.7 holds. Then |K_+| ≤ 1 + (f(x_1) − f_low) / ((1 − δ) ǫ_P).

Proof.
We first note that (3.14), (3.15) and AS.7 ensure that x_k ∈ C_β for all k ≥ 0. The result then immediately follows from AS.7 again and the observation that, from (3.17), t_k decreases monotonically, with a decrease of at least (1 − δ)ǫ_P for k ∈ K_+. ✷

We now state a very useful consequence of Lemmas 3.1 and 3.3.

Lemma 3.5 Suppose that AS.1 and AS.3-AS.6 hold. Then there exists a constant σ_{µ,max} > σ_min such that all regularization parameters arising in the ARpCC algorithm within Step 2.(a) of the ARpGC algorithm are bounded above by σ_{µ,max}.
Proof. AS.1, AS.4-AS.6 and Lemma 3.1 guarantee the existence of a Lipschitz constant L_{µ,p} independent of t such that the "segment-wise" Lipschitz condition (2.16) holds for each segment [x_{j,ℓ}, x_{j,ℓ} + s_{j,ℓ}]. The result is then derived by introducing L_{µ,p} in (2.15) to obtain σ_{µ,max}. ✷

The main consequence of this result is that we may apply the ARpCC algorithm to the minimization of µ(x, t_k) in Step 2.(a) of the ARpGC algorithm and use all the properties of the former (as derived in the previous section) with problem constants valid for every possible t_k. Consider now x_k for k ∈ K_+ and denote by x_{k+ℓ(k)} the next iterate such that k + ℓ(k) ∈ K_+ or the algorithm terminates at k + ℓ(k). Two cases are then possible: either a single pass in Step 2.(a) of the ARpGC algorithm is sufficient to obtain x_{k+ℓ(k)} (ℓ(k) = 1), or two or more passes are necessary, with iterations k + 1, ..., k + ℓ(k) − 1 belonging to K_-. Assume that the iterations of the ARpCC algorithm at Step 2.(a) of the outer iteration j are numbered (j, 0), (j, 1), ..., (j, e_j), and note that the mechanism of the ARpGC algorithm ensures that iteration (j, e_j) is successful for all j. Now define, for k ∈ K_+, the index set I_k of all inner iterations necessary to deduce x_{k+ℓ(k)} from x_k. Observe that, by the definitions (3.11) and (3.21), the index set of all inner iterations before termination is given by ∪_{k∈K_+} I_k, and therefore that the number of evaluations of the problem's functions required to terminate in Phase 2 is bounded above by (3.22), where we added 1 to take the final evaluation into account and where we used Lemma 3.4 to deduce the inequality. We now invoke the complexity properties of the ARpCC algorithm applied to problem (3.3) to obtain an upper bound on the cardinality of each I_k.
Lemma 3.6 Suppose that AS.1 and AS.3-AS.6 hold. Then, for each k ∈ K_+ before termination, |I_k| ≤ κ^{µ}_CC (1 + ǫ_P^{(p-1)/p} ǫ_D^{-(p+1)/p}), where κ^{µ}_CC is independent of ǫ_P and ǫ_D and captures the problem-dependent constants associated with problem (3.3) for all values of t_k generated by the algorithm.
Proof. Observe first that, because of Lemma 3.5, we may apply the ARpCC algorithm to the minimization of µ(x, t_j) for each j such that k ≤ j < k + ℓ(k). Observe also that (3.15) and the mechanism of this algorithm guarantee the decreasing nature of the sequence {r(x_j, t_j)} for j = k, ..., k + ℓ(k) − 1, and hence of the sequence {r(x_{j,s}, t_j)} for (j, s) ∈ I_k. This reduction starts from the initial value r(x_{k,0}, t_k) = ǫ_P and is carried out for all iterations with index in I_k, at worst until it is smaller than δǫ_P (see the first part of (3.5)). We may then invoke Lemmas 3.5 and 2.4 to deduce that, if (j, s) ∈ I_k is the index of a successful inner iteration and as long as the third part of (3.5) does not hold,

(r(x_{j,s}, t_j) − r(x_{j,s+1}, t_j)) r(x_{j,s}, t_j) ≥ µ(x_{j,s}, t_j) − µ(x_{j,s+1}, t_j) ≥ (κ^{µ,s}_CC)^{-1} ǫ_D^{(p+1)/p} r(x_{j,s}, t_j)^{(p+1)/p} (3.23)

for 0 ≤ s < e_j and for some constant κ^{µ,s}_CC > 0 independent of ǫ_P, ǫ_D, s and j, while the corresponding bound (3.24) holds for s = e_j. As above, suppose first that r(x_{j,s+1}, t_j) ≤ ½ r(x_{j,s}, t_j). Then the number of such iterations is limited because of the first part of (3.5). If r(x_{j,s+1}, t_j) > ½ r(x_{j,s}, t_j) instead, then (3.23) implies a guaranteed decrease in r(·, t_j). Combining this bound with (3.24) gives a lower bound on the decrease valid at every successful inner iteration. As a consequence, the number of successful iterations of the ARpCC algorithm needed to compute x_{k+ℓ(k)} from x_k cannot exceed the bound announced in the statement of the lemma. We now use Lemma 3.5 again and invoke Lemma 2.1 to account for possible unsuccessful inner iterations, yielding that the total number of successful and unsuccessful iterations of the ARpCC algorithm necessary to deduce x_{k+ℓ(k)} from x_k is bounded above as stated. ✷

We now state a useful property of the set F.

Lemma 3.7 For arbitrary x ∈ F, v ∈ IR^n and τ ∈ IR with τ ≥ 1, ‖P_F[x + τv] − x‖ ≤ τ ‖P_F[x + v] − x‖.

Proof. The result follows immediately from [1, Lem. 2.3.1], which states that ‖P_F[x + τv] − x‖/τ is a monotonically non-increasing function of τ > 0 for any x in a given convex set F. ✷

We finally combine our results in a final theorem stating our evaluation complexity bound for the ARpGC algorithm.
evaluations of f, c and their derivatives up to order p to compute a point x_ǫ such that either ‖c(x_ǫ)‖ > δǫ_P and (3.26) holds, or (3.27) holds, where Λ(x, y) is the Lagrangian with respect to the equality constraints and y_ǫ is a vector of Lagrange multipliers associated with these constraints.

Proof.
If the ARpGC algorithm terminates in Phase 1, we immediately obtain that (3.26) holds, and Lemma 3.2 then ensures that the number of evaluations of c and its derivatives cannot exceed the bound (3.28). The conclusions of the theorem therefore hold in this case.
Let us now assume that termination does not occur in Phase 1. Then the ARpGC algorithm must terminate after a number of evaluations of f and c and their derivatives which is bounded above by the upper bound on the number of evaluations in Phase 1 given by (3.28) plus the bound on the number of evaluations of µ given by (3.22) and Lemma 3.6.
Using the fact that ⌊a⌋ + ⌊b⌋ ≤ ⌊a + b⌋ for a, b ≥ 0 and that ⌊a + i⌋ = ⌊a⌋ + i for a ≥ 0 and i ∈ IN, this yields the combined upper bound (3.25). Assume first that f(x_ǫ) = t_ǫ. Then, using the definition of r(x, t), we deduce that the desired conclusion holds. Assume now that f(x_ǫ) > t_ǫ (the case where f(x_ǫ) < t_ǫ is excluded by (3.18)) and note that 0 < f(x_ǫ) − t_ǫ ≤ ǫ_P ≤ 1 because of the second bound in (3.16) and the decreasing nature of r(x, t_k) during inner iterations. Successively using Lemma 3.7, the third part of (3.5), (3.29) and the definition of r(x, t), we then deduce that (3.27) holds. ✷

Observe that the bound of Theorem 3.8 reduces to O(ǫ^{-(p+2)/p}) evaluations whenever ǫ_P = ǫ_D = ǫ. It is important to note that the complexity bound given by Theorem 3.8 depends on f(x_1), the value of the objective function at the end of Phase 1. Giving an upper bound on this quantity is in general impossible, but can be done in some cases. A trivial bound can of course be obtained if f(x) is bounded above in C_β. This has the advantage of providing a complexity result which is self-contained (in that it only involves problem-dependent quantities), but it is quite restrictive as it excludes, for instance, problems only involving equality constraints (F = IR^n) and coercive objective functions. A bound is also readily obtained if the set F is itself bounded (for instance when the variables are subject to finite lower and upper bounds), or if one assumes that the iterates generated by Phase 1 remain bounded. This may for example be the case if the set {x ∈ IR^n | c(x) = 0} is bounded. An ǫ_P-dependent bound can finally be obtained if one is ready to assume that all derivatives of order 1 to p of c(x) (and thus of ν(x)) are bounded by a constant in the level set C_0 def= {x ∈ F | ‖c(x)‖ ≤ ‖c(x_0)‖}, because it can then be shown that ‖s_k‖ is uniformly bounded above, and hence that ‖x_1 − x_0‖ is itself bounded above by a constant times the (ǫ_P-dependent) number of iterations in Phase 1 given by Lemma 3.2. Using the boundedness of the gradient of ν(x) on the path of
iterates then ensures an (extremely pessimistic) upper bound on f(x_1). Substituting this bound in (3.25) in effect squares the complexity of obtaining (x_ǫ, t_ǫ).
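The order-level accounting behind the Phase 2 bound can be sketched as follows; the ǫ_P-dependence of the per-outer-iteration inner count is an assumption here (chosen consistently with the final bound), while the outer count follows the target-decrease argument of Lemma 3.4:

```latex
% Outer iterations: each k in K_+ lowers the target t_k by at least (1-\delta)\epsilon_P,
% and AS.7 bounds f from below on C_\beta:
|K_+| \;\le\; \frac{f(x_1) - f_{\mathrm{low}}}{(1-\delta)\,\epsilon_P}
      \;=\; O\big(\epsilon_P^{-1}\big).
% Inner iterations per outer index k (assumed form, cf. Lemma 3.6):
|I_k| \;=\; O\big(\epsilon_P^{(p-1)/p}\,\epsilon_D^{-(p+1)/p}\big).
% Combining, since -1 + (p-1)/p = -1/p:
\sum_{k \in K_+} |I_k| \;=\; O\big(\epsilon_P^{-1/p}\,\epsilon_D^{-(p+1)/p}\big).
```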

Discussion
We have first shown in Section 2 that, if derivatives of the objective function up to order p can be evaluated and if the p-th one is Lipschitz continuous, then the ARpCC algorithm applied to the convexly constrained problem (2.1) needs at most O(ǫ^{-(p+1)/p}) evaluations of f and its derivatives to compute an ǫ-approximate first-order critical point. This worst-case bound corresponds to that obtained in [4] when p = 2, but holds under significantly weaker assumptions. Indeed, the present proposal no longer needs any assumption on the number of descent steps in the subproblem solution, the iterates are no longer assumed to remain in a bounded set, and the Lipschitz continuity of the gradient is no longer necessary. That these stronger results are obtained from a considerably simpler analysis is an added bonus. While we have not developed here the case (covered for p = 2 in [4]) where the p-th derivative is only known approximately (in the sense that ∇_x^p f(x_k) is replaced in the model's expression by some tensor B_k such that the norm of (∇_x^p f(x_k) − B_k) applied p − 1 times to s_k must be O(‖s_k‖^p)), the generalization of the present proposal to cover this situation is easy. The proposed worst-case evaluation bound also generalizes that of [3] for unconstrained optimization to the case of set-constrained problems, under very weak assumptions on the feasible set. As was already the case for p ≤ 2, it is remarkable that the complexity bound for the considered class of problems (which includes the standard bound-constrained case) is, for all p ≥ 1, identical in order to that of unconstrained problems.
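As a sanity check on the exponents, the general bounds recover the known p = 2 results quoted in the introduction:

```latex
p = 2:\qquad
O\big(\epsilon^{-(p+1)/p}\big) = O\big(\epsilon^{-3/2}\big)
\quad\text{(convexly constrained case)},
\qquad
O\big(\epsilon_P^{-1/p}\,\epsilon_D^{-(p+1)/p}\big)
  = O\big(\epsilon_P^{-1/2}\,\epsilon_D^{-3/2}\big).
```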
The present framework for handling convex constraints is however not free of limitations, resulting from the choice to transfer difficulties associated with the original problem to the subproblem solution, thereby sparing precious evaluations of f and its derivatives. The first is that we need to compute projections onto the feasible set to obtain values of π_f and π_{m_k}. While this is straightforward and computationally inexpensive for simple convex sets such as boxes, spheres, cylinders or the order-simplex, the process might be more intensive in the general case. The second limitation is that, even if the projections can be computed, the approximate solution of the subproblem may also be very expensive in terms of internal calculations (we do not consider here suitable algorithms for this purpose). Observe nevertheless that, crucially, neither the computation of the projections nor the subproblem solution involves evaluating the objective function or its derivatives: despite their potential computational drawbacks, they therefore have no impact on the evaluation complexity of the original problem. However, as the cost of evaluating any constraint function or derivative possibly necessary for computing projections is neglected by the present approach, the latter must be seen as a suitable framework for handling "cheap inequality constraints" such as simple bounds.
We have also shown in Section 3 that the evaluation complexity of finding an approximate first-order scaled critical point for the general smooth nonlinear optimization problem involving both equality and inequality constraints is at most O(ǫ_P^{-1/p} ǫ_D^{-(p+1)/p}) evaluations of the objective function, constraints and their derivatives up to order p. We refer here to an "approximate scaled critical point" in that such a point is required to satisfy (3.26) or (3.27).

Theorem 3.8
Suppose that AS.1 and AS.3-AS.7 hold. Then, for some constants κ^{c}_CC and κ^{µ}_CC independent of ǫ_P and ǫ_D, the ARpGC algorithm applied to problem (3.1) needs at most the number of evaluations given in (3.25).

If we now assume that AS.1 and AS.4 also hold, we may repeat, for µ(x, t) (with fixed t), the same reasoning as above and obtain that condition (2.16) holds for each segment [x_j, x_j + s_j] generated by the ARpCC algorithm applied in Step 2.(a) of Phase 2, with a Lipschitz constant independent of t. In particular, AS.3, AS.5 and AS.6 guarantee that (2.16) holds with the Lipschitz constant m max_{i=1,...,m} α_i (L² + L² + ‖c(x_0)‖ L).