Inference and Operational Conduct Issues with Sample Size Adjustment Based On Interim Observed Effect Size
H.M. James Hung (DB1/OB/OPaSS/CDER/FDA)
Lu Cui (Aventis Pharmaceuticals)
SueJane Wang (DB2/OB/OPaSS/CDER/FDA)
John Lawrence (DB1/OB/OPaSS/CDER/FDA)
Presented at the Annual Symposium of the New Jersey
Chapter of ASA, Piscataway, NJ, June 4, 2002
Disclaimer
The views expressed in this presentation are not
those of the U.S. Food and Drug Administration,
nor of Aventis Pharmaceuticals.
Dr. Lu Cui was one of the primary investigators
of this research during his tenure at FDA.
Acknowledgments
The research was supported by FDA/CDER RSR Funds, #96010A and #99/00008. Thanks are due to Dr. Lu Cui for sharing some of his slides.
Selected References in Adaptive Design/Interim Analysis
Bauer & Köhne (1994, Biometrics)
Bauer & Röhmel (1995, Stat. in Med.)
Lan & Trost (1997, ASA Proceedings)
Fisher (1998, Stat. in Med.)
Posch & Bauer (1999, Biometrical J.)
Kieser, Bauer & Lehmacher (1999, Biometrical J.)
Lehmacher & Wassmer (1999, Biometrics)
Müller & Schäfer (2001, Biometrics)
Berry (2002, ASA Biopharmaceutical Report)
Brannath, Posch & Bauer (2002, JASA)
The materials of this presentation are selected from the main results of our RSR research work.
Cui, Hung, Wang (1997 ASA; 1999 Biometrics)
Lawrence & Hung (2002, ENAR talk)
Background
Sample size (or amount of statistical information) is one of the design specifications vital to success of Phase II/III (confirmatory?) clinical trials
It relates directly and closely to the true effect size (treatment difference normalized by the measure of variability) of the targeted response variable
Background
Common recommendation
Make educated guess about the effect size
and plan sample size to detect this effect size (or a
range of plausible effect sizes) with sufficient
power [e.g., > 90%; Hung et al. (1997, Biometrics)]
This is always good because the fixed-information design
1) provides statistics that have important good
statistical properties
2) avoids data-driven adjustments that may induce
biases (statistical or operational), making the
results not interpretable
Background
But ...
The effect size depends on a primary
Background
But ...
The effect size may depend on patient mixtures: potentially heterogeneous effects in subpopulations
For a hard clinical outcome endpoint, an educated
guess about effect size is difficult
e.g., for a composite event endpoint, one needs an educated guess of where the potential signal lies and what the noise may be
Background
But ...
The effect size for detection may depend on $$ benefit/risk/cost consideration
... etc.
Practical considerations: the effect size for detection
can be a moving target and change as background
circumstances change, and the maximum amount of
statistical information one can commit to may also
change
Background
Experiences:
Often oversimplify clinical trial designs and inferences and impose too many restrictions on the designs. If a trial fails, it is difficult to know whether it is because the treatment does not have an important effect or because the study was underpowered for detecting it.
Background
Lan (2001, FDA/OB MiniSymposium)
If we know the values of design elements (e.g.,
effect size) a priori, No Need and Not Ethical to conduct a confirmatory trial
Bauer et al. (2002, Method Inform Med)
... It does not make sense to apply uniformly most powerful test in an unchanged design even if we have convincing evidence that this best test in the preplanned design may be severely underpowered ...
Need to enhance flexibility in traditional clinical trial design/analysis strategy because
practical considerations may
change and often be
unpredictable at the design stage
Mid-course modification of design specifications
- adjust sample size
- change tested hypothesis from superiority
  to non-inferiority or vice versa
- change from one pre-specified primary
  endpoint to another pre-specified endpoint
- change test method
- drop a treatment arm
Type I error rate may greatly exceed the acceptable level
Statistical power may be compromised
Traditional estimate may be severely
biased
Sample Size Reestimation
Literature on sample size reestimation is abundant.
Increasing sample size (or amount of statistical
information) based on nuisance parameters without
breaking the blind
- has little effect on type I error
- may preserve the intended power level
- needs little or mild statistical adjustment (e.g.,
  estimate, CI)
Wittes & Brittain (1990), Gould (1992), Gould & Shih (1992)
Shih (1992, 1993, 1995), Birkett & Day (1994)
Jennison & Turnbull (1999, book), ... etc.
Sample Size Reestimation
But ...
Lan (1997, ASA talk), Liu (2000, ICSA talk)
e.g., knowing the components of the variance can
lead to estimation of treatment difference; hence
sample size reestimation based on variance
might affect type I error depending on how it is
processed (e.g., by obtaining the total and within-group sums of squares, TSS & WSS)
Sample Size Reestimation
Increasing sample size (or amount of statistical
information) based on the internal data path
may substantially inflate type I error, bias the estimate, and invalidate the CI
- a crude estimate of the maximum amount of inflation is
  obtainable, at least by simulation
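The inflation can be explored by simulation. The following is a minimal sketch, not the authors' simulation: it assumes known variance, simulates the stage-wise z-statistics directly under H0, and uses a hypothetical worst-case rule that, after seeing the interim z-statistic, picks the total sample size between N and 4N that maximizes the chance that the unadjusted pooled z-test rejects. The function name, the 4x cap, and the default settings are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

Z_ALPHA = NormalDist().inv_cdf(1 - 0.025)  # one-sided 0.025 critical value

def naive_type1_error(n1=40, n_planned=100, max_factor=4.0,
                      adapt=True, n_sims=200_000, seed=1):
    """Monte Carlo type I error of the unadjusted final z-test under H0."""
    rng = random.Random(seed)
    n2_min = n_planned - n1                # second stage is never shortened
    n2_max = max_factor * n_planned - n1   # cap on the increase
    r_lo = math.sqrt(n1 / n2_max)          # r = sqrt(n1 / n2)
    r_hi = math.sqrt(n1 / n2_min)
    rejections = 0
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)           # interim z-statistic
        if not adapt or z1 >= Z_ALPHA:
            r = r_hi                       # keep the planned size
        elif z1 <= 0.0:
            r = r_lo                       # largest allowed increase
        else:
            # r minimizing the conditional critical value of the naive test
            r_star = z1 / math.sqrt(Z_ALPHA ** 2 - z1 ** 2)
            r = min(max(r_star, r_lo), r_hi)
        n2 = n1 / r ** 2
        z2 = rng.gauss(0.0, 1.0)           # z-statistic of the second-stage data
        z_final = (math.sqrt(n1) * z1 + math.sqrt(n2) * z2) / math.sqrt(n1 + n2)
        rejections += z_final > Z_ALPHA
    return rejections / n_sims

fixed_rate = naive_type1_error(adapt=False)
adapted_rate = naive_type1_error(adapt=True)
print(fixed_rate, adapted_rate)
```

Under these assumptions the fixed design should stay near the nominal 0.025, while the data-driven increase with an unadjusted test should come out above it.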
Sample Size Reestimation
Question:
At an interim time of a trial, if the observed treatment difference is far smaller than expected, we wish to increase sample size. Then, what adjustments are needed to
perform valid statistical testing?
Bauer & Köhne (1994, Biometrics)
Proschan & Hunsberger (1995, Biometrics)
Lan & Trost (1997, ASA Proceedings)
Cui, Hung & Wang (1997 ASA Proceedings, 1999, Biometrics)
Fisher (1998, Stat. In Med.)
Shen & Fisher (1999, Biometrics)
Lehmacher & Wassmer (1999, Biometrics)
Müller & Schäfer (2001, Biometrics)
Liu & Chi (2001, Biometrics)
Brannath, Posch & Bauer (2002, JASA)
Lawrence & Hung (2002, ENAR)
Sample Size Reestimation
[Diagram: Experimental (T) with N subjects and Control (C) with N subjects, followed from Baseline]
Test $H_0: \Delta = 0$ vs. $H_1: \Delta > 0$
$\Delta = T - C$ (treatment difference), $\sigma = 1$
To detect $\Delta = \delta$ at significance level $\alpha$ and power $1 - \beta$,
$N$ (per group) $= 2(z_\alpha + z_\beta)^2/\delta^2$
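The sample size formula above can be checked numerically. A minimal sketch, assuming a one-sided significance level and unit standard deviation as on the slide; the function name and the `sigma` argument are illustrative additions, not part of the original slides.

```python
import math
from statistics import NormalDist

def per_group_sample_size(delta, alpha=0.025, power=0.90, sigma=1.0):
    """N per group = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)   # upper alpha quantile
    z_beta = std.inv_cdf(power)        # upper beta quantile, beta = 1 - power
    return 2 * (sigma * (z_alpha + z_beta) / delta) ** 2

n = per_group_sample_size(0.46)
print(round(n, 1), math.ceil(n))
```

With an effect size of 0.46, level 0.025 and 90% power, this gives just under 100 subjects per group.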
Sample Size Reestimation
The process involves:
1) projection of effect size with internal data
2) determination of when to change sample size
3) reestimation of sample size
4) adjustment of statistical inference methods
   - test statistic and/or critical value
   - estimator and confidence interval
Sample Size Reestimation
(non-sequential trial)
Plan to enroll $N = 100$ subjects/group to detect
$\delta = 0.46$ at $\alpha = 0.025$ and power 90%
After 40 subjects per group contribute data,
the estimate of $\Delta$ leads to $\delta^* = 0.37$
Reestimate total sample size $M = 150$/group
Sample Size Reestimation
(non-sequential trial)
At the end of the trial ($M = 150$), compute the CHW adaptive test [Cui, Hung & Wang, 1999]
$U = (40/100)^{1/2} Z_{0.40} + (60/100)^{1/2} W_{0.60}$
$W_{0.60}$: normalized test for the additional 110 subjects per group after the interim time $t = 0.4$
Sample Size Reestimation
(non-sequential trial)
$U$ is standard normal under $H_0$
If $U > 1.96$, then conclude $\Delta > 0$
Significance level = 0.025
$U$ is more powerful than the
original $Z$ without increasing $N$
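The weighted statistic can be computed in a few lines. A minimal sketch, assuming the two normalized statistics are already available; the numerical inputs are hypothetical and the function name is illustrative. The weights depend only on the originally planned interim fraction (40/100 here), not on the reestimated sample size.

```python
import math
from statistics import NormalDist

def chw_statistic(z_interim, w_increment, s):
    """U = sqrt(s) * Z_s + sqrt(1 - s) * W, with s the planned interim fraction."""
    return math.sqrt(s) * z_interim + math.sqrt(1 - s) * w_increment

# Hypothetical observed values (not from the slides).
u = chw_statistic(z_interim=1.2, w_increment=1.9, s=40 / 100)
critical = NormalDist().inv_cdf(1 - 0.025)   # about 1.96
reject = u > critical
print(round(u, 3), reject)
```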
Sample Size Reestimation
(non-sequential trial)
Estimation & CI for $\Delta$
Lawrence & Hung (2002, ENAR talk)
Construct consistent estimator and valid CI for $\Delta$
CHW test is the Z-ratio of the consistent estimator
Sample Size Reestimation
(group sequential trial)
[Diagram: group sequential trial timeline; recoverable labels: Control (C) with N subjects, N/5, 2N/5, IA2, Final]
Sample Size Reestimation
(group sequential trial)
$N$ is planned to detect $\Delta = \delta$ at level $\alpha$ and with power $1 - \beta$
At interim time $s$, the estimate $\hat\Delta_s$ suggests $0 < \delta^* < \delta$
(say, based on conditional power)
$\Rightarrow$ increase sample size from $N$ to $M$, approximately
$M = N(\delta/\delta^*)^2$
Total information changes from 1 to $w = M/N$
Compute $b = (w - s)/(1 - s)$
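The bookkeeping on this slide is simple arithmetic. A minimal sketch with an illustrative function name; the effect size passed in for the interim suggestion is a hypothetical value chosen so that the new size is 1.5 times the planned one.

```python
import math

def reestimate(n_planned, delta, delta_star, s):
    """Return (M, w, b) for planned per-group size N and interim fraction s."""
    m = n_planned * (delta / delta_star) ** 2   # M = N * (delta / delta*)^2
    w = m / n_planned                           # new total information
    b = (w - s) / (1 - s)                       # new remaining / planned remaining
    return m, w, b

# Hypothetical delta* chosen so that M = 1.5 * N.
m, w, b = reestimate(100, 0.46, 0.46 / math.sqrt(1.5), s=0.5)
print(round(m), round(w, 2), round(b, 2))
```

Here b is the factor by which the part of the trial after the interim look is stretched: (M - sN)/(N - sN).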
Sample Size Reestimation
(group sequential trial)
$\{U_t\}$ with $N$ possibly changed to $M$
and $\{Z_t\}$ without change of $N$ have identical distributions.
Find the critical value $C_t$ at time $t$ based on the initially selected alpha-spending function
Reject $H_0$ if $U_t > C_t$; otherwise, the trial continues
[Figure: Empirical power (adaptive test; $1 - \beta = 0.587$ without $N$ increase; Gaussian) (increase $N$ by at most 4x; O'Brien-Fleming boundary)]
[Figure: Empirical power (adaptive test; $1 - \beta = 0.60$ without $N$ increase; Binomial, $p_C = 0.20$) (increase $N$ by at most 4x; O'Brien-Fleming boundary)]
Sample Size Reestimation
(Example: group sequential trial)
Plan to have 100 subjects/group to detect
$\delta = 0.46$ at $\alpha = 0.025$ and power 90%
After 50 subjects per group contribute data,
the estimate suggests to detect d* = 0.46/%2
Reestimate sample size $M = 150$/group
$w = 1.5$, $b = (1.5 - 0.5)/(1 - 0.5) = 2$
Sample Size Reestimation
(Example: group sequential trial)
Suppose that an interim analysis will be done
when an additional 50 subjects/group contribute
data ($M_t = 100$)
$N_t = (100 - 50)/2 + 50 = 75$ and $t = 0.75$
Suppose that the OF alpha-spending function was
originally used for interim analysis
Then the critical value for the adaptive test at $t = 0.75$ is 2.36
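The mapping from the enlarged trial back to the original information scale generalizes the arithmetic above. A minimal sketch; the general formula is inferred from the worked numbers on this slide, and the function name is illustrative.

```python
def original_information_time(m_t, n_planned, s, b):
    """Map the per-group size M_t in the enlarged trial to (N_t, t) on the original scale."""
    n_s = s * n_planned              # subjects per group at the interim look
    n_t = (m_t - n_s) / b + n_s      # shrink the post-interim part by the factor b
    return n_t, n_t / n_planned

n_t, t = original_information_time(m_t=100, n_planned=100, s=0.5, b=2.0)
n_end, t_end = original_information_time(m_t=150, n_planned=100, s=0.5, b=2.0)
print(n_t, t, n_end, t_end)
```

The second call confirms that the end of the enlarged trial (150 per group) maps to full information on the original scale, so the originally chosen alpha-spending function can be evaluated at the mapped times.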
Sample Size Reestimation
(Example: group sequential trial)
The adaptive test at $M_{0.75} = 100$ is
$U_{0.75} = T_{0.50}(2/3)^{1/2} + W_{0.25}(1/3)^{1/2}$
$W_{0.25}$ is the normalized test performed on the
additional 50 subjects per group
If $U_{0.75} > 2.36$, then stop the trial and conclude that the experimental treatment is superior to control
Sample Size Reestimation
(Example: group sequential trial)
If the trial continues to the end, then
the final adaptive test (i.e., at $M_1 = 150$) is
$U_1 = T_{0.50}(1/2)^{1/2} + W_{0.50}(1/2)^{1/2}$
$W_{0.50}$ is the normalized test performed on the
additional 100 subjects per group
If $U_1 > 2.01$, then conclude that the experimental
treatment is superior to control
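Both adaptive tests in this example share one weighting pattern: the interim statistic gets weight (s/t)^{1/2} and the incremental statistic gets ((t - s)/t)^{1/2}, with s and t on the original information scale. A minimal sketch; the function name and the observed statistics are hypothetical, while the critical value 2.36 is the one quoted in the example.

```python
import math

def chw_group_sequential(t_interim, w_increment, s, t, critical):
    """Return (U_t, reject) for the adaptive test at original-scale time t."""
    u = math.sqrt(s / t) * t_interim + math.sqrt((t - s) / t) * w_increment
    return u, u > critical

# Hypothetical statistics; weights are (2/3)^{1/2} and (1/3)^{1/2} at t = 0.75.
u_075, stop_075 = chw_group_sequential(1.5, 2.5, s=0.5, t=0.75, critical=2.36)
print(round(u_075, 3), stop_075)
```

At the final analysis the same function applies with t = 1 and the final critical value, giving weights (1/2)^{1/2} for both terms.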
Consistent estimator and confidence interval compatible with CHW adaptive test are readily available
All the above discussions are based on asymptotic (i.e., sufficiently large sample size) theory
Cui, Hung, Wang (1999, Biometrics)
Lawrence & Hung (2002, ENAR talk)
Sample Size Reestimation
CHW adaptive test reduces to the conventional
test if sample size is not changed. So do the
consistent estimator and confidence interval compatible with CHW adaptive test.
Cui, Hung, Wang (1999, Biometrics)
Sample Size Reestimation
The CHW adaptive test can also be viewed as a combination of p-values from the incremental group data [Lehmacher & Wassmer (1999), Brannath, Posch & Bauer (2002)]
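The p-value view can be illustrated with a short sketch. It assumes the combination meant here is the inverse normal method with fixed weights, which is our reading of the cited references rather than something stated on the slide; the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def inverse_normal_combination(p1, p2, s):
    """Combine stage-wise one-sided p-values with fixed weights sqrt(s), sqrt(1 - s)."""
    z1 = STD.inv_cdf(1 - p1)
    z2 = STD.inv_cdf(1 - p2)
    return math.sqrt(s) * z1 + math.sqrt(1 - s) * z2

# Stage-wise p-values derived from hypothetical z-statistics 1.2 and 1.9.
p1 = 1 - STD.cdf(1.2)
p2 = 1 - STD.cdf(1.9)
u_from_p = inverse_normal_combination(p1, p2, s=0.4)
u_from_z = math.sqrt(0.4) * 1.2 + math.sqrt(0.6) * 1.9
print(round(u_from_p, 4), round(u_from_z, 4))
```

Both routes give the same statistic, because each stage-wise p-value is just a transform of the corresponding normalized test.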
Sample Size Reestimation
Sample size reestimation criterion
After obtaining the observed $\hat\Delta_s$ at time $s$, one could recalculate sample size $M$ such that conditional power
$CP(\delta^*) = \Pr\{\text{CHW rejects } H_0 \text{ at the end} \mid \hat\Delta_s, \Delta = \delta^*\} = 1 - \beta$
for the new intended $\delta^*$. Then, the power of CHW for detecting $\delta^*$ is at least $1 - \beta$
Lawrence (2002, personal communication)
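A conditional power criterion of this kind can be sketched for the simplest case. The code below assumes a single final analysis with no further interim looks, unit variance, two equal groups, and the one-sided critical value $z_\alpha$; it is an illustration under those assumptions, not the authors' procedure. All function names and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def _conditional_critical(z_s, s, alpha):
    """Value the incremental statistic W must exceed for U > z_alpha."""
    return (STD.inv_cdf(1 - alpha) - math.sqrt(s) * z_s) / math.sqrt(1 - s)

def conditional_power(z_s, s, m_extra, delta_star, alpha=0.025):
    """CP when W ~ N(delta_star * sqrt(m_extra / 2), 1) for m_extra subjects per group."""
    c = _conditional_critical(z_s, s, alpha)
    return STD.cdf(delta_star * math.sqrt(m_extra / 2) - c)

def extra_subjects_for_cp(z_s, s, delta_star, alpha=0.025, power=0.90):
    """Additional subjects per group so that conditional power equals `power`."""
    drift = _conditional_critical(z_s, s, alpha) + STD.inv_cdf(power)
    return 2 * (max(drift, 0.0) / delta_star) ** 2

# Hypothetical interim: 40 of 100 planned subjects per group, observed difference 0.37.
z_interim = 0.37 * math.sqrt(40 / 2)
m_extra = extra_subjects_for_cp(z_interim, s=0.4, delta_star=0.37)
cp = conditional_power(z_interim, s=0.4, m_extra=m_extra, delta_star=0.37)
print(round(m_extra, 1), round(cp, 3))
```

The new total per group would then be the interim size plus the additional subjects; in practice one would also respect any cap on the increase.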
Operational Conduct Issues
Sample size reestimation based on unblinded
data opens room for operational biases.
During an adaptive change, unblind only the data that
must be unblinded, in order to avoid
operational bias. A Standard Operating Procedure
(SOP) must be in place in the protocol, and trial
conduct must comply with the SOP.
Operational Conduct Issues
Sample size reestimation based on unblinded
data opens room for multiple analyses that may
lead to more protocol amendments and other changes of design elements. These types of changes or amendments (potentially driven by current data) may introduce problems in the interpretation of the results. Attention should be given to this potential hazard with the design.
Operational Conduct Issues
But ...
Most of the operational conduct issues related to sample size reestimation based on unblinded data are also encountered in traditional designs with interim analyses
Operational Conduct Issues
Recommend that sample size modification be done, if needed, by an independent third party that has no conflict of interest issue
What took place after sample size (or any design) modification needs to be documented fully
Other Issues
Estimation following data-driven sample size change is an important issue, particularly because the effect estimate may be used to plan future superiority trials or active-control non-inferiority trials.
Other Issues
Bauer et al. (2002, Method Inform Med)
... Clearly in such designs more logistics must be put in to properly handle all problems of
interim analyses including their consequences for the design. They need rigid rather than flexible planning modalities. ...
Summary
Conventional fixed-information design has made tremendous contributions to clinical science. The statistics have good statistical properties.
This design needs to permit sample size adjustment when it falls short in many applications, e.g., in studying endpoints for which
- prior data are poor, or it is hard to provide a reasonably
  good educated guess about effect size
Summary
Conventional design properly adapted for necessary sample size adjustment can be very useful with
- proper planning to avoid any operational conduct
  change that may lead to bias
- proper adjustment of statistical analysis method
  - estimation issue needs attention and research
Price paid for data-driven adaptation: lose
Summary
Adaptive design with proper planning is very attractive with proper caveat
- change sample size or randomization allocation
- change study hypothesis (e.g., superiority vs. non-inferiority
  or equivalence)
- change test method
- change the primary endpoint from one pre-specified
  endpoint to another pre-specified one