IMAGE BASED VISUAL SERVOING USING BITANGENT POINTS APPLIED
TO PLANAR SHAPE ALIGNMENT
Erol Ozgur and Mustafa Unel
Faculty of Engineering and Natural Sciences
Sabanci University
Orhanli Tuzla 34956 Istanbul, Turkey
email: erol@su.sabanciuniv.edu, munel@sabanciuniv.edu
ABSTRACT
We present visual servoing strategies based on bitangents for aligning planar shapes. In order to acquire bitangents, we use the convex hull of a curve. Bitangent points are employed in the construction of a feature vector to be used in visual control. Experimental results obtained on a 7 DOF Mitsubishi PA10 robot verify the proposed method.
KEY WORDS
Visual servoing, alignment, convex hull, bitangents
1 Introduction
Curve alignment is a central problem in current research areas and has played a key role in many particular domains of application such as object recognition [1], [2] and tracking [3]. In the domain of visual servoing, most current alignment systems are based on objects of known geometry, such as industrial parts, or on objects that have good features such as corners and straight edges, which are feasible to extract and track in real time [4]. The alignment of smooth free-form planar objects in unknown environments presents a challenge in visually guided assembly tasks.
In this paper we propose to use bitangent points in aligning planar curves by employing both calibrated [5] and uncalibrated [6] image based visual servoing schemes. In the literature, the use of bitangents in recognizing planar objects by affine invariant alignment was first considered in [7], and for servoing purposes bitangent lines (lines joining corresponding features on the superposition of two views of a scene) were utilized to align the orientation between two cameras at different locations in space in [8]. In [9] similar principles were applied to the landing and surveillance of aerial vehicles using vanishing points and lines. In order to acquire bitangent points, we use the convex hull of a curve [10]. Bitangent points are then used in the construction of a feature vector.
The remainder of this paper is organized as follows: Section 2 presents bitangents of curves and how to acquire them. Section 3 reviews both model-based and model-free image based visual servoing, for the calibrated and uncalibrated approaches. Section 4 presents experimental results for curve alignment and discussions. Finally, Section 5 concludes the paper with some remarks.
2 Bitangents of Curves
A line that is tangent to a curve at two points is called a bitangent, and the points of tangency are called bitangent points (see Fig. 1). It is well known [11] that these bitangent points map directly to one another under projective transformations. They are also called contact points.

Figure 1. Some curves and their bitangents.
2.1 Computation of Bitangent Points
Computation of the bitangent points of a curve is presented as a block diagram in Fig. 2. Block-I receives a sequence of images from a camera and tracks a region in a specified window using a tracking algorithm such as the ESM algorithm [12]. Block-II applies the Canny edge detection algorithm to the specified region and extracts the curve boundary data. Finally, Block-III employs a convex-hull algorithm [10] to find the convex hull of the curve. Fig. 3 depicts a curve with 3 concavities. The convex-hull algorithm yields the convex portions of the original data. The initial and final points of each convex portion are bitangent points.
Figure 2. Block diagram representation of the algorithm for extracting bitangent points.
Figure 3. (a) A curve and its convex hull; (b) convex data portions and the bitangent points.
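The hull-based extraction above can be sketched in a few lines. The sketch below is ours, not the paper's implementation: it assumes the boundary has already been extracted (e.g. by the Canny step of Block-II) as a counter-clockwise ordered list of points, and returns index pairs into that list.

```python
def convex_hull_indices(pts):
    """Andrew's monotone chain; returns indices of hull vertices in CCW order.
    Collinear boundary points are kept, since a bitangent point can lie on a
    hull edge rather than at a corner."""
    order = sorted(range(len(pts)), key=lambda i: (pts[i][0], pts[i][1]))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        out = []
        for i in seq:
            while len(out) >= 2 and cross(pts[out[-2]], pts[out[-1]], pts[i]) < 0:
                out.pop()
            out.append(i)
        return out

    lower, upper = half(order), half(reversed(order))
    return lower[:-1] + upper[:-1]


def bitangent_pairs(curve):
    """curve: CCW-ordered boundary points. Two consecutive hull vertices that
    are not neighbours on the curve bound a concavity; their indices form a
    bitangent-point pair (the initial/final points of the convex portions)."""
    n, hull = len(curve), convex_hull_indices(curve)
    pairs = []
    for a, b in zip(hull, hull[1:] + hull[:1]):
        if (b - a) % n > 1:  # the curve dips inside the hull between a and b
            pairs.append((a, b))
    return pairs
```

For a square with a notch cut into its top edge, the single concavity is reported as one pair of boundary indices; a fully convex shape yields no pairs.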
3 Visual Servoing
3.1 Background
Let $\theta \in \mathbb{R}^n$, $s \in \mathbb{R}^m$ and $r \in \mathbb{R}^6$ denote the vectors of joint variables of a robot, image features obtained from visual sensors, and the pose of the end-effector of the robot, respectively. The differential relation between $\theta$ and $r$ with respect to time implies

$$\dot{r} = J_R(\theta)\,\dot{\theta} \tag{1}$$

where $J_R(\theta) = \partial r/\partial \theta \in \mathbb{R}^{6 \times n}$ is the robot Jacobian, which describes the relation between the robot joint velocities and the velocities of its end-effector in Cartesian space. The relation between $s$ and $r$ is given as $s = s(r)$, and its differentiation with respect to time yields

$$\dot{s} = J_I(r)\,\dot{r} \tag{2}$$

where $J_I(r) = \partial s/\partial r \in \mathbb{R}^{m \times 6}$ is the image Jacobian, which describes the differential relation between the image features and the pose of the robot end-effector, and $\dot{r}$ is the camera velocity screw ($V_c$). The composite Jacobian is defined as

$$J = J_I J_R \tag{3}$$

where $J \in \mathbb{R}^{m \times n}$ is the product of the image and robot Jacobians. Thus, the relation between joint coordinates and image features is given by

$$\dot{s} = J\,\dot{\theta} \tag{4}$$

3.2 Calibrated Visual Servoing
Let $s^* \in \mathbb{R}^m$ be the constant desired feature vector and define the error $e \in \mathbb{R}^m$ on the image plane as $e = s - s^*$. Then the control problem can be formulated as follows: design an end-effector velocity screw $u$ such that the error vanishes, i.e. $e \to 0$.
The image Jacobian of a single point feature $s = [x, y]^T$ for a fixed-camera system is given by

$$\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \underbrace{\begin{pmatrix} \frac{1}{Z} & 0 & -\frac{x}{Z} & -xy & 1 + x^2 & -y \\ 0 & \frac{1}{Z} & -\frac{y}{Z} & -(1 + y^2) & xy & x \end{pmatrix}}_{J_{xy}} V_c \tag{5}$$

where

$$x = \frac{x_p - x_c}{f_x}, \qquad y = \frac{y_p - y_c}{f_y} \tag{6}$$

and $(x_p, y_p)$ are the pixel coordinates of the image point, $(x_c, y_c)$ are the coordinates of the principal point, and $(f_x, f_y)$ are the effective focal lengths of the vision sensor.
By rearranging and differentiating (6) and writing the result in matrix form, we get

$$\begin{pmatrix} \dot{x}_p \\ \dot{y}_p \end{pmatrix} = \begin{pmatrix} f_x & 0 \\ 0 & f_y \end{pmatrix} \begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} \tag{7}$$

Substituting (5) into (7) yields

$$\begin{pmatrix} \dot{x}_p \\ \dot{y}_p \end{pmatrix} = \underbrace{\begin{pmatrix} f_x & 0 \\ 0 & f_y \end{pmatrix} J_{xy}}_{J_I} V_c \tag{8}$$

$$\dot{s} = J_I V_c \tag{9}$$
where $J_I$ is the pixel image Jacobian. In the eye-to-hand case, the image Jacobian has to account for the mapping from the camera frame onto the robot control frame. This relationship is given by the robot-to-camera transformation:

$$V_c = T V_R \tag{10}$$

where $V_R$ is the end-effector velocity screw in the robot control frame. The robot-to-camera velocity transformation $T \in \mathbb{R}^{6 \times 6}$ is defined as

$$T = \begin{pmatrix} R & [t]_\times R \\ 0_3 & R \end{pmatrix} \tag{11}$$

where $[R, t]$ are the rotation matrix and the translation vector that map the camera frame onto the robot control frame, and $[t]_\times$ is the skew-symmetric matrix associated with the vector $t$.
Substituting (10) into (9), an expression that relates the image motion to the end-effector velocity is obtained:

$$\dot{s} = \underbrace{J_I T}_{\bar{J}_I} V_R = \bar{J}_I V_R \tag{12}$$

where $\bar{J}_I$ is the new image Jacobian, which directly relates the changes of the image features to the end-effector velocity in the robot control frame. Note that if $k$ feature points are taken into account, e.g. $s = [x_1, y_1, \ldots, x_k, y_k]^T$, $\bar{J}_I$ is given by the following stacked image Jacobian:

$$\bar{J}_I = \begin{pmatrix} \bar{J}_{I_1} \\ \vdots \\ \bar{J}_{I_k} \end{pmatrix} \tag{13}$$
By imposing $\dot{e} = -\Lambda e$, an exponential decrease of the error function is realized. Solving (12), the control for the end-effector motion is obtained as follows:

$$u = V_R = -\Lambda \bar{J}_I^{\dagger} e \tag{14}$$

where $\Lambda \in \mathbb{R}^{6 \times 6}$ is a positive constant gain matrix, $\bar{J}_I^{\dagger}$ is the pseudo-inverse of the image Jacobian, and $V_R = (V_x\ V_y\ V_z\ \Omega_x\ \Omega_y\ \Omega_z)^T$.
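As a minimal numerical sketch of the calibrated pipeline, the code below composes Eqs. (5)-(14) with NumPy: it builds the interaction matrix of each point feature, scales it to pixels, stacks the blocks, maps through the velocity transformation, and applies the proportional control law. The function names and the dense 6-DOF formulation are ours; the experiments in Section 4 restrict the motion to three components.

```python
import numpy as np

def point_jacobian(x, y, Z):
    """Interaction matrix J_xy of one normalized point feature, Eq. (5)."""
    return np.array([
        [1.0 / Z, 0.0, -x / Z, -x * y, 1.0 + x * x, -y],
        [0.0, 1.0 / Z, -y / Z, -(1.0 + y * y), x * y, x],
    ])

def stacked_pixel_jacobian(points_px, fx, fy, xc, yc, Z):
    """Stacked pixel-image Jacobian J_I of Eqs. (8) and (13)."""
    F = np.diag([fx, fy])
    blocks = []
    for xp, yp in points_px:
        x, y = (xp - xc) / fx, (yp - yc) / fy  # normalization, Eq. (6)
        blocks.append(F @ point_jacobian(x, y, Z))
    return np.vstack(blocks)

def velocity_transform(R, t):
    """Robot-to-camera velocity transformation T of Eq. (11)."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])  # skew-symmetric [t]_x
    T = np.zeros((6, 6))
    T[:3, :3], T[:3, 3:], T[3:, 3:] = R, tx @ R, R
    return T

def control_law(s, s_star, points_px, R, t, fx, fy, xc, yc, Z, Lam):
    """End-effector velocity screw V_R = -Lam * pinv(J_I T) (s - s*), Eq. (14)."""
    J_bar = stacked_pixel_jacobian(points_px, fx, fy, xc, yc, Z) @ velocity_transform(R, t)
    return -Lam @ np.linalg.pinv(J_bar) @ (s - s_star)
```

At the goal ($s = s^*$) the commanded velocity screw is zero, and near the goal the closed loop decays at the exponential rate set by $\Lambda$.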
3.3 Uncalibrated Visual Servoing
Here the composite Jacobian is unknown and has to be estimated dynamically. The error function in the image plane for a moving target at position $s^*(t)$ and an end-effector at position $s(\theta)$ is given as

$$e(\theta, t) = s(\theta) - s^*(t) \tag{15}$$

where $s^*(t)$ represents the desired image features at time $t$. The control problem can be formulated as follows: design a controller that computes the velocity of the joint variables $u$ such that the error vanishes, i.e. $e \to 0$.
3.3.1 Dynamic Jacobian Estimation
Since the system model is assumed to be unknown, a recursive least-squares (RLS) algorithm [6] is used to estimate the compos-ite Jacobian J. This is accomplished by minimizing the following cost function, which is a weighted sum of the changes in the affine model of error over time,
εk= k−1
∑
i=0 λk−i−1k∆ mkik2 (16) where ∆mki= mk(θi,ti) − mi(θi,ti) (17) with mk(θ,t) being an expansion of m(θ,t), which is the affine model of the error function e(θ,t), about the kthdata point as follows: mk(θ,t) = e(θk,tk) + ˆJk(θ−θk) + ∂ek ∂t (t − tk) (18) In light of (18), (17) becomes ∆mki= e(θk,tk) − e(θi,ti) −∂ek ∂t (tk− ti) − ˆJkhki, (19)where hki=θk−θi, the weighting factorλ satisfies 0 <λ < 1, and the unknown variables are the elements of ˆJk.
Solution of the minimization problem yields the following recursive update rule for the composite Jacobian:
ˆ Jk= ˆJk−1+ (∆e − ˆJk−1hθ−∂ek ∂t ht)(λ+ h T θPk−1hθ)−1hTθPk−1 (20) where Pk= 1 λ(Pk−1− Pk−1hθ(λ+ h T θPk−1hθ) −1hT θPk−1) (21) and hθ =θk−θk−1, ht = tk− tk−1, ∆e = ek− ek−1, and ek= sk− s∗
k, which is the difference between the end-effector position and the target position at kthiteration. The term ∂ek
∂t predicts the change in the error function for the next iteration, and in the case of a static camera it can directly be estimated from the target im-age feature vector with a first-order difference:
∂ek ∂t ∼ = −s ∗ k− s∗k−1 ht (22)
The weighting factor is 0 <λ ≤ 1 and when close to 1 re-sults in a filter with a longer memory. The Jacobian estimate is used in the visual controllers to determine the joint variablesθk that track the target.
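A compact implementation of the update (20)-(21), assuming a symmetric $P$ (which the update preserves), can look like the sketch below; the function name and argument layout are our own.

```python
import numpy as np

def rls_jacobian_update(J_prev, P_prev, h_theta, d_e, de_dt, h_t, lam=0.96):
    """One RLS step for the composite Jacobian estimate, Eqs. (20)-(21).
    h_theta = theta_k - theta_{k-1}, d_e = e_k - e_{k-1},
    de_dt = predicted time derivative of e, h_t = t_k - t_{k-1}."""
    Ph = P_prev @ h_theta
    denom = lam + h_theta @ Ph                      # scalar lam + h^T P h
    innov = d_e - J_prev @ h_theta - de_dt * h_t    # model prediction residual
    J = J_prev + np.outer(innov, Ph) / denom        # Eq. (20), P symmetric
    P = (P_prev - np.outer(Ph, Ph) / denom) / lam   # Eq. (21)
    return J, P
```

Seeded with a large $P$ (a weak prior), two exactly-measured steps along independent joint directions already recover a static linear map, which is a quick sanity check for the update.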
3.3.2 Dynamic Gauss-Newton Controller
The dynamic Gauss-Newton method [6] minimizes the following time-varying objective function:

$$E(\theta, t) = \frac{1}{2} e^T(\theta, t)\, e(\theta, t) \tag{23}$$

Minimizing this objective function, the joint variables are computed iteratively as

$$\theta_{k+1} = \theta_k - (\hat{J}_k^T \hat{J}_k)^{-1} \hat{J}_k^T \left(e_k + \frac{\partial e_k}{\partial t} h_t\right) \tag{24}$$

The control is defined as

$$u_{k+1} = \dot{\theta}_{k+1} = -K_p \hat{J}_k^{\dagger}\left(e_k + \frac{\partial e_k}{\partial t} h_t\right) \tag{25}$$

where $K_p$ is a positive proportional gain and $\hat{J}_k^{\dagger}$ is the pseudo-inverse of the estimated Jacobian at the $k$th iteration.
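A sketch of the resulting control step, Eq. (25), together with a toy closed-loop check on a known linear image map (our own construction, not the experimental setup):

```python
import numpy as np

def gauss_newton_control(J_hat, e_k, de_dt, h_t, Kp=0.6):
    """Joint-velocity command u = -Kp * pinv(J_hat) (e_k + de_dt * h_t), Eq. (25).
    The de_dt * h_t term feeds forward the predicted target motion."""
    return -Kp * np.linalg.pinv(J_hat) @ (e_k + de_dt * h_t)

# Toy check: static target s* = 0, linear image map s = A @ theta,
# and a perfect Jacobian estimate J_hat = A.
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])
theta = np.array([1.0, -1.0])
for _ in range(50):
    e = A @ theta                      # image error at the current pose
    theta = theta + 0.1 * gauss_newton_control(A, e, np.zeros(2), 0.0)
# the image error contracts geometrically towards zero
```

With time step 0.1, each iteration scales the error by a factor of $(1 - 0.1 K_p)$, so after 50 steps the residual is a few percent of its initial value.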
4 Experiments
In this section, experimental results are presented for both calibrated and uncalibrated visual servoing to demonstrate the validity of the proposed scheme.
Experiments were conducted with a 7 DOF Mitsubishi PA10 robot arm and a Unibrain Fire-i400 digital camera. The camera was mounted on a tripod in an eye-to-hand configuration in order to observe the motion of the end-effector. The images were digitized at 320 × 240 resolution. The system setup is shown in Fig. 4. The visual control and image processing modules were implemented in VC++ 6.0 using the OpenCV library and run on a P4 2.26 GHz personal computer with 1 GB RAM.
Figure 4. System setup.
Fig. 5 shows a test shape, which lies on a plane and is rigidly attached to the end-effector. Bitangent points of the shape are acquired using the algorithm proposed in this paper. For visual servoing purposes, either the bitangent points or their midpoints (see the points denoted by 1, 2 and 3 in Fig. 5) can be used. Unlike bitangent points, which are projective invariant, midpoints are affine invariant. If the scene's depth is much less than its distance from the camera, a weak-perspective projection can be assumed. Throughout the experiments the weak-perspective assumption is made, and the visual feature vector $s$ is constructed from the midpoints as follows:

$$s = [x_1, y_1, x_2, y_2, x_3, y_3]^T$$
Figure 5. Test shape and midpoints of bitangent points
In the case of perspective projection, i.e. if the weak-perspective assumption does not hold, one can use bitangent points to construct the visual feature vector.
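Either choice reduces to stacking 2-D image measurements; a small helper for the midpoint variant used in the experiments might look like this (the function is our own illustration):

```python
import numpy as np

def midpoint_feature_vector(bitangent_points):
    """bitangent_points: list of ((x_a, y_a), (x_b, y_b)) pixel pairs.
    Returns s = [x1, y1, ..., xk, yk]^T built from the pair midpoints,
    which are affine invariant under the weak-perspective assumption."""
    s = []
    for (xa, ya), (xb, yb) in bitangent_points:
        s += [(xa + xb) / 2.0, (ya + yb) / 2.0]
    return np.array(s)
```

With the three concavities of the test shape this yields the 6-vector $s = [x_1, y_1, x_2, y_2, x_3, y_3]^T$.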
For the alignment task, the desired pose of the curve is obtained during an off-line stage by moving the robot in the xz-plane of the robot control frame with some $V_x$, $V_z$ and $\Omega_y$ for a certain time interval. The desired feature vector $s^*$ is then constructed from this reference pose.
4.1 Calibrated Visual Servoing Results
The parameters $f_x = 1000$, $f_y = 1000$, $x_c = 160$, $y_c = 120$ are obtained by a coarse calibration of the camera, and $Z = 2000$ mm. The robot base frame is positioned $z = 2000$ mm along the z-axis and $y = 1000$ mm along the y-axis away from the camera frame. Thus, we have

$$R = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}, \qquad t = \begin{pmatrix} 0 \\ 1000 \\ 2000 \end{pmatrix}$$
where $R$ is the rotation matrix and $t$ is the translation vector used for the construction of the robot-to-camera transformation matrix $T$. The gain matrix $\Lambda$ is tuned as $\Lambda_i = 0.3$ for $i = 1, 2, \ldots, 6$. The control input is defined as

$$u = (V_x\ V_z\ \Omega_y)^T$$

where $u$ consists of the 1st and 3rd components of $V_R$ for the motion in the xz-plane and the 5th component of $V_R$ for the rotation around the y-axis in the robot control frame. Fig. 6 depicts the initial and desired images, and Fig. 7 shows the feature trajectories. Alignment errors and control signals are plotted in Figs. 8 and 9. The norm of the resulting alignment error is found to be less than 1 pixel.
4.2 Uncalibrated Visual Servoing Results
Here we do not need the calibration parameters, since the composite Jacobian $J \in \mathbb{R}^{6 \times 3}$ is estimated in a recursive manner. Only 3 joints, namely the 2nd, the 4th and the 6th joints of the PA10 robot, are used to steer the end-effector, with the remaining 4 joints locked. The control parameters are set as $\lambda = 0.96$ and $K_p = 0.6$. The control input is defined as
Figure 6. Initial and desired images
Figure 7. Feature trajectories on the image plane
Figure 8. Alignment errors for midpoints 1, 2 and 3.
$$u = (\Omega_2\ \Omega_4\ \Omega_6)^T$$

where $\Omega_2$, $\Omega_4$ and $\Omega_6$ are the joint velocities. Figs. 10 and 11 depict the initial and desired images and the feature trajectories on the image plane. Alignment errors and the control signals are plotted in Figs. 12 and 13, respectively. The norm of the resulting alignment error is found to be less than 1.5 pixels.

Figure 9. Control signals $V_x$, $V_z$ and $\Omega_y$.
Figure 10. Initial and desired images
Figure 11. Feature trajectories on the image plane
4.3 Discussions
In both visual servoing approaches we observed that the alignment errors are less than 1.5 pixels, which corresponds to 5 mm in the robot workspace. It can be seen that the calibrated approach produces smoother trajectories, while the uncalibrated one shows erratic behaviour until the Jacobian estimate converges and the end-effector moves towards the desired pose. Computation times of the region tracking, curve detection and bitangent extraction modules are approximately 13 ms, 5 ms and 4 ms, respectively.

Figure 12. Alignment errors for midpoints 1, 2 and 3.

Figure 13. Control signals $\Omega_2$, $\Omega_4$ and $\Omega_6$.
5 Conclusion
In this paper, bitangents are used to design image based visual servoing schemes, both calibrated and uncalibrated, for aligning planar shapes with a fixed camera. The assumption is that the curve has at least one concavity on its boundary. Experimental results validate the proposed method. Alignment tasks are performed with approximately 5 mm accuracy.
References
[1] N. J. Ayache and O. D. Faugeras, HYPER: A new approach for the recognition and positioning of two-dimensional objects, IEEE Trans. Pattern Analysis and Machine Intelligence, 8(1), 1986, 44-54.
[2] C. A. Rothwell, A. Zisserman, D. A. Forsyth and J. L. Mundy, Planar object recognition using projective shape representation, International J. of Computer Vision, 16, 1995, 57-59.
[3] H. D. Tagare, Shape-based non-rigid correspondence with application to heart motion analysis, IEEE Trans. Medical Imaging, 18(7), 1999, 570-578.
[4] G. D. Hager, A modular system for robust positioning using feedback from stereo vision, IEEE Trans. on Robotics and Automation, 13(4), 1997, 582-595.
[5] S. Hutchinson, G. D. Hager, and P. I. Corke, A tutorial on visual servo control, IEEE Trans. on Robotics and Automation, 12(5), 1996, 651-670.
[6] J. A. Piepmeier and H. Lipkin, Uncalibrated eye-in-hand visual servoing, The International Journal of Robotics Research, 2003.
[7] Y. Lamdan, J. T. Schwartz, and H. J. Wolfson, Object recognition by affine invariant matching, Proc. CVPR, 1988, 335-344.
[8] J. Piazzi, D. Prattichizzo, and N. J. Cowan, Auto-epipolar visual servoing, International Conference on Intelligent Robots and Systems, 2004.
[9] P. Rives and J. R. Azinheira, Linear structures following by an airship using vanishing point and horizon line in a visual servoing scheme, International Conference on Robotics & Automation, 2004.
[10] J. Sklansky, Measuring concavity on a rectangular mosaic, IEEE Trans. Comput., 21, 1972, 1355-1364.
[11] J. L. Mundy and A. Zisserman, Geometric Invariance in Computer Vision, The MIT Press, 1992.
[12] S. Benhimane and E. Malis, Real-time image-based tracking of planes using efficient second-order minimization, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2004.