
National Science Council, Executive Yuan: Sponsored Research Project Final Report







Project No.: NSC 89-2511-S-038-002

Project period: August 1, 2000 to July 30, 2001


Co-Investigators: 李友專 (Yu-Chuan Li), 徐建業 (Chien-Yeh Hsu), 林凱南 (Kai-Nan Lin)







2001 (ROC year 90)







Building a Searchable Digital Video Database for Otolaryngology Teaching by Using Speech Recognition Technology


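Principal Investigator: 李飛鵬 (Fei-Peng Lee), Taipei Medical University / Taipei Medical University Hospital
Co-Investigator: 李友專 (Yu-Chuan Li), Graduate Institute of Medical Informatics, Taipei Medical University
Co-Investigator: 徐建業 (Chien-Yeh Hsu), Graduate Institute of Medical Informatics, Taipei Medical University
Co-Investigator: 林凱南 (Kai-Nan Lin), Department of Otolaryngology, National Taiwan University
Project Participant: 盧建宏 (Chien-Hung Lu), Graduate Institute of Medical Informatics, Taipei Medical College
Project Participant: 林明錦, Graduate Institute of Medical Informatics, Taipei Medical College

Chinese Abstract

The main purpose of this project is to use speech recognition technology to convert the narration in otolaryngology teaching videos into text. With the software developed in this project, the text is tagged with time codes; using network and multimedia database technology, users can then search for specific words to view the corresponding video segments. Students and physicians can thus use the network to study otolaryngology quickly, without restrictions of place or time.

Abstract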

The purpose of this project is to create and integrate multimedia medical resources for the medical community. In this project, we implement "speech-to-text" conversion using voice recognition, so that users can randomly access any section of a video clip by searching time-coded text. As a result, users of this system can watch video clips more efficiently.

Keywords

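Otorhinolaryngology, Voice Recognition, Digital Video, WWW

II. Background and Purpose

As digital technology matures, audiovisual data play an increasingly important role in our lives, and people are trying to use these data in many kinds of applications. The first problem encountered … complex audiovisual data, and even 3D models, whose retrieval is far more complex than conventional text retrieval. Surgical videotapes are an important asset in specialist medical education. Traditional videotapes are recorded in analog form and played back linearly from beginning to end. In recent years, advances in digital storage media have driven audiovisual data from analog to digital and from linear to nonlinear formats. Together with broadband networks and video servers, this allows many people at many locations to watch recorded content simultaneously, supporting remote continuing education for specialists.

Nonlinear playback means that the audiovisual data are searchable: users can locate the video segments they need without watching the whole tape from beginning to end. A good non-medical example is the video of the Clinton scandal trial. Virage, working with AltaVista, used its VideoLogger software to make the recorded content searchable, so that users could find a specific segment by text. This was the first searchable video on the Internet.

Because the surgical field in otolaryngology is narrow, microscopes or endoscopes are commonly used, and when these are connected to recording equipment the entire operation can be recorded. Common operations include tympanoplasty for the ear, functional endoscopic sinus surgery for the nose, and laryngomicroscopic surgery for the larynx. Taking tympanoplasty as an example, …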


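… hours. If a user is interested only in the handling of the stapes, one of the ossicles, how can they avoid searching from the beginning?

Surgical videotapes are an important asset in medical education. Traditional videotapes are recorded in analog form and played back linearly from beginning to end, which makes them very inconvenient to search and use; the tapes also wear out over time and cannot be preserved permanently. In recent years, advances in digital storage media have driven audiovisual data from analog to digital and from linear to nonlinear formats, and the indexing and searching of such nonlinear audiovisual data has become an active research topic.

Results and Discussion

1. Collect 10 hours of representative and unusual otolaryngology surgical videotapes, digitize them, and convert them to MPEG-1 files.
2. Have the operating surgeon dub a narration of the surgical procedure, recorded with a directional microphone onto MD (MiniDisc); then use speech recognition software and a time code program to generate text files with corresponding time points.
3. During dubbing, technical terms and descriptions of the surgical steps are given mainly in Chinese. To improve the recognition rate, build an otolaryngology terminology lexicon in advance and train the recognition software.
4. Once speech recognition succeeds, use the corresponding time-stamped text files as an index, so that a search program and an MPEG control program can play specific video segments, providing interactive querying.
5. Compute the speech recognition success rate for different narrators.

This project collected representative and unusual otolaryngology (ear, nose, and throat) surgical videotapes provided by senior otolaryngologists at National Taiwan University Hospital and Taipei Medical College. These valuable recordings were first fully digitized and converted to AVI files at a resolution of 320×240, with 3 bytes of color per pixel and 30 frames per second. At these settings, one hour of video requires 24 GB of storage before compression; compressed to MPEG it needs only about 1/50 of that, about 480 MB, and converted to RealVideo only 80 MB.

Second, after discussion between the otolaryngologist doing the dubbing and each operating surgeon, all videos were dubbed by the same person to facilitate training of the speech recognition software. During dubbing, technical terms were spoken mainly in English, while the surgical steps and descriptions were mainly in Chinese.

Recognizing mixed Chinese and English speech has long been a difficult problem. We adopted the following three approaches to locate the "time stamps" in the videos:
1. Recognize only the narration of English technical terms and use them as time stamps, ignoring the Chinese narration. This can be achieved by setting a cut-off value on the recognition probability, so an English otolaryngology terminology lexicon must be built in advance to raise the recognition rate and widen the gap from the scores of Chinese speech.
2. Pause (a time lag) whenever an English technical term occurs during dubbing. This also separates Chinese from English, but the speech is no longer continuous, which makes dubbing more difficult.
3. Process the mixed speech separately with English and Chinese recognition software, then combine the two results, keeping the parts each recognized successfully.

Once speech recognition succeeds, the times at which these words occur can be used as an index for interactive querying. The third step is to set up the video server. Because surgical videos require high resolution, the server's storage capacity and processing speed must be sufficient. Since domestic broadband infrastructure is not yet mature and the required transmission rate is about 200 kbit/s, the service will initially target intranet users.

The main equipment will be used for videotape digitization and speech recognition. It includes a digital video editing workstation, a high-resolution video capture card, and software speech analysis tools; a notebook computer and a high-performance microphone for dubbing in a soundproof room and for discussing video content with the operating surgeons; and a networked video server for multimedia …

Self-Evaluation of Project Results

The goal of this project is to build a searchable multimedia teaching website for otolaryngology. We have successfully applied speech recognition software to otolaryngology surgical videotapes to recognize medical terminology; combined with time code marking, the speech-to-text function lets users locate positions in the video by searching text. In this year's project we completed:
1. Otolaryngology multimedia distance-learning website (http://ent.tmu.edu.tw)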

2. Integrated timecode-based annotation system

3. Otolaryngology keyword lexicon

Related results of this project were also presented at the 2000 International Medical Informatics Symposium (Medical Informatics Symposium in Taiwan 2000; see Appendix 1), where they received the Best Paper Award (title: A Searchable Digital Video Database for Otolaryngology Teaching By using Speech Recognition Technology; authors: Chien-Hung Lu, Chien-Yeh Hsu, Fei-Peng Lee, Kai-Nan Lin). The project team has completed the work planned for the first year and presented a paper that won the Best Paper Award, fully meeting the outcomes anticipated in the proposal. The software and paper developed in this project will also be posted on the website (http://ENT.TMU.EDU.TW) and made publicly available for academic research.



Appendix 1

A Searchable Digital Video Database for Otolaryngology Teaching by Using Speech Recognition Technology

Chien-Hung Lu, Chien-Yeh Hsu, F.P. Lee, K.N. Lin










Graduate Institute of Medical Informatics, Taipei Medical College, Taiwan

Taipei Medical College Hospital

National Taiwan University Hospital

Summary

The purpose of this project is to design and build an interactive, searchable video database for otolaryngology teaching. With this system, doctors can search for the otolaryngology surgery video clips they actually need. After building a video server and connecting it to the Internet or an intranet, we can construct a web-based multimedia teaching system for medical specialists to improve the quality of medical care.

After the videos are digitized, a single otolaryngology specialist dubs narration of the operative procedures onto all of them. Speech recognition is applied to this narration, and time markers called "time stamps" are generated to index the video. Combined with the database and web system, these time stamps make interactive search possible.

Introduction

The main disadvantage of medical video data, especially surgical video, is that it is difficult to search even after digitization. Because the video lacks interactivity, we often have to search from the beginning. In this project, we design and build an interactive, searchable video database in which doctors can search otolaryngology surgery video clips. After building a video server, we construct a web-based multimedia teaching system for medical specialists to improve the quality of medical care.

Because the surgical field is small, many otolaryngology operations require a microscope or endoscope. Many surgeries are recorded on videotape, but searching and storing these tapes is difficult. For example, microscopic ear surgery, endoscopic paranasal sinus surgery, and laryngomicroscopic surgery are all common ENT operations at National Taiwan University Hospital and Taipei Medical College. Digitizing these tapes and building a video server therefore contributes greatly to preserving and promoting knowledge in this field.

Currently, speaker-dependent speech recognition software achieves recognition rates of up to 95%. A single otolaryngology specialist dubs narration of the operative procedures for all the videos. Speech recognition is applied to this narration, and "time stamps" are generated to index the video.
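The report cites recognition rates (up to 95% for speaker-dependent software) and plans to compare success rates between narrators, but does not define how the rate is measured. A common definition, assumed here, is word accuracy = 1 - WER, where the word error rate (WER) is the word-level Levenshtein distance between a reference transcript and the recognizer output, divided by the number of reference words. The Python sketch below computes it; the function names and the sample sentence are illustrative, not part of the project's software.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the reference into the hypothesis (Levenshtein)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER, assuming whitespace-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    ref = "remove the incus and expose the stapes"
    hyp = "remove the incus and expose this stapes"
    print(f"{word_accuracy(ref, hyp):.3f}")  # prints 0.857 (one substitution in 7 words)
```

Because Chinese is not written with spaces, Chinese narration would need to be split into characters or segmented words before calling `word_accuracy`. Note also that WER can exceed 1 when the output contains many insertions, which makes the accuracy negative.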


Materials and Methods

We collected 60 hours of otolaryngology surgery video, provided by senior doctors at National Taiwan University Hospital and Taipei Medical College Hospital. The videos were first digitized and converted to AVI format at 320×240 resolution, then compressed to MPEG format and further converted to RealVideo format.
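The storage figures in the project report follow directly from the capture settings (320×240 pixels, 3 bytes of color per pixel, 30 frames per second). The short Python calculation below reproduces them, assuming decimal units (1 GB = 10^9 bytes) and taking the report's 1/50 MPEG ratio and 80 MB per hour for RealVideo as given; the helper names are illustrative.

```python
# Capture parameters stated in the project report.
WIDTH, HEIGHT = 320, 240
BYTES_PER_PIXEL = 3
FRAMES_PER_SECOND = 30
SECONDS_PER_HOUR = 3600


def uncompressed_bytes_per_hour(width: int = WIDTH, height: int = HEIGHT,
                                bytes_per_pixel: int = BYTES_PER_PIXEL,
                                fps: int = FRAMES_PER_SECOND) -> int:
    """Raw size of one hour of video: pixels x bytes x frames x seconds."""
    return width * height * bytes_per_pixel * fps * SECONDS_PER_HOUR


def bitrate_kbps(bytes_per_hour: float) -> float:
    """Average bit rate in kbit/s (1 kbit = 1000 bits) for an hourly size."""
    return bytes_per_hour * 8 / SECONDS_PER_HOUR / 1000


raw = uncompressed_bytes_per_hour()
mpeg = raw / 50   # report: MPEG needs about 1/50 of the raw size
real = 80e6       # report: about 80 MB per hour as RealVideo

print(f"raw:  {raw / 1e9:.1f} GB/hour")
print(f"MPEG: {mpeg / 1e6:.0f} MB/hour, {bitrate_kbps(mpeg):.0f} kbit/s")
print(f"Real: {real / 1e6:.0f} MB/hour, {bitrate_kbps(real):.0f} kbit/s")
```

The exact raw size is about 24.9 GB per hour, which the report rounds to 24 GB (hence its 480 MB MPEG figure versus roughly 498 MB here). An hour of RealVideo at 80 MB averages about 178 kbit/s, consistent with the roughly 200 kbit/s transmission rate the report gives as the reason for serving intranet users first.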

Using Ulead Video Studio 4.0 (會聲會影), a single otolaryngology specialist dubbed narration of the operative procedures onto all the MPEG video files. At the same time, speech recognition was applied to the narration using SpeechPro (超級耳朵), and time markers called "time stamps" were generated to index the video. A Chinese vocabulary library for otolaryngology was created beforehand to improve speech recognition.
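For mixed Chinese and English narration, the project report lists three approaches; the first keeps only English technical terms as time stamps by applying a cut-off to the recognition probability, supported by a specialty lexicon. The Python sketch below shows that filtering step. The `Hypothesis` record, the confidence scale in [0, 1], and the cut-off of 0.6 are all hypothetical, since the report does not describe SpeechPro's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One recognized word. This record layout is hypothetical; the report
    does not describe the recognizer's output format."""
    time_s: float      # offset from the start of the video, in seconds
    word: str
    confidence: float  # assumed recognizer score in [0, 1]


def english_term_stamps(hyps, lexicon, cutoff=0.6):
    """Keep only confident hits on the specialty lexicon as time stamps.

    Words scoring below the (hypothetical) cut-off, such as Chinese
    narration misread by an English recognizer, or words outside the
    lexicon are ignored."""
    terms = {t.lower() for t in lexicon}
    return [(h.time_s, h.word.lower()) for h in hyps
            if h.confidence >= cutoff and h.word.lower() in terms]


if __name__ == "__main__":
    lexicon = {"stapes", "incus", "malleus", "tympanoplasty"}
    hyps = [Hypothesis(370.2, "the", 0.41),
            Hypothesis(375.0, "Stapes", 0.88),
            Hypothesis(381.5, "incus", 0.52)]
    print(english_term_stamps(hyps, lexicon))  # [(375.0, 'stapes')]
```

Raising `cutoff` trades missed terms for fewer spurious time stamps. Per the report, the specialty lexicon is meant to widen the score gap between English terms and Chinese speech so that a single threshold separates them.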

A timer counter written in Visual Basic marked the time at which each keyword occurred. For example, "stapes (鐙骨)" appears at 6′15″ in the demonstration surgery video. Someone interested only in the stapes portion of the surgery can start watching directly at 6′15″, with no need to search from the beginning.
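The keyword-to-time lookup described above can be sketched as an inverted index from each keyword to the times it was spoken. This Python version is an illustrative reimplementation, not the project's Visual Basic program; it assumes time stamps arrive as `(seconds, keyword)` pairs and formats offsets in the M′SS″ style used in the text (with ASCII quote characters).

```python
from collections import defaultdict


def build_index(stamps):
    """Map each keyword to the sorted list of times (seconds) it was spoken.
    `stamps` is an iterable of (time_s, keyword) pairs."""
    index = defaultdict(list)
    for time_s, keyword in stamps:
        index[keyword.lower()].append(time_s)
    for times in index.values():
        times.sort()
    return dict(index)


def to_timecode(seconds: float) -> str:
    """Format seconds as M'SS" (e.g. 375 -> 6'15")."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}'{secs:02d}\""


def seek_points(index, query):
    """Return the offsets a player could jump to for `query` (case-insensitive)."""
    return index.get(query.lower(), [])


if __name__ == "__main__":
    idx = build_index([(375, "stapes"), (120, "incus"), (610, "stapes")])
    print(", ".join(to_timecode(t) for t in seek_points(idx, "Stapes")))
    # prints: 6'15", 10'10"
```

A player would then seek to the returned offset, for example 375 seconds for the 6′15″ stapes passage.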


An interactive and searchable video database for otolaryngology surgery will be designed and built. In this project, a large amount of otolaryngology surgery video has been collected and reorganized for teaching purposes.

Discussion and Conclusion

The bottleneck of this method is the dubbing. The person who dubs the video must have domain knowledge of otolaryngology surgery and must practice many times to achieve a good speech recognition rate, even though the otolaryngology vocabulary library has been built in advance. Whenever he speaks the wrong words, he has to redo the narration. Dubbing long surgical videos is therefore very time-consuming, and repetition of the same keyword makes searching difficult. Consequently, this method can only be applied to short videos; a more systematic method must be developed for large video databases.
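The discussion notes that repeated mentions of the same keyword make searching difficult. One possible mitigation, which the report does not implement, is to merge mentions that occur close together into a single passage before showing search results. The Python sketch below does this; the 30-second gap threshold is an arbitrary assumption.

```python
def merge_hits(times, max_gap=30.0):
    """Collapse repeated mentions of one keyword into (start, end) segments.

    Mentions no more than `max_gap` seconds apart (a hypothetical
    threshold) are treated as one passage, so a search returns one
    entry per passage instead of one per utterance."""
    segments = []
    for t in sorted(times):
        if segments and t - segments[-1][1] <= max_gap:
            segments[-1][1] = t        # extend the current passage
        else:
            segments.append([t, t])    # start a new passage
    return [tuple(s) for s in segments]


if __name__ == "__main__":
    stapes_times = [375, 382, 398, 1210, 1225]
    print(merge_hits(stapes_times))  # [(375, 398), (1210, 1225)]
```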
