VELI-TAPANI PELTOKETO

Benchmarking of Mobile Phone Cameras

ACTA WASAENSIA 352
COMPUTER SCIENCE 16
TELECOMMUNICATION ENGINEERING

Reviewers

Professor Farag Sallabi
University of United Arab Emirates
Department of Information Technology
P.O. Box 15551
Al Ain, UAE

Vice President, Dr. Lasse Eriksson
Cargotec Oyj
P.O. Box 61
FI-00501 Helsinki, Finland

Publisher: University of Vaasa
Date of publication: August 2016
Author(s): Veli-Tapani Peltoketo
Type of publication: Article-based doctoral dissertation
Name and number of series: Acta Wasaensia, 352
Contact information: University of Vaasa, Faculty of Technology, Department of Computer Science, P.O. Box 700, 65101 VAASA
ISBN: 978-952-476-684-5 (print), 978-952-476-685-2 (online)
ISSN: 0355-2667 (Acta Wasaensia 352, print), 2323-9123 (Acta Wasaensia 352, online), 1455-7339 (Acta Wasaensia. Computer Science 16, print), 2342-0693 (Acta Wasaensia. Computer Science 16, online)
Number of pages: 168
Language: English
Title of publication: Matkapuhelinkameroiden suorituskykymittaus ja vertailu (Performance measurement and comparison of mobile phone cameras)

Abstract

The growing use of mobile phones has emphasized the importance of the image quality of mobile phone cameras. Several new image quality standards have been published in recent years, and old standards are updated continuously. However, the standards define image quality measurements only for specific features, and no generic metrics exist for the quality of a whole camera. This dissertation investigates and defines the requirements and challenges related to performance measurement and comparison of mobile phone cameras.

The dissertation gives a broad overview of the quality of digital imaging and of the factors that degrade image quality. The work also examines the defects encountered in digital imaging. The dissertation further covers the delays that occur in cameras and the speed of cameras. In addition, the metrics and methods used to measure camera quality and performance are presented comprehensively.

The work contains five previously published articles, which together with the rest of the dissertation form the research as a whole. The dissertation discusses various image quality and performance metrics and how they can be used in comparing cameras. The work also presents the challenges of combining several features into a single performance value to make camera comparison easier. In addition, the dissertation covers various environmental variables and their effect on performance measurement.

As the result of the research, the dissertation presents a new performance measurement method for mobile phone cameras. A very important part of the outcome is also the definition of the requirements and challenges related to performance measurement.

Keywords: Performance measurement, mobile phone camera, image quality.

Publisher: University of Vaasa
Date of publication: August 2016
Author(s): Veli-Tapani Peltoketo
Type of publication: Selection of articles
Name and number of series: Acta Wasaensia, 352
Contact information: University of Vaasa, Faculty of Technology, Department of Computer Science, P.O. Box 700, FI-65101 VAASA, FINLAND
ISBN: 978-952-476-684-5 (print), 978-952-476-685-2 (online)
ISSN: 0355-2667 (Acta Wasaensia 352, print), 2323-9123 (Acta Wasaensia 352, online), 1455-7339 (Acta Wasaensia. Computer Science 16, print), 2342-0693 (Acta Wasaensia. Computer Science 16, online)
Number of pages: 168
Language: English
Title of publication: Benchmarking of Mobile Phone Cameras

Abstract

The great success of the mobile phone industry has highlighted the image quality of mobile phone cameras. Several new standards have been published in recent years, and old standards are continuously updated. Nevertheless, each standard measures and validates a certain image quality feature or artefact, and there is a lack of generic metrics to validate a whole camera system. This thesis investigates and defines the requirements and challenges of benchmarking when mobile phone cameras are compared.
This thesis contains a comprehensive introduction to image quality factors, image quality distortions and artefacts which are common in digital imaging. In addition, performance issues like delays and slowness of camera systems are investigated. Correspondingly, the quality metrics and methods which are used to validate the image quality and performance of digital cameras are described.

This work includes five previously published articles, which are an essential part of the thesis work. This thesis includes considerations of different image quality and performance metrics and how they should be used in benchmarking. The challenges of a single-number benchmarking score are discussed. Environmental factors, such as illumination level, are also evaluated, and their influence on benchmarking is discussed.

The outcome of the research is a novel benchmarking method for mobile phone cameras which includes both quality and performance metrics. Even more importantly, this thesis highlights the requirements and challenges of mobile phone camera benchmarking.

Keywords: Benchmarking, mobile phone cameras, image quality, performance.

PREFACE

It feels contradictory to look back on the time I have spent at the University of Vaasa. Contradictory, because the time since autumn 2009 seems very short and yet contains so many exciting and rewarding moments. Moments when I have tested myself and found out how far I can push and motivate myself.

In 2009 I started to study again, after 16 years in working life. From the very first steps, the spirit and the extremely beautiful environment of the University of Vaasa fascinated me and convinced me that I was in the right place to learn more and try something new. When I graduated as a Master of Science in 2011, it was quite natural to ask for permission to continue my studies as a postgraduate student.

However, a university, however beautiful it may be, is just a bunch of empty buildings without inspiring people.
I would like to thank all the personnel and student colleagues of the University of Vaasa, because they made it a good place to be and to study. In particular, I am very grateful to Professor Mohammed Elmusrati, who has supported, advised, and guided me during the doctoral work.

I have had the privilege of working and studying at the same time. The twofold role has given me the perspective to focus on the essential areas in my studies and research. At the same time, my role in working life has enabled me to apply what I learned and my research results immediately in real products and, vice versa, to validate the research methods using real data obtained from product development.

Therefore it is my great pleasure to thank the whole personnel of Sofica Ltd. The atmosphere of the company encouraged me to aim higher than I would ever have expected. I would especially like to thank Mr. Marko Nurro, who partially forced but mostly motivated me to face new challenges. Without the exciting and rewarding moments in demanding customer meetings and negotiations, I would not have been ready to present my research and papers at several conferences.

I would also like to thank my colleagues at Nokia Technology. I worked at Nokia Technology, and particularly in the Digital Media group, while writing my doctoral thesis. Again, I have learned new facts about image quality and new approaches to tackling challenging problems. I am also grateful to Mr. Graham Soundy, who has shown extreme patience in reviewing and correcting my English, which is, I am afraid, far from beautiful British English.

Every manuscript requires a proper review. Professor Farag Sallabi and Doctor Lasse Eriksson did valuable work in reviewing and commenting on my thesis. I especially want to thank Doctor Lasse Eriksson, whose review was extremely comprehensive and showed great commitment to the pre-examination work as well as a deep understanding of the research area.
Finally, it is difficult to emphasize enough the importance of my loved ones. The pages of this thesis would not be enough to express the gratitude and love I feel for my wife Arja and my sons Matias, Tuomas and Mikael.

All in all, the time since 2009 has been great fun, which is the main motivation for creating something new.

Nurmo, Finland, 1st June, 2016

Veli-Tapani Peltoketo

Contents

PREFACE
1 INTRODUCTION
1.1 Background and motivation
1.2 Objectives and contributions
1.3 Methods
1.4 Structure of thesis
2 PRINCIPLES OF MODERN MOBILE PHONE CAMERA
2.1 Glance at history of photography
2.1.1 Film era
2.1.2 Digitalization
2.1.3 From CCD to CMOS
2.1.4 Mobile phone cameras
2.2 Generic structure of mobile phone camera
2.2.1 Image sensor
2.2.2 Camera module
2.2.3 Image processing pipeline
2.3 Future trends
2.3.1 Sensor innovations
2.3.2 New steps in lens systems
2.3.3 From one sensor to sixteen
3 IMAGE QUALITY, DISTORTIONS AND ARTEFACTS OF MODERN DIGITAL CAMERA
3.1 Image quality – problematic abstract
3.2 Image quality entities
3.2.1 Resolution
3.2.2 Color accuracy
3.2.3 Dynamic range
3.2.4 ISO speed
3.2.5 Image processing
3.2.6 Summary of image quality entities
3.3 Artefacts of digital imaging
3.3.1 Sensor based artefacts
3.3.1.1 Fixed pattern noise
3.3.1.2 Temporal noise
3.3.1.3 Banding
3.3.1.4 Green imbalance
3.3.1.5 Moiré
3.3.1.6 Blooming
3.3.1.7 Black sun
3.3.1.8 Rolling shutter
3.3.2 Camera module based artefacts
3.3.2.1 Lens aberrations
3.3.2.2 Defocus
3.3.2.3 Vignetting
3.3.2.4 Color shading
3.3.2.5 Short focal length issues
3.3.2.6 Other lens artefacts
3.3.3 Image processing pipeline based artefacts
3.3.3.1 Compression
3.3.3.2 Color inaccuracy
3.3.3.3 Sharpening artefacts
3.3.3.4 Noise removal artefacts
3.3.3.5 Demosaicing
3.3.3.6 Over processed images
3.3.4 Summary of digital imaging artefacts
3.4 Video quality and artefacts
3.5 Is camera performance part of image quality?
4 IMAGE QUALITY MEASUREMENT METHODS AND METRICS OF MOBILE PHONE CAMERAS
4.1 Standardization and current tools
4.2 Traditional objective quality metrics
4.2.1 Color measurements
4.2.2 Noise measurements
4.2.3 Dynamic range measurements
4.2.4 Resolution measurements
4.3 Metrics for image quality artefacts
4.3.1 Lens distortions
4.3.2 Vignetting and color shading
4.3.3 Flare and blooming
4.3.4 Other artefacts
4.4 From objective to subjective metrics
4.5 New features and algorithms require new metrics
4.6 Video metrics
4.7 Performance metrics
5 FROM MEASUREMENTS TO BENCHMARKING
5.1 Benchmarking in general
5.2 Existing benchmarking metrics for digital cameras
5.3 Challenges of camera benchmarking
5.3.1 Which metrics to select
5.3.2 Metrics of different environments
5.3.3 Perceptual benchmarking
5.3.4 Several metrics to single score
5.3.5 Practical issues of benchmarking
5.3.6 Static benchmarking, compatibility requirement or trap?
5.4 Proposal for mobile phone camera benchmarking
6 INTRODUCTION TO ORIGINAL PUBLICATIONS
6.1 Article I: Objective verification of audio-video synchronization
6.2 Article II: Mobile phone camera benchmarking – Combination of camera speed and image quality
6.3 Article III: Evaluation of mobile phone camera benchmarking using objective camera speed and image quality metrics
6.4 Article IV: Mobile phone camera benchmarking in low light environment
6.5 Article V: SNR and visual noise of mobile phone cameras
7 CONCLUSIONS, DISCUSSION AND FUTURE
7.1 Future
REFERENCES
REPRINTS OF PUBLICATIONS

Figures

Figure 1 History of photography: a) The first photograph, captured by Joseph Nicéphore Niépce in 1826 or 1827 and b) Boulevard du Temple by Louis-Jacques-Mandé Daguerre in 1838
Figure 2 Probably the first digital camera. (Photograph by Eastman Kodak)
Figure 3 CMOS sensor: a) Side view, picture by Samsung and b) Bayer filter, picture by Adimec
Figure 4 Camera module of modern mobile phone: a) Simplified example and b) Camera module of Lumia 1020 by Microsoft
Figure 5 Example of an image processing pipeline
Figure 6 Color differences between mobile phone cameras
Figure 7 Noise in a picture captured in 30 lux
Figure 8 A maze pattern caused by green imbalance artefact
Figure 9 Black sun image artefact in the video stream of IAAF World Championships in Beijing 2015 (Youtube)
Figure 10 The principle of the rolling shutter defect. The image is based on paper by Sun et al 2012
Figure 11 Monochromatic aberrations: a) Spherical aberration, b) Positive coma, c) Astigmatism, and d) Field curvature (Hecht 2002; Kingslake 1992)
Figure 12 Distortions: a) Pincushion, b) Barrel, and c) Moustache
Figure 13 Chromatic aberrations: a) Axial and b) Lateral
Figure 14 Image processing pipeline example: a) RAW image from sensor and b) Processed image
Figure 15 Sharpening artefacts
Figure 16 Noise removal artefacts, blurring and blockiness: a) Original scene and b) Aggressive denoising
Figure 17 Macbeth color chart
Figure 18 ISO 15739:2013 noise chart (Danes Picta)
Figure 19 MTF curves of three mobile phones captured from a low contrast slanted edge chart: (a) Very discreet sharpening, 8 megapixels, (b) Over sharpening, 13 megapixels, (c) Poor resolution performance, 20 megapixels
Figure 20 Examples of the resolution test charts: (a) High contrast slanted edge, (b) Low contrast slanted edge, (c) Detail of sinusoidal Siemens star and (d) Colored dead leaves. The image is based on paper by Peltoketo 2014
Figure 21 Lateral chromatic aberration and geometric distortion in a dot test chart
Figure 22 Hyperbolic zone plates of ISO 12233:2000 test chart
Figure 23 Test scene of benchmarking
Figure 24 Measured devices in speed-quality coordinate system
Figure 25 Mobile phone camera parametrization in different illumination environments: a) ISO speed and b) Exposure time
Figure 26 Quality and speed metrics in different illumination environments: a) Spatial resolution and b) Focus time
Figure 27 Benchmarking in 30 and 1000 lux illumination environments: a) Speed score and b) Quality score
Figure 28 SNR based noise and visual noise in different illumination environments: a) 1000 lux and b) 30 lux

Tables

Table 1. Summary and a short description of image quality entities
Table 2. Summary and a short description of sensor based artefacts in digital imaging
Table 3. Summary and a short description of camera module based artefacts in digital imaging
Table 4. Summary and a short description of image processing pipeline based artefacts in digital imaging
Terms and abbreviations

3D            Three Dimensions
ANSI          American National Standards Institute
API           Application Programming Interface
BAPCo         Business Applications Performance Corporation
Bayer filter  Color filter array for red, green, and blue pixels
BM3D          Block-Matching and 3D filtering
Bokeh         The way the lens renders out-of-focus points of light
BSI           Back Side Illumination
CCD           Charge-Coupled Device
CFA           Color Filter Array
CIE           International Commission on Illumination
CIEDE         International Commission on Illumination, Delta E
CIPA          Camera & Imaging Products Association
CMOS          Complementary Metal-Oxide-Semiconductor
CPIQ          Camera Phone Image Quality
CTA           Consumer Technology Association
DCT           Discrete Cosine Transform
DSLR          Digital Single-Lens Reflex camera
DSNU          Dark Signal Non Uniformity
DSP           Digital Signal Processor
EBU           European Broadcasting Union
EEMBC         Embedded Microprocessor Benchmark Consortium
Euro NCAP     The European New Car Assessment Programme
FPGA          Field-Programmable Gate Array
FPN           Fixed Pattern Noise
FR            Full Reference
GFS           Glare Spread Function
GPU           Graphics Processing Unit
H.26x         Series of video coding standards
HDR           High Dynamic Range
HVS           Human Vision System
I3A           International Imaging Industry Association
IEC           International Electrotechnical Commission
IEEE          Institute of Electrical and Electronics Engineers
ISO           International Organization for Standardization
ISP           Image Signal Processor
ITU-T         International Telecommunication Union, Telecommunication Standardization Sector
JND           Just Noticeable Difference
JPEG          Joint Photographic Experts Group
LCD           Lateral Chromatic Displacement
LGD           Local Geometric Distortion
MEMS          Micro Electro-Mechanical System
MOS           Mean Opinion Score
MPEG          Moving Picture Experts Group
MTF           Modulation Transfer Function
NR            No Reference
OECF          Opto-Electronic Conversion Function
OIS           Optical Image Stabilizer
P1858         IEEE working group for Camera Phone Image Quality (CPIQ)
PRNU          Photo Response Non Uniformity
RAW           Raw image data without image processing
RR            Reduced Reference
S-CIELAB      CIELAB with spatial extension
SMI           Sensitivity Metamerism Index
SMIA          Standard Mobile Imaging Architecture
SPEC          Standard Performance Evaluation Corporation
TC            Technical Committee
TOF           Time Of Flight
TPC           Transaction Processing Performance Council
VCM           Voice Coil Motor
VGI           Veiling Glare Index
VQEG          Video Quality Experts Group
WDR           Wide Dynamic Range
WoWCA         Workshop on Wireless Communication and Applications

List of publications

This thesis consists of a literature review of image quality issues and metrics, an introductory section about digital camera benchmarking and five published articles. The bibliographic data of the articles reprinted in this thesis are as follows:

I. Peltoketo, V-T. (2012).
Objective Verification of Audio-Video Synchronization. The 3rd Workshop on Wireless Communication and Applications, WoWCA.

II. Peltoketo, V-T. (2014). Mobile phone camera benchmarking – Combination of camera speed and image quality. Proceedings of the Electronic Imaging conference. In Image Quality and System Performance XI. San Francisco, USA: SPIE 9016. DOI: 10.1117/12.2034348.

III. Peltoketo, V-T. (2014). Evaluation of mobile phone camera benchmarking using objective camera speed and image quality metrics. In Journal of Electronic Imaging. 23:6. DOI: 10.1117/1.JEI.23.6.061102.

IV. Peltoketo, V-T. (2015). Mobile phone camera benchmarking in low light environment. In Image Quality and System Performance XII. San Francisco, USA: SPIE 9396. DOI: 10.1117/12.2075630.

V. Peltoketo, V-T. (2015). SNR and Visual Noise of Mobile Phone Cameras. In Journal of Imaging Science and Technology. 59:1. DOI: 10.2352/J.ImagingSci.Technol.2015.59.1.010401.

The articles are reprinted unchanged at the end of this thesis starting from page 110. The chapters of the thesis with the reprinted articles constitute an entity which defines the contributions of the work. The included articles start at the following pages of this thesis:

Article I: 110
Article II: 116
Article III: 125
Article IV: 132
Article V: 142

1 INTRODUCTION

The huge success of the mobile phone industry has pushed the use of digital imaging into skyrocketing growth. In 2010, YouTube estimated that 20 hours of video were uploaded to its servers every minute. Samsung has estimated that 880 billion digital images would be captured in 2015. The rate of still image capturing and video recording is not decreasing; quite the opposite.
Now that digital imaging has become an ordinary means of storing and sharing moments of everyday life, the importance of image quality, and thereby of camera quality, has increased. Several new standards and image quality measurement methods have been published in recent years, and old standards are continuously being updated. Still, each standard measures and validates a certain quality feature of a camera system, and there is a lack of generic metrics to validate a whole camera system.

Benchmarking is an approach to validate and compare whole camera systems and to help an end user select the best camera for his or her purposes. If a benchmark is an independent metric, it could be used by mobile phone operators and vendors to advertise their products.

The thesis contains a comprehensive introduction to image quality factors, image quality distortions and artefacts which are common in digital imaging. In addition, performance issues like delays and slowness of the camera system are investigated. Correspondingly, the quality metrics and methods which are used to validate the quality of digital cameras are described.

The work concentrates on benchmarking of mobile phone cameras and introduces a novel solution for it. The benchmarking includes different image quality metrics, performance metrics and methods that are used to create a straightforward score for comparing cameras. Environmental factors are also considered. The thesis is mainly focused on still image quality and performance, though some video related metrics are used.

1.1 Background and motivation

The origins of the thesis are partially related to the work of the author in the Finnish startup company Sofica. The first product from the company was an automated test system for digital cameras, and a logical step from this testing and measuring was a comparison of camera devices.
At that time the mobile phone vendors were deeply involved in the megapixel competition of mobile phone cameras, and it was reasonable to expect that the use of mobile phone cameras would expand dramatically.

Though there were products on the market that measured image quality, it was surprising that there were no companies, tools or standards providing a more comprehensive ranking or benchmarking of mobile phone cameras. The gap between the expectations placed on mobile phone cameras and the impossibility of properly comparing cameras encouraged the author to start developing a benchmarking system for mobile phone cameras.

Although this work is partly based on the work of the author in the company, the proposed solution for the benchmarking is not a product of the company.

1.2 Objectives and contributions

In general, the objective of the work is to introduce a comprehensive benchmarking system which can be used to compare and rank mobile phone cameras. Since benchmarking is a combination of numerous image quality and camera performance factors, the work required a significant effort to inspect and validate the different quality elements of a mobile phone camera. During the research, the following research questions emerged:

1. Which requirements should a comprehensive benchmark system of mobile phone cameras fulfill?
2. Which metrics should be included in a benchmarking system?
3. How should different environmental factors be taken into account in a benchmarking system?
4. How would the evolution of digital cameras, algorithms and testing methods affect the benchmarking system?

The research questions were answered during the years of the dissertation work. Certainly, numerous new questions were raised during the work, and it is difficult or even impossible to give comprehensive answers to all the research questions.
Still, the work extensively defines the diversity and complexity of camera quality factors, considers the requirements of a benchmarking system and, finally, introduces a solution for benchmarking mobile phone cameras.

The thesis is a workflow of the investigations, considerations, trials and conclusions required to find answers to the research questions. The main tasks and contributions of the thesis are:

- to create a comprehensive summary of image quality factors, image quality distortions, artefacts and performance issues which are related to digital imaging,
- to collect and validate image quality metrics and methods available in different standards, papers and literature,
- to inspect the requirements a comprehensive benchmark system should fulfill, and
- to introduce a solution for a generic and public benchmarking method.

The published articles attached at the end of the work and the content of the thesis create an entity which answers the research questions, defines the tasks carried out during the work and, finally, constitutes the contributions of the dissertation.

1.3 Methods

In general, the evaluation of digital cameras can be divided into two main methods: objective and perceptual, i.e. subjective, methods. The objective methods are traditionally related to measurements and statistical analysis of image data, whereas the perceptual methods use observers who evaluate the quality and functionality of images and cameras.

The objective methods lend themselves to automated measurements and calculations, which makes this approach very efficient. On the other hand, the correlation between objective methods and human inspection is not always good enough. For this reason, perceptual methods are used. However, perceptual methods require a significant amount of human work, which makes them inefficient and time consuming.
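As a concrete illustration of what an objective method looks like, the sketch below computes a signal-to-noise ratio from the pixel values of a uniformly lit patch. It is only a minimal example under stated assumptions: the function name `patch_snr_db` and the synthetic patch values are hypothetical, and the definition 20·log10(mean / standard deviation) is a commonly used one rather than a formula taken from this thesis.

```python
import math
import random
import statistics


def patch_snr_db(pixels):
    """Signal-to-noise ratio of a uniformly lit patch, in decibels.

    Assumption (not from the thesis): SNR = 20 * log10(mean / std),
    where std is the population standard deviation of the pixel values.
    """
    mean = statistics.fmean(pixels)
    noise = statistics.pstdev(pixels)
    if mean <= 0:
        raise ValueError("mean pixel value must be positive")
    if noise == 0:
        # A perfectly flat patch has no measurable noise.
        return math.inf
    return 20.0 * math.log10(mean / noise)


# Hypothetical data: the same grey level with weak and strong Gaussian noise.
rng = random.Random(0)
clean_patch = [128 + rng.gauss(0, 2) for _ in range(10_000)]
noisy_patch = [128 + rng.gauss(0, 8) for _ in range(10_000)]

print(f"clean patch: {patch_snr_db(clean_patch):.1f} dB")
print(f"noisy patch: {patch_snr_db(noisy_patch):.1f} dB")
```

Because such a calculation needs no human observer, it can be repeated automatically for every captured image, which is the efficiency advantage described above. Whether the resulting number matches what a person perceives is a separate question.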
To combine the pros of both methods, conversion algorithms have been built to use efficient objective methods and convert, if required, results to perceptual ones. This is nowadays one of the main research areas, especially in image quality inspection.

Furthermore, evaluation of digital cameras can be divided according to the existence of original data. No-reference, reduced-reference and full-reference methods can be used. The full-reference method uses the original data, i.e. the original image of a scene, whereas the no-reference method has no information about the original scene. The reduced-reference method uses certain pre-calculated characteristics of the original data and compares them to the corresponding ones of the captured data.

This thesis is primarily based on objective quality and performance methods which are used by an automated measurement system. However, conversion algorithms have been applied to certain image quality metrics to achieve better correlation with perceptual inspection.

1.4 Structure of thesis

The thesis consists of an introductory section on image quality in general and different metrics, measurements and artefacts which could be related to mobile phone cameras. Moreover, benchmarking challenges are described and discussed. Finally, five publications are reprinted in their original form at the end of this thesis to describe the research into benchmarking of mobile phone cameras.

The first chapter briefly introduces the topic of the thesis, describes the background and motivation and specifies the research questions.

Chapter two describes the principles of a modern mobile phone camera, starting from the early history of photography and following the technology steps through to recent models of mobile phone cameras. The generic structure of a mobile phone camera is discussed as well as future trends.

Chapter three concentrates on image quality features, distortions of image quality and quality artefacts.
The chapter includes a broad literature review of image quality features like color, resolution, dynamic range and ISO speed. Different image quality and video artefacts are also described. Finally, the chapter considers whether the performance of a camera should be part of the image quality.

The fourth chapter, Image quality measurement methods and metrics of modern mobile phone camera, defines how the features and artefacts of chapter three can be measured and which kinds of metrics can be used. The chapter includes a view of current standards and tools. It also defines needs for new metrics due to new features of cameras.

Chapter five covers the challenges faced when individual metrics are combined into a benchmarking score. The chapter defines the tasks for suitable metric selection, the environmental factors of benchmarking, and how to create a benchmarking system when the features and requirements of mobile phone cameras are changing. Finally, the chapter introduces a solution for mobile camera benchmarking.

The sixth chapter, Introduction to the original publications, includes short summaries of the attached articles, plus the main objectives, tasks and results of each individual item of work.

Conclusions of the study and this thesis are finally drawn in chapter seven. The articles are reprinted unchanged at the end of the work.

2 PRINCIPLES OF MODERN MOBILE PHONE CAMERA

2.1 Glance at history of photography

2.1.1 Film era

Looking back at history gives a perspective on today’s research. The oldest photograph that has survived to the present day is an image captured by Joseph Nicéphore Niépce in 1826 or 1827 in the Burgundy region of France. The image is shown in Figure 1a. Niépce used a pewter plate covered by bitumen in a camera obscura. The exposure time was at least eight hours. After the exposure, Niépce washed the plate using a mixture of oil of lavender and white petroleum and removed the bitumen which was not hardened by light.
Thus the first image was a direct positive picture. The original pewter plate is held at the University of Texas at Austin. (University of Texas; Tom A. 2014; Peres, M. 2007)

Niépce continued his photography development with Louis-Jacques-Mandé Daguerre and they created a method using a copper plate covered by silver and iodine. Daguerre managed to improve the method using mercury and finally captured very detailed pictures, as shown in Figure 1b. (Tom A. 2014; Peres, M. 2007)

Figure 1 History of photography: a) The first photograph, captured by Joseph Nicéphore Niépce in 1826 or 1827 and b) Boulevard du Temple by Louis-Jacques-Mandé Daguerre in 1838

The first real manufactured camera was built by Alphonse Giroux, who got a license from Daguerre and Niépce’s son to use their technique in this camera. The camera was made using wooden boxes, and included a real 380 mm objective having an f-number between 14 and 15. The focus was adjusted by sliding the inner part of the camera in which the photography plate was mounted. Even though the price of the camera was notably high, 400 francs, the camera was still a great success. (Tom A. 2014)

At the same time, the British chemist Henry Fox Talbot created a method using silver nitrate and captured the first negative images in 1835. He also developed a process where positive images were created from negative ones. The negative/positive method, based on the work of Talbot, dominated the photography industry for more than 150 years. (Tom A. 2014; Peres, M. 2007)

Photography was an instant success, cameras spread around the world and new inventions were made all the time. New film materials, new camera techniques and even 3D cameras were implemented. Wilhelm Rollman created a 3D camera technique as early as 1852 and the same technique is still in use today (Tom A. 2014). Finally, based on Hannibal Goodwin’s work, George Eastman created a roll film in 1888 and the regime of modern film based photography started. (Tom A.
2014; Peres, M. 2007)

It seems a shame to bypass the golden times of film photography, but because it is not the topic of this work and it would require several books to highlight the importance of that time, we have to step directly into the late 20th century and into the era of the first digital cameras.

2.1.2 Digitalization

It is symbolic that the first known digital image was not captured from a real world scene but from a readymade photograph captured using an analog film. Russell Kirsch made a digital image which was scanned from a photograph of his son in 1957. The size of the image was 176x176 pixels. Probably the first digital camera was developed by Steve Sasson, an engineer of Eastman Kodak, in 1975. A charge-coupled device (CCD) camera with 10 000 pixels was mounted on several circuit boards and the result was stored on a cassette tape, as shown in Figure 2. (PetaPixel)

The CCD was invented by George E. Smith and Willard Boyle in 1969. The invention was the basis of modern digital imaging and CCD sensors are still used in astronomy and scientific imaging due to their superior noise characteristics. The importance and revolutionary impact of the invention was highlighted by the Nobel Prize awarded to Smith and Boyle in 2009. (Nobel Prize)

The first digital cameras were released during the decade 1990-2000. The first camera for consumer use was the Apple QuickTake 100 with 640x480 resolution. This camera was produced by Apple, though it was designed by Kodak. (Imaging resource)

Figure 2 Probably the first digital camera. (Photograph by Eastman Kodak)

2.1.3 From CCD to CMOS

In 1968, a year before the CCD was invented, a complementary metal oxide semiconductor (CMOS) sensor was also published. However, CMOS based sensors suffered from poor fabrication processes and the fixed pattern noise of these sensors was extremely high (Wang 2008).
It was more than 25 years before CMOS sensors were improved so much that they could seriously threaten the dominance of the CCD sensors.

In 1995, a CMOS sensor based camera-on-chip solution was published. The same chip contained several features like timing, control block, sampling and noise suppression logic (Nixon et al. 1995). The low power consumption of the CMOS sensor and the potential to add more logic to the sensor chip, and therefore lower the costs of the chip, pushed the use of CMOS sensors into rapid growth. According to the latest Yole report, CMOS image sensor revenues surpassed the revenue of CCD sensors in 2010 (Yole 2015).

The second significant performance step of CMOS cameras was the invention of the back side illumination (BSI) technique. Earlier, the photosensitive elements, photodiodes, were located at the bottom of the chip, whereas all the wiring was between the light and the photodiodes. By flipping the chip upside down, the photodiodes were placed on the side from which the light arrives. This technique significantly increased the number of photons hitting the photodiodes and therefore also the quantum efficiency of the sensor. Sony published the Exmor R sensor family in 2008, which included the first BSI sensors. Five years later, Yole reported that the revenue of BSI based CMOS sensors was more than 50% of all CMOS image sensors (Yole 2015).

During the development period of CMOS sensors, pixel sizes decreased and this enabled a higher and higher pixel count. Whereas the sensor by Nixon et al. had a pixel size of 20 μm, the latest CMOS sensors have reached a pixel size of 1 μm. Low cost and low power consumption have made the CMOS technique very suitable for mobile phone cameras. Due to continuous research and new inventions, the image quality of CMOS sensors has also reached the quality of CCD sensors.
2.1.4 Mobile phone cameras

The first prototype of a mobile phone camera was introduced at Telecom 95 by Panasonic. Depending on the references, the honor of being the first commercial mobile phone camera goes to the Sharp Corporation J-SH04 model or to the Samsung SCH-V200 model. The former had a 0.1 megapixel CMOS sensor and the latter a 0.35 megapixel CCD sensor, and they were both launched in 2000. However, the first picture captured by a mobile phone and shared using the phone was taken in 1997, when Philippe Kahn captured an image of his newborn baby using a camera integrated into his phone. (EETimes; Sharp; Samsung)

Since then the evolution of the mobile phone camera has been breathtakingly fast. The pixel count is only one feature of a phone, though the development of this feature gives a useful overview of the mobile phone camera development:

- 1.3 megapixels in 2003 by Sprint, model PM8920, the same year that Sony Ericsson released the Z1010, the first phone model with a front-facing camera
- 2.0 megapixels in 2004 by Nokia N90
- 3.2 megapixels in 2006 by Sony Ericsson K800i model
- 5 megapixels in 2007 by Nokia N95
- 8 megapixels in 2008 by Samsung i8510
- 12 megapixels in 2009 by Samsung M8910
- 13 megapixels in 2010 by Sony Ericsson S006

Finally, to underline the madness of the megapixel race, Nokia released a 41 megapixel camera in 2012, the Nokia 808 PureView model. (Digital Trends)

Obviously, pixel count is not the only feature that has been revolutionized in recent years. The latest mobile phone cameras may include an optical image stabilizer, sensor based autofocus, new color filters, multiple cameras, a global shutter, a high dynamic range and several other new features. In particular, recent mobile phone cameras have new image processing algorithms making images even better. Yole forecasts that in 2015 the revenue of CMOS image sensors will be 10 billion dollars, and 60% of this revenue will come from mobile devices.
(Yole 2015)

There has been a huge advance in photography since the days of Niépce. However, history seems to repeat itself, and direct positive images are being captured once again.

2.2 Generic structure of mobile phone camera

In general, a mobile phone camera can be divided into four logical parts: the sensor itself, the camera module, the image processing pipeline (or image signal processor, ISP), and the flash system.

The quality and benchmarking of the flash system is not part of this work. When the flash system is used, it adds a whole new dimension to still imaging. A proper investigation of a camera system with flash would require several new measurements, like the color temperature, uniformity, and magnitude of the flash system, and several different environments would have to be taken into account. Even if the flash system is nowadays an essential part of mobile phone cameras, the evaluation of the flash would complicate benchmarking significantly and should be investigated in separate research.

2.2.1 Image sensor

An image sensor is an essential part of a camera system. It gets light through the lens system and transforms the light first into an analog signal and afterwards into digital numbers. Since CMOS sensors dominate mobile phone cameras, this section concentrates on CMOS technology.

Figure 3a shows the simplified inner structure of a CMOS sensor. The example is from Samsung ISOCELL technology, where photodiodes are isolated from each other (Samsung ISOCELL). The topmost element of the sensor is a micro lens, which collects light and bends it onto a pixel below. Use of a micro lens reduces optical crosstalk and allows use of a wider field of view in a camera system. A color filter array (CFA) below the micro lenses filters the light into different components. Usually a Bayer filter with red, green and blue filters is used (Peres, M. 2007). Without the CFA, the sensor would take monochromatic images.
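The effect of a color filter array can be illustrated with a minimal sketch. It assumes an RGGB arrangement of the Bayer filter (one red, two green and one blue site per 2x2 cell); the function name bayer_mosaic and the list-of-tuples image representation are hypothetical choices for illustration only and do not describe any particular sensor.

```python
def bayer_mosaic(rgb):
    """Keep one color sample per pixel, following an assumed RGGB layout.

    rgb is a list of rows; each pixel is an (r, g, b) tuple.
    Returns a list of rows with a single value per pixel, which is
    roughly what a sensor behind a Bayer filter records.
    """
    mosaic = []
    for y, row in enumerate(rgb):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            if y % 2 == 0 and x % 2 == 0:
                out_row.append(r)  # red site: even row, even column
            elif y % 2 == 1 and x % 2 == 1:
                out_row.append(b)  # blue site: odd row, odd column
            else:
                out_row.append(g)  # green site: two per 2x2 cell
        mosaic.append(out_row)
    return mosaic


if __name__ == "__main__":
    # A flat test image where every pixel is (r=1, g=2, b=3).
    image = [[(1, 2, 3)] * 4 for _ in range(4)]
    for row in bayer_mosaic(image):
        print(row)
```

Each output pixel carries only one of the three color components, so the two missing components have to be estimated later from neighboring pixels by the image processing pipeline.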
Figure 3 CMOS sensor: a) Side view, picture by Samsung and b) Bayer filter, picture by Adimec

Figure 3b shows an example of the Bayer filter. The number of green pixels is double relative to the other colors, and this correlates with the color sensitivity of the human vision system (HVS). Obviously, each color filter will absorb part of the incoming photons and will therefore decrease the quantum efficiency of the sensor. Several different studies are ongoing to replace the technique, but currently the Bayer filter is the main method (Business Wire; Sony; Invisage; Foveon).

When a photon hits the silicon below the color filter array, it creates an electron-hole pair which can be electrically detected. To eliminate electron leakage between pixels, i.e. electronic crosstalk, Samsung, along with other sensor vendors, has built barriers between pixels. Samsung calls this method the ISOCELL technique.

CMOS pixels are active pixels, i.e. each pixel has its own amplifier. Until now, the voltages of each pixel have been read line by line, converted by an analog to digital converter and sent to the image processing pipeline. However, this rolling shutter method has weaknesses. When rows are read at different times, fast moving objects are distorted in the final image. Due to this, several global shutter CMOS sensors have been recently published (Sony IMX174LLJ, CMOSIS Global Shutter). The global shutter method requires more logic per pixel. While the first CMOS pixels included three transistors, a global shutter version now requires at least five.

Finally, the bottom level of the sensor contains metal wirings which transfer the information from a pixel.

2.2.2 Camera module

A camera module packages the image sensor with the lens system and with the mechanical parts which are required for features like auto focus, optical image stabilizer and aperture adjustments. It is also possible to integrate a digital signal processor into the camera module.
Figure 4a shows a simplified example of the camera module. Firstly, the package contains a lens system; nowadays, mobile phone cameras with auto focus have 5-6 lens components. Secondly, the moving lens components have their own holders and controllers. The voice coil motor (VCM) is a widely used technique to adjust lenses, but new methods like micro electro-mechanical systems (MEMS) are coming to market. Thirdly, an infrared filter is mounted on top of the sensor to prevent saturation due to infrared light. Finally, the sensor is wired and mounted onto a circuit board and the whole system is protected by a package. The module offers a connector which enables control of the camera and transfer of the image data.

Probably the most complicated mobile phone camera module, that of the Lumia 1020 phone, is shown in Figure 4b. Among others, it includes a 41 megapixel sensor, VCM based autofocus and an optical image stabilizer where the whole lens system is resting on ball bearings. The size of the package is 25 mm by 17 mm and it contains over 130 individual components. (Microsoft)

Figure 4 Camera module of modern mobile phone: a) Simplified example and b) Camera module of Lumia 1020 by Microsoft

2.2.3 Image processing pipeline

An image processing pipeline has a significant role in modern mobile phone cameras. Unfortunately, the quality of an image without image processing (RAW image) is quite poor due to a small lens system, small pixel size and sensor artefacts. In practice, the image processing pipeline recreates the image using a large number of different algorithms.

The image processing pipeline can be implemented by a specific processor, a digital signal processor (DSP) or a graphics processing unit (GPU). Field-programmable gate arrays (FPGA) are also used in some cases. Moreover, the pipeline can be implemented in software, using the application processor of the phone.
However, the image processing pipeline tends to be such a heavy process that it usually executes on a separate processor or chip.

Figure 5 gives an example of the algorithms that the image processing pipeline may contain. The process can be divided into correction, conversion and controlling tasks, like denoising, demosaicing and auto focus, respectively. The algorithms have many connections between each other and the actions of one quality algorithm may reduce the quality of another feature. The parameterization of the algorithms is a trade-off between different quality features.

Figure 5 Example of an image processing pipeline

Auto focus and auto exposure in particular have critical roles, because they control the camera functionality and they are very time critical processes. All in all, the quality of the image processing pipeline largely defines the quality of the whole camera system.

2.3 Future trends

The future of digital imaging looks bright, not only because of its own great success but also due to the large number of new innovations. New methods and approaches in camera sensors and camera modules force engineers to implement new image processing algorithms that can comprehensively utilize the new features. This section defines some of the trends which can change the way images are captured and video is recorded.

2.3.1 Sensor innovations

In the sensor area alone, there are tens of different new methods which challenge current Bayer type sensors. Aptina and Sony have introduced clear pixel sensors, which either replace the green pixels with white pixels (Aptina) or add extra white pixels to the existing filters (Sony) (Business Wire, Sony). The use of white, i.e. unfiltered, pixels increases the sensitivity of sensors. Sony has also published a patent which defines triangular and hexagonal pixels with seven different pixel types (Patent US 20130153748 A1). Another approach is a quantum film invented by InVisage.
The quantum film is a photosensitive layer which may replace the silicon of traditional sensors (Invisage). InVisage published a quantum film sensor with 13 megapixels in late 2015. The sensor should provide a dynamic range which is three f-stops better than that of conventional CMOS sensors. Moreover, Foveon has published its X3 sensor with stacked photodiodes. The sensor is based on the fact that light with longer wavelengths penetrates silicon more deeply than light with shorter wavelengths. Using this phenomenon, Foveon has implemented a layered pixel, where blue light is detected at the top of the pixel, green in the middle and red wavelengths at the base of the pixel (Foveon). Obviously, this kind of sensor does not need a color filter array at all and should be more sensitive than sensors which use one. On the other hand, Xerox PARC and IMEC are developing multispectral and hyperspectral sensors mainly for industrial use, but they could also add interesting features to mobile phone cameras (GlobeNewswire, IMEC). Finally, very recently Panasonic published an organic CMOS sensor whose dynamic range should be significantly better than that of any conventional sensor (Panasonic).

2.3.2 New steps in lens systems

Sensors are not the only area where innovation occurs. In optics, LensVector has released a liquid lens. A single lens component contains electrically controlled liquid crystals, and the focus can be adjusted not by moving the lens but by controlling the crystals, which makes the focus adjustments very fast (LensVector). On the other hand, the micro electro-mechanical system (MEMS) technology has superior performance features over current voice coil motor (VCM) methods, but manufacturing problems still prevent the approach from reaching greater success (DigitalOptics). Rambus has a technology called lensless smart sensor, which includes a spiral grating of diffractive optics and sophisticated algorithms to capture an image without lenses (Rambus).
Finally, Sony has released a market ready product with an optical variable low pass filter, where a user may control the filter to find the balance between resolution and aliasing artefacts like Moiré (Sony Optics).

2.3.3 From one sensor to sixteen

Multiple cameras and array imaging are related to three dimensional (3D) imaging, but there are also other features which can be made using multiple sensors. Pelican Imaging has introduced a compact sensor matrix containing sixteen sensors, mainly targeting 3D imaging. However, the technique also offers high resolution imaging by combining the information from the multiple sensors, and post-capture refocus for still images and videos (Pelican Imaging). Altek has made a system containing two 13 megapixel sensors, where one is chromatic and the other is monochromatic. Altek advertises instant auto focus, high resolution, good low light performance with low noise and high dynamic range for this system (Altek). In 2015, Light Co. released perhaps the largest array imaging product: the L16 camera, with sixteen 13 megapixel camera modules using three different focal lengths (Light). The product may challenge traditional digital single-lens reflex (DSLR) cameras. Several time-of-flight (TOF) solutions are also being developed for 3D and depth imaging (Lytro, Heptagon).

Finally, what is the role of presence capture cameras? These cameras will record the whole environment, capturing 360 degrees or even 720 degrees in 3D including surround sound. The end user will be able to re-experience the original moment in a very new and comprehensive way. However, the solution requires new infrastructures for cameras, data transfer and displays.

All in all, digital imaging is rapidly changing. New products with astonishing features are already available or just around the corner, and the markets and end users will decide the next successful trend. The rate of digital camera evolution challenges image quality metrics and measurements, too.
When new techniques are taken into use, they will generate new features to validate and new artefacts to measure.

3 IMAGE QUALITY, DISTORTIONS AND ARTEFACTS OF MODERN DIGITAL CAMERA

In a perfect world, a digital camera would reproduce exactly the photographed real world scene. The image would present all the smallest details, reproduce exact colors, without any noise and other artefacts, and in the case of consumer cameras, use the whole spectrum of the human eye. Also, the dynamic range of the camera would be at least as good as that of the human eye, and the image processing pipeline would mimic the brain’s visual processing in a perfect way.

A quick glance at the endless image galleries of the Internet reveals that obviously we do not live in a perfect world. Limitations of cameras’ hardware and software, manufacturing issues with camera sensors and lenses and problems in image processing pipelines cause different issues in images. The issues can be content destroying problems like out of focus adjustments and wrong exposure values, or very small and even artistic faults like an unnatural bokeh or a slightly wrong color tint. Also, nature itself creates final boundaries on image quality by limiting the performance of lenses and defining the smallest objects that can be observed using the wavelengths of the human vision system, for example.

When the concept of image quality is considered more closely, it can be seen as problematic or even controversial. Though image quality can be measured very comprehensively, in the case of consumer products, images are ultimately judged by the human eye and by the human vision system. Although perceptual image quality and measured image quality correlate well, they definitely are not the same thing.

Strictly considering image quality as a measurable entity, image quality can be defined as the overall performance of the camera in reproducing the captured scene in an image.
A quality distortion can be specified as a lack of performance, and image artefacts are explicit errors in the images.

However, image quality is not only an objective and measurable number, it is also a perceptual view of the image. There have been several attempts to bind objective and perceptual quality metrics together. For example, Keelan defined a specific method and function, the integrated hyperbolic increment function (IHIF), to transform any objective metric into a perceptual one (Keelan 2002). Also, many current image quality standards have been updated to measure the perceptual image quality, too.

This chapter defines the problematic concept of image quality in general. The content is not limited to mobile phone cameras, because the quality entities are generic to most digital cameras. The chapter specifies the image quality entities, distortions of image quality and common artefacts of digital imaging. The purpose of the chapter is not just to describe or list the quality issues and artefacts, but to highlight the diversity and number of quality problems in digital imaging and the challenges which different issues present in quality measurement and benchmarking.

3.1 Image quality – problematic abstract

How can image quality be defined or quantified? The question is a fundamental one when a camera system is investigated from the image quality point of view. The literature gives different approaches to the definition of image quality.

Keelan divides image quality into four attribute groups to help with the clarification of image quality (Keelan 2002):

- Artefactual attributes, like unsharpness and digital artefacts
- Preferential attributes, like color balance and contrast
- Aesthetic attributes, like composition
- Personal attributes, like how a person remembers a certain cherished event

Obviously, artefactual attributes can be measured objectively by searching for certain errors in captured images.
Preferential attributes are still objectively measurable, but they also contain perceptual components like color saturation. Although aesthetic attributes are very perceptual or even personal attributes, some evaluation can still be made, for example by investigating the usage of the golden ratio in captured images. Finally, personal attributes are so closely related to the history and emotions of a person that they cannot be measured by image quality methods and rated as image quality attributes. However, personal attributes can be the most important factors when images are rated.

Specifically, Keelan defines image quality as follows: “The quality of an image is defined to be an impression of its merit or excellence, as perceived by an observer neither associated with the act of photography, nor closely involved with the subject matter depicted.” (Keelan 2002)

In his book, Handbook of Image Quality, Keelan defines an image quality unit, the just noticeable difference (JND), to specify the smallest image quality difference which is noticeable to a human being. In practice, one JND is valid if 75% of observers notice the difference (for the specific definition of JND, see pages 35-45 of Keelan’s book). JNDs can be used separately for each quality attribute or for a combination of attributes. Keelan also defines a method where objective image quality measurement results can be transformed into JND units. (Keelan 2002)

Wang and Bovik concentrate strictly on objective image quality in their book Modern Image Quality Assessment (Wang and Bovik 2006). They specify a fundamental requirement of the image quality attribute: an image quality attribute is useless if it does not correlate well with human subjectivity. Moreover, they define three uses for objective image quality measurements: they can be used to monitor the quality of the system, to benchmark devices against each other, and to optimize the camera system.
(Wang and Bovik 2006)

Umbaugh defines different objective image quality criteria in his book (Umbaugh 2005). The objective image quality is defined as the amount of error in a captured image compared with a known image, which is a logical approach. He defines several well-known statistical methods for the measurements: root mean square error, root mean square error signal to noise ratio and peak signal to noise ratio (Umbaugh 2005). Use of the equations reveals that the original images have to pre-exist and the so called full-reference image quality method is used. In practice, the full-reference method is quite difficult to use in image quality measurements, not only because the images are captured from a scene and an exact reference image does not exist, but also because a modern image processing pipeline recreates the scene so fundamentally that a straight comparison at the pixel level is not sensible. In addition, measurements like mean square error do not always correlate with perceptual quality (Wang and Bovik 2009).

In the case of subjective image quality tests, Umbaugh relies on a group of observers and how they rate the images. He divides the subjective image quality tests into three categories: an impairment test to rate images in terms of how bad they are, a quality test to rate how good they are, and a comparison test to evaluate images side by side. Surprisingly, he does not refer to the known standards ITU-T P.800, ITU-T Rec. BT.500-11, and ITU-T Rec. P.910, which define very comprehensively the subjective image quality methods and environments.

As the name of the book, Perceptual Digital Imaging – Methods and Applications, suggests, Lukac concentrates fully on subjective image quality (Lukac 2013). Like Wang and Bovik, Lukac divides subjective image quality into full reference (FR), reduced reference (RR), and no reference (NR) methods. However, it is notable that Lukac uses only FR and NR methods.
The reduced reference method is completely omitted from the perceptual image quality assessments. The FR methods of the book are not based on the pixel level difference but on more sophisticated algorithms like structural similarity and wavelet transform methods. On the other hand, the NR methods are extremely interesting ones, as they evaluate the image without any information about the image content and rely fully on statistical analysis of the image data. The NR approach is recognized as the Holy Grail of image quality assessment. If it reaches even moderate reliability someday, it will revolutionize the whole area of image quality measurement. (Lukac 2013)

Finally, several standards define both objective and subjective image quality approaches. A mean opinion score (MOS) has been used to specify the subjective quality of images and videos. The origin of the MOS rating comes from telecommunications and quality observations of telephony networks. MOS has a five-step scale for quality, ranging from bad to excellent. MOS is the arithmetic mean of all scores given by observers. (ITU-T P.800) In addition, several perceptual video quality standards have been published by the International Telecommunication Union, Telecommunication Standardization Sector: ITU-T Rec. BT.500-11 and ITU-T Rec. P.910 in particular.

ISO standards specifically define several objective and also perceptual image quality methods for specific features of digital cameras. The methods are defined, for example, for features like color fidelity, noise and resolution. The quality entities and corresponding metrics of the standards are discussed later in the thesis.

As a summary, it can be said that the division into subjective and objective image quality methods is widely accepted. Obviously, perceptual or subjective image quality is the goal that should be pursued, because ultimately, consumer camera images are judged by the human vision system.
However, there are several ways to measure subjective quality. One approach is to measure objective metrics and then convert the results to perceptual ones (Keelan 2002). A group of observers can be used to rate images (ITU-T Rec. P.910). Also, image quality evaluation can mimic the human vision system and rate images accordingly (Wang and Bovik 2006). Finally, if the no-reference perceptual quality approach works reliably someday, it might replace all existing methods.

All methods have pros and cons. The objective measurements are easier and cheaper to make because they can be automated at least to some level, but they do not fully correlate with perceptual image quality even if conversion algorithms are used. The subjective measurements are definitely perceptual ones, but they are expensive and time consuming, and the reliability of the measurements depends on the observers. A good example of a reliability problem of subjective measurements can be found in Winkler's book Digital video quality – vision models and metrics: the Video Quality Experts Group (VQEG) ran several studies to find the best metric to measure subjective video quality. The methods were tested in co-operation between several laboratories in identical environments. Finally, when the results were evaluated, it was noted that the test results varied significantly between laboratories (Winkler 2005).

Moreover, subjective testing always has a variable called the human being, which may distort test results. Even though a large group of observers should reduce the effect of individuals, some collective phenomena can still happen. An example of a factor which may affect subjective image quality testing can be found in an article in Current Biology, where it was noted that human color perception may change between seasons (Welbourne et al. 2015). This kind of phenomenon may change the results of subjective image quality measurement.
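The observer ratings discussed above are commonly summarized as a mean opinion score. The sketch below computes the MOS as the arithmetic mean of scores on the five-step scale (1 = bad, 5 = excellent) and adds a rough 95 % confidence half-width. The normal approximation with z = 1.96 and the two sample panels are assumptions made for illustration; they are not a procedure quoted from the ITU recommendations.

```python
import math
import statistics


def mean_opinion_score(scores):
    """Arithmetic mean of observer scores on the 1 (bad) to 5 (excellent) scale."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie on the five-step scale 1..5")
    return statistics.mean(scores)


def confidence_half_width(scores, z=1.96):
    """Approximate 95 % confidence half-width (normal approximation, assumed)."""
    return z * statistics.stdev(scores) / math.sqrt(len(scores))


# Hypothetical ratings of the same image by two observer panels.
panel_a = [4, 5, 3, 4, 4]
panel_b = [2, 5, 3, 5, 5]

for name, panel in (("A", panel_a), ("B", panel_b)):
    mos = mean_opinion_score(panel)
    half = confidence_half_width(panel)
    print(f"panel {name}: MOS = {mos:.2f} +/- {half:.2f}")
```

In this made-up data both panels give the same MOS, but the wider interval of panel B hints at the observer-dependent reliability problem described in the text.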
As a conclusion, it can be said that several different approaches have been developed in the image quality area. However, two main research paths can be derived from the numerous image quality books, articles and papers: firstly, finding a reliable method for measuring image quality from no-reference data, and secondly, converting existing objective image quality metrics into perceptual ones.

The conversion between objective and perceptual metrics has been taken into account in this thesis. The latest color difference metrics as well as visual noise metrics are used in the benchmarking proposal of the research. Both the color difference and visual noise metrics represent the latest knowledge of adjusting objective image quality metrics to perceptual ones. However, the majority of the metrics used in this thesis are still objective ones. Even though the conversion work is one of the main research paths in the image quality area, there are still comparatively few acknowledged metrics which are acceptably converted.

Even though the no-reference methods are a very interesting approach for image quality measurement, they are not mature enough to give comprehensive and reliable results. Therefore the methods are not used in this research.

3.2 Image quality entities

There are numerous image quality factors associated with modern digital cameras and each of them has some effect on the final quality. To manage the large number of factors, it is reasonable to make some classification. Keelan divides the device specific attributes into artefactual and preferential ones (Keelan 2002). An equivalent approach would be a division into image quality artefacts and image quality performance of a camera system. Image quality performance defines the ability of a camera system to produce high quality images, whereas a quality artefact defines an error which may limit and violate the image quality. This section defines image quality factors.
3.2.1 Resolution

When digital cameras and especially mobile phone cameras are advertised, the number of pixels seems to be the main attribute. This is understandable in advertising because a single number is easy to explain and it defines, at some level, the resolution of captured images. Still, the number of pixels, even though it seems to be a very straightforward metric, can be expressed in several different ways. According to the Camera & Imaging Products Association (CIPA) guideline, the term ‘number of effective pixels’ should be used when image capture performance is described. The number of effective pixels is clearly a different metric from the total number of pixels: the total number of pixels defines the maximum number of pixels in a camera sensor, whereas the number of effective pixels declares the number of pixels used to create an image.

How can there be a difference between these metrics? For example, the mechanics of the camera system can be designed so that only part of the pixels receive light through the lens system. For the Nokia 1020, whose resolution was advertised as 41 megapixels, the real maximum resolution of the image is 38.2 megapixels or 33.6 megapixels depending on the aspect ratio of the image (Nokia 2013).

However, there are several other factors which affect the final image resolution and pixel count is only one of them. Also, the definition of resolution is not unambiguous, as it can to some extent involve the sharpness of the image. According to the ISO 12233:2014 standard, the resolution is “an objective analytical measure of a digital capture device’s ability to maintain the optical contrast of modulation of increasingly finer spaced details in a scene.” Moreover, the sharpness, or acutance, is strictly separated from resolution and it is defined as the subjective impression of details and edges of the image. (ISO 12233 2014) Like the ISO standard, DxO separates resolution and sharpness, too.
According to DxO, resolution defines the smallest detail a camera can separate, while the definition of sharpness is identical to that of the ISO standard. Moreover, DxO defines acutance as an objective measure of sharpness. (DxO Sharpness) In contrast, Imatest uses sharpness as a synonym for resolution, defining it as the amount of detail an imaging system can reproduce. (Imatest Sharpness)

As a summary, resolution can be defined as an objective metric which defines the level of detail which a camera system may produce. Still, the factors affecting the resolution are not fully clarified.

The three main components of a camera system (camera module, sensor, and image processing pipeline) have their own effects on resolution. Firstly, the lens system has a limiting resolution which can be smaller than the maximum resolution of the sensor. Moreover, the lens system always has aberrations which decrease the resolution. It is notable that lens aberrations affect areas far from the center of the lens (optical axis) more strongly, and therefore the resolution in the corners and border areas of an image is usually poorer than in the center area.

Secondly, the effective pixel count of the sensor limits the resolution. Even though the pixel count is the main characteristic of the sensor, artefacts like cross talk and noise reduce the maximum resolution.

Thirdly, the image processing pipeline includes several algorithms that may affect the final resolution. Especially the autofocus algorithm has a crucial role when the final resolution is validated. If autofocus does not work correctly, the result is a blurry image whatever the resolution capabilities of the other components. Moreover, algorithms like demosaicing, denoising and compression can be characterized as filtering algorithms which may filter out the smallest details from images. On the other hand, artificial sharpening algorithms may increase the subjective sharpness, even if they cannot improve objective resolution.
The final resolution of an image is definitely not the pixel count of the sensor but a combination of the limiting resolutions of each component of a camera system.

3.2.2 Color accuracy

The origins of color reproduction in a digital camera are in the camera sensor’s color filter. A color filter array (CFA) filters the light on top of a monochromatic sensor and normally generates green, red and blue color channels and correspondingly colored pixels. A demosaicing algorithm interpolates the color of an individual pixel from the single-colored pixel values around it. Finally, the auto white balance and color correction methods of an image processing pipeline estimate the ambient light and correct the colors correspondingly. Also, a lens system may change the colors through vignetting and color shading artefacts. The final color accuracy is a combination of all these factors.

The color accuracy, or fidelity, is an essential image quality feature of digital imaging and it can be defined as the ability of a camera system to reproduce colors as they exist in the original scene. In the case of objective color accuracy, the definition is quite clear, being the color difference between the scene and the captured image. However, the perceptual color accuracy is a much more ambiguous metric, because it can vary between individuals, cultures or even seasons. Also, it has been noted that some amplification of color saturation gives the best perceptual color rating. The rate of the amplification varies between studies. Where Keelan et al. ended up with 10% amplification, the Camera Phone Image Quality (CPIQ) study does not recommend such a high value (Keelan 2012, CPIQ 2016).

Color itself can be divided into different components depending on the color space used. CIE XYZ or RGB can be defined as standardized color spaces, whereas CIE L*a*b* or L*u*v* are perceptual ones (Lukac 2013).
Since the most widely acknowledged color accuracy method is based on the L*a*b* color space, it should represent perceptual color difference, as discussed later in section 4.2.1. However, if observers prefer an image which does not replicate the colors exactly but has amplified colors, then color accuracy is probably the wrong method for measuring perceptual colors, or at least some weights should be added to match the colorfulness requirements of observers.

The L*a*b* and L*u*v* color spaces have, besides the chromatic components, the luminance (L*) component. While the a* and b*, or u* and v*, components define the colorfulness and color balance, L* defines the lightness of the image, correlating strongly with the exposure time and ISO speed. When the color accuracy is measured in the L*a*b* color space, it also measures luminance accuracy, expressing how well the captured image represents the brightness of the original scene.

The asterisks (*) are part of the color space names and they are used for historical reasons. In L*a*b* they have been used to distinguish the space from the Lab presentation by Hunter (Hunter 1958). The origin of the L*u*v* asterisks is harder to locate; they are probably used because the L*u*v* color space is an improvement over the CIE U*V*W* color space from the year 1964.

Color accuracy is an even more problematic entity from a camera point of view, because the colors of the scene are a combination of the ambient light and the original colors of the scene. The human vision system knows how to compensate for the effect of ambient light, but for the camera system the task is difficult. In practice, the camera has to estimate the ambient light temperature or even its spectrum and adjust colors accordingly. The success of color correction can be judged in Figure 6, where four different mobile phone models have captured images in the same ambient light environment.

Figure 6 Color differences between mobile phone cameras.
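The objective color difference described above can be illustrated with the simplest L*a*b* based measure, the CIE76 ΔE*ab, which is the Euclidean distance between two L*a*b* triplets. This is only a sketch: the metric actually used in the thesis is the one discussed in section 4.2.1 and may be a later, weighted formula, and the patch values below are invented for illustration.

```python
import math


def delta_e_cie76(lab_reference, lab_captured):
    """CIE76 color difference: Euclidean distance between (L*, a*, b*) triplets."""
    return math.dist(lab_reference, lab_captured)


def split_difference(lab_reference, lab_captured):
    """Return the lightness difference and the chromatic difference separately."""
    delta_l = lab_captured[0] - lab_reference[0]
    delta_chroma = math.hypot(lab_captured[1] - lab_reference[1],
                              lab_captured[2] - lab_reference[2])
    return delta_l, delta_chroma


# Hypothetical color patch: known reference value and value measured from a capture.
reference = (50.0, 10.0, 10.0)
captured = (53.0, 14.0, 10.0)

print("Delta E*ab:", delta_e_cie76(reference, captured))
print("Lightness / chromatic parts:", split_difference(reference, captured))
```

Splitting the result shows how a single color difference value mixes luminance accuracy (L*) with the chromatic error (a*, b*), as noted in the text.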
The worst light environment is a situation where there are two or more different light sources, for example sunlight and fluorescent light, and the camera system has to interpolate color correction factors between them. All in all, the color accuracy evaluation of a camera system requires measurements in several different ambient light environments.

3.2.3 Dynamic range

The dynamic range of a camera system represents the ratio between the measured maximum and minimum light intensity in an image. In practice, the dynamic range defines how well the details are reproduced in the dark and bright areas of the same image. Normally the dynamic range is expressed in decibels or f-stops (powers of two).

The literature gives several values for the dynamic range of the human eye, varying between 24-30 f-stops in a situation where the eye can adapt to the ambient light and 10-14 stops in a static light environment (Hoefflinger 2007; Cambridge in colour). The best DSLRs may have a dynamic range of about 15 stops (DxO Mark), though the test results tend to vary between measurement software.

According to the ISO standard, dynamic range is: “ratio of the maximum exposure level that provides a pixel value below the highlight clipping value to the minimum exposure level that can be captured with an incremental signal-to-temporal-noise ratio of at least 1” (ISO 15739 2013). In practice, the dark end is reached when the temporal noise has the same value as the signal.

Dynamic range can be artificially improved using high dynamic range (HDR) or wide dynamic range (WDR) techniques. The use of the HDR and WDR terms varies a lot and they are also used as synonyms. Usually HDR is defined as a technique where several images are captured using different exposure times. The images are combined using dark end details from long exposure images and bright end details from short exposure images.
In practice, this method can be used only in very static scenes, because any movement between images will ruin the result. WDR images are captured by using a nonlinear sensor where the differences in dark and bright areas are amplified (CMOSIS 2012). Finally, an image processing pipeline may include tone mapping algorithms which implement the same nonlinearity as the nonlinear sensor, but in software (Mantiuk 2008).

3.2.4 ISO speed

The sensitivity of a camera, ISO speed, is an interesting feature especially in digital cameras because it is strongly related to the analog era of cameras. Originally ISO speed defined the sensitivity of an analog film towards light. As the sensitivity of the film increased, the granularity of the film increased too, and the quality of images decreased. In practice, when the ISO speed changed, the physical composition of the film changed. During the analog film era, ISO speed was defined as a number which was doubled at each step, i.e. 50, 100, 200, 400 etc.

In the case of digital cameras, the ISO speed is purely a gain of the signal. Depending on the camera system, part of the gain can be applied to the analog signal before analog to digital conversion and the rest to the digital signal. Since the ISO speed is only a coefficient, it affects the noise of an image significantly, especially when it is applied to the digital signal. The coefficient nature of the ISO speed in digital cameras has changed the traditional numbering of ISO speed. Quite often the ISO speed is handled as a pure integer without the old rule of doubled values. In general, the ISO speed of a digital camera has quite similar characteristics to that of an analog film: it increases the sensitivity but decreases the quality.

Since the ISO speed is an adjustable parameter, like exposure time, one may ask if the ISO speed is a quality entity of a digital camera. However, a digital camera system has some native sensitivity.
All components of the camera build up some generic base sensitivity which can then be amplified with an analog or digital gain, and this base ISO, or native ISO, is definitely a quality factor of a digital camera.

To maintain the equivalence of ISO speed characteristics between analog film devices and digital cameras, ISO standard 12232 and CIPA DC-004 define an environment and equations to harmonize ISO speed ratings. Using the standards, the base ISO can be measured, too. The ISO speed can be calculated as a saturation based ISO speed or a noise based ISO speed. The former is based on an exposure environment that produces an image which has the maximum value but is not saturated. The latter measurement is based on signal to noise ratios (SNR), where an environment with SNR 40 defines the ISO speed. (ISO 12232 2006, CIPA DC-004 2004)

3.2.5 Image processing

As defined in section 2.2, the image processing pipeline of a digital camera has a great number of algorithms which improve both objective and subjective image quality. Since the image processing pipeline may decrease the noise level significantly or increase the sharpness of images, it might be tempting to define the image processing efficiency as a quality entity. Particularly in mobile phone cameras, the role of the image processing is crucial due to the demanding environmental requirements of the sensor and lens system.

However, the qualification of the pipeline would be difficult, because it should measure the efficiency of the image processing. It would require access to RAW images, and in the case of mobile phone cameras they are rarely available. On the other hand, image processing is a non-removable part of mobile phones and, from a consumer point of view, the final quality is much more interesting.
In the case of digital single-lens reflex cameras, this kind of measurement would be reasonable, because they offer RAW images and image processing can be done using external image processing tools.

3.2.6 Summary of image quality entities

Table 1 gives a summary of the image quality entities related to digital cameras and discussed in this section.

Table 1. Summary and a short description of image quality entities

| Entity | Description |
| --- | --- |
| Resolution | A feature which defines the level of detail which a camera system may produce. |
| Color accuracy | The ability of a camera to reproduce colors as they exist in the original scene. |
| Dynamic range | A feature which defines how well a camera can reproduce details both in dark and bright areas in the same image. |
| ISO speed | Analog or digital gain which amplifies the image data. On the other hand, base ISO speed or native ISO speed defines the native sensitivity of a digital camera without any amplification. |
| Image processing | A significant quality entity in digital cameras which includes several algorithms improving both objective and subjective image quality. |

3.3 Artefacts of digital imaging

As discussed before, the concept of image quality is quite a difficult entity to specify accurately. Even if image quality can be measured in several ways, perceptual image quality always adds a problematic dimension to the evaluation. An artefact of digital imaging is slightly easier to describe because the artefact is always an error in the image. However, one may still ask what an image artefact is. Logically it would be a deviation from a perfect image, a golden sample, which exactly represents the photographed scene. Still, here one may face a problem again, because the image processing pipeline may boost colors a little or create high dynamic range images to increase the perceptual quality. A better description of the image artefact would then be an unwanted deviation from the perfect image.
And how can it be decided which change is an unwanted one? Again, we can observe that even image artefacts may have perceptual characteristics.

Like the image quality entities, imaging artefacts can be categorized in different ways. One approach is location based, which classifies artefacts by the location where the artefacts originate (Imatest Image quality factors). As described in section 2.2, a modern digital camera can be divided into camera sensor, camera module and image processing pipeline entities. A subset of imaging artefacts can be strictly assigned to specific camera parts, but usually an artefact is generated by a combination of several of them. However, the source based classification is a straightforward and also pragmatic way to understand the numerous sources of image artefacts.

3.3.1 Sensor based artefacts

A logical starting point for the artefact evaluation is the sensor of the camera, because it is the most essential part of digital imaging. The sensor converts an analog photon flow to an electrical signal and finally to digital numbers, generating the first version of the RAW image, which is then processed by the imaging pipeline.

3.3.1.1 Fixed pattern noise

One of the most obvious artefacts of a sensor itself is a bad pixel, or in a more generic form, fixed pattern noise (FPN). Fixed pattern noise can be divided into two entities depending on the characteristics of the defective pixels. If the pixel always has a static value regardless of the input signal, i.e. the photon flow, the artefact is described as dark signal non-uniformity (DSNU). On the other hand, if the pixel value varies, but not in accordance with the other pixels, the defect is categorized as photo response non-uniformity (PRNU).

The ISO 13406-2 standard defines artefacts of display panels, and the same definition of DSNU pixels is used in digital imaging.
DSNU pixels can be categorized as hot, dead or stuck pixels, which always have the maximum, the minimum or a constant value, correspondingly. (ISO 13406-2 2001)

PRNU defects are more difficult to detect because the defective pixels do not have a static value. Typically for PRNU pixels, the error of the pixel depends on temperature, exposure time and ISO settings (Theuwissen PRNU). Obviously, more heuristic algorithms are needed to detect a PRNU pixel than a DSNU pixel.

The source of the fixed pattern noise is in the manufacturing process of the sensor, where the pixel construction in silicon is not always perfect. Quite often the sensor itself may remove DSNU pixels using calibration data obtained from production line testing. Single bad pixels are not a major problem in a sensor with several million pixels, because they are almost impossible to detect in a non-zoomed image and they are easy to correct. However, several DSNU pixels can be located side by side, creating a cluster, in which case the defect is more visible and more severe.

There are also several special cases of fixed pattern noise. Hardware logic shared by pixel rows or columns may cause variation between rows and columns, which results in row or column fixed pattern noise. These can cause severe quality issues, since they create vertical or horizontal lines in the image and the human vision system is very sensitive to straight lines.

3.3.1.2 Temporal noise

Unlike fixed pattern noise, temporal noise varies over time and thus it is much more difficult to remove from images. The origins of temporal noise are mainly in the camera sensor, even though the lens system may generate some. However, the image processing pipeline may affect the noise level in a significant way. Several algorithms in image processing add digital gain to the image; thus the gain of the noise component increases too, which makes the noise more visible.
On the other hand, denoising algorithms may reduce the noise significantly in the final image, but too aggressive noise removal may reduce, for example, image resolution and sharpness.

Generally, noise is an unwanted variance in the image and it affects the sensitivity and dynamic range of a camera system. Noise can be visible especially in low light images, where a low signal level, a long exposure time and a high ISO value increase the noise, as in Figure 7, which is captured in a 30 lux environment. The camera adjusted the exposure time to 63 milliseconds and the ISO speed was 1665. To visualize the noise pattern, an originally uniform gray patch is magnified.

Figure 7 Noise in a picture captured in 30 lux

Roughly speaking, temporal noise can be divided into photon shot noise and read noise (Adimec Noise). More precisely, temporal noise can be divided into photon shot noise, dark current shot noise, reset noise, and 1/f noise (Wang 2008). Even though the terminology for temporal noise varies, the read noise can still be defined as a combination of dark current shot noise, reset noise and 1/f noise. Also, the quantization noise of the analog to digital converter can be defined as a form of temporal noise (Tian 2000).

The photon shot noise is related to the randomness of photons. It is a special kind of noise, because it is a natural property of photons and does not depend on the design of the sensor. There will always be photon shot noise in the RAW images, and the photon shot noise follows the Poisson distribution. Thus the level of the photon shot noise is the square root of the mean signal level.

Dark current shot noise, or thermal noise, depends exponentially on the temperature and it can be partially controlled by the design of the sensor (Wang 2008). The dark current defines the black level of the sensor. The black level is the mean value which a camera sensor generates without any light.
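The square root relation stated above for photon shot noise can be made concrete with a few lines of arithmetic. The sketch assumes the signal is counted in collected photoelectrons and that shot noise is the only noise source; the function names and the sample signal levels are illustrative.

```python
import math


def shot_noise(mean_signal_electrons):
    """Photon shot noise (standard deviation) of a Poisson distributed signal."""
    if mean_signal_electrons <= 0:
        raise ValueError("mean signal must be positive")
    return math.sqrt(mean_signal_electrons)


def shot_noise_limited_snr(mean_signal_electrons):
    """Signal to noise ratio when photon shot noise is the only noise source."""
    return mean_signal_electrons / shot_noise(mean_signal_electrons)


# Hypothetical mean signal levels, from a dark to a well exposed pixel.
for electrons in (100, 1_000, 10_000):
    snr = shot_noise_limited_snr(electrons)
    print(f"{electrons:>6} e-: noise = {shot_noise(electrons):6.1f} e-, "
          f"SNR = {snr:6.1f} ({20 * math.log10(snr):.1f} dB)")
```

Because the signal to noise ratio grows only with the square root of the signal, low light images stay visibly noisy regardless of the sensor design, which matches the low light example of Figure 7.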
The black level can be, for example, 5% of the maximum pixel value, but it depends on the exposure time and temperature. The black level together with the white level affects the dynamic range of the sensor because they limit the true pixel value scale.

Reset noise, 1/f noise and quantization noise represent the rest of the read noise component, which can be reduced by good sensor design.

The noise characteristics of the sensor define in part the performance of the sensor by limiting its sensitivity and dynamic range. Even when the denoising algorithms are efficient, they can still reduce other quality metrics of the image. All in all, a proper design of the sensor is essential for noise free and high quality images.

3.3.1.3 Banding

Every camera system has a certain bit depth, i.e. the digital accuracy of a pixel. In the sensor, an analog to digital converter performs a quantization where the analog signal, i.e. the electron flow, is changed to a digital number. Normally, a pixel has a bit depth from eight to sixteen bits, which means from 256 to 65536 different pixel values, correspondingly. If the bit depth is too small, the quantization may become visible in the image; this effect is called a banding or contouring artefact (Fenimore and Nikolaev 2003, Bhagavathy et al. 2007). Especially when an image contains an almost uniform area, small differences in the scene, for example in the sky, are not smooth but generate visible edges in the image.

Bit depth is not the only variable to cause this artefact. Image processing algorithms like gamma correction and tone mapping may strengthen the banding artefact in bright and dark areas of images by stretching pixel value distances between corresponding illuminations.

3.3.1.4 Green imbalance

Even if green imbalance can be understood as a special case of photo response non-uniformity (PRNU), it is such a noticeable artefact that it should be discussed separately.
The origins of green imbalance are in the Bayer filter, where green has two different color channels, green in red rows (gr) and green in blue rows (gb), and in demosaicing algorithms. The green imbalance becomes visible when there is a mismatch between the green channels. Technically, the green imbalance is PRNU between the two green channels and it is part of the noise entity of an image. The main reason for green imbalance is different cross talk between red and green rows (Guarnera et al. 2010) or an improper demosaicing method. Green imbalance causes a maze-type pattern in images, as shown in Figure 8.

Figure 8 A maze pattern caused by green imbalance artefact

3.3.1.5 Moiré

Every sensor has a resolution limit specified by its pixel size and pixel pitch, i.e. the distance between individual pixels, and by other limitations of the camera system. According to the Nyquist sampling theorem, when the details of the captured scene are smaller than the resolution limit multiplied by two, the image sensor cannot reproduce the details of the image (Imatest Moiré). High frequency details, for example textiles, can produce stripes in the captured image. These stripes are called a Moiré artefact. Quite often Moiré artefacts are avoided by using an optical low pass filter in the lens system.

Especially in video broadcasting, high frequency details may cause flickering in the stream and be a very annoying issue. In the case of still imaging, Moiré causes stripes across an area originally containing high frequency details.

3.3.1.6 Blooming

Blooming is defined as an artefact which causes blurry borders around highly exposed objects. In the worst case, the shape of the bright object becomes unrecognizable and the saturated area spreads across the whole image. When blooming occurs, pixels which have absorbed a high number of photons and have therefore become saturated start to crosstalk, i.e. spill electrons over to adjacent pixels.
This may cause problems especially in outdoor imaging due to high illumination by the sun. On the other hand, in security systems where the low light performance is crucial, bright objects may corrupt captured images or a video stream.

Arganov et al. define three different crosstalk components in a CMOS sensor: spectral crosstalk, optical spatial crosstalk and electrical crosstalk, all of which cause different artefacts in images (Arganov et al. 2003). Even though electrical crosstalk is the main reason for blooming, it is not the only one. Theuwissen defines in his famous blog seven different mechanisms which cause blooming (Theuwissen Blooming).

Fortunately, due to the design of CMOS sensors, blooming is no longer such a severe problem as it is for CCD sensors. In CCD sensors blooming may cause overflow of a whole vertical pixel line, which causes bright columns over the whole image (Adimec Blooming).

3.3.1.7 Black sun

In a black sun artefact, an extremely highly exposed object turns from white to black in the captured image. This often happens when the captured scene contains the sun and the circle of the sun becomes black in the captured image. One may think the artefact is due to overflow in the image processing pipeline, but the origins of the defect are inside the sensor’s logic. When a pixel exposure starts, some sensors read the black level value of a pixel by exposing it for a very short time (CMOSIS 2012). This is done to reduce the black level noise by subtracting the black level from the real exposed value. However, if a certain pixel is illuminated by an extremely bright object, e.g. the sun, the reset level may rise so high that the final pixel value is reduced to zero by the subtraction and therefore the pixel contains only black.

This is not as rare a problem as one may think. The issue was visible, for example, in the broadcast of the IAAF World Championships in Beijing 2015, see Figure 9.
Figure 9 Black sun image artefact in the video stream of IAAF World Championships in Beijing 2015 (Youtube)

3.3.1.8 Rolling shutter