Digital Camera Systems

Digital cameras convert light into electronic signals. The two types of image sensors most often used are: (a) Charge-Coupled Devices (CCD); and (b) Complementary Metal-Oxide-Semiconductor (CMOS) sensors. Both sensor types are fabricated in silicon and rely on photodiodes that release electrons when photons strike the diode (Sarkar & Theuwissen, 2013; Botha, 2015).

Image courtesy of Carnegie Robotics.

Working Principles

CCD sensors have a photodiode and a storage cell (or buffer bin) for each pixel. Each diode accumulates a charge proportional to the light intensity once the shutter opens. The charge is then shifted through neighbouring storage cells and converted to a digital value by a single Analogue-to-Digital Converter (ADC). For an 8-bit sensor, the digital values range from zero (no illumination) to 255 (full saturation). CMOS sensors convert incident photons into electron-hole pairs in the same way, but perform the charge conversion within each pixel rather than during the readout phase. CMOS sensors are: (a) cheaper and simpler to manufacture; (b) more power efficient; and (c) capable of operating at very high frame rates at megapixel resolution. However, because each pixel has its own amplifier, they are more susceptible to fixed-pattern noise than CCD sensors (Sarkar & Theuwissen, 2013; Botha, 2015).
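To make the conversion step concrete, the following is a minimal sketch (not taken from the cited sources) of how accumulated photodiode charge might be quantized to an 8-bit digital value; the photon flux, full-well capacity and function name are illustrative assumptions.

```python
def quantize_pixel_charge(photon_flux, exposure_time_s, full_well_e=20_000, bit_depth=8):
    """Illustrative model only: photons accumulate charge during the exposure,
    and the ADC maps that charge onto 2**bit_depth discrete levels
    (0 = no illumination, 255 = full saturation for an 8-bit sensor)."""
    # Charge is proportional to light intensity and exposure time
    # (quantum efficiency is folded into photon_flux for simplicity).
    charge = photon_flux * exposure_time_s
    # Clip at the full-well capacity: photons beyond saturation are lost.
    charge = min(charge, full_well_e)
    max_dn = 2 ** bit_depth - 1
    return int(round(charge / full_well_e * max_dn))

# A bright pixel saturates; a dim one lands near the bottom of the range.
print(quantize_pixel_charge(photon_flux=5e6, exposure_time_s=0.01))  # 255
print(quantize_pixel_charge(photon_flux=4e5, exposure_time_s=0.01))  # 51
```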

Regardless of the sensor type (CCD or CMOS), there are additional parameters that contribute to the overall image quality (Botha, 2015):

  • Sensor sensitivity (ISO) is a measure of the amplification applied to the signal prior to digital conversion; a higher ISO setting requires less light to achieve the same output as a lower ISO setting.
  • Shutter exposure time is the time that a photodiode is exposed to light; a longer exposure gives the photodiode more time to accumulate charge. Short exposure times in dark conditions lead to under-exposure of the sensor, while long exposure times in bright conditions lead to over-exposure. Furthermore, the type of shutter used (global or rolling) affects the ability to capture a moving object, with rolling shutters prone to distortion of fast-moving objects (Sarkar & Theuwissen, 2013).
  • The lens aperture is the opening through which light passes to reach the sensor; a larger opening admits more light. The aperture is specified by the F-number, the ratio of the lens focal length to the effective aperture diameter, so a lower F-number denotes a larger opening. Increasing the F-number decreases the image exposure but increases the depth of field (the combined effect of ISO, exposure time and F-number on exposure is illustrated in the sketch after this list).
  • Lens focal length is the distance from the optical centre of the lens to the plane at which parallel light rays converge. It is related to the magnification of the lens and therefore determines the field of view captured in the image.

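As a rough illustration of how the parameters above interact, the sketch below scales a notional image exposure linearly with ISO and shutter time and inversely with the square of the F-number. The function name and constants are illustrative assumptions, not drawn from the cited sources; only ratios between the returned values are meaningful.

```python
def relative_exposure(iso, shutter_time_s, f_number):
    """Notional exposure: grows with ISO gain and exposure time,
    falls with the square of the F-number (smaller opening, less light)."""
    return iso * shutter_time_s / (f_number ** 2)

# Halving the shutter time while opening the aperture by one stop
# (f/4 -> f/2.8) leaves the exposure roughly unchanged.
a = relative_exposure(iso=200, shutter_time_s=1 / 250, f_number=4.0)
b = relative_exposure(iso=200, shutter_time_s=1 / 500, f_number=2.8)
print(round(a, 3), round(b, 3))  # ~0.05 for both
```
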
Application to Proximity Detection

Camera sensors, like RADAR and LIDAR, can be stand-alone systems, with no strict requirement for communication infrastructure between the Local Object (LO) and the Remote Object(s) (RO). However, unlike many of the sensor modalities documented in this toolkit (i.e. near- and far-field applications using Electromagnetic (EM) Radio Frequency (RF) sensors), camera sensors do not inherently provide object pose or state (position, orientation, velocity and acceleration) estimates. This is a noted limitation of camera sensors: additional design work (i.e. purpose-built software or algorithms) is required before a Proximity Detection System (PDS) can be fully developed.

That said, there are numerous methods for estimating object pose and state using cameras, ranging from: (a) monocular (single-camera) implementations that recover range information (Taylor, Geva, & Boles, 2004); to (b) stereography, a technique that uses multiple cameras separated by an established baseline to estimate the range (including direction) to objects (Botha, 2015). Furthermore, the classification and detection of objects can be performed by a number of different techniques, most notably feature detection and Deep Learning-based methods (Wu, Sahoo, & Hoi, 2020). Regardless of the technique(s) used, the system must robustly perceive the environment and identify any immediate collision threats, whether moving or stationary, in a timely and efficient manner.
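As an example of the stereo approach, the standard pinhole model recovers range from the disparity between the two images as Z = f·B/d, where f is the focal length expressed in pixels, B the camera baseline and d the disparity. The sketch below applies this relationship; the function name and numbers are illustrative assumptions rather than values from the cited sources.

```python
def stereo_range(disparity_px, baseline_m, focal_length_px):
    """Pinhole stereo model: Z = f * B / d, with f in pixels, B in metres
    and d the horizontal disparity of the same feature in the two images."""
    if disparity_px <= 0:
        raise ValueError("Disparity must be positive; zero implies an object at infinity.")
    return focal_length_px * baseline_m / disparity_px

# A 0.12 m baseline rig with a 700-pixel focal length observing a
# 14-pixel disparity places the object at 6.0 m range.
print(stereo_range(disparity_px=14, baseline_m=0.12, focal_length_px=700))  # 6.0
```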

Advantages

  • Large Field-of-View (FOV) possible with the use of wide-angle lenses (e.g. fisheye lenses) or multi-camera systems
  • No infrastructure is required on the remote object(s) (although visually distinct markers or stickers may be used to enhance performance)
  • Suitable for both surface and underground applications
  • Low power requirements and low cost relative to other sensors

Limitations

  • Camera-based systems are likely to be affected by harsh environmental conditions (e.g. poor lighting, dust, rain, fog)
  • Camera-based systems require maintenance to clear dirt and other debris from the lens
  • Detection, classification and tracking can, in some cases, be computationally expensive
  • Detection limited to Line-of-Sight
  • Stereo-based solutions have either limited range (compared to LIDAR-based solutions) or reduced FOV