Computer Vision: Image Analysis
UA DATALAB WORKSHOP SERIES, FALL 2023
📓 These notes: https://github.com/ua-datalab/Workshops/wiki/Computer-Vision:-Image-Analysis
📈 Presentation pdf: https://github.com/ua-datalab/Workshops/blob/main/Computer_Vision_Image_Analysis/Computer_Vision_Image_Analysis.pdf
💻 Codebase examples: https://github.com/ua-datalab/Workshops/blob/main/Computer_Vision_Image_Analysis/computer_vision_image_analysis.ipynb
📺 Zoom recording link (11-21-2023)
In today's technological landscape, Computer Vision (CV) is a transformative force, opening up possibilities and advancements across diverse domains. Its applications span a wide spectrum, from speeding up medical diagnostics by analyzing MRI scans to giving autonomous vehicles the vision to navigate complex terrain.
Computer Vision has already made significant strides, transforming retail through automated checkout systems and strengthening security through facial recognition. It also promises to simplify daily life, from automating fruit recognition for sorting in agriculture to enabling more immersive augmented reality experiences.
Yet this dynamic field has its challenges: interpreting and understanding visual data under varying conditions remains difficult. At the same time, the field offers accessible tools and methods for newcomers. From OpenCV, a library with a Python interface known for its robust image processing capabilities, to deep learning frameworks such as TensorFlow and PyTorch for designing complex neural networks, aspiring graduates and computer engineering enthusiasts have an array of sophisticated tools at their disposal.
The flexibility and accessibility of these libraries make getting started in Computer Vision both engaging and intellectually stimulating, providing fertile ground for exploration and innovation among budding professionals.
Several challenges persist in Computer Vision. Some affect both classical and CNN approaches, while others are specific to CNNs:
-
Variability and Invariance: Images vary significantly in terms of lighting, object poses, backgrounds, and other environmental factors. Example: Using images of apples taken in various lighting conditions (bright sunlight, dim indoor light, and shadowed areas).
-
Object Recognition and Classification: Accurately recognizing and categorizing objects within images, especially in cluttered scenes or for fine-grained recognition. Example: Presenting images of apples placed amidst a cluttered environment with other fruits or objects.
-
Semantic Understanding and Context: Understanding the context and semantics of images, including distinguishing objects and their relationships. Example: Showcasing images where apples are placed in diverse contexts: some in a fruit bowl, some on a tree, and some in a kitchen setting.
-
Data Annotation and Labeling: Training deep learning models, CNNs in particular, requires large volumes of accurately labeled data, and annotating and labeling such datasets is a major challenge. Example: Providing datasets of various apple varieties, sizes, and conditions (ripe, unripe, different colors).
-
Interpretability of AI Models: Building models that can explain their decision-making is mainly a concern for CNNs, whose learned features are harder to inspect than the hand-designed steps of classical CV. Ensuring interpretability is an essential challenge for deep learning models, especially in critical applications like healthcare and autonomous vehicles. Example: Showcasing how a CNN model identifies apples in images and discussing the challenge of explaining why the model treated specific regions or features as apples.
One of the most popular libraries for classical CV is OpenCV. OpenCV works alongside NumPy: images are loaded as NumPy arrays of pixel values, which can then be measured and modified. Initially developed by Intel, OpenCV was released under the open-source BSD License (versions <=4.4.0) and then under the Apache 2 License (versions >=4.5.0), so it has remained open source throughout.
OpenCV focuses on manipulating and processing images before information is extracted from them. It is used extensively in well-established object detection pipelines, where it prepares and modifies images for more modern CV methods such as CNNs. Among other functions, OpenCV can be used for image filtering, transformations (geometric and miscellaneous), motion analysis and object tracking, image segmentation, and feature and object detection.
In the example Jupyter Notebook, we use OpenCV to detect and count objects (apples) in an image. To do this, we first detect the edges of each object and then count the objects. The process involves the following steps:
-
Convert the image to grayscale (black and white). Since our goal is to count objects in a still image, color is not needed, and removing it simplifies the later steps.
-
Blur the image with a Gaussian blur to reduce noise.
-
Find the edges of the blurred objects using the Canny edge detector.
-
Find and count the contours of the objects.
CNNs are widely used in computer vision tasks like image classification, object detection, and segmentation. Revisiting CNNs:
- CNNs are a class of deep neural networks designed for processing grid-like data, primarily used for image analysis.
- They consist of layers that automatically learn hierarchical features from input data, reducing the need for manual feature engineering.
- Convolutional layers use learnable filters to extract features from local regions of the input.
- Pooling layers downsample feature maps, preserving important information and reducing spatial dimensions.
- Fully connected layers at the end of the network make class predictions based on the learned features.
- They excel at capturing intricate patterns and are robust to variations like object positioning in images.
- Benefits include hierarchical feature learning, translation invariance, scalability, and state-of-the-art performance in visual tasks.
- CNNs have transformed how we approach image analysis, automating feature extraction and enabling end-to-end learning.
(image credits: Towards Data Science. Original image developed by MathWorks)
The convolutional layer is a fundamental component of CNNs, designed to efficiently process grid-like data such as images. It plays a crucial role in extracting meaningful features from the input by applying convolution operations: filters (kernels) scan across the input and produce feature maps.
(image source: StackOverflow)
(image credits: Convolution, Wikipedia)
In the figure above, a 3x3 kernel is applied to the pixel values of the image. This is called a convolution operation, and the resulting output is referred to as a feature map.
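The convolution operation can be sketched in a few lines of NumPy. Each output value is the sum of the element-wise products between the kernel and the image patch underneath it; with stride 1 and no padding, an HxW image and a kxk kernel give an (H-k+1)x(W-k+1) feature map. The 5x5 image and the vertical-edge kernel are made-up values for illustration; in a CNN the kernel values are learned during training rather than chosen by hand. Like deep learning libraries, the function computes cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    As in CNN libraries, this is cross-correlation: the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Each output value is the sum of element-wise products.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# A 5x5 "image" with a vertical edge: dark on the left, bright on the right.
image = np.array([[0, 0, 0, 10, 10]] * 5, dtype=float)

# A hand-picked 3x3 vertical-edge kernel (a CNN would learn its kernels).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

fm = conv2d(image, kernel)
print(fm)  # 3x3 feature map; every row is [0, 30, 30]
```

The large values in the output mark the positions where the kernel overlaps the vertical edge, which is the kind of local feature a learned filter picks up.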
An additional layer, the Rectified Linear Unit (ReLU), replaces negative values in the feature map with zeros: ReLU(x) = max(0, x).
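A minimal NumPy sketch of ReLU applied element-wise to a small, made-up feature map:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return np.maximum(0, x)

feature_map = np.array([[-3.0, 2.0],
                        [ 0.5, -1.0]])
activated = relu(feature_map)
print(activated)  # negatives become 0, positives are unchanged
```

Applying a non-linear function like this between layers is what lets stacked layers model non-linear patterns; without it, consecutive convolutions would collapse into a single linear operation.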
The Pooling Layer is often called the downsampling layer, as it reduces the spatial size of the feature maps. This retains the most important features while lowering complexity. By "summarizing a region" into a single value, pooling can help prevent overfitting, and it improves computational efficiency by reducing the amount of computation needed in later layers.
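These notes do not say which pooling operation the workshop uses; max pooling with a 2x2 window and stride 2 is a common choice and is assumed in this NumPy sketch. Each non-overlapping 2x2 window is replaced by its largest value, so a 4x4 feature map shrinks to 2x2 (16 values down to 4).

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size   # drop leftover rows/cols that don't fill a window
    trimmed = feature_map[:h, :w]
    # Reshape to (window rows, size, window cols, size), then take the max per window.
    windows = trimmed.reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 6]], dtype=float)
pooled = max_pool2d(fm)
print(pooled)  # [[4. 5.]
               #  [3. 7.]]
```

The reshape-and-reduce approach avoids explicit Python loops; average pooling would simply replace `.max` with `.mean`.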
The Flattening layer converts the 2D feature maps into a 1D vector. This transformation prepares the extracted features for input to the fully connected layers, which make global predictions based on the flattened features. Neurons in these layers are connected to all neurons from the previous layer.
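A minimal NumPy sketch of flattening followed by one fully connected layer. The feature-map sizes, the three hypothetical classes, and the random weights are assumptions for illustration (a trained network would have learned weights), and the softmax at the end is a common way to turn the layer's scores into class probabilities rather than something stated in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of the last pooling layer: 8 feature maps of size 5x5.
pooled_maps = rng.standard_normal((8, 5, 5))

# Flatten: 8 * 5 * 5 = 200 values in a single 1D vector.
flat = pooled_maps.reshape(-1)

# Fully connected layer with 3 hypothetical output classes.
# Each output neuron has one weight per input value, i.e. it connects to all of them.
num_classes = 3
weights = rng.standard_normal((num_classes, flat.size)) * 0.01
bias = np.zeros(num_classes)
logits = weights @ flat + bias

# Softmax (assumed here) turns the scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(flat.shape, logits.shape, probs)
```

The weight matrix has one row per class and one column per flattened input, which is why fully connected layers hold most of the parameters in many small CNNs.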
(image credits: The Most Intuitive and Easiest Guide for Convolutional Neural Network, Towards Data Science)
-
OpenCV:
- OpenCV can also be used for video processing; the OpenCV website offers some tutorials in its Other tutorials section:
- A good step-by-step how-to on video analysis using OpenCV by Kardi Teknomo
- Video Data Processing with Python and OpenCV
-
CNN:
- What are Convolutional Neural Networks? (an accessible and simple explanation of CNNs)
- How convolutional neural networks work, in depth (a more advanced video explaining CNNs)
- But what is convolution? (popular science and math YouTuber 3Blue1Brown covering the topic of convolution)
- Stanford's excellent Deep Learning for Computer Vision course
Papers with Code offers an excellent section on real-life applications of CV.
UArizona DataLab, Data Science Institute, University of Arizona, 2025.