I spent a year during my master’s researching methods for adapting machine learning to the interpretation of seismic data. Along the way, I built up knowledge of programming languages such as Python, of geophysics, of traditional machine learning and deep learning, and of the Linux OS, to name a few. I came into the research with little to no knowledge of the topics, what to expect, or how to use the tools needed; I left with a whole new skillset that can be applied in many areas. I’ve used both Python and MATLAB for research tasks, and I’ve selected two project tasks from my research, one in each language, to describe below. A description of one of the projects, with a link to its code, can be found below.
clipDL – Python
With autonomous roadway vehicles becoming more and more common in everyday life, their ability to react to unexpected events is critical to the lives of others. Training neural networks with supervised learning techniques such as R-CNNs, however, requires large labeled datasets that contain positive and negative examples of such events. To address the need for video footage of unexpected events in driving conditions, such as those listed under the four categories in the figure below, I created a tool to download and clip YouTube videos containing dashcam footage of the events. The tool is accompanied by a CSV file containing links, durations, and category information for 238 clips, with a total duration of 1 hour, 10 minutes, and 31 seconds, that exemplify the aberrant driving events listed below. The general classification reflects whether or not something potentially actionable is happening, while the specific classification gives more insight into what is happening, so that it can be determined whether the event requires action by the vehicle. A specific classification could add, for example, the location of a car accident: whether it is in a lane that the vehicle needs to move out of, or whether the vehicle can keep traveling in its lane.
As dashcam footage is readily available in YouTube compilation videos, I used these compilations to create a dataset for deep learning. Because each compilation interleaves footage of many separate events, I needed to download the videos and then clip them to isolate the individual events before they could be used to train deep learning frameworks. I used Python’s FFmpeg, csv, and youtube_dl packages so that the code automatically reads the list of videos to be downloaded, downloads the videos, and clips them to the specified times. I have included a link to the code on my GitHub here. Follow the readme for more information on the code and how to use it.
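As a rough illustration of the read-and-clip step described above, here is a minimal sketch. It is not the code from the repository: the CSV column names (`url`, `start`, `end`, `category`) and the helper names are assumptions made for illustration, and the download step (handled by youtube_dl in the real tool) is left out. The sketch only shows how clip times from a CSV row might be turned into an ffmpeg command.

```python
import csv
import io
import subprocess


def read_clip_list(csv_text):
    """Parse clip rows from CSV text.

    The columns url, start, end, category are hypothetical; the real
    CSV layout may differ.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_seconds(timestamp):
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_clip_command(source_path, start, end, output_path):
    """Build an ffmpeg command that cuts the span start..end out of a video."""
    start_s = to_seconds(start)
    duration = to_seconds(end) - start_s
    if duration <= 0:
        raise ValueError("clip end must be after clip start")
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),   # seek to the start of the clip
        "-i", source_path,
        "-t", str(duration),   # keep this many seconds
        "-c", "copy",          # copy streams without re-encoding
        output_path,
    ]


def clip(source_path, start, end, output_path):
    """Run ffmpeg to write the clip; requires ffmpeg on the PATH."""
    subprocess.run(
        build_clip_command(source_path, start, end, output_path),
        check=True,
    )
```

Copying streams with `-c copy` is fast but cuts on keyframes, so clip boundaries can be slightly off; re-encoding instead would give frame-accurate cuts at the cost of speed.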
Three Stage Similarity Search – MATLAB
I created and filed a provisional patent for a search algorithm based on three main components: locating a maximum similarity index value, windowing, and an overlapping process (to minimize computations and maximize the robustness of the search). This content-based retrieval system can search a full image for the small region that is most similar to the features present in a reference image and return that region’s coordinates and similarity value. I used the curvelet transform, which effectively captures the curved features of post-stack seismic data, to compute the similarity metric proposed by Alfarraj et al. between the search area’s current window and the reference image.
The system recovers the location of a random 99×99 patch of data extracted from a larger seismic image with high accuracy (76.1%) and with a small displacement from the true location (1.77 pixels in rows, 9.27 pixels in columns), demonstrating its robustness in finding the global minimum in dissimilarity rather than a local minimum. It was found that the similarity metric must be capable of handling small shifts into null regions with no texture in order to find the global minimum over local minima. This makes the algorithm very efficient for object tracking in 3D volumes, such as seismic or MRI data. In the image below, the green box shows the location from which a reference image was extracted, 5 slices back in the volume. The red box shows the most probable location, along with the returned similarity value, of that reference image, or object, in the current slice, which is 5 slices away. Note that the texture is starting to shift slightly to the right.
For this algorithm, MATLAB was used for both computation and visualization. The code is designed to be computationally efficient in that it minimizes the number of similarity computations needed to find the most probable location of the reference image. The CurveLab MATLAB software was used to compute the curvelet coefficients of the data, alongside standard MATLAB packages.
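To give a feel for how a staged search can cut down the number of similarity computations, here is a minimal Python sketch. It is not the patented algorithm or the MATLAB code: the way the three stages are arranged here (a coarse strided scan, a window around the coarse maximum, then a dense scan with overlapping windows) is my assumption for illustration, and the similarity function is a simple stand-in (negative mean absolute difference) rather than the curvelet-based metric of Alfarraj et al. The function names and the `stride` parameter are hypothetical.

```python
def similarity(image, ref, top, left):
    """Stand-in similarity: negative mean absolute difference.

    Higher is more similar. The original work uses a curvelet-based
    metric here instead.
    """
    h, w = len(ref), len(ref[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            total += abs(image[top + r][left + c] - ref[r][c])
    return -total / (h * w)


def scan(image, ref, rows, cols):
    """Return (score, top, left) for the best-scoring candidate position."""
    best = None
    for top in rows:
        for left in cols:
            score = similarity(image, ref, top, left)
            if best is None or score > best[0]:
                best = (score, top, left)
    return best


def three_stage_search(image, ref, stride=4):
    """Coarse-to-fine search for ref inside image (lists of lists)."""
    max_top = len(image) - len(ref)
    max_left = len(image[0]) - len(ref[0])

    # Stage 1: coarse scan on a strided grid to locate the maximum similarity.
    _, top, left = scan(
        image, ref,
        range(0, max_top + 1, stride),
        range(0, max_left + 1, stride),
    )

    # Stage 2: window the search area around the coarse maximum.
    r0, r1 = max(0, top - stride), min(max_top, top + stride)
    c0, c1 = max(0, left - stride), min(max_left, left + stride)

    # Stage 3: dense scan with overlapping windows inside that region only.
    return scan(image, ref, range(r0, r1 + 1), range(c0, c1 + 1))
```

A larger `stride` means fewer similarity computations in the coarse stage but a higher chance of landing near a local maximum instead of the global one, which is why the behavior of the similarity metric under small shifts matters so much.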