Two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) is a powerful proteomics technique for separating and identifying proteins. Image processing of these gels plays a major role in the success of protein identification, mainly because noise, dust particles, overlapping spots, fingerprints, and cracks can reduce the ability to detect true spots.
Approaches such as thresholding and watershed algorithms have been used for spot detection. Both have their pros and cons, and it is important to know which one to use depending on the quality of the image. Each algorithm was implemented as a separate Python program. The main libraries used were scikit-image (for image processing in Python), numpy (to deal with arrays), and scipy (to locate the centroid of each spot), along with a few other modules. To run a program, the image and the Python script must be placed in the same folder. The command used to run the thresholding algorithm is:
python sd_thresholding.py inimage.png outimage.png
Here the first argument is the name of the script, the second is the name of the 2-D PAGE image to be processed, and the last is the name of the output image. The image type (gif, png, jpg…) is up to the user and can be changed. Besides the output image, the program prints the number of spots/segments detected in the input image and the corresponding intensity of each spot in the original image. The watershed program is run in the same way:
python sd_watershed.py inimage.png outimage.png
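The thresholding steps described above (separate spot pixels by intensity, group them into segments, locate each centroid, report the intensity) can be illustrated with a minimal sketch. It is not the code from the repository: the function name `detect_spots_by_threshold`, the fixed threshold of 128, the assumption that spots are darker than the background, and the synthetic test image are all illustrative choices.

```python
import numpy as np
from scipy import ndimage as ndi


def detect_spots_by_threshold(gray, threshold):
    """Find dark spots in a 2-D grayscale array (hypothetical helper).

    Pixels with a value below `threshold` are treated as spot pixels.
    Returns the (row, col) centroid and the mean original intensity
    of every connected segment.
    """
    mask = gray < threshold                       # binary spot mask
    labels, n_spots = ndi.label(mask)             # connected segments
    index = np.arange(1, n_spots + 1)
    centroids = ndi.center_of_mass(mask, labels, index)
    intensities = ndi.mean(gray, labels, index)   # intensity in the original image
    return centroids, intensities


# Synthetic "gel": bright background with two dark rectangular spots.
gel = np.full((20, 20), 200.0)
gel[2:5, 2:5] = 50.0
gel[10:14, 12:16] = 80.0

centroids, intensities = detect_spots_by_threshold(gel, threshold=128)
print(len(centroids), "spots detected")
for (row, col), value in zip(centroids, intensities):
    print(f"centroid=({row:.1f}, {col:.1f}) mean intensity={value:.1f}")
```

On a real gel, the image would first be loaded as a grayscale array, and the threshold would have to be chosen for that image rather than hard-coded.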
Implementing the thresholding algorithm was not hard, especially with the help of the image processing libraries available in Python. The algorithm depends on defining the ‘right’ threshold, so this decision has to be made carefully: relaxing the threshold may increase the number of false positives. Another problem with thresholding is that it fails in the presence of artifacts and noise. Since it looks only at pixel intensity, it becomes harder to tell what is a segment and what is not.
Implementing the watershed algorithm was a bit more complicated. At first the method was hard to understand, but by the end I could see that watershed is the more sophisticated approach. Reducing the noise in the image matrix M was easier than I expected with the help of skimage.morphology. Depending on how the markers are picked, watershed may also lead to the detection of false positives. Furthermore, smoothing the image did not improve the quality of the detection in my tests.
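To make the role of the markers concrete, here is a minimal sketch of a marker-based watershed. It is an assumption-laden illustration, not the repository's implementation: the helper `detect_spots_by_watershed`, the two thresholds, the use of a grayscale closing from skimage.morphology for noise removal, and the choice of markers (dark spot cores found with a stricter threshold) are all hypothetical. The synthetic image contains two overlapping spots and one single-pixel speck.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import closing, disk
from skimage.segmentation import watershed


def detect_spots_by_watershed(gray, spot_threshold, core_threshold):
    """Marker-based watershed for dark spots (hypothetical helper).

    spot_threshold: relaxed threshold that defines the region to segment.
    core_threshold: stricter threshold whose segments serve as markers.
    """
    # Grayscale closing removes dark specks smaller than the footprint.
    smoothed = closing(gray, disk(1))
    mask = smoothed < spot_threshold
    # One marker per dark core; a different marker choice gives a
    # different segmentation.
    markers, n_markers = ndi.label(smoothed < core_threshold)
    # Flood the intensity landscape from the markers, inside the mask only.
    labels = watershed(smoothed, markers, mask=mask)
    return labels, n_markers


# Synthetic "gel": two overlapping Gaussian spots plus a one-pixel speck.
yy, xx = np.mgrid[0:40, 0:60]


def blob(cy, cx, sigma=5.0):
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))


gel = 255.0 - 200.0 * (blob(20, 22) + blob(20, 38))
gel[5, 5] = 0.0  # dust-like artifact

# Plain thresholding for comparison: the pair merges and the speck counts.
_, n_threshold = ndi.label(gel < 180)

labels, n_markers = detect_spots_by_watershed(
    gel, spot_threshold=180, core_threshold=100
)
index = np.arange(1, n_markers + 1)
centroids = ndi.center_of_mass(labels > 0, labels, index)

print("segments from plain thresholding:", n_threshold)
print("spots from watershed:", labels.max())
```

In this sketch plain thresholding reports the two overlapping spots as one segment and counts the speck as another, while the watershed splits the pair and the closing step discards the speck. Lowering `core_threshold` far enough would leave no markers at all, which illustrates how sensitive the result is to the marker choice.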
These images show the three steps used in thresholding: the input gel image, the detection of the spots by intensity, and the identification of the centroids (colored in red).
Both algorithms are available in the GitHub repository: https://github.com/izabelcavassim/Proteomics/