Artwork identification from wearable camera images for enhancing experience of museum audiences

Rui Zhang, Hubei University of Arts and Science, China, Yusuf Tas, Australian National University, Australia, Piotr Koniusz, Australian National University, Australia


Recommendation systems based on image recognition could prove a vital tool in enhancing the experience of museum audiences. However, for practical systems utilizing wearable cameras, a number of challenges exist which affect the quality of image recognition. In this pilot study, we focus on recognition of museum collections by using a wearable camera in three different museum spaces. We discuss the application of wearable cameras, and the practical and technical challenges in devising a robust system that can recognize artworks viewed by the visitors to create a detailed record of their visit. Specifically, to illustrate the impact of different kinds of museum spaces on image recognition, we collect three training datasets of museum exhibits containing a variety of paintings, clocks, and sculptures. Subsequently, we equip selected visitors with wearable cameras to capture the artworks they view as they stroll along exhibitions. We use Convolutional Neural Networks (CNN) which are pre-trained on the ImageNet dataset and fine-tuned on each of the training sets for the purpose of artwork identification. In the testing stage, we use the CNNs to identify artworks captured by the visitors with a wearable camera. We analyze the accuracy of their recognition and provide an insight into the applicability of such a system to further engage audiences with museum exhibitions.

Keywords: Wearable camera, image recognition, museum artworks, audiences, experience, CNN

1. Introduction

A vast number of approaches exist dedicated to engaging and educating audiences in museums, e.g. augmented reality, mobile guides, interactive collections and 3D displays, to name a few. Artworks in museums engage visitors with their past experiences and trigger an affective response, which constitutes a vital aspect of a positive museum experience (Alelis et al., 2015). The value of emotional experiences in museums has been linked to reinforced trust, increased chances of recurring visits, as well as gaining donations (Suchy, 2006).

However, the experience of visitors is often incomplete because of the limited space dedicated to museum exhibitions, and personal time constraints during the visit. Beer (1987) pointed out that museum visitors spend less than one minute with each artwork during a typical visit. To a large extent, the audience has a limited idea of the artworks they want to view or the topics they are excited to cover. Therefore, they visit museums based on personal recommendations, advertisements, or a rough idea of the topics a museum covers. Viewers often adopt a fast pace as they stroll through the exhibition space, which results in an incomplete or repetitive experience. Moreover, museums and cultural sites often lack interactive or personalized entertainment gadgets, guide systems, and other technology to customize visits efficiently (Baraldi et al., 2007).

It is undeniable that museum audiences have access to smart phones and virtual interactive technology. However, robust guide systems that help satisfy their expectations and enhance their emotional experience are still rare. Kuflik et al. (2007) proposed a system for customizing the user’s experience, which employs statistical machine learning to infer visitors’ interests from their answers to a pre-specified questionnaire. By analogy, in order to aid a museum curator’s work, wearable or security cameras could provide input to autonomous software which in turn would analyze audiences’ preferences inside the museum. Such a system could count visitors, capture the time they spend with specific artworks, or even attempt to recognize their mood from facial expressions, in order to isolate the most popular artworks and to take visitors’ likes and dislikes into account. However, wearable devices have limited processing power and memory, and existing recognition approaches for such devices are based on so-called local feature descriptors (Dalens et al., 2014). Nonetheless, more robust end-to-end recognition systems such as Convolutional Neural Networks (CNN) have been shown to be particularly well suited for object category recognition (Krizhevsky et al., 2012). We therefore assess the suitability of CNNs for image recognition of museum artworks captured with wearable cameras. CNNs require a lot of computational resources at their training stage. However, once trained, they can perform real-time recognition on Android-based systems with a camera.

In this work, we use wearable cameras to capture images of artworks “in-the-wild” as audiences stroll along three different museum spaces and interact with various artworks. We use the collected data to study the ability of CNNs to identify specific artworks in images.

As artworks vary from paintings, to sculptures, to other unusual rigid and non-rigid shapes and texture forms, we illustrate the impact of different types of museum spaces on image recognition. Specifically, we first collect non-occluded images of art pieces in each exhibition space with a phone camera. Next, we use the database of images collected by the audiences as they stroll with wearable cameras for testing recognition accuracy.

In the training stage, we use a CNN pre-trained on the ImageNet dataset (Russakovsky et al., 2015) and fine-tune it on each of our datasets for the purpose of artwork identification. Image recognition in this setting faces major technical challenges such as non-planar sculptures, glare of protective cabinets, reflective properties of surfaces, background clutter, occlusions, rotations, scale changes, viewpoint changes, lighting variations, motion blur, and other limiting factors. This work is therefore conducted as a pilot study to identify the impact of these phenomena on recognition. The results will provide a better understanding of whether a wearable camera-based system can be used to help audiences engage with museum exhibitions, and whether artworks reliably identified in wearable camera images could be used as an input for a recommendation system.

2. Artwork identification with wearable cameras

Our work aims to identify artworks using wearable cameras in the context of the museum. Our hope is that museums might benefit from wearable technology in order to improve guidance and management of audiences. For this purpose, we choose three different types of museum spaces that pose varied challenges in terms of image capturing with wearable cameras.

Shenzhen Art Museum, located in Shenzhen, Guangdong Province, China, has a diverse collection of artworks such as traditional Chinese paintings, oil paintings, prints, sculptures, calligraphy, watercolors, caricatures, paper-cuttings, and photographic works. For this study, we capture the Chinese traditional paintings from this museum.

The Palace Museum, located in Beijing, China, is home to the Clock and Watch Gallery as well as the Indian and Chinese Sculpture Exhibition (AD 400-700). The collections in the Clock and Watch Gallery consist of more than two hundred clocks from the 18th century. The sculptures of the Indian and Chinese Sculpture Exhibition mainly include Buddhist statues from India and China from AD 400 to AD 700.

2.1 Data collection

In order to train a recognition algorithm, we needed to collect a dataset of “objects to identify.” For this purpose, we used an ordinary Android phone. To account for viewpoint and scale changes, we captured between two and six photos of each artwork from different viewpoints and distances. For testing purposes, we equipped six volunteers with a wearable camera and asked them to walk the exhibition space and interact with artworks. The wearable camera was configured to capture a picture every 10 seconds. Afterwards, we annotated these images with labels assigned to the artworks that can be seen in each image.

The Shenzhen Paintings dataset consists of 79 distinct paintings that were displayed in the museum during the capturing process, each photographed several times, resulting in a total of 369 images. Figure 1 illustrates that these paintings were captured from several viewpoints. We also included a background category representing museum surroundings, which consists of 27 images, and a spurious category of 170 miscellaneous paintings that were not on display. The latter subset helps to refine the classifier, which has to distinguish between the 79 specific instances of paintings, other possible artworks, and the background. This resulted in 566 training images. For the testing set, we equipped six volunteers with the wearable camera and collected six different splits, as detailed below.

Figure 1: examples of Shenzhen paintings. For the training set, we captured paintings from various viewpoints

Split 1 contains 86 images from the wearable camera, which was mounted at the right-hand side pocket at upper chest height. Split 2 contains 93 images from the camera mounted on the right-hand side of a jacket zipped up to chest height. Split 3 contains 54 images from the partially rotated camera mounted on the left-hand side belt of a backpack at the mid-chest height. Split 4 includes 86 close-up images from the camera mounted on the collar. Splits 5 and 6 contain 91 and 105 images from the camera mounted on a handbag strap at chest height and left-hand side bottom, respectively.

Figure 2: Shenzhen paintings. Top and middle rows show that the geometric transformations resulting from capturing the test set by the wearable camera are large. They include perspective changes, zoom, rotations and cropping. The bottom row also shows an occlusion by person, glare, motion blur and an occlusion by hand

Figure 2 illustrates images captured by the wearable camera and resulting transformations which make recognition a challenge. In total, the testing set resulted in 515 images of paintings. We annotated each image with ground truth labels that indicate the paintings which are visible in these images (ordered from the most visible artwork to the least visible one). During the training stage, we chose one of the splits for testing and the remaining five splits for validation. Therefore, to obtain accuracy on all six splits, we had to repeat the training six times. To enhance our study by recognizing artworks other than paintings, which are planar, we collected the following datasets:

The Clocks dataset consists of 113 distinct clocks, each photographed several times, resulting in 394 images. Additionally, we captured 27 images of backgrounds not containing any clocks. For validation, we captured a separate set with the Android camera, which contains 259 images. Lastly, for testing, we devised two splits captured by two volunteers consisting of 182 and 141 images. They were captured with a camera mounted on the pocket (the top of chest) and on the handbag belt (mid-chest) with straight and rotated orientations, respectively. Overall, this resulted in 653 training and 323 testing images. Examples of clocks from training and testing sets are shown in figure 3 (top) and figure 4 (top).

The Sculptures dataset consists of 44 distinct sculptures, each photographed several times, resulting in 206 images. An additional two categories were created which consist of photos of sculpture descriptions which may contain only tiny fragments of sculptures and 27 images of background. Two testing splits were captured by volunteers and resulted in 80 and 50 images, respectively. The cameras were mounted on the handbag belt (mid-chest) with clockwise and counterclockwise orientations, respectively. When testing on the first split, the second one is used for validation, and vice versa. Overall, training and testing sets resulted in 233 and 130 images, respectively. While this is the smallest testing set, it is also the most challenging, due to large nonplanar sculptures on display in several locations, which include other sculptures in the background. Examples of sculptures from training and testing sets are shown in figure 3 (bottom) and figure 4 (bottom).
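The split protocol described in this section (one wearable-camera split held out for testing, the remaining splits used for validation, repeated once per split) can be sketched as follows. This is a minimal illustration, not the authors' code: the split sizes are taken from the text, while the split names for the Sculptures dataset and all function names are assumptions introduced here. The Clocks dataset is left out because it uses a separate validation set captured with the Android camera.

```python
# Sizes of the wearable-camera splits as reported in section 2.1.
# The names "sp1".."sp6" follow the labels used for the Paintings splits;
# the Sculptures split names are hypothetical.
PAINTING_SPLITS = {"sp1": 86, "sp2": 93, "sp3": 54, "sp4": 86, "sp5": 91, "sp6": 105}
SCULPTURE_SPLITS = {"sp1": 80, "sp2": 50}


def leave_one_split_out(split_sizes):
    """Yield (test_split, validation_splits) pairs, one per training run."""
    names = list(split_sizes)
    for test_name in names:
        validation = [name for name in names if name != test_name]
        yield test_name, validation


def describe(split_sizes):
    """Return one row per run: held-out split, test size, validation size."""
    rows = []
    for test_name, validation in leave_one_split_out(split_sizes):
        n_test = split_sizes[test_name]
        n_val = sum(split_sizes[name] for name in validation)
        rows.append((test_name, n_test, n_val))
    return rows


painting_runs = describe(PAINTING_SPLITS)
sculpture_runs = describe(SCULPTURE_SPLITS)

for name, n_test, n_val in painting_runs:
    print(f"Paintings: test on {name} ({n_test} images), validate on {n_val} images")
for name, n_test, n_val in sculpture_runs:
    print(f"Sculptures: test on {name} ({n_test} images), validate on {n_val} images")
```

Each row corresponds to one fine-tuning run, which is why obtaining accuracy on all six Paintings splits requires six rounds of training.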

Figure 3: examples of pieces from the Clock and Sculptures training sets are given in the top and bottom row, respectively. Note the non-planarity of these pieces as well as glare from the protective glass
Figure 4: examples from the Clock and Sculptures testing sets are given in the top and bottom row, respectively. Glare, viewpoint changes, rotations, background clutter, occlusions, and salt-and-pepper noise (Gonzalez & Woods, 2006) occur frequently

2.2 Image recognition

For the purposes of artwork identification, we employ one of the latest CNN architectures, known as VGG16 (Simonyan & Zisserman, 2014), which consists of 13 so-called convolutional layers and three fully connected layers. This results in an extremely large number of network parameters that need to be inferred in the training stage. Therefore, we pre-train it with the ImageNet dataset containing over 14 million images and 1000 object categories. Subsequently, we apply image augmentations (Krizhevsky et al., 2012) to the training data we collected and fine-tune the VGG16 network on these images. Details and discussions on fine-tuning can be found in the literature (e.g. Chu et al., 2016). The hyper-parameters are selected by cross-validation using the validation sets described in section 2.1. Lastly, in the testing stage, we apply the trained network to our test sets in a feed-forward manner and quantify whether its identification agrees with the ground truth. This outcome indicates whether a CNN can reliably recognize what visitors see in wearable camera images. Figure 5 illustrates the pipeline used in our experiments:

Figure 5: training of CNN. In figure 5a, the network is first pre-trained on augmented images from the ImageNet dataset. Then, the augmented training set is used for so-called fine-tuning to adapt the network to recognize the training set. Figure 5b shows the testing stage.
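To give a sense of the "extremely large number of network parameters" mentioned above, the sketch below counts the weights and biases of a VGG16-style network from its layer configuration. Several values are assumptions rather than details stated in this paper: the 224x224 RGB input and the 4096-unit fully connected layers follow the standard VGG16 setup, and the 81-class output (79 paintings plus one background and one spurious category) is only a guess at how the final layer might be sized for the Paintings dataset.

```python
# VGG16 configuration: numbers are output channels of 3x3 convolutions,
# "M" marks a 2x2 max-pooling layer (which has no parameters).
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def count_vgg16_parameters(num_classes, input_size=224, in_channels=3):
    """Count weights and biases of VGG16 for a given number of output classes."""
    total = 0
    channels = in_channels
    spatial = input_size
    conv_layers = 0
    for item in VGG16_CONV:
        if item == "M":
            spatial //= 2  # each pooling layer halves height and width
            continue
        total += channels * item * 3 * 3 + item  # 3x3 kernel weights + biases
        channels = item
        conv_layers += 1
    assert conv_layers == 13  # the 13 convolutional layers named in the text
    # Three fully connected layers: flattened features -> 4096 -> 4096 -> classes.
    sizes = [channels * spatial * spatial, 4096, 4096, num_classes]
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


imagenet_params = count_vgg16_parameters(num_classes=1000)
# Assumed class count for the Paintings dataset: 79 + background + spurious.
paintings_params = count_vgg16_parameters(num_classes=81)
print(f"{imagenet_params:,} parameters with 1000 ImageNet classes")
print(f"{paintings_params:,} parameters with 81 output classes")
```

Under these assumptions the network has well over one hundred million parameters, most of them in the first fully connected layer. That is far more than a few hundred training images could constrain, which motivates pre-training followed by fine-tuning.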

Data augmentation. A standard technique to train CNN representations that are somewhat invariant to partial image translation, rotation, scale and viewpoint changes is to augment the training dataset with multiple crops of each image (e.g. left, right, top, bottom, center crops), left-right mirror flips, arbitrary rotations, and contrast changes. We apply this technique to each training set to replicate the variations expected between training and testing splits as a result of the capturing process.
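The augmentation operations listed above can be illustrated with a small dependency-free sketch. It operates on a toy grayscale image stored as nested lists; a real pipeline would use an image library and full-resolution photos. The crop size, the contrast factor of 1.5, and the use of a 90-degree rotation as a stand-in for arbitrary rotations are all assumptions made for illustration and are not taken from the paper.

```python
def crop(image, top, left, height, width):
    """Return a height x width window of a row-major nested-list image."""
    return [row[left:left + width] for row in image[top:top + height]]


def five_crops(image, crop_h, crop_w):
    """Left, right, top, bottom and center crops, as listed in the text."""
    h, w = len(image), len(image[0])
    mid_top, mid_left = (h - crop_h) // 2, (w - crop_w) // 2
    return {
        "left": crop(image, mid_top, 0, crop_h, crop_w),
        "right": crop(image, mid_top, w - crop_w, crop_h, crop_w),
        "top": crop(image, 0, mid_left, crop_h, crop_w),
        "bottom": crop(image, h - crop_h, mid_left, crop_h, crop_w),
        "center": crop(image, mid_top, mid_left, crop_h, crop_w),
    }


def flip_left_right(image):
    """Mirror the image horizontally."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate clockwise by 90 degrees (a stand-in for arbitrary rotations)."""
    return [list(col) for col in zip(*image[::-1])]


def change_contrast(image, factor):
    """Scale pixel values around the image mean and clamp to [0, 255]."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
            for row in image]


def augment(image, crop_h, crop_w):
    """Return a list of augmented variants of one training image."""
    variants = []
    for cropped in five_crops(image, crop_h, crop_w).values():
        variants.append(cropped)
        variants.append(flip_left_right(cropped))
        variants.append(rotate_90(cropped))
        variants.append(change_contrast(cropped, 1.5))
    return variants


# A tiny 4x4 grayscale "image" with distinct pixel values.
toy = [[16 * (4 * r + c) for c in range(4)] for r in range(4)]
augmented = augment(toy, crop_h=2, crop_w=2)
print(len(augmented), "variants from one image")
```

Each variant keeps the label of the original photo, so a training set of a few hundred images grows many times over without additional capturing effort.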

3. The impact of different types of museum spaces on data capturing and image recognition

In the museum context, design decisions such as interior light, visitor circulation quality, audiences’ time limitations, layout of showcases, size of artwork and type of artwork have been shown to affect the number of pieces that visitors will encounter. Without a doubt, these factors will also impact the quality of images captured by wearable cameras. For instance, good quality uniform illumination will be positively correlated with the acquisition of crisp images. However, artworks with scarce lighting will result in images that show signs of the sensor noise, e.g. the salt-and-pepper noise known in digital photography (Gonzalez & Woods, 2006).

Figure 6: the layouts of Shenzhen Art Museum and the Palace Museum are given in Figure 6a and 6b, respectively

3.1 Paintings

Paintings in Shenzhen Art Museum are displayed in an ordered manner within the given exhibition space. The arrangement of paintings in the exhibition halls influences to what degree audiences interact with these paintings, and it imposes some natural order in which artworks are viewed and captured by our wearable cameras.

Specifically, paintings that are located close to each other on the wall are usually captured in one shot. Therefore, pictures from wearable cameras often contain more than one painting per image. While it may be hard to determine which paintings in an image are of direct interest to visitors, some paintings are captured more than once, thus increasing the chance of successful identification. Moreover, the number of captures of the same painting potentially correlates with the time and interest dedicated to such a painting. Additionally, some paintings are captured partially, e.g. they are truncated, making recognition even harder. Other practical issues include shadows cast on artworks due to lighting and proximity of the painting to the viewer, as well as motion blur. The layout of the museum imposes some partial order in which paintings are displayed and captured by wearable cameras. According to the path delineated in figure 6a, we see that the visitor’s route is clear and easy to follow, so that audiences are not likely to miss many artworks. Volunteers who walked around the exhibition wearing cameras tended to stop next to various paintings for various durations of time. Moreover, they could easily avoid revisiting the same artworks, unless they desired to approach some of them again. Note that audiences in this museum are not allowed to touch any artworks; however, they can look at any of the paintings close-up, which may result in a partial capture, i.e. zooming in on a fragment of a painting. The spacious exhibition hall provided good conditions for our volunteers to capture images in a varied manner; some viewers preferred to approach artworks, others just strolled along at a steady pace. Therefore, we were able to collect six testing sets, as detailed in section 2.1.

3.2 Clocks

The clocks in the Palace Museum are displayed in cases, under the necessary preservation conditions. Because they are located behind glass, the clocks usually cannot be touched or viewed from extremely close up. Clear shots of these artworks are difficult to obtain: due to the low-light interior environment and reflections from the glass surfaces, the photos may be blurry or overexposed. Moreover, we also noted that it was hard to take shots from acute viewpoints. This complication was due to clocks being located close to each other; taking photos of the rear side of these clocks was often impossible, as only their frontal parts were clearly exposed to the visitors. Of course, this specific constraint on viewpoints seems to have a positive effect in the sense that it limits the number of views an object can be seen from, while the front view remains very distinct. However, three clocks are displayed without any glass case due to their large size, and can be seen by viewers from all sides. These artworks are still protected from audiences by handrails. In this case, the artworks stretch beyond the field of view of the wearable camera, making it difficult to capture good images of entire objects. This is especially undesirable because, if only partial views are captured, the representations of these artworks are much less distinct.

The Palace Museum is a very popular attraction, with large numbers of tourists visiting every day. This crowded space resulted in some photos of timepieces that were partially occluded by visitors. The adverse conditions described above therefore differ from the case outlined in section 3.1 and should affect, to some degree, identification of the artworks. Lastly, the red dotted line in figure 6b illustrates visitor circulation in the Clock and Watch Gallery.

3.3 Sculptures

In the Indian and Chinese Sculpture Exhibition Hall, many sculptures are located in the middle of the exhibition space; therefore they are set against a background cluttered by other sculptures. This makes both the annotation of ground truth data and the identification of artworks challenging, because numerous artworks are often captured at once. The way audiences move in this museum space follows a more complex pattern than in the case study in section 3.2; volunteers often exhibited counterclockwise movement around the perimeter if they turned right at the entrance, and clockwise movement otherwise. In the hall area with artworks located on both sides, volunteers often followed a zigzag path between these artworks. Moreover, volunteers often circled smaller patches emerging between artworks. Therefore, in this dataset, one cannot expect a clear order in which artworks were captured, or a clear correlation between frequently viewed sculptures and audiences’ preference. Other adverse factors included large sculptures which did not fit well into the wearable camera’s field of view, occlusions, and poor lighting. In our opinion, such factors make this exhibition space the most challenging for the purpose of capturing images with wearable cameras.

3.4 Other exhibition spaces

During the capturing process in the three different art museum spaces, we observed that viewers actively enjoy the parts of exhibitions they are interested in, while ignoring the others. With such a capturing process, artworks can be identified by scientific tools that give museums the opportunity to re-think the way they communicate, i.e. beyond offering the standard guided tours and fixed exhibitions (Balsamo, 2011). However, each museum space poses unique challenges for artwork identification. For instance, science museum exhibitions often include items which are large and may look very similar to the untrained eye, such as engines, pumps, radio-communication equipment, etc. These items may be bulky, highly non-planar, and not clearly localized; they may emit light, change appearance during interaction, and so forth. Other artworks such as crafts are also likely to be highly non-planar, e.g. miniature replicas of houses, famous buildings, and monuments. Exhibitions with non-rigid objects such as carpets, Gobelin tapestries, and clothing are a further example of the varied nature of artworks in exhibition spaces. Modern art may include objects that lack texture, making them harder to recognize, while porcelain and glass work are likely to be a source of glare. Hieroglyphs, ancient books, and even jewelry may all look similar to a non-expert eye. Exhibits in natural history museums such as birds, insects, butterflies, rodents, etc. may pose similar challenges. These last two are examples of so-called fine-grained image recognition (Wah et al., 2011), which requires an algorithm to match expert knowledge of what distinguishes each exhibit from many similar items. However, we leave these challenges for future work.

4. Experiments

To conclude our work, we present below the experimental findings from our study. We separately fine-tuned three VGG16 networks for the paintings, clocks and sculptures, respectively. To achieve this, we followed the cross-validation and augmentation processes detailed in sections 2.1 and 2.2. Below, we report results in terms of mean accuracy, which quantifies how many test images on average were assigned labels agreeing with our ground truth annotations. Note that some of the images we annotated contained more than one museum item. We assigned ground truth labels to these images in descending order of visibility; that is, the central artwork was assigned its ground truth label first while less visible peripheral pieces were assigned their ground truth labels next.

Figure 7: the Paintings dataset. Figure 7a illustrates accuracy in percent (the higher the better) for each of the six testing splits, each collected by a different volunteer. Figure 7b shows the average over the six splits as well as the standard deviation

Figure 7a illustrates performance obtained on the Paintings dataset for each of the six testing splits from six volunteers. We count a prediction as a valid identification if the predicted label is within the top-k ground truth labels (k being the number along the axis of the plot) assigned by us in the data annotation process. As demonstrated, most of the predictions point to the central pieces in images from wearable cameras; therefore, accuracy improves only marginally as the top-k value increases in the plot. For instance, split sp4 shows no variation w.r.t. the top-k value. However, splits sp3 and sp5 show close to 4% variation. This can be explained by the fact that volunteers who collected the data for these two splits tended to stroll along the exhibition space away from paintings. Therefore, many images collected this way contained several paintings. Figure 7b shows the average performance over the six splits. As demonstrated, due to differences in how volunteers explored the museum space and mounted wearable cameras on their clothing, the standard deviation across splits reaches ±6.7%. The best performing split, sp3, scored 51.8% accuracy while the worst performing split scored only 33.3%. This highlights the difficulty in attaining equally good recognition rates for the data from every visitor. The average accuracy for top-1 labels obtained in this experiment is 42.6%, which means that this proportion of all images from wearable cameras was recognized correctly.
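The top-k measure used in this section can be sketched as follows: a prediction counts as correct when the single predicted label appears among the first k ground truth labels of an image, which are ordered from the most visible artwork to the least visible one. The function name, the label strings, and the four example images below are hypothetical and serve only to illustrate the definition; they are not data from our experiments.

```python
def top_k_accuracy(predictions, ground_truth, k):
    """Percentage of images whose predicted label is among the first k
    ground-truth labels (ordered from most to least visible artwork)."""
    if len(predictions) != len(ground_truth):
        raise ValueError("one prediction is needed per annotated image")
    hits = sum(1 for predicted, labels in zip(predictions, ground_truth)
               if predicted in labels[:k])
    return 100.0 * hits / len(predictions)


# Hypothetical example: four wearable-camera images, labels are made up.
ground_truth = [
    ["painting_12"],
    ["painting_07", "painting_08"],
    ["painting_30", "painting_31", "painting_29"],
    ["background"],
]
predictions = ["painting_12", "painting_08", "painting_05", "background"]

top1 = top_k_accuracy(predictions, ground_truth, k=1)
top2 = top_k_accuracy(predictions, ground_truth, k=2)
print(f"top-1: {top1:.1f}%  top-2: {top2:.1f}%")
```

In this toy case the second image is counted as correct only for k of 2 or more, because the network picked the less visible of the two annotated paintings. This is the effect that makes accuracy rise with k when several artworks share one image.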

Figure 8: the Clocks and Sculptures datasets are evaluated in Figures 8a and 8b, respectively. The mean accuracy in percent is indicated by the bar plots

Figure 8a shows performance on the Clocks dataset for both testing splits. As demonstrated, recognition rates differ by 6.1% between these two testing sets. We suspect this highlights a big difference in how the two volunteers explored this museum space. Another explanation is that recognition is affected by the way visitors mounted wearable cameras. We also note that additional ground truth labels (when multiple clocks were visible in an image) turned out to be largely unnecessary, as the accuracy for larger top-k values (e.g. top-2, ..., top-10) increases by only up to 3.3%. We suspect that because clocks were located behind protective glass, visitors approached each artwork and explored it up-close. Therefore, wearable cameras were able to obtain clear, well-centered pictures of most of the timepieces. The average accuracy for top-1 labels obtained in this experiment is 40.9%, which is slightly below the average accuracy on the Paintings dataset. We note that this dataset constitutes a contrast with the Paintings dataset. We expected that recognition of non-planar artworks behind protective glass in a darker and more crowded environment would be a harder task; however, the need to approach these pieces helped the cameras capture clear close-up pictures of them.

Figure 8b shows the performance on the Sculptures dataset for both testing splits. Firstly, we note that in some cases the accuracy for the top-1 and top-10 measures differs by up to 4%. We expect this is due to other sculptures present in the background. The CNN was very likely unable to distinguish between the central object and other surrounding items. Moreover, we also expect some noise in our ground truth annotations, as sometimes it was not clear which object in an image was the central object approached by the volunteer. Lastly, we note that the average accuracy for top-1 labels obtained in this experiment is only 30.7%, which is a drop of over 10 percentage points compared to results on the Paintings and Clocks datasets. This highlights the challenge of identifying non-planar artworks in cluttered exhibition spaces.

Because we are interested in identifying artworks that the volunteers interacted with, for each dataset, we asked one of the volunteers to approach all artworks in a given museum space. We were able to recognize 36, 54, and 15 distinct paintings, clocks and sculptures out of the 79, 113, and 44 distinct art pieces in the respective exhibition spaces. This means that the fine-tuned CNN was able to recognize 45.6%, 47.8% and 34.1% of all distinct artworks, respectively.
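The percentages above follow directly from the reported counts, as the short check below shows. The dictionary and variable names are introduced here for illustration only.

```python
# Counts reported in the text: (recognized, total) distinct artworks.
RECOGNIZED = {"paintings": (36, 79), "clocks": (54, 113), "sculptures": (15, 44)}

# Share of distinct artworks recognized at least once, in percent.
coverage = {name: round(100.0 * hit / total, 1)
            for name, (hit, total) in RECOGNIZED.items()}

for name, percent in coverage.items():
    hit, total = RECOGNIZED[name]
    print(f"{name}: {hit}/{total} = {percent}%")
```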

5. Conclusions

This work addresses the challenging problem of artwork identification in museum spaces. We have shown that, with state-of-the-art CNN-based computer vision algorithms, we are able to identify up to half of the artworks that audiences interact with in various museum spaces. We found that the challenges posed by the various types of exhibition spaces (and specific artworks) to the capturing and recognition process, as discussed above, are indeed reflected in the quantitative results we obtained. It appears that, for now, identification of paintings is perhaps the simplest task, due to their planarity. However, non-planar items such as clocks and sculptures pose a somewhat bigger challenge. Above all, this pilot study reveals that the off-the-shelf fine-tuning so popular in computer vision is perhaps still insufficient, and that a more customized recognition algorithm is required. Suitable modifications may include a variation of CNN (Mairal et al., 2014) and so-called bag-of-words or domain adaptation approaches (Koniusz & Cherian, 2016; Koniusz & Mikolajczyk, 2010; Koniusz et al., 2012, 2016). With just below half of the artworks identified correctly, it may be sufficient to combine the artwork identification module with a recommendation system, though the need for further improvement is clear. In the future, we plan to extend the current dataset to contain pictures from more kinds of exhibition spaces, as well as investigate new classification algorithms.


Alelis, G., A. Bobrowicz, & C.S. Ang. (2015). “Comparison of engagement and emotional responses of older and younger adults interacting with 3d cultural heritage artefacts on personal devices.” Behaviour and Information Technology 34(11), 1064.

Balsamo, A. (2011). Designing culture: The technological imagination at work. Durham, NC: Duke University Press, 63:1899.

Baraldi, L., F. Paci, G. Serra, L Benini, & R. Cucchiara. (2007). “Gesture recognition using wearable vision sensors to enhance visitors’ museum experiences.” Sensors Journal 15(5), 2705.

Beer, V. (1987). “Great expectations: Do museums know what visitors are doing?” Curator 30(3), 206-15.

Gonzalez, R. C., & R. E. Woods. (2006). Digital Image Processing (3rd Edition). Upper Saddle River, NJ: Prentice-Hall. ISBN 013168728X.

Chu, B., V. Madhavan, O. Beijbom, J. Hoffman, & T. Darrell. (2016). “Best practices for fine-tuning visual classifiers to new domains.” In Proceedings of the European Conference on Computer Vision 2016 (ECCV 2016), Part III, 435–42.

Dalens, T., J. Sivic, I. Laptev, & M Campedel. (2014). “Painting recognition from wearable cameras.” Available

Koniusz, P. & A. Cherian. (2016). “Sparse Coding for Third-order Super-symmetric Tensor Descriptors with Application to Texture Recognition.” Computer Vision and Pattern Recognition (CVPR), 5395.

Koniusz, P., F. Yan, P.-H. Gosselin, & K. Mikolajczyk. (2016). “Higher-order Occurrence Pooling for Bags-of-Words: Visual Concept Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 39(2), 313.

Koniusz, P., F. Yan, & K. Mikolajczyk. (2012). “Comparison of Mid-Level Feature Coding Approaches And Pooling Strategies in Visual Concept Detection.” Computer Vision and Image Understanding 117(5), 479.

Koniusz, P., & K. Mikolajczyk. (2010). “On a Quest for Image Descriptors Based on Unsupervised Segmentation Maps.” International Conference on Pattern Recognition (ICPR), 762.

Koniusz, P., Y. Tas, & F. Porikli. (2016). “Domain Adaptation by Mixture of Alignments of Second- or Higher-Order Scatter Tensors.” CoRR, abs/1611.08195.

Krizhevsky, A., I. Sutskever, & G. E. Hinton. (2012). “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems (NIPS), 1106–1114.

Kuflik, T., J. Sheidin, S. Jbara, D. Goren-Bar, P. Soffer,  O. Stock, & M. Zancanaro. (2007). “Supporting small groups in the museum by context-aware communication services.” Proceedings of the 2007 International Conference on Intelligent User Interfaces,  305-8. Available

Mairal, J., P. Koniusz, Z. Harchaoui, & C. Schmid. (2014). “Convolutional Kernel Networks.” Advances in Neural Information Processing Systems (NIPS). HAL-01005489v2. Available

Russakovsky, O., J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, & L. Fei-Fei. (2015). “ImageNet large scale visual recognition challenge.” International Journal of Computer Vision (IJCV) 115(3), 211–52. DOI: 10.1007/s11263-015-0816-y.

Simonyan, K., & A. Zisserman. (2014). “Very deep convolutional networks for large-scale image recognition.” CoRR, abs/1409.1556.

Suchy, S. (2006). “Museum management: Emotional value and community engagement.” INTERCOM, 354-62.

Wah, C., S. Branson, P. Welinder, P. Perona, & S. Belongie. (2011). “Caltech-UCSD Birds-200-2011 dataset.” CVPR Workshop on Fine-Grained Visual Categorization.

Zhang, R., & A. Russo. (2015). “Towards comparative methods for evaluating cross-cultural digital creativity in museum exhibitions.” MWA2015: Museums and the Web Asia 2015. Published August 9, 2015. Available

Cite as:
Zhang, Rui, Yusuf Tas and Piotr Koniusz. "Artwork identification from wearable camera images for enhancing experience of museum audiences." MW17: MW 2017. Published February 1, 2017. Consulted .
