Annotation formats for machine learning

Third-party service annotations

We saw from our last exercise, that the Microsoft Cognitive Services Computer Vision API can return annotations about the image. The annotation response format is unique to Microsoft.

[
  {
    "rectangle": {
      "x": 207,
      "y": 119,
      "w": 53,
      "h": 168
    },
    "object": "person",
    "confidence": 0.638
  },
  {
    "rectangle": {
      "x": 0,
      "y": 91,
      "w": 115,
      "h": 267
    },
    "object": "person",
    "confidence": 0.848
  },
  {
    "rectangle": {
      "x": 334,
      "y": 134,
      "w": 43,
      "h": 169
    },
    "object": "person",
    "confidence": 0.661
  }
]

As you work with other machine learning cloud providers you will find that many have their own custom responses that they use.

Training data annotation formats

Several other formats exist for providing annotations in training data.

Common Objects in Context (COCO) - A JSON format for providing annotations about the COCO image dataset.

   annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
   }

PASCAL VOC format - Used by ImageNet to provide annotations about objects in images

<annotation>
  <folder>GeneratedData_Train</folder>
  <filename>000001.png</filename>
  <path>/my/path/GeneratedData_Train/000001.png</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>224</width>
    <height>224</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>21</name>
    <pose>Frontal</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occluded>0</occluded>
    <bndbox>
      <xmin>82</xmin>
      <xmax>172</xmax>
      <ymin>88</ymin>
      <ymax>146</ymax>
    </bndbox>
  </object>
</annotation>

Open Images Dataset - "Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships"

ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsroupOf,IsDepiction,IsInside
0001eeaf4aed83f9,xclick,/m0cmf2,1,0.022673031,0.9642005,0.07103825,0.80054647,0,0,0,0,0
000595fe6fee6369,xclick,/m02xwb,1,0.45655376,0.6097202,0.20399113,0.50554323,0,0,1,0,0
00075905539074f2,xclick,/m04yx4,1,0.020477816,0.32935154,0.0956023,0.665392,0,0,0,1,0
000a1249af2bc5f0,xclick,/m/ 09j2d,1,0.56911767,0.99852943,0.0022172949,0.93569845,1,1,0,0,0

Question Compare and contrast these formats with Open Annotation / Web Annotation.

Question Could IIIF be used to help standardize this?

IIIF for Machine Learning

IIIF Introduction

IIIF Image API

IIIF Presentation API

Other IIIF APIs

Third-party service annotations

Training data annotation formats

Next up, lets look at some case studies of machine learning / IIIF projects