This project is about a unified method to load and save JSON datasets in Libia.

That way, you could describe how to load a dataset with just a few options, and the result would fit nicely into libia.Dataset’s self.images and self.columns DataFrames.

TL;DR: this seems much harder than initially thought with existing JSON manipulation tools. Loading any JSON from a set of parameters could require very complex dictionary manipulation. That would need either a whole new XPath-like language (not great for sharing, and obviously a very big project) or lambda functions passed as parameters to transform values programmatically, which makes the whole parameter idea pointless compared to writing actual code.

Motivation: every dataset stored in JSON files, regardless of its format, is a variation of COCO.

The main idea is that a dataset must follow a basic structure: data that links frame information to annotation information.

For every image, you have a list of linked annotations.

As such, you will have either the decoupled form (similar to COCO) or the nested form (similar to Caipy).

Decoupled form (COCO-like):

{
  "images": [
    {"id": 0, "path": "somefile.jpg", ...},
    {"id": 1, "path": "somefile2.jpg", ...}
  ],
  "annotations": [
    {"id": 0, "image_id": 0, ...},
    {"id": 1, "image_id": 0, ...},
    {"id": 2, "image_id": 1, ...}
  ]
}
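As a minimal sketch (assuming pandas, and reusing the placeholder file names from the example above with the "..." fields dropped), the decoupled form maps directly onto two DataFrames. The image/annotation link is just a foreign key, image_id, that can be joined or grouped on:

```python
import pandas as pd

# Decoupled (COCO-like) dataset; paths are the placeholders from the example above
dataset = {
    "images": [
        {"id": 0, "path": "somefile.jpg"},
        {"id": 1, "path": "somefile2.jpg"},
    ],
    "annotations": [
        {"id": 0, "image_id": 0},
        {"id": 1, "image_id": 0},
        {"id": 2, "image_id": 1},
    ],
}

# One DataFrame per top-level list, each indexed by its own ids
images = pd.DataFrame(dataset["images"]).set_index("id")
annotations = pd.DataFrame(dataset["annotations"]).set_index("id")

# Attach image information to each annotation through the image_id foreign key
linked = annotations.join(images, on="image_id")

# Number of annotations linked to each image
counts = annotations.groupby("image_id").size()
```

This is also why COCO is so close to a pair of DataFrames: no reshaping is needed, only the choice of index.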

Nested form (CaipyJson-like):

[
  {
    "image": {
      "id": 0,
      ...
    },
    "annotations": [
      {"id": 0, ...},
      {"id": 1, ...}
    ]
  },
  {
    "image": {
      "id": 1,
      ...
    },
    "annotations": [
      {"id": 2, ...},
      {"id": 3, ...}
    ]
  }
]
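To illustrate why the nested form is just a variation of the decoupled one, here is a plain-Python sketch that converts one into the other. The function name nested_to_decoupled is hypothetical; it assumes the "image" and "annotations" keys shown above, and it adds a COCO-style "image_id" field to each annotation:

```python
def nested_to_decoupled(nested):
    """Flatten a nested (CaipyJson-like) list into a decoupled (COCO-like) dict."""
    images, annotations = [], []
    for entry in nested:
        image = dict(entry["image"])  # shallow copy of the image record
        images.append(image)
        for annotation in entry["annotations"]:
            # Copy each annotation and add a foreign key to its parent image
            annotations.append({**annotation, "image_id": image["id"]})
    return {"images": images, "annotations": annotations}


nested = [
    {"image": {"id": 0}, "annotations": [{"id": 0}, {"id": 1}]},
    {"image": {"id": 1}, "annotations": [{"id": 2}, {"id": 3}]},
]
decoupled = nested_to_decoupled(nested)
```

Going the other way is a group-by on image_id, so the two forms carry exactly the same information.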

Question 1: Given that the COCO format is the closest to Libia’s internal format, can we easily convert a nested dataset into a decoupled one?

Yes, thanks to the pandas json_normalize function:

pandas.json_normalize — pandas 2.2.2 documentation

You can load the annotations directly from the nested form with the record_path option, while meta copies each parent image id into the annotation rows:

import pandas as pd
annotations = pd.json_normalize(dataset, record_path="annotations", meta=[["image", "id"]])

This function is very powerful because it also automatically flattens nested dicts into dot-separated columns (the meta path above ends up in an image.id column), which is very useful for CAV5’s tags and attributes.
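A slightly fuller sketch of both tables built from the nested form. It assumes a hypothetical "attributes" dict on each annotation as a stand-in for CAV5’s tags and attributes (the real CAV5 field names may differ), and renames image.id to image_id to match the COCO naming:

```python
import pandas as pd

# Nested dataset; "attributes" is a hypothetical stand-in for CAV5 tags/attributes
dataset = [
    {
        "image": {"id": 0, "path": "somefile.jpg"},
        "annotations": [
            {"id": 0, "attributes": {"occluded": False, "color": "red"}},
            {"id": 1, "attributes": {"occluded": True, "color": "blue"}},
        ],
    },
    {
        "image": {"id": 1, "path": "somefile2.jpg"},
        "annotations": [
            {"id": 2, "attributes": {"occluded": False, "color": "red"}},
        ],
    },
]

# Image table: one row per entry
images = pd.json_normalize([entry["image"] for entry in dataset])

# Annotation table: one row per annotation; nested dicts become dotted columns
# (attributes.occluded, attributes.color) and the meta path becomes image.id
annotations = pd.json_normalize(
    dataset, record_path="annotations", meta=[["image", "id"]]
).rename(columns={"image.id": "image_id"})

# Flattened attributes can be filtered like any other column
occluded = annotations[annotations["attributes.occluded"]]
```

Once flattened, attribute filters become ordinary DataFrame boolean indexing, with no custom dictionary traversal.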

Question 2: Can we programmatically filter out images or annotations that we don’t want?

Yes, with JSONPath, a syntax proposed by Stefan Goessner in 2007 (JSONPath - XPath for JSON).

It’s not a W3C-backed standard (the IETF later published it as RFC 9535 in 2024), but many implementations exist, both in Python (jsonpath-ng) and in C# (the widely used Json.NET library from Newtonsoft already supports it).