KDD '22, August 14-18, 2022, Washington, DC, USA
Birgit Pfitzmann, Christoph Auer, Michele Dolfi, Ahmed S. Nassar, and Peter Staar
Table 1: DocLayNet dataset overview. Along with the frequency of each class label, we present the relative occurrence (as % of row 'Total') in the train, test and validation sets. The inter-annotator agreement is computed as the mAP@0.5-0.95 metric between pairwise annotations from the triple-annotated pages, from which we obtain accuracy ranges.
Columns Train, Test and Val give the % of Total (absolute counts in the 'Total' row). Columns All through Ten give the triple inter-annotator mAP @ 0.5-0.95 (%).

| class label | Count | Train | Test | Val | All | Fin | Man | Sci | Law | Pat | Ten |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Caption | 22524 | 2.04 | 1.77 | 2.32 | 84-89 | 40-61 | 86-92 | 94-99 | 95-99 | 69-78 | n/a |
| Footnote | 6318 | 0.60 | 0.31 | 0.58 | 83-91 | n/a | 100 | 62-88 | 85-94 | n/a | 82-97 |
| Formula | 25027 | 2.25 | 1.90 | 2.96 | 83-85 | n/a | n/a | 84-87 | 86-96 | n/a | n/a |
| List-item | 185660 | 17.19 | 13.34 | 15.82 | 87-88 | 74-83 | 90-92 | 97-97 | 81-85 | 75-88 | 93-95 |
| Page-footer | 70878 | 6.51 | 5.58 | 6.00 | 93-94 | 88-90 | 95-96 | 100 | 92-97 | 100 | 96-98 |
| Page-header | 58022 | 5.10 | 6.70 | 5.06 | 85-89 | 66-76 | 90-94 | 98-100 | 91-92 | 97-99 | 81-86 |
| Picture | 45976 | 4.21 | 2.78 | 5.31 | 69-71 | 56-59 | 82-86 | 69-82 | 80-95 | 66-71 | 59-76 |
| Section-header | 142884 | 12.60 | 15.77 | 12.85 | 83-84 | 76-81 | 90-92 | 94-95 | 87-94 | 69-73 | 78-86 |
| Table | 34733 | 3.20 | 2.27 | 3.60 | 77-81 | 75-80 | 83-86 | 98-99 | 58-80 | 79-84 | 70-85 |
| Text | 510377 | 45.82 | 49.28 | 45.00 | 84-86 | 81-86 | 88-93 | 89-93 | 87-92 | 71-79 | 87-95 |
| Title | 5071 | 0.47 | 0.30 | 0.50 | 60-72 | 24-63 | 50-63 | 94-100 | 82-96 | 68-79 | 24-56 |
| Total | 1107470 | 941123 | 99816 | 66531 | 82-83 | 71-74 | 79-81 | 89-94 | 86-91 | 71-76 | 68-85 |
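As a quick sanity check on Table 1, the following sketch re-derives the 'Total' row from the per-class counts and computes each label's share of all annotations. The counts are copied from the table; the overall share per label is a derived quantity that the table itself does not report (it lists per-split percentages instead), and the names `CLASS_COUNTS`, `SPLIT_TOTALS` and `overall_share` are illustrative only.

```python
# Counts copied from the "Count" column of Table 1.
CLASS_COUNTS = {
    "Caption": 22524,
    "Footnote": 6318,
    "Formula": 25027,
    "List-item": 185660,
    "Page-footer": 70878,
    "Page-header": 58022,
    "Picture": 45976,
    "Section-header": 142884,
    "Table": 34733,
    "Text": 510377,
    "Title": 5071,
}

# Values from the 'Total' row of Table 1.
TOTAL = 1107470
SPLIT_TOTALS = {"train": 941123, "test": 99816, "val": 66531}


def overall_share(counts):
    """Return each label's share (in %) of all annotations.

    This is a derived figure, not a column of Table 1.
    """
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# The per-class counts and the per-split totals should both add up
# to the reported grand total.
assert sum(CLASS_COUNTS.values()) == TOTAL
assert sum(SPLIT_TOTALS.values()) == TOTAL

shares = overall_share(CLASS_COUNTS)
for label, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:<15}{pct:6.2f}%")
```

Both assertions hold for the numbers in the table, and the listing makes the class imbalance visible: Text, List-item and Section-header dominate, while Title and Footnote each account for well under 1% of the annotations.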
Figure 3: Corpus Conversion Service annotation user interface. The PDF page is shown in the background, with overlaid text-cells (in darker shades). The annotation boxes can be drawn by dragging a rectangle over each segment with the respective label from the palette on the right.
we distributed the annotation workload and performed continuous quality controls. Phases one and two required only a small team of experts. For phases three and four, a group of 40 dedicated annotators was assembled and supervised.
Phase 1: Data selection and preparation. Our inclusion criteria for documents were described in Section 3. A large effort went into ensuring that all documents are free to use. The data sources include publication repositories such as arXiv³, government offices, company websites as well as data directory services for financial reports and patents. Scanned documents were excluded wherever possible because they can be rotated or skewed. This would not allow us to perform annotation with rectangular bounding-boxes and would therefore complicate the annotation process.
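To illustrate why skewed scans fit poorly with rectangular bounding-boxes, the sketch below computes the axis-aligned box needed to enclose a rectangle that has been rotated by a small angle. It is a geometric illustration only, not part of the DocLayNet tooling; the function names and the example dimensions are assumptions chosen for the demonstration.

```python
import math


def enclosing_box_size(width, height, angle_deg):
    """Width and height of the axis-aligned box enclosing a rotated rectangle."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return width * c + height * s, width * s + height * c


def area_inflation(width, height, angle_deg):
    """Ratio of the enclosing box area to the original rectangle area."""
    box_w, box_h = enclosing_box_size(width, height, angle_deg)
    return (box_w * box_h) / (width * height)


# Hypothetical example: a single text line, 400 units wide and 10 units
# high, on a page skewed by 2 degrees.
print(enclosing_box_size(400, 10, 2))
print(area_inflation(400, 10, 2))
```

For this hypothetical text line, a skew of only 2 degrees more than doubles the area of the enclosing box, so boxes of neighbouring lines would overlap heavily and no longer delimit one segment each.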
Preparation work included uploading and parsing the sourced PDF documents in the Corpus Conversion Service (CCS) [22], a cloud-native platform which provides a visual annotation interface and allows for dataset inspection and analysis. The annotation interface of CCS is shown in Figure 3. The desired balance of pages between the different document categories was achieved by selective subsampling of pages with certain desired properties. For example, we made sure to include the title page of each document and bias the remaining page selection to those with figures or tables. The latter was achieved by leveraging pre-trained object detection models from PubLayNet, which helped us estimate how many figures and tables a given page contains.
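The page selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: it assumes the title page is the first page of the document, takes the per-page figure and table estimates as a plain list (no detection model is called here), and uses a hypothetical weighting of `1 + bias * estimate` for the biased draw. The function name `select_pages` and its parameters are likewise invented for this example.

```python
import random


def select_pages(page_scores, num_pages, bias=3.0, seed=0):
    """Pick page indices from one document.

    page_scores[i] is the estimated number of figures and tables on
    page i (assumed to come from an object detector). Page 0 is
    assumed to be the title page and is always kept.
    """
    rng = random.Random(seed)
    if not page_scores or num_pages <= 0:
        return []

    selected = [0]  # always include the title page
    candidates = list(range(1, len(page_scores)))
    # Hypothetical weighting: every page has a base chance, and pages
    # with more estimated figures or tables are drawn more often.
    weights = {i: 1.0 + bias * page_scores[i] for i in candidates}

    # Weighted sampling without replacement.
    while candidates and len(selected) < num_pages:
        pick = rng.choices(
            candidates, weights=[weights[i] for i in candidates], k=1
        )[0]
        selected.append(pick)
        candidates.remove(pick)

    return sorted(selected)


# Hypothetical six-page document: pages 2 and 4 contain figures or tables.
print(select_pages([0, 0, 2, 0, 1, 0], num_pages=3))
```

Keeping a non-zero base weight means text-only pages can still be selected, so the sample is biased towards figures and tables without excluding other layouts.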
Phase 2: Label selection and guideline. We reviewed the collected documents and identified the most common structural features they exhibit. This was achieved by identifying recurrent layout elements and led us to the definition of 11 distinct class labels. These 11 class labels are Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title. Critical factors that were considered for the choice of these class labels were (1) the overall occurrence of the label, (2) the specificity of the label, (3) recognisability on a single page (i.e. no need for context from previous or next page) and (4) overall coverage of the page. Specificity ensures that the choice of label is not ambiguous, while coverage ensures that all meaningful items on a page can be annotated. We refrained from class labels that are very specific to a document category, such as Abstract in the Scientific Articles category. We also avoided class labels that are tightly linked to the semantics of the text. Labels such as Author and Affiliation, as seen in DocBank, are often only distinguishable by discriminating on
³ https://arxiv.org/