# taulu

Taulu - *segment tables from images*

Taulu is a Python package designed to segment images of tables into their constituent rows and columns (and cells).

To use this package, you first need to make an annotation of the headers in your table images. The idea is that these headers will be similar across your full set of images, and they will be used as a starting point for the search algorithm that finds the table grid.

Here is an example Python script showing how to use Taulu:

```python
from taulu import Taulu
import os


def setup():
    # create an annotation file of the headers in the image
    # (one for the left header, one for the right)
    # and store them in the examples directory
    print("Annotating the LEFT header...")
    Taulu.annotate("../data/table_00.png", "table_00_header_left.png")

    print("Annotating the RIGHT header...")
    Taulu.annotate("../data/table_00.png", "table_00_header_right.png")


def main():
    taulu = Taulu(("table_00_header_left.png", "table_00_header_right.png"))
    table = taulu.segment_table(
        "../data/table_00.png", cell_height_factor=0.8, debug_view=True
    )

    table.show_cells("../data/table_00.png")


if __name__ == "__main__":
    if os.path.exists("table_00_header_left.png") and os.path.exists(
        "table_00_header_right.png"
    ):
        main()
    else:
        setup()
        main()
```

If you want a high-level overview of how to use Taulu, see [the Taulu class](./taulu.html#taulu.taulu.Taulu).
1""" 2Taulu - *segment tables from images* 3 4Taulu is a Python package designed to segment images of tables into their constituent rows and columns (and cells). 5 6To use this package, you first need to make an annotation of the headers in your table images. 7The idea is that these headers will be similar across your full set of images, and they will be 8used as a starting point for the search algorithm that finds the table grid. 9 10Here is an example python script of how to use Taulu: 11```python 12from taulu import Taulu 13import os 14 15 16def setup(): 17 # create an Annotation file of the headers in the image 18 # (one for the left header, one for the right) 19 # and store them in the examples directory 20 print("Annotating the LEFT header...") 21 Taulu.annotate("../data/table_00.png", "table_00_header_left.png") 22 23 print("Annotating the RIGHT header...") 24 Taulu.annotate("../data/table_00.png", "table_00_header_right.png") 25 26 27def main(): 28 taulu = Taulu(("table_00_header_left.png", "table_00_header_right.png")) 29 table = taulu.segment_table("../data/table_00.png", cell_height_factor=0.8, debug_view=True) 30 31 table.show_cells("../data/table_00.png") 32 33 34if __name__ == "__main__": 35 if os.path.exists("table_00_header_left.png") and os.path.exists( 36 "table_00_header_right.png" 37 ): 38 main() 39 else: 40 setup() 41 main() 42 43``` 44 45If you want a high-level overview of how to use Taulu, see [the Taulu class](./taulu.html#taulu.taulu.Taulu) 46""" 47 48from .grid import GridDetector, TableGrid 49from .header_aligner import HeaderAligner 50from .header_template import HeaderTemplate 51from .table_indexer import TableIndexer 52from .split import Split 53from .taulu import Taulu 54 55__pdoc__ = {} 56__pdoc__["constants"] = False 57__pdoc__["main"] = False 58__pdoc__["decorators"] = False 59__pdoc__["error"] = False 60__pdoc__["types"] = False 61__pdoc__["img_util"] = False 62 63__all__ = [ 64 "GridDetector", 65 "TableGrid", 66 "HeaderAligner", 
67 "HeaderTemplate", 68 "TableIndexer", 69 "Split", 70 "Taulu", 71]
The grid detection stage is implemented by `GridDetector` in `taulu/grid.py`:

```python
class GridDetector:
    """
    Detects table grid intersections using morphological filtering and template matching.

    This detector implements a multi-stage pipeline:

    1. **Binarization**: Sauvola adaptive thresholding to handle varying lighting
    2. **Morphological operations**: Dilation to connect broken rule segments
    3. **Cross-kernel matching**: Template matching with a cross-shaped kernel to find
       rule intersections where horizontal and vertical lines meet
    4. **Grid growing**: Iterative point detection starting from a known seed point

    The cross-kernel is designed to match the specific geometry of your table rules.
    It should be sized so that after morphology, it aligns with actual corner shapes.

    ## Tuning Guidelines

    - **kernel_size**: Increase if you need more selectivity (fewer false positives)
    - **cross_width/height**: Should match rule thickness after morphology
    - **morph_size**: Increase to connect more broken lines, but this thickens rules
    - **sauvola_k**: Increase to threshold more aggressively (remove noise)
    - **search_region**: Increase for documents with more warping/distortion
    - **distance_penalty**: Increase to prefer corners closer to expected positions

    ## Visual Debugging

    Set `visual=True` in methods to see intermediate results and tune parameters.
    """

    def __init__(
        self,
        kernel_size: int = 21,
        cross_width: int = 6,
        cross_height: Optional[int] = None,
        morph_size: Optional[int] = None,
        sauvola_k: float = 0.04,
        sauvola_window: int = 15,
        scale: float = 1.0,
        search_region: int = 40,
        distance_penalty: float = 0.4,
        min_rows: int = 5,
        grow_threshold: float = 0.3,
        look_distance: int = 4,
    ):
        """
        Args:
            kernel_size (int): the size of the cross kernel.
                A larger kernel size often means that more penalty is applied,
                often leading to sparser results
            cross_width (int): the width of one of the edges in the cross filter;
                should be roughly equal to the width of the rules in the image
                after morphology is applied
            cross_height (int | None): useful if the horizontal rules and vertical
                rules have different sizes
            morph_size (int | None): the size of the morphology operators that are
                applied before the cross kernel; 'bridges the gaps' of broken-up lines
            sauvola_k (float): threshold parameter for Sauvola thresholding
            sauvola_window (int): window_size parameter for Sauvola thresholding
            scale (float): image scale factor to do calculations on (mostly useful
                for speeding up calculations)
            search_region (int): area in which to search for a new max value in
                `find_nearest` etc.
            distance_penalty (float): how much the point finding algorithm penalizes
                points that are further away, in the range [0, 1]
            min_rows (int): minimum number of rows to find before stopping the
                table finding algorithm
            grow_threshold (float): the threshold for accepting a new point when
                growing the table
            look_distance (int): how many points away to look when calculating the
                median slope
        """
        self._validate_parameters(
            kernel_size,
            cross_width,
            cross_height,
            morph_size,
            search_region,
            sauvola_k,
            sauvola_window,
            distance_penalty,
        )

        self._kernel_size = kernel_size
        self._cross_width = cross_width
        self._cross_height = cross_width if cross_height is None else cross_height
        self._morph_size = morph_size if morph_size is not None else cross_width
        self._search_region = search_region
        self._sauvola_k = sauvola_k
        self._sauvola_window = sauvola_window
        self._distance_penalty = distance_penalty
        self._scale = scale
        self._min_rows = min_rows
        self._grow_threshold = grow_threshold
        self._look_distance = look_distance

        self._cross_kernel = self._create_cross_kernel()

    def _validate_parameters(
        self,
        kernel_size: int,
        cross_width: int,
        cross_height: Optional[int],
        morph_size: Optional[int],
        search_region: int,
        sauvola_k: float,
        sauvola_window: int,
        distance_penalty: float,
    ) -> None:
        """Validate initialization parameters."""
        if kernel_size % 2 == 0:
            raise ValueError("kernel_size must be odd")
        if (
            kernel_size <= 0
            or cross_width <= 0
            or search_region <= 0
            or sauvola_window <= 0
        ):
            raise ValueError("Size parameters must be positive")
        if cross_height is not None and cross_height <= 0:
            raise ValueError("cross_height must be positive")
        if morph_size is not None and morph_size <= 0:
            raise ValueError("morph_size must be positive")
        if not 0 <= distance_penalty <= 1:
            raise ValueError("distance_penalty must be in [0, 1]")
        if sauvola_k <= 0:
            raise ValueError("sauvola_k must be positive")

    def _create_gaussian_weights(self, region_size: int) -> NDArray:
        """
        Create a 2D Gaussian weight mask.

        Args:
            region_size (int): side length of the square mask

        Returns:
            NDArray: Gaussian weight mask whose value at the edge midpoints
                is `1 - distance_penalty`
        """
        if self._distance_penalty == 0:
            return np.ones((region_size, region_size), dtype=np.float32)

        y = np.linspace(-1, 1, region_size)
        x = np.linspace(-1, 1, region_size)
        xv, yv = np.meshgrid(x, y)
        dist_squared = xv**2 + yv**2

        # Prevent log(0) when distance_penalty is 1
        if self._distance_penalty >= 0.999:
            sigma = 0.1  # Small sigma for very sharp peak
        else:
            sigma = np.sqrt(-1 / (2 * np.log(1 - self._distance_penalty)))

        weights = np.exp(-dist_squared / (2 * sigma**2))

        return weights.astype(np.float32)

    def _create_cross_kernel(self) -> NDArray:
        kernel = np.zeros((self._kernel_size, self._kernel_size), dtype=np.uint8)
        center = self._kernel_size // 2

        # Create horizontal bar
        h_start = max(0, center - self._cross_height // 2)
        h_end = min(self._kernel_size, center + (self._cross_height + 1) // 2)
        kernel[h_start:h_end, :] = 255

        # Create vertical bar
        v_start = max(0, center - self._cross_width // 2)
        v_end = min(self._kernel_size, center + (self._cross_width + 1) // 2)
        kernel[:, v_start:v_end] = 255

        return kernel

    def _apply_morphology(self, binary: MatLike) -> MatLike:
        # Define a horizontal kernel (adjust width as needed)
        kernel_hor = cv.getStructuringElement(cv.MORPH_RECT, (self._morph_size, 1))
        kernel_ver = cv.getStructuringElement(cv.MORPH_RECT, (1, self._morph_size))

        # Apply dilation
        dilated = cv.dilate(binary, kernel_hor, iterations=1)
        dilated = cv.dilate(dilated, kernel_ver, iterations=1)

        return dilated

    def _apply_cross_matching(self, img: MatLike) -> MatLike:
        """Apply cross kernel template matching."""
        pad_y = self._cross_kernel.shape[0] // 2
        pad_x = self._cross_kernel.shape[1] // 2

        padded = cv.copyMakeBorder(
            img, pad_y, pad_y, pad_x, pad_x, borderType=cv.BORDER_CONSTANT, value=0
        )

        filtered = cv.matchTemplate(padded, self._cross_kernel, cv.TM_SQDIFF_NORMED)
        # Invert and normalize to 0-255 range
        filtered = cv.normalize(1.0 - filtered, None, 0, 255, cv.NORM_MINMAX)
        return filtered.astype(np.uint8)

    def apply(self, img: MatLike, visual: bool = False) -> MatLike:
        """
        Apply the grid detection filter to the input image.

        Args:
            img (MatLike): the input image
            visual (bool): whether to show intermediate steps

        Returns:
            MatLike: the filtered image, with high values (whiter pixels) at
                intersections of horizontal and vertical rules
        """

        if img is None or img.size == 0:
            raise ValueError("Input image is empty or None")

        binary = imu.sauvola(img, k=self._sauvola_k, window_size=self._sauvola_window)

        if visual:
            imu.show(binary, title="thresholded")

        binary = self._apply_morphology(binary)

        if visual:
            imu.show(binary, title="dilated")

        filtered = self._apply_cross_matching(binary)

        return filtered

    @log_calls(level=logging.DEBUG, include_return=True)
    def find_nearest(
        self, filtered: MatLike, point: Point, region: Optional[int] = None
    ) -> Tuple[Point, float]:
        """
        Find the nearest 'corner match' in the image, along with its score [0, 1]

        Args:
            filtered (MatLike): the filtered image (obtained through `apply`)
            point (tuple[int, int]): the approximate target point (x, y)
            region (None | int): alternative value for the search region,
                overriding the `__init__` parameter `search_region`
        """

        if filtered is None or filtered.size == 0:
            raise ValueError("Filtered image is empty or None")

        region_size = region if region is not None else self._search_region
        x, y = point

        # Calculate crop boundaries
        crop_x = max(0, x - region_size // 2)
        crop_y = max(0, y - region_size // 2)
        crop_width = min(region_size, filtered.shape[1] - crop_x)
        crop_height = min(region_size, filtered.shape[0] - crop_y)

        # Handle edge cases
        if crop_width <= 0 or crop_height <= 0:
            logger.warning(f"Point {point} is outside image bounds")
            return point, 0.0

        cropped = filtered[crop_y : crop_y + crop_height, crop_x : crop_x + crop_width]

        if cropped.size == 0:
            return point, 0.0

        # Always apply Gaussian weighting by extending the crop if needed
        if cropped.shape[0] == region_size and cropped.shape[1] == region_size:
            # Perfect size - apply weights directly
            weights = self._create_gaussian_weights(region_size)
            weighted = cropped.astype(np.float32) * weights
        else:
            # Extend crop to match region_size, apply weights, then restore
            extended = np.zeros((region_size, region_size), dtype=cropped.dtype)

            # Calculate offset to center the cropped region in the extended array
            offset_y = (region_size - cropped.shape[0]) // 2
            offset_x = (region_size - cropped.shape[1]) // 2

            # Place cropped region in center of extended array
            extended[
                offset_y : offset_y + cropped.shape[0],
                offset_x : offset_x + cropped.shape[1],
            ] = cropped

            # Apply Gaussian weights to extended array
            weights = self._create_gaussian_weights(region_size)
            weighted_extended = extended.astype(np.float32) * weights

            # Extract the original region back out
            weighted = weighted_extended[
                offset_y : offset_y + cropped.shape[0],
                offset_x : offset_x + cropped.shape[1],
            ]

        best_idx = np.argmax(weighted)
        best_y, best_x = np.unravel_index(best_idx, cropped.shape)

        result_point = (
            int(crop_x + best_x),
            int(crop_y + best_y),
        )
        result_confidence = float(weighted[best_y, best_x]) / 255.0

        return result_point, result_confidence

    def find_table_points(
        self,
        img: MatLike | PathLike[str],
        left_top: Point,
        cell_widths: list[int],
        cell_heights: list[int] | int,
        visual: bool = False,
        window: str = WINDOW,
        goals_width: Optional[int] = None,
    ) -> "TableGrid":
        """
        Parse the image to a `TableGrid` structure that holds all of the
        intersections between horizontal and vertical rules, starting near
        the `left_top` point

        Args:
            img (MatLike): the input image of a table
            left_top (tuple[int, int]): the starting point of the algorithm
            cell_widths (list[int]): the expected widths of the cells (based on
                a header template)
            cell_heights (list[int]): the expected heights of the rows of data.
                The last value from this list is used until the image has no
                more vertical space.
            visual (bool): whether to show intermediate steps
            window (str): the name of the OpenCV window to use for visualization
            goals_width (int | None): the width of the goal region when searching
                for the next point. If None, defaults to 1.5 * search_region

        Returns:
            a TableGrid object
        """

        if goals_width is None:
            goals_width = self._search_region * 3 // 2

        if not cell_widths:
            raise ValueError("cell_widths must contain at least one value")

        if not isinstance(img, np.ndarray):
            img = cv.imread(os.fspath(img))

        filtered = self.apply(img, visual)

        if visual:
            imu.show(filtered, window=window)

        if isinstance(cell_heights, int):
            cell_heights = [cell_heights]

        left_top, confidence = self.find_nearest(
            filtered, left_top, int(self._search_region * 3)
        )

        if confidence < 0.1:
            logger.warning(
                f"Low confidence for the starting point: {confidence} at {left_top}"
            )

        # resize all parameters according to scale
        img = cv.resize(img, None, fx=self._scale, fy=self._scale)

        if visual:
            imu.push(img)

        filtered = cv.resize(filtered, None, fx=self._scale, fy=self._scale)
        cell_widths = [int(w * self._scale) for w in cell_widths]
        cell_heights = [int(h * self._scale) for h in cell_heights]
        left_top = (int(left_top[0] * self._scale), int(left_top[1] * self._scale))
        self._search_region = int(self._search_region * self._scale)

        img_gray = ensure_gray(img)
        filtered_gray = ensure_gray(filtered)

        table_grower = TableGrower(
            img_gray,
            filtered_gray,
            cell_widths,  # pyright: ignore
            cell_heights,  # pyright: ignore
            left_top,
            self._search_region,
            self._distance_penalty,
            self._look_distance,
            self._grow_threshold,
            self._min_rows,
        )

        def show_grower_progress(wait: bool = False):
            img_orig = np.copy(img)
            corners = table_grower.get_all_corners()
            for y in range(len(corners)):
                for x in range(len(corners[y])):
                    if corners[y][x] is not None:
                        img_orig = imu.draw_points(
                            img_orig,
                            [corners[y][x]],
                            color=(0, 0, 255),
                            thickness=30,
                        )

            edge = table_grower.get_edge_points()

            for point, score in edge:
                color = (100, int(clamp(score * 255, 0, 255)), 100)
                imu.draw_point(img_orig, point, color=color, thickness=20)

            imu.show(img_orig, wait=wait)

        if visual:
            threshold = self._grow_threshold
            look_distance = self._look_distance

            # python implementation of rust loops, for visualization purposes
            # note this is a LOT slower
            while table_grower.grow_point(img_gray, filtered_gray) is not None:
                show_grower_progress()

            show_grower_progress(True)

            original_threshold = threshold

            loops_without_change = 0

            while not table_grower.is_table_complete():
                loops_without_change += 1

                if loops_without_change > 50:
                    break

                if table_grower.extrapolate_one(img_gray, filtered_gray) is not None:
                    show_grower_progress()

                    loops_without_change = 0

                    grown = False
                    while table_grower.grow_point(img_gray, filtered_gray) is not None:
                        show_grower_progress()
                        grown = True
                        threshold = min(0.1 + 0.9 * threshold, original_threshold)
                        table_grower.set_threshold(threshold)

                    if not grown:
                        threshold *= 0.9
                        table_grower.set_threshold(threshold)

                else:
                    threshold *= 0.9
                    table_grower.set_threshold(threshold)

                    if table_grower.grow_point(img_gray, filtered_gray) is not None:
                        show_grower_progress()
                        loops_without_change = 0

        else:
            table_grower.grow_table(img_gray, filtered_gray)

        table_grower.smooth_grid()
        corners = table_grower.get_all_corners()
        logger.info(
            f"Table growth complete, found {len(corners)} rows and {len(corners[0])} columns"
        )
        # rescale corners back to original size
        if self._scale != 1.0:
            for y in range(len(corners)):
                for x in range(len(corners[y])):
                    if corners[y][x] is not None:
                        corners[y][x] = (
                            int(corners[y][x][0] / self._scale),  # pyright: ignore
                            int(corners[y][x][1] / self._scale),  # pyright: ignore
                        )

        return TableGrid(corners)  # pyright: ignore

    @log_calls(level=logging.DEBUG, include_return=True)
    def _build_table_row(
        self,
        gray: MatLike,
        filtered: MatLike,
        start_point: Point,
        cell_widths: List[int],
        row_idx: int,
        goals_width: int,
        previous_row_points: Optional[List[Point]] = None,
        visual: bool = False,
    ) -> List[Point]:
        """Build a single row of table points."""
        row = [start_point]
        current = start_point

        for col_idx, width in enumerate(cell_widths):
            next_point = self._find_next_column_point(
                gray,
                filtered,
                current,
                width,
                goals_width,
                visual,
                previous_row_points,
                col_idx,
            )
            if next_point is None:
                logger.warning(
                    f"Could not find point for row {row_idx}, col {col_idx + 1}"
                )
                return []  # Return empty list to signal failure
            row.append(next_point)
            current = next_point

        return row

    def _clamp_point_to_img(self, point: Point, img: MatLike) -> Point:
        """Clamp a point to be within the image bounds."""
        x = max(0, min(point[0], img.shape[1] - 1))
        y = max(0, min(point[1], img.shape[0] - 1))
        return (x, y)

    @log_calls(level=logging.DEBUG, include_return=True)
    def _find_next_column_point(
        self,
        gray: MatLike,
        filtered: MatLike,
        current: Point,
        width: int,
        goals_width: int,
        visual: bool = False,
        previous_row_points: Optional[List[Point]] = None,
        current_col_idx: int = 0,
    ) -> Optional[Point]:
        """Find the next point in the current row."""

        if previous_row_points is not None and current_col_idx + 1 < len(
            previous_row_points
        ):
            # grow an astar path downwards from the previous row point that is
            # above and to the right of current
            # and ensure all points are within image bounds
            bottom_right = [
                self._clamp_point_to_img(
                    (
                        current[0] + width - goals_width // 2 + x,
                        current[1] + goals_width,
                    ),
                    gray,
                )
                for x in range(goals_width)
            ]
            goals = self._astar(
                gray, previous_row_points[current_col_idx + 1], bottom_right, "down"
            )

            if goals is None:
                logger.warning(
                    f"A* failed to find path going downwards from previous row's point at idx {current_col_idx + 1}"
                )
                return None
        else:
            goals = [
                self._clamp_point_to_img(
                    (current[0] + width, current[1] - goals_width // 2 + y), gray
                )
                for y in range(goals_width)
            ]

        path = self._astar(gray, current, goals, "right")

        if path is None:
            logger.warning(
                f"A* failed to find path going rightward from {current} to goals"
            )
            return None

        next_point, _ = self.find_nearest(filtered, path[-1], self._search_region)

        # show the point and the search region on the image for debugging
        if visual:
            self._visualize_path_finding(
                goals + path,
                current,
                next_point,
                current,
                path[-1],
                self._search_region,
            )

        return next_point

    @log_calls(level=logging.DEBUG, include_return=True)
    def _find_next_row_start(
        self,
        gray: MatLike,
        filtered: MatLike,
        top_point: Point,
        row_idx: int,
        cell_heights: List[int],
        goals_width: int,
        visual: bool = False,
    ) -> Optional[Point]:
        """Find the starting point of the next row."""
        if row_idx < len(cell_heights):
            row_height = cell_heights[row_idx]
        else:
            row_height = cell_heights[-1]

        if top_point[1] + row_height >= filtered.shape[0] - 10:  # Near bottom
            return None

        goals = [
            (top_point[0] - goals_width // 2 + x, top_point[1] + row_height)
            for x in range(goals_width)
        ]

        path = self._astar(gray, top_point, goals, "down")
        if path is None:
            return None

        next_point, _ = self.find_nearest(
            filtered, path[-1], region=self._search_region * 3 // 2
        )

        if visual:
            self._visualize_path_finding(
                path, top_point, next_point, top_point, path[-1], self._search_region
            )

        return next_point

    def _visualize_grid(self, img: MatLike, points: List[List[Point]]) -> None:
        """Visualize the detected grid points."""
        all_points = [point for row in points for point in row]
        drawn = imu.draw_points(img, all_points)
        imu.show(drawn, wait=True)

    def _visualize_path_finding(
        self,
        path: List[Point],
        current: Point,
        next_point: Point,
        previous_row_target: Optional[Point] = None,
        region_center: Optional[Point] = None,
        region_size: Optional[int] = None,
    ) -> None:
        """Visualize the path finding process for debugging."""
        global show_time

        screen = imu.pop()

        # if gray, convert to BGR
        if len(screen.shape) == 2 or screen.shape[2] == 1:
            debug_img = cv.cvtColor(screen, cv.COLOR_GRAY2BGR)
        else:
            debug_img = cast(MatLike, screen)

        debug_img = imu.draw_points(debug_img, path, color=(200, 200, 0), thickness=2)
        debug_img = imu.draw_points(
            debug_img, [current], color=(0, 255, 0), thickness=3
        )
        debug_img = imu.draw_points(
            debug_img, [next_point], color=(0, 0, 255), thickness=2
        )

        # Draw previous row target if available
        if previous_row_target is not None:
            debug_img = imu.draw_points(
                debug_img, [previous_row_target], color=(255, 0, 255), thickness=2
            )

        # Draw search region if available
        if region_center is not None and region_size is not None:
            top_left = (
                max(0, region_center[0] - region_size // 2),
                max(0, region_center[1] - region_size // 2),
            )
            bottom_right = (
                min(debug_img.shape[1], region_center[0] + region_size // 2),
                min(debug_img.shape[0], region_center[1] + region_size // 2),
            )
            cv.rectangle(
                debug_img,
                top_left,
                bottom_right,
                color=(255, 0, 0),
                thickness=2,
                lineType=cv.LINE_AA,
            )

        imu.push(debug_img)

        show_time += 1
        if show_time % 10 != 1:
            return

        imu.show(debug_img, title="Next column point", wait=False)
        # time.sleep(0.003)

    @log_calls(level=logging.DEBUG, include_return=True)
    def _astar(
        self,
        img: np.ndarray,
        start: tuple[int, int],
        goals: list[tuple[int, int]],
        direction: str,
    ) -> Optional[List[Point]]:
        """
        Find the best path between the start point and one of the goal points
        on the image
        """

        if not goals:
            return None

        if self._scale != 1.0:
            img = cv.resize(img, None, fx=self._scale, fy=self._scale)
            start = (int(start[0] * self._scale), int(start[1] * self._scale))
            goals = [(int(g[0] * self._scale), int(g[1] * self._scale)) for g in goals]

        # calculate bounding box with margin
        all_points = goals + [start]
        xs = [p[0] for p in all_points]
        ys = [p[1] for p in all_points]

        margin = 30
        top_left = (max(0, min(xs) - margin), max(0, min(ys) - margin))
        bottom_right = (
            min(img.shape[1], max(xs) + margin),
            min(img.shape[0], max(ys) + margin),
        )

        # check bounds
        if (
            top_left[0] >= bottom_right[0]
            or top_left[1] >= bottom_right[1]
            or top_left[0] >= img.shape[1]
            or top_left[1] >= img.shape[0]
        ):
            return None

        # transform coordinates to cropped image
        start_local = (start[0] - top_left[0], start[1] - top_left[1])
        goals_local = [(g[0] - top_left[0], g[1] - top_left[1]) for g in goals]

        cropped = img[top_left[1] : bottom_right[1], top_left[0] : bottom_right[0]]

        if cropped.size == 0:
            return None

        path = rust_astar(cropped, start_local, goals_local, direction)

        if path is None:
            return None

        if self._scale != 1.0:
            path = [(int(p[0] / self._scale), int(p[1] / self._scale)) for p in path]
            top_left = (int(top_left[0] / self._scale), int(top_left[1] / self._scale))

        return [(p[0] + top_left[0], p[1] + top_left[1]) for p in path]
```
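To make the cross-kernel geometry concrete, here is a standalone sketch of the same logic as `_create_cross_kernel`, pulled out of the class; the helper name `make_cross_kernel` is hypothetical and only NumPy is assumed:

```python
import numpy as np


def make_cross_kernel(kernel_size: int, cross_width: int, cross_height: int) -> np.ndarray:
    # Cross-shaped template: a horizontal bar of thickness cross_height and a
    # vertical bar of thickness cross_width, both centered, drawn as 255 on a
    # zero background (same construction as GridDetector._create_cross_kernel).
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.uint8)
    center = kernel_size // 2

    h_start = max(0, center - cross_height // 2)
    h_end = min(kernel_size, center + (cross_height + 1) // 2)
    kernel[h_start:h_end, :] = 255  # horizontal bar

    v_start = max(0, center - cross_width // 2)
    v_end = min(kernel_size, center + (cross_width + 1) // 2)
    kernel[:, v_start:v_end] = 255  # vertical bar
    return kernel


# with the default parameters (kernel_size=21, cross_width=6), the center row
# and column are fully white while the four corner blocks stay black
k = make_cross_kernel(21, 6, 6)
```

Template matching with `TM_SQDIFF_NORMED` against this template scores highest (after inversion) where a horizontal and a vertical rule cross, which is why the bar thicknesses should match the rule thickness after morphology.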
Detects table grid intersections using morphological filtering and template matching.
This detector implements a multi-stage pipeline:
- Binarization: Sauvola adaptive thresholding to handle varying lighting
- Morphological operations: Dilation to connect broken rule segments
- Cross-kernel matching: Template matching with a cross-shaped kernel to find rule intersections where horizontal and vertical lines meet
- Grid growing: Iterative point detection starting from a known seed point
The cross-kernel is designed to match the specific geometry of your table rules. It should be sized so that after morphology, it aligns with actual corner shapes.
Tuning Guidelines
- kernel_size: Increase if you need more selectivity (fewer false positives)
- cross_width/height: Should match rule thickness after morphology
- morph_size: Increase to connect more broken lines, but this thickens rules
- sauvola_k: Increase to threshold more aggressively (remove noise)
- search_region: Increase for documents with more warping/distortion
- distance_penalty: Increase to prefer corners closer to expected positions
Visual Debugging
Set visual=True in methods to see intermediate results and tune parameters.
148 def __init__( 149 self, 150 kernel_size: int = 21, 151 cross_width: int = 6, 152 cross_height: Optional[int] = None, 153 morph_size: Optional[int] = None, 154 sauvola_k: float = 0.04, 155 sauvola_window: int = 15, 156 scale: float = 1.0, 157 search_region: int = 40, 158 distance_penalty: float = 0.4, 159 min_rows: int = 5, 160 grow_threshold: float = 0.3, 161 look_distance: int = 4, 162 ): 163 """ 164 Args: 165 kernel_size (int): the size of the cross kernel 166 a larger kernel size often means that more penalty is applied, often leading 167 to more sparse results 168 cross_width (int): the width of one of the edges in the cross filter, should be 169 roughly equal to the width of the rules in the image after morphology is applied 170 cross_height (int | None): useful if the horizontal rules and vertical rules 171 have different sizes 172 morph_size (int | None): the size of the morphology operators that are applied before 173 the cross kernel. 'bridges the gaps' of broken-up lines 174 sauvola_k (float): threshold parameter for sauvola thresholding 175 sauvola_window (int): window_size parameter for sauvola thresholding 176 scale (float): image scale factor to do calculations on (useful for increasing calculation speed mostly) 177 search_region (int): area in which to search for a new max value in `find_nearest` etc. 
178 distance_penalty (float): how much the point finding algorithm penalizes points that are further in the region [0, 1] 179 min_rows (int): minimum number of rows to find before stopping the table finding algorithm 180 grow_threshold (float): the threshold for accepting a new point when growing the table 181 look_distance (int): how many points away to look when calculating the median slope 182 """ 183 self._validate_parameters( 184 kernel_size, 185 cross_width, 186 cross_height, 187 morph_size, 188 search_region, 189 sauvola_k, 190 sauvola_window, 191 distance_penalty, 192 ) 193 194 self._kernel_size = kernel_size 195 self._cross_width = cross_width 196 self._cross_height = cross_width if cross_height is None else cross_height 197 self._morph_size = morph_size if morph_size is not None else cross_width 198 self._search_region = search_region 199 self._sauvola_k = sauvola_k 200 self._sauvola_window = sauvola_window 201 self._distance_penalty = distance_penalty 202 self._scale = scale 203 self._min_rows = min_rows 204 self._grow_threshold = grow_threshold 205 self._look_distance = look_distance 206 207 self._cross_kernel = self._create_cross_kernel()
Arguments:
- kernel_size (int): the size of the cross kernel a larger kernel size often means that more penalty is applied, often leading to more sparse results
- cross_width (int): the width of one of the edges in the cross filter, should be roughly equal to the width of the rules in the image after morphology is applied
- cross_height (int | None): useful if the horizontal rules and vertical rules have different sizes
- morph_size (int | None): the size of the morphology operators that are applied before the cross kernel. 'bridges the gaps' of broken-up lines
- sauvola_k (float): threshold parameter for sauvola thresholding
- sauvola_window (int): window_size parameter for sauvola thresholding
- scale (float): image scale factor to do calculations on (useful for increasing calculation speed mostly)
- search_region (int): area in which to search for a new max value in
find_nearestetc. - distance_penalty (float): how much the point finding algorithm penalizes points that are further in the region [0, 1]
- min_rows (int): minimum number of rows to find before stopping the table finding algorithm
- grow_threshold (float): the threshold for accepting a new point when growing the table
- look_distance (int): how many points away to look when calculating the median slope
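The `cross_width`, `cross_height`, and `kernel_size` parameters describe a cross-shaped kernel: a horizontal bar and a vertical bar crossing at the center, mimicking a rule intersection. The actual `_create_cross_kernel` implementation is private and may differ; a rough, self-contained sketch of the idea:

```python
import numpy as np


def cross_kernel(size: int, cross_width: int, cross_height: int) -> np.ndarray:
    """Build a kernel that is 1 on a centered cross and 0 elsewhere.

    A horizontal bar `cross_height` rows tall and a vertical bar
    `cross_width` columns wide intersect at the kernel's center.
    """
    kernel = np.zeros((size, size), dtype=np.float32)
    c = size // 2
    # horizontal bar: full width, `cross_height` rows tall
    kernel[c - cross_height // 2 : c + (cross_height + 1) // 2, :] = 1.0
    # vertical bar: full height, `cross_width` columns wide
    kernel[:, c - cross_width // 2 : c + (cross_width + 1) // 2] = 1.0
    return kernel


k = cross_kernel(size=7, cross_width=1, cross_height=1)
```

Filtering a binarized image with such a kernel responds most strongly where a horizontal and a vertical rule actually cross, which is what `apply` exploits.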
```python
    def apply(self, img: MatLike, visual: bool = False) -> MatLike:
        """
        Apply the grid detection filter to the input image.

        Args:
            img (MatLike): the input image
            visual (bool): whether to show intermediate steps

        Returns:
            MatLike: the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules
        """

        if img is None or img.size == 0:
            raise ValueError("Input image is empty or None")

        binary = imu.sauvola(img, k=self._sauvola_k, window_size=self._sauvola_window)

        if visual:
            imu.show(binary, title="thresholded")

        binary = self._apply_morphology(binary)

        if visual:
            imu.show(binary, title="dilated")

        filtered = self._apply_cross_matching(binary)

        return filtered
```
Apply the grid detection filter to the input image.
Arguments:
- img (MatLike): the input image
- visual (bool): whether to show intermediate steps
Returns:
MatLike: the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules
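The morphology step between thresholding and cross matching is internal to the detector, but the reason it helps is easy to demonstrate in one dimension: a closing (dilation followed by erosion) bridges small breaks in a rule. A minimal sketch, not Taulu's actual implementation:

```python
import numpy as np


def dilate(line: np.ndarray, k: int) -> np.ndarray:
    # 1-D dilation: a pixel turns on if any pixel within k//2 of it is on
    out = np.zeros_like(line)
    r = k // 2
    for i in range(len(line)):
        out[i] = line[max(0, i - r) : i + r + 1].max()
    return out


def erode(line: np.ndarray, k: int) -> np.ndarray:
    # 1-D erosion: a pixel stays on only if all pixels within k//2 are on
    out = np.zeros_like(line)
    r = k // 2
    for i in range(len(line)):
        out[i] = line[max(0, i - r) : i + r + 1].min()
    return out


# a rule with a one-pixel break: closing (dilate then erode) bridges it
line = np.array([1, 1, 1, 0, 1, 1, 1], dtype=np.uint8)
closed = erode(dilate(line, 3), 3)
```

This is why `morph_size` should roughly match the size of the gaps in your scanned rules: too small and breaks survive, too large and unrelated strokes merge.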
```python
    @log_calls(level=logging.DEBUG, include_return=True)
    def find_nearest(
        self, filtered: MatLike, point: Point, region: Optional[int] = None
    ) -> Tuple[Point, float]:
        """
        Find the nearest 'corner match' in the image, along with its score [0,1]

        Args:
            filtered (MatLike): the filtered image (obtained through `apply`)
            point (tuple[int, int]): the approximate target point (x, y)
            region (None | int): alternative value for search region,
                overwriting the `__init__` parameter `region`
        """

        if filtered is None or filtered.size == 0:
            raise ValueError("Filtered image is empty or None")

        region_size = region if region is not None else self._search_region
        x, y = point

        # Calculate crop boundaries
        crop_x = max(0, x - region_size // 2)
        crop_y = max(0, y - region_size // 2)
        crop_width = min(region_size, filtered.shape[1] - crop_x)
        crop_height = min(region_size, filtered.shape[0] - crop_y)

        # Handle edge cases
        if crop_width <= 0 or crop_height <= 0:
            logger.warning(f"Point {point} is outside image bounds")
            return point, 0.0

        cropped = filtered[crop_y : crop_y + crop_height, crop_x : crop_x + crop_width]

        if cropped.size == 0:
            return point, 0.0

        # Always apply Gaussian weighting by extending crop if needed
        if cropped.shape[0] == region_size and cropped.shape[1] == region_size:
            # Perfect size - apply weights directly
            weights = self._create_gaussian_weights(region_size)
            weighted = cropped.astype(np.float32) * weights
        else:
            # Extend crop to match region_size, apply weights, then restore
            extended = np.zeros((region_size, region_size), dtype=cropped.dtype)

            # Calculate offset to center the cropped region in extended array
            offset_y = (region_size - cropped.shape[0]) // 2
            offset_x = (region_size - cropped.shape[1]) // 2

            # Place cropped region in center of extended array
            extended[
                offset_y : offset_y + cropped.shape[0],
                offset_x : offset_x + cropped.shape[1],
            ] = cropped

            # Apply Gaussian weights to extended array
            weights = self._create_gaussian_weights(region_size)
            weighted_extended = extended.astype(np.float32) * weights

            # Extract the original region back out
            weighted = weighted_extended[
                offset_y : offset_y + cropped.shape[0],
                offset_x : offset_x + cropped.shape[1],
            ]

        best_idx = np.argmax(weighted)
        best_y, best_x = np.unravel_index(best_idx, cropped.shape)

        result_point = (
            int(crop_x + best_x),
            int(crop_y + best_y),
        )
        result_confidence = float(weighted[best_y, best_x]) / 255.0

        return result_point, result_confidence
```
Find the nearest 'corner match' in the image, along with its score [0,1]
Arguments:
- filtered (MatLike): the filtered image (obtained through `apply`)
- point (tuple[int, int]): the approximate target point (x, y)
- region (None | int): alternative value for the search region, overriding the `__init__` parameter `search_region`
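Internally, `find_nearest` multiplies the cropped response by a Gaussian bump so that, between equally strong corner responses, the one closest to the requested point wins. A self-contained sketch of that weighting (the real `_create_gaussian_weights` is private; the sigma choice below is an assumption):

```python
import numpy as np


def gaussian_weights(size: int, sigma_frac: float = 0.25) -> np.ndarray:
    # 2-D Gaussian bump centered on the crop, used to prefer nearby maxima
    ax = np.arange(size) - (size - 1) / 2
    xx, yy = np.meshgrid(ax, ax)
    sigma = sigma_frac * size
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2))


# a filtered crop with two equally bright responses: one at the center,
# one in a corner; the weighting makes the central one win
crop = np.zeros((9, 9), dtype=np.float32)
crop[4, 4] = 200.0
crop[0, 0] = 200.0

weighted = crop * gaussian_weights(9)
best_y, best_x = np.unravel_index(np.argmax(weighted), weighted.shape)
```

This is also what the `distance_penalty` parameter tunes: how quickly the weight falls off away from the expected point.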
```python
    def find_table_points(
        self,
        img: MatLike | PathLike[str],
        left_top: Point,
        cell_widths: list[int],
        cell_heights: list[int] | int,
        visual: bool = False,
        window: str = WINDOW,
        goals_width: Optional[int] = None,
    ) -> "TableGrid":
        """
        Parse the image to a `TableGrid` structure that holds all of the
        intersections between horizontal and vertical rules, starting near the `left_top` point

        Args:
            img (MatLike): the input image of a table
            left_top (tuple[int, int]): the starting point of the algorithm
            cell_widths (list[int]): the expected widths of the cells (based on a header template)
            cell_heights (list[int]): the expected height of the rows of data.
                The last value from this list is used until the image has no more vertical space.
            visual (bool): whether to show intermediate steps
            window (str): the name of the OpenCV window to use for visualization
            goals_width (int | None): the width of the goal region when searching for the next point.
                If None, defaults to 1.5 * search_region

        Returns:
            a TableGrid object
        """

        if goals_width is None:
            goals_width = self._search_region * 3 // 2

        if not cell_widths:
            raise ValueError("cell_widths must contain at least one value")

        if not isinstance(img, np.ndarray):
            img = cv.imread(os.fspath(img))

        filtered = self.apply(img, visual)

        if visual:
            imu.show(filtered, window=window)

        if isinstance(cell_heights, int):
            cell_heights = [cell_heights]

        left_top, confidence = self.find_nearest(
            filtered, left_top, int(self._search_region * 3)
        )

        if confidence < 0.1:
            logger.warning(
                f"Low confidence for the starting point: {confidence} at {left_top}"
            )

        # resize all parameters according to scale
        img = cv.resize(img, None, fx=self._scale, fy=self._scale)

        if visual:
            imu.push(img)

        filtered = cv.resize(filtered, None, fx=self._scale, fy=self._scale)
        cell_widths = [int(w * self._scale) for w in cell_widths]
        cell_heights = [int(h * self._scale) for h in cell_heights]
        left_top = (int(left_top[0] * self._scale), int(left_top[1] * self._scale))
        self._search_region = int(self._search_region * self._scale)

        img_gray = ensure_gray(img)
        filtered_gray = ensure_gray(filtered)

        table_grower = TableGrower(
            img_gray,
            filtered_gray,
            cell_widths,  # pyright: ignore
            cell_heights,  # pyright: ignore
            left_top,
            self._search_region,
            self._distance_penalty,
            self._look_distance,
            self._grow_threshold,
            self._min_rows,
        )

        def show_grower_progress(wait: bool = False):
            img_orig = np.copy(img)
            corners = table_grower.get_all_corners()
            for y in range(len(corners)):
                for x in range(len(corners[y])):
                    if corners[y][x] is not None:
                        img_orig = imu.draw_points(
                            img_orig,
                            [corners[y][x]],
                            color=(0, 0, 255),
                            thickness=30,
                        )

            edge = table_grower.get_edge_points()

            for point, score in edge:
                color = (100, int(clamp(score * 255, 0, 255)), 100)
                imu.draw_point(img_orig, point, color=color, thickness=20)

            imu.show(img_orig, wait=wait)

        if visual:
            threshold = self._grow_threshold
            look_distance = self._look_distance

            # python implementation of rust loops, for visualization purposes
            # note this is a LOT slower
            while table_grower.grow_point(img_gray, filtered_gray) is not None:
                show_grower_progress()

            show_grower_progress(True)

            original_threshold = threshold

            loops_without_change = 0

            while not table_grower.is_table_complete():
                loops_without_change += 1

                if loops_without_change > 50:
                    break

                if table_grower.extrapolate_one(img_gray, filtered_gray) is not None:
                    show_grower_progress()

                    loops_without_change = 0

                    grown = False
                    while table_grower.grow_point(img_gray, filtered_gray) is not None:
                        show_grower_progress()
                        grown = True
                        threshold = min(0.1 + 0.9 * threshold, original_threshold)
                        table_grower.set_threshold(threshold)

                    if not grown:
                        threshold *= 0.9
                        table_grower.set_threshold(threshold)

                else:
                    threshold *= 0.9
                    table_grower.set_threshold(threshold)

                    if table_grower.grow_point(img_gray, filtered_gray) is not None:
                        show_grower_progress()
                        loops_without_change = 0

        else:
            table_grower.grow_table(img_gray, filtered_gray)

        table_grower.smooth_grid()
        corners = table_grower.get_all_corners()
        logger.info(
            f"Table growth complete, found {len(corners)} rows and {len(corners[0])} columns"
        )
        # rescale corners back to original size
        if self._scale != 1.0:
            for y in range(len(corners)):
                for x in range(len(corners[y])):
                    if corners[y][x] is not None:
                        corners[y][x] = (
                            int(corners[y][x][0] / self._scale),  # pyright:ignore
                            int(corners[y][x][1] / self._scale),  # pyright:ignore
                        )

        return TableGrid(corners)  # pyright: ignore
```
Parse the image to a TableGrid structure that holds all of the
intersections between horizontal and vertical rules, starting near the left_top point
Arguments:
- img (MatLike): the input image of a table
- left_top (tuple[int, int]): the starting point of the algorithm
- cell_widths (list[int]): the expected widths of the cells (based on a header template)
- cell_heights (list[int]): the expected height of the rows of data. The last value from this list is used until the image has no more vertical space.
- visual (bool): whether to show intermediate steps
- window (str): the name of the OpenCV window to use for visualization
- goals_width (int | None): the width of the goal region when searching for the next point. If None, defaults to 1.5 * search_region
Returns:
a TableGrid object
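The header template gives the search its initial goals: the expected x-positions of the vertical rules in the first row are just `left_top` plus the cumulative cell widths. A small illustration with made-up measurements (the actual search runs inside the compiled `TableGrower`, which also refines each goal against the filtered image):

```python
import itertools

# hypothetical header measurements: a starting corner and expected cell widths
left_top = (120, 310)
cell_widths = [80, 150, 150, 60]

# expected x-positions of the vertical rules in the first row:
# the left edge plus the running sum of the cell widths
xs = [left_top[0]] + [left_top[0] + w for w in itertools.accumulate(cell_widths)]
first_row_goals = [(x, left_top[1]) for x in xs]
```

Note that n cell widths yield n + 1 rule positions, which is why a grid of r rows and c columns is stored as (r + 1) x (c + 1) intersection points.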
```python
class TableGrid(TableIndexer):
    """
    A data class that allows segmenting the image into cells
    """

    _right_offset: int | None = None

    def __init__(self, points: list[list[Point]], right_offset: Optional[int] = None):
        """
        Args:
            points: a 2D list of intersections between hor. and vert. rules
        """
        self._points = points
        self._right_offset = right_offset

    @property
    def points(self) -> list[list[Point]]:
        return self._points

    def row(self, i: int) -> list[Point]:
        assert 0 <= i and i < len(self._points)
        return self._points[i]

    @property
    def cols(self) -> int:
        if self._right_offset is not None:
            return len(self.row(0)) - 2
        else:
            return len(self.row(0)) - 1

    @property
    def rows(self) -> int:
        return len(self._points) - 1

    @staticmethod
    def from_split(
        split_grids: Split["TableGrid"], offsets: Split[Point]
    ) -> "TableGrid":
        """
        Convert two ``TableGrid`` objects into one, that is able to segment the original (non-cropped) image

        Args:
            split_grids (Split[TableGrid]): a Split of TableGrid objects of the left and right part of the table
            offsets (Split[tuple[int, int]]): a Split of the offsets in the image where the crop happened
        """

        def offset_points(points, offset):
            return [
                [(p[0] + offset[0], p[1] + offset[1]) for p in row] for row in points
            ]

        split_points = split_grids.apply(
            lambda grid, offset: offset_points(grid.points, offset), offsets
        )

        points = []

        rows = min(split_grids.left.rows, split_grids.right.rows)

        for row in range(rows + 1):
            row_points = []

            row_points.extend(split_points.left[row])
            row_points.extend(split_points.right[row])

            points.append(row_points)

        table_grid = TableGrid(points, split_grids.left.cols)

        return table_grid

    def save(self, path: str | Path):
        with open(path, "w") as f:
            json.dump({"points": self.points, "right_offset": self._right_offset}, f)

    @staticmethod
    def from_saved(path: str | Path) -> "TableGrid":
        with open(path, "r") as f:
            points = json.load(f)
            right_offset = points.get("right_offset", None)
            points = [[(p[0], p[1]) for p in pointes] for pointes in points["points"]]
            return TableGrid(points, right_offset)

    def add_left_col(self, width: int):
        for row in self._points:
            first = row[0]
            new_first = (first[0] - width, first[1])
            row.insert(0, new_first)

    def add_top_row(self, height: int):
        new_row = []
        for point in self._points[0]:
            new_row.append((point[0], point[1] - height))

        self.points.insert(0, new_row)

    def _surrounds(self, rect: list[Point], point: tuple[float, float]) -> bool:
        """point: x, y"""
        lt, rt, rb, lb = rect
        x, y = point

        top = _Rule(*lt, *rt)
        if top._y_at_x(x) > y:
            return False

        right = _Rule(*rt, *rb)
        if right._x_at_y(y) < x:
            return False

        bottom = _Rule(*lb, *rb)
        if bottom._y_at_x(x) < y:
            return False

        left = _Rule(*lb, *lt)
        if left._x_at_y(y) > x:
            return False

        return True

    def cell(self, point: tuple[float, float]) -> tuple[int, int]:
        for r in range(len(self._points) - 1):
            offset = 0
            for c in range(len(self.row(0)) - 1):
                if self._right_offset is not None and c == self._right_offset:
                    offset = -1
                    continue

                if self._surrounds(
                    [
                        self._points[r][c],
                        self._points[r][c + 1],
                        self._points[r + 1][c + 1],
                        self._points[r + 1][c],
                    ],
                    point,
                ):
                    return (r, c + offset)

        return (-1, -1)

    def cell_polygon(self, cell: tuple[int, int]) -> tuple[Point, Point, Point, Point]:
        r, c = cell

        self._check_row_idx(r)
        self._check_col_idx(c)

        if self._right_offset is not None and c >= self._right_offset:
            c = c + 1

        return (
            self._points[r][c],
            self._points[r][c + 1],
            self._points[r + 1][c + 1],
            self._points[r + 1][c],
        )

    def region(
        self, start: tuple[int, int], end: tuple[int, int]
    ) -> tuple[Point, Point, Point, Point]:
        r0, c0 = start
        r1, c1 = end

        self._check_row_idx(r0)
        self._check_row_idx(r1)
        self._check_col_idx(c0)
        self._check_col_idx(c1)

        if self._right_offset is not None and c0 >= self._right_offset:
            c0 = c0 + 1

        if self._right_offset is not None and c1 >= self._right_offset:
            c1 = c1 + 1

        lt = self._points[r0][c0]
        rt = self._points[r0][c1 + 1]
        rb = self._points[r1 + 1][c1 + 1]
        lb = self._points[r1 + 1][c0]

        return lt, rt, rb, lb

    def visualize_points(self, img: MatLike):
        """
        Draw the detected table points on the image for visual verification
        """
        import colorsys

        def clr(index, total_steps):
            hue = index / total_steps  # Normalized hue between 0 and 1
            r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
            return int(r * 255), int(g * 255), int(b * 255)

        for i, row in enumerate(self._points):
            for p in row:
                cv.circle(img, p, 4, clr(i, len(self._points)), -1)

        imu.show(img)

    def text_regions(
        self, img: MatLike, row: int, margin_x: int = 10, margin_y: int = -3
    ) -> list[tuple[tuple[int, int], tuple[int, int]]]:
        def vertical_rule_crop(row: int, col: int):
            self._check_col_idx(col)
            self._check_row_idx(row)

            if self._right_offset is not None and col >= self._right_offset:
                col = col + 1

            top = self._points[row][col]
            bottom = self._points[row + 1][col]

            left = int(min(top[0], bottom[0]))
            right = int(max(top[0], bottom[0]))

            return img[
                int(top[1]) - margin_y : int(bottom[1]) + margin_y,
                left - margin_x : right + margin_x,
            ]

        result = []

        start = None
        for col in range(self.cols):
            crop = vertical_rule_crop(row, col)
            text_over_score = imu.text_presence_score(crop)
            text_over = text_over_score > -0.10

            if not text_over:
                if start is not None:
                    result.append(((row, start), (row, col - 1)))
                start = col

        if start is not None:
            result.append(((row, start), (row, self.cols - 1)))

        return result
```
A data class that allows segmenting the image into cells
```python
    def __init__(self, points: list[list[Point]], right_offset: Optional[int] = None):
        """
        Args:
            points: a 2D list of intersections between hor. and vert. rules
        """
        self._points = points
        self._right_offset = right_offset
```
Arguments:
- points: a 2D list of intersections between hor. and vert. rules
```python
    @staticmethod
    def from_split(
        split_grids: Split["TableGrid"], offsets: Split[Point]
    ) -> "TableGrid":
        """
        Convert two ``TableGrid`` objects into one, that is able to segment the original (non-cropped) image

        Args:
            split_grids (Split[TableGrid]): a Split of TableGrid objects of the left and right part of the table
            offsets (Split[tuple[int, int]]): a Split of the offsets in the image where the crop happened
        """

        def offset_points(points, offset):
            return [
                [(p[0] + offset[0], p[1] + offset[1]) for p in row] for row in points
            ]

        split_points = split_grids.apply(
            lambda grid, offset: offset_points(grid.points, offset), offsets
        )

        points = []

        rows = min(split_grids.left.rows, split_grids.right.rows)

        for row in range(rows + 1):
            row_points = []

            row_points.extend(split_points.left[row])
            row_points.extend(split_points.right[row])

            points.append(row_points)

        table_grid = TableGrid(points, split_grids.left.cols)

        return table_grid
```
Convert two TableGrid objects into one that can segment the original (non-cropped) image
Arguments:
- split_grids (Split[TableGrid]): a Split of TableGrid objects of the left and right part of the table
- offsets (Split[tuple[int, int]]): a Split of the offsets in the image where the crop happened
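The merge boils down to shifting each grid's intersections by its crop offset and concatenating the rows, mirroring what `from_split` does internally. A self-contained sketch with two tiny one-cell grids:

```python
# two small grids detected on separate crops of the same page, and the
# pixel offsets of those crops in the original image
left_points = [[(0, 0), (50, 0)], [(0, 40), (50, 40)]]
right_points = [[(0, 0), (60, 0)], [(0, 40), (60, 40)]]
left_offset, right_offset = (10, 100), (300, 100)


def offset_points(points, offset):
    # shift every intersection back into original-image coordinates
    return [[(x + offset[0], y + offset[1]) for (x, y) in row] for row in points]


merged = [
    offset_points(left_points, left_offset)[r]
    + offset_points(right_points, right_offset)[r]
    for r in range(2)
]
```

The real method also records where the left grid ends (`right_offset`) so that the phantom "column" between the two crops is skipped when indexing cells.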
```python
    @staticmethod
    def from_saved(path: str | Path) -> "TableGrid":
        with open(path, "r") as f:
            points = json.load(f)
            right_offset = points.get("right_offset", None)
            points = [[(p[0], p[1]) for p in pointes] for pointes in points["points"]]
            return TableGrid(points, right_offset)
```
```python
    def cell(self, point: tuple[float, float]) -> tuple[int, int]:
        for r in range(len(self._points) - 1):
            offset = 0
            for c in range(len(self.row(0)) - 1):
                if self._right_offset is not None and c == self._right_offset:
                    offset = -1
                    continue

                if self._surrounds(
                    [
                        self._points[r][c],
                        self._points[r][c + 1],
                        self._points[r + 1][c + 1],
                        self._points[r + 1][c],
                    ],
                    point,
                ):
                    return (r, c + offset)

        return (-1, -1)
```
Returns the coordinate (row, col) of the cell that contains the given position
Arguments:
- point (tuple[float, float]): a location in the input image
Returns:
tuple[int, int]: the cell index (row, col) that contains the given point
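`cell` works by testing the point against the four edges of each candidate quadrilateral. The real implementation uses a private `_Rule` helper plus extra bookkeeping for split tables; an equivalent convex-quad containment test (ignoring the `right_offset` handling) can be sketched as:

```python
def surrounds(quad, point):
    # quad: (lt, rt, rb, lb) in clockwise order (image coords, y down);
    # the point is inside when it lies on the same side of every edge
    x, y = point
    n = len(quad)
    for i in range(n):
        x0, y0 = quad[i]
        x1, y1 = quad[(i + 1) % n]
        # z-component of the cross product: edge x (point - edge start)
        if (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0) < 0:
            return False
    return True


def cell_of(points, point):
    # brute-force scan over all cells, like TableGrid.cell
    for r in range(len(points) - 1):
        for c in range(len(points[0]) - 1):
            quad = (
                points[r][c],
                points[r][c + 1],
                points[r + 1][c + 1],
                points[r + 1][c],
            )
            if surrounds(quad, point):
                return (r, c)
    return (-1, -1)


# a 1-row, 2-column grid of straight cells
grid = [[(0, 0), (100, 0), (200, 0)],
        [(0, 50), (100, 50), (200, 50)]]
```

Because the grid stores actual detected intersections, this lookup remains correct even when the rules are slightly skewed or curved.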
```python
    def cell_polygon(self, cell: tuple[int, int]) -> tuple[Point, Point, Point, Point]:
        r, c = cell

        self._check_row_idx(r)
        self._check_col_idx(c)

        if self._right_offset is not None and c >= self._right_offset:
            c = c + 1

        return (
            self._points[r][c],
            self._points[r][c + 1],
            self._points[r + 1][c + 1],
            self._points[r + 1][c],
        )
```
Returns the polygon (as used in e.g. OpenCV) that encloses the cell at the given cell position
```python
    def region(
        self, start: tuple[int, int], end: tuple[int, int]
    ) -> tuple[Point, Point, Point, Point]:
        r0, c0 = start
        r1, c1 = end

        self._check_row_idx(r0)
        self._check_row_idx(r1)
        self._check_col_idx(c0)
        self._check_col_idx(c1)

        if self._right_offset is not None and c0 >= self._right_offset:
            c0 = c0 + 1

        if self._right_offset is not None and c1 >= self._right_offset:
            c1 = c1 + 1

        lt = self._points[r0][c0]
        rt = self._points[r0][c1 + 1]
        rb = self._points[r1 + 1][c1 + 1]
        lb = self._points[r1 + 1][c0]

        return lt, rt, rb, lb
```
Get the bounding box for the rectangular region that goes from start to end
Returns:
four points (lt, rt, rb, lb), each in (x, y) format
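For a grid of detected intersections this amounts to picking four points from the point matrix; a sketch without the split-table column offset:

```python
# a 2-row, 3-column grid of straight cells
grid = [[(0, 0), (100, 0), (200, 0), (300, 0)],
        [(0, 50), (100, 50), (200, 50), (300, 50)],
        [(0, 100), (100, 100), (200, 100), (300, 100)]]


def region(points, start, end):
    # corner points of the block of cells from start=(r0, c0) to end=(r1, c1),
    # both inclusive, returned as (lt, rt, rb, lb)
    (r0, c0), (r1, c1) = start, end
    return (
        points[r0][c0],
        points[r0][c1 + 1],
        points[r1 + 1][c1 + 1],
        points[r1 + 1][c0],
    )


# corners of the full 2x3 block of cells
corners = region(grid, (0, 0), (1, 2))
```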
```python
    def visualize_points(self, img: MatLike):
        """
        Draw the detected table points on the image for visual verification
        """
        import colorsys

        def clr(index, total_steps):
            hue = index / total_steps  # Normalized hue between 0 and 1
            r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
            return int(r * 255), int(g * 255), int(b * 255)

        for i, row in enumerate(self._points):
            for p in row:
                cv.circle(img, p, 4, clr(i, len(self._points)), -1)

        imu.show(img)
```
Draw the detected table points on the image for visual verification
```python
    def text_regions(
        self, img: MatLike, row: int, margin_x: int = 10, margin_y: int = -3
    ) -> list[tuple[tuple[int, int], tuple[int, int]]]:
        def vertical_rule_crop(row: int, col: int):
            self._check_col_idx(col)
            self._check_row_idx(row)

            if self._right_offset is not None and col >= self._right_offset:
                col = col + 1

            top = self._points[row][col]
            bottom = self._points[row + 1][col]

            left = int(min(top[0], bottom[0]))
            right = int(max(top[0], bottom[0]))

            return img[
                int(top[1]) - margin_y : int(bottom[1]) + margin_y,
                left - margin_x : right + margin_x,
            ]

        result = []

        start = None
        for col in range(self.cols):
            crop = vertical_rule_crop(row, col)
            text_over_score = imu.text_presence_score(crop)
            text_over = text_over_score > -0.10

            if not text_over:
                if start is not None:
                    result.append(((row, start), (row, col - 1)))
                start = col

        if start is not None:
            result.append(((row, start), (row, self.cols - 1)))

        return result
```
Split the row into regions of continuous text
Returns: a list of spans, each given as a pair of (row, col) cell indices marking the first and last column of a text region
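The span-building logic can be isolated from the image analysis. Given, for each column, whether handwriting crosses the vertical rule at its left edge (decided by `imu.text_presence_score` in the real code; stand-in booleans here), consecutive columns get merged into spans:

```python
def text_spans(row, crosses_rule):
    # crosses_rule[c] is True when text overlaps the vertical rule at the
    # left edge of column c, i.e. the text continues into column c
    spans = []
    start = None
    for col, crossed in enumerate(crosses_rule):
        if not crossed:
            if start is not None:
                spans.append(((row, start), (row, col - 1)))
            start = col
    if start is not None:
        spans.append(((row, start), (row, len(crosses_rule) - 1)))
    return spans


# columns 1 and 2 are joined by text running over the rule between them
spans = text_spans(3, [False, False, True, False, False])
```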
```python
class HeaderAligner:
    """
    Aligns table header templates to subject images using feature-based registration.

    This class uses ORB (Oriented FAST and Rotated BRIEF) feature detection and
    matching to compute a homography transformation that maps points from a header
    template image to their corresponding locations in full table images.

    ## How it Works

    1. **Feature Detection**: Extracts ORB keypoints from both template and subject
    2. **Feature Matching**: Finds correspondences using Hamming distance
    3. **Filtering**: Keeps top matches and prunes based on spatial consistency
    4. **Homography Estimation**: Computes perspective transform using RANSAC

    The computed homography can then transform any point from template space to
    image space, allowing you to locate table structures based on your annotation.

    ## Preprocessing Options

    - Set `k` parameter to apply Sauvola thresholding before feature detection.
      This can improve matching on documents with variable lighting.
    - Set `k=None` to use raw images (just extract blue channel for BGR images)

    ## Tuning Guidelines

    - **max_features**: Increase if matching fails on complex templates
    - **match_fraction**: Decrease if you get many incorrect matches
    - **max_dist**: Increase for documents with more warping/distortion
    - **scale**: Decrease (<1.0) to speed up on high-resolution images

    Args:
        template (MatLike | PathLike[str] | str | None): Header template image or path.
            This should contain a clear, representative view of the table header.
        max_features (int): Maximum ORB features to detect. More features = slower
            but potentially more robust matching.
        patch_size (int): ORB patch size for feature extraction.
        match_fraction (float): Fraction [0, 1] of matches to keep after sorting by
            quality. Higher = more matches but potentially more outliers.
        scale (float): Image downscaling factor (0, 1] for processing speed.
        max_dist (float): Maximum allowed distance (relative to image size) between
            matched keypoints. Filters out spatially inconsistent matches.
        k (float | None): Sauvola threshold parameter for preprocessing. If None,
            no thresholding is applied. Typical range: 0.03-0.15.
    """

    def __init__(
        self,
        template: None | MatLike | PathLike[str] | str = None,
        max_features: int = 25_000,
        patch_size: int = 31,
        match_fraction: float = 0.6,
        scale: float = 1.0,
        max_dist: float = 1.00,
        k: float | None = 0.05,
    ):
        """
        Args:
            template (MatLike | str): (path of) template image, with the table template clearly visible
            max_features (int): maximal number of features that will be extracted by ORB
            patch_size (int): for ORB feature extractor
            match_fraction (float): best fraction of matches that are kept
            scale (float): image scale factor to do calculations on (useful for increasing calculation speed mostly)
            max_dist (float): maximum distance (relative to image size) of matched features.
                Increase this value if the warping between image and template needs to be more aggressive
            k (float | None): sauvola thresholding threshold value. If None, no sauvola thresholding is done
        """

        if type(template) is str or type(template) is PathLike:
            value = cv.imread(fspath(template))
            template = value

        self._k = k
        if scale > 1.0:
            raise TauluException(
                "Scaling up the image for header alignment is useless. Use 0 < scale <= 1.0"
            )
        if scale == 0:
            raise TauluException("Use 0 < scale <= 1.0")

        self._scale = scale
        self._template = self._scale_img(cast(MatLike, template))
        self._template_orig: None | MatLike = None
        self._preprocess_template()
        self._max_features = max_features
        self._patch_size = patch_size
        self._match_fraction = match_fraction
        self._max_dist = max_dist

    def _scale_img(self, img: MatLike) -> MatLike:
        if self._scale == 1.0:
            return img

        return cv.resize(img, None, fx=self._scale, fy=self._scale)

    def _unscale_img(self, img: MatLike) -> MatLike:
        if self._scale == 1.0:
            return img

        return cv.resize(img, None, fx=1 / self._scale, fy=1 / self._scale)

    def _unscale_homography(self, h: np.ndarray) -> np.ndarray:
        if self._scale == 1.0:
            return h

        scale_matrix = np.diag([self._scale, self._scale, 1.0])
        inv_scale_matrix = np.diag([1.0 / self._scale, 1.0 / self._scale, 1.0])
        return inv_scale_matrix @ h @ scale_matrix

    @property
    def template(self):
        """The template image that subject images are aligned to"""
        return self._template

    @template.setter
    def template(self, value: MatLike | str):
        """Set the template image as a path or an image"""

        if type(value) is str:
            value = cv.imread(value)
        self._template = value

        # TODO: check if the image has the right properties (dimensions etc.)
        self._template = cast(MatLike, value)

        self._preprocess_template()

    def _preprocess_template(self):
        self._template_orig = cv.cvtColor(self._template, cv.COLOR_BGR2GRAY)
        if self._k is not None:
            self._template = imu.sauvola(self._template, self._k)
            self._template = cv.bitwise_not(self._template)
        else:
            _, _, self._template = cv.split(self._template)

    def _preprocess_image(self, img: MatLike):
        if self._template_orig is None:
            raise TauluException("process the template first")

        if self._k is not None:
            img = imu.sauvola(img, self._k)
            img = cv.bitwise_not(img)
        else:
            _, _, img = cv.split(img)

        return img

    @log_calls(level=logging.DEBUG, include_return=True)
    def _find_transform_of_template_on(
        self, im: MatLike, visual: bool = False, window: str = WINDOW
    ):
        im = self._scale_img(im)
        # Detect ORB features and compute descriptors.
        orb = cv.ORB_create(
            self._max_features,  # type:ignore
            patchSize=self._patch_size,
        )
        keypoints_im, descriptors_im = orb.detectAndCompute(im, None)
        keypoints_tg, descriptors_tg = orb.detectAndCompute(self._template, None)

        # Match features
        matcher = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(descriptors_im, descriptors_tg)

        # Sort matches by score
        matches = sorted(matches, key=lambda x: x.distance)

        # Remove not so good matches
        numGoodMatches = int(len(matches) * self._match_fraction)
        matches = matches[:numGoodMatches]

        if visual:
            final_img_filtered = cv.drawMatches(
                im,
                keypoints_im,
                self._template,
                keypoints_tg,
                matches[:10],
                None,  # type:ignore
                cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS,
            )
            imu.show(final_img_filtered, title="matches", window=window)

        # Extract location of good matches
        points1 = np.zeros((len(matches), 2), dtype=np.float32)
        points2 = np.zeros((len(matches), 2), dtype=np.float32)

        for i, match in enumerate(matches):
            points1[i, :] = keypoints_tg[match.trainIdx].pt
            points2[i, :] = keypoints_im[match.queryIdx].pt

        # Prune reference points based upon distance between
        # key points. This assumes a fairly good alignment to start with
        # due to the protocol used (location of the sheets)
        p1 = pd.DataFrame(data=points1)
        p2 = pd.DataFrame(data=points2)
        refdist = abs(p1 - p2)

        mask_x = refdist.loc[:, 0] < (im.shape[0] * self._max_dist)
        mask_y = refdist.loc[:, 1] < (im.shape[1] * self._max_dist)
        mask = mask_x & mask_y
        points1 = points1[mask.to_numpy()]
        points2 = points2[mask.to_numpy()]

        # Find homography
        h, _ = cv.findHomography(points1, points2, cv.RANSAC)

        return self._unscale_homography(h)

    def view_alignment(self, img: MatLike, h: NDArray):
        """
        Show the alignment of the template on the given image
        by transforming it using the supplied transformation matrix `h`
        and visualising both on different channels

        Args:
            img (MatLike): the image on which the template is transformed
            h (NDArray): the transformation matrix
        """

        im = imu.ensure_gray(img)
        header = imu.ensure_gray(self._unscale_img(self._template))
        height, width = im.shape

        header_warped = cv.warpPerspective(header, h, (width, height))

        merged = np.full((height, width, 3), 255, dtype=np.uint8)

        merged[..., 1] = im
        merged[..., 2] = header_warped

        return imu.show(merged)

    @log_calls(level=logging.DEBUG, include_return=True)
    def align(
        self, img: MatLike | str, visual: bool = False, window: str = WINDOW
    ) -> NDArray:
        """
        Calculates a homogeneous transformation matrix that maps pixels of
        the template to the given image
        """

        logger.info("Aligning header with supplied table image")

        if type(img) is str:
            img = cv.imread(img)
        img = cast(MatLike, img)

        img = self._preprocess_image(img)

        h = self._find_transform_of_template_on(img, visual, window)

        if visual:
            self.view_alignment(img, h)

        return h

    def template_to_img(self, h: NDArray, point: Iterable[int]) -> tuple[int, int]:
        """
        Transform the given point (in template-space) using the transformation h
        (obtained through the `align` method)

        Args:
            h (NDArray): transformation matrix of shape (3, 3)
            point (Iterable[int]): the to-be-transformed point, should conform to (x, y)
        """

        point = np.array([[point[0], point[1], 1]])  # type:ignore
        transformed = np.dot(h, point.T)  # type:ignore

        transformed /= transformed[2]

        return int(transformed[0][0]), int(transformed[1][0])
```
Aligns table header templates to subject images using feature-based registration.
This class uses ORB (Oriented FAST and Rotated BRIEF) feature detection and matching to compute a homography transformation that maps points from a header template image to their corresponding locations in full table images.
How it Works
- Feature Detection: Extracts ORB keypoints from both template and subject
- Feature Matching: Finds correspondences using Hamming distance
- Filtering: Keeps top matches and prunes based on spatial consistency
- Homography Estimation: Computes perspective transform using RANSAC
The computed homography can then transform any point from template space to image space, allowing you to locate table structures based on your annotation.
Preprocessing Options
- Set the k parameter to apply Sauvola thresholding before feature detection. This can improve matching on documents with variable lighting.
- Set k=None to use raw images (only the blue channel is extracted from BGR images)
Tuning Guidelines
- max_features: Increase if matching fails on complex templates
- match_fraction: Decrease if you get many incorrect matches
- max_dist: Increase for documents with more warping/distortion
- scale: Decrease (<1.0) to speed up on high-resolution images
Arguments:
- template (MatLike | PathLike[str] | str | None): Header template image or path. This should contain a clear, representative view of the table header.
- max_features (int): Maximum ORB features to detect. More features = slower but potentially more robust matching.
- patch_size (int): ORB patch size for feature extraction.
- match_fraction (float): Fraction [0, 1] of matches to keep after sorting by quality. Higher = more matches but potentially more outliers.
- scale (float): Image downscaling factor (0, 1] for processing speed.
- max_dist (float): Maximum allowed distance (relative to image size) between matched keypoints. Filters out spatially inconsistent matches.
- k (float | None): Sauvola threshold parameter for preprocessing. If None, no thresholding is applied. Typical range: 0.03-0.15.
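Putting the tuning guidelines together, a construction for large, unevenly lit, slightly warped scans might look like this (the template filename and the specific values are illustrative, not recommendations from the package):

```python
from taulu import HeaderAligner

aligner = HeaderAligner(
    "table_00_header_left.png",  # illustrative template path
    max_features=50_000,  # complex header: detect more keypoints
    match_fraction=0.4,   # keep only the best matches
    scale=0.5,            # work at half resolution for speed
    max_dist=0.5,         # allow moderate displacement between matches
    k=0.05,               # Sauvola thresholding for uneven lighting
)
```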
def __init__(self, template: None | MatLike | PathLike[str] | str = None, max_features: int = 25_000, patch_size: int = 31, match_fraction: float = 0.6, scale: float = 1.0, max_dist: float = 1.0, k: float | None = 0.05)
Arguments:
- template (MatLike | str): (path of) the template image, with the table header clearly visible
- max_features (int): maximum number of features extracted by ORB
- patch_size (int): patch size for the ORB feature extractor
- match_fraction (float): fraction of best matches that are kept
- scale (float): image scale factor used during computation (mostly useful for speeding it up)
- max_dist (float): maximum distance (relative to image size) between matched features. Increase this value if the warping between image and template needs to be more aggressive
- k (float | None): Sauvola thresholding parameter. If None, no thresholding is applied
@property
def template(self)
The template image that subject images are aligned to
def view_alignment(self, img: MatLike, h: NDArray)
Show the alignment of the template on the given image by transforming it with the supplied transformation matrix h and visualising both on different colour channels.
Arguments:
- img (MatLike): the image on which the template is transformed
- h (NDArray): the transformation matrix
def align(self, img: MatLike | str, visual: bool = False, window: str = WINDOW) -> NDArray
Calculates a homogeneous transformation matrix that maps pixels of the template to the given image
def template_to_img(self, h: NDArray, point: Iterable[int]) -> tuple[int, int]
Transform the given point (in template space) using the transformation h (obtained through the align method)
Arguments:
- h (NDArray): transformation matrix of shape (3, 3)
- point (Iterable[int]): the to-be-transformed point, should conform to (x, y)
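The transformation itself is plain homogeneous-coordinate arithmetic; here is a self-contained NumPy sketch of the same computation (standalone function, not the class method):

```python
import numpy as np


def template_to_img(h: np.ndarray, point: tuple[int, int]) -> tuple[int, int]:
    # Lift (x, y) to homogeneous coordinates, apply h, then divide by the
    # perspective component before rounding back to integer pixels.
    p = np.array([point[0], point[1], 1.0])
    t = h @ p
    t /= t[2]
    return int(t[0]), int(t[1])


# A pure translation by (40, 25) is an easy-to-verify homography
h = np.array([
    [1.0, 0.0, 40.0],
    [0.0, 1.0, 25.0],
    [0.0, 0.0, 1.0],
])
template_to_img(h, (10, 10))  # -> (50, 35)
```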
class HeaderTemplate(TableIndexer)
Subclasses implement methods for going from a pixel in the input image to a table cell index, and cropping an image to the given table cell index.
def __init__(self, rules: Iterable[Iterable[int]])
A TableTemplate is a collection of rules of a table. This class implements methods for finding cell positions in a table image, given the template the image adheres to.
Arguments:
- rules: 2D array of lines, where each line is represented as [x0, y0, x1, y1]
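As an illustration of the rules format, consider a hypothetical five-rule header (two columns, one body row); the orientation test below mimics how such a rule set splits into horizontal and vertical rules:

```python
# Each rule is a line segment [x0, y0, x1, y1]
rules = [
    [0, 0, 200, 0],     # top horizontal rule
    [0, 80, 200, 80],   # bottom horizontal rule
    [0, 0, 0, 80],      # left vertical rule
    [100, 0, 100, 80],  # middle vertical rule
    [200, 0, 200, 80],  # right vertical rule
]


def is_horizontal(x0, y0, x1, y1):
    # A rule is horizontal when it extends further in x than in y
    return abs(x1 - x0) >= abs(y1 - y0)


h_rules = [r for r in rules if is_horizontal(*r)]
v_rules = [r for r in rules if not is_horizontal(*r)]

# rows = len(h_rules) - 1 = 1, cols = len(v_rules) - 1 = 2
```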
def save(self, path: PathLike[str])
Save the HeaderTemplate to the given path as JSON
@staticmethod
def annotate_image(template: MatLike | str, crop: Optional[PathLike[str]] = None, margin: int = 10) -> "HeaderTemplate"
Utility method that allows users to create a template from a template image.
The user is asked to click to annotate lines (two clicks per line).
Arguments:
- template: the image on which to annotate the header lines
- crop (str | None): if str, crop the template image first, then do the annotation. The cropped image will be stored at the supplied path
- margin (int): margin to add around the cropping of the header
@staticmethod
def from_vgg_annotation(annotation: str) -> "HeaderTemplate"
Create a TableTemplate from annotations made in VGG, using the polylines tool.
Arguments:
- annotation (str): the path of the annotation csv file
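The expected CSV layout can be shown with an inline two-line excerpt (only the region_shape_attributes column matters; the parsing below mirrors the method's logic of keeping two-point polylines as rules):

```python
import csv
import io
import json

# Hypothetical excerpt of a VGG polyline annotation CSV
VGG_CSV = """filename,region_shape_attributes
header.png,"{""name"": ""polyline"", ""all_points_x"": [0, 200], ""all_points_y"": [0, 0]}"
header.png,"{""name"": ""polyline"", ""all_points_x"": [0, 0], ""all_points_y"": [0, 80]}"
"""

rules = []
for row in csv.DictReader(io.StringIO(VGG_CSV)):
    shape = json.loads(row["region_shape_attributes"])
    # keep only two-point polylines; each becomes a [x0, y0, x1, y1] rule
    if shape["name"] == "polyline" and len(shape["all_points_x"]) == 2:
        xs, ys = shape["all_points_x"], shape["all_points_y"]
        rules.append([xs[0], ys[0], xs[1], ys[1]])

# rules == [[0, 0, 200, 0], [0, 0, 0, 80]]
```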
def cell_heights(self, header_factors: list[float] | float) -> list[int]
def intersection(self, index: tuple[int, int]) -> tuple[float, float]
Returns the intersection of the index[0]th horizontal rule and the index[1]th vertical rule
def cell(self, point: tuple[float, float]) -> tuple[int, int]
Get the cell index (row, col) that corresponds with the point (x, y) in the template image
Arguments:
- point (tuple[float, float]): the coordinates in the template image
Returns:
tuple[int, int]: (row, col)
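For the special case of perfectly axis-aligned rules, this lookup reduces to locating the point among the sorted rule positions. A simplified sketch (the real implementation also handles slanted rules by evaluating each rule at the query coordinate):

```python
from bisect import bisect_right

# x positions of vertical rules and y positions of horizontal rules,
# for a hypothetical 2x2 axis-aligned grid
col_xs = [0, 100, 200]
row_ys = [0, 80, 160]


def cell(point):
    x, y = point
    # index of the last rule at or below the coordinate
    row = bisect_right(row_ys, y) - 1
    col = bisect_right(col_xs, x) - 1
    if not (0 <= row < len(row_ys) - 1) or not (0 <= col < len(col_xs) - 1):
        return (-1, -1)  # outside the grid, as in the method above
    return (row, col)


cell((150, 40))   # -> (0, 1)
cell((250, 40))   # -> (-1, -1): right of the last vertical rule
```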
def cell_polygon(self, cell: tuple[int, int]) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]
Return points (x,y) that make up a polygon around the requested cell (top left, top right, bottom right, bottom left)
def region(self, start: tuple[int, int], end: tuple[int, int]) -> tuple[Point, Point, Point, Point]
Get the bounding box for the rectangular region that goes from start to end
Returns:
4 points: lt, rt, rb, lb, in format (x, y)
def text_regions(self, img: MatLike, row: int, margin_x: int = 10, margin_y: int = -20) -> list[tuple[tuple[int, int], tuple[int, int]]]
Split the row into regions of continuous text
Returns:
list[tuple[int, int]]: a list of spans (start col, end col)
class TableIndexer(ABC)
Subclasses implement methods for mapping a pixel in the input image to a table cell index, and for cropping an image to a given cell.
```python
    @abstractmethod
    def cell(self, point: tuple[float, float]) -> tuple[int, int]:
        """
        Returns the coordinate (row, col) of the cell that contains the given position

        Args:
            point (tuple[float, float]): a location in the input image

        Returns:
            tuple[int, int]: the cell index (row, col) that contains the given point
        """
        pass
```
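For a straight, axis-aligned grid, the point-to-cell lookup that `cell` promises can be sketched with `bisect` over sorted row and column boundary coordinates. This is a simplification (the real grid handles warped tables), but it shows the contract the click handlers above rely on: a negative row index means the point fell outside the grid.

```python
from bisect import bisect_right


def cell_index(point, row_edges, col_edges):
    """Map an (x, y) image point to a (row, col) cell index.

    row_edges / col_edges are sorted boundary coordinates in pixels.
    Returns (-1, -1) when the point falls outside the grid.
    """
    x, y = point
    row = bisect_right(row_edges, y) - 1
    col = bisect_right(col_edges, x) - 1
    if not (0 <= row < len(row_edges) - 1 and 0 <= col < len(col_edges) - 1):
        return (-1, -1)
    return (row, col)
```

With row edges `[0, 50, 100, 150]` and column edges `[0, 80, 160]`, the point `(90, 60)` lands in cell `(1, 1)`.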
```python
    @abstractmethod
    def cell_polygon(
        self, cell: tuple[int, int]
    ) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]:
        """Returns the polygon (used in e.g. opencv) that encloses the cell at the given cell position"""
        pass
```
```python
    def select_one_cell(
        self,
        image: MatLike,
        window: str = WINDOW,
        color: tuple[int, int, int] = (255, 0, 0),
        thickness: int = 2,
    ) -> tuple[int, int] | None:
        """Show the image and let the user click one cell; returns its (row, col) index, or None"""
        clicked = None

        def click_event(event, x, y, flags, params):
            nonlocal clicked

            img = np.copy(image)
            _ = flags
            _ = params
            if event == cv.EVENT_LBUTTONDOWN:
                cell = self.cell((x, y))
                if cell[0] >= 0:
                    clicked = cell
                else:
                    return
                self._highlight_cell(img, cell, color, thickness)
                cv.imshow(window, img)

        imu.show(image, click_event=click_event, title="select one cell", window=window)

        return clicked
```
```python
    def show_cells(
        self, image: MatLike | os.PathLike[str] | str, window: str = WINDOW
    ) -> list[tuple[int, int]]:
        """Show the image, highlight every cell the user clicks, and return the clicked (row, col) indices"""
        if not isinstance(image, np.ndarray):
            image = cv.imread(os.fspath(image))

        img = np.copy(image)

        cells = []

        def click_event(event, x, y, flags, params):
            _ = flags
            _ = params
            if event == cv.EVENT_LBUTTONDOWN:
                cell = self.cell((x, y))
                if cell[0] >= 0:
                    cells.append(cell)
                else:
                    return
                self._highlight_cell(img, cell)
                cv.imshow(window, img)

        imu.show(
            img,
            click_event=click_event,
            title="click to highlight cells",
            window=window,
        )

        return cells
```
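One common way to implement a `text_regions`-style split is a column-projection profile over a binarized row strip: columns containing ink form runs, and gaps wider than a threshold separate the runs. The sketch below works on column spans over a plain 0/1 pixel grid; the actual subclass implementation additionally maps spans back to cell indices, so take this only as an illustration of the projection idea.

```python
def text_spans(binary_row, min_gap=3):
    """Find spans of continuous ink in a binarized row strip.

    binary_row: 2D list of 0/1 pixels (1 = ink), shape (height, width).
    Returns a list of (start_col, end_col) spans, merging any gap
    narrower than min_gap columns.
    """
    width = len(binary_row[0])
    # column projection: does any pixel in this column contain ink?
    inked = [any(row[c] for row in binary_row) for c in range(width)]

    spans = []
    start = None
    gap = 0
    for c, on in enumerate(inked):
        if on:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # gap is wide enough to close the current span
                spans.append((start, c - gap))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - 1 - gap))
    return spans
```

With `min_gap=3`, a two-column gap between two ink runs is bridged into a single span; with `min_gap=2` the same input splits into two spans.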
```python
class Split(Generic[T]):
    """
    Container for paired left/right data with convenient manipulation methods.

    The Split class is designed for working with table images that span two pages
    or have distinct left and right sections. It allows you to:

    - Store related data for both sides
    - Apply functions to both sides simultaneously
    - Access attributes/methods of contained objects transparently

    Examples:
        >>> # Create a split with different parameters for each side
        >>> thresholds = Split(0.25, 0.30)
        >>>
        >>> # Apply a function to both sides
        >>> images = Split(left_img, right_img)
        >>> processed = images.apply(lambda img: cv2.blur(img, (5, 5)))
        >>>
        >>> # Use with different parameters per side
        >>> results = images.apply(
        ...     lambda img, k: sauvola_threshold(img, k),
        ...     k=thresholds  # k.left used for left img, k.right for right
        ... )
        >>>
        >>> # Access methods of contained objects directly
        >>> templates = Split(template_left, template_right)
        >>> widths = templates.cell_widths(0)  # Calls on both templates

    Type Parameters:
        T: The type of objects stored in left and right
    """

    def __init__(self, left: T | None = None, right: T | None = None):
        """
        Initialize a Split container.

        Args:
            left: Data for the left side
            right: Data for the right side

        Note:
            Both can initially be None. Use the `append` method or set
            properties directly to populate.
        """
        self._left = left
        self._right = right

    @property
    def left(self) -> T:
        assert self._left is not None
        return self._left

    @left.setter
    def left(self, value: T):
        self._left = value

    @property
    def right(self) -> T:
        assert self._right is not None
        return self._right

    @right.setter
    def right(self, value: T):
        self._right = value

    def append(self, value: T):
        if self._left is None:
            self._left = value
        else:
            self._right = value

    def __repr__(self) -> str:
        return f"left: {self._left}, right: {self._right}"

    def __iter__(self):
        assert self._left is not None
        assert self._right is not None
        return iter((self._left, self._right))

    def __getitem__(self, index: bool) -> T:
        assert self._left is not None
        assert self._right is not None
        if int(index) == 0:
            return self._left
        else:
            return self._right

    def apply(
        self,
        funcs: "Split[Callable[[T, *Any], V]] | Callable[[T, *Any], V]",
        *args,
        **kwargs,
    ) -> "Split[V]":
        if not isinstance(funcs, Split):
            funcs = Split(funcs, funcs)

        def get_arg(side: str, arg):
            if isinstance(arg, Split):
                return getattr(arg, side)
            return arg

        def call(side: str):
            func = getattr(funcs, side)
            target = getattr(self, side)

            side_args = [get_arg(side, arg) for arg in args]
            side_kwargs = {k: get_arg(side, v) for k, v in kwargs.items()}

            return func(target, *side_args, **side_kwargs)

        return Split(call("left"), call("right"))

    def __getattr__(self, attr_name: str):
        if attr_name in self.__dict__:
            return getattr(self, attr_name)

        def wrapper(*args, **kwargs):
            return self.apply(
                Split(
                    getattr(self.left.__class__, attr_name),
                    getattr(self.right.__class__, attr_name),
                ),
                *args,
                **kwargs,
            )

        return wrapper
```
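The per-side argument routing in `apply` (plain arguments are shared by both sides, `Split`-valued arguments are unpacked per side) can be exercised with a stripped-down, self-contained sketch of the same dispatch logic:

```python
class Split:
    """Stripped-down Split reproducing apply's per-side argument routing."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def apply(self, func, *args, **kwargs):
        def get_arg(side, arg):
            # Split-valued arguments are unpacked per side; others are shared
            return getattr(arg, side) if isinstance(arg, Split) else arg

        def call(side):
            target = getattr(self, side)
            side_args = [get_arg(side, a) for a in args]
            side_kwargs = {k: get_arg(side, v) for k, v in kwargs.items()}
            return func(target, *side_args, **side_kwargs)

        return Split(call("left"), call("right"))


values = Split(10, 20)
# shared argument: both sides scaled by the same factor
doubled = values.apply(lambda v, s: v * s, 2)
# per-side argument: each side gets its own offset
shifted = values.apply(lambda v, d: v + d, d=Split(1, -1))
```

This mirrors the `k=thresholds` pattern in the class docstring: one call site, two parameterizations.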
```python
class Taulu:
    """
    High-level API for table segmentation from images.

    Taulu provides a simplified interface that orchestrates header alignment,
    grid detection, and table segmentation into a single workflow. It's designed
    to hide complexity while still allowing fine-tuned control through parameters.

    ## Workflow Overview

    1. **Header Template Creation**: Use `Taulu.annotate()` to create annotated
       header images that define your table structure
    2. **Initialization**: Create a Taulu instance with your header(s) and parameters
    3. **Segmentation**: Call `segment_table()` on your table images to get a
       `TableGrid` object containing all detected cell boundaries

    ## Single vs Split Tables

    Taulu supports two modes:

    - **Single header**: For tables that fit on one page or have consistent structure
    - **Split header**: For tables that span two pages (left/right) with potentially
      different parameters for each side

    Use `Split[T]` objects to provide different parameters for left and right sides.

    ## Parameter Tuning Strategy

    If segmentation fails or is inaccurate:

    1. **Visual debugging**: Set `debug_view=True` in `segment_table()` to see
       intermediate results
    2. **Adjust thresholding**: Modify `sauvola_k` to change binarization sensitivity
       - Increase to remove more noise (more aggressive)
       - Decrease to preserve faint lines
    3. **Tune cross-kernel**: Adjust `cross_width`, `cross_height`, `kernel_size`
       to match your rule thickness after morphology
    4. **Morphology**: Increase `morph_size` to connect broken lines, but be aware
       this also thickens lines (requiring larger cross_width)
    5. **Search parameters**: Increase `search_region` for warped documents,
       adjust `distance_penalty` to control how strictly positions are enforced
    6. **Growth parameters**: Lower `grow_threshold` if the algorithm stops too early,
       increase `look_distance` for better extrapolation

    Examples:
        Basic usage with a single header:

        >>> from taulu import Taulu
        >>>
        >>> # First, create annotated header (one-time setup)
        >>> Taulu.annotate("table_image.png", "header.png")
        >>> # This creates header.png and header.json
        >>>
        >>> # Initialize Taulu with the header
        >>> taulu = Taulu(
        ...     header_image_path="header.png",
        ...     cell_height_factor=0.8,  # Rows are 80% of header height
        ...     sauvola_k=0.25,
        ...     search_region=60,
        ...     cross_width=10
        ... )
        >>>
        >>> # Segment a table image
        >>> grid = taulu.segment_table("table_page_01.png")
        >>>
        >>> # Use the grid to extract cells
        >>> import cv2
        >>> img = cv2.imread("table_page_01.png")
        >>> cell_image = grid.crop_cell(img, (0, 0))  # First cell

        Using split headers for two-page tables:

        >>> from taulu import Taulu, Split
        >>>
        >>> # Annotate both headers
        >>> Taulu.annotate("scan_01.png", "header_left.png")
        >>> Taulu.annotate("scan_01.png", "header_right.png")
        >>>
        >>> # Use different parameters for each side
        >>> taulu = Taulu(
        ...     header_image_path=Split("header_left.png", "header_right.png"),
        ...     cell_height_factor=Split([0.8, 0.9], [0.75]),  # Different row heights
        ...     sauvola_k=Split(0.25, 0.30),  # Different thresholds
        ...     cross_width=10  # Same for both sides
        ... )
        >>>
        >>> # Segment returns a unified grid
        >>> grid = taulu.segment_table("scan_01.png")

        Debug visualization to tune parameters:

        >>> taulu = Taulu("header.png", sauvola_k=0.15)
        >>>
        >>> # Opens windows showing each processing step
        >>> # Press 'n' to advance, 'q' to quit
        >>> grid = taulu.segment_table("table.png", debug_view=True)
        >>>
        >>> # Adjust parameters based on what you see:
        >>> # - If binarization is too noisy: increase sauvola_k
        >>> # - If lines are broken after morphology: increase morph_size
        >>> # - If filtered image has "undefined" corners: adjust cross_width to match line thickness (after morphology)
        >>> # - If corners are missed during search: decrease grow_threshold or increase search_region

    Attributes:
        _header (MatLike | Split[MatLike]): Loaded header image(s)
        _aligner (HeaderAligner | Split[HeaderAligner]): Header alignment engine(s)
        _template (HeaderTemplate | Split[HeaderTemplate]): Parsed header structure(s)
        _grid_detector (GridDetector | Split[GridDetector]): Grid detection engine(s)
        _cell_heights (list[int] | Split[list[int]]): Computed cell heights in pixels

    Raises:
        TauluException: If header files don't exist, annotation is missing, or
            Split parameters are used incorrectly with single headers

    See Also:
        - `TableGrid`: The result object with methods for accessing cells
        - `Split`: Container for paired left/right parameters
        - `GridDetector`: Lower-level grid detection (for advanced usage)
        - `HeaderAligner`: Lower-level header alignment (for advanced usage)
    """

    def __init__(
        self,
        header_image_path: PathLike[str] | str | Split[PathLike[str] | str],
        cell_height_factor: float | list[float] | Split[float | list[float]] = [1.0],
        header_anno_path: PathLike[str]
        | str
        | Split[PathLike[str] | str]
        | None = None,
        sauvola_k: float | Split[float] = 0.25,
        search_region: int | Split[int] = 60,
        distance_penalty: float | Split[float] = 0.4,
        cross_width: int | Split[int] = 10,
        morph_size: int | Split[int] = 4,
        kernel_size: int | Split[int] = 41,
        processing_scale: float | Split[float] = 1.0,
        min_rows: int | Split[int] = 5,
        look_distance: int | Split[int] = 3,
        grow_threshold: float | Split[float] = 0.3,
    ):
        """
        Args:
            header_image_path:
                Path to the header template image(s). The header should be a cropped
                image showing a clear view of the table's first row. An annotation
                file (.json) must exist alongside the image, created via `Taulu.annotate()`.
                For split tables, provide a `Split` containing left and right header paths.

            cell_height_factor:
                Height of data rows relative to header height. For example, if your
                header is 100px tall and data rows are 80px tall, use 0.8.

                - **float**: All rows have the same height
                - **list[float]**: Different heights for different rows. The last value
                  is repeated for any additional rows beyond the list length. Useful when
                  the first data row is taller than subsequent rows.
                - **Split**: Different height factors for left and right sides

                Default: [1.0]

            header_anno_path (PathLike[str] | str | Split[PathLike[str] | str] | None):
                Optional explicit path to header annotation JSON file(s). If None,
                looks for a .json file with the same name as `header_image_path`.
                Default: None

            sauvola_k (float | Split[float]):
                Threshold sensitivity for Sauvola adaptive binarization (0.0-1.0).
                Controls how aggressively the algorithm converts the image to binary.

                - **Lower values** (0.04-0.15): Preserve faint lines, more noise
                - **Higher values** (0.20-0.35): Remove noise, may lose faint lines

                Start with 0.25 and adjust based on your image quality.
                Default: 0.25

            search_region (int | Split[int]):
                Size in pixels of the square region to search for the next corner point.
                The algorithm estimates where a corner should be, then searches within
                this region for the best match.

                - **Smaller values** (20-40): Faster, requires well-aligned tables
                - **Larger values** (60-100): More robust to warping and distortion

                Default: 60

            distance_penalty (float | Split[float]):
                Weight factor [0, 1] for penalizing corners far from expected position.
                Uses Gaussian weighting within the search region.

                - **0.0**: No penalty, any position in search region is equally valid
                - **0.5**: Moderate preference for positions near the expected location
                - **1.0**: Strong preference, only accepts positions very close to expected

                Default: 0.4

            cross_width (int | Split[int]):
                Width in pixels of the cross-shaped kernel used to detect intersections.
                Should approximately match the thickness of your table rules AFTER
                morphological dilation.

                **Tuning**: Look at the dilated image in debug_view. The cross_width
                should match the thickness of the black lines you see.
                Default: 10

            morph_size (int | Split[int]):
                Size of morphological structuring element for dilation. Controls how
                much gap-bridging occurs to connect broken line segments.

                - **Smaller values** (2-4): Minimal connection, preserves thin lines
                - **Larger values** (6-10): Connects larger gaps, but thickens lines

                Note: Increasing this requires increasing `cross_width` proportionally.
                Default: 4

            kernel_size (int | Split[int]):
                Size of the cross-shaped kernel (must be odd). Larger kernels are more
                selective, reducing false positives but potentially missing valid corners.

                - **Smaller values** (21-31): More sensitive, finds more candidates
                - **Larger values** (41-61): More selective, fewer false positives

                Default: 41

            processing_scale (float | Split[float]):
                Image downscaling factor (0, 1] for processing speed. Processing is done
                on scaled images, then results are scaled back to original size.

                - **1.0**: Full resolution (slowest, most accurate)
                - **0.5-0.75**: Good balance for high-res scans (2x-4x speedup)
                - **0.25-0.5**: Fast processing for very large images

                Default: 1.0

            min_rows (int | Split[int]):
                Minimum number of rows required before the algorithm considers the
                table complete. Prevents stopping too early on tables with initial
                low-confidence detections.
                Default: 5

            look_distance (int | Split[int]):
                Number of adjacent rows/columns to examine when extrapolating missing
                corners using polynomial regression. Higher values provide more context
                but may smooth over legitimate variations.

                - **2-3**: Good for consistent grids
                - **4-6**: Better for grids with some irregularity

                Default: 3

            grow_threshold (float | Split[float]):
                Initial minimum confidence [0, 1] required to accept a detected corner
                during the growing phase. The algorithm may adaptively lower this
                threshold if growth stalls.

                - **Higher values** (0.5-0.8): Stricter, fewer errors but may miss valid corners
                - **Lower values** (0.2-0.4): More permissive, finds more corners but more errors

                Default: 0.3
        """
        self._processing_scale = processing_scale
        self._cell_height_factor = cell_height_factor

        if isinstance(header_image_path, Split) or isinstance(header_anno_path, Split):
            header = Split(Path(header_image_path.left), Path(header_image_path.right))

            if not exists(header.left.with_suffix(".png")) or not exists(
                header.right.with_suffix(".png")
            ):
                raise TauluException(
                    "The header images you provided do not exist (or they aren't .png files)"
                )

            if header_anno_path is None:
                if not exists(header.left.with_suffix(".json")) or not exists(
                    header.right.with_suffix(".json")
                ):
                    raise TauluException(
                        "You need to annotate the headers of your table first\n\nsee the Taulu.annotate method"
                    )

                template_left = HeaderTemplate.from_saved(
                    header.left.with_suffix(".json")
                )
                template_right = HeaderTemplate.from_saved(
                    header.right.with_suffix(".json")
                )

            else:
                if not exists(header_anno_path.left) or not exists(
                    header_anno_path.right
                ):
                    raise TauluException(
                        "The header annotation files you provided do not exist (or they aren't .json files)"
                    )

                template_left = HeaderTemplate.from_saved(header_anno_path.left)
                template_right = HeaderTemplate.from_saved(header_anno_path.right)

            self._header = Split(
                cv2.imread(os.fspath(header.left)), cv2.imread(os.fspath(header.right))
            )

            self._aligner = Split(
                HeaderAligner(
                    self._header.left, scale=get_param(self._processing_scale, "left")
                ),
                HeaderAligner(
                    self._header.right, scale=get_param(self._processing_scale, "right")
                ),
            )

            self._template = Split(template_left, template_right)

            self._cell_heights = Split(
                self._template.left.cell_heights(get_param(cell_height_factor, "left")),
                self._template.right.cell_heights(
                    get_param(cell_height_factor, "right")
                ),
            )

            # Create GridDetector for left and right with potentially different parameters
            self._grid_detector = Split(
                GridDetector(
                    kernel_size=get_param(kernel_size, "left"),
                    cross_width=get_param(cross_width, "left"),
                    morph_size=get_param(morph_size, "left"),
                    search_region=get_param(search_region, "left"),
                    sauvola_k=get_param(sauvola_k, "left"),
                    distance_penalty=get_param(distance_penalty, "left"),
                    scale=get_param(self._processing_scale, "left"),
                    min_rows=get_param(min_rows, "left"),
                    look_distance=get_param(look_distance, "left"),
                    grow_threshold=get_param(grow_threshold, "left"),
                ),
                GridDetector(
                    kernel_size=get_param(kernel_size, "right"),
                    cross_width=get_param(cross_width, "right"),
                    morph_size=get_param(morph_size, "right"),
                    search_region=get_param(search_region, "right"),
                    sauvola_k=get_param(sauvola_k, "right"),
                    distance_penalty=get_param(distance_penalty, "right"),
                    scale=get_param(self._processing_scale, "right"),
                    min_rows=get_param(min_rows, "right"),
                    look_distance=get_param(look_distance, "right"),
                    grow_threshold=get_param(grow_threshold, "right"),
                ),
            )

        else:
            header_image_path = Path(header_image_path)
            self._header = cv2.imread(os.fspath(header_image_path))
            self._aligner = HeaderAligner(self._header)
            self._template = HeaderTemplate.from_saved(
                header_image_path.with_suffix(".json")
            )

            # For single header, parameters should not be Split objects
            if any(
                isinstance(param, Split)
                for param in [
                    sauvola_k,
                    search_region,
                    distance_penalty,
                    cross_width,
                    morph_size,
                    kernel_size,
                    processing_scale,
                    min_rows,
                    look_distance,
                    grow_threshold,
                    cell_height_factor,
                ]
            ):
                raise TauluException(
                    "Split parameters can only be used with split headers (tuple header_path)"
                )

            self._cell_heights = self._template.cell_heights(self._cell_height_factor)

            self._grid_detector = GridDetector(
                kernel_size=kernel_size,
                cross_width=cross_width,
                morph_size=morph_size,
                search_region=search_region,
                sauvola_k=sauvola_k,
                distance_penalty=distance_penalty,
                scale=self._processing_scale,
                min_rows=min_rows,
                look_distance=look_distance,
                grow_threshold=grow_threshold,
            )

    @staticmethod
    def annotate(image_path: PathLike[str] | str, output_path: PathLike[str] | str):
        """
        Interactive tool to create header annotations for table segmentation.

        This method guides you through a two-step annotation process:

        1. **Crop the header**: Click four corners to define the header region
        2. **Annotate lines**: Click pairs of points to define each vertical and
           horizontal line in the header

        The annotations are saved as:
        - A cropped header image (.png) at `output_path`
        - A JSON file (.json) containing line coordinates

        ## Annotation Guidelines

        **Which lines to annotate:**
        - All vertical lines that extend into the table body (column separators)
        - The top horizontal line of the header
        - The bottom horizontal line of the header (top of data rows)

        **Order doesn't matter** - annotate lines in any order that's convenient.

        **To annotate a line:**
        1. Click once at one endpoint
        2. Click again at the other endpoint
        3. A green line appears showing your annotation

        **To undo:**
        - Right-click anywhere to remove the last line you drew

        **When finished:**
        - Press 'n' to save and exit
        - Press 'q' to quit without saving

        Args:
            image_path (PathLike[str] | str): Path to a table image containing
                a clear view of the header. This can be a full table image.
            output_path (PathLike[str] | str): Where to save the cropped header
                image. The annotation JSON will be saved with the same name but
                .json extension.

        Raises:
            TauluException: If image_path doesn't exist or output_path is a directory

        Examples:
            Annotate a single header:

            >>> from taulu import Taulu
            >>> Taulu.annotate("scan_page_01.png", "header.png")
            # Interactive window opens
            # After annotation: creates header.png and header.json

            Annotate left and right headers for a split table:

            >>> Taulu.annotate("scan_page_01.png", "header_left.png")
            >>> Taulu.annotate("scan_page_01.png", "header_right.png")
            # Creates header_left.{png,json} and header_right.{png,json}

        Notes:
            - The header image doesn't need to be perfectly cropped initially -
              the tool will help you crop it precisely
            - Annotation accuracy is important: misaligned lines will cause
              segmentation errors
            - You can re-run this method to update annotations if needed
        """

        if not exists(image_path):
            raise TauluException(f"Image path {image_path} does not exist")

        if os.path.isdir(output_path):
            raise TauluException("Output path should be a file")

        output_path = Path(output_path)

        template = HeaderTemplate.annotate_image(
            os.fspath(image_path), crop=output_path.with_suffix(".png")
        )

        template.save(output_path.with_suffix(".json"))

    def segment_table(
        self,
        image: MatLike | PathLike[str] | str,
        debug_view: bool = False,
    ) -> TableGrid:
        """
        Segment a table image into a grid of cells.

        This is the main entry point for the taulu package. It orchestrates:

        1. **Header alignment**: Locates the table by matching the header template
           to the image using feature-based registration (ORB features + homography)
        2. **Grid detection**: Applies morphological filtering and cross-correlation
           to find corner intersections
        3. **Grid growing**: Iteratively detects corners row-by-row and column-by-column,
           starting from the aligned header position
        4. **Extrapolation**: Fills in any missing corners using polynomial regression
           based on neighboring detected points
        5. **Smoothing**: Refines corner positions for consistency

        ## Performance Notes

        Processing time depends on:
        - Image resolution (use `processing_scale < 1.0` for large images)
        - Table complexity (more rows/columns = longer processing)
        - Parameter tuning (lower thresholds = more computation)

        Typical processing times:
        - Small tables (10 rows, 5 cols, 2000x1500px): 1-3 seconds
        - Large tables (50 rows, 10 cols, 4000x3000px): 5-15 seconds

        ## Troubleshooting

        **If segmentation fails (returns incomplete grid):**
        1. Enable `debug_view=True` to see where it stops
        2. Check if header alignment is correct (first debug image)
        3. Verify cross-correlation shows bright spots at corners
        4. Adjust `grow_threshold` (lower if stopping too early)
        5. Increase `search_region` if corners are far from expected positions

        **If segmentation is inaccurate (corners in wrong positions):**
        1. Check binarization quality (adjust `sauvola_k`)
        2. Verify cross-kernel size matches line thickness (adjust `cross_width`)
        3. Ensure morphology isn't over-connecting (reduce `morph_size`)
        4. Increase `distance_penalty` to enforce expected positions more strictly

        Args:
            image (MatLike | PathLike[str] | str): Table image to segment.
                Can be a file path or a numpy array (BGR or grayscale).

            debug_view (bool): If True, opens OpenCV windows showing intermediate
                processing steps:
                - Header alignment overlay
                - Binarized image
                - After morphological operations
                - Cross-correlation result
                - Growing progress (corner-by-corner)

                **Controls:**
                - Press 'n' to advance to next step
                - Press 'q' to quit immediately

                Useful for parameter tuning and understanding failures.
                Default: False

        Returns:
            TableGrid: A grid structure containing detected corner positions with
                methods for:

                **Position queries:**
                - `cell(point)`: Get (row, col) at pixel coordinates (x, y)
                - `cell_polygon(cell)`: Get 4 corners of a cell as (lt, rt, rb, lb)
                - `region(start, end)`: Get bounding box for a cell range

                **Image extraction:**
                - `crop_cell(img, cell, margin=0)`: Extract single cell with optional margin
                - `crop_region(img, start, end, margin=0)`: Extract rectangular region

                **Visualization:**
                - `show_cells(img)`: Interactive cell viewer (click to highlight)
                - `highlight_all_cells(img)`: Draw all cell boundaries
                - `visualize_points(img)`: Show detected corner points

                **Analysis:**
                - `text_regions(img, row)`: Find continuous text regions in a row
                - `cells()`: Generator yielding all (row, col) indices

                **Persistence:**
                - `save(path)`: Save grid to JSON file
                - `TableGrid.from_saved(path)`: Load grid from JSON

                **Properties:**
                - `rows`: Number of data rows (header not included)
                - `cols`: Number of columns
                - `points`: Raw list of detected corner coordinates

        Raises:
            TauluException: If image cannot be loaded, header alignment fails,
                or grid detection produces no results

        Examples:
            Basic segmentation:

            >>> from taulu import Taulu
            >>> import cv2
            >>>
            >>> taulu = Taulu("header.png")
            >>> grid = taulu.segment_table("table_page_01.png")
            >>>
            >>> print(f"Detected {grid.rows} rows and {grid.cols} columns")
            >>>
            >>> # Extract first cell
            >>> img = cv2.imread("table_page_01.png")
            >>> cell_img = grid.crop_cell(img, (0, 0))
            >>> cv2.imwrite("cell_0_0.png", cell_img)

            Debug mode for parameter tuning:

            >>> grid = taulu.segment_table("table_page_01.png", debug_view=True)
            # Windows open showing each step
            # Adjust parameters based on what you see

            Process multiple images with the same header:

            >>> taulu = Taulu("header.png", sauvola_k=0.25)
            >>>
            >>> for i in range(1, 11):
            ...     img_path = f"table_page_{i:02d}.png"
            ...     grid = taulu.segment_table(img_path)
            ...     grid.save(f"grid_{i:02d}.json")
            ...     print(f"Page {i}: {grid.rows} rows detected")

            Extract all cells from a table:

            >>> img = cv2.imread("table.png")
            >>> grid = taulu.segment_table("table.png")
            >>>
            >>> for row, col in grid.cells():
            ...     cell_img = grid.crop_cell(img, (row, col), margin=5)
            ...     cv2.imwrite(f"cell_{row}_{col}.png", cell_img)

            Find text regions for OCR:

            >>> for row in range(grid.rows):
            ...     text_regions = grid.text_regions(img, row)
            ...     for start_cell, end_cell in text_regions:
            ...         # Extract region spanning multiple cells
            ...         region_img = grid.crop_region(img, start_cell, end_cell)
            ...         # Run OCR on region_img...

        See Also:
            - `TableGrid`: Complete documentation of the returned object
            - `GridDetector.find_table_points()`: Lower-level grid detection
            - `HeaderAligner.align()`: Lower-level header alignment
        """

        if not isinstance(image, MatLike):
            image = cv2.imread(os.fspath(image))

        now = perf_counter()
        h = self._aligner.align(image, visual=debug_view)
        align_time = perf_counter() - now
        logger.info(f"Header alignment took {align_time:.2f} seconds")

        # find the starting point for the table grid algorithm
        left_top_template = self._template.intersection((1, 0))
        if isinstance(left_top_template, Split):
            left_top_template = Split(
                (int(left_top_template.left[0]), int(left_top_template.left[1])),
                (int(left_top_template.right[0]), int(left_top_template.right[1])),
            )
        else:
            left_top_template = (int(left_top_template[0]), int(left_top_template[1]))

        left_top_table = self._aligner.template_to_img(h, left_top_template)

        now = perf_counter()
        table = self._grid_detector.find_table_points(
            image,
            left_top_table,
            self._template.cell_widths(0),
            self._cell_heights,
            visual=debug_view,
        )
        grid_time = perf_counter() - now
        logger.info(f"Grid detection took {grid_time:.2f} seconds")

        if isinstance(table, Split):
            table = TableGrid.from_split(table, (0, 0))

        return table
```
High-level API for table segmentation from images.
Taulu provides a simplified interface that orchestrates header alignment, grid detection, and table segmentation into a single workflow. It's designed to hide complexity while still allowing fine-tuned control through parameters.
Workflow Overview
1. **Header Template Creation**: Use `Taulu.annotate()` to create annotated header images that define your table structure
2. **Initialization**: Create a `Taulu` instance with your header(s) and parameters
3. **Segmentation**: Call `segment_table()` on your table images to get a `TableGrid` object containing all detected cell boundaries
Single vs Split Tables
Taulu supports two modes:
- Single header: For tables that fit on one page or have consistent structure
- Split header: For tables that span two pages (left/right) with potentially different parameters for each side
Use `Split[T]` objects to provide different parameters for left and right sides.
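Conceptually, `Split` is just a pair with `.left` and `.right` fields, one value per page side. The stand-in below (a hypothetical `PairSketch`, not the real class from `taulu.split`) illustrates how per-side parameters pair up:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class PairSketch(Generic[T]):
    """Hypothetical stand-in for taulu's Split[T]: one value per page side."""
    left: T
    right: T


# Different binarization thresholds for the left and right page halves
sauvola_k = PairSketch(0.25, 0.30)
assert sauvola_k.left == 0.25 and sauvola_k.right == 0.30
```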
Parameter Tuning Strategy
If segmentation fails or is inaccurate:
1. **Visual debugging**: Set `debug_view=True` in `segment_table()` to see intermediate results
2. **Adjust thresholding**: Modify `sauvola_k` to change binarization sensitivity
   - Increase to remove more noise (more aggressive)
   - Decrease to preserve faint lines
3. **Tune cross-kernel**: Adjust `cross_width`, `cross_height`, `kernel_size` to match your rule thickness after morphology
4. **Morphology**: Increase `morph_size` to connect broken lines, but be aware this also thickens lines (requiring a larger `cross_width`)
5. **Search parameters**: Increase `search_region` for warped documents, adjust `distance_penalty` to control how strictly expected positions are enforced
6. **Growth parameters**: Lower `grow_threshold` if the algorithm stops too early, increase `look_distance` for better extrapolation
Examples:
Basic usage with a single header:
    >>> from taulu import Taulu
    >>>
    >>> # First, create annotated header (one-time setup)
    >>> Taulu.annotate("table_image.png", "header.png")
    >>> # This creates header.png and header.json
    >>>
    >>> # Initialize Taulu with the header
    >>> taulu = Taulu(
    ...     header_image_path="header.png",
    ...     cell_height_factor=0.8,  # Rows are 80% of header height
    ...     sauvola_k=0.25,
    ...     search_region=60,
    ...     cross_width=10
    ... )
    >>>
    >>> # Segment a table image
    >>> grid = taulu.segment_table("table_page_01.png")
    >>>
    >>> # Use the grid to extract cells
    >>> import cv2
    >>> img = cv2.imread("table_page_01.png")
    >>> cell_image = grid.crop_cell(img, (0, 0))  # First cell

Using split headers for two-page tables:

    >>> from taulu import Taulu, Split
    >>>
    >>> # Annotate both headers
    >>> Taulu.annotate("scan_01.png", "header_left.png")
    >>> Taulu.annotate("scan_01.png", "header_right.png")
    >>>
    >>> # Use different parameters for each side
    >>> taulu = Taulu(
    ...     header_image_path=Split("header_left.png", "header_right.png"),
    ...     cell_height_factor=Split([0.8, 0.9], [0.75]),  # Different row heights
    ...     sauvola_k=Split(0.25, 0.30),  # Different thresholds
    ...     cross_width=10  # Same for both sides
    ... )
    >>>
    >>> # Segment returns a unified grid
    >>> grid = taulu.segment_table("scan_01.png")

Debug visualization to tune parameters:

    >>> taulu = Taulu("header.png", sauvola_k=0.15)
    >>>
    >>> # Opens windows showing each processing step
    >>> # Press 'n' to advance, 'q' to quit
    >>> grid = taulu.segment_table("table.png", debug_view=True)
    >>>
    >>> # Adjust parameters based on what you see:
    >>> # - If binarization is too noisy: increase sauvola_k
    >>> # - If lines are broken after morphology: increase morph_size
    >>> # - If filtered image has "undefined" corners: adjust cross_width to match line thickness (after morphology)
    >>> # - If corners are missed during search: decrease grow_threshold or increase search_region
Attributes:
- _header (MatLike | Split[MatLike]): Loaded header image(s)
- _aligner (HeaderAligner | Split[HeaderAligner]): Header alignment engine(s)
- _template (HeaderTemplate | Split[HeaderTemplate]): Parsed header structure(s)
- _grid_detector (GridDetector | Split[GridDetector]): Grid detection engine(s)
- _cell_heights (list[int] | Split[list[int]]): Computed cell heights in pixels
Raises:
- TauluException: If header files don't exist, annotation is missing, or Split parameters are used incorrectly with single headers
See Also:
- `TableGrid`: The result object with methods for accessing cells
- `Split`: Container for paired left/right parameters
- `GridDetector`: Lower-level grid detection (for advanced usage)
- `HeaderAligner`: Lower-level header alignment (for advanced usage)
```python
def __init__(
    self,
    header_image_path: PathLike[str] | str | Split[PathLike[str] | str],
    cell_height_factor: float | list[float] | Split[float | list[float]] = [1.0],
    header_anno_path: PathLike[str]
    | str
    | Split[PathLike[str] | str]
    | None = None,
    sauvola_k: float | Split[float] = 0.25,
    search_region: int | Split[int] = 60,
    distance_penalty: float | Split[float] = 0.4,
    cross_width: int | Split[int] = 10,
    morph_size: int | Split[int] = 4,
    kernel_size: int | Split[int] = 41,
    processing_scale: float | Split[float] = 1.0,
    min_rows: int | Split[int] = 5,
    look_distance: int | Split[int] = 3,
    grow_threshold: float | Split[float] = 0.3,
):
    """
    Args:
        header_image_path:
            Path to the header template image(s). The header should be a cropped
            image showing a clear view of the table's first row. An annotation
            file (.json) must exist alongside the image, created via `Taulu.annotate()`.
            For split tables, provide a `Split` containing left and right header paths.

        cell_height_factor:
            Height of data rows relative to header height. For example, if your
            header is 100px tall and data rows are 80px tall, use 0.8.

            - **float**: All rows have the same height
            - **list[float]**: Different heights for different rows. The last value
              is repeated for any additional rows beyond the list length. Useful when
              the first data row is taller than subsequent rows.
            - **Split**: Different height factors for left and right sides

            Default: [1.0]

        header_anno_path (PathLike[str] | str | Split[PathLike[str] | str] | None):
            Optional explicit path to header annotation JSON file(s). If None,
            looks for a .json file with the same name as `header_image_path`.
            Default: None

        sauvola_k (float | Split[float]):
            Threshold sensitivity for Sauvola adaptive binarization (0.0-1.0).
            Controls how aggressively the algorithm converts the image to binary.

            - **Lower values** (0.04-0.15): Preserve faint lines, more noise
            - **Higher values** (0.20-0.35): Remove noise, may lose faint lines

            Start with 0.25 and adjust based on your image quality.
            Default: 0.25

        search_region (int | Split[int]):
            Size in pixels of the square region to search for the next corner point.
            The algorithm estimates where a corner should be, then searches within
            this region for the best match.

            - **Smaller values** (20-40): Faster, requires well-aligned tables
            - **Larger values** (60-100): More robust to warping and distortion

            Default: 60

        distance_penalty (float | Split[float]):
            Weight factor [0, 1] for penalizing corners far from expected position.
            Uses Gaussian weighting within the search region.

            - **0.0**: No penalty, any position in search region is equally valid
            - **0.5**: Moderate preference for positions near the expected location
            - **1.0**: Strong preference, only accepts positions very close to expected

            Default: 0.4

        cross_width (int | Split[int]):
            Width in pixels of the cross-shaped kernel used to detect intersections.
            Should approximately match the thickness of your table rules AFTER
            morphological dilation.

            **Tuning**: Look at the dilated image in debug_view. The cross_width
            should match the thickness of the black lines you see.
            Default: 10

        morph_size (int | Split[int]):
            Size of morphological structuring element for dilation. Controls how
            much gap-bridging occurs to connect broken line segments.

            - **Smaller values** (2-4): Minimal connection, preserves thin lines
            - **Larger values** (6-10): Connects larger gaps, but thickens lines

            Note: Increasing this requires increasing `cross_width` proportionally.
            Default: 4

        kernel_size (int | Split[int]):
            Size of the cross-shaped kernel (must be odd). Larger kernels are more
            selective, reducing false positives but potentially missing valid corners.

            - **Smaller values** (21-31): More sensitive, finds more candidates
            - **Larger values** (41-61): More selective, fewer false positives

            Default: 41

        processing_scale (float | Split[float]):
            Image downscaling factor (0, 1] for processing speed. Processing is done
            on scaled images, then results are scaled back to original size.

            - **1.0**: Full resolution (slowest, most accurate)
            - **0.5-0.75**: Good balance for high-res scans (2x-4x speedup)
            - **0.25-0.5**: Fast processing for very large images

            Default: 1.0

        min_rows (int | Split[int]):
            Minimum number of rows required before the algorithm considers the
            table complete. Prevents stopping too early on tables with initial
            low-confidence detections.
            Default: 5

        look_distance (int | Split[int]):
            Number of adjacent rows/columns to examine when extrapolating missing
            corners using polynomial regression. Higher values provide more context
            but may smooth over legitimate variations.

            - **2-3**: Good for consistent grids
            - **4-6**: Better for grids with some irregularity

            Default: 3

        grow_threshold (float | Split[float]):
            Initial minimum confidence [0, 1] required to accept a detected corner
            during the growing phase. The algorithm may adaptively lower this
            threshold if growth stalls.

            - **Higher values** (0.5-0.8): Stricter, fewer errors but may miss valid corners
            - **Lower values** (0.2-0.4): More permissive, finds more corners but more errors

            Default: 0.3
    """
    self._processing_scale = processing_scale
    self._cell_height_factor = cell_height_factor

    if isinstance(header_image_path, Split) or isinstance(header_anno_path, Split):
        header = Split(Path(header_image_path.left), Path(header_image_path.right))

        if not exists(header.left.with_suffix(".png")) or not exists(
            header.right.with_suffix(".png")
        ):
            raise TauluException(
                "The header images you provided do not exist (or they aren't .png files)"
            )

        if header_anno_path is None:
            if not exists(header.left.with_suffix(".json")) or not exists(
                header.right.with_suffix(".json")
            ):
                raise TauluException(
                    "You need to annotate the headers of your table first\n\nsee the Taulu.annotate method"
                )

            template_left = HeaderTemplate.from_saved(
                header.left.with_suffix(".json")
            )
            template_right = HeaderTemplate.from_saved(
                header.right.with_suffix(".json")
            )

        else:
            if not exists(header_anno_path.left) or not exists(
                header_anno_path.right
            ):
                raise TauluException(
                    "The header annotation files you provided do not exist (or they aren't .json files)"
                )

            template_left = HeaderTemplate.from_saved(header_anno_path.left)
            template_right = HeaderTemplate.from_saved(header_anno_path.right)

        self._header = Split(
            cv2.imread(os.fspath(header.left)), cv2.imread(os.fspath(header.right))
        )

        self._aligner = Split(
            HeaderAligner(
                self._header.left, scale=get_param(self._processing_scale, "left")
            ),
            HeaderAligner(
                self._header.right, scale=get_param(self._processing_scale, "right")
            ),
        )

        self._template = Split(template_left, template_right)

        self._cell_heights = Split(
            self._template.left.cell_heights(get_param(cell_height_factor, "left")),
            self._template.right.cell_heights(
                get_param(cell_height_factor, "right")
            ),
        )

        # Create GridDetector for left and right with potentially different parameters
        self._grid_detector = Split(
            GridDetector(
                kernel_size=get_param(kernel_size, "left"),
                cross_width=get_param(cross_width, "left"),
                morph_size=get_param(morph_size, "left"),
                search_region=get_param(search_region, "left"),
                sauvola_k=get_param(sauvola_k, "left"),
                distance_penalty=get_param(distance_penalty, "left"),
                scale=get_param(self._processing_scale, "left"),
                min_rows=get_param(min_rows, "left"),
                look_distance=get_param(look_distance, "left"),
                grow_threshold=get_param(grow_threshold, "left"),
            ),
            GridDetector(
                kernel_size=get_param(kernel_size, "right"),
                cross_width=get_param(cross_width, "right"),
                morph_size=get_param(morph_size, "right"),
                search_region=get_param(search_region, "right"),
                sauvola_k=get_param(sauvola_k, "right"),
                distance_penalty=get_param(distance_penalty, "right"),
                scale=get_param(self._processing_scale, "right"),
                min_rows=get_param(min_rows, "right"),
                look_distance=get_param(look_distance, "right"),
                grow_threshold=get_param(grow_threshold, "right"),
            ),
        )

    else:
        header_image_path = Path(header_image_path)
        self._header = cv2.imread(os.fspath(header_image_path))
        self._aligner = HeaderAligner(self._header)
        self._template = HeaderTemplate.from_saved(
            header_image_path.with_suffix(".json")
        )

        # For single header, parameters should not be Split objects
        if any(
            isinstance(param, Split)
            for param in [
                sauvola_k,
                search_region,
                distance_penalty,
                cross_width,
                morph_size,
                kernel_size,
                processing_scale,
                min_rows,
                look_distance,
                grow_threshold,
                cell_height_factor,
            ]
        ):
            raise TauluException(
                "Split parameters can only be used with split headers (tuple header_path)"
            )

        self._cell_heights = self._template.cell_heights(self._cell_height_factor)

        self._grid_detector = GridDetector(
            kernel_size=kernel_size,
            cross_width=cross_width,
            morph_size=morph_size,
            search_region=search_region,
            sauvola_k=sauvola_k,
            distance_penalty=distance_penalty,
            scale=self._processing_scale,
            min_rows=min_rows,
            look_distance=look_distance,
            grow_threshold=grow_threshold,
        )
```
Arguments:
- `header_image_path`: Path to the header template image(s). The header should be a cropped image showing a clear view of the table's first row. An annotation file (.json) must exist alongside the image, created via `Taulu.annotate()`. For split tables, provide a `Split` containing left and right header paths.
- `cell_height_factor`: Height of data rows relative to header height. For example, if your header is 100px tall and data rows are 80px tall, use 0.8.
  - **float**: All rows have the same height
  - **list[float]**: Different heights for different rows. The last value is repeated for any additional rows beyond the list length. Useful when the first data row is taller than subsequent rows.
  - **Split**: Different height factors for left and right sides

  Default: [1.0]
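The `list[float]` rule ("last value repeated beyond the list length") can be made concrete with a small sketch; `expand_factors` is a hypothetical helper for illustration, not part of the taulu API:

```python
# Illustration of the documented list[float] behavior for cell_height_factor:
# the last factor is repeated for any rows beyond the list length.
def expand_factors(factors: list[float], n_rows: int) -> list[float]:
    return [factors[min(i, len(factors) - 1)] for i in range(n_rows)]


# First data row taller (0.9), remaining rows 0.8
print(expand_factors([0.9, 0.8], 5))  # [0.9, 0.8, 0.8, 0.8, 0.8]
```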
- `header_anno_path` (PathLike[str] | str | Split[PathLike[str] | str] | None): Optional explicit path to header annotation JSON file(s). If None, looks for a .json file with the same name as `header_image_path`. Default: None
- `sauvola_k` (float | Split[float]): Threshold sensitivity for Sauvola adaptive binarization (0.0-1.0). Controls how aggressively the algorithm converts the image to binary.
  - **Lower values** (0.04-0.15): Preserve faint lines, more noise
  - **Higher values** (0.20-0.35): Remove noise, may lose faint lines

  Start with 0.25 and adjust based on your image quality. Default: 0.25
- `search_region` (int | Split[int]): Size in pixels of the square region to search for the next corner point. The algorithm estimates where a corner should be, then searches within this region for the best match.
  - **Smaller values** (20-40): Faster, requires well-aligned tables
  - **Larger values** (60-100): More robust to warping and distortion

  Default: 60
- `distance_penalty` (float | Split[float]): Weight factor [0, 1] for penalizing corners far from expected position. Uses Gaussian weighting within the search region.
  - **0.0**: No penalty, any position in search region is equally valid
  - **0.5**: Moderate preference for positions near the expected location
  - **1.0**: Strong preference, only accepts positions very close to expected

  Default: 0.4
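To build intuition for the Gaussian weighting described above, here is a sketch of how a distance penalty might down-weight candidates far from the expected corner. The formula (tying sigma to `search_region` and `distance_penalty`) is an assumption for illustration only; taulu's actual weighting may differ:

```python
import math


# Sketch of Gaussian distance weighting: candidates far from the expected
# position score lower, and a higher penalty makes the falloff steeper.
def weight(dist: float, search_region: int, penalty: float) -> float:
    if penalty == 0.0:
        return 1.0  # no penalty: every position in the region counts equally
    sigma = search_region * (1.0 - penalty) + 1e-9  # assumed relationship
    return math.exp(-(dist**2) / (2 * sigma**2))


assert weight(0, 60, 0.4) == 1.0                    # expected position keeps full score
assert weight(30, 60, 0.4) < weight(10, 60, 0.4)    # farther candidates score lower
assert weight(30, 60, 0.8) < weight(30, 60, 0.4)    # higher penalty is stricter
```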
- `cross_width` (int | Split[int]): Width in pixels of the cross-shaped kernel used to detect intersections. Should approximately match the thickness of your table rules AFTER morphological dilation.

  **Tuning**: Look at the dilated image in debug_view. The cross_width should match the thickness of the black lines you see. Default: 10
- `morph_size` (int | Split[int]): Size of morphological structuring element for dilation. Controls how much gap-bridging occurs to connect broken line segments.
  - **Smaller values** (2-4): Minimal connection, preserves thin lines
  - **Larger values** (6-10): Connects larger gaps, but thickens lines

  Note: Increasing this requires increasing `cross_width` proportionally. Default: 4
- `kernel_size` (int | Split[int]): Size of the cross-shaped kernel (must be odd). Larger kernels are more selective, reducing false positives but potentially missing valid corners.
  - **Smaller values** (21-31): More sensitive, finds more candidates
  - **Larger values** (41-61): More selective, fewer false positives

  Default: 41
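To visualize how `kernel_size` and `cross_width` interact, this sketch builds a cross-shaped mask of the kind described above (a vertical and horizontal bar through the center). It is an illustration of the concept, not taulu's actual kernel construction:

```python
# Sketch: a kernel_size x kernel_size grid with a cross of thickness
# cross_width through the center, like the intersection-detection kernel
# described in the parameter docs.
def cross_kernel(kernel_size: int, cross_width: int) -> list[list[int]]:
    c = kernel_size // 2
    half = cross_width // 2
    return [
        [1 if abs(r - c) <= half or abs(col - c) <= half else 0
         for col in range(kernel_size)]
        for r in range(kernel_size)
    ]


k = cross_kernel(7, 3)
for row in k:
    print("".join("#" if v else "." for v in row))
```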
- `processing_scale` (float | Split[float]): Image downscaling factor (0, 1] for processing speed. Processing is done on scaled images, then results are scaled back to original size.
  - **1.0**: Full resolution (slowest, most accurate)
  - **0.5-0.75**: Good balance for high-res scans (2x-4x speedup)
  - **0.25-0.5**: Fast processing for very large images

  Default: 1.0
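The "scaled back to original size" step amounts to dividing detected coordinates by the scale factor. A minimal sketch of that mapping (`to_original` is a hypothetical helper, not taulu API):

```python
# Corners are found on a downscaled image, then mapped back to
# original-resolution pixel coordinates by dividing out the scale.
def to_original(point: tuple[int, int], scale: float) -> tuple[int, int]:
    x, y = point
    return (round(x / scale), round(y / scale))


# A corner found at (512, 384) on an image processed at half resolution
print(to_original((512, 384), 0.5))  # (1024, 768)
```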
- `min_rows` (int | Split[int]): Minimum number of rows required before the algorithm considers the table complete. Prevents stopping too early on tables with initial low-confidence detections. Default: 5
- `look_distance` (int | Split[int]): Number of adjacent rows/columns to examine when extrapolating missing corners using polynomial regression. Higher values provide more context but may smooth over legitimate variations.
  - **2-3**: Good for consistent grids
  - **4-6**: Better for grids with some irregularity

  Default: 3
- `grow_threshold` (float | Split[float]): Initial minimum confidence [0, 1] required to accept a detected corner during the growing phase. The algorithm may adaptively lower this threshold if growth stalls.
  - **Higher values** (0.5-0.8): Stricter, fewer errors but may miss valid corners
  - **Lower values** (0.2-0.4): More permissive, finds more corners but more errors

  Default: 0.3
```python
@staticmethod
def annotate(image_path: PathLike[str] | str, output_path: PathLike[str] | str):
    """
    Interactive tool to create header annotations for table segmentation.

    This method guides you through a two-step annotation process:

    1. **Crop the header**: Click four corners to define the header region
    2. **Annotate lines**: Click pairs of points to define each vertical and
       horizontal line in the header

    The annotations are saved as:
    - A cropped header image (.png) at `output_path`
    - A JSON file (.json) containing line coordinates

    ## Annotation Guidelines

    **Which lines to annotate:**
    - All vertical lines that extend into the table body (column separators)
    - The top horizontal line of the header
    - The bottom horizontal line of the header (top of data rows)

    **Order doesn't matter** - annotate lines in any order that's convenient.

    **To annotate a line:**
    1. Click once at one endpoint
    2. Click again at the other endpoint
    3. A green line appears showing your annotation

    **To undo:**
    - Right-click anywhere to remove the last line you drew

    **When finished:**
    - Press 'n' to save and exit
    - Press 'q' to quit without saving

    Args:
        image_path (PathLike[str] | str): Path to a table image containing
            a clear view of the header. This can be a full table image.
        output_path (PathLike[str] | str): Where to save the cropped header
            image. The annotation JSON will be saved with the same name but
            .json extension.

    Raises:
        TauluException: If image_path doesn't exist or output_path is a directory

    Examples:
        Annotate a single header:

        >>> from taulu import Taulu
        >>> Taulu.annotate("scan_page_01.png", "header.png")
        # Interactive window opens
        # After annotation: creates header.png and header.json

        Annotate left and right headers for a split table:

        >>> Taulu.annotate("scan_page_01.png", "header_left.png")
        >>> Taulu.annotate("scan_page_01.png", "header_right.png")
        # Creates header_left.{png,json} and header_right.{png,json}

    Notes:
        - The header image doesn't need to be perfectly cropped initially -
          the tool will help you crop it precisely
        - Annotation accuracy is important: misaligned lines will cause
          segmentation errors
        - You can re-run this method to update annotations if needed
    """

    if not exists(image_path):
        raise TauluException(f"Image path {image_path} does not exist")

    if os.path.isdir(output_path):
        raise TauluException("Output path should be a file")

    output_path = Path(output_path)

    template = HeaderTemplate.annotate_image(
        os.fspath(image_path), crop=output_path.with_suffix(".png")
    )

    template.save(output_path.with_suffix(".json"))
```
Interactive tool to create header annotations for table segmentation.
This method guides you through a two-step annotation process:
1. **Crop the header**: Click four corners to define the header region
2. **Annotate lines**: Click pairs of points to define each vertical and horizontal line in the header
The annotations are saved as:
- A cropped header image (.png) at `output_path`
- A JSON file (.json) containing line coordinates
Annotation Guidelines
Which lines to annotate:
- All vertical lines that extend into the table body (column separators)
- The top horizontal line of the header
- The bottom horizontal line of the header (top of data rows)
Order doesn't matter - annotate lines in any order that's convenient.
To annotate a line:
- Click once at one endpoint
- Click again at the other endpoint
- A green line appears showing your annotation
To undo:
- Right-click anywhere to remove the last line you drew
When finished:
- Press 'n' to save and exit
- Press 'q' to quit without saving
Arguments:
- image_path (PathLike[str] | str): Path to a table image containing a clear view of the header. This can be a full table image.
- output_path (PathLike[str] | str): Where to save the cropped header image. The annotation JSON will be saved with the same name but .json extension.
Raises:
- TauluException: If image_path doesn't exist or output_path is a directory
Examples:
Annotate a single header:
    >>> from taulu import Taulu
    >>> Taulu.annotate("scan_page_01.png", "header.png")
    # Interactive window opens
    # After annotation: creates header.png and header.json
Annotate left and right headers for a split table:
    >>> Taulu.annotate("scan_page_01.png", "header_left.png")
    >>> Taulu.annotate("scan_page_01.png", "header_right.png")
    # Creates header_left.{png,json} and header_right.{png,json}
Notes:
- The header image doesn't need to be perfectly cropped initially - the tool will help you crop it precisely
- Annotation accuracy is important: misaligned lines will cause segmentation errors
- You can re-run this method to update annotations if needed
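Since annotation is a one-time interactive step, scripts often gate it on whether the outputs already exist, as the module-level example does. A small sketch of that check (`needs_annotation` is a hypothetical helper):

```python
from pathlib import Path


# Only launch the interactive annotation tool when the cropped header image
# or its .json annotation is missing (mirrors the setup/main pattern from
# the module-level example).
def needs_annotation(header: str) -> bool:
    p = Path(header)
    return not (p.with_suffix(".png").exists() and p.with_suffix(".json").exists())


if needs_annotation("header.png"):
    # Taulu.annotate("scan_page_01.png", "header.png")  # opens the interactive tool
    pass
```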
```python
def segment_table(
    self,
    image: MatLike | PathLike[str] | str,
    debug_view: bool = False,
) -> TableGrid:
    """
    Segment a table image into a grid of cells.

    This is the main entry point for the taulu package. It orchestrates:

    1. **Header alignment**: Locates the table by matching the header template
       to the image using feature-based registration (ORB features + homography)
    2. **Grid detection**: Applies morphological filtering and cross-correlation
       to find corner intersections
    3. **Grid growing**: Iteratively detects corners row-by-row and column-by-column,
       starting from the aligned header position
    4. **Extrapolation**: Fills in any missing corners using polynomial regression
       based on neighboring detected points
    5. **Smoothing**: Refines corner positions for consistency

    ## Performance Notes

    Processing time depends on:
    - Image resolution (use `processing_scale < 1.0` for large images)
    - Table complexity (more rows/columns = longer processing)
    - Parameter tuning (lower thresholds = more computation)

    Typical processing times:
    - Small tables (10 rows, 5 cols, 2000x1500px): 1-3 seconds
    - Large tables (50 rows, 10 cols, 4000x3000px): 5-15 seconds

    ## Troubleshooting

    **If segmentation fails (returns incomplete grid):**
    1. Enable `debug_view=True` to see where it stops
    2. Check if header alignment is correct (first debug image)
    3. Verify cross-correlation shows bright spots at corners
    4. Adjust `grow_threshold` (lower if stopping too early)
    5. Increase `search_region` if corners are far from expected positions

    **If segmentation is inaccurate (corners in wrong positions):**
    1. Check binarization quality (adjust `sauvola_k`)
    2. Verify cross-kernel size matches line thickness (adjust `cross_width`)
    3. Ensure morphology isn't over-connecting (reduce `morph_size`)
    4. Increase `distance_penalty` to enforce expected positions more strictly

    Args:
        image (MatLike | PathLike[str] | str): Table image to segment.
            Can be a file path or a numpy array (BGR or grayscale).

        debug_view (bool): If True, opens OpenCV windows showing intermediate
            processing steps:
            - Header alignment overlay
            - Binarized image
            - After morphological operations
            - Cross-correlation result
            - Growing progress (corner-by-corner)

            **Controls:**
            - Press 'n' to advance to next step
            - Press 'q' to quit immediately

            Useful for parameter tuning and understanding failures.
            Default: False

    Returns:
        TableGrid: A grid structure containing detected corner positions with
        methods for:

        **Position queries:**
        - `cell(point)`: Get (row, col) at pixel coordinates (x, y)
        - `cell_polygon(cell)`: Get 4 corners of a cell as (lt, rt, rb, lb)
        - `region(start, end)`: Get bounding box for a cell range

        **Image extraction:**
        - `crop_cell(img, cell, margin=0)`: Extract single cell with optional margin
        - `crop_region(img, start, end, margin=0)`: Extract rectangular region

        **Visualization:**
        - `show_cells(img)`: Interactive cell viewer (click to highlight)
        - `highlight_all_cells(img)`: Draw all cell boundaries
        - `visualize_points(img)`: Show detected corner points

        **Analysis:**
        - `text_regions(img, row)`: Find continuous text regions in a row
        - `cells()`: Generator yielding all (row, col) indices

        **Persistence:**
        - `save(path)`: Save grid to JSON file
        - `TableGrid.from_saved(path)`: Load grid from JSON

        **Properties:**
        - `rows`: Number of data rows (header not included)
        - `cols`: Number of columns
        - `points`: Raw list of detected corner coordinates

    Raises:
        TauluException: If image cannot be loaded, header alignment fails,
            or grid detection produces no results

    Examples:
        Basic segmentation:

        >>> from taulu import Taulu
        >>> import cv2
        >>>
        >>> taulu = Taulu("header.png")
        >>> grid = taulu.segment_table("table_page_01.png")
        >>>
        >>> print(f"Detected {grid.rows} rows and {grid.cols} columns")
        >>>
        >>> # Extract first cell
        >>> img = cv2.imread("table_page_01.png")
        >>> cell_img = grid.crop_cell(img, (0, 0))
        >>> cv2.imwrite("cell_0_0.png", cell_img)

        Debug mode for parameter tuning:

        >>> grid = taulu.segment_table("table_page_01.png", debug_view=True)
        # Windows open showing each step
        # Adjust parameters based on what you see

        Process multiple images with the same header:

        >>> taulu = Taulu("header.png", sauvola_k=0.25)
        >>>
        >>> for i in range(1, 11):
        ...     img_path = f"table_page_{i:02d}.png"
        ...     grid = taulu.segment_table(img_path)
        ...     grid.save(f"grid_{i:02d}.json")
        ...     print(f"Page {i}: {grid.rows} rows detected")

        Extract all cells from a table:

        >>> img = cv2.imread("table.png")
        >>> grid = taulu.segment_table("table.png")
        >>>
        >>> for row, col in grid.cells():
        ...     cell_img = grid.crop_cell(img, (row, col), margin=5)
        ...     cv2.imwrite(f"cell_{row}_{col}.png", cell_img)

        Find text regions for OCR:

        >>> for row in range(grid.rows):
        ...     text_regions = grid.text_regions(img, row)
        ...     for start_cell, end_cell in text_regions:
        ...         # Extract region spanning multiple cells
        ...         region_img = grid.crop_region(img, start_cell, end_cell)
        ...         # Run OCR on region_img...

    See Also:
        - `TableGrid`: Complete documentation of the returned object
        - `GridDetector.find_table_points()`: Lower-level grid detection
        - `HeaderAligner.align()`: Lower-level header alignment
    """

    if not isinstance(image, MatLike):
        image = cv2.imread(os.fspath(image))

    now = perf_counter()
    h = self._aligner.align(image, visual=debug_view)
    align_time = perf_counter() - now
    logger.info(f"Header alignment took {align_time:.2f} seconds")

    # find the starting point for the table grid algorithm
    left_top_template = self._template.intersection((1, 0))
    if isinstance(left_top_template, Split):
        left_top_template = Split(
            (int(left_top_template.left[0]), int(left_top_template.left[1])),
            (int(left_top_template.right[0]), int(left_top_template.right[1])),
        )
    else:
        left_top_template = (int(left_top_template[0]), int(left_top_template[1]))

    left_top_table = self._aligner.template_to_img(h, left_top_template)

    now = perf_counter()
    table = self._grid_detector.find_table_points(
        image,
        left_top_table,
        self._template.cell_widths(0),
        self._cell_heights,
        visual=debug_view,
    )
    grid_time = perf_counter() - now
    logger.info(f"Grid detection took {grid_time:.2f} seconds")

    if isinstance(table, Split):
        table = TableGrid.from_split(table, (0, 0))

    return table
```
Segment a table image into a grid of cells.
This is the main entry point for the taulu package. It orchestrates:
- Header alignment: Locates the table by matching the header template to the image using feature-based registration (ORB features + homography)
- Grid detection: Applies morphological filtering and cross-correlation to find corner intersections
- Grid growing: Iteratively detects corners row-by-row and column-by-column, starting from the aligned header position
- Extrapolation: Fills in any missing corners using polynomial regression based on neighboring detected points
- Smoothing: Refines corner positions for consistency
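To make the extrapolation step concrete, here is a minimal stand-alone sketch of filling in one missing corner coordinate with a first-degree least-squares fit over its detected neighbors. The fit degree and the actual routine inside `GridDetector` are internal details, so treat this as an illustration rather than the package's code:

```python
def extrapolate_corner(xs, ys, x_missing):
    """Predict the y-coordinate of a missing corner from detected neighbors.

    xs, ys: coordinates of corners that were detected along one grid line.
    x_missing: position along the line where the corner was not detected.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # closed-form least squares for y = a*x + b
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - a * mean_x
    return a * x_missing + b
```

Because table lines are close to straight, even this linear fit places a missing corner well when the surrounding corners were detected reliably.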
Performance Notes
Processing time depends on:
- Image resolution (use `processing_scale < 1.0` for large images)
- Table complexity (more rows/columns = longer processing)
- Parameter tuning (lower thresholds = more computation)
Typical processing times:
- Small tables (10 rows, 5 cols, 2000x1500px): 1-3 seconds
- Large tables (50 rows, 10 cols, 4000x3000px): 5-15 seconds
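These timings vary with hardware, so it is worth measuring on your own images. A small stdlib helper (not part of taulu) that mirrors the package's internal `perf_counter` logging:

```python
from time import perf_counter


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = perf_counter()
    result = fn(*args, **kwargs)
    return result, perf_counter() - start


# usage sketch: grid, seconds = timed(taulu.segment_table, "table_page_01.png")
```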
Troubleshooting
If segmentation fails (returns incomplete grid):
- Enable `debug_view=True` to see where it stops
- Check if header alignment is correct (first debug image)
- Verify cross-correlation shows bright spots at corners
- Adjust `grow_threshold` (lower it if growing stops too early)
- Increase `search_region` if corners are far from expected positions
If segmentation is inaccurate (corners in wrong positions):
- Check binarization quality (adjust `sauvola_k`)
- Verify the cross-kernel size matches line thickness (adjust `cross_width`)
- Ensure morphology isn't over-connecting (reduce `morph_size`)
- Increase `distance_penalty` to enforce expected positions more strictly
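When tuning, a short sweep can replace manual trial and error. The sketch below is parameterized over a segmentation callable so it stands on its own; in practice you would pass a closure that constructs `Taulu` with the candidate value (the keyword name `grow_threshold` is taken from this page and may differ in your version), and judge success by the number of detected rows:

```python
def sweep_threshold(build_and_segment, candidates, expected_rows):
    """Return the first candidate value whose grid reaches expected_rows.

    build_and_segment: callable taking one candidate value and returning a
    grid-like object with a .rows attribute (e.g. a closure over Taulu and
    an image path).
    """
    for value in candidates:
        grid = build_and_segment(value)
        if grid.rows >= expected_rows:
            return value, grid
    return None, None
```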
Arguments:
- image (MatLike | PathLike[str] | str): Table image to segment. Can be a file path or a numpy array (BGR or grayscale).
- debug_view (bool): If True, opens OpenCV windows showing intermediate processing steps:
- Header alignment overlay
- Binarized image
- After morphological operations
- Cross-correlation result
- Growing progress (corner-by-corner)
Controls:
- Press 'n' to advance to next step
- Press 'q' to quit immediately
Useful for parameter tuning and understanding failures. Default: False
Returns:
TableGrid: A grid structure containing detected corner positions with methods for:
**Position queries:**
- `cell(point)`: Get (row, col) at pixel coordinates (x, y)
- `cell_polygon(cell)`: Get 4 corners of a cell as (lt, rt, rb, lb)
- `region(start, end)`: Get bounding box for a cell range

**Image extraction:**
- `crop_cell(img, cell, margin=0)`: Extract single cell with optional margin
- `crop_region(img, start, end, margin=0)`: Extract rectangular region

**Visualization:**
- `show_cells(img)`: Interactive cell viewer (click to highlight)
- `highlight_all_cells(img)`: Draw all cell boundaries
- `visualize_points(img)`: Show detected corner points

**Analysis:**
- `text_regions(img, row)`: Find continuous text regions in a row
- `cells()`: Generator yielding all (row, col) indices

**Persistence:**
- `save(path)`: Save grid to JSON file
- `TableGrid.from_saved(path)`: Load grid from JSON

**Properties:**
- `rows`: Number of data rows (header not included)
- `cols`: Number of columns
- `points`: Raw list of detected corner coordinates
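For intuition about what `cell(point)` computes, here is a simplified stand-alone lookup for a perfectly axis-aligned grid. The real `TableGrid` works from detected corner points and tolerates skew, so this is only a sketch of the idea:

```python
from bisect import bisect_right


def cell_at(point, col_xs, row_ys):
    """Map a pixel (x, y) to (row, col), or None if outside the grid.

    col_xs / row_ys are the sorted x / y coordinates of the grid lines.
    """
    x, y = point
    col = bisect_right(col_xs, x) - 1
    row = bisect_right(row_ys, y) - 1
    if 0 <= col < len(col_xs) - 1 and 0 <= row < len(row_ys) - 1:
        return (row, col)
    return None
```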
Raises:
- TauluException: If image cannot be loaded, header alignment fails, or grid detection produces no results
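Because a failing page raises `TauluException` rather than returning a partial result, batch jobs usually want to catch it per image. The sketch below defines a stand-in exception class so it runs without the package installed; in real code you would import taulu's own exception (its exact import path is an assumption here):

```python
class TauluException(Exception):
    """Stand-in for taulu's error type so this sketch is self-contained."""


def segment_batch(segment, paths):
    """Segment each image, collecting failures instead of aborting the batch."""
    grids, failures = {}, {}
    for path in paths:
        try:
            grids[path] = segment(path)
        except TauluException as exc:
            failures[path] = str(exc)
    return grids, failures
```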