IBM
Granite Language Models
GRANITE 3.0 LANGUAGE MODELS
Granite Team, IBM¹

¹ See Contributions and Acknowledgments section for full author list.
Please send correspondence to granite-inquiries@ibm.com.
ABSTRACT
This report presents Granite 3.0, a new set of lightweight, state-of-the-art, open
foundation models ranging in scale from 400 million to 8 billion active parameters.
With native support for multilingual text, coding, and function calling, along with
strong safety performance, these models target enterprise use cases, including
on-premise and on-device settings. Evaluations on a comprehensive set of tasks
demonstrate that our models consistently reach state-of-the-art performance for their
size (as shown in Figures 1 and 2). This report also discloses technical details of
pre-training and post-training that may help the research community accelerate the
collective effort to develop open foundation models. We publicly release pre-trained
and post-trained versions of all our Granite 3.0 models under a standard permissive
Apache 2.0 license, allowing both research and commercial use. With support from
the open source community, the Granite 3.0 models have been integrated with a
range of existing tools for quantization, fine-tuning, and deployment.
[Figure 1: bar chart, panel title "Base Models". Y-axis: Average Performance (ticks 20 to 60). X-axis, left to right: Granite-3.0-8B, Llama-3.1-8B, Granite-3.0-2B, Mistral-7B, Granite-3.0-3B-A800M, Gemma-2B, Llama-3.1-3B, Granite-3.0-1B-A400M, SmolLM-1.7B, Llama-3.2-1B, SmolLM-360M.]
Figure 1: Average performance of base models across 19 tasks from 6 domains.
[Figure 2: bar chart, panel title "Instruct Models". Y-axis: Average Performance (ticks 20 to 60). X-axis, left to right: Granite-3.0-8B, Llama-3.1-8B, Granite-3.0-2B, Mistral-7B, Llama-3.1-3B, Gemma-2B, Granite-3.0-3B-A800M, Llama-3.2-1B, Granite-3.0-1B-A400M, SmolLM-1.7B, SmolLM-360M.]
Figure 2: Average performance of instruct models across 23 tasks from 8 domains.