# The Consul outage that never happened

*Devin Sylva · 2019-11-08*

When things go wrong on a large website, it can be fun to read the dramatic stories of high-pressure incidents where nothing goes as planned. It makes for good reading. Every once in a while, though, we get a success story. Every once in a while, things go exactly as planned.

[GitLab.com](http://GitLab.com) is a large, high-availability instance of GitLab. It is maintained by the [Infrastructure group](/company/team/?department=infrastructure-department), which currently consists of 20 to 24 engineers (depending on how you count), four managers, and a director, distributed all around the world.
Distributed, in this case, does not mean across a few different offices. There are three or four major cities that have more than one engineer, but with the exception of coworking days, nobody is working from the same building.

In order to handle the load generated by about four million users working on around 12 million projects, GitLab.com breaks out the individual components of the GitLab product and currently spreads them out over 271 production servers.

The site is slowly migrating to HashiCorp's [Consul](https://www.consul.io) for service location. Consul can be thought of as a kind of DNS, in that it associates a well-known name with the actual physical location of that service. It also provides other useful functions, such as storing dynamic configuration for services and locking for clusters. All of the Consul client and server components talk to each other over encrypted connections. These connections require a certificate at each end to validate the identity of the client and server and to provide the encryption key. The main component of GitLab.com that currently relies on this service is the database and its high-availability system, [Patroni](https://patroni.readthedocs.io/en/latest/). Like any website that provides functionality and not just information, the database is the central service that everything else depends on. Without the database, the website, API, CI pipelines, and Git services will all deny requests and return errors.

## Troubleshooting

The [issue](https://gitlab.com/gitlab-com/gl-infra/production/issues/1037) came to our attention when a database engineer noticed that one of our database servers in the staging environment could not reconnect to the staging Consul server after the database node was restarted.

It turns out that the TLS certificate had expired. This is normally a simple fix.
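An expiry like this is easy to spot (and to monitor for) with `openssl`. A quick sketch, using a throwaway self-signed certificate in place of the real Consul agent certificate:

```shell
# Generate a throwaway self-signed cert for demonstration purposes
# (in practice, point at the real Consul agent certificate).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem \
  -days 1 -subj "/CN=consul-demo" 2>/dev/null

# Print the expiry date of the certificate.
openssl x509 -enddate -noout -in /tmp/demo-cert.pem

# Exit non-zero if the cert expires within the next 30 days --
# a cheap check that, wired into monitoring, flags the problem early.
openssl x509 -checkend $((30 * 24 * 3600)) -noout -in /tmp/demo-cert.pem \
  || echo "certificate expires within 30 days"
```

A check like the last line is the kind of monitoring that, as the story below makes clear, was missing here.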
Someone would go to the Certificate Authority (CA) and request a renewal – or, if that fails, generate a new certificate to be signed by the same CA. That certificate would replace the expired copy and the service would be restarted. All of the connections should reestablish using the new certificate and, just like any other rolling configuration change, it should be transparent to all users.

After looking everywhere, and asking everyone on the team, we got the definitive answer that the CA key we created a year ago for this self-signed certificate had been lost.

These test certificates were generated for the original proof-of-concept installation of this service and were never intended to be transitioned into production. However, since everything was working perfectly, the expired test certificate had not been calling attention to itself. A few things should have been done, including: rebuilding the service with production in mind, conducting a production readiness review, and monitoring. But a year ago, our production team was in a very different place. We were small, with just four engineers, and three new team members – a manager, a director, and an engineer – all of whom were still onboarding. We were less focused on the gaps that led to this oversight a year ago and more focused on fixing the urgent problem today.

### Validating the problem

First, we needed to validate the problem using the information we'd gathered. Since we couldn't update the existing certificates, we turned validation off on the client that couldn't connect. That alone didn't change anything, since the encrypted connections are validated on both the cluster side and the client side. Next, we changed the setting on one server node in the cluster, and the restarted client could then connect to that server node. The problem now was that the server could no longer connect to any other cluster node and could not rejoin the cluster.
The server we changed was not validating connections, meaning it was ignoring the expired certificates of its peers in the cluster, but the peers were not returning the favor. They were shunning it, putting the whole cluster in a degraded state.

We realized that no matter what we did, some servers and some clients would not be able to connect to each other until the change had been made everywhere and every service had been restarted. Unfortunately, we were talking about 255 of our 271 servers. Our tool set is designed for gradual rollouts, not simultaneous actions.

We were unsure why the site was even still online: if the clients and servers could not connect, why was anything still working? We ran a small test and confirmed that the site was only working because the connections had already been established when the certificates expired. Any interruption of these long-running connections would force a revalidation, and the replacement connections would be rejected – across the whole fleet.

> Effectively, we were in the middle of an outage that had already started, but hadn't yet gotten to the point of taking down the site.

### Testing in staging

We declared an incident and began testing every angle we could think of in the staging environment, including:

* Reloading the configuration of the running service, which worked fine and did not drop connections, but the [certificate settings](https://github.com/hashicorp/consul/pull/4204) are [not included in the reloadable settings](https://www.consul.io/docs/agent/options.html#reloadable-configuration) for our version of Consul.

* Simultaneous restarts of various services, which worked, but our tools wouldn't allow us to do that with ALL of the nodes at once.

Everything we tried indicated that we had to break those existing connections in order to activate any change, and that we could only avoid downtime if that happened on **ALL
nodes at precisely the same time**.

Every problem uncovered other problems, and while we were troubleshooting, one of our production Consul servers became unresponsive, disconnected all SSH sessions, and would not allow anyone to reconnect. The server did not log any errors. It was still sending monitoring data and was still participating in the Consul cluster. If we restarted the server, it would not have been able to reconnect to its peers, leaving us with an even number of participating nodes. Losing quorum in the cluster would have been dangerous when we went to restart all of the nodes, so we left it in that state for the moment.

## Planning

Once the troubleshooting was finished, [it was time to start planning](https://gitlab.com/gitlab-com/gl-infra/production/issues/1042).

There were a few ways to solve the problem. We could:

* Replace the CA and the certificates with new self-signed ones.

* Change the CA setting to point to the system store, allowing us to use certificates signed by our standard certificate provider, and then replace the certificates.

* Disable the validation of the dates so that the expired certificate would not cause connections to fail.

All of these options would incur the same risks and involve the same risky restart of all services at once.

We picked the last option. Our reasoning was that disabling the validation would eliminate the immediate risk and give us time to slowly roll out a properly robust solution in the near future, without having to worry about disrupting the whole system.
It was also the [smallest and most incremental change](https://handbook.gitlab.com/handbook/values/#iteration).

### Working asynchronously to tackle the problem

While there was some time pressure due to the [risk of network connections being interrupted](https://gitlab.com/gitlab-com/gl-infra/production/issues/1037#note_201745119), we had to consider the reality of working across timezones as we planned our solution.

> We decided not to hand it off to the European shift, who were coming online soon. Being a [globally distributed](/company/culture/all-remote/) team, we had already handed things off from the end of the day in Mongolia, through Eastern and Western Europe and across the Americas, and were approaching the end of the day in Hawaii and New Zealand.

Australia still had a few more hours and Mongolia had started the day again, but the folks who had been troubleshooting throughout the day had a pretty good handle on what needed to happen and what could go wrong. It made sense for them to be the ones to do the work. We decided to make a "Break Glass" plan instead: a merge request with all of the changes and information necessary for the European shift to get us back into a good state in case a full outage happened before anyone who had been working on it woke up. Everyone slept better knowing that we had a plan that would work even if it could not be executed without causing downtime. If we were already experiencing downtime, there would be no problem.

### Designing our approach

In the morning (HST), everything was how we had left it, so we started planning how to change the settings and restart all of the services without downtime. Our normal management tools were out because of the time it takes to roll out changes. Even sequential tools such as `knife ssh`, `mussh`, or `ansible` wouldn't work, because the change had to be **precisely simultaneous**.
Someone joked about setting it up in `cron`, which led us to the standard Linux `at` command (a relative of `batch`). `cron` would require cleanup afterward, but an `at` command can be pushed out ahead of time with a sequential tool and will run a command at a precise time on all machines. Back in the days of hands-on, bare-metal system administration, it was a useful trick for running one-time maintenance in the middle of the night or making it look like you were working when you weren't. Now `at` has become more obscure with the trend toward managing fleets of servers rather than big monolithic central machines. We chose to run the command `sudo systemctl restart consul.service`. We tested this in staging to verify that our Ubuntu distribution made environment variables like `$PATH` available, and that `sudo` did not ask for a password. On some distributions (older CentOS especially) this is not always the case.

With those successful tests, we still needed to change the config files. Luckily, there is nothing that prevents changing these ahead of time, since the changes aren't picked up until the service restarts. We didn't want to do this step at the same time as the service restart, so that we could validate the changes and keep the `at` command as small as possible. We decided not to use Chef to push out the change because we needed complete and immediate transparency: any nodes that did not get the change would fail after the restart. `mussh` was the tool that offered the most control and visibility while still being able to change all hosts with one command.

We also had to disable the Chef client so that it didn't overwrite the changes between when they were written and when the service restarted.

Before running anything, we also needed to address the one Consul server that we couldn't access. It likely just needed to be rebooted, and would come up unable to reconnect to the cluster.
The best option was to do this manually just before starting the rest of the procedure.

Once we had mapped out the plan, we practiced it in the disaster recovery environment. We chose it over the staging environment because all of the nodes in staging had already been restarted, so there were no long-running connections to test; the disaster recovery environment was the next best option. The practice run did not go perfectly, since the database in this environment was already in an unhealthy state, but it gave us valuable information for adjusting the plan.

## Pre-execution

### A moment of panic

It was almost time to fix the inaccessible Consul node. The team connected to one of the other nodes to monitor and watch logs. Suddenly, the second node started disconnecting people. It was behaving exactly like the inaccessible node had the previous day. 😱 Suspiciously, it didn't disconnect everyone. Those who were still logged in noticed that `sshguard` was blocking access to some of the bastion servers that all of our SSH traffic flows through when accessing the internal nodes: [Infrastructure#7484](https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7484). We have three bastion servers, and two were blocked because so many of us connected so many sessions so quickly. Disabling `sshguard` allowed everyone back in, and that information was the hint we needed to find the one bastion that hadn't yet been blocked. It got us back into the original problem server. Disabling `sshguard` there left us with a fully functional node, able to accept the `at` command to restart the Consul service at exactly the same time as the others.

We verified that we had an accurate and instantaneous way to monitor the state of the services.
Watching the output of the `consul operator raft list-peers` command every second gave us a view that looked like this:

```
Node                Address          State     Voter  RaftProtocol
consul-01-inf-gprd  10.218.1.4:8300  follower  true   3
consul-03-inf-gprd  10.218.1.2:8300  leader    true   3
consul-05-inf-gprd  10.218.1.6:8300  follower  true   3
consul-04-inf-gprd  10.218.1.5:8300  follower  true   3
consul-02-inf-gprd  10.218.1.3:8300  follower  true   3
```

### More nodes, more problems

Even the most thorough plans always miss something. At this point we realized that one of the three `pgbouncer` nodes which direct traffic to the correct database instance was not showing as healthy in the load balancer. One is normally in this state as a warm spare, but one of the side effects of disconnecting the `pgbouncer` nodes from Consul is that they would all fail their load balancer health checks. If all health checks are failing, GCP load balancers send requests to ALL nodes as a safety feature. This would have led to too many connections to our database servers, causing unintended consequences.
We worked around this by removing the unhealthy node from the load balancer pool for the remainder of this activity.

* We checked that the lag on the database replicas was zero, and that they weren't trying to replicate any large, time-consuming transactions.

* We generated a text list of all of the nodes that run the Consul client or server.

* We verified the time zone (UTC) and time synchronization on all of those servers, to ensure that when the `at` command executed the restart, an unsynchronized clock wouldn't cause unintended behavior.

* We also verified that the `at` scheduler was running on all of those nodes, and that `sudo` would not ask for a password.

* We verified the script that would edit the config files, and tested it against the staging environment.

* We also made sure `sshguard` was disabled and wasn't going to lock out the scripted process for behaving like a scripted process.

This might seem like a lot of steps, but without any one of these prerequisites the whole process would fail. Once all of that was done, everything was ready to go.

## Execution

In the end, we scheduled a maintenance window and distilled all of the research and troubleshooting down to the [steps in this issue](https://gitlab.com/gitlab-com/gl-infra/production/issues/1042).

Everything was staged and it was time to make the changes. This course of action included four key steps. First, we paused the Patroni database high-availability subsystem. Pausing would freeze database failover and keep the high-availability configuration static until we were done. It would have been bad to have a database failure during this time, so minimizing the time spent in this state was important.

Next, we ran a script on every machine that stopped the Chef client service and then changed the verify lines in the config files from `true` to `false`. It wouldn't help to have Chef trying to reconfigure anything as we made changes.
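That per-node script amounted to something like the sketch below. The config path and service name are illustrative (ours are managed by Chef); the `verify_incoming`/`verify_outgoing` keys are Consul's standard TLS verification settings.

```shell
#!/bin/sh
# Sketch of the per-node change: stop Chef so it can't revert the edit,
# then flip the TLS verification flags in the Consul agent config.
# The running agent won't see this until it restarts, so no
# connections drop at this stage.
CONFIG=/etc/consul/config.json   # illustrative path

sudo systemctl stop chef-client.service

sudo sed -i \
  -e 's/"verify_incoming": *true/"verify_incoming": false/' \
  -e 's/"verify_outgoing": *true/"verify_outgoing": false/' \
  "$CONFIG"

# Show the result so the push tool's output doubles as verification.
grep '"verify_' "$CONFIG"
```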
We did this using `mussh`, in batches of 20 servers at a time. Any more in parallel and our SSH agent and YubiKeys might not have been able to keep up. We were not expecting anything to change state in this step: the config files on disk would have the new values, but the running services wouldn't change, and, more importantly, no TCP connections would disconnect. That was what we got, so it was time for some verification.

Our third step was to check all of the servers and a random sampling of client nodes to make sure the config files had been modified appropriately. It was also a good time to double-check that the Chef client was disabled. This check turned out to be a good thing to do, because a few nodes still had the Chef client active. It turned out that those nodes had been in the middle of a run when we disabled the service, and Chef re-enabled the service for us when the run completed. Chef can be _so_ helpful. We disabled it manually on the few machines that were affected. This delayed our maintenance window by a few minutes, so we were very glad we hadn't scheduled the `at` commands first.

Finally, we needed to remove the inactive `pgbouncer` node from the load balancer, so that when the load balancer went into its safety mode, it would only send traffic to the two nodes that were in a known state. You might think that removing it from the load balancer would be enough, but since it also participates in a cluster via Consul, the whole service needed to be shut down, along with the health check the load balancer uses to determine whether to send it traffic. We made a note of the full command line from the process table, shut it down, and removed it from the pool.

### The anxiety builds

Now was the moment of truth. It was 02:10 UTC. We pushed the following command to every server (20 at a time, using `mussh`): `echo 'sudo systemctl restart consul.service' | at 02:20` – it took about four minutes to complete.
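The push looked roughly like this (the host list file and the `-m 20` concurrency cap are illustrative, not the exact invocation we used):

```shell
# consul-nodes.txt holds one hostname per line -- all 255 nodes.
# mussh pushes the command over SSH, 20 hosts at a time; 'at' then
# fires the restart on every machine at 02:20 (UTC, as verified
# earlier), no matter how long the push itself takes.
mussh -H consul-nodes.txt -m 20 \
  -c "echo 'sudo systemctl restart consul.service' | at 02:20"
```

This is the key property of `at` here: the sequential push can take minutes, but the restart itself still happens everywhere in the same second.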
Then we waited. We monitored the Consul servers by running `watch -n 1 consul operator raft list-peers` on each of them in a separate terminal. We bit our nails. We watched the dashboards for signs of database connection errors from the frontend nodes. We all held our breath and watched the database for signs of distress. Six minutes is a long time to think: "It's 4am in Europe, so they won't notice" and "It's dinner time on the US west coast, maybe they won't notice". Trust me, six minutes is a _really_ long time: "Sorry APAC users for your day, which we are about to ruin by missing something".

We counted down the last few seconds and watched. In the first second, the Consul servers all shut down, severing the connections that were keeping everything working. All 255 of the clients restarted at the same time. In the next second, we watched the servers return `Unexpected response code: 500`, which in this case means "connection refused". The third second... still returning "panic now" – or maybe it was "connection refused"... In the fourth second, all nodes returned `no leader found`, which meant that connections were no longer being refused but the cluster was not healthy. The fifth second, no change. I'm thinking: just breathe, they were probably all discovering each other. In the sixth second, still no change: maybe they're electing a leader? Second seven was the appropriate time for worry and panic. Then the eighth second brought good news: `node 04 is the leader`. All other nodes were healthy and communicating properly. In the ninth second, we let out a collective (and globally distributed) exhale.

### A quick assessment

Now it was time to check what damage that painfully long eight seconds had done. We went through our checklist:

* The database was still processing requests, no change.

* The web and API nodes hadn't thrown any errors.
They must have restarted fast enough that the cached database addresses were still being used.

* The most important metric – the graph of 500 errors seen by customers – showed no change.

We expected to see a small spike in errors, or at least some identifiable change, but there was nothing but the noise floor. This was excellent news! 🎉

Then we checked whether the database was communicating with the Consul servers. It was not. Everyone quickly turned their attention to the backend database servers. If they had been running normally and the high-availability tool hadn't been paused, an unplanned failover would have been the minimum outage we could have hoped for; it's likely that they would have gotten into a very bad state. We started to troubleshoot why the database wasn't communicating with the Consul server, but about one minute into the change, the connection came up and everything synced. Apparently it just needed a little more time than the others. We verified everything, and when everyone was satisfied, we turned the high availability back on.

## Cleanup

Now that everything in the critical path was working as expected, we released the tension from our shoulders. We re-enabled Chef and merged the MR pinning the Chef recipes to the newer version, and the MR's CI job pushed the newer version to our Chef server. We picked a few low-impact servers, checked the `md5sum` of their Consul client config files, and manually kicked off Chef runs. After Chef finished, there was no change to the files, and the Chef client service was running normally again. We followed the same process on the Consul servers with the same result, and manually repeated it on the database servers, just for good measure.
Once those all looked good, we used `mussh` to kick off a Chef run on all of the servers, using the same technique we used to turn them off.

Now all that was left was to straighten everything out with `pgbouncer` and the database load balancer, and then we could fully relax. Looking at the health checks, we noticed that the two previously healthy nodes were no longer reporting as healthy. The health checks tell the load balancer which `pgbouncer` nodes hold a Consul lock, and therefore which nodes to send traffic to. A little digging showed that after retrying the connection to the Consul service a few times, the health checks had given up. This was not ideal, so we [opened an Infrastructure issue](https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7612) to fix it later and restarted the health checks manually. Everything showed normal, so we added the inactive node back to the load balancer. The inactive node's health check told the load balancer not to select it, and since the load balancer was no longer in failsafe mode (the other nodes' health checks were succeeding), it refrained from sending that node traffic.

## Conclusion

Simultaneously restarting all of the Consul components with the new configuration put everything back into its original state, other than the validation setting, which we set to false, and the TCP sessions, which we restarted. After this change, the Consul clients still use TLS encryption but ignore the fact that our cert has expired. This is still not an ideal state, but it gives us time to get to one in a thoughtful way rather than as a rushed workaround.

Every once in a while, we get into a situation that all of the fancy management tools just can't fix. There is no runbook for situations such as the one we encountered.
The question we were asked most frequently once people got up to speed was: "Isn't there some instructional walkthrough published somewhere for this type of thing?" For replacing a certificate from the same authority, yes, definitely. For replacing a certificate on machines that can have downtime, there are plenty. But for keeping traffic flowing when hundreds of nodes need to change a setting and reconnect within a few seconds of each other... that's just not something that comes up very often. Even if someone had written up the procedure, it wouldn't have worked in our environment, with all of the peripheral moving parts that required our attention.

In these types of situations there is no shortcut around thinking things through methodically. In this case, there were no tools or technologies that could solve the problem for us. Even in this new world of infrastructure as code, site reliability engineering, and cloud automation, there is still room for old-fashioned system administrator tricks. There is just no substitute for understanding how everything works.
We can try to abstract it away to make our day-to-day responsibilities easier, but when it comes down to it, there will always be times when the best tool for the job is a solid plan.

Cover image by [Thomas Jensen](https://unsplash.com/@thomasjsn?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com)
Center",{"href":354,"dataGaName":355,"dataGaLocation":42},"/security/","trust center",{"text":357,"config":358},"AI Transparency Center",{"href":359,"dataGaName":360,"dataGaLocation":42},"/ai-transparency-center/","ai transparency center",{"text":362,"config":363},"Newsletter",{"href":364,"dataGaName":365,"dataGaLocation":42},"/company/contact/","newsletter",{"text":367,"config":368},"Press",{"href":369,"dataGaName":370,"dataGaLocation":42},"/press/","press",{"text":372,"config":373,"lists":374},"Contact us",{"dataNavLevelOne":314},[375],{"items":376},[377,380,385],{"text":49,"config":378},{"href":51,"dataGaName":379,"dataGaLocation":42},"talk to sales",{"text":381,"config":382},"Get help",{"href":383,"dataGaName":384,"dataGaLocation":42},"/support/","get help",{"text":386,"config":387},"Customer portal",{"href":388,"dataGaName":389,"dataGaLocation":42},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":391,"login":392,"suggestions":399},"Close",{"text":393,"link":394},"To search repositories and projects, login to",{"text":395,"config":396},"gitlab.com",{"href":56,"dataGaName":397,"dataGaLocation":398},"search login","search",{"text":400,"default":401},"Suggestions",[402,404,408,410,414,418],{"text":71,"config":403},{"href":76,"dataGaName":71,"dataGaLocation":398},{"text":405,"config":406},"Code Suggestions (AI)",{"href":407,"dataGaName":405,"dataGaLocation":398},"/solutions/code-suggestions/",{"text":123,"config":409},{"href":125,"dataGaName":123,"dataGaLocation":398},{"text":411,"config":412},"GitLab on AWS",{"href":413,"dataGaName":411,"dataGaLocation":398},"/partners/technology-partners/aws/",{"text":415,"config":416},"GitLab on Google Cloud",{"href":417,"dataGaName":415,"dataGaLocation":398},"/partners/technology-partners/google-cloud-platform/",{"text":419,"config":420},"Why 
GitLab?",{"href":84,"dataGaName":419,"dataGaLocation":398},{"freeTrial":422,"mobileIcon":427,"desktopIcon":432,"secondaryButton":435},{"text":423,"config":424},"Start free trial",{"href":425,"dataGaName":47,"dataGaLocation":426},"https://gitlab.com/-/trials/new/","nav",{"altText":428,"config":429},"Gitlab Icon",{"src":430,"dataGaName":431,"dataGaLocation":426},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":428,"config":433},{"src":434,"dataGaName":431,"dataGaLocation":426},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":436,"config":437},"Get Started",{"href":438,"dataGaName":439,"dataGaLocation":426},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/compare/gitlab-vs-github/","get started",{"freeTrial":441,"mobileIcon":445,"desktopIcon":447},{"text":442,"config":443},"Learn more about GitLab Duo",{"href":76,"dataGaName":444,"dataGaLocation":426},"gitlab duo",{"altText":428,"config":446},{"src":430,"dataGaName":431,"dataGaLocation":426},{"altText":428,"config":448},{"src":434,"dataGaName":431,"dataGaLocation":426},{"freeTrial":450,"mobileIcon":455,"desktopIcon":457},{"text":451,"config":452},"Back to pricing",{"href":204,"dataGaName":453,"dataGaLocation":426,"icon":454},"back to pricing","GoBack",{"altText":428,"config":456},{"src":430,"dataGaName":431,"dataGaLocation":426},{"altText":428,"config":458},{"src":434,"dataGaName":431,"dataGaLocation":426},"content:shared:en-us:main-navigation.yml","Main Navigation","shared/en-us/main-navigation.yml","shared/en-us/main-navigation",{"_path":464,"_dir":36,"_draft":6,"_partial":6,"_locale":7,"title":465,"button":466,"image":471,"config":475,"_id":477,"_type":28,"_source":30,"_file":478,"_stem":479,"_extension":33},"/shared/en-us/banner","is now in public beta!",{"text":467,"config":468},"Try the 
Beta",{"href":469,"dataGaName":470,"dataGaLocation":42},"/gitlab-duo/agent-platform/","duo banner",{"altText":472,"config":473},"GitLab Duo Agent Platform",{"src":474},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1753720689/somrf9zaunk0xlt7ne4x.svg",{"layout":476},"release","content:shared:en-us:banner.yml","shared/en-us/banner.yml","shared/en-us/banner",{"_path":481,"_dir":36,"_draft":6,"_partial":6,"_locale":7,"data":482,"_id":686,"_type":28,"title":687,"_source":30,"_file":688,"_stem":689,"_extension":33},"/shared/en-us/main-footer",{"text":483,"source":484,"edit":490,"contribute":495,"config":500,"items":505,"minimal":678},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":485,"config":486},"View page source",{"href":487,"dataGaName":488,"dataGaLocation":489},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":491,"config":492},"Edit this page",{"href":493,"dataGaName":494,"dataGaLocation":489},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":496,"config":497},"Please contribute",{"href":498,"dataGaName":499,"dataGaLocation":489},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":501,"facebook":502,"youtube":503,"linkedin":504},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[506,529,585,614,648],{"title":60,"links":507,"subMenu":512},[508],{"text":509,"config":510},"DevSecOps platform",{"href":69,"dataGaName":511,"dataGaLocation":489},"devsecops platform",[513],{"title":202,"links":514},[515,519,524],{"text":516,"config":517},"View plans",{"href":204,"dataGaName":518,"dataGaLocation":489},"view plans",{"text":520,"config":521},"Why 
Premium?",{"href":522,"dataGaName":523,"dataGaLocation":489},"/pricing/premium/","why premium",{"text":525,"config":526},"Why Ultimate?",{"href":527,"dataGaName":528,"dataGaLocation":489},"/pricing/ultimate/","why ultimate",{"title":530,"links":531},"Solutions",[532,537,539,541,546,551,555,558,562,567,569,572,575,580],{"text":533,"config":534},"Digital transformation",{"href":535,"dataGaName":536,"dataGaLocation":489},"/topics/digital-transformation/","digital transformation",{"text":148,"config":538},{"href":150,"dataGaName":148,"dataGaLocation":489},{"text":137,"config":540},{"href":119,"dataGaName":120,"dataGaLocation":489},{"text":542,"config":543},"Agile development",{"href":544,"dataGaName":545,"dataGaLocation":489},"/solutions/agile-delivery/","agile delivery",{"text":547,"config":548},"Cloud transformation",{"href":549,"dataGaName":550,"dataGaLocation":489},"/topics/cloud-native/","cloud transformation",{"text":552,"config":553},"SCM",{"href":133,"dataGaName":554,"dataGaLocation":489},"source code management",{"text":123,"config":556},{"href":125,"dataGaName":557,"dataGaLocation":489},"continuous integration & delivery",{"text":559,"config":560},"Value stream management",{"href":177,"dataGaName":561,"dataGaLocation":489},"value stream management",{"text":563,"config":564},"GitOps",{"href":565,"dataGaName":566,"dataGaLocation":489},"/solutions/gitops/","gitops",{"text":187,"config":568},{"href":189,"dataGaName":190,"dataGaLocation":489},{"text":570,"config":571},"Small business",{"href":194,"dataGaName":195,"dataGaLocation":489},{"text":573,"config":574},"Public sector",{"href":199,"dataGaName":200,"dataGaLocation":489},{"text":576,"config":577},"Education",{"href":578,"dataGaName":579,"dataGaLocation":489},"/solutions/education/","education",{"text":581,"config":582},"Financial services",{"href":583,"dataGaName":584,"dataGaLocation":489},"/solutions/finance/","financial 
services",{"title":207,"links":586},[587,589,591,593,596,598,600,602,604,606,608,610,612],{"text":219,"config":588},{"href":221,"dataGaName":222,"dataGaLocation":489},{"text":224,"config":590},{"href":226,"dataGaName":227,"dataGaLocation":489},{"text":229,"config":592},{"href":231,"dataGaName":232,"dataGaLocation":489},{"text":234,"config":594},{"href":236,"dataGaName":595,"dataGaLocation":489},"docs",{"text":257,"config":597},{"href":259,"dataGaName":5,"dataGaLocation":489},{"text":252,"config":599},{"href":254,"dataGaName":255,"dataGaLocation":489},{"text":261,"config":601},{"href":263,"dataGaName":264,"dataGaLocation":489},{"text":274,"config":603},{"href":276,"dataGaName":277,"dataGaLocation":489},{"text":266,"config":605},{"href":268,"dataGaName":269,"dataGaLocation":489},{"text":279,"config":607},{"href":281,"dataGaName":282,"dataGaLocation":489},{"text":284,"config":609},{"href":286,"dataGaName":287,"dataGaLocation":489},{"text":289,"config":611},{"href":291,"dataGaName":292,"dataGaLocation":489},{"text":294,"config":613},{"href":296,"dataGaName":297,"dataGaLocation":489},{"title":312,"links":615},[616,618,620,622,624,626,628,632,637,639,641,643],{"text":319,"config":617},{"href":321,"dataGaName":314,"dataGaLocation":489},{"text":324,"config":619},{"href":326,"dataGaName":327,"dataGaLocation":489},{"text":332,"config":621},{"href":334,"dataGaName":335,"dataGaLocation":489},{"text":337,"config":623},{"href":339,"dataGaName":340,"dataGaLocation":489},{"text":342,"config":625},{"href":344,"dataGaName":345,"dataGaLocation":489},{"text":347,"config":627},{"href":349,"dataGaName":350,"dataGaLocation":489},{"text":629,"config":630},"Sustainability",{"href":631,"dataGaName":629,"dataGaLocation":489},"/sustainability/",{"text":633,"config":634},"Diversity, inclusion and belonging (DIB)",{"href":635,"dataGaName":636,"dataGaLocation":489},"/diversity-inclusion-belonging/","Diversity, inclusion and 
belonging",{"text":352,"config":638},{"href":354,"dataGaName":355,"dataGaLocation":489},{"text":362,"config":640},{"href":364,"dataGaName":365,"dataGaLocation":489},{"text":367,"config":642},{"href":369,"dataGaName":370,"dataGaLocation":489},{"text":644,"config":645},"Modern Slavery Transparency Statement",{"href":646,"dataGaName":647,"dataGaLocation":489},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"title":649,"links":650},"Contact Us",[651,654,656,658,663,668,673],{"text":652,"config":653},"Contact an expert",{"href":51,"dataGaName":52,"dataGaLocation":489},{"text":381,"config":655},{"href":383,"dataGaName":384,"dataGaLocation":489},{"text":386,"config":657},{"href":388,"dataGaName":389,"dataGaLocation":489},{"text":659,"config":660},"Status",{"href":661,"dataGaName":662,"dataGaLocation":489},"https://status.gitlab.com/","status",{"text":664,"config":665},"Terms of use",{"href":666,"dataGaName":667,"dataGaLocation":489},"/terms/","terms of use",{"text":669,"config":670},"Privacy statement",{"href":671,"dataGaName":672,"dataGaLocation":489},"/privacy/","privacy statement",{"text":674,"config":675},"Cookie preferences",{"dataGaName":676,"dataGaLocation":489,"id":677,"isOneTrustButton":105},"cookie preferences","ot-sdk-btn",{"items":679},[680,682,684],{"text":664,"config":681},{"href":666,"dataGaName":667,"dataGaLocation":489},{"text":669,"config":683},{"href":671,"dataGaName":672,"dataGaLocation":489},{"text":674,"config":685},{"dataGaName":676,"dataGaLocation":489,"id":677,"isOneTrustButton":105},"content:shared:en-us:main-footer.yml","Main 
Footer","shared/en-us/main-footer.yml","shared/en-us/main-footer",[691],{"_path":692,"_dir":693,"_draft":6,"_partial":6,"_locale":7,"content":694,"config":698,"_id":700,"_type":28,"title":18,"_source":30,"_file":701,"_stem":702,"_extension":33},"/en-us/blog/authors/devin-sylva","authors",{"name":18,"config":695},{"headshot":696,"ctfId":697},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749679087/Blog/Author%20Headshots/devin-headshot.jpg","devin",{"template":699},"BlogAuthor","content:en-us:blog:authors:devin-sylva.yml","en-us/blog/authors/devin-sylva.yml","en-us/blog/authors/devin-sylva",{"_path":704,"_dir":36,"_draft":6,"_partial":6,"_locale":7,"header":705,"eyebrow":706,"blurb":707,"button":708,"secondaryButton":712,"_id":714,"_type":28,"title":715,"_source":30,"_file":716,"_stem":717,"_extension":33},"/shared/en-us/next-steps","Start shipping better software faster","50%+ of the Fortune 100 trust GitLab","See what your team can do with the intelligent\n\n\nDevSecOps platform.\n",{"text":44,"config":709},{"href":710,"dataGaName":47,"dataGaLocation":711},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":49,"config":713},{"href":51,"dataGaName":52,"dataGaLocation":711},"content:shared:en-us:next-steps.yml","Next Steps","shared/en-us/next-steps.yml","shared/en-us/next-steps",1758326272945]