Categories
Mobile Syrup

Oxio warns of internet slowdowns for Ontario customers due to July 8 Rogers outage

Independent internet service provider (ISP) Oxio reached out to its Ontario customers via email Saturday to warn of potential slowdowns during peak hours (between 8pm and 10pm). It also asked customers to avoid running speed tests, since they “clog up the lines” and cause more slowdowns.

You can read a longer explanation below if you’re interested in the details of the internet in Canada, but the short version is this: Oxio needs more capacity from Rogers to serve its growing customer base. It requested that increase but didn’t get it because of the July 8th outage, and is now stuck waiting for Rogers to resume making network changes before the increase can go through.

Oxio emailed customers to explain what was going on because it “promised to be up front” about everything. The main takeaways from the email are that Oxio is working on the problem with Rogers and with the Competitive Network Operators of Canada (CNOC), since other ISPs might be affected, and that Ontario customers (disclosure: I am one) might experience slowdowns in the meantime.

Rogers halted network changes, pushing back Oxio’s capacity increase

A segment of the Oxio email explaining what’s going on with Rogers.

Oxio says that its growing customer base requires an increase in capacity in Ontario, and since Oxio runs on Rogers in the province, it needs to purchase capacity from Rogers. However, issues related to the July 8th Rogers outage — called ‘Red Friday’ by some — resulted in Oxio not getting the capacity increase that it needed.

In the email, Oxio explained that it submitted a request to Rogers to increase capacity on June 22nd, and the change was supposed to go into effect on July 7th. Oxio said Rogers didn’t increase capacity on schedule (apparently, “this is pretty normal” with Rogers). This time, however, Rogers implemented a “company-wide change embargo” after Red Friday, something Oxio says didn’t happen with previous capacity increases.

Oxio says an embargo like this is also pretty normal after an outage, since network changes are responsible for most problems. Rogers has already detailed how its maintenance update caused a cascading problem through its core network that ultimately took out wireline, wireless, and several other services nationwide.

The embargo was set to end on July 18th. However, Rogers extended it twice, leaving Oxio with no scheduled date for the capacity increase:

“Since then, Rogers has extended their change embargo twice. The first time until July 25, 2022 and, recently, again for an indeterminate period, which means there’s no scheduled date to complete our request for additional capacity.”

Oxio says it’s not “too worried” about the embargo since it hasn’t hit maximum capacity yet. However, the company says its “rapid growth, means [it is] quickly running out of bandwidth,” which could lead to slowdowns at peak times.

Thankfully, it’s not all bad news. Oxio also told customers it hopes “to have all of this sorted before you notice anything.” The company says it’s talking with “the right people at Rogers” and has reached out to the CNOC because it likely isn’t the only independent ISP affected by the Rogers embargo.

Ultimately, if you’re with Oxio or another ISP that runs on Rogers’ network, you may want to keep an eye out for potential slowdowns and avoid doing speed tests until this all gets sorted out.


Rogers responds to CRTC questions over outage, will split network

Rogers’ response to questions from the Canadian Radio-television and Telecommunications Commission (CRTC) about the July 8th outage — or ‘Red Friday,’ as Vass Bednar has taken to calling it — arrived late on July 22nd in a document filed to the CRTC’s website.

The lengthy, partially-redacted document (which downloads a .docx file) includes responses to various CRTC questions, with explanations about what happened, what Rogers will do to keep it from happening again, who was affected, and more. Rogers opens the document with a note that it will be “as transparent as possible” when answering the CRTC’s questions but also asked the CRTC to treat certain information in the document as confidential to protect the company’s customers, network, and vendors.

Frustratingly, Rogers redacted many details of its plans to prevent future outages.

Still, some of the broader goals remain available to the public. Rogers confirmed in the document that it plans to “increase resiliency in our networks and systems which will include fully segregating our wireless and wireline core networks,” as was previously reported by MobileSyrup.

Details on what caused the outage

Rogers also shared more details about the cause of the outage. Previously, the company had said a maintenance update caused routers in its network to malfunction.

In the Friday disclosures, Rogers detailed that the update was the sixth in a seven-phase process that started on February 8th. The previous five phases “proceeded without incident.” The sixth phase began at 2:27am on July 8th (the company notes it usually performs upgrades when traffic is low). A coding error in the update triggered the issue at 4:43am, and the problem cascaded through Rogers’ core network “very quickly.”

That coding error deleted a “routing filter” in Rogers’ distribution routers, which allowed all possible routes to the internet to flow through the routers. Rogers explains that this caused the routers to propagate “abnormally high volumes of routes throughout the core network,” leading certain network equipment to exceed capacity and fail.
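For readers curious why deleting a single filter could take down core equipment, the Python sketch below models the mechanism in a highly simplified way. It is not Rogers’ actual configuration: the router names (core-vendor-a, core-vendor-b), their route-table limits, the allow-list policy and the 65,537-route stand-in for the internet routing table are all hypothetical. With the filter in place, the distribution router passes only a default route; with the filter deleted, every route flows through and the router with the smaller table overflows.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Network

# Hypothetical stand-in for a full internet routing table: a default route
# plus 65,536 /24 prefixes carved out of 10.0.0.0/8.
DEFAULT_ROUTE = IPv4Network("0.0.0.0/0")
FULL_TABLE = [DEFAULT_ROUTE] + list(IPv4Network("10.0.0.0/8").subnets(new_prefix=24))

# Hypothetical filter policy: the distribution router only passes the default route.
ALLOWED = {DEFAULT_ROUTE}


@dataclass
class CoreRouter:
    name: str
    max_routes: int  # hypothetical route-table capacity for this piece of equipment
    routes: set = field(default_factory=set)
    failed: bool = False

    def receive(self, prefixes):
        """Install advertised prefixes; mark the router failed if its table overflows."""
        for prefix in prefixes:
            self.routes.add(prefix)
            if len(self.routes) > self.max_routes:
                self.failed = True
                return


def route_filter(prefixes, allowed):
    """Permit only prefixes that appear on the allow list."""
    return [p for p in prefixes if p in allowed]


def simulate(filter_enabled: bool) -> list[str]:
    """Advertise routes into two core routers and return the names of any that fail."""
    core = [
        CoreRouter("core-vendor-a", max_routes=100_000),
        CoreRouter("core-vendor-b", max_routes=20_000),
    ]
    advertised = route_filter(FULL_TABLE, ALLOWED) if filter_enabled else FULL_TABLE
    for router in core:
        router.receive(advertised)
    return [r.name for r in core if r.failed]


print(simulate(filter_enabled=True))   # filter in place: []
print(simulate(filter_enabled=False))  # filter deleted: ['core-vendor-b']
```

The two routers with different limits loosely echo Rogers’ point that its core mixes equipment from several vendors: in this toy model, the same flood of routes overwhelms one box while the other copes.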

Rogers goes on to describe that, like “many large Telecommunications Services Providers” (TSPs), it uses a “common core” network that carries wireless, wireline and other traffic. The company explains that its core consists of equipment from various vendors, that different equipment can have different designs and routing management protocols, and that these differences are “at the heart of the outage.”

Rogers notes that the outage impacted employees, preventing them from connecting to the company’s IT and network infrastructure. While some Rogers employees were able to communicate with each other using Bell or Telus SIM cards they received as part of a 2015 emergency contingency plan established between the carriers, staff still had to travel to centralized locations to access the network and begin sorting out what went wrong and how to fix it. This contributed to delays in restoring service.

Again, much of this mirrors previous MobileSyrup reporting about what caused the outage, although there are some new details. Most notably, previous external analysis of the outage indicated that the issues stemmed from gateway routers, whereas Rogers says the outage started with distribution routers.

Rogers says it couldn’t transfer customers to competitors’ networks

As the Globe and Mail highlights in its report, Rogers revealed in the disclosure that it couldn’t transfer customers to competitors’ networks during the outage.

Bell and Telus offered Rogers assistance, but the company determined it couldn’t transfer customers to the other networks since some aspects of Rogers’ network — such as the centralized user database — weren’t accessible due to the outage. Moreover, Rogers says that competitors’ networks wouldn’t “have been able to handle the extra and sudden volume of wireless users (over 10.2M) and the related voice/data traffic surge.”

This is particularly interesting in light of the government’s response. Following Red Friday, Industry Minister François-Philippe Champagne directed Canadian telecom companies to develop a mutual assistance agreement to help each other during outages. Given that Rogers couldn’t transfer customers to other networks, and its claim that those networks couldn’t have handled the surge in traffic, it remains unclear how telecoms could implement a mutual assistance structure without significant changes to each company’s network. Moreover, if Bell and Telus also use common core networks, as Rogers implies, then their networks are potentially vulnerable to the same kind of failure.

Still, Rogers said it will explore various mutual assistance options with other companies before delivering a formalized agreement to the minister in September.

Changes to the update review process and communication

Rogers also noted in the disclosure that it went through a “comprehensive planning process including scoping, budget approval, project approval, kickoff, design document, method of procedure, risk assessment, and testing, finally culminating in the engineering and implementation phases” for the update.

The company stressed that it makes updates to the core network “very carefully.”

However, Rogers said it would review the process it uses to plan and implement updates to the network. The company also detailed plans to improve communication between its teams and the public when it comes to outages.

Changes include giving communication teams backup devices on alternate networks to use if Rogers’ network fails, updating policies and procedures for sharing updates in the event of a “network blackout,” increasing the frequency of updates, providing information across all channels about impacts to critical services like 9-1-1, and ensuring all statements posted to social media include the use of alt text.

That last one is particularly interesting given Rogers and its flanker brands posted updates to Twitter using pictures, but people with visual impairments may not be able to read the text in a picture. Alt text provides descriptions of image appearance and function, which can be picked up by technology like screen readers to help people with visual impairments understand images.

What’s next

A House of Commons committee on industry and technology plans to study the outage and will have a hearing Monday. The Globe and Mail notes that Rogers replaced its chief technology officer (CTO), Jorge Fernandes, just days before this hearing.

Telecom veteran Ron McKenzie replaced Fernandes — as MobileSyrup reported, the change is unlikely to disrupt Rogers’ plans to address the outage by separating wireless and wireline traffic.

The Globe expects the committee, which includes members from all four major federal parties, to question Rogers executives about the outage and the five-day credit delivered to customers to compensate them for it. Critics previously questioned whether the credit was enough, given that the damage went far beyond customers simply losing service for several days. Moreover, a Quebec resident has filed a class-action lawsuit against the company seeking $400 for each customer affected by the outage.

Those interested in diving into the details shared by Rogers can read the disclosure in full here (note the link will download a .docx file).

Source: CRTC (.docx file) Via: The Globe and Mail