By on October 17, 2019

As the most-successful manufacturer of electric vehicles, Tesla is often at the forefront of new challenges relating to advanced automotive technologies. While the brunt of this has revolved around its software, mainly Autopilot, it’s also going to be among the first automakers to confront widespread battery recycling ⁠— something it’s already planning for at its Nevada Gigafactory.

Nothing lasts forever and, like every internal combustion vehicle, EVs have parts that go bad. Over the last six months, there’s been a growing number of reports of customers claiming their Teslas are bricking out like old phones. Displays are going dark, accessories are… inaccessible, and charging is often not an option. The culprit appears to be the embedded MultiMediaCard (eMMC) chip on MCUv1 units, which logs data using flash memory.

Apparently, Tesla is overworking these systems (at least on some models) to a point where they can’t take it anymore. It’s basically the same thing that would happen if you filled and wiped a USB drive hundreds of times every day. One morning you’d plug it in and find that it’s no longer functional due to being burnt out from overuse.

Coverage of the issue started to really pick up after Jason Hughes, an independent repair professional from North Carolina’s 057 Technology, managed to get Elon Musk to respond to the issue. On October 9th, he tweeted out that he had encountered dozens of “MCUv1 units for customers suffering from eMMC flash failure,” suggesting that Tesla’s engineering team do something about it.

Musk responded that the problem should be “much better at this point.”

Hughes responded that he had seen models suffering from the problem from a version that was just a couple of weeks old, adding that he would keep an eye on newer versions.

Complaints of the flash-memory failure go back much further, however, and were gaining serious steam online as early as 2018. In May, the YouTube channel Rich Rebuilds (which is pretty good if you want an unbiased look into EVs) discussed the matter and opened up an MCUv1 to get a gander at the hardware with Phil Sadow.

“[Tesla] creates so many logs in the car, they write to [the eMMC] so fast that it basically burns them out. They have a finite amount of writes; they can only do so many writes. The amount of logging they’re doing is excessive,” Sadow explained.

InsideEVs, which has one of the most comprehensive explanations of the problem available, spoke with several of the repair professionals expressing concerns to get their opinion.

“The main issue is that this excessive log file writing causes eMMC flash wear,” explained Hughes. “Flash memory is generally only rated for some tens of thousands of write cycles. What happens is that the flash memory starts to fail when writings can no longer be completed. When one block fails, parts of the firmware may also become unreadable, leading to poor operation or failure of the MCU completely.”
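
To get a feel for how a cycle rating like that translates into years of service, here is a rough back-of-envelope sketch in Python. Every number in it is an illustrative assumption, not a measured Tesla figure: the 512 MB pool of flash that actually absorbs the log writes, the 10,000-cycle rating, the 300 MB of daily logging, and the write-amplification factor of 10 (extra physical writes caused by many small log appends) are placeholders chosen only to show the arithmetic.

```python
def estimated_lifetime_days(wear_pool_mb, rated_cycles, daily_writes_mb, write_amplification):
    """Days until the wear pool has used up all of its rated erase cycles.

    All inputs are hypothetical; real controllers and workloads vary widely.
    """
    # Total data the pool can absorb before every block hits its rating.
    total_write_budget_mb = wear_pool_mb * rated_cycles
    # Physical writes per day are larger than logical writes by the amplification factor.
    physical_writes_per_day_mb = daily_writes_mb * write_amplification
    return total_write_budget_mb / physical_writes_per_day_mb


# Assumed values, for illustration only.
days = estimated_lifetime_days(
    wear_pool_mb=512,
    rated_cycles=10_000,
    daily_writes_mb=300,
    write_amplification=10,
)
print(f"Estimated lifetime: {days / 365:.1f} years")
```

The point of the sketch is the sensitivity: halving the daily log volume doubles the estimate, and so does doubling the amount of flash available to spread the writes across.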

Tesla Forums covered the issue back in April, with a poster noting the MCU repeatedly rebooted after 39,000 miles on their 2016 Model S. Repeated repair attempts failed to solve the issue, and they were understandably upset that they had to pay half the cost out of pocket ⁠— despite the vehicle still being under warranty. Tesla Motors Club has also covered the matter at length, with customers giving the play-by-play of their flash memory headaches.

Most owners seem to be encountering the problem very near the warranty cutoff. Those whose units fail before the cutoff should, technically, have their MCU replacement covered. If not, repairs can cost anywhere between $400 and $3,800, depending on where you’re having the work done (and how they’re doing it).

Some independent shops have admitted that, while they’ve managed to find ways to replace the failed chip, reprogramming the system to not bombard itself with ever-cycling data has been harder to manage. Most units are still subject to overloading over time but some feel confident they’ve managed to sort that out in part by installing more robust memory chips or by simply sending some of those unnecessary ones and zeros to the RAM.
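
The "send it to RAM" idea can be sketched in a few lines of Python. This is not how any shop or Tesla actually implements it; the class name `BufferedLogWriter`, the threshold value, and the in-memory `io.StringIO` standing in for a file on flash are all assumptions made for illustration. The sketch only shows the principle: collect log lines in memory and hand them to storage in a few large batches rather than one small write per line.

```python
import io


class BufferedLogWriter:
    """Hypothetical helper: holds log lines in RAM and writes them out in batches."""

    def __init__(self, sink, flush_threshold_bytes=64 * 1024):
        self.sink = sink                          # anything with a write() method
        self.flush_threshold_bytes = flush_threshold_bytes
        self._pending = []                        # lines waiting in RAM
        self._pending_bytes = 0
        self.flush_count = 0                      # number of writes sent to storage

    def log(self, line):
        data = line + "\n"
        self._pending.append(data)
        self._pending_bytes += len(data.encode("utf-8"))
        # Only touch storage once enough data has accumulated.
        if self._pending_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        self.sink.write("".join(self._pending))   # one large write
        self._pending.clear()
        self._pending_bytes = 0
        self.flush_count += 1


sink = io.StringIO()  # stand-in for a log file on flash
writer = BufferedLogWriter(sink, flush_threshold_bytes=1024)
for i in range(1000):
    writer.log(f"event {i}")
writer.flush()  # write out whatever is still in RAM
print(writer.flush_count, "storage writes for 1000 log lines")
```

The trade-off is that anything still sitting in RAM is lost if power disappears before the next flush, which is why this suits diagnostic chatter better than data that must survive a crash.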

“Tesla needs to just disable syslog on all vehicles unless specifically required on a development car or to diagnose an infotainment issue on a specific car. There are absolutely zero reasons to log hundreds of MB per day to a small built-in flash chip,” said Hughes. “Tesla has known about this issue for years now and has done nothing to mitigate it. I’ve personally reported it on multiple occasions, and I know others have as well. I’ve noted this to Tesla on several occasions, starting in late 2015, and several times since.”

While everyone is largely concerned with MCUv1 units, which are logging more data than ever before, MCUv2 vehicles technically suffer from the issue as well. But their larger chip size is likely to buy the owner significantly more time, which is about all anyone outside the manufacturer can do anyway. Tesla will be the one that has to address the high amount of logging that’s ultimately creating this problem, especially for vehicles still under warranty.

“The cynic in me looks at this as a planned obsolescence type of thing,” said Hughes. “However, the reality is probably a lot more benign: laziness.”

[Image: JL IMAGES/Shutterstock]

57 Comments on “Tesla Troubles: Models Bricking Over Flash Memory Problem...”


  • avatar
    ToddAtlasF1

    When a Tesla bricks, an angel gets his wings!

  • avatar
    SCE to AUX

    It’s safe to assume that other long-term EV mfrs don’t have this problem – namely Nissan, Chevy, Ford, and BMW – since we haven’t heard of such issues there.

    I agree that Tesla is being lazy on this. This design seems to reflect a developer’s mindset, in that they’d like to access abundant log files ‘just in case’.

    Ongoing Tesla drama is one reason I didn’t buy their product.

    • 0 avatar
      black_ice_so_nice

      I’m skeptical it’s just a high log level because that’s a pretty easy thing to mitigate post-release. Just turn the log level down, or if you’re desperate, pipe it all to /dev/null, axe logs completely, and at least it won’t brick more hardware while you fix it. Devs usually catch excess logging early because it causes a noticeable performance hit too. Musk’s response that it’s ‘better now’ implies it’s something they can’t just patch out, which is odd.

      • 0 avatar
        Vulpine

        My guess is that one of those OTA updates carried a bug that grossly increased the logging rate or maybe leaks too much data into the card. If the reported analyst is correct, then the simplest fix is to replace the card with a higher-capacity version so that it doesn’t have to read/write as often before wiping.

        Either way, the fix should ultimately be pretty simple and maybe even something Tesla’s Road Rangers (or whatever they call them) can handle in a few minutes. If we assume the cards, as suggested, are dying, then a replacement with a higher-capacity card should both bring the cars back on line AND delay the next round of failures. On the other hand, if it is a software glitch then the software team should be able to find it and clear it, reducing the number of write cycles while larger capacity can reduce the frequency of write cycles.

        I still, however, find it strange that if this is supposedly a four-year type of issue, it’s not showing up until now when Tesla has had cars on the road for over 8 years.

    • 0 avatar
      vvk

      > Ongoing Tesla drama

      You realize this is a complete and utter fabrication? There is no drama. The cars are amazing, the service is outstanding, there is literally zero drama associated with ownership.

      When the title of this article says “bricking,” the implication is that the car is dead. But when the eMMC fails, the car is still driveable and can be driven anywhere for any period of time. Yes, it is a serious issue that could cause an expensive repair but it is FAR less serious than many other issues that EVs or other conventional vehicles could face.

      • 0 avatar
        Fliggin_De_Fluge

        Are you a paid shill? Because you sound like a paid shill.

      • 0 avatar
        White Shadow

        A Tesla isn’t driving anywhere without the ability to charge.

      • 0 avatar
        baggins

        There is quite a bit of evidence that the cars are quite nice to drive and even “amazing”

        However, even fanboy forums contain lots of evidence that Tesla is struggling to efficiently scale its support and service operations to the volumes they are selling now. Wait times for repairs can be long, and defects on delivery can be challenging to get fixed quickly.

        If you deny these issues, then either you are ignorant/deluded or a shill.

        • 0 avatar
          vvk

          > If you deny these issues, then either you are ignorant/deluded or a shill.

          I am just a satisfied owner who does not live in California, Toronto, Vancouver or New York. My local service center is excellent.

      • 0 avatar
        Flipper35

        If you are unable to charge or use the accessories and touch screen, then yes it is bricked.

        • 0 avatar
          vvk

          > If you are unable to charge or use the accessories and touch screen, then yes it is bricked.

          No, the car is still driveable. If this happens while crossing the railroad tracks, you are still able to drive the car to safety. Also, the bit about not being able to charge is not true in vast majority of cases. The biggest safety issue is that you are not able to turn on the AC and change distribution of air to the windshield.

          • 0 avatar
            JimZ

            “The biggest safety issue is that you are not able to turn on the AC and change distribution of air to the windshield.”

            that can and should be grounds for a recall.

          • 0 avatar
            vvk

            > that can and should be grounds for a recall.

            Completely agree. Hopefully, it will happen at some point.

  • avatar
    jalop1991

    “Some independent shops have admitted that, while they’ve managed to find ways to replace the failed chip, reprogramming the system to not bombard itself with ever-cycling data has been harder to manage. Most units are still subject to overloading over time but some feel confident they’ve managed to sort that out in part by installing more robust memory chips or by simply sending some of those unnecessary ones and zeros to the RAM.”

    You mean independent shops can get around Musk’s locks and reprogram the car??

    I’m waiting for a John Deere-like announcement from Tesla, declaring that to be illegal…

  • avatar
    sirwired

    Somebody over at Jalopnik also figured out that they set up the system in a way that did not allow the use of the load-leveling features that would otherwise help this issue.

  • avatar
    1337cr3w

    Should’ve bought a camry

  • avatar
    Vulpine

    Why haven’t we heard about this “issue” earlier? They’re talking about a 4-year limit yet there are 8- and even 10-year-old Teslas on the road which have been running trouble-free. Even Tesloop would have reported an issue like this before, would it not? Are we REALLY sure this is the problem or is it a symptom of a completely different issue?

    • 0 avatar
      JimZ

      the oldest Model S cars on the road are a little over 7 years old. The roadster likely didn’t have anything like the Model S electronics architecture. I’d wager how long it takes to kill the flash chip(s) depends on how much driving the owner has done and what features the car has which would influence the amount of data logging.

  • avatar
    Brett Woods

    This video is how I learned about the problem of too much data writing to wrong kind of media hardware in old Teslas. Not sure if Tesla have addressed this yet.

    https://www.youtube.com/watch?v=o-7b1waoj9Q

  • avatar
    Inside Looking Out

    I developed NAND flash memory controllers for embedded applications including eMMC long time ago. NAND flash memory has higher cell density than NOR flash but does not natively support random memory access. You can imitate it but to do that have to read page (same as sector in hard drive), modify one byte in page image in RAM and then save it on new erased page and mark previous page for erase. You cannot erase single page – you have to erase the block (e.g. 128 pages) called Erase Block. In MLC erase block lasts about 100,000 erase cycles. After that it has to be replaced with already erased block from reserve ECBs. After you run out of reserve ECBs you need replace NAND chip with new fresh one. Anyway, any write operation involves read page, modify, write to new page.

    So as you see NAND flash is better suited for mostly reading, though read op disturbs MLC cells also but not that bad. Constantly writing to NAND flash guarantees that it will run out its resource pretty quickly. I hope Tesla does not outsource engineering to India or other developing country.
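
The read-modify-write cycle described above can be illustrated with a toy model in Python. It is a deliberately simplified sketch, not a real flash translation layer: the `EraseBlock` class and `update_one_byte` function are hypothetical names, the 128-page block and 2,048-byte page are assumed sizes, and the model tracks a single logical page so it can skip the step where a real controller copies other live pages elsewhere before erasing.

```python
class EraseBlock:
    """Toy model of one NAND erase block: a page can be programmed once per erase."""

    def __init__(self, pages_per_block=128, page_size=2048):
        self.pages_per_block = pages_per_block
        self.page_size = page_size
        self.pages = [None] * pages_per_block   # None means erased and writable
        self.next_free = 0
        self.erase_count = 0

    def program(self, data):
        """Write one page into the next erased slot and return its index."""
        if self.next_free >= self.pages_per_block:
            raise RuntimeError("block full: erase required before writing")
        self.pages[self.next_free] = data
        self.next_free += 1
        return self.next_free - 1

    def erase(self):
        """Erase every page at once; this is the operation that wears the block."""
        self.pages = [None] * self.pages_per_block
        self.next_free = 0
        self.erase_count += 1


def update_one_byte(block, page_index, offset, value):
    """Read the page into RAM, change one byte, program the result to a fresh page."""
    image = bytearray(block.pages[page_index])   # read page into RAM
    image[offset] = value                        # modify a single byte
    if block.next_free >= block.pages_per_block:
        block.erase()                            # no erased page left: erase the whole block
    return block.program(bytes(image))           # write the full page to a new slot


block = EraseBlock()
current = block.program(bytes(block.page_size))  # initial page of zeros
for i in range(10_000):
    current = update_one_byte(block, current, 0, i % 256)
print("erase cycles used:", block.erase_count)
```

Even in this best case, changing 10,000 individual bytes forces roughly 10,000 full-page writes and dozens of block erases, which is the amplification that makes small, frequent log writes so costly.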

    • 0 avatar
      Daniel J

      I’ve done similar work. I’ve written device drivers for both Nor and Nand flash memory chips. I’ve also written drivers for nvSRAM. While much more expensive and much smaller, we used a 32KB nvSRAM for event logging. While nvSRAM has similar write cycles (1M), the store only occurred during a power down event.

      One has to wonder how big these logs are where something like a 32K nvSRAM is too small to use.

      • 0 avatar
        karonetwentyc

        Speaking from an IT management perspective: I agree with everything both you and Inside Looking Out have said. My hardware development (embedded systems) days are well past, but there is one other thing to add to the mix:

        From my reading of the issue both here and elsewhere, it sounds as though mission-critical software was also installed on the failing flash devices that were being hit with way too many writes for comfort. Who thought that this was a good idea?

        Systems engineering 101: don’t use your system disk for data storage.

        If a disk is going to fail (be it mechanical or flash), it’s most likely to be the one that’s doing the greatest number of read/write cycles. While I am in total agreement that turning down syslog verbosity, wear levelling, different flash media compositions, etc. can all help with the problem, it’s still not clear as to why this basic dictum wasn’t followed in the first place.

        Sure, in that scenario, the flash media receiving the massive number of writes is still likely to fail faster, but at least when it does, the system (aka ‘car’) is still bootable (aka ‘driveable’) because the actual software necessary for actual things to actually work hasn’t sunk along with the logs and left you with a very shiny brick in your driveway.

        I’m sure that Tesla saved $0.79 per vehicle by not separating duties out properly, but it still boggles my mind that this (along with the overly-chatty logs) wasn’t caught before the devices went to production.

      • 0 avatar

        Logs can be pretty big esp if you hire substandard programmers. I saw cases when logging crashed the whole system by overwhelming kernel-user mode pipe by excessive logging and using the same pipe to send critical commands and data. I had to redesign the whole thing. But that was a leading company in field and SW was very poorly written. By Indian engineers.

        Regarding system SW, it is usually stored in EPROM or ROM and is unpacked and loaded into code segment of system RAM during boot up. It should not be in RW flash memory where you store system settings and etc. So even if flash is dead system SW will still work with default settings and has to get through learning cycle again.

  • avatar
    schmitt trigger

    What Inside Looking Out described is a well known caveat for nand flash memory.
    Every engineer worth their salt knows this fact and takes the necessary measures to avoid writing to the memory too often.

    There are many programming tricks, one of them is to write data to battery backed RAM, which should be trivial on an electric vehicle, and only write to EEPROM the whole block once a day or so.

    • 0 avatar
      JimZ

      The operation that “wears out” flash cells is erasing them. it’s academic, to be sure, since before you can over-write a cell you have to erase it first. But the other gotcha with flash is that you can only erase blocks; not individual cells. so you’re generally always erasing more cells than you actually need to.

      • 0 avatar

        BTW flash controller is usually developed by specialized companies who know what they are doing – I worked in one of them. I suspect the problem with Tesla is not controller but the architecture and implementation, in other words how eMMC was used and how logging mechanism was implemented and why no mission critical measures were taken. It should not brick no matter what. Maybe they saved time on design review or QA process.

  • avatar
    MBella

    All I can say is the local Tesla service center uses an old Sears parking lot for their overflow. It’s filled with customer cars with plates, and there is a large layer of dust on many of them.

  • avatar
    Daniel J

    Flash memory (Nor or Nand) has a limited number of write cycles. This is the problem when people who aren’t in computer engineering or hardware engineering design such systems. From what I’ve read in other news outlets, they even set the system up where standard wear leveling practices were bypassed for some reason. For logging purposes, nvSRAM probably would have been a better choice even though chips are smaller and more expensive.

    I just can’t imagine any engineer designing this system in such a way the car is bricked once the logs couldn’t be written to.

    • 0 avatar
      28-Cars-Later

      Oh I can, it’s akin to those who set up verbose logging but do not configure the log files to roll over after a certain threshold. What happens is as the log files grow over time the whole system slows down as it has to constantly open and close an ever growing file on every logging operation.

      • 0 avatar
        Cactuar

        I’m a software developer. The size of the log directory for the project I’m working on at the moment is 5.73 GB. And I don’t even use the highest verbose level.

        • 0 avatar
          28-Cars-Later

          What is performance like? My last project the WPF client with WCF services thing sped up dramatically when I deleted the 5.7GB log file on the web server and configured Nlog to roll into five 10 MB files in the future.

    • 0 avatar

      “For logging purposes, nvSRAM probably would have been a better choice”

      It does not provide enough memory. Log can be pretty big and you may need to store other stuff too. And it is too expensive too. SRAM is used in high speed applications, like e.g. SONET framers/routers etc.

  • avatar
    28-Cars-Later

    I find it curious the Model 3 really is only experiencing 25-30% depreciation in MY18, yet issues like this crop up.

  • avatar
    FreedMike

    OK, I’m not a computer engineer, and I don’t understand most of this, but apparently there are folks here who do, so…

    How do they fix this?

    • 0 avatar
      28-Cars-Later

      Ok so in programming you have different logging levels depending on what you are capturing in compiled code. Typically it’s something like ERROR, WARN, INFO, and DEBUG with debug being the most verbose (it will capture things like variable values in methods at time of execution etc).

      Typically a production site will run with ERROR or WARN level logging because the more verbose INFO or DEBUG affects site performance, and what you would do in the event of a prod issue is change the configuration so you can see what happens up until the point of failure. If performance were not as much of a concern (i.e. the server is overbuilt) and I flat out didn’t trust the users/the code I might enable the DEBUG logging with a rolling file appender on production (usually did 5 or 10 10 MB log files). The benefit of this is you have a look back of X time of everything the system did, so when an error occurs you have that window of activity to examine leading up to it. When you’re talking text files on a RAID array of spinning disks, this isn’t much of an issue. What’s happening to the Tesla is it is using flash memory for its storage, and flash memory has a finite number of reads and writes. So say it’s 100 MB for S&G, so Tesla’s logs are constantly refreshing every say 3 days because 100 MB is 3 days worth of logs for lookback. The flash memory can only accept say 100,000 reads and writes before it craps out. 3 days worth of reads/writes is 1,000 cycles. Do the math.
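
For readers who have not seen this in practice, here is a small sketch of a level threshold combined with a rolling log file, using Python's standard `logging` module. It only illustrates the general technique; the logger name `demo`, the temporary directory, and the messages are made up, and nothing here reflects how Tesla's software is actually configured.

```python
import logging
import logging.handlers
import os
import tempfile

# Write into a throwaway directory so the example has no side effects elsewhere.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

# Roll over at 10 MB and keep 4 old files: at most five 10 MB files on disk.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=10 * 1024 * 1024, backupCount=4
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)   # production-style threshold: drop INFO and DEBUG
logger.addHandler(handler)
logger.propagate = False           # keep output out of the root logger

logger.debug("variable x=42")      # discarded, never reaches storage
logger.info("request handled")     # discarded
logger.warning("disk nearly full") # written
logger.error("write failed")       # written
handler.flush()

with open(log_path) as f:
    written = f.read().splitlines()
print(len(written), "of 4 messages reached the log file")
```

Raising the threshold cuts the volume of writes, and the rollover caps how much space the logs can occupy. Neither changes the fact that every retained line still has to be written to the storage device.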

      • 0 avatar
        FreedMike

        Thanks.

        So is the flash memory unit tasked with retaining data permanently? Can you just swap it out without affecting anything?

        • 0 avatar
          Flipper35

          The way flash is built, as you swap ones and zeros it causes wear, just like using your brakes. Eventually the parts wear out and just like your brakes, heavy use will cause it to wear faster.

          Evidently, the units also have the code that actually runs the vehicle as well. Like the boot drive on your computer. Instead of a separate data drive they dump it all on the same unit.

          For $10 per car they could have had a place to drop the data flash storage unit in where it would be easily replaceable and not affect the “boot” drive.

        • 0 avatar
          28-Cars-Later

          I’m not sure if the intention is to keep the data; I would imagine it’s a limited look back as I described for troubleshooting purposes. However, it is possible they are somehow uploading the data back to Tesla on an interval, which, the more I think about it, is probably the ulterior if not real motive behind this.

          The article does not explicitly explain if the vehicles are still drive-able after the flash memory fails, IMO if they were designed correctly they should be but then people wouldn’t be complaining because it could just be an idiot light they ignore. The logging level should be user configurable but as I suspect Tesla is harvesting data so no dice. Therefore the system needs to be able to function without it or a longer lasting log storage system needs to be implemented such as a spinning disc (but this too will fail over time).

          • 0 avatar
            FreedMike

            So…not as easy as just swapping the thing out. Unfortunate.

            It’ll be interesting to see where this goes.

          • 0 avatar
            28-Cars-Later

            It sounds like that is the “fix” but evidently it is not easy to do and the replacement drive will also fail after a certain interval. Really what it comes down to is a poor design, should your logger fail it should not subsequently cause the product to fail. This may have been intentional but likely it was simply the result of let’s put everything on one flash device – or create a central point of failure.

          • 0 avatar
            vvk

            > The article does not explicitly explain if the vehicles are still drive-able after the flash
            > memory fails, IMO if they were designed correctly they should be

            The vehicles are still driveable. In vast majority of cases they can be charged, too. This is not a “bricking” issue like you would see if you “brick” your phone by screwing up the recovery partition.

        • 0 avatar
          vvk

          > So is the flash memory unit tasked with retaining data permanently? Can you just swap it out without affecting anything?

          Yes, that’s the “cheap” $400 fix. It requires disassembly of the center screen to gain physical access to the circuit board. People then de-solder the eMMC from the board and install a plug-in adapter that enables a generic eMMC chip to be plugged in. If this is done preemptively, you can copy the content of the old chip to the new, bigger capacity chip and then the car works normally when you put everything back together. If this happens after the eMMC dies, the only option is to go to Tesla (the “expensive” $1800 fix) in order to restore your car’s data on the new chip. They have a backup of it.

    • 0 avatar
      JimZ

      a big part of the problem is treating a car like a consumer electronics device. they run a lot of this stuff on Ubuntu and use shell scripts to configure things and manage them at run time. guessing they hired on a bunch of Linux geeks and that was the only thing they knew. When all you have is a hammer, every problem looks like a nail and all that.

      other car companies use a platform more suited to embedded applications, QNX being one of the big ones.

  • avatar
    Mike Beranek

    Wow, my buddy’s ’65 Buick Skylark has never had this problem. The flash memory in his car works just fine.

  • avatar
    jkross22

    I tried talking my cousin out of buying one of these a few months ago, but it’s what he wanted.

    This is really bad that Tesla cheaped out like this. We’ve seen them cheap out on materials and fit and finish, but the assumption (at least my assumption) was that all drivetrain/propulsion related electronics were solid.

    Huge black eye for Tesla. Their response is once again less than acceptable.

    • 0 avatar
      Vulpine

      And what does your cousin think of it? Clearly you’ve been biased from the beginning and this little episode is confirmation of your bias but your cousin may think completely differently and may never agree with you.

  • avatar
    Art Vandelay

    So I assume this is a confluence of issues. It looks like, per other articles there is no load leveling (trim) functionality on the drive. That combined with a high level of logging ensures the same sectors are written to over and over and frequently. Larger drives with trim implemented should outlast the car. Not sure if they cheaped out on the memory…you want quality used in hard drives here…not the USB stick caliber stuff.

    Ideally though this is not a failure that should kill the car either way so I do question the implementation.
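
What is called load leveling here is more commonly known as wear leveling; TRIM is a separate command that tells the controller which blocks no longer hold useful data. The effect of wear leveling can be shown with a very small Python simulation. It is a sketch under simple assumptions: the function name `simulate_erases` is hypothetical, the 64-block device and 64,000 rewrites are arbitrary, and the round-robin policy stands in for the far more elaborate schemes real controllers use.

```python
def simulate_erases(num_blocks, rewrites, wear_leveling):
    """Return per-block erase counts after rewriting one logical block many times."""
    erase_counts = [0] * num_blocks
    for i in range(rewrites):
        # Without leveling, the same physical block is reused every time.
        # With (round-robin) leveling, each rewrite lands on the next block.
        target = i % num_blocks if wear_leveling else 0
        erase_counts[target] += 1
    return erase_counts


fixed = simulate_erases(num_blocks=64, rewrites=64_000, wear_leveling=False)
leveled = simulate_erases(num_blocks=64, rewrites=64_000, wear_leveling=True)
print("worst block without leveling:", max(fixed))
print("worst block with leveling:   ", max(leveled))
```

The total number of erases is identical in both runs; leveling only spreads them out, so the most-worn block reaches its limit far later. That is also why a larger chip with more spare blocks buys time without fixing the underlying write volume.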

