13 hours debugging a segmentation fault in .NET Core on Raspberry Pi and the solution was…



Debugging is a satisfying and special kind of hell. You really have to live it to understand it. When you’re deep into it you never know when it’ll be done. When you do finally escape it’s almost always a DOH! moment.

I spent an entire day debugging an issue and the solution ended up being a checkbox.

NOTE: If you get a third of the way through this blog post and already figured it out, well, poop on you. Where were you after lunch WHEN I NEEDED YOU?

I wanted to use a Raspberry Pi in a tech talk I’m doing tomorrow at a conference. I was going to show .NET Core 2.0 and ASP.NET running on a Raspberry Pi so I figured I’d start with Hello World. How hard could it be?

You’ll write and build a .NET app on Windows or Mac, then publish it to the Raspberry Pi. I’m using a preview build of the .NET Core 2.0 command line and SDK (CLI) I got from here.

C:\raspberrypi> dotnet new console
C:\raspberrypi> dotnet run
Hello World!
C:\raspberrypi> dotnet publish -r linux-arm
Microsoft Build Engine version for .NET Core

raspberrypi1 -> C:\raspberrypi\bin\Debug\netcoreapp2.0\linux-arm\raspberrypi.dll
raspberrypi1 -> C:\raspberrypi\bin\Debug\netcoreapp2.0\linux-arm\publish

Notice the simplified publish. You’ll get a folder for linux-arm in this example, but you could also publish for osx-x64, etc. You’ll want to take the files from the publish folder (not the folder above it) and move them to the Raspberry Pi. This is a self-contained application that targets ARM on Linux, so after the prerequisites that’s all you need.
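If you want to double-check that what you’re about to copy really is a 32-bit ARM Linux binary before blaming the Pi, you can peek at its ELF header. Here’s a minimal Python sketch, assuming the extensionless launcher in the publish folder is a standard ELF file; `describe_elf` and `looks_like_linux_arm` are names I made up for illustration. It only reads the identification bytes and the `e_machine` field (40 means ARM).

```python
import struct

EM_ARM = 40  # e_machine value for 32-bit ARM in the ELF specification

def describe_elf(header: bytes) -> dict:
    """Decode the first 20 bytes of an ELF header: word size, byte order, target machine."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]   # EI_DATA: 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_machine is a 2-byte field right after the 16-byte e_ident and the 2-byte e_type.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "machine": e_machine,
    }

def looks_like_linux_arm(path: str) -> bool:
    """True if the file is a 32-bit, little-endian ARM ELF, which is what linux-arm targets."""
    with open(path, "rb") as fh:
        info = describe_elf(fh.read(20))
    return info["bits"] == 32 and info["little_endian"] and info["machine"] == EM_ARM
```

You’d point `looks_like_linux_arm` at the launcher (the file named after your project) inside the publish folder. It says nothing about whether the file is intact, only what it was built for.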

I grabbed a microSD card, headed over to https://www.raspberrypi.org/downloads/ and downloaded the latest Raspbian image. I used etcher.io – a lovely image burner for Windows, Mac, or Linux – and wrote the image to the SD card. I booted up and got ready to install some prereqs. I’m only 15 minutes in at this point. Setting up a Raspberry Pi 2 or Raspberry Pi 3 is VERY smooth these days.

Here are the prereqs for .NET Core 2 on Ubuntu or Debian/Raspbian. Install them from the terminal, natch.

sudo apt-get install libc6 libcurl3 libgcc1 libgssapi-krb5-2 libicu-dev liblttng-ust0 libssl-dev libstdc++6 libunwind8 libuuid1 zlib1g

I also added an FTP server and ran vncserver, so I’d have a few ways to talk to the Raspberry Pi. Yes, I could also SSH in but I have a spare monitor, and with that monitor plus VNC I didn’t see a need.

sudo apt-get install pure-ftpd
vncserver

Then I fire up FileZilla – my preferred FTP client – and FTP the publish output folder from my dotnet publish above. I put the files in a folder off my ~/Desktop.

Then from a terminal I run:

pi@raspberrypi:~/Desktop/helloworld $ chmod +x raspberrypi

(Or whatever the name of your published “exe” is. It’ll be the name of your source folder/project with no extension.) As this is a self-contained published app, again, all the .NET Core runtime stuff is in the same folder as the app.
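If you script your deploys, the `chmod +x` step has a rough Python equivalent. This is a sketch, not anything the dotnet tooling does for you: `make_executable` is a hypothetical helper, and unlike `chmod +x` it ignores your umask and always turns on all three execute bits.

```python
import os
import stat
import tempfile

# Execute permission for owner, group, and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH

def make_executable(path: str) -> int:
    """Approximate `chmod +x path`: keep the existing permission bits and add execute bits.

    Unlike the shell command, this ignores the process umask.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode | EXEC_BITS
    os.chmod(path, new_mode)
    return new_mode

if __name__ == "__main__":
    # Demo on a throwaway file standing in for the published launcher.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    os.chmod(path, 0o644)
    print(oct(make_executable(path)))  # 0o755
    os.remove(path)
```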

pi@raspberrypi:~/Desktop/helloworld $ ./raspberrypi 
Segmentation fault

The crash was instant…not a pause and a crash, but it showed up as soon as I pressed enter. Shoot.
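For what it’s worth, “Segmentation fault” is the shell telling you the process died from signal 11 (SIGSEGV); bash reports the exit status as 139, which is 128 + 11. If you launch the app from a script, Python’s `subprocess` reports the same thing as a negative return code. A small sketch, with `run_and_classify` as a made-up helper name and a child Python process standing in for the crashing app:

```python
import signal
import subprocess
import sys

def run_and_classify(argv: list) -> str:
    """Run a program and say whether it exited normally or was killed by a signal."""
    result = subprocess.run(argv, capture_output=True)
    code = result.returncode
    if code < 0:
        # On POSIX, subprocess reports "killed by signal N" as returncode -N.
        return "killed by " + signal.Signals(-code).name
    return "exited with status " + str(code)

if __name__ == "__main__":
    # Stand-in for ./raspberrypi: a child process that sends itself SIGSEGV.
    crasher = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_classify(crasher))  # killed by SIGSEGV
```

Pointed at the broken binary, `run_and_classify(["./raspberrypi"])` should report SIGSEGV too, which at least tells you it’s a crash and not the app choosing to exit.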

I ran “strace ./raspberrypi” and got this output. I figured maybe I missed one of the prerequisite libraries, and I just needed to see which one and apt-get it. I can see the ld.so.nohwcap error, but that’s a historical Debian-ism and more of a warning than a fatal error.

(Screenshot: strace on a bad exe in Linux)

I used to be able to read straces 20 years ago but, much like my Spanish, my skills are only good at Chipotle. I can see it just getting started: loading libraries, seeking around in them, checking file status, mapping files to memory, setting memory protection, then it all falls apart. Perhaps we tried to do something inappropriate with some memory that just got protected? Are we dereferencing a null pointer?

Maybe you can read this and you already know what is going to happen! I did not.

I run it under gdb:

pi@raspberrypi:~/Desktop/WTFISTHISCRAP $ gdb ./raspberrypi 
GNU gdb (Raspbian 7.7.1+dfsg-5+rpi1) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
This GDB was configured as "arm-linux-gnueabihf".
"/home/pi/Desktop/helloworldWRONG/./raspberrypi1": not in executable format: File truncated
(gdb)

Ok, sick files?
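In hindsight, “File truncated” was a big clue: most likely the file on disk was shorter than its own headers said it should be. gdb’s real checks (done through the BFD library) are more involved than this, but a simplified version of the idea for a 32-bit ELF looks like the sketch below. It reads where the header claims the program and section header tables live and checks that they fit inside the file. `elf32_tables_fit` is a hypothetical helper, not part of any tool.

```python
import struct

def elf32_tables_fit(data: bytes) -> bool:
    """Check that a 32-bit ELF's program and section header tables lie inside the file."""
    if len(data) < 52 or data[:4] != b"\x7fELF" or data[4] != 1:
        raise ValueError("not a 32-bit ELF file")
    endian = "<" if data[5] == 1 else ">"
    # Elf32_Ehdr layout: e_phoff at offset 28 and e_shoff at 32 (4 bytes each)...
    e_phoff, e_shoff = struct.unpack_from(endian + "II", data, 28)
    # ...then e_phentsize, e_phnum, e_shentsize, e_shnum at offsets 42-49 (2 bytes each).
    e_phentsize, e_phnum, e_shentsize, e_shnum = struct.unpack_from(endian + "HHHH", data, 42)
    ph_end = e_phoff + e_phentsize * e_phnum
    sh_end = e_shoff + e_shentsize * e_shnum
    return ph_end <= len(data) and sh_end <= len(data)
```

The section header table usually sits near the end of the file, so a binary that silently lost bytes somewhere in the middle would likely fail this check.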

I called Peter Marcu from the .NET team and we chatted about how he got it working and compared notes.

I was using a Raspberry Pi 2, he a Pi 3. Ok, I’ll try a 3. 30 minutes later, new SD card, new burn, new boot, pre-reqs, build, FTP, run, SAME RESULT – segfault.

Weird.

Maybe corruption? Here’s a thread about Corrupted Files on Raspbian Jessie 2017-07-05! That’s the version I have. OK, I’ll try the build of Raspbian from a week before.

30 minutes later, burn another SD card, new boot, pre-reqs, build, FTP, run, SAME RESULT – segfault.

BUT IT WORKS ON PETER’S MACHINE.

Weird.

Maybe a bad nuget.config? No.

Bad daily .NET build? No.

BUT IT WORKS ON PETER’S MACHINE.

Ok, I’ll try Ubuntu MATE for Raspberry Pi. TOTALLY different OS.

30 minutes later, burn another SD card, new boot, pre-reqs, build, FTP, run, SAME RESULT – segfault.

What’s the common thread here? Ok, I’ll try from another Windows machine.

SAME RESULT – segfault.

I call Peter back and we figure it’s gotta be prereqs…but the strace doesn’t show we’re even trying to load any interesting libraries. We fail FAST.

Ok, let’s get serious.

We both have Raspberry Pi 3s. Check.

What kind of SD card does he have? SanDisk? Ok, I’ll use SanDisk. But disk corruption makes no sense at that level…because the OS booted!

What did he burn with? He used Win32 Disk Imager and I used Etcher. Fine, I’ll bite.

30 minutes later, burn another SD card, new boot, pre-reqs, build, FTP, run, SAME RESULT – segfault.

He sends me HIS build of a HelloWorld and I FTP it over to the Pi. SAME RESULT – segfault.

Peter is freaking out. I’m deeply unhappy and considering quitting my job. My kids are going to sleep because it’s late.

I ask him what he’s FTPing with, and he says WinSCP. I use FileZilla, ok, I’ll try WinSCP.

WinSCP’s New Session dialog starts here:

(Screenshot: WinSCP’s New Session dialog, where SFTP is the default)

I say, WAIT. Are you using SFTP or FTP? Peter says he’s using SFTP so I turn on SSH on the Raspberry Pi and SFTP into it with WinSCP and copy over my Hello World.

IT FREAKING WORKS. IMMEDIATELY.

(Screenshot: Hello World on a Raspberry Pi)

BUT WHY.

I make a folder called GOOD and a folder called BAD. I copy with FileZilla to BAD and with WinSCP to GOOD. Then I run a compare. Maybe some part of .NET Core got corrupted? Maybe a supporting native library?

pi@raspberrypi:~/Desktop $ diff --brief -r helloworld/ helloworldWRONG/
Files helloworld/raspberrypi1 and helloworldWRONG/raspberrypi1 differ
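`diff --brief -r` is the right tool on the Pi. If you’d rather run the same check from a script, say on the machine you publish from, here’s a rough Python equivalent that hashes each file. It’s a sketch: `differing_files` and `demo` are hypothetical names, the demo files are tiny stand-ins, and unlike `diff` it doesn’t report files that exist on only one side.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def differing_files(left: str, right: str) -> list:
    """Relative paths that exist in both trees but whose bytes differ.

    Roughly the "Files X and Y differ" lines of `diff --brief -r`.
    """
    left_root, right_root = Path(left), Path(right)
    diffs = []
    for lpath in sorted(left_root.rglob("*")):
        if not lpath.is_file():
            continue
        rel = lpath.relative_to(left_root)
        rpath = right_root / rel
        if rpath.is_file() and sha256_of(lpath) != sha256_of(rpath):
            diffs.append(rel.as_posix())
    return diffs

def demo() -> list:
    """Recreate the GOOD/BAD experiment with stand-in files."""
    with tempfile.TemporaryDirectory() as good, tempfile.TemporaryDirectory() as bad:
        (Path(good) / "raspberrypi1").write_bytes(b"\x7fELF\r\n\x00")
        (Path(bad) / "raspberrypi1").write_bytes(b"\x7fELF\n\x00")
        (Path(good) / "libfoo.so").write_bytes(b"same")
        (Path(bad) / "libfoo.so").write_bytes(b"same")
        return differing_files(good, bad)

if __name__ == "__main__":
    print(demo())  # ['raspberrypi1']
```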

Wait, WHAT? The executables are different? The good one is 69,632 bytes and the bad one is 67,684 bytes.

Time for a visual compare.

(Screenshot: all the 0Ds are gone)

At this point I saw it IMMEDIATELY.

0D is CR (13) and 0A is LF (10). I know this because I’m old and I’ve written printer drivers for printers that had both carriages and lines to feed. Why do YOU know this? Likely because you’ve transferred files between Unix and Windows once or thrice, perhaps with FTP or Git.

All the CRs are gone. From my binary file.
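You don’t strictly need a hex editor to confirm this kind of damage. Here’s a small Python sketch (the function names are mine, not from any tool) that finds where two copies of a file first diverge and counts the CR bytes in each. If stripping CRs were the only damage, the size difference would equal the number of CRs in the good copy.

```python
CR = 0x0D  # carriage return, '\r', decimal 13
LF = 0x0A  # line feed, '\n', decimal 10

def first_difference(a: bytes, b: bytes):
    """Offset of the first byte where a and b differ, or None if they are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # No mismatch in the overlap: they differ only if one is longer.
    return None if len(a) == len(b) else min(len(a), len(b))

def cr_report(good: bytes, bad: bytes) -> dict:
    """Summarize how a suspect copy differs from a known-good copy."""
    return {
        "size_difference": len(good) - len(bad),
        "crs_in_good": good.count(bytes([CR])),
        "crs_in_bad": bad.count(bytes([CR])),
        "first_difference": first_difference(good, bad),
    }

assert ord("\r") == CR == 13 and ord("\n") == LF == 10
```

To use it on real files, read both copies with `open(path, "rb").read()` and pass the bytes to `cr_report`.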

Why?

I went straight to settings in FileZilla:

(Screenshot: FileZilla’s “Treat files without extensions as ASCII files” setting)

See it?

Treat files without extensions as ASCII files

That’s the default in FileZilla: treat files that are just chilling, minding their own business, as ASCII, and then just randomly strip out carriage returns. What could go wrong? And it doesn’t even look for CR LF pairs! No, it just looks for CRs and strips them. Classy.
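To see why this is fatal for a binary, here’s a tiny simulation in Python. `strip_all_crs` mimics the behavior described above (drop every 0x0D byte) and `crlf_to_lf` is the more careful pair-aware conversion that text tools usually do; neither is FileZilla’s actual code. The “binary” is just two packed 32-bit integers that happen to contain 0x0D, which is exactly the kind of byte an executable is full of.

```python
import struct

def strip_all_crs(data: bytes) -> bytes:
    """The behavior described above: delete every 0x0D byte, paired or not."""
    return data.replace(b"\r", b"")

def crlf_to_lf(data: bytes) -> bytes:
    """A pair-aware text conversion: only CR LF becomes LF; lone CRs survive."""
    return data.replace(b"\r\n", b"\n")

# A fake "binary": two little-endian 32-bit integers, 13 (0x0D) and 2573 (0x0A0D).
blob = struct.pack("<II", 13, 0x0A0D)

if __name__ == "__main__":
    for name, convert in [("strip_all_crs", strip_all_crs), ("crlf_to_lf", crlf_to_lf)]:
        mangled = convert(blob)
        print(name, len(blob), "->", len(mangled), "bytes")
        try:
            struct.unpack("<II", mangled)
        except struct.error as exc:
            print("  can no longer be decoded:", exc)
```

Even the pair-aware version shifts every byte after the first CR LF it removes, so any offsets stored in the file’s headers now point at the wrong place. Binary files simply must not go through a text conversion.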

In retrospect I should have known this. And it wasn’t even the switch to SFTP that mattered; it was the switch to an FTP program with different defaults.
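If you do have to use plain FTP, force binary (image) transfers. In Python’s standard `ftplib`, `storbinary` sends `TYPE I` before uploading, while `storlines` uses ASCII mode. Here’s a minimal sketch, assuming a flat publish folder and an FTP server like the pure-ftpd install above; `upload_folder_binary`, the host name, the credentials, and the remote folder are placeholders I made up.

```python
from ftplib import FTP
from pathlib import Path

def upload_folder_binary(ftp: FTP, local_dir: str) -> list:
    """Upload each file directly inside local_dir using binary (image) mode.

    Non-recursive: subfolders are skipped in this sketch.
    """
    sent = []
    for path in sorted(Path(local_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            # storbinary issues "TYPE I" first, so the bytes are sent unmodified.
            ftp.storbinary("STOR " + path.name, fh)
        sent.append(path.name)
    return sent

# Example (host, user, password, and remote folder are placeholders):
# with FTP("raspberrypi.local") as ftp:
#     ftp.login("pi", "your-password")
#     ftp.cwd("Desktop/helloworld")
#     upload_folder_binary(ftp, r"C:\raspberrypi\bin\Debug\netcoreapp2.0\linux-arm\publish")
```

You’d still need the `chmod +x` step afterward, since a plain FTP upload doesn’t carry Unix permissions.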

This bug/issue/whatever burned my whole Monday. But it’ll never burn another Monday, Dear Reader, because now I’ve seen it.

FAIL FAST FAIL OFTEN my friends!

Why does experience matter? It means I’ve failed a lot in the past and it’s super useful if I remember those bugs because then next time this happens it’ll only burn a few minutes rather than a day.

Go forth and fail a lot, my loves.

Oh, and FTP sucks.

