How do I process SDR data fast enough?

Status
Not open for further replies.

KE7IZL

Member
Joined
Aug 2, 2011
Messages
151
Location
Seattle, WA
I'm using RTL_TCP as my software, which allows me to write my own program to control my RTL-SDR receiver and to get the signal from it. My client software communicates with RTL_TCP over an ordinary TCP connection. It reads the data and does a small amount of processing on the IQ signal to convert it to a real-valued signal. To do this, it resamples the signal to twice the sample rate, then uses a small FIR lowpass filter on each channel to remove the upper copy of the lower frequency band, then multiplies the I channel by a cosine wave at half the bandwidth frequency and the Q channel by a sine wave at the same frequency. Finally it adds the two processed channels together. The result is a real-valued (not complex-valued) signal at twice the sample rate of the input IQ signal, and this output is saved directly to a file.

Problem is the IQ-to-real signal conversion takes a bit too long. The sample rate of the SDR is 2.4 million samples per second. That's a LOT of data that has to be processed VERY fast, and the IQ conversion takes just a bit too long, which then lags the whole process.
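
For readers following along, here is a minimal Python sketch of the conversion described in this post: zero-stuff to twice the rate, lowpass each channel, mix with a cosine and sine at a quarter of the new sample rate, and sum. It is an illustration under assumptions, not the poster's program: the function names are made up, the 7-tap kernel is borrowed from the LPF class posted later in the thread, and only one filter pass is applied with no DC blocking. The sign on the Q term follows the VB6 code in the thread; using a minus sign instead flips the orientation of the output spectrum.

```python
import math

# 7-tap kernel borrowed from the LPF class posted later in the thread.
KERNEL = [-0.1061, 0.0, 0.31831, 0.5, 0.31831, 0.0, -0.1061]


def upsample2(x):
    """Double the sample rate by inserting a zero after every sample."""
    out = [0.0] * (2 * len(x))
    out[::2] = x
    return out


def fir(x, taps):
    """Direct-form FIR filter, assuming zero history before the first sample."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * x[n - k]
        out.append(acc)
    return out


def iq_to_real(i_samples, q_samples):
    """Return a real-valued signal at twice the input sample rate."""
    i_up = fir(upsample2(i_samples), KERNEL)
    q_up = fir(upsample2(q_samples), KERNEL)
    out = []
    for n, (i, q) in enumerate(zip(i_up, q_up)):
        phase = math.pi / 2 * n  # quarter of the output sample rate
        out.append(i * math.cos(phase) + q * math.sin(phase))
    return out
```

Calling `iq_to_real([1.0, 0.0, 0.0, 0.0], [0.0] * 4)` returns 8 samples, which is the kernel's impulse response after mixing.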
 

boatbod

Member
Joined
Mar 3, 2007
Messages
3,338
Location
Talbot Co, MD
Faster processor?
More efficient language?

It's hard to know what to say until you let us know what hardware you are running and what language or tools you are using to do the signal processing.
 

KE7IZL

Member
Joined
Aug 2, 2011
Messages
151
Location
Seattle, WA
Using rtl_tcp.exe as the server, I then wrote a client in VB6, using the MSWinsock ActiveX control for the network capabilities.

The main code to process the data is
Code:
Private Sub Process(ByRef WaveIn() As Byte, ByRef WaveOut() As Integer)
    Dim n As Long
    Dim Wave() As Single
    Dim WaveResampled() As Single
    Dim WaveFiltered() As Single
    
    'Set dimensions of arrays created in this function
    ReDim Wave(1, &HFFFF&)
    ReDim WaveResampled(1, &H1FFFF)
    ReDim WaveFiltered(1, &H1FFFF)


    'Convert unsigned byte values to single precision floating point values
    For n = 0 To &HFFFF&
        Wave(0, n) = (WaveIn(n * 2) - 128) / 128
        Wave(1, n) = (WaveIn(n * 2 + 1) - 128) / 128
    Next n
    
    'Resample to twice the sample rate
    For n = 0 To &HFFFF&
        WaveResampled(0, n * 2) = Wave(0, n)
        WaveResampled(1, n * 2) = Wave(1, n)
    Next n
    
    'Use FIR lowpass filters to remove upper mirror copy of spectrum
    'Then use IIR DC blocking filters to get rid of DC spike in spectrum
    For n = 0 To &H1FFFF
        WaveFiltered(0, n) = DCB_I(LPF2_I(LPF_I(WaveResampled(0, n))))
        WaveFiltered(1, n) = DCB_Q(LPF2_Q(LPF_Q(WaveResampled(1, n))))
    Next n
    
    'Multiply I and Q channels of the complex valued signal by cosine and sine waves
    'Then add results of the multiplication together to reconstruct real valued signal
    'This gives the full bandwidth signal in one channel
    'Then multiply by 32767 (hex value 7FFF) to put signal in range of Integer values (known in C as Short values)
    For n = 0 To &H1FFFF
        WaveOut(n) = (WaveFiltered(0, n) * Cos(Pi / 2 * n) + WaveFiltered(1, n) * Sin(Pi / 2 * n)) * &H7FFF
    Next n
End Sub

Pi is defined elsewhere in the code (it's a constant). The filters LPF_I, LPF2_I, LPF_Q, and LPF2_Q, are all instances of the class LPF. The code for that class is

Code:
Private Const K1 As Single = -0.1061
Private Const K3 As Single = 0.31831
Private Const K4 As Single = 0.5
Private Const K5 As Single = K3
Private Const K7 As Single = K1

Public Function Execute(ByVal Value As Single) As Single
    Static Inputs(1 To 7) As Single
    Inputs(1) = Inputs(2)
    Inputs(2) = Inputs(3)
    Inputs(3) = Inputs(4)
    Inputs(4) = Inputs(5)
    Inputs(5) = Inputs(6)
    Inputs(6) = Inputs(7)
    Inputs(7) = Value
    Execute = Inputs(1) * K1 + Inputs(3) * K3 + Inputs(4) * K4 + Inputs(5) * K5 + Inputs(7) * K7
End Function
Because Execute is defined as the default function for that class (using VB6's procedure attribute settings), it is called whenever the name of an instance of the class is called without the function's name.

Likewise, DCB_I and DCB_Q are instances of a class called DCBlock. The code for that class is as follows
Code:
Public Function Execute(ByVal Value As Single) As Single
    Static LastInput As Single
    Static LastOutput As Single
    Execute = Value - LastInput + LastOutput * 0.9999
    LastInput = Value
    LastOutput = Execute
End Function
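
The DC blocker above implements the recurrence y[n] = x[n] - x[n-1] + R * y[n-1] with R = 0.9999. As a minimal sketch in Python (the name `make_dc_blocker` is made up for illustration), each call to the factory returns an independent filter with its own state, in the same way that DCB_I and DCB_Q each keep their own LastInput and LastOutput:

```python
def make_dc_blocker(r=0.9999):
    """Return a one-sample-at-a-time DC blocking filter with private state."""
    last_in = 0.0
    last_out = 0.0

    def step(value):
        nonlocal last_in, last_out
        out = value - last_in + r * last_out
        last_in = value
        last_out = out
        return out

    return step


# Two independent instances, one per channel.
dcb_i = make_dc_blocker()
dcb_q = make_dc_blocker()
```

With a constant input, the output decays as R to the power n, so a value of R closer to 1 removes DC more slowly but disturbs low frequencies less.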
 

jonwienke

More Info Coming Soon!
Joined
Jul 18, 2014
Messages
13,416
Location
VA
Floating-point calculations are much slower than integer. Rewrite your code to use integers. Even 32-bit integer calculations will be much faster.
 

boatbod

Member
Joined
Mar 3, 2007
Messages
3,338
Location
Talbot Co, MD

DaveNF2G

Guest
Sounds like you're trying to reinvent the wheel.

There are numerous programs already available that do all that and more.
 

KE7IZL

Member
Joined
Aug 2, 2011
Messages
151
Location
Seattle, WA
That and VB6 is typically compiled to p-code then interpreted so it's not going to be as fast as compiled languages such as C/C++. There's also the issue of the .NET overhead.
https://social.msdn.microsoft.com/F...y-vb-is-so-much-slower-than-c?forum=vbgeneral


VB6 doesn't use .net framework. VB.net (like VB2008, VB2012, etc) uses .net but VB6 doesn't. VB6 is the last version of the "classic" Visual Basic, and has no .net dependencies, because .net didn't even exist when VB6 came out.

VB6 only uses p-code in the interpreter (when running the program in the VB6 IDE itself); the default for compiled EXE files is native code. My problem was that even the compiled EXE, with all optimizations enabled (integer overflow checks, floating point error checks, and array bounds checks all removed), still ran too slowly to keep up with the sample rate of the SDR (2.4 million samples per second).

As of right now I don't know why it has this problem.
 

slicerwizard

Member
Joined
Sep 19, 2002
Messages
7,643
Location
Toronto, Ontario
First off, I'm curious why you need to convert I/Q data to real, since I/Q is generally more versatile.


As to the problems:

- using floating point

- actually zero-stuffing, then passing those zeroes to a function that applies expensive floating point multiplies to those zeroes; many cycles burned just to produce 0.0 * tap value = 0.0

- multiple functions called for every I and Q value; so function call overhead is through the roof


Execute = Inputs(1) * K1 + Inputs(3) * K3 + Inputs(4) * K4 + Inputs(5) * K5 + Inputs(7) * K7

Are inputs 2 and 6 skipped because the tap values are zero? Should be doing the same thing with the resampled input data...
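
To illustrate the zero-stuffing point, here is a minimal Python sketch comparing two ways of doing the 2x interpolation with the 7-tap kernel from the thread. The function names are made up for illustration. The first version zero-stuffs and then filters every sample. The second never builds the stuffed array: it splits the kernel into its even-indexed and odd-indexed taps and runs each sub-filter directly on the original samples. For this kernel the odd-indexed taps are 0, 0.5, 0, so every odd output is just half of one input sample.

```python
KERNEL = [-0.1061, 0.0, 0.31831, 0.5, 0.31831, 0.0, -0.1061]


def interpolate_zero_stuff(x, taps=KERNEL):
    """Zero-stuff to twice the rate, then run the full filter on every sample."""
    up = [0.0] * (2 * len(x))
    up[::2] = x
    out = []
    for n in range(len(up)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * up[n - k]
        out.append(acc)
    return out


def interpolate_polyphase(x, taps=KERNEL):
    """Same result without ever touching the stuffed zeros."""
    even_taps = taps[0::2]  # produce the even-numbered outputs
    odd_taps = taps[1::2]   # produce the odd-numbered outputs
    out = []
    for m in range(len(x)):
        for phase_taps in (even_taps, odd_taps):
            acc = 0.0
            for k, t in enumerate(phase_taps):
                if t != 0.0 and m - k >= 0:
                    acc += t * x[m - k]
            out.append(acc)
    return out
```

Per input sample, the polyphase version performs five multiplies (four for the even output, one for the odd output), versus five multiplies for each of the two stuffed samples in the zero-stuffing version.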
 

KE7IZL

Member
Joined
Aug 2, 2011
Messages
151
Location
Seattle, WA
Yep, inputs 2 and 6 have corresponding filter kernel values of 0. I already skip every other entry in the array of resampled data. Check out my code again for doing that and you will see the use of n*2. I don't waste CPU cycles by needlessly writing extra zeroes.
Code:
For n = 0 To &HFFFF&
    WaveResampled(0, n * 2) = Wave(0, n)
    WaveResampled(1, n * 2) = Wave(1, n)
Next n
 

jonwienke

More Info Coming Soon!
Joined
Jul 18, 2014
Messages
13,416
Location
VA
In your example, you are calculating n * 2 twice. A faster approach would be to declare a second variable (I'll call it n2), and then

For n = 0 To &HFFFF&
    n2 = n * 2
    WaveResampled(0, n2) = Wave(0, n)
    WaveResampled(1, n2) = Wave(1, n)
Next n

As the number of references to (n * 2) increases in the loop, the speed advantage of this technique increases.

And again, using 32-bit signed integers for data values, rather than single-precision floating-point, will accelerate things significantly.
 

slicerwizard

Member
Joined
Sep 19, 2002
Messages
7,643
Location
Toronto, Ontario
I already skip every other entry in the array of resampled data. Check out my code again for doing that and you will see the use of n*2. I don't waste CPU cycles by needlessly writing extra zeroes.
Code:
For n = 0 To &HFFFF&
    WaveResampled(0, n * 2) = Wave(0, n)
    WaveResampled(1, n * 2) = Wave(1, n)
Next n
Looks like you pass all of the zeroed samples to LPF_I and LPF_Q.

For n = 0 To &H1FFFF
    WaveFiltered(0, n) = DCB_I(LPF2_I(LPF_I(WaveResampled(0, n))))
    WaveFiltered(1, n) = DCB_Q(LPF2_Q(LPF_Q(WaveResampled(1, n))))
Next n


In your example, you are calculating n * 2 twice. A faster approach would be to declare a second variable (I'll call it n2)
Any modern compiler will only calculate a common expression once.
 

KE7IZL

Member
Joined
Aug 2, 2011
Messages
151
Location
Seattle, WA
Looks like you pass all of the zeroed samples to LPF_I and LPF_Q.

For n = 0 To &H1FFFF
    WaveFiltered(0, n) = DCB_I(LPF2_I(LPF_I(WaveResampled(0, n))))
    WaveFiltered(1, n) = DCB_Q(LPF2_Q(LPF_Q(WaveResampled(1, n))))
Next n

Actually, while the values at those spots are initially zero, they get filled in later as the result of the filtering process. In fact, this filtering step is the second step of the whole resampling process. The zeroes absolutely must get filled in during this step.
 

jonwienke

More Info Coming Soon!
Joined
Jul 18, 2014
Messages
13,416
Location
VA
Any modern compiler will only calculate a common expression once.

Relying on the compiler to optimize your code for you is sloppy. There are limits to how far the compiler can optimize your code for you, as evidenced by the existence of this thread.
 

KE7IZL

Member
Joined
Aug 2, 2011
Messages
151
Location
Seattle, WA
I think another problem with VB6 is that when it uses arrays, it actually uses objects called safearrays. These are created by a call to a Windows API function that then allocates the memory. The OS itself then manages the safearray objects, and any access to elements of the array requires that your program calls additional Windows API functions. You don't see those calls when writing the program, and it just looks like you are using ordinary C style arrays, but that's just because the Windows API calls are written into the compiled code behind the scenes.

Meanwhile in C and C++, arrays are not safearray objects managed by the system. They are simply pointers to an allocated portion of memory, and any use of C arrays involves directly copying data to or from those arrays. So no extra overhead of calling Windows API functions every time you access elements of the array when using C or C++. I think that may be the largest slowdown in VB6, more than using floating point math or anything else.
 

slicerwizard

Member
Joined
Sep 19, 2002
Messages
7,643
Location
Toronto, Ontario
Actually, while the values at those spots are initially zero, they get filled in later as the result of the filtering process. In fact, this filtering step is the second step of the whole resampling process. The zeroes absolutely must get filled in during this step.
Any requirement for zero stuffing is caused by the code that follows the stuffing. If you wrote the filter functions to require pre-stuffing, that's on you and it leads to twice as much work as required being done in the first filtering stage. That's a significant hit.


Relying on the compiler to optimize your code for you is sloppy.
Hardly. Modern compiler optimizations are well defined and your change does nothing for efficiency or readability.


There are limits to how far the compiler can optimize your code for you, as evidenced by the existence of this thread.
Well then you'd better do all of the optimizing yourself. Now reorder all of your code to ensure the CPU execution pipelines stay full. Any pipeline stalls are on you.


I think another problem with VB6 is that when it uses arrays, it actually uses objects called safearrays. These are created by a call to a Windows API function that then allocates the memory. The OS itself then manages the safearray objects, and any access to elements of the array requires that your program calls additional Windows API functions. You don't see those calls when writing the program, and it just looks like you are using ordinary C style arrays, but that's just because the Windows API calls are written into the compiled code behind the scenes.
If that's the case, VB6 is a toy language and you're boned until you switch to something sane. But you've already figured that out.

BTW, still curious why you're converting I/Q to real. Any clues?
 

KE7IZL

Member
Joined
Aug 2, 2011
Messages
151
Location
Seattle, WA
Unless you are prepared to do some serious FFT work, then you absolutely need real values. All my processing is in the time domain, which (unless you plan to convert to the frequency domain) is the domain easiest to work with if it's purely real. I can easily write up some IIR or FIR (convolution) filters that work on time domain signals, but I've never had any success coding my own FFT. I've done some pure DFT stuff (not speed optimized, so it would be incorrect to call it FFT), but that's just too slow for real-time signal processing. So I'm sticking to purely time-domain signal processing. For example, there's no such thing as a complex-value-time-domain lowpass filter, but there is a complex-value-frequency-domain lowpass filter, as well as a real-value-time-domain lowpass filter. So in order to perform any kernel/convolution/FIR filtering, or even any IIR filtering, on a time domain signal, it absolutely MUST be using real-value-time-domain. The RTL-SDR units only output complex-value-time-domain signals unfortunately. This means that my complex-to-real conversion is MANDATORY. I can do no further processing without this conversion being done first.

By the way, I can't even begin to figure out how to write the code to calculate a true FFT (as opposed to the very easy to code DFT). I've looked at some of the online explanations of the math behind it, and I feel like I'm taking a PhD-level physics course on quantum mechanics. So I just leave the website, feeling even more confused than before I visited it, and have vowed to NEVER waste my time learning how to write the code for an FFT. I'll leave that to the true signal processing experts/professionals.

As for VB6 being a toy language, it isn't really. Yes, it uses safearrays, but that simply allows it to check things like number and size of dimensions and other stuff when you use them. Without the arrays being represented by safearray objects, there is no way for VB6 to pop up an error box when you have attempted to access a cell of the array that's out of bounds. Safearrays aren't strictly for VB6 either. They are a Microsoft thing, and the Windows APIs that use these objects can be called from C++. When used with the Windows API functions that operate on the safe arrays, they can be useful for writing code that checks to make sure you always read and write to the array within bounds so that errors can be given when you go out of bounds. That's as opposed to standard C and C++ arrays which are not actual objects, but just allocated memory regions, so are not safe, and so have no functions based around them that can be used to check if you are out of bounds. C and C++ arrays are dangerous and allow you to write all over your program's memory space if improperly used, causing hard crashes, or potentially allowing buffer overflows and other crap that might even allow a hacker to run carefully crafted code that exploits it (potentially allowing a hacker to cause your program to execute malware that further damages your computer or steals personal information).


The way that C and C++ does things may be faster, but it's easier for bugs to go completely unnoticed, except that every once in a while the program will do a hard crash without warning. Then debugging it can be quite difficult.

I use VB6 because of its "ease of use". Also it allows you to create a GUI based program. C and C++ are best used to write console applications, that have no visible window of their own displayed to the user (so no nice buttons or menus)

The only times I use MS Visual C++ are to compile a DLL file that can perform certain tasks at a greater speed than VB6 can, and then I call the DLL function(s) from within VB6 at the points in the code where this speed boost is needed. For everything else (making a program with a GUI, and writing 99% of the program's code) I use VB6.


By the way, I finally found out where the speed was REALLY taking a hit. It turns out there were 2 places. One was the way I was using classes, where each filter was an instance of a class. VB6's class function calling seems to be HUGELY slow. So instead I replaced the classes with calls to standard functions, and just stored the internal state of each filter instance in a structure (known in C++ as a struct, or in VB6 as a User Defined Type).

Below is the module's code that contains all of the functions and User Defined Types.
Code:
Private Const K1 As Double = -0.1061
Private Const K3 As Double = 0.31831
Private Const K4 As Double = 0.5
Private Const K5 As Double = K3
Private Const K7 As Double = K1

Public Type LPF_State
    Input1 As Double
    Input2 As Double
    Input3 As Double
    Input4 As Double
    Input5 As Double
    Input6 As Double
    Input7 As Double
End Type

Public Type DCB_State
    LastInput As Double
    LastOutput As Double
End Type

Public Function LPF(ByRef State As LPF_State, ByVal Value As Double) As Double
    With State
        .Input1 = .Input2
        .Input2 = .Input3
        .Input3 = .Input4
        .Input4 = .Input5
        .Input5 = .Input6
        .Input6 = .Input7
        .Input7 = Value
        LPF = .Input1 * K1 + .Input3 * K3 + .Input4 * K4 + .Input5 * K5 + .Input7 * K7
    End With
End Function

Public Function DCBlock(ByRef State As DCB_State, ByVal Value As Double) As Double
    With State
        DCBlock = Value - .LastInput + .LastOutput * 0.9999
        .LastInput = Value
        .LastOutput = DCBlock
    End With
End Function

I also optimized the use of Cos and Sin, since calling trig functions was another major hit to speed. Since there are only 4 possible values for a cosine wave and a sinewave, at half-bandwidth frequency and no phase-shift (the 4 values being 1,0,-1,0 for cosine, and 0,1,0,-1 for sine), I only need to create a 4 element array for Cosine and a 4 element array for Sine. And instead of the inputs being an angle, now the input is simply an integer index into the array.
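
The lookup-table idea can be checked with a minimal Python sketch (function names are made up for illustration). It compares the trig version, the 4-entry table version, and a third variant that drops the multiplies entirely: because one of the two table entries is always zero and the other is plus or minus one, the output simply cycles through I, Q, -I, -Q.

```python
import math

COSINE = (1.0, 0.0, -1.0, 0.0)
SINE = (0.0, 1.0, 0.0, -1.0)


def mix_trig(i_samples, q_samples):
    """Reference version calling cos and sin for every sample."""
    return [
        i * math.cos(math.pi / 2 * n) + q * math.sin(math.pi / 2 * n)
        for n, (i, q) in enumerate(zip(i_samples, q_samples))
    ]


def mix_table(i_samples, q_samples):
    """4-entry lookup tables indexed by the low two bits of n."""
    return [
        i * COSINE[n & 3] + q * SINE[n & 3]
        for n, (i, q) in enumerate(zip(i_samples, q_samples))
    ]


def mix_select(i_samples, q_samples):
    """No multiplies at all: pick I, Q, -I, -Q in turn."""
    out = []
    for n, (i, q) in enumerate(zip(i_samples, q_samples)):
        k = n & 3
        if k == 0:
            out.append(i)
        elif k == 1:
            out.append(q)
        elif k == 2:
            out.append(-i)
        else:
            out.append(-q)
    return out
```

All three agree to within floating point rounding; the trig version differs only because cos and sin of multiples of pi/2 are not exactly zero in floating point.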

Below is my new signal processing function with these optimizations.
Code:
Private Sub Process(ByRef WaveIn() As Byte, ByRef WaveOut() As Integer)
    Dim n As Long
    Dim i As Long
    
    For n = 0 To &HFFFF&
        Wave(0, n) = (WaveIn(n * 2) - 128) / 128
        Wave(1, n) = (WaveIn(n * 2 + 1) - 128) / 128
    Next n
    

    For n = 0 To &HFFFF&
        WaveResampled(0, n * 2) = Wave(0, n)
        WaveResampled(1, n * 2) = Wave(1, n)
    Next n
    
    For n = 0 To &H1FFFF
        WaveFiltered(0, n) = DCBlock(DCB_I, LPF(LPF2_I, LPF(LPF_I, WaveResampled(0, n))))
        WaveFiltered(1, n) = DCBlock(DCB_Q, LPF(LPF2_Q, LPF(LPF_Q, WaveResampled(1, n))))
    Next n

    For n = 0 To &H1FFFF
        i = n And 3
        WaveOut(n) = (WaveFiltered(0, n) * Cosine(i) + WaveFiltered(1, n) * Sine(i)) * &H7FFF
    Next n
End Sub

DCB_I and DCB_Q are variables of type DCB_State (one of the above-mentioned user defined types).
LPF_I, LPF_Q, LPF2_I, and LPF2_Q are variables of type LPF_State (the other user defined type mentioned above).

I've not stopped using floating point values in favor of integer values, and because this is done in VB6 I've not been able to switch to using faster unsafe arrays. However, the optimizations I have done are enough to keep the program from having an ever increasing lag. Now it runs at least as fast as the signal speed of 2.4 MSPS.

In the past, because the lag would increase the longer the program ran, the result was that after about a minute of processing, you'd have maybe processed only about 10 seconds of signal. This is no longer the case, which is REALLY GOOD. It now runs fast enough to keep up with the signal (not sure by how much though). I've got a couple ideas for further improving speed though, but none of them involve using integer values. There's a good reason for this. Integers don't support numbers that contain a fractional part (like 0.5 or 1.8), and values that are not integers routinely occur when performing filtering, which contains a division or multiplying by a non-integer number (check out the kernel values in the FIR filter and you will see that they all have an absolute value less than 1). Loss of the fractional part hurts the filter's functionality, and leaves artifacts in the processed signal.
 

slicerwizard

Member
Joined
Sep 19, 2002
Messages
7,643
Location
Toronto, Ontario
Unless you are prepared to do some serious FFT work, then you absolutely need real values. All my processing is in the time domain, which (unless you plan to convert to the frequency domain) is the domain easiest to work with if it's purely real. I can easily write up some IIR or FIR (convolution) filters that work on time domain signals, but I've never had any success coding my own FFT. I've done some pure DFT stuff (not speed optimized, so it would be incorrect to call it FFT), but that's just too slow for real-time signal processing. So I'm sticking to purely time-domain signal processing. For example, there's no such thing as a complex-value-time-domain lowpass filter, but there is a complex-value-frequency-domain lowpass filter, as well as a real-value-time-domain lowpass filter. So in order to perform any kernel/convolution/FIR filtering, or even any IIR filtering, on a time domain signal, it absolutely MUST be using real-value-time-domain. The RTL-SDR units only output complex-value-time-domain signals unfortunately. This means that my complex-to-real conversion is MANDATORY. I can do no further processing without this conversion being done first.
How did you come to this conclusion? I ask because the Internet seems to think otherwise.

You're saying that this isn't possible?

I/Q stream ---> FIR filtering / downsampling ---> (repeat as required) ---> filtered/downsampled I/Q ---> audio demod

That would come as a shock to some parties - the author of rtl_fm for starters.
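
As a minimal Python sketch of the chain above (not taken from rtl_fm; the function names and the simple moving-average taps in the usage note are assumptions for illustration), a time-domain FIR filter can be applied to complex samples directly: the real taps multiply the complex values, which is the same as filtering I and Q separately with the same kernel. The decimated complex output then goes straight to an FM discriminator that measures the phase step between consecutive samples.

```python
import cmath


def fir_decimate(samples, taps, factor):
    """Filter complex samples with real taps, keeping every `factor`-th output."""
    out = []
    for n in range(len(taps) - 1, len(samples), factor):
        acc = 0j
        for k, t in enumerate(taps):
            acc += t * samples[n - k]
        out.append(acc)
    return out


def fm_demod(samples):
    """Phase change between consecutive complex samples, in radians."""
    return [cmath.phase(b * a.conjugate()) for a, b in zip(samples, samples[1:])]
```

For example, a complex tone advancing 0.1 radians per sample, passed through `fir_decimate(tone, [0.25] * 4, 4)` and then `fm_demod`, gives a steady 0.4 radians per output sample, with no conversion to a real-valued signal anywhere in the chain.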


By the way, I can't even begin to figure out how to write the code to calculate a true FFT (as opposed to the very easy to code DFT). I've looked at some of the online explanations of the math behind it, and I feel like I'm taking a PhD-level physics course on quantum mechanics. So I just leave the website, feeling even more confused than before I visited it, and have vowed to NEVER waste my time learning how to write the code for an FFT. I'll leave that to the true signal processing experts/professionals.
Simple FFT source code: Pitch Shifting Using The Fourier Transform | Stephan Bernsee's Blog

Or just call an external FFT function: FFTW Home Page
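
For a sense of scale, a recursive radix-2 FFT fits in about a dozen lines. The sketch below is a textbook-style illustration in Python, not the code from either link, and it only handles power-of-two lengths. The slow `dft` function is included purely so the two can be compared.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; length must be a power of two."""
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def dft(x):
    """Direct O(n^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

The whole trick is that a length-n transform is built from two length-n/2 transforms plus n/2 twiddle multiplies, which is where the n log n cost comes from.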


As for VB6 being a toy language, it isn't really. Yes, it uses safearrays, but that simply allows it to check things like number and size of dimensions and other stuff when you use them. Without the arrays being represented by safearray objects, there is no way for VB6 to pop up an error box when you have attempted to access a cell of the array that's out of bounds. Safearrays aren't strictly for VB6 either. They are a Microsoft thing, and the Windows APIs that use these objects can be called from C++. When used with the Windows API functions that operate on the safe arrays, they can be useful for writing code that checks to make sure you always read and write to the array within bounds so that errors can be given when you go out of bounds. That's as opposed to standard C and C++ arrays which are not actual objects, but just allocated memory regions, so are not safe, and so have no functions based around them that can be used to check if you are out of bounds. C and C++ arrays are dangerous and allow you to write all over your program's memory space if improperly used, causing hard crashes, or potentially allowing buffer overflows and other crap that might even allow a hacker to run carefully crafted code that exploits it (potentially allowing a hacker to cause your program to execute malware that further damages your computer or steals personal information).
That's on the coder. If one needs that level of handholding, fine, but you take the performance hit. Your SDR software isn't taking inputs that a hacker can use to force a buffer overflow. But your software is dealing with a high data rate. That makes VB a non-optimal choice.


The way that C and C++ does things may be faster, but it's easier for bugs to go completely unnoticed, except that every once in a while the program will do a hard crash without warning. Then debugging it can be quite difficult.
"May" be faster? :)

Again, that's on the coder. If one writes flaky code, then a managed language makes life easier, but you always pay a price.


I use VB6 because of its "ease of use". Also it allows you to create a GUI based program. C and C++ are best used to write console applications, that have no visible window of their own displayed to the user (so no nice buttons or menus)

The only times I use MS Visual C++ are to compile a DLL file that can perform certain tasks at a greater speed than VB6 can, and then I call the DLL function(s) from within VB6 at the points in the code where this speed boost is needed. For everything else (making a program with a GUI, and writing 99% of the program's code) I use VB6.
I think that this would be one of those times where an external C/C++ module would be beneficial. Then you can have the best of both worlds.


Good to hear that things have improved.

BTW, the Wave array serves no purpose, other than to create extra data loads and stores.


I've not stopped using floating point values in favor of integer values, and because this is done in VB6 I've not been able to switch to using faster unsafe arrays. However, the optimizations I have done are enough to keep the program from having an ever increasing lag. Now it runs at least as fast as the signal speed of 2.4 MSPS.

In the past, because the lag would increase the longer the program ran, the result was that after about a minute of processing, you'd have maybe processed only about 10 seconds of signal. This is no longer the case, which is REALLY GOOD. It now runs fast enough to keep up with the signal (not sure by how much though).
Take a look at Task Manager and see how much of a core/hyperthread your process is consuming.
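
Another way to put a number on the margin is to time the processing of one block against the time that block represents. The Process routine in this thread handles 65,536 IQ pairs per call (n runs from 0 to &HFFFF), which at 2.4 MSPS is about 27.3 ms of signal. The Python sketch below shows the arithmetic and a simple timing harness; the function names are made up for illustration, and `process_block` stands in for whatever routine is being measured.

```python
import time

SAMPLE_RATE = 2_400_000   # IQ pairs per second, from the thread
BLOCK_SAMPLES = 0x10000   # IQ pairs handled per call to Process


def block_budget_seconds(block_samples=BLOCK_SAMPLES, sample_rate=SAMPLE_RATE):
    """Seconds of signal contained in one block: the real-time deadline."""
    return block_samples / sample_rate


def realtime_load(process_block, block, block_samples, repeats=5):
    """Average processing time divided by the budget; above 1.0 means lagging."""
    start = time.perf_counter()
    for _ in range(repeats):
        process_block(block)
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed / block_budget_seconds(block_samples)
```

By this measure, the earlier report of needing about a minute to process about 10 seconds of signal corresponds to a load of roughly 6.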


I've got a couple ideas for further improving speed though, but none of them involve using integer values. There's a good reason for this. Integers don't support numbers that contain a fractional part (like 0.5 or 1.8), and values that are not integers routinely occur when performing filtering, which contains a division or multiplying by a non-integer number (check out the kernel values in the FIR filter and you will see that they all have an absolute value less than 1). Loss of the fractional part hurts the filter's functionality, and leaves artifacts in the processed signal.
I see 10610, 31831 and 50000 - all perfectly representable as integers with no loss of precision. An x86 processor has many integer ALUs to throw at integer workloads, but not so with floating point. You're paying a very stiff price.
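
A minimal Python sketch of the integer idea, assuming the taps are scaled by 100,000 as suggested above and the input is the raw unsigned bytes offset by 128 (the function names are made up for illustration). The accumulator stays an integer throughout, and the scale factor is only divided out when a real-valued result is needed. In the worst case the accumulator magnitude is 128 times the sum of the absolute tap values (134,882), about 17.3 million, which fits comfortably in a signed 32-bit integer. Whether this is actually faster than floating point on a given CPU and compiler would need to be measured.

```python
SCALE = 100_000
INT_TAPS = (-10610, 0, 31831, 50000, 31831, 0, -10610)


def bytes_to_signed(raw):
    """Convert unsigned 8-bit samples (0..255) to signed values (-128..127)."""
    return [b - 128 for b in raw]


def fir_fixed(samples, taps=INT_TAPS):
    """Integer-only FIR; each output is the true result multiplied by SCALE."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, t in enumerate(taps):
            if t and n - k >= 0:
                acc += t * samples[n - k]
        out.append(acc)
    return out
```

Dividing each output by `SCALE` recovers the same values the floating point kernel would give for the same integer inputs, up to floating point rounding in the reference. A power-of-two scale would let the final division become a bit shift, at the cost of rounding the tap values slightly.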
 

jonwienke

More Info Coming Soon!
Joined
Jul 18, 2014
Messages
13,416
Location
VA
A 32-bit integer with values scaled to ±2147483648 has more precision than a single-precision floating-point variable with values scaled to ±1, takes up the same amount of memory (32 bits), and can calculate much faster.
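
The precision claim can be illustrated with a small Python sketch (helper names are made up). Single precision carries a 24-bit significand, so just below 1.0 its step size is 2^-24, while a 32-bit integer treated as fixed point with full scale 2^31 has a uniform step of 2^-31. The comparison reverses for very small values, where floating point keeps its relative precision and fixed point rounds to zero, so which format is more precise depends on the signal level.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def to_q31(x):
    """Quantize to signed 32-bit fixed point where 2**31 represents 1.0."""
    q = round(x * 2**31)
    q = max(-(2**31), min(2**31 - 1, q))  # clamp to the int32 range
    return q / 2**31


near_one = 1 - 2**-30
tiny = 2**-40
```

Here `to_float32(near_one)` rounds to exactly 1.0 while `to_q31(near_one)` preserves the value, and `to_q31(tiny)` is 0.0 while `to_float32(tiny)` preserves the value.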
 

frazpo

Member
Joined
Jan 14, 2007
Messages
1,476
Location
SW Mo
A 32-bit integer with values scaled to ±2147483648 has more precision than a single-precision floating-point variable with values scaled to ±1, takes up the same amount of memory (32 bits), and can calculate much faster.

You seem seasoned in the computer and radio field. What is your field of work?
 